tex/thesis/introduction/introduction.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125

\chapter{Introduction}

\section{Motivation}

%Main points:
%  - software is taking on more important roles 
%  - software bugs are dangerous
%  - software bugs are expensive
%  - we have to minimise the number of bugs we have

In today's increasingly technological world, software bugs can have
significant consequences, ranging from the relatively minor
frustration caused to average users to causing deaths. There have been
several incidents in recent years in which a bug in a software system
has led directly to injury or death. X-ray machines which provide too
high a dose of radiation, cars which continue to accelerate against
the driver's wishes and other dangerous situations have all come about
as a direct result of software bugs.

Software bugs also have significant financial costs, with Hailpern and
Santhanam static that debugging, verification and testing can easily
comprise 50\% to 75\% of the total development cost of a
system\cite{Hailpern01softwaredebugging}.

In order to limit the number of bugs we have it has become commonplace
to employ sophisticated approaches to testing. Many different types of
testing are done to attempt to ensure that software is bug-free. These
tests, although extremely useful, are inherently incapable of
\emph{guaranteeing} that software is free of bugs. This is especially
a problem in critical software systems, such as those found in
aeroplanes or large industrial machinery, where software failure can
have catastrophic consequences.

In order to provide a guarantee that software is free of bugs we must
in some way \emph{verify} the software before running it. While it is
possible to write programs in a way that is easier to verify, it is
still a difficult and expensive process to provide verification.

There has been a continuous stream of research into automatically
analysing programs in order to identify bugs from the late 70s to the
present\cite{CousotCousot77-1,EasyChair:117}. This work is broadly
classed \emph{static analysis} and this thesis contributes to this
work.

Due to Rice's Theorem\cite{Rice} we know that it is impossible to
perform a perfect static analysis in general. In order to overcome
this many approaches to static analysis have been developed, including
the framework of abstract interpretation presented by Cousot and
Cousot\cite{CousotCousot77-1}. In this framework we consider a program
in a simplified, \emph{abstract}, domain. This allows us to perform a
sound analysis in general at the expense of precision.

Two abstract domains which are particularly well-known and useful are
those of \emph{interval} and \emph{zones}. The interval abstract
domain consists of restricting variables in the range $[a,b]$. This
domain can be useful for avoiding such program errors as division by
zero errors and out of bounds memory accesses. The abstract domain of
zones allows us to further place a bound on the difference between
variables $x - y \le c$. This allows us to be more precise in our
analysis, thereby avoiding false-positives, at the cost of speed.

The process of determining appropriate bounds for program variables
has traditionally been performed by performing several Kleene
iterations, followed by a \emph{widening} and \emph{narrowing}. These
widening and narrowing operators ensure termination, but once again at
the cost of precision. As an alternative to these operators we can, in
some instances, use a game-theoretic approach to compute the abstract
semantics of a program\cite{EasyChair:117}. This approach is less
general than widening and narrowing, but performs precise abstract
interpretation over zones.

In this thesis we present an implementation of this game-theoretic
algorithm, along with our own enhancements which improve its
performance on real-world data.


\section{Contribution}

In this thesis we present an implementation of the strategy-iteration
based static analyser presented by Gawlitza et
al.\cite{EasyChair:117}. Our implementation has several enhancements
which significantly improve the practical performance of the analyser
on real-world programs.

Theoretical contribution:
\begin{enumerate}
\item
  We present a demand-driven strategy improvement algorithm for
  solving monotonic, expansive equation systems involving $\min$ and
  $\max$ operators.
\end{enumerate}

Systems contribution:
\begin{enumerate}
\item
  We develop a solver for monotonic, expansive equation systems based
  on the work of Gawlitza et al.\cite{EasyChair:117} with several
  improvements.
\item
  We analyse the performance of our improved solver on a set of
  equation systems to demonstrate the effect of our improvements.
\item
  We integrate our solver into the LLVM framework to perform analysis
  over Zones\cite{mine:padoII}.
\item
  We analyse the performance of our LLVM analysis on various program
  inputs.
\end{enumerate}

\section{Structure of the thesis}

In Chapter \ref{chap:litreview} we review the background literature in
the field of static analysis, laying the basis for the rest of the
thesis. %TODO: more

We present our theoretical contribution in \ref{chap:contribution}: a
demand-driven strategy improvement algorithm. We present the algorithm
along with an evaluation of its performance on realistic problems.

Our implementation is discussed in Chapter
\ref{chap:implementation}. We discuss several aspects of our
implementation before evaluating the results of our analysis on
several programs.