\documentclass{article}

\usepackage[a4paper]{geometry}
\usepackage{hyperref}
\usepackage{amsmath}

\title{INFO5993 \\ Literature Review}
\author{Carlo Zancanaro}

\begin{document}
\maketitle

\section{Introduction}

Static analysis is concerned with determining as much as possible about a
program without actually running it. Analysing a program statically can
provide a wealth of information about the program's characteristics before it
is ever executed.

The uses for static analysis vary significantly, from program optimisation
(e.g.\ constant propagation) to verification of program semantics and
bug-finding. In critical applications (for example, medical equipment and
flight computers) static analysis techniques are used to ensure that programs
are free from errors which would cause catastrophic failure.

Of particular use in static analysis is the determination of integer values
within a program. By determining which values each integer variable may take
at each program point it is possible to detect a significant number of errors
relating to improper integer values (divide-by-zero, array-out-of-bounds,
etc.). Such errors are among the most common in real-world software
development, so their detection is of great use to software developers.

\section{Abstract Interpretation}

\subsection{General Framework}

Abstract interpretation is a technique introduced in \cite{CousotCousot77-1} in
which a program's semantics are examined within an abstract domain. This
abstract domain is a simplification of the concrete domain in which the program
actually executes, but is constructed in such a way as to retain the properties
of the original program that are relevant to the analysis. The analysis then
proceeds by ``executing'' the program within this abstract domain.

An example of an abstract domain is one which considers each number as either
positive $(+)$, negative $(-)$, zero $(0)$, unknown $(\pm)$ or undefined
$(?)$. Using this abstraction we can still define semantics which reflect the
program itself (as an example: $(+) \times (-) = (-)$, i.e.\ the product of a
positive number and a negative number is a negative number). Each abstract
domain is also required to contain a ``top'' $(\top)$ and a ``bottom''
$(\bot)$ element and to form a complete lattice, which ensures some helpful
properties for the analysis (in particular, that least upper bounds exist).
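
As a small illustration of the precision such a domain gains and loses, the
sign of a product is always determined, while the sign of the sum of a
positive and a negative number is not:
\begin{align*}
(+) \times (-) &= (-) \\
(+) + (-) &= (\pm)
\end{align*}
A later division by a value abstracted as $(\pm)$ must then be reported as a
potential divide-by-zero, even if the concrete value can never actually be
zero.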

By constructing appropriate abstract domains we can infer properties of the
program's concrete semantics which are relevant to our application. The above
abstract domain, for example, can be used to detect divide-by-zero errors
(although it is clear from the above that not all divide-by-zero errors will be
detected) or negative array-indexing (one form of an out of bounds error).

A number of abstract domains have been conceived and formalised in the
literature; several are considered in the following subsection. An important
trade-off to keep in mind is that as abstract domains become more precise they
also become more computationally expensive to analyse.

\subsection{Abstract Domains}

Cousot and Cousot, in \cite{CousotCousot77-1}, introduce the interval abstract
domain, which provides an upper and a lower bound for each integer value. This
domain allows them to reason about numbers with greater precision than simply
considering positive/negative/zero.
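
As an illustration, the arithmetic operations of a program can be lifted to
intervals pointwise; if $x \in [a, b]$ and $y \in [c, d]$ then
\begin{align*}
x + y &\in [a + c,\; b + d] \\
x - y &\in [a - d,\; b - c]
\end{align*}
so, for example, $x \in [0, 10]$ and $y \in [1, 5]$ give
$x - y \in [-5, 9]$, which is not enough to prove that $x - y$ is a safe
(non-negative) array index.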

The abstract domain of ``zones'' is introduced in \cite{mine:padoII}. Zones
provide a way to perform analysis on the bounds of variable differences
(i.e.\ to place bounds of the form $a - b \leq c$) and are also referred to as
Difference Bound Matrices, which are used in other model-checking applications
\cite{Yovine98modelchecking}.
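
As a small sketch of the representation (using the convention, here assumed,
that the matrix entry $m_{ij}$ is the constant $c$ in $v_i - v_j \leq c$), the
zone $\{x - y \leq 4,\; y - x \leq -1\}$ over the variables $(x, y)$ is stored
as
\[
m = \begin{pmatrix} 0 & 4 \\ -1 & 0 \end{pmatrix}
\]
with $+\infty$ entries wherever no bound is known; a shortest-path closure of
this matrix then yields the tightest difference bounds implied by the
constraints.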

The domain of zones is then extended by the same author into the ``octagon
abstract domain'' in \cite{mine:hosc06}, which additionally allows bounds on
the sum of variables ($\pm a \pm b \leq c$).

All of the above domains are shown to be instances of the polyhedra abstract
domain (with fixed constraint templates) in
\cite{DBLP:conf/vmcai/SankaranarayananSM05}. This domain expresses constraints
of the form $a_1 x_1 + a_2 x_2 + \dots + a_n x_n + c \ge 0$, where the
coefficients $a_i$ are fixed in advance (a predefined polyhedral shape).
Intervals are recovered by constraints of the form $x_1 + c \ge 0$, each
providing a single bound on a single variable (two such bounds, of course,
form an interval). Similarly, a zone constraint corresponds to
$x_1 - x_2 + c \ge 0$, and the octagon domain to constraints
$\pm x_1 \pm x_2 + c \ge 0$.

\section{Fixpoints}

When performing static analysis there is a great need for some form of
fixpoint solving. Loops, for instance, give rise to recursive equations: it is
necessary to ``iterate'' (in some sense) to find the values each variable may
take over all iterations of the loop.

The most basic method is simply to perform a complete (Kleene) iteration in
the abstract domain, but for domains of infinite height this has no guarantee
of termination, and even where it does terminate it may be very slow. In
\cite{CousotCousot77-1} the approach of ``widening'' and ``narrowing'' is
introduced to address this.

This widening/narrowing method performs some fixed number of iteration steps.
If the iteration has not stabilised after this fixed number of steps, it is
assumed that it will not stabilise at all; any bounds that are still changing
are then ``widened'' to infinity, forcing stabilisation. A ``narrowing'' phase
is then carried out, in which the widened bounds are refined back towards
finite values using information such as the loop's terminating condition.
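
As a worked example in the interval domain, consider the loop
\texttt{x := 0; while (x < 100) x := x + 1}. Plain iteration yields the
ascending chain $[0,0], [0,1], [0,2], \dots$ at the loop head, which
stabilises only after 100 steps. The standard interval widening instead
extrapolates the unstable upper bound:
\[
[0,0] \mathbin{\nabla} [0,1] = [0, +\infty]
\]
after which the iteration is stable. A narrowing pass then re-applies the loop
condition and refines the interval back to $[0, 100]$ at the loop head.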

This approach is quite general, and so is applicable to many instances of real
programs, but it is a crude approximation of the program's actual behaviour.
In specific cases a better approximation is possible. In particular, the case
of monotonic, expansive operators is considered in \cite{}.  % something from thomas here
With this approach program points are converted into fixpoint equations
involving $\min$ and $\max$ expressions. These fixpoint equations can then be
solved to give information about the program's abstract semantics. The
equations are solved using the technique of strategy iteration \cite{}  % something about strategy iteration here
combined with a fast fixpoint-iteration algorithm (such as the one presented in
\cite{DBLP:tr/trier/MI96-11}).
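
As a sketch of the shape of such equations, the upper bound $u$ of the counter
in a loop \texttt{x := 0; while (x < 100) x := x + 1} can be characterised as
the least solution of
\[
u = \max(0,\; \min(u + 1,\; 100))
\]
which is $u = 100$; strategy iteration solves systems of such $\min$/$\max$
equations exactly, rather than by extrapolating as widening does.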

%Other test citation: \cite{DBLP:tr/trier/MI96-11}

\section{Conclusion}

\bibliographystyle{abbrv}
\bibliography{references}
\end{document}