1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
|
\documentclass{article}
\usepackage[a4paper]{geometry}
\usepackage{hyperref}
\usepackage{amsmath}
\title{INFO5993 \\ Literature Review}
\author{Carlo Zancanaro}
\begin{document}
\maketitle
\section{Introduction}
Static analysis is concerned with determining as much as possible about a
program from it without actually running the program. Being able to analyse the
program statically can provide a wealth of information about the program's
characteristics without executing it at all.
The uses for static analysis vary significantly, from program optimisation
(eg. constant propagation) to verification of program semantics and bug-finding.
In critical applications (for example: medial equipment and flight computers)
static analysis techniques are used to ensure that programs are free from errors
which would cause catastrophic failure.
Of particular use in static analysis is the determination of integral values
within a program. By determining which possible values any integer variables in
the program can take at each code point it is possible to detect a significant
number of errors relating to improper integral values (divide-by-zero,
array-out-of-bounds, etc.). Such errors are among the most common in real-world
software development, so their detection is of much use to software developers.
\section{Abstract Interpretation}
\subsection{General Framework}
Abstract interpretation is a technique introduced in \cite{CousotCousot77-1} in
which a program's semantics are examined within an abstract domain. This
abstract domain is a simplification of the concrete domain in which the program
actually executes, but is constructed in such a way as to retain some properties
of the original program for analysis. We perform our analysis by "executing" our
program within this abstract domain which we have constructed.
An example of an abstract domain is considering numbers as either positive
$(+)$, negative $(-)$, zero $(0)$, unknown $(\pm)$ or undefined $(?)$. Using this
abstraction we can still define some semantics which reflect the program itself
(as an example: $(+) \times (-) = (-)$, or the product of a positive number with
a negative number is a negative number). Each abstract domain must also contain
a "top" $(\top)$ and a "bottom" $(\bot)$ element. The existence of these two
elements turn our abstract domain into a complete lattice, which ensures some
helpful properties of our abstract domain, including the existence of solutions
to fixpoint-equations (\cite{tarski}).
By constructing appropriate abstract domains we can infer properties of the
program's concrete semantics which are relevant to our application. The above
abstract domain, for example, can be used to detect divide-by-zero errors
(although it is clear from the above that not all divide-by-zero errors will be
detected) or negative array-indexing (one form of an out of bounds error).
A number of abstract domains have been conceived and formalised in literature.
These are considered in the following subsection. An important factor to
consider in the following is that as abstract domains get more "detailed" they
become more computationally expensive to analyse.
\subsection{Abstract Domains}
Cousot and Cousot, in \cite{CousotCousot77-1} introduce an abstract domain
providing an upper and lower bound to each integral value. This domain allows
them to reason about numbers with greater precision than simply considering
positive/negative/zero.
The abstract domain of "zones", are handled in \cite{mine:padoII}. These zones
provide a way to perform analysis on the bounds of variable differences (ie. to
place bounds as $a - b \leq c$) and are also referred to as Difference Bound
Matrices (which are used in other model-checking applications, such as
\cite{Yovine98modelchecking}).
The domain of zones is then extended by the same author into the "Octagon
Abstract Domain" in \cite{mine:hosc06} which additionally allows bounds on the
sum of variables ($\pm a \pm b \leq c$).
All of these above domains are actually shown to be instances of the polyhedra
abstract domain in \cite{DBLP:conf/vmcai/SankaranarayananSM05}. The polyhedra
domain expresses bounds on $a_1 x_1 + a_2 x_2 ... + a_n x_n + c \ge 0$, where
$a_i$ are fixed in advance (a predefined polyhedral shape). This allows for
ranges to be expressed as $1 \times x_1 + c \ge 0$ provides a single bound for a
single variable (two such bounds is, of course, a range). Similarly for zones
one can define a polyhedra $1 \times x_1 + (-1) \times x_2 + c \ge 0$ which is
equivalent. The octagon domain is $(\pm 1) \times x_1 + (\pm 1) \times x_2 + c
\ge 0$.
Non-integral abstract domains have also been investigated in papers such as
\cite{springerlink}, which considers the abstract domain
of strings with abstract values being regular expressions, but it seems the
majority of the literature is concerned with providing bounds for integral, or
at the very least numerical, expressions. The reasons for this are fairly
obvious when one considers the ubiquity of integers and the relative simplicity
of the numerical domain compared to many others.
\section{Fixpoints}
When performing static analysis there is a great need to perform some sort of
fixpoint solving to determine the possible resultant values for some operations.
When looping, for instance, it is necessary to "iterate" (in some sense) to find
what values each variable may take during a loop iteration.
The most basic method for doing so is simply to perform a complete iteration in
the abstract domain, but doing so within the integers has no guarantee of
termination (due to the integers being an unbounded set) and would not be
expected to be performant. In \cite{CousotCousot77-1} an approach of "widening"
and "narrowing" is introduced.
This widening/narrowing method performs some fixed number of iteration steps.
If, after this fixed number, the iteration has not stabilised then it assumes
that it will not stabilise and will instead continue infinitely, so any affected
variables are "widened" to infinity. A "narrowing" step is then carried out
wherein variables are brought back into the finite domain by some terminating
condition on the loop.
This approach is quite general, and so is applicable to many instances of real
programs, but is a crude approximation of the reality of the program (albeit,
with surprisingly good results). In specific instances it is possible to perform
a better approximation. In particular the case of addition, multiplication (by
positive constants), minimum and maximum is considered in
\cite{Gawlitza:2007:PFC:1762174.1762203}. With this approach code points are
converted into fixpoint equations with $minimum$ and $maximum$ expressions.
These fixpoint equations can then be solved in order to give information about
the program's abstract semantics. These equations are solved using a form of
policy iteration (referred to as strategy iteration) over the $maximum$
expressions present in each term. A second step consisting of ordinary
fixpoint-iteration over the resulting $minimum$-expressions is then performed.
When fixpoint-iteration is known to terminate then it is possible to get
significant speed gains by processing each expression according to their
dependencies. The approach presented in \cite{DBLP:tr/trier/MI96-11}, in
particular, improves the real-world performance of their fixpoint-equation
solver. It should be noted, however, that such evaluation strategies can do
nothing to affect the worst-case complexity as it is always possible to
construct a system in which evaluation of each variable is necessary at every
stage until the system stabilises.
\section{Conclusion}
Static analysis is of great practical importance to real world developers. It
gives us the ability to create tools which automatically identify common errors
in programs due to developer errors and so allows us to provide greater
guarantees about the behaviour of our software.
Unfortunately complete static analysis of a program is an undecidable problem,
and so we must perform abstract interpretation of a program's semantics in order
to provide useful results of analysis. Many different abstract domains exist in
which to consider programs, with varying precision and computational complexity.
In particular the abstract domain of polyhedra (and the related domains of
zones and ranges) are of great use when performing analysis of a program's
integral semantics.
\bibliographystyle{abbrv}
\bibliography{references.bib}
\end{document}
|