summaryrefslogtreecommitdiff
path: root/tex/thesis/litreview/litreview.tex
diff options
context:
space:
mode:
Diffstat (limited to 'tex/thesis/litreview/litreview.tex')
-rw-r--r--tex/thesis/litreview/litreview.tex213
1 files changed, 119 insertions, 94 deletions
diff --git a/tex/thesis/litreview/litreview.tex b/tex/thesis/litreview/litreview.tex
index 38bd9dc..983bb44 100644
--- a/tex/thesis/litreview/litreview.tex
+++ b/tex/thesis/litreview/litreview.tex
@@ -23,7 +23,7 @@ In order to perform any static analysis of a program it is necessary
to have some notion of the \emph{semantics}, or meaning, of a
program. The semantics of a program can be captured in several ways,
but for our purposes we consider \emph{denotational}, or
-\emph{mathematical}, \emph{semantics}\cite{SCOTT70B}.
+\emph{mathematical}, \emph{semantics} \cite{SCOTT70B}.
Denotational semantics views a program as a (partial) function built
up by the composition of many smaller functions. By expressing our
@@ -52,7 +52,7 @@ Denotational semantics provide a formal abstract framework for us to
analyse programs mathematically, but they do not provide a meaningful
way to perform automated analysis of programs. As they perfectly
capture the semantics of a program they do not avoid the consequences
-of Rice's Theorem\cite{Rice} or the Halting Problem\cite{Halting}
+of Rice's Theorem \cite{Rice} or the Halting Problem \cite{Halting}
which preclude us from performing precise analysis of programs in
general.
@@ -61,7 +61,7 @@ general.
In order to work around the limitations on precise static analysis in
general, Cousot et al.~ developed a framework for performing sound
program analysis called ``abstract
-interpretation''\cite{CousotCousot77-1}. In this framework we consider
+interpretation'' \cite{CousotCousot77-1}. In this framework we consider
a program's semantics under an abstraction. The original semantics we
call ``concrete semantics'' and the abstracted semantics we call
``abstract semantics''. The abstract semantics consist of replacing
@@ -110,8 +110,8 @@ abstract state. The function $\gamma: \states_\alpha \to 2^\states$
transforms an abstract state into a set of concrete states. We require
these two functions to satisfy the properties of a Galois connection:
\begin{align*}
- a &\le \gamma(b) & \alpha(a) &\le b & \forall a, b
- \subseteq \states.
+ a \le \gamma(b) &\iff \alpha(a) \le b & \forall a \subseteq
+ 2^\states, b \in \states_\alpha.
\end{align*}
\noindent As a consequence of this, the result of abstracting $a \in
\states$ must at least contain $a$. In particular we are looking for
@@ -122,75 +122,21 @@ Abstract interpretation follows on from the idea of denotational
semantics, and it is possible to determine the abstract semantics of
an operation based on its concrete semantics. We use
$\semantics{\cdot}^\#$ to represent the abstract semantics of a
-statement, and define the transformation as $\semantics{S}^\# = \alpha
+statement, and define the transformation as $\semantics{s}^\# = \alpha
\circ \semantics{s} \circ \gamma$.
-Much has been written on the topic of abstract domains, ranging from
-the relatively simple, such as signs and
-intervals\cite{CousotCousot77-1} to the more complex
-zones\cite{mine:padoII}, octagons\cite{mine:hosc06} and more general
-convex polyhedra\cite{DBLP:conf/vmcai/SankaranarayananSM05}. For this
-thesis we are only concerned with \emph{intervals} and \emph{zones}.
-
-%\todo{widening}
-
-\subsection{Intervals}
-The abstract domain of \emph{intervals} was introduced by Cousot et
-al.\cite{CousotCousot77-1} along with abstract interpretation. In this
-domain we constrain each variable $x$ to lie within a range $[a, b]
-\in \CZ^2$. This domain is finer than signs, so it allows more
-information about the program to be determined. Another equivalent
-formulation of intervals is that we are finding bounds such that
-\begin{align*}
- x &\le b \text{ and,} \\
- -x &\le a.
-\end{align*}
-%\noindent We are finding \emph{upper bounds} for $x$ and $-x$ at each
-%location in the program.
-
-
-\subsection{Zones}
-
-\emph{Zones} are an abstract domain introduced by
-Min\'e\cite{mine:padoII} which generalise intervals to include also
-bounding the differences between variables. For each variable $x$
-zones constrain the variable and its negation in the same way as intervals,
-\begin{align*}
- x &\le b \text{ and,} \\
- -x &\le a.
-\end{align*}
-Zones additionally provide us with bounds on \emph{differences}
-between variables. So for each pair of variables $(x,y)$, zones
-provide us with a bound
-\begin{align*}
- x - y & \le c.
-\end{align*}
-
-
-
-
-\section{Min/max problem}
-
-In \newcite{Gawlitza:2007:PFC:1762174.1762203} a conversion from a
-program's Control Flow Graph (henceforth CFG) to a system of equations
-involving only monotonic, expansive operators is presented. This
-conversion results in an equation system which can be solved in order
-to perform abstract interpretation over zones.
-
-\subsection{Partial-ordering Formulation}
-
-To begin our construction of the equation systems we need to formalise
-one of a CFG. We consider a CFG to be a tuple $G = (N, E, s, st)$. A
-finite set of \emph{control-points} $N$, a finite set $E \subseteq N
-\times N$ of \emph{edges} between these control points (ie. if $(u,v)
-\in E$ then an edge exists in the CFG from $u$ to $v$). A mapping $s:
-N \times N \to \Stmt$ taking each edge in the CFG to a
-\emph{statement} $\in \Stmt$. $st \in N$ denotes the special
+We can now formalise the notion of a Control Flow Graph (henceforth
+CFG) and define its abstraction. We consider a CFG to be a tuple $G =
+(N, E, s, st)$. A finite set of \emph{control-points} $N$, a finite
+set $E \subseteq N \times N$ of \emph{edges} between these control
+points (ie. if $(u,v) \in E$ then an edge exists in the CFG from $u$
+to $v$). A mapping $s: N \times N \to \Stmt$ taking each edge in the
+CFG to a \emph{statement} $\in \Stmt$. $st \in N$ denotes the special
\emph{start control-point}.
-We can now begin our construction of our abstract semantics $V^\#[v],
-\forall v \in N$.
+We denote our abstract semantics of a node in our CFG by $V^\#[v],
+\forall v \in N$ such that the following hold:
\begin{align}
V^\# [st] &\ge \alpha(\Z^n) \label{eqn:start-node} \\
V^\# [v] &\ge \llb s_{(u,v)} \rrb ^\# (V^\# [u]) & \forall (u,v) \in
@@ -226,24 +172,73 @@ information than each of the predecessors of $v$ in the CFG.
So, if we have the CFG fragment shown in Figure \ref{cfg:confluence}
then we know that $V^\#[v]$ must `contain', or be larger than, each of
the edges coming in to $v$. In this case there are two incoming edges
-to $v$: $(u,v)$ and $(w,v)$.
+to $v$: $(u,v)$ and $(w,v)$.
The edge $(u,v)$ leads to the state $\llb s_{(u,v)} \rrb^\#
-(V^\#[u])$. It is clear that $V^\#[v]$ must be either the same size
-or `larger' than this value, as it must either equal this constraint
-or it must be a looser constraint. This gives us the inequality
-\eqref{eqn:confluence}: $V^\#[v] \ge \llb s_{(u,v)}^\# \rrb
-(V^\#[u])$.
-
-Similarly by considering the edge $(w,v)$ and following the same argument
-we find that $V^\# [v] \ge \llb s_{(w,v)} \rrb ^\# (V^\# [w])$.
-
-For the start node in the CFG this inequality is obviously
-useless. The start node has no predecessors (by definition) and thus
-must have a value that is not dependent on other nodes. We can,
-therefore, define the start node's possible abstract states to be at
-least as big as the complete set of abstract states (as in
-\eqref{eqn:start-node}).
+(V^\#[u])$. It is clear that $V^\#[v]$ must be either the same size or
+`larger' than this value, as it must either equal this constraint or
+it must be a looser constraint. This gives us the inequality
+\eqref{eqn:confluence}: $V^\#[v] \ge \llb s_{(u,v)} \rrb ^\#
+(V^\#[u])$. Similarly by considering the edge $(w,v)$ and following
+the same argument we find that $V^\# [v] \ge \llb s_{(w,v)} \rrb ^\#
+(V^\# [w])$.
+
+This analysis thus far has avoided defining any particular abstract
+domain, but in practice we must choose an abstract domain before
+performing our analysis. Much has been written on the topic of
+abstract domains, ranging from the relatively simple, such as signs
+and intervals \cite{CousotCousot77-1} to the more complex zones
+\cite{mine:padoII}, octagons \cite{mine:hosc06} and more general
+convex polyhedra \cite{DBLP:conf/vmcai/SankaranarayananSM05}. For this
+thesis we are only concerned with \emph{intervals} and \emph{zones}.
+
+\subsection{Intervals}
+
+The abstract domain of \emph{intervals} was introduced by Cousot et
+al. \cite{CousotCousot77-1} along with abstract interpretation. In this
+domain we constrain each variable $x$ to lie within a range $[a, b]
+\in \CZ^2$. This domain is finer than signs, so it allows more
+information about the program to be determined. Another equivalent
+formulation of intervals is that we are finding bounds such that
+\begin{align*}
+ x &\le b \text{ and,} \\
+ -x &\le -a.
+\end{align*}
+%\noindent We are finding \emph{upper bounds} for $x$ and $-x$ at each
+%location in the program.
+
+
+\subsection{Zones}
+
+\emph{Zones} are an abstract domain introduced by Min\'e
+\cite{mine:padoII} which generalise intervals to include also bounding
+the differences between variables. Min\'e's work on zones builds off
+work done in the model checking of timed automata
+\cite{yovine-zones}. For each variable $x$ zones constrain the
+variable and its negation in the same way as intervals,
+\begin{align*}
+ x &\le b \text{ and,} \\
+ -x &\le a.
+\end{align*}
+Zones additionally provide us with bounds on \emph{differences}
+between variables. So for each pair of variables $(x,y)$, zones
+provide us with a bound
+\begin{align*}
+ x - y & \le c.
+\end{align*}
+
+
+
+
+\section{Min/max problem}
+
+In \cite{Gawlitza:2007:PFC:1762174.1762203} a conversion from a
+program's CFG to a system of equations involving only monotonic,
+expansive operators is presented. This conversion results in an
+equation system which can be solved in order to perform abstract
+interpretation over zones.
+
+\subsection{Partial-ordering Formulation}
\begin{example}
\label{example:partial-ordered-construction}
@@ -281,7 +276,7 @@ least as big as the complete set of abstract states (as in
\end{figure}
We now attempt to derive a system of inequalities from this for the
- \emph{least upper bound}
+ \emph{least upper bound} of $x$ at each program point.
\begin{align*}
ub: {} & ~\Vars \to \CZ \\
ub(x) &= \min\{v \in \CZ ~|~ x \le v\}.
@@ -303,7 +298,16 @@ least as big as the complete set of abstract states (as in
The edge from $A$ to $B$ is very simple. It has no constraint and
sets the value of $x$ to $1$. It corresponds to an equation of the
- form $ub(x_B) \ge 1$.
+ form $ub(x_B) \ge 1$. However, there is also the possibility that
+ $ub(x_A)$ will be unreachable, in which case $ub(x_B)$ will also be
+ unreachable. This makes the inequality
+ \begin{align*}
+ ub(x_B) \ge
+ \left\{\begin{array}{lc}
+ 1 & \text{if } ub(u_A) > -\infty \\
+ -\infty & \text{otherwise}
+ \end{array}\right.
+ \end{align*}
The edge from $B$ to itself is somewhat more complex. To begin with
it changes the value of $x$ depending on the previous value of
@@ -334,11 +338,25 @@ least as big as the complete set of abstract states (as in
ub(x_B), ub(x_C)$ subject to these constraints:
\begin{align*}
ub(x_A) &\ge \infty \\
- ub(x_B) &\ge 1 \\
+ ub(x_B) &\ge
+ \left\{\begin{array}{lc}
+ 1 & \text{if } ub(u_A) > -\infty \\
+ -\infty & \text{otherwise}
+ \end{array}\right. \\
ub(x_B) &\ge \min(ub(x_B), 9) + 1 \\
ub(x_C) &\ge ub(x_B)
\end{align*}
+ The smallest values for $ub(x_A), ub(x_B), ub(x_C)$ which satisfy
+ these constraints are:
+ \begin{align*}
+ ub(x_A) &= \infty \\
+ ub(x_B) &= 10 \\
+ ub(x_C) &= 10
+ \end{align*}
+ So, at program point $A$: $x \le \infty$; at $B$: $x \le 10$ and at
+ $C$: $x \le 10$.
+
\end{example}
@@ -347,8 +365,8 @@ least as big as the complete set of abstract states (as in
The partial-ordering formulation considered above is a helpful
presentation of the problem in terms of constraints, but an equivalent
-system can be constructed to aid us in locating the \emph{least}
-solution to the constraint system.
+equation system can be constructed to aid us in locating the
+\emph{least} solution to the constraint system.
For every partial-ordering on a set there exists an equivalent
lattice, defined in terms of a meet operator ($\vee$) and a join
@@ -372,10 +390,10 @@ to the following:
The two constraints on the upper bound of $x_B$ can be combined into
one constraint by taking the maximum of them. This is because $x \ge
\max(a, b) ~\implies~ x \ge a \text{ and } x \ge b$. The
-Knaster-Tarski Fixpoint
-%We change from using $\ge$ in our constraints to using $=$ in our
-%equation system as a consequence of the Knaster-Tarski Fixpoint
-%Theorem\cite{Knaster-Tarski}.
+Knaster-Tarski Fixpoint Theorem \cite{Knaster-Tarski} permits us to
+replace our inequalities with equalities, thereby forming a system of
+equations which can be solved for the least solution to the earlier
+constraint system.
@@ -575,6 +593,13 @@ Knaster-Tarski Fixpoint
\end{align*}
\end{example}
+\section{Min-cost flow}
+
+This matrix fomulation sufficient to calculate bounds on individual
+program variables, as in the interval domain, but in doing so there is
+a significant loss of precision. In order to reclaim some of this
+precision we can use a relational domain.
+