diff options
Diffstat (limited to 'tex/thesis')
| -rw-r--r-- | tex/thesis/abstract/abstract.tex | 17 | ||||
| -rw-r--r-- | tex/thesis/contribution/contribution.tex | 255 | ||||
| -rw-r--r-- | tex/thesis/experiments/experiments.tex | 4 | ||||
| -rw-r--r-- | tex/thesis/introduction/introduction.tex | 65 | ||||
| -rw-r--r-- | tex/thesis/litreview/litreview.tex | 8 | ||||
| -rw-r--r-- | tex/thesis/thesis.tex | 2 | 
6 files changed, 199 insertions, 152 deletions
| diff --git a/tex/thesis/abstract/abstract.tex b/tex/thesis/abstract/abstract.tex index 2326f61..fc10f32 100644 --- a/tex/thesis/abstract/abstract.tex +++ b/tex/thesis/abstract/abstract.tex @@ -5,6 +5,23 @@ significant consequences. In order to limit the effect of these bugs  software engineers employ a wide variety of tests, but these are  insufficient to guarantee that software is bug free. +Guaranteeing that software is bug-free requires some sort of formal +verification. While it is possible to manually prove software correct, +it is a difficult and expensive process, usually requiring programs to +be written in a particular way and requiring a high level of skill on +the part of the proof writer. + +There has been considerable research interest in providing a method of +automatically producing these types of proofs, but due to Rice's +Theorem we know it is not possible, in general, to prove the +correctness of programs. + +By using Abstract Interpretation we can perform an imprecise analysis +of our program, allowing us to find some, but not all, classes of bugs +in general programs. This imperfect analysis is sufficient to provide +some level of assurance about a program's correctness while still +remaining a practical approach. +  In this thesis we present an implementation of a game-theoretic static  analyser described in \cite{EasyChair:117} as a method of finding bugs  and guaranteeing correctness. We present our own enhancements to diff --git a/tex/thesis/contribution/contribution.tex b/tex/thesis/contribution/contribution.tex index fac83c9..9e90749 100644 --- a/tex/thesis/contribution/contribution.tex +++ b/tex/thesis/contribution/contribution.tex @@ -57,8 +57,8 @@ In this chapter we will begin by presenting the Work-list Depth First  Search (W-DFS) fixpoint algorithm developed by Seidl et  al.\cite{DBLP:tr/trier/MI96-11}. We will then discuss a modification  of the algorithm to allow it to perform $\max$-strategy iteration. The -chapter will conclude with our Local Demand-driven Strategy -Improvement (LDSI) algorithm to perform efficient strategy iteration +chapter will conclude with our Demand-driven Strategy +Improvement (DSI) algorithm to perform efficient strategy iteration  and fixpoint iteration. % TODO: fix me, or something  The existing algorithm as presented in Section @@ -355,10 +355,6 @@ values, as they may change their value if re-evaluated (and hence are  not certainly stable). If these values change then they, too, will  destabilise their dependent variables, and so on. -The W-DFS algorithm provides us with a \emph{local}, -\emph{demand-driven} solver for greatest fixpoints of monotonic -fixpoint problems. -  \begin{figure}[tbphH]    \begin{tikzpicture}      \node[draw] (full) at (0,-2) { @@ -415,9 +411,7 @@ other subsystem. To compute the value of $x$ in the greatest fixpoint  it is unnecessary to compute the value of the variable $d$, but it is  necessary to compute the value of $y$. The W-DFS algorithm, will only  evaluate the necessary subsystem for a requested variable, leaving the -other subsystem unevaluated. We call this algorithm \emph{local}, -because it considers only as many variables as are necessary to -compute the requested values. +other subsystem unevaluated. @@ -647,7 +641,7 @@ particular $\improve$ that we are using (see Listing \ref{algo:pmax})  satisfies this property using an anonymous function to delay the  computation. -\section{Local Demand-driven Strategy Improvement (LDSI)} +\section{Demand-driven Strategy Improvement (DSI)}  W-DFS speeds up both the $\max$-strategy iteration and the fixpoint  iteration as two independent processes, but each time we improve our @@ -659,8 +653,8 @@ parts of the system have changed we can jump in partway through the  process of finding a fixpoint, thereby avoiding repeating  calculations. -The new \emph{Local Demand-driven Strategy Improvement} algorithm, -\emph{LDSI}, seeks to do this. By adding an ``invalidate'' procedure +The new \emph{Demand-driven Strategy Improvement} algorithm, +\emph{DSI}, seeks to do this. By adding an ``invalidate'' procedure  to the fixpoint iteration we provide a method for the $\max$-strategy  to invalidate fixpoint variables, and we modify the fixpoint iteration  algorithm to report which variables it has modified with each fixpoint @@ -674,12 +668,6 @@ changed variables to determine which portions of the current strategy  are now unstable. This process continues until the $\max$-strategy  stabilises (at which point the fixpoint will also stabilise). -Both portions of the LDSI algorithm are \emph{demand driven}, and so -any necessary evaluation of strategies or fixpoint values is delayed -until the point when it is required. This means that the solver will -be \emph{local}, considering only the subset of variables necessary to -calculate the values we are interested in.. - @@ -753,7 +741,7 @@ This algorithm is presented in three parts.      \end{tabularx}      \EndGlobals    \end{algorithmic} -  \caption{Global variables used by the LDSI algorithm} +  \caption{Global variables used by the DSI algorithm}    \label{algo:ldsi:globals}  \end{algorithm} @@ -918,6 +906,114 @@ being stabilised. +The fixpoint iteration portion of the algorithm is concerned with +finding the greatest fixpoint, $\solnFP(s) : X \to D$ for the system $s += \sigma(\system)$. If we consider all valid fixpoints, $\fix(s) = \{ +\rho \mid \rho = s(\rho) \}$, then we find that $\solnFP(s) \in \fix(s)$ +and $\solnFP(s) \ge \rho ~ \forall \rho \in \fix(s)$. Our general +approach will be to consider our variable assignment $D \ge \solnFP(s)$ +and show that it decreases until $D \in \fix(s)$ and thus $D = +\solnFP(s)$. + +We have three invariants for the fixpoint portion of the DSI +algorithm. We use $\depFP^{+}(x)$ to be the set of all variables which +$x$ may depend on. $\depFP^{+}$ is defined as the transitive closure +of $\depFP$, $\depFP[x] = \{ y \in X \mid +x \in \inflFP[y] \}$. $\RFP \subseteq X$ is defined as the variables +currently on our recursion stack. +\begin{align} +  D &\ge \solnFP(s) \label{eqn:fp-inv1} \\ +% +  x &\in \inflFP[y] & \forall x \in \stableFP, y \in \depFP[x] +    \label{eqn:fp-inv3} \\ +% +  D &= \solnFP(s)[x] & \mbox{if } +      x \not \in \RFP, +      \depFP^{+}[x] \cap \RFP = \emptyset, +      x \in \stableFP +    \label{eqn:fp-inv2} +\end{align} + +(\ref{eqn:fp-inv1}) follows from monotonicity. We begin our iteration +at some variable assignment $D \ge \solnFP(s)$ and monotonically descend +from there. We know that $s(D)[x] \ge soln(s)[x] ~\forall D \ge +\solnFP(s), x \in X$, thus each evaluation of $s[x]$ in the context of +$D$ will result in a new value for $D[x]$ which is either closer to or +equal to $\solnFP(s)[x]$. As we only ever evaluate in the context of +variables taken from $D$, and $D[x]$ is set to $\infty$ when $x$ is +invalidated, we know that $D \ge \solnFP(s)$ holds. + +(\ref{eqn:fp-inv3}) follows from the use of the $\eval \fixpoint$ +function. Each variable lookup results in a direct insertion into +$\inflFP$, so during the evaluation of $x$ all relevant $\inflFP$ sets +are updated to reflect the current dependencies. + +For (\ref{eqn:fp-inv2}) we will examine what has happened when each of +the conditions is false: +\begin{itemize} +\item +  If $x \in \RFP$ then we are currently in the context of an earlier +  evaluation of $x$, so the value of $D[x]$ will be changed again by +  that earlier call. In addition, we conclude that $x \in +  \depFP^{+}[x]$ and thus, from the following condition, that $x$ will +  be re-evaluated at some point after this one. + +\item +  If $\depFP^{+}[x] \cap \RFP \ne \emptyset$ then a variable which $x$ +  depends on is in $\RFP$ at the moment. This will result in the +  re-evaluation of $x$ if the value of any of the variables in +  $\depFP^{+}[x] \cap \RFP$ have changed. + +\item +  If $x \not \in \stableFP$ then a re-evaluation of $x$ will occur +  with the next call to $\solve \fixpoint(x)$. This may result in a +  change to the value of $D[x]$, or it may leave $D[x]$ stable for the +  moment. If we denote the variable assignment before the evaluation +  of $x$ by $D_{\pre}$, and the variable assignment after the +  evaluation of $x$ by $D_{\post}$ then we find that $D_{\post} \le +  D_{\pre}$, due to the monotonicity of $s$. After each change of +  $D[x]$, everything which \emph{may} have depended on $D[x]$ is +  re-evaluated (see the previous two items), leading to a final point +  of stability when $D[x] = \solnFP(s)[x]$, by the definition of +  $\solnFP(s)[x]$ and (\ref{eqn:fp-inv1}). +\end{itemize} + +After an evaluation of $\solve \fixpoint(x)$ we know that $x \in +\stableFP$. If it is also the case that $\RFP = \emptyset$, as is the +case for top-level calls to $\solve \fixpoint$, then we know that +$D[x] = \solnFP(s)[x]$. This means that the function $\lambda x +. (\solve \fixpoint(x); D[x])$ will act as a variable assignment +solving for the greatest fixpoint of $s = \sigma(\system)$, as is +required by our $\max$-strategy iteration. As the $\max$-strategy +iteration changes $\sigma$ it also induces a change in $s = +\sigma(\system)$, so in order to maintain the correctness of this +algorithm we must show that our above invariants are maintained by +this process. + +When the $\max$-strategy iteration changes the current $\max$-strategy +$\sigma$ at the variable $x$ it changes the equation system $s = +\sigma(\system)$ that the fixpoint iteration uses. The $\max$-strategy +iteration then invalidates the affected portion of the fixpoint +iteration by doing two things: it removes the variable $x$ from the +$\stableFP$ set, and it sets $D[x] = \infty$. The invalidation of +variables is then propagated to each variable which transitively +depends on $x$: $\{ y \in X \mid x \in \depFP^{+}[y] \} = \{ y \in X +\mid y \in \inflFP^{+}[x] \}$, where $\inflFP^{+}$ is the transitive +closure of $\inflFP$. We know from (\ref{eqn:fp-inv3}) that this +transitive closure of $\inflFP$ will identify the entire subsystem +which may depend on the value of $x$. The invalidation of transitive +dependencies ensures that (\ref{eqn:fp-inv1}) holds for the changed +subsystem, as $\infty \ge z ~ \forall z \in \CZ$. From +(\ref{eqn:fp-inv1}) we can then conclude that (\ref{eqn:fp-inv2}) +holds as well, as the removal of $x$ from $\stableFP$ combined with +(\ref{eqn:fp-inv1}) leads to (\ref{eqn:fp-inv2}). These invariants now +hold for the affected sub-system and are still true for the unaffected +sub-system. Thus our invariants hold for our entire system and our +fixpoint iteration will continue to be correct in the presence of +invalidation by the $\max$-strategy iteration. + + +  \subsection{$\max$-strategy iteration} \label{sec:ldsi:max-strategy} @@ -1029,122 +1125,14 @@ Section \ref{sec:adapted-wdfs}. - -\subsection{Correctness} \label{sec:ldsi:correctness} - -The fixpoint iteration portion of the algorithm is concerned with -finding the greatest fixpoint, $\solnFP(s) : X \to D$ for the system $s -= \sigma(\system)$. If we consider all valid fixpoints, $\fix(s) = \{ -\rho \mid \rho = s(\rho) \}$, then we find that $\solnFP(s) \in \fix(s)$ -and $\solnFP(s) \ge \rho ~ \forall \rho \in \fix(s)$. Our general -approach will be to consider our variable assignment $D \ge \solnFP(s)$ -and show that it decreases until $D \in \fix(s)$ and thus $D = -\solnFP(s)$. - -We begin by stating three invariants of the LDSI algorithm. We use -$\depFP^{+}(x)$ to be the set of all variables which $x$ may depend -on. $\depFP^{+}$ is defined as the transitive closure of $\depFP$, -$\depFP[x] = \{ y \in X \mid x \in \inflFP[y] \}$. $\RFP \subseteq X$ -is defined as the variables currently on our recursion stack. -\begin{align} -  D &\ge \solnFP(s) \label{eqn:fp-inv1} \\ -% -  x &\in \inflFP[y] & \forall x \in \stableFP, y \in \depFP[x] -    \label{eqn:fp-inv3} \\ -% -  D &= \solnFP(s)[x] & \mbox{if } -      x \not \in \RFP, -      \depFP^{+}[x] \cap \RFP = \emptyset, -      x \in \stableFP -    \label{eqn:fp-inv2} -\end{align} - -(\ref{eqn:fp-inv1}) follows from monotonicity. We begin our iteration -at some variable assignment $D \ge \solnFP(s)$ and monotonically descend -from there. We know that $s(D)[x] \ge soln(s)[x] ~\forall D \ge -\solnFP(s), x \in X$, thus each evaluation of $s[x]$ in the context of -$D$ will result in a new value for $D[x]$ which is either closer to or -equal to $\solnFP(s)[x]$. As we only ever evaluate in the context of -variables taken from $D$, and $D[x]$ is set to $\infty$ when $x$ is -invalidated, we know that $D \ge \solnFP(s)$ holds. - -(\ref{eqn:fp-inv3}) follows from the use of the $\eval \fixpoint$ -function. Each variable lookup results in a direct insertion into -$\inflFP$, so during the evaluation of $x$ all relevant $\inflFP$ sets -are updated to reflect the current dependencies. - -For (\ref{eqn:fp-inv2}) we will examine what has happened when each of -the conditions is false: -\begin{itemize} -\item -  If $x \in \RFP$ then we are currently in the context of an earlier -  evaluation of $x$, so the value of $D[x]$ will be changed again by -  that earlier call. In addition, we conclude that $x \in -  \depFP^{+}[x]$ and thus, from the following condition, that $x$ will -  be re-evaluated at some point after this one. - -\item -  If $\depFP^{+}[x] \cap \RFP \ne \emptyset$ then a variable which $x$ -  depends on is in $\RFP$ at the moment. This will result in the -  re-evaluation of $x$ if the value of any of the variables in -  $\depFP^{+}[x] \cap \RFP$ have changed. - -\item -  If $x \not \in \stableFP$ then a re-evaluation of $x$ will occur -  with the next call to $\solve \fixpoint(x)$. This may result in a -  change to the value of $D[x]$, or it may leave $D[x]$ stable for the -  moment. If we denote the variable assignment before the evaluation -  of $x$ by $D_{\pre}$, and the variable assignment after the -  evaluation of $x$ by $D_{\post}$ then we find that $D_{\post} \le -  D_{\pre}$, due to the monotonicity of $s$. After each change of -  $D[x]$, everything which \emph{may} have depended on $D[x]$ is -  re-evaluated (see the previous two items), leading to a final point -  of stability when $D[x] = \solnFP(s)[x]$, by the definition of -  $\solnFP(s)[x]$ and (\ref{eqn:fp-inv1}). -\end{itemize} - -After an evaluation of $\solve \fixpoint(x)$ we know that $x \in -\stableFP$. If it is also the case that $\RFP = \emptyset$, as is the -case for top-level calls to $\solve \fixpoint$, then we know that -$D[x] = \solnFP(s)[x]$. This means that the function $\lambda x -. (\solve \fixpoint(x); D[x])$ will act as a variable assignment -solving for the greatest fixpoint of $s = \sigma(\system)$, as is -required by our $\max$-strategy iteration. As the $\max$-strategy -iteration changes $\sigma$ it also induces a change in $s = -\sigma(\system)$, so in order to maintain the correctness of this -algorithm we must show that our above invariants are maintained by -this process. - -When the $\max$-strategy iteration changes the current $\max$-strategy -$\sigma$ at the variable $x$ it changes the equation system $s = -\sigma(\system)$ that the fixpoint iteration uses. The $\max$-strategy -iteration then invalidates the affected portion of the fixpoint -iteration by doing two things: it removes the variable $x$ from the -$\stableFP$ set, and it sets $D[x] = \infty$. The invalidation of -variables is then propagated to each variable which transitively -depends on $x$: $\{ y \in X \mid x \in \depFP^{+}[y] \} = \{ y \in X -\mid y \in \inflFP^{+}[x] \}$, where $\inflFP^{+}$ is the transitive -closure of $\inflFP$. We know from (\ref{eqn:fp-inv3}) that this -transitive closure of $\inflFP$ will identify the entire subsystem -which may depend on the value of $x$. The invalidation of transitive -dependencies ensures that (\ref{eqn:fp-inv1}) holds for the changed -subsystem, as $\infty \ge z ~ \forall z \in \CZ$. From -(\ref{eqn:fp-inv1}) we can then conclude that (\ref{eqn:fp-inv2}) -holds as well, as the removal of $x$ from $\stableFP$ combined with -(\ref{eqn:fp-inv1}) leads to (\ref{eqn:fp-inv2}). These invariants now -hold for the affected sub-system and are still true for the unaffected -sub-system. Thus our invariants hold for our entire system and our -fixpoint iteration will continue to be correct in the presence of -invalidation by the $\max$-strategy iteration. - - -We move now to the $\max$-strategy iteration. We will use $\rho: X \to -\CZ$ as $\rho = \lambda x . (\solve \fixpoint(x); D[x])$, a variable +In the following brief analysis we use $\rho: X \to \CZ$ as $\rho += \lambda x . (\solve \fixpoint(x); D[x])$ to denote a variable  assignment which will always calculate the greatest fixpoint of  $\sigma(\system)$ for the current strategy $\sigma$. Each time $\rho$  is queried for a variable's value it will also record which variables  have had their values changed, whether or not those changed values are -final, in the set $\touchedFP$. +final, in the set $\touchedFP$. This represents the effect of the +fixpoint iteration portion of the DSI algorithm.  We can establish similar invariants for the $\max$-strategy iteration  as we had for the fixpoint iteration. We denote the optimal strategy @@ -1188,9 +1176,6 @@ the $\max$-strategy. If the variable has stabilised to the same value  as it had previously then it has not changed and thus has no effect on  the $\max$-strategy. -If both the $\max$-strategy iteration and the fixpoint iteration -correctly solve their respective portions of the problem, and the -communication between them is correct, then we know that the overall -algorithm is correct. +  \subsection{Implementation} \label{sec:ldsi:implementation} diff --git a/tex/thesis/experiments/experiments.tex b/tex/thesis/experiments/experiments.tex deleted file mode 100644 index cea409d..0000000 --- a/tex/thesis/experiments/experiments.tex +++ /dev/null @@ -1,4 +0,0 @@ -\chapter{Experiments} \label{chap:experiments} - -The experiments go here. - diff --git a/tex/thesis/introduction/introduction.tex b/tex/thesis/introduction/introduction.tex index ffc6cd2..17473af 100644 --- a/tex/thesis/introduction/introduction.tex +++ b/tex/thesis/introduction/introduction.tex @@ -36,9 +36,42 @@ in some way \emph{verify} the software before running it. While it is  possible to write programs in a way that is easier to verify, it is  still a difficult and expensive process to provide verification. -Recently there has been work done in automatically analysing programs -in order to identify bugs. This work is broadly classed \emph{static -  analysis} and this thesis contributes to this work. +There has been a continuous stream of research into automatically +analysing programs in order to identify bugs from the late 70s to the +present\cite{CousotCousot77-1,EasyChair:117}. This work is broadly +classed \emph{static analysis} and this thesis contributes to this +work. + +Due to Rice's Theorem\cite{Rice} we know that it is impossible to +perform a perfect static analysis in general. In order to overcome +this many approaches to static analysis have been developed, including +the framework of abstract interpretation presented by Cousot and +Cousot\cite{CousotCousot77-1}. In this framework we consider a program +in a simplified, \emph{abstract}, domain. This allows us to perform a +sound analysis in general at the expense of precision. + +Two abstract domains which are particularly well-known and useful are +those of \emph{interval} and \emph{zones}. The interval abstract +domain consists of restricting variables in the range $[a,b]$. This +domain can be useful for avoiding such program errors as division by +zero errors and out of bounds memory accesses. The abstract domain of +zones allows us to further place a bound on the difference between +variables $x - y \le c$. This allows us to be more precise in our +analysis, thereby avoiding false-positives, at the cost of speed. + +The process of determining appropriate bounds for program variables +has traditionally been performed by performing several Kleene +iterations, followed by a \emph{widening} and \emph{narrowing}. These +widening and narrowing operators ensure termination, but once again at +the cost of precision. As an alternative to these operators we can, in +some instances, use a game-theoretic approach to compute the abstract +semantics of a program\cite{EasyChair:117}. This approach is less +general than widening and narrowing, but performs precise abstract +interpretation over zones. + +In this thesis we present an implementation of this game-theoretic +algorithm, along with our own enhancements which improve its +performance on real-world data. @@ -56,7 +89,7 @@ Theoretical contribution:  \item    We present a demand-driven strategy improvement algorithm for    solving monotonic, expansive equation systems involving $\min$ and -  $\max$ operators +  $\max$ operators.  \end{enumerate}  Systems contribution: @@ -64,13 +97,29 @@ Systems contribution:  \item    We develop a solver for monotonic, expansive equation systems based    on the work of Gawlitza et al.\cite{EasyChair:117} with several -  improvements +  improvements.  \item    We analyse the performance of our improved solver on a set of -  equation systems to demonstrate the effect of our improvements +  equation systems to demonstrate the effect of our improvements.  \item    We integrate our solver into the LLVM framework to perform analysis -  over Zones\cite{mine:padoII} +  over Zones\cite{mine:padoII}.  \item -  We analyse the performance of our LLVM analysis on various program inputs +  We analyse the performance of our LLVM analysis on various program +  inputs.  \end{enumerate} + +\section{Structure of the thesis} + +In Chapter \ref{chap:litreview} we review the background literature in +the field of static analysis, laying the basis for the rest of the +thesis. %TODO: more + +We present our theoretical contribution in \ref{chap:contribution}: a +demand-driven strategy improvement algorithm. We present the algorithm +along with an evaluation of its performance on realistic problems. + +Our implementation is discussed in Chapter +\ref{chap:implementation}. We discuss several aspects of our +implementation before evaluating the results of our analysis on +several programs. diff --git a/tex/thesis/litreview/litreview.tex b/tex/thesis/litreview/litreview.tex index 109864e..38bd9dc 100644 --- a/tex/thesis/litreview/litreview.tex +++ b/tex/thesis/litreview/litreview.tex @@ -8,7 +8,7 @@  \newcommand{\Stmt}{\mathsf{Stmt}}  \newcommand{\Vars}{\mathsf{Vars}} -\chapter{Preliminaries} \label{chap:litreview} +\chapter{Background} \label{chap:litreview}  In this chapter we review the relevant literature pertaining to static  analysis. We briefly cover the semantics of programs generally before @@ -38,9 +38,9 @@ side-effects, so it is necessary to introduce some notion of ``state''  into our semantics. By defining $\semantics{\cdot}: \states \to  \states$ for all statements $x$ we provide a mechanism for statements  to perform side effects. Elements of $\states$ each represent a -possible state for the program to be in. For example: for an -integral assignment statement $I := E$, assigning the value of an -expression $E$ to a variable $I$, have the following denotation: +possible state for the program to be in. For example: for an integral +assignment statement $I := E$, assigning the value of an expression +$E$ to a variable $I$, we have the following denotation:  \begin{align*}    \semantics{I := E} = \lambda \rho . (\rho \oplus \{I \mapsto \semantics{E}(\rho)\})  \end{align*} diff --git a/tex/thesis/thesis.tex b/tex/thesis/thesis.tex index 9b57230..d959f81 100644 --- a/tex/thesis/thesis.tex +++ b/tex/thesis/thesis.tex @@ -145,7 +145,7 @@  \input{introduction/introduction.tex}  \input{litreview/litreview.tex}  \input{contribution/contribution.tex} -\input{experiments/experiments.tex} +\input{implementation/implementation.tex}  \input{conclusion/conclusion.tex} | 
