From ab443f619d207564e4972274c571ef15da70a74c Mon Sep 17 00:00:00 2001 From: Carlo Zancanaro Date: Mon, 26 Nov 2012 20:18:42 +1100 Subject: Thesis thesis thesis. --- tex/thesis/abstract/abstract.tex | 17 +++ tex/thesis/contribution/contribution.tex | 255 +++++++++++++++---------------- tex/thesis/experiments/experiments.tex | 4 - tex/thesis/introduction/introduction.tex | 65 +++++++- tex/thesis/litreview/litreview.tex | 8 +- tex/thesis/thesis.tex | 2 +- 6 files changed, 199 insertions(+), 152 deletions(-) delete mode 100644 tex/thesis/experiments/experiments.tex diff --git a/tex/thesis/abstract/abstract.tex b/tex/thesis/abstract/abstract.tex index 2326f61..fc10f32 100644 --- a/tex/thesis/abstract/abstract.tex +++ b/tex/thesis/abstract/abstract.tex @@ -5,6 +5,23 @@ significant consequences. In order to limit the effect of these bugs software engineers employ a wide variety of tests, but these are insufficient to guarantee that software is bug free. +Guaranteeing that software is bug-free requires some sort of formal +verification. While it is possible to manually prove software correct, +doing so is a difficult and expensive process, usually requiring programs to +be written in a particular way and demanding a high level of skill on +the part of the proof writer. + +There has been considerable research interest in providing a method of +automatically producing these types of proofs, but due to Rice's +Theorem we know that it is not possible, in general, to prove the +correctness of programs. + +By using Abstract Interpretation we can perform an imprecise analysis +of our program, allowing us to find some, but not all, classes of bugs +in general programs. This imperfect analysis is sufficient to provide +some level of assurance about a program's correctness while still +remaining a practical approach. + In this thesis we present an implementation of a game-theoretic static analyser described in \cite{EasyChair:117} as a method of finding bugs and guaranteeing correctness. We present our own enhancements to diff --git a/tex/thesis/contribution/contribution.tex b/tex/thesis/contribution/contribution.tex index fac83c9..9e90749 100644 --- a/tex/thesis/contribution/contribution.tex +++ b/tex/thesis/contribution/contribution.tex @@ -57,8 +57,8 @@ In this chapter we will begin by presenting the Work-list Depth First Search (W-DFS) fixpoint algorithm developed by Seidl et al.\cite{DBLP:tr/trier/MI96-11}. We will then discuss a modification of the algorithm to allow it to perform $\max$-strategy iteration. The -chapter will conclude with our Local Demand-driven Strategy -Improvement (LDSI) algorithm to perform efficient strategy iteration +chapter will conclude with our Demand-driven Strategy +Improvement (DSI) algorithm to perform efficient strategy iteration and fixpoint iteration. % TODO: fix me, or something The existing algorithm as presented in Section @@ -355,10 +355,6 @@ values, as they may change their value if re-evaluated (and hence are not certainly stable). If these values change then they, too, will destabilise their dependent variables, and so on. -The W-DFS algorithm provides us with a \emph{local}, -\emph{demand-driven} solver for greatest fixpoints of monotonic -fixpoint problems. - \begin{figure}[tbphH] \begin{tikzpicture} \node[draw] (full) at (0,-2) { @@ -415,9 +411,7 @@ other subsystem. To compute the value of $x$ in the greatest fixpoint it is unnecessary to compute the value of the variable $d$, but it is necessary to compute the value of $y$.
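To make this locality concrete, the following is a minimal Python sketch of a demand-driven solver in the spirit of W-DFS. It is not the listing from the thesis: the tiny equation system, the use of closures for right-hand sides and the starting value of infinity are all illustrative assumptions, mirroring the two independent subsystems discussed above. Solving for $x$ evaluates only $x$ and $y$; the $c$/$d$ subsystem is never touched.

\begin{verbatim}
import math

# Hypothetical equation system with two independent subsystems:
#   x = min(0, y), y = x        and        c = min(5, d), d = c
INF = math.inf
system = {
    "x": lambda get: min(0, get("y")),
    "y": lambda get: get("x"),
    "c": lambda get: min(5, get("d")),
    "d": lambda get: get("c"),
}

D = {v: INF for v in system}        # current assignment, starts at the top
stable = set()                      # variables currently believed stable
infl = {v: set() for v in system}   # infl[y] = variables whose RHS read y
evaluated = set()                   # only used to demonstrate locality

def solve(x):
    """Demand-driven evaluation: solve x, pulling in dependencies lazily."""
    if x in stable:
        return
    stable.add(x)
    evaluated.add(x)

    def get(y):
        solve(y)            # a dependency is evaluated only when it is read
        infl[y].add(x)      # remember that x was influenced by y
        return D[y]

    new = system[x](get)
    if new != D[x]:
        D[x] = new
        dependants, infl[x] = infl[x], set()
        for y in dependants:        # x changed, so its readers may be wrong
            stable.discard(y)
            solve(y)

solve("x")
print(D["x"], D["y"])   # 0 0 -- greatest fixpoint of the x/y subsystem
print(evaluated)        # only {'x', 'y'}; c and d were never evaluated
\end{verbatim}

The \texttt{infl} map here plays the same role as the $\inflFP$ sets used later in the chapter: it records who read a variable, so that only those readers are destabilised when its value changes.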
The W-DFS algorithm will only evaluate the necessary subsystem for a requested variable, leaving the -other subsystem unevaluated. We call this algorithm \emph{local}, -because it considers only as many variables as are necessary to -compute the requested values. +other subsystem unevaluated. @@ -647,7 +641,7 @@ particular $\improve$ that we are using (see Listing \ref{algo:pmax}) satisfies this property using an anonymous function to delay the computation. -\section{Local Demand-driven Strategy Improvement (LDSI)} +\section{Demand-driven Strategy Improvement (DSI)} W-DFS speeds up both the $\max$-strategy iteration and the fixpoint iteration as two independent processes, but each time we improve our @@ -659,8 +653,8 @@ parts of the system have changed we can jump in partway through the process of finding a fixpoint, thereby avoiding repeating calculations. -The new \emph{Local Demand-driven Strategy Improvement} algorithm, -\emph{LDSI}, seeks to do this. By adding an ``invalidate'' procedure +The new \emph{Demand-driven Strategy Improvement} algorithm, +\emph{DSI}, seeks to do this. By adding an ``invalidate'' procedure to the fixpoint iteration we provide a method for the $\max$-strategy to invalidate fixpoint variables, and we modify the fixpoint iteration algorithm to report which variables it has modified with each fixpoint @@ -674,12 +668,6 @@ changed variables to determine which portions of the current strategy are now unstable. This process continues until the $\max$-strategy stabilises (at which point the fixpoint will also stabilise). -Both portions of the LDSI algorithm are \emph{demand driven}, and so -any necessary evaluation of strategies or fixpoint values is delayed -until the point when it is required. This means that the solver will -be \emph{local}, considering only the subset of variables necessary to -calculate the values we are interested in.. - @@ -753,7 +741,7 @@ This algorithm is presented in three parts. \end{tabularx} \EndGlobals \end{algorithmic} - \caption{Global variables used by the LDSI algorithm} + \caption{Global variables used by the DSI algorithm} \label{algo:ldsi:globals} \end{algorithm} @@ -918,6 +906,114 @@ being stabilised. +The fixpoint iteration portion of the algorithm is concerned with +finding the greatest fixpoint, $\solnFP(s) : X \to D$ for the system $s += \sigma(\system)$. If we consider all valid fixpoints, $\fix(s) = \{ +\rho \mid \rho = s(\rho) \}$, then we find that $\solnFP(s) \in \fix(s)$ +and $\solnFP(s) \ge \rho ~ \forall \rho \in \fix(s)$. Our general +approach will be to consider our variable assignment $D \ge \solnFP(s)$ +and show that it decreases until $D \in \fix(s)$ and thus $D = +\solnFP(s)$. + +We have three invariants for the fixpoint portion of the DSI +algorithm. We use $\depFP^{+}[x]$ to denote the set of all variables on which +$x$ may depend. $\depFP^{+}$ is defined as the transitive closure +of $\depFP$, $\depFP[x] = \{ y \in X \mid +x \in \inflFP[y] \}$. $\RFP \subseteq X$ is defined as the variables +currently on our recursion stack. +\begin{align} + D &\ge \solnFP(s) \label{eqn:fp-inv1} \\ +% + x &\in \inflFP[y] & \forall x \in \stableFP, y \in \depFP[x] + \label{eqn:fp-inv3} \\ +% + D[x] &= \solnFP(s)[x] & \mbox{if } + x \not \in \RFP, + \depFP^{+}[x] \cap \RFP = \emptyset, + x \in \stableFP + \label{eqn:fp-inv2} +\end{align} + +(\ref{eqn:fp-inv1}) follows from monotonicity. We begin our iteration +at some variable assignment $D \ge \solnFP(s)$ and monotonically descend +from there.
We know that $s(D)[x] \ge \solnFP(s)[x] ~\forall D \ge +\solnFP(s), x \in X$, thus each evaluation of $s[x]$ in the context of +$D$ will result in a new value for $D[x]$ which is either closer to or +equal to $\solnFP(s)[x]$. As we only ever evaluate in the context of +variables taken from $D$, and $D[x]$ is set to $\infty$ when $x$ is +invalidated, we know that $D \ge \solnFP(s)$ holds. + +(\ref{eqn:fp-inv3}) follows from the use of the $\eval \fixpoint$ +function. Each variable lookup results in a direct insertion into +$\inflFP$, so during the evaluation of $x$ all relevant $\inflFP$ sets +are updated to reflect the current dependencies. + +For (\ref{eqn:fp-inv2}) we will examine what has happened when each of +the conditions is false: +\begin{itemize} +\item + If $x \in \RFP$ then we are currently in the context of an earlier + evaluation of $x$, so the value of $D[x]$ will be changed again by + that earlier call. In addition, we conclude that $x \in + \depFP^{+}[x]$ and thus, from the following condition, that $x$ will + be re-evaluated at some point after this one. + +\item + If $\depFP^{+}[x] \cap \RFP \ne \emptyset$ then a variable which $x$ + depends on is in $\RFP$ at the moment. This will result in the + re-evaluation of $x$ if the value of any of the variables in + $\depFP^{+}[x] \cap \RFP$ has changed. + +\item + If $x \not \in \stableFP$ then a re-evaluation of $x$ will occur + with the next call to $\solve \fixpoint(x)$. This may result in a + change to the value of $D[x]$, or it may leave $D[x]$ stable for the + moment. If we denote the variable assignment before the evaluation + of $x$ by $D_{\pre}$, and the variable assignment after the + evaluation of $x$ by $D_{\post}$ then we find that $D_{\post} \le + D_{\pre}$, due to the monotonicity of $s$. After each change of + $D[x]$, everything which \emph{may} have depended on $D[x]$ is + re-evaluated (see the previous two items), leading to a final point + of stability when $D[x] = \solnFP(s)[x]$, by the definition of + $\solnFP(s)[x]$ and (\ref{eqn:fp-inv1}). +\end{itemize} + +After an evaluation of $\solve \fixpoint(x)$ we know that $x \in +\stableFP$. If it is also the case that $\RFP = \emptyset$, as is the +case for top-level calls to $\solve \fixpoint$, then we know that +$D[x] = \solnFP(s)[x]$. This means that the function $\lambda x +. (\solve \fixpoint(x); D[x])$ will act as a variable assignment +solving for the greatest fixpoint of $s = \sigma(\system)$, as is +required by our $\max$-strategy iteration. As the $\max$-strategy +iteration changes $\sigma$ it also induces a change in $s = +\sigma(\system)$, so in order to maintain the correctness of this +algorithm we must show that our above invariants are maintained by +this process. + +When the $\max$-strategy iteration changes the current $\max$-strategy +$\sigma$ at the variable $x$ it changes the equation system $s = +\sigma(\system)$ that the fixpoint iteration uses. The $\max$-strategy +iteration then invalidates the affected portion of the fixpoint +iteration by doing two things: it removes the variable $x$ from the +$\stableFP$ set, and it sets $D[x] = \infty$. The invalidation of +variables is then propagated to each variable which transitively +depends on $x$: $\{ y \in X \mid x \in \depFP^{+}[y] \} = \{ y \in X +\mid y \in \inflFP^{+}[x] \}$, where $\inflFP^{+}$ is the transitive +closure of $\inflFP$. We know from (\ref{eqn:fp-inv3}) that this +transitive closure of $\inflFP$ will identify the entire subsystem +which may depend on the value of $x$.
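To make this propagation concrete before completing the argument, here is a small Python sketch of an invalidation step over the same kind of bookkeeping structures ($D$, $\stableFP$, $\inflFP$, $\touchedFP$) described above. The function name, the clearing of the $\inflFP$ entries and the use of the touched set as the traversal's visited set are assumptions of this sketch, not the thesis's listing.

\begin{verbatim}
import math
INF = math.inf

def invalidate(x, D, stable, infl, touched):
    """Reset x and every variable that transitively read its value.

    Setting D[x] back to infinity re-establishes D >= soln(s) on the
    affected subsystem, and removing the variables from `stable` forces
    them to be re-evaluated the next time they are demanded.
    """
    if x in touched:
        return                    # already handled; also breaks cycles
    touched.add(x)                # reported back to the strategy side
    stable.discard(x)
    D[x] = INF
    dependants, infl[x] = infl[x], set()   # dependencies are rediscovered
    for y in dependants:          # walk the transitive closure of infl
        invalidate(y, D, stable, infl, touched)

# After the max-strategy changes the equation for "x":
#   touched = set(); invalidate("x", D, stable, infl, touched)
# every variable in `touched` is now unstable and maps to infinity.
\end{verbatim}

Whether the affected variables are reported via $\touchedFP$ at invalidation time or only once they are later re-solved is a detail this sketch glosses over.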
The invalidation of transitive +dependencies ensures that (\ref{eqn:fp-inv1}) holds for the changed +subsystem, as $\infty \ge z ~ \forall z \in \CZ$. From +(\ref{eqn:fp-inv1}), combined with the removal of $x$ from $\stableFP$, +we can then conclude that (\ref{eqn:fp-inv2}) holds as well. These invariants now +hold for the affected subsystem and are still true for the unaffected +subsystem. Thus our invariants hold for our entire system and our +fixpoint iteration will continue to be correct in the presence of +invalidation by the $\max$-strategy iteration. + + + \subsection{$\max$-strategy iteration} \label{sec:ldsi:max-strategy} @@ -1029,122 +1125,14 @@ Section \ref{sec:adapted-wdfs}. - -\subsection{Correctness} \label{sec:ldsi:correctness} - -The fixpoint iteration portion of the algorithm is concerned with -finding the greatest fixpoint, $\solnFP(s) : X \to D$ for the system $s -= \sigma(\system)$. If we consider all valid fixpoints, $\fix(s) = \{ -\rho \mid \rho = s(\rho) \}$, then we find that $\solnFP(s) \in \fix(s)$ -and $\solnFP(s) \ge \rho ~ \forall \rho \in \fix(s)$. Our general -approach will be to consider our variable assignment $D \ge \solnFP(s)$ -and show that it decreases until $D \in \fix(s)$ and thus $D = -\solnFP(s)$. - -We begin by stating three invariants of the LDSI algorithm. We use -$\depFP^{+}(x)$ to be the set of all variables which $x$ may depend -on. $\depFP^{+}$ is defined as the transitive closure of $\depFP$, -$\depFP[x] = \{ y \in X \mid x \in \inflFP[y] \}$. $\RFP \subseteq X$ -is defined as the variables currently on our recursion stack. -\begin{align} - D &\ge \solnFP(s) \label{eqn:fp-inv1} \\ -% - x &\in \inflFP[y] & \forall x \in \stableFP, y \in \depFP[x] - \label{eqn:fp-inv3} \\ -% - D &= \solnFP(s)[x] & \mbox{if } - x \not \in \RFP, - \depFP^{+}[x] \cap \RFP = \emptyset, - x \in \stableFP - \label{eqn:fp-inv2} -\end{align} - -(\ref{eqn:fp-inv1}) follows from monotonicity. We begin our iteration -at some variable assignment $D \ge \solnFP(s)$ and monotonically descend -from there. We know that $s(D)[x] \ge soln(s)[x] ~\forall D \ge -\solnFP(s), x \in X$, thus each evaluation of $s[x]$ in the context of -$D$ will result in a new value for $D[x]$ which is either closer to or -equal to $\solnFP(s)[x]$. As we only ever evaluate in the context of -variables taken from $D$, and $D[x]$ is set to $\infty$ when $x$ is -invalidated, we know that $D \ge \solnFP(s)$ holds. - -(\ref{eqn:fp-inv3}) follows from the use of the $\eval \fixpoint$ -function. Each variable lookup results in a direct insertion into -$\inflFP$, so during the evaluation of $x$ all relevant $\inflFP$ sets -are updated to reflect the current dependencies. - -For (\ref{eqn:fp-inv2}) we will examine what has happened when each of -the conditions is false: -\begin{itemize} -\item - If $x \in \RFP$ then we are currently in the context of an earlier - evaluation of $x$, so the value of $D[x]$ will be changed again by - that earlier call. In addition, we conclude that $x \in - \depFP^{+}[x]$ and thus, from the following condition, that $x$ will - be re-evaluated at some point after this one. - -\item - If $\depFP^{+}[x] \cap \RFP \ne \emptyset$ then a variable which $x$ - depends on is in $\RFP$ at the moment. This will result in the - re-evaluation of $x$ if the value of any of the variables in - $\depFP^{+}[x] \cap \RFP$ have changed.
- -\item - If $x \not \in \stableFP$ then a re-evaluation of $x$ will occur - with the next call to $\solve \fixpoint(x)$. This may result in a - change to the value of $D[x]$, or it may leave $D[x]$ stable for the - moment. If we denote the variable assignment before the evaluation - of $x$ by $D_{\pre}$, and the variable assignment after the - evaluation of $x$ by $D_{\post}$ then we find that $D_{\post} \le - D_{\pre}$, due to the monotonicity of $s$. After each change of - $D[x]$, everything which \emph{may} have depended on $D[x]$ is - re-evaluated (see the previous two items), leading to a final point - of stability when $D[x] = \solnFP(s)[x]$, by the definition of - $\solnFP(s)[x]$ and (\ref{eqn:fp-inv1}). -\end{itemize} - -After an evaluation of $\solve \fixpoint(x)$ we know that $x \in -\stableFP$. If it is also the case that $\RFP = \emptyset$, as is the -case for top-level calls to $\solve \fixpoint$, then we know that -$D[x] = \solnFP(s)[x]$. This means that the function $\lambda x -. (\solve \fixpoint(x); D[x])$ will act as a variable assignment -solving for the greatest fixpoint of $s = \sigma(\system)$, as is -required by our $\max$-strategy iteration. As the $\max$-strategy -iteration changes $\sigma$ it also induces a change in $s = -\sigma(\system)$, so in order to maintain the correctness of this -algorithm we must show that our above invariants are maintained by -this process. - -When the $\max$-strategy iteration changes the current $\max$-strategy -$\sigma$ at the variable $x$ it changes the equation system $s = -\sigma(\system)$ that the fixpoint iteration uses. The $\max$-strategy -iteration then invalidates the affected portion of the fixpoint -iteration by doing two things: it removes the variable $x$ from the -$\stableFP$ set, and it sets $D[x] = \infty$. The invalidation of -variables is then propagated to each variable which transitively -depends on $x$: $\{ y \in X \mid x \in \depFP^{+}[y] \} = \{ y \in X -\mid y \in \inflFP^{+}[x] \}$, where $\inflFP^{+}$ is the transitive -closure of $\inflFP$. We know from (\ref{eqn:fp-inv3}) that this -transitive closure of $\inflFP$ will identify the entire subsystem -which may depend on the value of $x$. The invalidation of transitive -dependencies ensures that (\ref{eqn:fp-inv1}) holds for the changed -subsystem, as $\infty \ge z ~ \forall z \in \CZ$. From -(\ref{eqn:fp-inv1}) we can then conclude that (\ref{eqn:fp-inv2}) -holds as well, as the removal of $x$ from $\stableFP$ combined with -(\ref{eqn:fp-inv1}) leads to (\ref{eqn:fp-inv2}). These invariants now -hold for the affected sub-system and are still true for the unaffected -sub-system. Thus our invariants hold for our entire system and our -fixpoint iteration will continue to be correct in the presence of -invalidation by the $\max$-strategy iteration. - - -We move now to the $\max$-strategy iteration. We will use $\rho: X \to -\CZ$ as $\rho = \lambda x . (\solve \fixpoint(x); D[x])$, a variable +In the following brief analysis we use $\rho: X \to \CZ$, defined as $\rho += \lambda x . (\solve \fixpoint(x); D[x])$, to denote a variable assignment which will always calculate the greatest fixpoint of $\sigma(\system)$ for the current strategy $\sigma$. Each time $\rho$ is queried for a variable's value it will also record which variables have had their values changed, whether or not those changed values are -final, in the set $\touchedFP$. +final, in the set $\touchedFP$. This represents the effect of the +fixpoint iteration portion of the DSI algorithm.
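Concretely, one hypothetical way to realise $\rho$ is as a closure over the fixpoint solver and the assignment $D$; the names below are illustrative, and the maintenance of $\touchedFP$ is assumed to happen inside the solver whenever a value actually changes.

\begin{verbatim}
def make_rho(solve_fixpoint, D):
    """rho = lambda x . (solve_fixpoint(x); D[x]), written out in Python."""
    def rho(x):
        solve_fixpoint(x)   # bring x, and only what x needs, to stability
        return D[x]         # then read its value from the assignment
    return rho

# rho = make_rho(solve, D)
# rho("x")  # solves just the subsystem that x depends on, then reads D["x"]
\end{verbatim}

This is simply the sequencing $\lambda x . (\solve \fixpoint(x); D[x])$ written out: solve first, then read.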
We can establish similar invariants for the $\max$-strategy iteration as we had for the fixpoint iteration. We denote the optimal strategy @@ -1188,9 +1176,6 @@ the $\max$-strategy. If the variable has stabilised to the same value as it had previously then it has not changed and thus has no effect on the $\max$-strategy. -If both the $\max$-strategy iteration and the fixpoint iteration -correctly solve their respective portions of the problem, and the -communication between them is correct, then we know that the overall -algorithm is correct. + \subsection{Implementation} \label{sec:ldsi:implementation} diff --git a/tex/thesis/experiments/experiments.tex b/tex/thesis/experiments/experiments.tex deleted file mode 100644 index cea409d..0000000 --- a/tex/thesis/experiments/experiments.tex +++ /dev/null @@ -1,4 +0,0 @@ -\chapter{Experiments} \label{chap:experiments} - -The experiments go here. - diff --git a/tex/thesis/introduction/introduction.tex b/tex/thesis/introduction/introduction.tex index ffc6cd2..17473af 100644 --- a/tex/thesis/introduction/introduction.tex +++ b/tex/thesis/introduction/introduction.tex @@ -36,9 +36,42 @@ in some way \emph{verify} the software before running it. While it is possible to write programs in a way that is easier to verify, it is still a difficult and expensive process to provide verification. -Recently there has been work done in automatically analysing programs -in order to identify bugs. This work is broadly classed \emph{static - analysis} and this thesis contributes to this work. +There has been a continuous stream of research into automatically +analysing programs in order to identify bugs from the late 1970s to the +present\cite{CousotCousot77-1,EasyChair:117}. This work is broadly +classed as \emph{static analysis}, and this thesis contributes to this +work. + +Due to Rice's Theorem\cite{Rice} we know that it is impossible to +perform a perfect static analysis in general. In order to overcome +this, many approaches to static analysis have been developed, including +the framework of abstract interpretation presented by Cousot and +Cousot\cite{CousotCousot77-1}. In this framework we consider a program +in a simplified, \emph{abstract}, domain. This allows us to perform a +sound analysis in general at the expense of precision. + +Two abstract domains which are particularly well-known and useful are +those of \emph{intervals} and \emph{zones}. The interval abstract +domain restricts each variable to a range $[a,b]$. This +domain can be useful for avoiding such program errors as division by +zero and out-of-bounds memory accesses. The abstract domain of +zones allows us to further bound the difference between +variables, $x - y \le c$. This allows us to be more precise in our +analysis, thereby avoiding false positives, at the cost of speed. + +The process of determining appropriate bounds for program variables +has traditionally been carried out by performing several Kleene +iterations, followed by a \emph{widening} and \emph{narrowing}. These +widening and narrowing operators ensure termination, but once again at +the cost of precision. As an alternative to these operators we can, in +some instances, use a game-theoretic approach to compute the abstract +semantics of a program\cite{EasyChair:117}. This approach is less +general than widening and narrowing, but performs precise abstract +interpretation over zones.
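To make the traditional approach concrete, here is a minimal Python sketch of Kleene iteration with a simple interval widening and narrowing for a single hypothetical loop, \texttt{x = 0; while (x < 100) x = x + 1}. The transfer function, the loop bound and the fixed number of narrowing steps are illustrative assumptions; this is not the game-theoretic analysis of \cite{EasyChair:117}.

\begin{verbatim}
INF = float("inf")
BOT = None                      # "no value yet" / unreachable

def join(a, b):
    if a is BOT: return b
    if b is BOT: return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(a, b):
    """Interval widening: bounds that are still moving jump to infinity."""
    if a is BOT: return b
    lo = a[0] if b[0] >= a[0] else -INF
    hi = a[1] if b[1] <= a[1] else INF
    return (lo, hi)

def narrow(a, b):
    """Narrowing: only infinite bounds may be refined by the new value."""
    lo = b[0] if a[0] == -INF else a[0]
    hi = b[1] if a[1] == INF else a[1]
    return (lo, hi)

def step(x):
    """One Kleene step for the loop head of: x = 0; while (x < 100) x = x + 1"""
    entry = (0, 0)                      # interval for x on loop entry
    if x is BOT or x[0] >= 100:         # nothing survives the loop guard
        return entry
    guarded = (x[0], min(x[1], 99))     # intersect with x < 100
    return join(entry, (guarded[0] + 1, guarded[1] + 1))

x = BOT
while True:                             # ascending phase with widening
    nxt = widen(x, step(x))
    if nxt == x:
        break
    x = nxt
print(x)                                # (0, inf): widening overshoots

for _ in range(2):                      # a couple of narrowing steps
    x = narrow(x, step(x))
print(x)                                # (0, 100)
\end{verbatim}

In this tiny example the narrowing phase happens to recover the exact loop-head interval $[0, 100]$; in general some precision is lost permanently, which is part of the motivation for the game-theoretic approach just described.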
In this thesis we present an implementation of this game-theoretic +algorithm, along with our own enhancements which improve its +performance on real-world data. @@ -56,7 +89,7 @@ Theoretical contribution: \item We present a demand-driven strategy improvement algorithm for solving monotonic, expansive equation systems involving $\min$ and - $\max$ operators + $\max$ operators. \end{enumerate} Systems contribution: @@ -64,13 +97,29 @@ Systems contribution: \item We develop a solver for monotonic, expansive equation systems based on the work of Gawlitza et al.\cite{EasyChair:117} with several - improvements + improvements. \item We analyse the performance of our improved solver on a set of - equation systems to demonstrate the effect of our improvements + equation systems to demonstrate the effect of our improvements. \item We integrate our solver into the LLVM framework to perform analysis - over Zones\cite{mine:padoII} + over Zones\cite{mine:padoII}. \item - We analyse the performance of our LLVM analysis on various program inputs + We analyse the performance of our LLVM analysis on various program + inputs. \end{enumerate} + +\section{Structure of the thesis} + +In Chapter \ref{chap:litreview} we review the background literature in +the field of static analysis, laying the basis for the rest of the +thesis. %TODO: more + +We present our theoretical contribution in Chapter \ref{chap:contribution}: a +demand-driven strategy improvement algorithm. We present the algorithm +along with an evaluation of its performance on realistic problems. + +Our implementation is presented in Chapter +\ref{chap:implementation}. We discuss several aspects of our +implementation before evaluating the results of our analysis on +several programs. diff --git a/tex/thesis/litreview/litreview.tex b/tex/thesis/litreview/litreview.tex index 109864e..38bd9dc 100644 --- a/tex/thesis/litreview/litreview.tex +++ b/tex/thesis/litreview/litreview.tex @@ -8,7 +8,7 @@ \newcommand{\Stmt}{\mathsf{Stmt}} \newcommand{\Vars}{\mathsf{Vars}} -\chapter{Preliminaries} \label{chap:litreview} +\chapter{Background} \label{chap:litreview} In this chapter we review the relevant literature pertaining to static analysis. We briefly cover the semantics of programs generally before @@ -38,9 +38,9 @@ side-effects, so it is necessary to introduce some notion of ``state'' into our semantics. By defining $\semantics{\cdot}: \states \to \states$ for all statements $x$ we provide a mechanism for statements to perform side effects. Elements of $\states$ each represent a -possible state for the program to be in. For example: for an -integral assignment statement $I := E$, assigning the value of an -expression $E$ to a variable $I$, have the following denotation: +possible state for the program to be in. For example: for an integral +assignment statement $I := E$, assigning the value of an expression +$E$ to a variable $I$, we have the following denotation: \begin{align*} \semantics{I := E} = \lambda \rho . (\rho \oplus \{I \mapsto \semantics{E}(\rho)\}) \end{align*} diff --git a/tex/thesis/thesis.tex b/tex/thesis/thesis.tex index 9b57230..d959f81 100644 --- a/tex/thesis/thesis.tex +++ b/tex/thesis/thesis.tex @@ -145,7 +145,7 @@ \input{introduction/introduction.tex} \input{litreview/litreview.tex} \input{contribution/contribution.tex} -\input{experiments/experiments.tex} +\input{implementation/implementation.tex} \input{conclusion/conclusion.tex} -- cgit v1.2.3