\documentclass{article} \usepackage[a4paper]{geometry} \usepackage{hyperref} \usepackage{amsmath} \title{INFO5993 \\ Literature Review} \author{Carlo Zancanaro} \begin{document} \maketitle \section{Introduction} Static analysis is concerned with determining as much as possible about a program from it without actually running the program. Being able to analyse the program statically can provide a wealth of information about the program's characteristics without executing it at all. The uses for static analysis vary significantly, from program optimisation (eg. constant propagation) to verification of program semantics and bug-finding. In critical applications (for example: medial equipment and flight computers) static analysis techniques are used to ensure that programs are free from errors which would cause catastrophic failure. Of particular use in static analysis is the determination of integral values within a program. By determining which possible values any integer variables in the program can take at each code point it is possible to detect a significant number of errors relating to improper integral values (divide-by-zero, array-out-of-bounds, etc.). Such errors are among the most common in real-world software development, so their detection is of much use to software developers. \section{Abstract Interpretation} \subsection{General Framework} Abstract interpretation is a technique introduced in \cite{CousotCousot77-1} in which a program's semantics are examined within an abstract domain. This abstract domain is a simplification of the concrete domain in which the program actually executes, but is constructed in such a way as to retain some properties of the original program for analysis. We perform our analysis by "executing" our program within this abstract domain which we have constructed. An example of an abstract domain is considering numbers as either positive $(+)$, negative $(-)$, zero $(0)$, unknown $(\pm)$ or undefined $(?)$. Using this abstraction we can still define some semantics which reflect the program itself (as an example: $(+) \times (-) = (-)$, or the product of a positive number with a negative number is a negative number). Each abstract domain must also contain a "top" $(\top)$ and a "bottom" $(\bot)$ element. The existence of these two elements turn our abstract domain into a complete lattice, which ensures some helpful properties of our abstract domain, including the existence of solutions to fixpoint-equations (\cite{tarski}). By constructing appropriate abstract domains we can infer properties of the program's concrete semantics which are relevant to our application. The above abstract domain, for example, can be used to detect divide-by-zero errors (although it is clear from the above that not all divide-by-zero errors will be detected) or negative array-indexing (one form of an out of bounds error). A number of abstract domains have been conceived and formalised in literature. These are considered in the following subsection. An important factor to consider in the following is that as abstract domains get more "detailed" they become more computationally expensive to analyse. \subsection{Abstract Domains} Cousot and Cousot, in \cite{CousotCousot77-1} introduce an abstract domain providing an upper and lower bound to each integral value. This domain allows them to reason about numbers with greater precision than simply considering positive/negative/zero. The abstract domain of "zones", are handled in \cite{mine:padoII}. These zones provide a way to perform analysis on the bounds of variable differences (ie. to place bounds as $a - b \leq c$) and are also referred to as Difference Bound Matrices (which are used in other model-checking applications, such as \cite{Yovine98modelchecking}). The domain of zones is then extended by the same author into the "Octagon Abstract Domain" in \cite{mine:hosc06} which additionally allows bounds on the sum of variables ($\pm a \pm b \leq c$). All of these above domains are actually shown to be instances of the polyhedra abstract domain in \cite{DBLP:conf/vmcai/SankaranarayananSM05}. The polyhedra domain expresses bounds on $a_1 x_1 + a_2 x_2 ... + a_n x_n + c \ge 0$, where $a_i$ are fixed in advance (a predefined polyhedral shape). This allows for ranges to be expressed as $1 \times x_1 + c \ge 0$ provides a single bound for a single variable (two such bounds is, of course, a range). Similarly for zones one can define a polyhedra $1 \times x_1 + (-1) \times x_2 + c \ge 0$ which is equivalent. The octagon domain is $(\pm 1) \times x_1 + (\pm 1) \times x_2 + c \ge 0$. Non-integral abstract domains have also been investigated in papers such as \cite{springerlink}, which considers the abstract domain of strings with abstract values being regular expressions, but it seems the majority of the literature is concerned with providing bounds for integral, or at the very least numerical, expressions. The reasons for this are fairly obvious when one considers the ubiquity of integers and the relative simplicity of the numerical domain compared to many others. \section{Fixpoints} When performing static analysis there is a great need to perform some sort of fixpoint solving to determine the possible resultant values for some operations. When looping, for instance, it is necessary to "iterate" (in some sense) to find what values each variable may take during a loop iteration. The most basic method for doing so is simply to perform a complete iteration in the abstract domain, but doing so within the integers has no guarantee of termination (due to the integers being an unbounded set) and would not be expected to be performant. In \cite{CousotCousot77-1} an approach of "widening" and "narrowing" is introduced. This widening/narrowing method performs some fixed number of iteration steps. If, after this fixed number, the iteration has not stabilised then it assumes that it will not stabilise and will instead continue infinitely, so any affected variables are "widened" to infinity. A "narrowing" step is then carried out wherein variables are brought back into the finite domain by some terminating condition on the loop. This approach is quite general, and so is applicable to many instances of real programs, but is a crude approximation of the reality of the program (albeit, with surprisingly good results). In specific instances it is possible to perform a better approximation. In particular the case of addition, multiplication (by positive constants), minimum and maximum is considered in \cite{Gawlitza:2007:PFC:1762174.1762203}. With this approach code points are converted into fixpoint equations with $minimum$ and $maximum$ expressions. These fixpoint equations can then be solved in order to give information about the program's abstract semantics. These equations are solved using a form of policy iteration (referred to as strategy iteration) over the $maximum$ expressions present in each term. A second step consisting of ordinary fixpoint-iteration over the resulting $minimum$-expressions is then performed. When fixpoint-iteration is known to terminate then it is possible to get significant speed gains by processing each expression according to their dependencies. The approach presented in \cite{DBLP:tr/trier/MI96-11}, in particular, improves the real-world performance of their fixpoint-equation solver. It should be noted, however, that such evaluation strategies can do nothing to affect the worst-case complexity as it is always possible to construct a system in which evaluation of each variable is necessary at every stage until the system stabilises. \section{Conclusion} Static analysis is of great practical importance to real world developers. It gives us the ability to create tools which automatically identify common errors in programs due to developer errors and so allows us to provide greater guarantees about the behaviour of our software. Unfortunately complete static analysis of a program is an undecidable problem, and so we must perform abstract interpretation of a program's semantics in order to provide useful results of analysis. Many different abstract domains exist in which to consider programs, with varying precision and computational complexity. In particular the abstract domain of polyhedra (and the related domains of zones and ranges) are of great use when performing analysis of a program's integral semantics. \bibliographystyle{abbrv} \bibliography{references.bib} \end{document}