1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
|
\newcommand{\Env}{\mathsf{Env}}
\newcommand{\PP}{\mathsf{PP}}
\newcommand{\States}{\mathsf{States}}
\chapter{Literature Review} \label{chap:litreview}
\section{Program Semantics}
In order to perform any static analysis of a program it is necessary
to have some notion of the \emph{semantics}, or meaning, of a
program.
A program's semantics can be realised by considering the possible
values for each variable at each point in the program, which we will
refer to as the possible \emph{states} of the program, and the
transitions between these states performed by \emph{statements}.
If we consider a finite set of control points, $N$, and a finite set
of variables $X$ then we can capture the state of a program at a
particular point
As a formalisation of these notions we can consider these to be
defined as follows:
\begin{align*}
\Env: {} & X \to \Z \\
\PP: {} & \{ \text{program points} \} \\
\States: {} & \PP \times \Env
\end{align*}
Let $X$ be the set of program variables, $D$ be the union of all
possible domains from which variables can come, and $PP$ be the set of
all edges in the Control Flow Graph of the program.
\begin{eqnarray}
Env: {} & X \to D \\
Expr: {} & Env \to D \\
States: {} & PP \times Env
\end{eqnarray}
$States$, then, contains every possible state for the program ($Env$)
at each program point. A more interesting set for us is the set of
reachable states, this is: states which the program can actually reach
during execution. This concept of ``reachable states'' allows us to
formally capture all possible execution states for a program. We can
further define $Reach(p) = {e \in Env | \le p,e \ge \in States}$
Unfortunately it is unfeasible to analyse the complete semantics of a
program, due to the sheer number of possible reachable states which
must be examined. In many cases there are an infinite number of states
which would have to be considered to ensure a sound analysis of the
program.
\section{Abstract Interpretation} \label{sec:abstract-interpretation}
\subsection{General Framework}
Cousot and Cousot developed a framework for performing sound program
analysis called ``Abstract Interpretation''. In this framework we
consider a program under a set of ``abstract semantics'' which, in
some sense, simplify the program (and the analysis thereof).
As a simple example one may consider the case of mathematical
expressions over the integers. The operators \(+,\times\) perform
operations on these integers, with their semantics being defined as
one would expect. We may then define an abstract semantic of these
expressions when considering only signs (and zero), for example. Then
our abstract states contains four states: ${(+), (-), (\pm), 0}$.
\begin{align*}
( + ) + ( + ) &= ( + ) \\
( + ) + ( - ) &= ( \pm ) \\
( - ) + ( - ) &= ( - ) \\
x + 0 &= x \\
( \pm ) + x &= ( \pm ) \\
( + ) \times ( + ) &= ( + ) \\
( + ) \times ( - ) &= ( - ) \\
( - ) \times ( + ) &= ( - ) \\
0 \times x &= 0 \\
( \pm ) \times x &= ( \pm ), \forall x \ne 0 \\
\end{align*}
This abstract semantic will give us sound results (any analysis of the
abstract semantic will correspond to an equivalent analysis of the
concrete semantic), but will not give us every opportunity for
analysis (the abstraction may ignore relevant details).
In order to ensure we have a useful abstract interpretation it's
necessary to provide two operations to map between concrete states and
abstract states. These operations we call ``abstraction'' (mapping
from a concrete state to an abstract state), denoted by \(\alpha\),
and ``concretization'' (mapping from an abstract state to a concrete
state), denoted by \(\gamma\).
\textbf{Write more about the abstraction/concretization functions}
\subsection{Intervals}
Something about intervals.
\subsection{Zones} \label{subsec:zones}
In order to perform any abstract interpretation we must first
determine which abstract domain we will consider. Cousot and Cousot
introduced the abstract domain of ``ranges'' in their original paper
\cite{CousotCousot77-1}. This domain constrains variables to lie
within a range \(x \in [a, b]\) with \(x, a, b \in \Z\).
Further work has been done the abstract domain of zones and octagons,
which are generalised by the template polyhedral abstract domain.
Zones provide an abstract domain in which we try to put a bound on
either a variable's magnitude ($\pm x \le c$) or to put a bound on the
difference between two variables ($x - y \le c$).
Octagons generalise this by allowing us to vary the sign on both
variables to derive a bound as $\pm x \pm y \le c$.
The polyhedral abstract domain is a clear generalisation of both the
zone and octagon abstract domains in that it determines a bound for an
arbitrary linear combination of arbitrary program variables. For
example, finding a bound as $c_0 x_0 + c_1 x_1 + ... + c_n x_n \le c$,
where $c, c_i \in \R$ and $x_i$ are program variables.
\section{Min/max problem}
In \newcite{Gawlitza:2007:PFC:1762174.1762203} a conversion from a
program's Control Flow Graph (henceforth CFG) to a system of equations
involving only monotonic, expansive operators is presented. This
conversion results in an equation system which can be solved in order
to perform Abstract Interpretation (Section
\ref{sec:abstract-interpretation}) over Zones (Section
\ref{subsec:zones}).
\subsection{Partial-ordering Formulation}
To begin our construction of the equation systems we first need to
formalise one of a CFG. For our purposes we will consider a CFG to be
a tuple $G = (N, E, S, st)$. A finite set of \emph{control-points}
$N$, a finite set $E \subseteq N \times N$ of \emph{edges} between
these control points (ie. if $(u,v) \in E$ then an edge exists in the
CFG from $u$ to $v$). A mapping $s: N \times N \to Stmt$
taking each edge in the CFG to a \emph{statement}. $st \in N$ denotes
the special \emph{start control-point}.
We can now begin our construction of our abstract semantics $V^\#[v],
\forall v \in N$.
\begin{align}
V^\# [st] &\ge \alpha(\Z^n) \label{eqn:start-node} \\
V^\# [v] &\ge \llb s_{(u,v)} \rrb ^\# (V^\# [u]) & \forall (u,v) \in
E \label{eqn:confluence}
\end{align}
In the first equation, the possible abstract states of the start node
must always be the completely unconstrained abstract domain. Any value
is possible for any variable on the first node of the CFG.
The second equation requires that the possible abstract states of a
node, $v$, contains either the same amount of information or less
information than each of the predecessors of $v$ in the CFG.
\begin{figure}
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
distance=2cm,main node/.style={circle,fill=blue!20,draw},every
loop/.style={min distance=1.5cm}]
\node[main node] (V) {$v$};
\node[main node] (U) [above left of=V] {$u$};
\node[main node] (W) [above right of=V] {$w$};
\path[every node/.style={fill=none}]
(U) edge node[left]{$s_{(u,v)}$} (V)
(W) edge node[right]{$s_{(w,v)}$} (V);
\end{tikzpicture}
\caption{A simple CFG fragment.}
\label{cfg:confluence}
\end{figure}
So, if we have the CFG fragment shown in Figure \ref{cfg:confluence}
then we know that $V^\#[v]$ must `contain', or be larger than, each of
the edges coming in to $v$. In this case there are two incoming edges
to $v$: $(u,v)$ and $(w,v)$.
The edge $(u,v)$ leads to the state $\llb s_{(u,v)} \rrb^\#
(V^\#[u])$. It is clear that $V^\#[v]$ must be either the same size
or `larger' than this value, as it must either equal this constraint
or it must be a looser constraint. This gives us the inequality
\eqref{eqn:confluence}: $V^\#[v] \ge \llb s_{(u,v)}^\# \rrb
(V^\#[u])$.
Similarly by considering the edge $(w,v)$ and following the same argument
we find that $V^\# [v] \ge \llb s_{(w,v)} \rrb ^\# (V^\# [w])$.
For the start node in the CFG this inequality is obviously
useless. The start node has no predecessors (by definition) and thus
must have a value that is not dependent on other nodes. We can,
therefore, simply define the start node's possible abstract states to
be at least as big as the complete set of abstract states (as in
\eqref{eqn:start-node}).
We have not yet considered the effect that $s_{(u,v)}, \forall (u,v)
\in E$ has on the state in each edge. The effect of the statement in
the abstract domain depends on the statement's effect in the concrete
domain, but for all statements we define $\llb s \rrb^\# := \alpha
\circ \llb s \rrb \circ \gamma$. In essence this is just changing the
environment in which the statement is acting, so the statement's
effects are moved into our abstract domain with our abstraction and
concretization functions.
\begin{example}
\label{example:partial-ordered-construction}
As an example, let us consider the following program:
\begin{figure}[here]
\begin{minipage}{5cm}
\begin{lstlisting}
x = 1
while x <= 99:
x = x + 1
\end{lstlisting}
\end{minipage}
\hfill
\begin{minipage}{0.45\textwidth}
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
distance=2cm,main node/.style={circle,fill=blue!20,draw},every
loop/.style={min distance=1.5cm}]
\node[main node] (A) {$A$};
\node[main node] (B) [below of=A] {$B$};
\node[main node] (C) [below of=B] {$C$};
\path[every node/.style={fill=none}]
(A) edge node[right]{$x = 1$} (B)
(B) edge [loop right] node[right]{$x \le 99: x = x + 1$} (B)
(B) edge node[right]{$x > 99$} (C);
\end{tikzpicture}
\end{minipage}
\caption{Program and CFG for Example
\ref{example:partial-ordered-construction}}
\label{fig:partial-ordered-construction}
\end{figure}
We will now attempt to derive a system of inequalities from this for
the \emph{least upper bound} of $x$:
\begin{align*}
ub: {} & ~\text{\bf Vars} \to \CZ \\
ub(x) &= \min\{v \in \CZ ~|~ x \le v\}
\end{align*}
For $A$ the value of $x$ is entirely unknown, so no useful bound can
be given. The possible values of $x$ are bounded only by the top
value in $\CZ$, which is $\infty$. This gives us the following
inequality for node $A$:
\begin{align*}
ub(x_A) &\ge \infty
\end{align*}
When we reach $B$ we have two incoming edges. One comes from $B$
itself, the other from $A$.
The edge from $A$ to $B$ is very simple. It has no constraint and
simply sets the value of $x$ to $1$. It corresponds to an equation
of the form $ub(x_B) \ge 1$.
The edge from $B$ to itself is somewhat more complex. To begin with
it changes the value of $x$ depending on the previous value of
$x$. It also is executed conditionally (when $x \le 99$), which will
affect the bound on $x$. Because the condition limits the growth of
$x$ the corresponding equation for this edge is $ub(x_B) \ge
\min(x, 99) + 1$.
This leaves us with two simultaneous constraints on the value of
$ub(x_B)$.
\begin{align*}
ub(x_B) &\ge 1 \\
ub(x_B) &\ge \min(ub(x_B), 99) + 1
\end{align*}
The upper bound on the value of $x$ for node $C$ is the same as the
upper bound on the value of $x$ in $B$, because the edge from $B$ to
$C$ does not change the value of $x$, nor does it provide any more
details about the upper bound of $x$. (It does change the lower
bound of $x$, but we have not considered the lower bound in this
analysis.) This gives us a very simple equation system for $C$.
\begin{align*}
ub(x_C) &\ge ub(x_B)
\end{align*}
So, the upper bounds for $x$ can be found at every program point in
our original program by finding the smallest values of $ub(x_A),
ub(x_B), ub(x_C)$ subject to these constraints:
\begin{align*}
ub(x_A) &\ge \infty \\
ub(x_B) &\ge 1 \\
ub(x_B) &\ge \min(ub(x_B), 99) + 1 \\
ub(x_C) &\ge ub(x_B)
\end{align*}
\end{example}
\subsection{Lattice construction}
\label{subsec:lattice-construction}
The partial-ordering formulation considered above is a helpful
presentation of the problem in terms of constraints, but an equivalent
system can be constructed to aid us in locating the \emph{least}
solution to the system.
For every partial-ordering on a set there exists an equivalent
lattice, defined in terms of a meet operator ($\vee$) and a join
operator ($\land$), which are defined by the following relationships:
\begin{align*}
x \le y ~&\iff~ x \vee y = y \\
x \le y ~&\iff~ x \land y = x
\end{align*}
In the case that we are considering, with $x, y \in \CZ$, the meet and
join operators correspond with $\max$ and $\min$, respectively. This
allows us to simplify the system of inequalities in Example
\ref{example:partial-ordered-construction} to the following:
\begin{align*}
ub(x_A) &= \infty \\
ub(x_B) &= \max(1, \min(ub(x_B), 99) + 1) \\
ub(x_C) &= ub(x_B)
\end{align*}
The two constraints on the upper bound of $x_B$ can be combined into
one constraint by taking the maximum of them. This is because $x \ge
\max(a, b) ~\implies~ x \ge a \text{ and } x \ge b$
\begin{example}
\label{example:both-bounds}
In order to determine a lower bound for $x$ at each program point it
is sufficient to determine an \emph{upper} bound for $-x$. For our
second example we will consider the following program fragment:
\begin{figure}[here]
\begin{minipage}{5cm}
\begin{lstlisting}
x = 0
while x <= 9:
x = x + 1
\end{lstlisting}
\end{minipage}
\hfill
\begin{minipage}{0.5\textwidth}
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
distance=2cm,main node/.style={circle,fill=blue!20,draw},every
loop/.style={min distance=1.5cm}]
\node[main node] (A) {$A$};
\node[main node] (B) [below of=A] {$B$};
\node[main node] (C) [below of=B] {$C$};
\path[every node/.style={fill=none}]
(A) edge node[right]{$x = 0$} (B)
(B) edge [loop right] node[right]{$x \le 9: x = x + 1$} (B)
(B) edge node[right]{$x > 9$} (C);
\end{tikzpicture}
\end{minipage}
\caption{Program and CFG for Example \ref{example:both-bounds}}
\label{fig:both-bounds}
\end{figure}
We will now try to derive an equation system for the \emph{upper}
and \emph{lower} bounds of $x$.
In order to capture the state of the variables at each program point
we will consider the abstract state as the two-dimensional vector
$(ub(x), -lb(x))$, that is the upper bound of $x$, $ub(x)$, and the
\emph{negated} lower bound of $x$, $-lb(x)$. The least fixpoint of
our system of equations over $\CZ^2$ will then give us our bounds
for $ub(x)$ and $lb(x)$.
To begin with, let us once again consider the node $A$. At the point
of $A$ there is no information as to the bounds of $x$, so our
constraint must contain the entirety of $\CZ$. This corresponds to
$ub(x) = \infty, lb(x) = -\infty$, which gives us the following
inequality:
\begin{align*}
\left(\begin{array}{c}
ub(x_A) \\ -lb(x_A)
\end{array}\right)
&\ge
\left(\begin{array}{c}
\infty \\ \infty
\end{array}\right)
\end{align*}
As before, the node $B$ is the first of particular interest to
us. With two incoming edges we must have two constraints for this
node.
The first constraint comes from the edge $A \to B$. This edge does
not do much of interest, it ignores the preceding value of $x$ and
simply sets $x$ to $1$. This is captured in the following inequality:
\begin{align*}
\left(\begin{array}{c}
ub(x_B) \\ -lb(x_B)
\end{array}\right)
&\ge
\left(\begin{array}{c}
1 \\ -1
\end{array}\right)
\end{align*}
The constraint from the edge $B \to B$ is somewhat more
complicated. The transformation $x = x + 1$ is represented by the
matrix/vector operations of multiplication and
addition. Multiplication by the identity matrix indicates that $x$
itself is present in the right-hand side of the transformation,
while the addition of the vector \(\left(\begin{array}{c} 1 \\ -1
\end{array}\right) \) captures the effect of the addition on
each of the bounds (increasing each of the bounds by one).
The $x \le 9$ conditional, which is present on the edge, is captured
by the restriction on the growth of the upper bound. \(\min(ub(x_B),
9)\) ensures the upper-bound's growth is limited by the value
$9$. It should be noted that there is no bound placed on the
lower-bound as the program only bounds $x$'s growth from above on
this edge.
\begin{align*}
\left(\begin{array}{c}
ub(x_B) \\ -lb(x_B)
\end{array}\right)
&\ge
\left[\begin{array}{cc}
1 & 0 \\
0 & 1
\end{array}\right]
\left(\begin{array}{c}
\min(ub(x_B), 99) \\
-lb(x_B)
\end{array}\right)
+
\left(\begin{array}{c}
1 \\
-1
\end{array}\right)
\end{align*}
The edge $B \to C$ is quite simple as it simply provides another
restriction on the value of $x$ without any further
transformation. This restriction is placed on the lower-bound of $x$
only (leaving the upper-bound unchanged) and is captured by the
following inequality:
\begin{align*}
\left(\begin{array}{c}
ub(x_C) \\
-lb(x_C)
\end{array}\right)
&\ge
\left(\begin{array}{c}
ub(x_B) \\
\min(-lb(x_B), 99)
\end{array}\right)
\end{align*}
This gives us the following inequalities for this program:
\begin{align*}
\left(\begin{array}{c}
ub(x_A) \\ -lb(x_A)
\end{array}\right)
&\ge
\left(\begin{array}{c}
\infty \\ \infty
\end{array}\right)
\\
\left(\begin{array}{c}
ub(x_B) \\ -lb(x_B)
\end{array}\right)
&\ge
\left(\begin{array}{c}
1 \\ -1
\end{array}\right)
\\
\left(\begin{array}{c}
ub(x_B) \\ -lb(x_B)
\end{array}\right)
&\ge
\left[\begin{array}{cc}
1 & 0 \\
0 & 1
\end{array}\right]
\left(\begin{array}{c}
\min(ub(x_B), 99) \\
-lb(x_B)
\end{array}\right)
+
\left(\begin{array}{c}
1 \\
-1
\end{array}\right)
\\
\left(\begin{array}{c}
ub(x_C) \\
-lb(x_C)
\end{array}\right)
&\ge
\left(\begin{array}{c}
ub(x_B) \\
\min(-lb(x_B), 99)
\end{array}\right)
\end{align*}
By combining these expressions as discussed in Section
\ref{subsec:lattice-construction} we can obtain the following system
of equations:
\begin{align*}
\left(\begin{array}{c}
ub(x_A) \\ -lb(x_A)
\end{array}\right)
&=
\left(\begin{array}{c}
\infty \\ \infty
\end{array}\right)
\\
\left(\begin{array}{c}
ub(x_B) \\ -lb(x_B)
\end{array}\right)
&=
\max\left(
\left(\begin{array}{c}
1 \\ -1
\end{array}\right),
\left[\begin{array}{cc}
1 & 0 \\
0 & 1
\end{array}\right]
\left(\begin{array}{c}
\min(ub(x_B), 99) \\
-lb(x_B)
\end{array}\right)
+
\left(\begin{array}{c}
1 \\
-1
\end{array}\right)
\right)
\\
\left(\begin{array}{c}
ub(x_C) \\
-lb(x_C)
\end{array}\right)
&=
\left(\begin{array}{c}
ub(x_B) \\
\min(-lb(x_B), 99)
\end{array}\right)
\end{align*}
\end{example}
\subsection{Formalities}
\subsection{Example solve}
\label{sec-1-3-5}
Reconsidering our earlier equation system, we will now attempt to find
the minimal fixpoint by means of \emph{policy iteration} and
\emph{fixpoint iteration}.
\begin{eqnarray*}
x &=& \max(1, \min(x, 99) + 1) \\
\end{eqnarray*}
At the beginning and end of each player's turn we will have a variable
assignment $\{ x \to ? \}$, which will start with the value
$\{ x \to -\infty \}$.
The $\max$ player then gets a chance to play. They choose to improve
the strategy by choosing for $x$ to take the value $1$. The $\min$
player then has no choice but to find a variable assignment $\{ x
\to 1 \}$, as the $\max$ player has left them no choice.
Play then returns to the $\max$ player who chooses, again, to improve
their strategy to instead map $x$ to $\min(x, 99) + 1$. This time the
$\min$ player must make a decision. It solves $x = \min(x, 99) + 1$
for it's maximal fixpoint, deciding to use the $99$ value (as the $x$
value will grow without bound with the $+ 1$, thereby making $99$ the
minimum of the two options) and returning $\{ x \to 100 \}$ as
the new assignment.
The $\max$ player has no further way to improve their strategy, as $1
< 100$, so the game is over. The minimal fixpoint of the original
system has been found as $x = 100$.
|