\newcommand{\Env}{\mathsf{Env}}
\newcommand{\PP}{\mathsf{PP}}
\newcommand{\states}{\mathsf{states}}
\newcommand{\state}{\mathsf{state}}
\renewcommand{\b}[1]{\mathbf{#1}}
\newcommand{\semantics}[1]{\llb #1 \rrb}

\newcommand{\Stmt}{\mathsf{Stmt}}
\newcommand{\Vars}{\mathsf{Vars}}

\chapter{Background} \label{chap:litreview}

In this chapter we review the relevant literature pertaining to static
analysis. We briefly cover the semantics of programs generally before
looking more closely at the framework of abstract interpretation for
program analysis. The chapter concludes with an explanation of the
particular method of abstract interpretation over intervals and zones
which is implemented in this thesis.

\section{Program Semantics}

In order to perform any static analysis of a program it is necessary
to have some notion of the \emph{semantics}, or meaning, of a
program. The semantics of a program can be captured in several ways,
but for our purposes we consider \emph{denotational}, or
\emph{mathematical}, \emph{semantics} \cite{SCOTT70B}.

Denotational semantics views a program as a (partial) function built
up by the composition of many smaller functions. By giving
expressions, assignments, conditionals and loops denotations as small
functions we can compose them to determine the semantics of the whole
program. The operator $\semantics{\cdot}$ maps its argument to the
mathematical function it denotes.

In imperative languages, however, programs are not a pure series of
functional transformations: each statement or expression may have
side-effects, so it is necessary to introduce some notion of ``state''
into our semantics. By defining $\semantics{x}: \states \to
\states$ for every statement $x$ we provide a mechanism for statements
to perform side effects. Each element of $\states$ represents a
possible state for the program to be in. For example, for an integral
assignment statement $I := E$, assigning the value of an expression
$E$ to a variable $I$, we have the following denotation:
\begin{align*}
  \semantics{I := E} = \lambda \rho . (\rho \oplus \{I \mapsto \semantics{E}(\rho)\})
\end{align*}
where $\rho \in \states$ and $\semantics{E}: \states \to \Z$. Our
$\states$ in this case are functions $\rho: \Vars \to \D$ from
variables to values.
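As a concrete illustration, the denotation of an assignment can be sketched in Python. This is a hypothetical encoding of ours, not part of the formal development: states are modelled as finite dictionaries from variable names to integers, and a statement denotes a function from states to states.

```python
def sem_assign(var, sem_expr):
    """Denotation of `var := expr`: a state transformer that updates
    `var` with the value of `expr` evaluated in the incoming state."""
    def transformer(rho):
        updated = dict(rho)            # rho (+) {var |-> [[expr]](rho)}
        updated[var] = sem_expr(rho)
        return updated
    return transformer

# [[x := y + 1]] applied to the state {x |-> 0, y |-> 4}:
step = sem_assign("x", lambda rho: rho["y"] + 1)
print(step({"x": 0, "y": 4}))  # {'x': 5, 'y': 4}
```

Composition of such transformers then gives the semantics of statement sequences, mirroring the compositionality described above.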

Denotational semantics provide a formal abstract framework in which to
analyse programs mathematically, but they do not by themselves provide
a means of automated analysis. Because they capture the semantics of a
program exactly, they do not escape the consequences of Rice's Theorem
\cite{Rice} or the Halting Problem \cite{Halting}, which preclude us
from performing precise analysis of programs in general.

\section{Abstract Interpretation} \label{sec:abstract-interpretation}

In order to work around the limitations on precise static analysis in
general, Cousot and Cousot developed a framework for performing sound
program analysis called ``abstract
interpretation'' \cite{CousotCousot77-1}. In this framework we consider
a program's semantics under an abstraction. We call the original
semantics the ``concrete semantics'' and the abstracted semantics the
``abstract semantics''. The abstract semantics are obtained by
replacing $\D$ and $\states$ from the denotational semantics with
``abstracted'' structures.

To provide us with some intuition of these concepts, consider the
following example:

\begin{example}
  Let us assume that our program's state is represented by $\rho:
  \Vars \to \Z$, with values coming from $\Z$.

  One possible abstraction is to consider only a variable's
  \emph{sign}. That is, we record which of the following a variable
  $x$ satisfies (the symbols in parentheses are used to refer to these
  values from now on):
  \begin{align*}
    x &\in \Z & (\pm) \\
    x &\ge 0 & (+) \\
    x &\le 0 &(-) \\
    x &= 0  &(0)
  \end{align*}
  From this abstraction we can then determine something about the
  ordinary arithmetic operations in our program. We know, for
  instance, that $(+) \times (-) = (-)$, or that $(0) \times (\pm) =
  (0)$. We then also know that $x / y$ is certainly an error whenever
  $y$ has abstract value $(0)$.
  
  \label{example:abstract-interpretation} 
\end{example}
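The sign abstraction of this example can be sketched as a small table of rules. This is an illustration of ours, assuming a hypothetical encoding of the four abstract values as the strings \texttt{"+"}, \texttt{"-"}, \texttt{"0"} and \texttt{"pm"}:

```python
def sign_mul(a, b):
    """Abstract multiplication over signs (+ means >= 0, - means <= 0)."""
    if a == "0" or b == "0":
        return "0"                   # (0) x anything = (0)
    if a == "pm" or b == "pm":
        return "pm"                  # unknown sign stays unknown
    return "+" if a == b else "-"    # (+)x(+)=(+), (-)x(-)=(+), (+)x(-)=(-)

def div_is_error(divisor_sign):
    """Division by a value known to be exactly zero is certainly an error."""
    return divisor_sign == "0"

print(sign_mul("+", "-"))   # -
print(div_is_error("0"))    # True
```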

Abstraction trades precision of analysis for decidability. By
abstracting our domain in a way which is ``sound'', that is, which
safely over-approximates our concrete semantics, we ensure that any
property established on the abstraction is also valid for the concrete
semantics. The price of this abstraction is precision: the analysis
may report errors which cannot occur under the concrete semantics.

\subsection{General Framework}

In order to transform between the concrete and abstract domains we
define two functions, $\alpha$ and $\gamma$. The \emph{abstraction}
function $\alpha: 2^\states \to \states_\alpha$ transforms a set of
concrete states into an abstract state. The \emph{concretisation}
function $\gamma: \states_\alpha \to 2^\states$ transforms an abstract
state into the set of concrete states it represents. We require these
two functions to satisfy the properties of a Galois connection:
\begin{align*}
  a \subseteq \gamma(b) &\iff \alpha(a) \le b & \forall a \subseteq
  \states, b \in \states_\alpha.
\end{align*}
\noindent A consequence of this is that $\gamma(\alpha(a)) \supseteq
a$: concretising the abstraction of a set of states $a$ must yield at
least $a$ itself. In particular we are looking for the \emph{best}
abstract transformers, those for which $\alpha(a)$ is as tight as
possible.
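The Galois condition can be illustrated for the interval domain over small finite sets of integers. The encoding below is our own sketch, not part of the formal development:

```python
def alpha(concrete):
    """Abstract a non-empty set of integers to its tightest enclosing interval."""
    return (min(concrete), max(concrete))

def gamma(interval):
    """Concretise an interval back to the set of integers it describes."""
    lo, hi = interval
    return set(range(lo, hi + 1))

def leq(i1, i2):
    """Interval order: i1 <= i2 iff i1 is contained in i2."""
    return i2[0] <= i1[0] and i1[1] <= i2[1]

# Galois property: a subset-of gamma(b)  <=>  alpha(a) <= b.
a, b = {1, 3, 4}, (0, 10)
assert (a <= gamma(b)) == leq(alpha(a), b)   # both sides True here
a2 = {1, 3, 12}
assert (a2 <= gamma(b)) == leq(alpha(a2), b) # both sides False here
```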

Abstract interpretation follows on from the idea of denotational
semantics, and it is possible to determine the abstract semantics of
an operation based on its concrete semantics. We use
$\semantics{\cdot}^\#$ to represent the abstract semantics of a
statement, and define the transformation as $\semantics{s}^\# = \alpha
\circ \semantics{s} \circ \gamma$.


We can now formalise the notion of a Control Flow Graph (henceforth
CFG) and define its abstraction. We consider a CFG to be a tuple $G =
(N, E, s, st)$, where $N$ is a finite set of \emph{control points};
$E \subseteq N \times N$ is a finite set of \emph{edges} between these
control points (i.e.~if $(u,v) \in E$ then an edge exists in the CFG
from $u$ to $v$); $s: E \to \Stmt$ maps each edge of the CFG to the
\emph{statement} $\in \Stmt$ it executes; and $st \in N$ is a
distinguished \emph{start control point}.

We denote our abstract semantics of a node in our CFG by $V^\#[v],
\forall v \in N$ such that the following hold:
\begin{align}
  V^\# [st] &\ge \alpha(\Z^n) \label{eqn:start-node} \\
  V^\# [v] &\ge \llb s_{(u,v)} \rrb ^\# (V^\# [u]) & \forall (u,v) \in
  E \label{eqn:confluence}
\end{align}

Inequality \eqref{eqn:start-node} states that the abstract value of
the start node must cover the completely unconstrained concrete
domain: any value is possible for any variable at the first node of
the CFG.

Inequality \eqref{eqn:confluence} requires that the abstract value of
a node $v$ carries the same amount of information as, or less
information than, the result of applying each incoming edge's
statement to the abstract value of the corresponding predecessor of
$v$ in the CFG.

\begin{figure}
  \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
      distance=2cm,main node/.style={circle,fill=blue!20,draw},every
      loop/.style={min distance=1.5cm}]
    
    \node[main node] (V) {$v$};
    \node[main node] (U) [above left of=V] {$u$};
    \node[main node] (W) [above right of=V] {$w$};
    
    \path[every node/.style={fill=none}]
    (U) edge node[left]{$s_{(u,v)}$} (V)
    (W) edge node[right]{$s_{(w,v)}$} (V);
    
  \end{tikzpicture}
  \caption{A simple CFG fragment.}
  \label{cfg:confluence}
\end{figure}

So, if we have the CFG fragment shown in Figure \ref{cfg:confluence}
then we know that $V^\#[v]$ must `contain', or be larger than, each of
the edges coming in to $v$. In this case there are two incoming edges
to $v$: $(u,v)$ and $(w,v)$.

The edge $(u,v)$ leads to the abstract value $\llb s_{(u,v)} \rrb^\#
(V^\#[u])$. $V^\#[v]$ must be at least as `large' as this value: it
must either equal this constraint or be a looser one. This gives us
inequality \eqref{eqn:confluence}: $V^\#[v] \ge \llb s_{(u,v)} \rrb ^\#
(V^\#[u])$. Similarly, by considering the edge $(w,v)$ and following
the same argument, we find that $V^\# [v] \ge \llb s_{(w,v)} \rrb ^\#
(V^\# [w])$.

This analysis thus far has avoided defining any particular abstract
domain, but in practice we must choose an abstract domain before
performing our analysis. Much has been written on the topic of
abstract domains, ranging from the relatively simple, such as signs
and intervals \cite{CousotCousot77-1} to the more complex zones
\cite{mine:padoII}, octagons \cite{mine:hosc06} and more general
convex polyhedra \cite{DBLP:conf/vmcai/SankaranarayananSM05}. For this
thesis we are only concerned with \emph{intervals} and \emph{zones}.

\subsection{Intervals}

The abstract domain of \emph{intervals} was introduced by Cousot and
Cousot \cite{CousotCousot77-1} along with abstract interpretation
itself. In this domain we constrain each variable $x$ to lie within a
range $[a, b] \in \CZ^2$. This domain is finer than signs, so it
allows more information about the program to be determined. An
equivalent formulation is that we are finding \emph{upper bounds} for
both $x$ and $-x$ at each location in the program:
\begin{align*}
  x &\le b \text{ and} \\
  -x &\le -a.
\end{align*}
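A couple of interval operations can be sketched directly from this definition. The encoding below (pairs of floats, with \texttt{float("inf")} standing in for the extended integers) is an assumption of ours for illustration:

```python
INF = float("inf")

def add(i1, i2):
    """Abstract addition: [a, b] + [c, d] = [a + c, b + d]."""
    return (i1[0] + i2[0], i1[1] + i2[1])

def join(i1, i2):
    """Least upper bound: the smallest interval containing both."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

print(add((0, 5), (1, 1)))     # (1, 6)
print(add((0, INF), (1, 1)))   # (1, inf)
print(join((0, 5), (10, 20)))  # (0, 20) -- note the loss of precision
```

The join illustrates where precision is lost: the gap between $5$ and $10$ is absorbed into the result.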


\subsection{Zones}

\emph{Zones} are an abstract domain introduced by Min\'e
\cite{mine:padoII} which generalise intervals by also bounding the
differences between variables. Min\'e's work on zones builds on work
done in the model checking of timed automata
\cite{yovine-zones}. For each variable $x$, zones constrain the
variable and its negation in the same way as intervals,
\begin{align*}
  x &\le b \text{ and} \\
  -x &\le -a.
\end{align*}
Zones additionally provide us with bounds on \emph{differences}
between variables. So for each pair of variables $(x,y)$, zones
provide us with a bound
\begin{align*}
  x - y & \le c.
\end{align*}
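Zone constraints are commonly represented as a difference-bound matrix $m$, where $m[i][j]$ records the bound $x_i - x_j \le m[i][j]$ and index $0$ is a special always-zero variable (so $m[i][0]$ bounds $x_i$ and $m[0][j]$ bounds $-x_j$). The tightest implied bounds then follow from shortest paths, computable by a Floyd--Warshall pass. This is a sketch of that standard representation, not code from this thesis:

```python
INF = float("inf")

def close(m):
    """Tighten a difference-bound matrix via all-pairs shortest paths."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], m[i][k] + m[k][j])
    return m

# x1 <= 3 (entry m[1][0]) and x2 - x1 <= 2 (entry m[2][1]):
m = [[0,   INF, INF],
     [3,   0,   INF],
     [INF, 2,   0]]
print(close(m)[2][0])  # 5 -- the derived bound x2 <= 5
```

This derivation of $x_2 \le 5$ from $x_1 \le 3$ and $x_2 - x_1 \le 2$ is exactly the extra relational precision that zones offer over intervals.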




\section{Min/max problem}

In \cite{Gawlitza:2007:PFC:1762174.1762203} a conversion from a
program's CFG to a system of equations involving only monotonic,
expansive operators is presented. This conversion results in an
equation system which can be solved in order to perform abstract
interpretation over zones.

\subsection{Partial-ordering Formulation}

\begin{example}
  \label{example:partial-ordered-construction}

  As an example, let us consider the following program:
  
  \begin{figure}[ht]
    \begin{minipage}{5cm}
      \begin{lstlisting}
    x = 1
    while x <= 9:
      x = x + 1
      \end{lstlisting}
    \end{minipage}
    \hfill
    \begin{minipage}{0.45\textwidth}
      \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
          distance=2cm,main node/.style={circle,fill=blue!20,draw},every
          loop/.style={min distance=1.5cm}]
        
        \node[main node] (A) {$A$};
        \node[main node] (B) [below of=A] {$B$};
        \node[main node] (C) [below of=B] {$C$};
        
        \path[every node/.style={fill=none}]
        (A) edge node[right]{$x = 1$} (B)
        (B) edge [loop right] node[right]{$x \le 9: x = x + 1$} (B)
        (B) edge node[right]{$x > 9$} (C);
        
      \end{tikzpicture}
    \end{minipage}
    
    \caption{A simple program with its corresponding CFG}
    \label{fig:partial-ordered-construction}
  \end{figure}

  We now attempt to derive from this a system of inequalities for the
  \emph{least upper bound} of the variable $x$ at each program
  point. That is, at each program point we want a value $ub(x)$ such
  that $x \le ub(x)$ in every possible execution of the program. We
  denote these values at the three program points by $ub(x_A)$,
  $ub(x_B)$ and $ub(x_C)$.

  For $A$ the value of $x$ is entirely unknown, so no useful bound can
  be given. The possible values of $x$ are bounded only by the top
  value in $\CZ$, which is $\infty$. This gives us the following
  inequality for node $A$:
  \begin{align*}
    ub(x_A) &\ge \infty
  \end{align*}

  When we reach $B$ we have two incoming edges. One comes from $B$
  itself, the other from $A$.

  The edge from $A$ to $B$ is very simple. It has no constraint and
  sets the value of $x$ to $1$, corresponding to an inequality of the
  form $ub(x_B) \ge 1$. However, there is also the possibility that
  node $A$ is unreachable, indicated by $ub(x_A) = -\infty$, in which
  case $B$ is unreachable via this edge as well. This makes the
  inequality
  \begin{align*}
    ub(x_B) \ge
    \left\{\begin{array}{lc}
      1 & \text{if } ub(x_A) > -\infty \\
      -\infty & \text{otherwise}
    \end{array}\right.
  \end{align*}

  The edge from $B$ to itself is somewhat more complex. To begin with,
  it changes the value of $x$ depending on the previous value of
  $x$. It is also executed conditionally (when $x \le 9$), which
  affects the bound on $x$. Because the condition limits the growth of
  $x$, the corresponding inequality for this edge is $ub(x_B) \ge
  \min(ub(x_B), 9) + 1$.

  This leaves us with two simultaneous constraints on the value of
  $ub(x_B)$.
  \begin{align*}
    ub(x_B) &\ge 1 \\
    ub(x_B) &\ge \min(ub(x_B), 9) + 1
  \end{align*}

  The upper bound on the value of $x$ for node $C$ is the same as the
  upper bound on the value of $x$ in $B$, because the edge from $B$ to
  $C$ does not change the value of $x$, nor does it provide any more
  details about the upper bound of $x$. (It does change the lower
  bound of $x$, but we have not considered the lower bound in this
  analysis.)  This gives us a very simple equation system for $C$.
  \begin{align*}
    ub(x_C) &\ge ub(x_B)
  \end{align*}

  So, the upper bounds for $x$ can be found at every program point in
  our original program by finding the smallest values of $ub(x_A),
  ub(x_B), ub(x_C)$ subject to these constraints:
  \begin{align*}
    ub(x_A) &\ge \infty \\
    ub(x_B) &\ge
    \left\{\begin{array}{lc}
      1 & \text{if } ub(x_A) > -\infty \\
      -\infty & \text{otherwise}
    \end{array}\right. \\
    ub(x_B) &\ge \min(ub(x_B), 9) + 1 \\
    ub(x_C) &\ge ub(x_B)
  \end{align*}

  The smallest values for $ub(x_A), ub(x_B), ub(x_C)$ which satisfy
  these constraints are:
  \begin{align*}
    ub(x_A) &= \infty \\
    ub(x_B) &= 10 \\
    ub(x_C) &= 10
  \end{align*}
  So, at program point $A$: $x \le \infty$; at $B$: $x \le 10$ and at
  $C$: $x \le 10$. 

\end{example}


\subsection{Lattice construction}
\label{subsec:lattice-construction}

The partial-ordering formulation considered above is a helpful
presentation of the problem in terms of constraints, but an equivalent
equation system can be constructed to aid us in locating the
\emph{least} solution to the constraint system.

A partially ordered set in which every pair of elements has a least
upper bound and a greatest lower bound forms a lattice, with a join
operator ($\vee$) and a meet operator ($\wedge$) defined by:
\begin{align*}
  x \vee y = z &\iff z \ge x,~ z \ge y \text{ and } (\forall w:~ w \ge
  x, w \ge y \implies w \ge z) \\
  x \wedge y = z &\iff z \le x,~ z \le y \text{ and } (\forall w:~ w \le
  x, w \le y \implies w \le z)
\end{align*}

In the case of Example \ref{example:partial-ordered-construction},
with $x, y \in \CZ$, the join and meet operators correspond to
$\max$ and $\min$, respectively. This allows us to simplify the system
of inequalities in Example \ref{example:partial-ordered-construction}
to the following:
\begin{align*}
  ub(x_A) &= \infty \\
  ub(x_B) &= \max(1, \min(ub(x_B), 9) + 1) \\
  ub(x_C) &= ub(x_B)
\end{align*}

The two constraints on the upper bound of $x_B$ can be combined into
one constraint by taking their maximum, since $x \ge \max(a, b)
~\iff~ x \ge a \text{ and } x \ge b$. The Knaster-Tarski fixpoint
theorem \cite{Knaster-Tarski} then permits us to replace the
inequalities with equalities, forming a system of equations whose
least solution is also the least solution of the earlier constraint
system.
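The least solution of such a system can be computed by Kleene iteration: start every unknown at the bottom element $-\infty$ and repeatedly re-apply the equations until nothing changes. A sketch of ours for the system of Example \ref{example:partial-ordered-construction}, using float infinities for $\pm\infty$:

```python
INF = float("inf")

def solve():
    """Kleene iteration for ub(x_A), ub(x_B), ub(x_C)."""
    ub_a = ub_b = ub_c = -INF   # bottom: every node assumed unreachable
    while True:
        new_a = INF
        # max of the two constraints on B; min(-inf, 9) + 1 stays -inf
        new_b = max(1 if new_a > -INF else -INF,
                    min(ub_b, 9) + 1)
        new_c = new_b
        if (new_a, new_b, new_c) == (ub_a, ub_b, ub_c):
            return ub_a, ub_b, ub_c
        ub_a, ub_b, ub_c = new_a, new_b, new_c

print(solve())  # (inf, 10, 10)
```

The iterate for $ub(x_B)$ climbs $1, 2, \dots, 10$ and then stabilises, reproducing the least solution given in the example. (Naive iteration like this can be slow or non-terminating on unbounded loops, which motivates the more sophisticated solving techniques discussed later.)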



\begin{example}
  \label{example:both-bounds}

  In order to determine a lower bound for $x$ at each program point it
  is sufficient to determine an \emph{upper} bound for $-x$. We again
  consider the program fragment in Figure
  \ref{fig:partial-ordered-construction}.

  We now try to derive an equation system for the \emph{upper} and
  \emph{lower} bounds of $x$.

  In order to capture the state of the variables at each program point
  we consider the abstract state as the two-dimensional vector
  $(ub(x), -lb(x))$, that is the upper bound of $x$, $ub(x)$, and the
  \emph{negated} lower bound of $x$, $-lb(x)$. The least fixpoint of
  our system of equations over $\CZ^2$ then gives us bounds for
  $ub(x)$ and $lb(x)$.

  To begin with, let us once again consider the node $A$. At the point
  of $A$ there is no information as to the bounds of $x$, so our
  constraint must contain the entirety of $\CZ$. This corresponds to
  $ub(x) = \infty, lb(x) = -\infty$, which gives us the following
  inequality:
  \begin{align*}
    \left(\begin{array}{c}
      ub(x_A) \\ -lb(x_A)
    \end{array}\right)
    &\ge
    \left(\begin{array}{c}
      \infty \\ \infty
    \end{array}\right)
  \end{align*}

  As before, the node $B$ is the first of particular interest to
  us. With two incoming edges we must have two constraints for this
  node.

  The first constraint comes from the edge $A \to B$. This edge does
  not do much of interest: it ignores the preceding value of $x$ and
  sets $x$ to $1$. This is captured in the following inequality:
  \begin{align*}
    \left(\begin{array}{c}
      ub(x_B) \\ -lb(x_B)
    \end{array}\right)
    &\ge
    \left(\begin{array}{c}
      1 \\ -1
    \end{array}\right)
  \end{align*}

  The constraint from the edge $B \to B$ is somewhat more
  complicated. The transformation $x = x + 1$ is represented by the
  matrix/vector operations of multiplication and
  addition. Multiplication by the identity matrix indicates that $x$
  itself is present in the right-hand side of the transformation,
  while the addition of the vector \(\left(\begin{array}{c} 1 \\ -1
  \end{array}\right) \) captures the effect of the addition on
  each of the bounds (increasing each of the bounds by one).

  The $x \le 9$ conditional, which is present on the edge, is captured
  by the restriction on the growth of the upper bound. \(\min(ub(x_B),
  9)\) ensures the upper-bound's growth is limited by the value
  $9$. It should be noted that there is no bound placed on the
  lower-bound as the program only bounds $x$'s growth from above on
  this edge.
  \begin{align*}
    \left(\begin{array}{c}
      ub(x_B) \\ -lb(x_B)
    \end{array}\right)
    &\ge
    \left[\begin{array}{cc}
      1 & 0 \\
      0 & 1
    \end{array}\right]
    \left(\begin{array}{c}
      \min(ub(x_B), 9) \\
      -lb(x_B)
    \end{array}\right)
    +
    \left(\begin{array}{c}
      1 \\
      -1
    \end{array}\right)
  \end{align*}

  The edge $B \to C$ is quite simple, as it applies a further
  restriction on the value of $x$ without any transformation. The
  condition $x > 9$ (equivalently $x \ge 10$, as $x$ is integral)
  restricts only the lower bound of $x$, raising it to at least $10$
  (so $-lb(x_C) \le -10$) while leaving the upper bound unchanged.
  This is captured by the following inequality:
  \begin{align*}
    \left(\begin{array}{c}
      ub(x_C) \\
      -lb(x_C)
    \end{array}\right)
    &\ge
    \left(\begin{array}{c}
      ub(x_B) \\
      \min(-lb(x_B), -10)
    \end{array}\right)    
  \end{align*}

  This gives us the following inequalities for this program:
  \begin{align*}
    \left(\begin{array}{c}
      ub(x_A) \\ -lb(x_A)
    \end{array}\right)
    &\ge
    \left(\begin{array}{c}
      \infty \\ \infty
    \end{array}\right)
    \\
    \left(\begin{array}{c}
      ub(x_B) \\ -lb(x_B)
    \end{array}\right)
    &\ge
    \left(\begin{array}{c}
      1 \\ -1
    \end{array}\right)
    \\
    \left(\begin{array}{c}
      ub(x_B) \\ -lb(x_B)
    \end{array}\right)
    &\ge
    \left[\begin{array}{cc}
      1 & 0 \\
      0 & 1
    \end{array}\right]
    \left(\begin{array}{c}
      \min(ub(x_B), 9) \\
      -lb(x_B)
    \end{array}\right)
    +
    \left(\begin{array}{c}
      1 \\
      -1
    \end{array}\right)
    \\
    \left(\begin{array}{c}
      ub(x_C) \\
      -lb(x_C)
    \end{array}\right)
    &\ge
    \left(\begin{array}{c}
      ub(x_B) \\
      \min(-lb(x_B), -10)
    \end{array}\right)
  \end{align*}

  By combining these expressions as discussed in Section
  \ref{subsec:lattice-construction} we can obtain the following system
  of equations:
  \begin{align*}
    \left(\begin{array}{c}
      ub(x_A) \\ -lb(x_A)
    \end{array}\right)
    &=
    \left(\begin{array}{c}
      \infty \\ \infty
    \end{array}\right)
    \\
    \left(\begin{array}{c}
      ub(x_B) \\ -lb(x_B)
    \end{array}\right)
    &=
    \max\left(
    \left(\begin{array}{c}
      1 \\ -1
    \end{array}\right),
    \left[\begin{array}{cc}
      1 & 0 \\
      0 & 1
    \end{array}\right]
    \left(\begin{array}{c}
      \min(ub(x_B), 9) \\
      -lb(x_B)
    \end{array}\right)
    +
    \left(\begin{array}{c}
      1 \\
      -1
    \end{array}\right)
    \right)
    \\
    \left(\begin{array}{c}
      ub(x_C) \\
      -lb(x_C)
    \end{array}\right)
    &=
    \left(\begin{array}{c}
      ub(x_B) \\
      \min(-lb(x_B), -10)
    \end{array}\right)
  \end{align*}
\end{example}
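The least fixpoint of this vector equation system can likewise be found by iteration, with componentwise $\max$ as the join on $\CZ^2$. A sketch of ours (using float infinities; recall that the second component is the \emph{negated} lower bound):

```python
INF = float("inf")

def vmax(u, v):
    """Componentwise max: the join on CZ^2."""
    return (max(u[0], v[0]), max(u[1], v[1]))

def solve():
    b = (-INF, -INF)        # (ub(x_B), -lb(x_B)); bottom = unreachable
    while True:
        # Join of the edge A -> B, contributing (1, -1), and the loop
        # edge B -> B, contributing (min(ub, 9) + 1, -lb - 1).
        new_b = vmax((1, -1), (min(b[0], 9) + 1, b[1] - 1))
        if new_b == b:
            break
        b = new_b
    # Edge B -> C: the condition x > 9 forces lb >= 10, i.e. -lb <= -10.
    c = (b[0], min(b[1], -10))
    return b, c

print(solve())  # ((10, -1), (10, -10))
```

The fixpoint gives $1 \le x \le 10$ at $B$ and $x = 10$ at $C$, consistent with the upper bounds computed in Example \ref{example:partial-ordered-construction}.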

\section{Min-cost flow}

The matrix formulation is sufficient to calculate bounds on individual
program variables, as in the interval domain, but in doing so there is
a significant loss of precision. In order to reclaim some of this
precision we use a relational domain, such as zones.

We use a ``template constraint matrix'' to specify which combinations
of program variables our analysis should bound. A template constraint
matrix is an $m \times n$ matrix, where $n$ is the number of program
variables and $m$ is the number of combinations of interest; each row
holds the coefficients of one such combination.

\begin{example}
  If we have two program variables $x_1$ and $x_2$ then we can analyse
  for values $x_1$, $-x_1$, $x_2$, $-x_2$, $(x_1 - x_2)$ and $(x_2 -
  x_1)$. If we are only interested in the values of $x_1$, $x_2$ and
  $(x_2 - x_1)$ then we set our template constraint matrix to be
  \begin{align*}
    T &= \left[\begin{array}{cc}
        1 & 0 \\
        0 & 1 \\
        -1 & 1
      \end{array}\right]
  \end{align*}
\end{example}

This template constraint matrix also specifies the topology of a
graph. Considering each column as a node and each row as an edge, we
produce a directed graph $G$ in which each row contributes an edge
from the node whose column holds a $1$ to the node whose column holds
a $-1$ in that row. To guarantee that every row yields an edge in $G$
we also add an extra ``dummy node'', which stands in for the missing
endpoint of any row with only a single non-zero entry.

\begin{example}
  For the template constraint matrix
  \begin{align*}
    T &= \left[\begin{array}{cc}
        1 & 0 \\
        0 & 1 \\
        -1 & 1
      \end{array}\right]
  \end{align*}

  \noindent we construct a directed graph

  \begin{center}
    \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node
        distance=2.5cm]
      \node (x1)[draw,circle] {$x_1$};
      \node (x2)[draw,circle,below of=x1] {$x_2$};
      \node (dummy)[draw,circle,below right of=x1] {dummy};

      \path
      (x1) edge[->,bend left] node {} (dummy)
      (x2) edge[->,bend left] node {} (x1)
      (x2) edge[->,bend left] node {} (dummy);
    \end{tikzpicture}.
  \end{center}
\end{example}
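Reading the graph off a template constraint matrix can be sketched as follows. This is an illustration of ours (the node names and the edge ordering are arbitrary choices), following the row-to-edge rule described above:

```python
def tcm_edges(T, names, dummy="dummy"):
    """One edge per row: from the column holding +1 to the column
    holding -1, with the dummy node standing in for missing endpoints."""
    edges = []
    for row in T:
        src = next((names[j] for j, v in enumerate(row) if v == 1), dummy)
        dst = next((names[j] for j, v in enumerate(row) if v == -1), dummy)
        edges.append((src, dst))
    return edges

T = [[1, 0],    # x1
     [0, 1],    # x2
     [-1, 1]]   # x2 - x1
print(tcm_edges(T, ["x1", "x2"]))
# [('x1', 'dummy'), ('x2', 'dummy'), ('x2', 'x1')]
```

These are exactly the three edges of the directed graph in the example above.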

Once we have determined the topology of this graph, we then perform
abstract interpretation by assigning the nodes ``supplies'' and the
edges ``costs''. The supplies correspond to a statement's semantics,
capturing the combination of variables which are the result of a given
assignment, while the costs consider the incoming values of variables
into the statement.

We use the min-cost flow problem on this graph as an operator in our
equation systems. We fix the topology and the supplies of the graph
and then consider the min-cost flow value as a function of its edge
costs, which are the abstract values entering the node. The validity
of this approach is established by Gawlitza et
al.~\cite{EasyChair:117}, who show that the min-cost flow problem,
considered as a function of its edge weights, is monotonic and
quasi-expansive, thereby making it a valid operator in our equation
systems.