Add the clang library to the repo (with some of my changes, too).

author: Zancanaro; Carlo <czan8762@plang3.cs.usyd.edu.au> 2012-09-24 09:58:17 +1000
committer: Zancanaro; Carlo <czan8762@plang3.cs.usyd.edu.au> 2012-09-24 09:58:17 +1000
commit: 222e2a7620e6520ffaf4fc4e69d79c18da31542e (patch)
tree: 7bfbc05bfa3b41c8f9d2e56d53a0bc3e310df239 /clang/www/analyzer/checker_dev_manual.html
parent: 3d206f03985b50beacae843d880bccdc91a9f424 (diff)
1 files changed, 346 insertions, 0 deletions
diff --git a/clang/www/analyzer/checker_dev_manual.html b/clang/www/analyzer/checker_dev_manual.html
new file mode 100644
index 0000000..fc9adf3
--- /dev/null
+++ b/clang/www/analyzer/checker_dev_manual.html
@@ -0,0 +1,346 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+          "http://www.w3.org/TR/html4/strict.dtd">
+<html>
+<head>
+  <title>Checker Developer Manual</title>
+  <link type="text/css" rel="stylesheet" href="menu.css">
+  <link type="text/css" rel="stylesheet" href="content.css">
+  <script type="text/javascript" src="scripts/menu.js"></script>
+</head>
+<body>
+
+<div id="page">
+<!--#include virtual="menu.html.incl"-->
+
+<div id="content">
+
+<h1 style="color:red">This Page Is Under Construction</h1>
+
+<h1>Checker Developer Manual</h1>
+
+<p>The static analyzer engine performs symbolic execution of the program and 
+relies on a set of checkers to implement the logic for detecting and 
+constructing bug reports. This page provides hints and guidelines for anyone 
+who is interested in implementing their own checker. The static analyzer is a 
+part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> 
+and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
+for general developer guidelines and information. </p>
+
+    <ul>
+      <li><a href="#start">Getting Started</a></li>
+      <li><a href="#analyzer">Analyzer Overview</a></li>
+      <li><a href="#idea">Idea for a Checker</a></li>
+      <li><a href="#registration">Checker Registration</a></li>
+      <li><a href="#skeleton">Checker Skeleton</a></li>
+      <li><a href="#node">Exploded Node</a></li>
+      <li><a href="#bugs">Bug Reports</a></li>
+      <li><a href="#ast">AST Visitors</a></li>
+      <li><a href="#testing">Testing</a></li>
+      <li><a href="#commands">Useful Commands</a></li>
+    </ul>
+
+<h2 id=start>Getting Started</h2>
+  <ul>
+    <li>To check out the source code and build the project, follow steps 1-4 of 
+    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> 
+  page.</li>
+
+    <li>The analyzer source code is located under the Clang source tree:
+    <br><tt>
+    $ <b>cd llvm/tools/clang</b>
+    </tt>
+    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
+     <tt>test/Analysis</tt>.</li>
+
+    <li>The analyzer regression tests can be executed from Clang's build 
+    directory:
+    <br><tt>
+    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
+    </tt></li>
+    
+    <li>Analyze a file with the specified checker:
+    <br><tt>
+    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
+    </tt></li>
+
+    <li>List the available checkers:
+    <br><tt>
+    $ <b>clang -cc1 -analyzer-checker-help</b>
+    </tt></li>
+
+    <li>See the analyzer help for different output formats, fine tuning, and 
+    debug options:
+    <br><tt>
+    $ <b>clang -cc1 -help | grep "analyzer"</b>
+    </tt></li>
+
+  </ul>
+ 
+<h2 id=analyzer>Static Analyzer Overview</h2>
+  The analyzer core performs symbolic execution of the given program. All the 
+  input values are represented with symbolic values; further, the engine deduces 
+  the values of all the expressions in the program based on the input symbols  
+  and the path. The execution is path sensitive and every possible path through 
+  the program is explored. The explored execution traces are represented with an 
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object. 
+  Each node of the graph is an 
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, 
+  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
+  <p>
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> 
+  represents the corresponding location in the program (or in the CFG). 
+  <tt>ProgramPoint</tt> is also used to record additional information on 
+  when and how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt> 
+  kind means that the state is the result of purging dead symbols, the 
+  analyzer's equivalent of garbage collection. 
+  <p>
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> 
+  represents the abstract state of the program. It consists of:
+  <ul>
+    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic 
+    values
+    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
+    <li><tt>GenericDataMap</tt> - constraints on symbolic values
+  </ul>
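+  <p>
+  As a rough illustration of how these pieces fit together, the sketch below 
+  models a node as plain C++ structs. The <tt>Toy*</tt> types, their fields, and 
+  the <tt>makeNodeAfterInit</tt> helper are hypothetical stand-ins invented for 
+  this page, not the real <tt>clang::ento</tt> classes, which are considerably 
+  richer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins only; these are NOT the real clang::ento classes.
struct ToyProgramPoint {
  std::string kind; // e.g. "PostStmt" or "PostPurgeDeadSymbols"
  int cfgBlock;     // hypothetical CFG block id
  int stmtIndex;    // hypothetical statement index inside that block
};

struct ToyProgramState {
  std::map<std::string, std::string> environment; // expression -> symbolic value
  std::map<std::string, std::string> store;       // memory location -> symbolic value
  std::map<std::string, std::string> genericData; // constraints on symbolic values
};

// A node pairs "where we are" with "what we know there".
struct ToyExplodedNode {
  ToyProgramPoint point;
  ToyProgramState state;
};

// Builds the node reached after `int y = x * 2;` in a function taking `int x`.
ToyExplodedNode makeNodeAfterInit() {
  ToyProgramState st;
  st.environment["x * 2"] = "$1";       // value of the expression
  st.store["x"] = "$0";                 // x still holds its input symbol
  st.store["y"] = "$1";                 // y was just bound to the product
  st.genericData["$0"] = "unconstrained";

  ToyProgramPoint pp = {"PostStmt", 1, 0};
  ToyExplodedNode node = {pp, st};
  assert(node.state.store.count("y") == 1);
  return node;
}
```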
+  
+  <h3>Interaction with Checkers</h3>
+  Checkers are not merely passive receivers of changes made by the analyzer core; 
+  they actively participate in <tt>ProgramState</tt> construction through the
+  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part 
+  of the state. Each time the analyzer engine explores a new statement, it 
+  notifies each checker registered to listen for that statement, giving it an 
+  opportunity to either report a bug or modify the state. (As a rule of thumb, 
+  the checker itself should be stateless.) The checkers are called one after another 
+  in the predefined order; thus, calling all the checkers adds a chain of nodes to the 
+  <tt>ExplodedGraph</tt>. 
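+  <p>
+  The idea of stateless checkers that extend a chain of nodes can be sketched 
+  with a toy model. Everything below (<tt>ToyGDM</tt>, <tt>ToyNode</tt>, 
+  <tt>runCheckers</tt>, and the two example checkers) is hypothetical and only 
+  mimics the shape of the mechanism; it is not the analyzer's API. Each toy 
+  checker is a pure function that copies the incoming state and returns a new 
+  one, so all per-path data lives in the state rather than in the checker.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins; none of these are real clang::ento types.
typedef std::map<std::string, int> ToyGDM;         // checker-defined data
typedef std::shared_ptr<const ToyGDM> ToyStateRef; // handle to an immutable state

struct ToyNode {
  std::string addedBy; // who produced this node
  ToyStateRef state;
};

// A stateless "checker": a pure function from one state to the next.
typedef ToyStateRef (*ToyChecker)(const ToyStateRef &);

ToyStateRef countCalls(const ToyStateRef &in) {
  ToyGDM next(*in); // copy; the input state is never mutated
  next["calls.seen"] += 1;
  return std::make_shared<ToyGDM>(next);
}

ToyStateRef markVisited(const ToyStateRef &in) {
  ToyGDM next(*in);
  next["visited"] = 1;
  return std::make_shared<ToyGDM>(next);
}

// Runs the checkers in a fixed order; each one adds a node to the chain.
std::vector<ToyNode> runCheckers(const ToyStateRef &start) {
  const ToyChecker checkers[] = {countCalls, markVisited};
  const char *names[] = {"countCalls", "markVisited"};

  std::vector<ToyNode> chain;
  ToyNode first = {"engine", start};
  chain.push_back(first);
  for (std::size_t i = 0; i < 2; ++i) {
    ToyNode n = {names[i], checkers[i](chain.back().state)};
    chain.push_back(n);
  }
  assert(chain.size() == 3);
  return chain;
}
```

+  <p>
+  Because each step returns a fresh state, the state attached to an earlier node 
+  stays valid after later checkers have run.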
+  
+  <h3>Representing Values</h3>
+  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> 
+  objects are used to represent the semantic evaluation of expressions. They can 
+  represent things like concrete integers, symbolic values, or memory locations 
+  (which are memory regions). They are a discriminated union of "values", 
+  symbolic and otherwise.
+  <p>
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) 
+  is meant to represent an abstract, but named, symbolic value. 
+  Symbolic values can have constraints associated with them. A symbol represents 
+  an actual (immutable) value. We might not know what its specific value is, but 
+  we can associate constraints with that value as we analyze a path. 
+  <p>
+  
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.  
+  It is used to provide a lexicon of how to describe abstract memory. Regions can 
+  layer on top of other regions, providing a layered approach to representing memory. 
+  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, 
+  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could 
+  be used to represent the memory associated with a specific field of that object.
+  So how do we represent symbolic memory regions? That's what <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> 
+  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the 
+  symbol is unique and has a unique name, that symbol names the region.
+  <p>
+  Let's see how the analyzer processes the expressions in the following example:
+  <p>
+  <pre class="code_example">
+  int foo(int x) {
+     int y = x * 2;
+     int z = x;
+     ...
+  }
+  </pre>
+  <p>
+Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, 
+we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in 
+this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. 
+Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, 
+which references the value <b>currently bound</b> to <tt>x</tt>. That value is 
+symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. 
+Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, 
+and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When 
+we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, 
+and create a new <tt>SVal</tt> that represents their multiplication (which in 
+this case is a new symbolic expression, which we might call <tt>$1</tt>). When we 
+evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), 
+and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) 
+to the <tt>MemRegion</tt> in the symbolic store.
+<br>
+The second line is similar. When we evaluate <tt>x</tt> again, we do the same 
+dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt> 
+might reference the same underlying value.
+
+<p>
+To summarize, MemRegions are unique names for blocks of memory. Symbols are 
+unique names for abstract symbolic values. Some MemRegions represent abstract 
+symbolic chunks of memory, and thus are also based on symbols. SVals are just 
+references to values, and can reference either MemRegions, Symbols, or concrete 
+values (e.g., the number 1).
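+<p>
+The walk through <tt>foo</tt> can be replayed with a small toy evaluator. This 
+is only a sketch: <tt>ToySVal</tt>, <tt>ToyEvaluator</tt>, and 
+<tt>evaluateFoo</tt> are hypothetical names, regions are identified by plain 
+strings, and the product <tt>x*2</tt> is given an opaque fresh symbol, which is 
+a simplification of how the analyzer builds symbolic expressions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only; the real SVal, SymExpr, and MemRegion classes differ.
struct ToySVal {
  enum Kind { ConcreteInt, Symbol, Region } kind;
  long value;       // meaningful when kind == ConcreteInt
  std::string name; // symbol name ("$0") or region name ("x")
};

ToySVal concrete(long v) { ToySVal s = {ToySVal::ConcreteInt, v, ""}; return s; }
ToySVal symbol(const std::string &n) { ToySVal s = {ToySVal::Symbol, 0, n}; return s; }
ToySVal region(const std::string &n) { ToySVal s = {ToySVal::Region, 0, n}; return s; }

struct ToyEvaluator {
  std::map<std::string, ToySVal> store; // region name -> value currently bound
  int nextSymbol;

  ToyEvaluator() : nextSymbol(0) {}

  // Symbols are unique: each call hands out a new name ($0, $1, ...).
  ToySVal freshSymbol() { return symbol("$" + std::to_string(nextSymbol++)); }

  // Lvalue-to-rvalue conversion: look up what is currently bound to a region.
  ToySVal load(const ToySVal &lvalue) const {
    assert(lvalue.kind == ToySVal::Region);
    return store.at(lvalue.name);
  }

  // Multiplication folds two concrete ints; otherwise the result is symbolic.
  ToySVal mul(const ToySVal &a, const ToySVal &b) {
    if (a.kind == ToySVal::ConcreteInt && b.kind == ToySVal::ConcreteInt)
      return concrete(a.value * b.value);
    return freshSymbol();
  }

  // Assignment: bind the rvalue to the region named by the lvalue.
  void bind(const ToySVal &lvalue, const ToySVal &rvalue) {
    assert(lvalue.kind == ToySVal::Region);
    store[lvalue.name] = rvalue;
  }
};

// Replays: int foo(int x) { int y = x * 2; int z = x; ... }
ToyEvaluator evaluateFoo() {
  ToyEvaluator ev;
  ev.bind(region("x"), ev.freshSymbol());                      // x holds $0 on entry
  ToySVal product = ev.mul(ev.load(region("x")), concrete(2)); // new symbol $1
  ev.bind(region("y"), product);                               // y -> $1
  ev.bind(region("z"), ev.load(region("x")));                  // z -> $0
  return ev;
}
```

+<p>
+After <tt>evaluateFoo()</tt> returns, the toy store binds both <tt>x</tt> and 
+<tt>z</tt> to <tt>$0</tt>, which mirrors the point that two <tt>SVals</tt> may 
+reference the same underlying value.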
+
+  <!-- 
+  TODO: Add a picture.
+  <br>
+  Symbols<br>
+  FunctionalObjects are used throughout.  
+  -->
+<h2 id=idea>Idea for a Checker</h2>
+  Here are several questions which you should consider when evaluating your 
+  checker idea:
+  <ul>
+    <li>Can the check be effectively implemented without path-sensitive 
+    analysis? See <a href="#ast">AST Visitors</a>.</li>
+    
+    <li>How high is the false positive rate going to be? Looking at the occurrences 
+    of the issue you want to write a checker for in the existing code bases might 
+    give you some ideas. </li>
+    
+    <li>How will the current limitations of the analysis affect the false alarm 
+    rate? Currently, the analyzer only reasons about one procedure at a time (no 
+    inter-procedural analysis). Also, it uses a simple solver based on range 
+    tracking to model symbolic execution.</li>
+    
+    <li>Consult the <a
+    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a> 
+    to get some ideas for new checkers and consider starting with improving/fixing  
+    bugs in the existing checkers.</li>
+  </ul>
+
+<h2 id=registration>Checker Registration</h2>
+  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt> 
+  folder. Follow the steps below to register a new checker with the analyzer.
+<ol>
+  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
+<pre class="code_example">
+using namespace clang;
+using namespace ento;
+
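+namespace {
+// Subscribes to the PreStmt&lt;CallExpr> event: checkPreStmt is invoked before
+// each call expression is processed.
+class NewChecker : public Checker&lt; check::PreStmt&lt;CallExpr> > {
+public:
+  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
+};
+} // end anonymous namespace
+
+void ento::registerNewChecker(CheckerManager &amp;mgr) {
+  mgr.registerChecker&lt;NewChecker>();
+}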
+</pre>
+
+<li>Pick the package name for your checker and add the registration code to 
+<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should 
+first be developed as experimental. If our new checker performs 
+security-related checks, we should add the following lines under the 
+<tt>SecurityExperimental</tt> package: 
+<pre class="code_example">
+let ParentPackage = SecurityExperimental in {
+...
+def NewChecker : Checker&lt;"NewChecker">,
+  HelpText&lt;"This text should give a short description of the checks performed.">,
+  DescFile&lt;"NewChecker.cpp">;
+...
+} // end "security.experimental"
+</pre>
+
+<li>Make the source code file visible to CMake by adding it to 
+<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
+
+<li>Compile and see your checker in the list of available checkers by running:<br>
+<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
+</ol>
+   
+
+<h2 id=skeleton>Checker Skeleton</h2>
+  There are two main decisions you need to make:
+  <ul>
+    <li> Which events the checker should be tracking. 
+    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a> 
+    for the list of available checker callbacks.</li>
+    <li> What data you want to store as part of the checker-specific program 
+    state. Try to minimize the checker state as much as possible. </li>
+  </ul>
+
+<h2 id=bugs>Bug Reports</h2>
+
+<h2 id=ast>AST Visitors</h2>
+  Some checks might not require path-sensitivity to be effective. A simple AST walk 
+  might be sufficient. If that is the case, consider implementing a Clang 
+  compiler warning. On the other hand, a check might not be acceptable as a compiler 
+  warning, for example because of a relatively high false positive rate. In this 
+  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and 
+  <tt><b>checkASTCodeBody</b></tt> are your best friends. 
+
+<h2 id=testing>Testing</h2>
+  Every patch should be well tested with Clang regression tests. The checker tests 
+  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, 
+  execute the following from the <tt>clang</tt> build directory:
+    <pre class="code">
+    $ <b>TESTDIRS=Analysis make test</b>
+    </pre>
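+  <p>
+  A test in that folder is typically a source file whose <tt>RUN</tt> line 
+  invokes the analyzer with <tt>-verify</tt> and whose expected diagnostics are 
+  written as comments on the lines where they should appear. The following is a 
+  minimal sketch, not a file from the Clang test suite: the function 
+  <tt>scale</tt> is hypothetical, and the sketch assumes that 
+  <tt>core.DivideZero</tt> reports the message <tt>Division by zero</tt>.

```cpp
// RUN: %clang_cc1 -analyze -analyzer-checker=core.DivideZero -verify %s

// Hypothetical function under test; not taken from the Clang test suite.
int scale(int total, int parts) {
  if (parts == 0)
    return total / parts; // expected-warning{{Division by zero}}
  return total / parts;   // parts is known to be nonzero on this path
}
```

+  <p>
+  With <tt>-verify</tt>, the run fails if an expected diagnostic is missing or 
+  if the analyzer emits a diagnostic that no comment anticipates.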
+
+<h2 id=commands>Useful Commands/Debugging Hints</h2>
+<ul>
+<li>
+While investigating a checker-related issue, instruct the analyzer to only 
+execute a single checker:
+<br><tt>
+$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
+</tt>
+</li>
+<li>
+To dump the AST:
+<br><tt>
+$ <b>clang -cc1 -ast-dump test.c</b>
+</tt>
+</li>
+<li>
+To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
+<br><tt>
+$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
+</tt> 
+</li>
+<li>
+To see all available debug checkers:
+<br><tt>
+$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
+</tt>
+</li>
+<li>
+To see which function is failing while processing a large file, use the 
+<tt>-analyzer-display-progress</tt> option.
+</li>
+<li>
+While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> 
+instead of <tt>clang --analyze</tt>, as the latter would call the compiler 
+in a separate process.
+</li>
+<li>
+To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while 
+debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and 
+execute:
+<br><tt> 
+(gdb) <b>p ViewGraph(0)</b>
+</tt>
+</li>
+<li>
+To see the <tt>ProgramState</tt> while debugging, use the following command: 
+<br><tt>
+(gdb) <b>p State->dump()</b>
+</tt> 
+</li>
+<li>
+To see a <tt>clang::Expr</tt> while debugging, use the following command. If you 
+pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the 
+source code.
+<br><tt>
+(gdb) <b>p E->dump()</b>
+</tt> 
+</li>
+<li>
+To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
+<br><tt>
+(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
+<br>
+(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
+</tt>
+</li>
+</ul>
+
+</div>
+</div>
+</body>
+</html>