| author | Carlo Zancanaro <carlo@pc-4w14-0.cs.usyd.edu.au> | 2012-10-15 17:10:06 +1100 | 
|---|---|---|
| committer | Carlo Zancanaro <carlo@pc-4w14-0.cs.usyd.edu.au> | 2012-10-15 17:10:06 +1100 | 
| commit | be1de4be954c80875ad4108e0a33e8e131b2f2c0 (patch) | |
| tree | 1fbbecf276bf7c7bdcbb4dd446099d6d90eaa516 /clang/www/analyzer/checker_dev_manual.html | |
| parent | c4626a62754862d20b41e8a46a3574264ea80e6d (diff) | |
| parent | f1bd2e48c5324d3f7cda4090c87f8a5b6f463ce2 (diff) | |
Merge branch 'master' of ssh://bitbucket.org/czan/honours
Diffstat (limited to 'clang/www/analyzer/checker_dev_manual.html')
| -rw-r--r-- | clang/www/analyzer/checker_dev_manual.html | 346 | 
1 files changed, 346 insertions, 0 deletions
| diff --git a/clang/www/analyzer/checker_dev_manual.html b/clang/www/analyzer/checker_dev_manual.html new file mode 100644 index 0000000..fc9adf3 --- /dev/null +++ b/clang/www/analyzer/checker_dev_manual.html @@ -0,0 +1,346 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" +          "http://www.w3.org/TR/html4/strict.dtd"> +<html> +<head> +  <title>Checker Developer Manual</title> +  <link type="text/css" rel="stylesheet" href="menu.css"> +  <link type="text/css" rel="stylesheet" href="content.css"> +  <script type="text/javascript" src="scripts/menu.js"></script> +</head> +<body> + +<div id="page"> +<!--#include virtual="menu.html.incl"--> + +<div id="content"> + +<h1 style="color:red">This Page Is Under Construction</h1> + +<h1>Checker Developer Manual</h1> + +<p>The static analyzer engine performs symbolic execution of the program and  +relies on a set of checkers to implement the logic for detecting and  +constructing bug reports. This page provides hints and guidelines for anyone  +who is interested in implementing their own checker. The static analyzer is a  +part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>  +and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a> +for general developer guidelines and information. 
</p> + +    <ul> +      <li><a href="#start">Getting Started</a></li> +      <li><a href="#analyzer">Analyzer Overview</a></li> +      <li><a href="#idea">Idea for a Checker</a></li> +      <li><a href="#registration">Checker Registration</a></li> +      <li><a href="#skeleton">Checker Skeleton</a></li> +      <li><a href="#node">Exploded Node</a></li> +      <li><a href="#bugs">Bug Reports</a></li> +      <li><a href="#ast">AST Visitors</a></li> +      <li><a href="#testing">Testing</a></li> +      <li><a href="#commands">Useful Commands</a></li> +    </ul> + +<h2 id=start>Getting Started</h2> +  <ul> +    <li>To check out the source code and build the project, follow steps 1-4 of  +    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>  +  page.</li> + +    <li>The analyzer source code is located under the Clang source tree: +    <br><tt> +    $ <b>cd llvm/tools/clang</b> +    </tt> +    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>, +     <tt>test/Analysis</tt>.</li> + +    <li>The analyzer regression tests can be executed from Clang's build  +    directory: +    <br><tt> +    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b> +    </tt></li> +     +    <li>Analyze a file with the specified checker: +    <br><tt> +    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b> +    </tt></li> + +    <li>List the available checkers: +    <br><tt> +    $ <b>clang -cc1 -analyzer-checker-help</b> +    </tt></li> + +    <li>See the analyzer help for different output formats, fine-tuning, and  +    debug options: +    <br><tt> +    $ <b>clang -cc1 -help | grep "analyzer"</b> +    </tt></li> + +  </ul> +  +<h2 id=analyzer>Static Analyzer Overview</h2> +  The analyzer core performs symbolic execution of the given program. 
All the  +  input values are represented with symbolic values; further, the engine deduces  +  the values of all the expressions in the program based on the input symbols   +  and the path. The execution is path sensitive and every possible path through  +  the program is explored. The explored execution traces are represented with an  +  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.  +  Each node of the graph is an  +  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,  +  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>. +  <p> +  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>  +  represents the corresponding location in the program (or the CFG).  +  <tt>ProgramPoint</tt> is also used to record additional information on  +  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>  +  kind means that the state is the result of purging dead symbols - the  +  analyzer's equivalent of garbage collection.  +  <p> +  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>  +  represents the abstract state of the program. It consists of: +  <ul> +    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic  +    values +    <li><tt>Store</tt> - a mapping from memory locations to symbolic values +    <li><tt>GenericDataMap</tt> - constraints on symbolic values +  </ul> +   +  <h3>Interaction with Checkers</h3> +  Checkers are not merely passive receivers of the analyzer core changes - they  +  actively participate in the <tt>ProgramState</tt> construction through the +  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part  +  of the state. 
Each time the analyzer engine explores a new statement, it  +  notifies each checker registered to listen for that statement, giving it an  +  opportunity to either report a bug or modify the state. (As a rule of thumb,  +  the checker itself should be stateless.) The checkers are called one after another  +  in the predefined order; thus, calling all the checkers adds a chain to the  +  <tt>ExplodedGraph</tt>.  +   +  <h3>Representing Values</h3> +  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>  +  objects are used to represent the semantic evaluation of expressions. They can  +  represent things like concrete integers, symbolic values, or memory locations  +  (which are memory regions). They are a discriminated union of "values",  +  symbolic and otherwise. +  <p> +  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)  +  is meant to represent an abstract, but named, symbolic value.  +  Symbolic values can have constraints associated with them. A symbol represents  +  an actual (immutable) value. We might not know what its specific value is, but  +  we can associate constraints with that value as we analyze a path.  +  <p> +   +  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.   +  It is used to provide a lexicon of how to describe abstract memory. Regions can  +  layer on top of other regions, providing a layered approach to representing memory.  +  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,  +  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could  +  be used to represent the memory associated with a specific field of that object. +  So how do we represent symbolic memory regions? That's what <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>  +  is for.  
It is a <tt>MemRegion</tt> that has an associated symbol. Since the  +  symbol is unique and has a unique name, that symbol names the region. +  <p> +  Let's see how the analyzer processes the expressions in the following example: +  <p> +  <pre class="code_example"> +  int foo(int x) { +     int y = x * 2; +     int z = x; +     ... +  } +  </pre> +  <p> +Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,  +we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in  +this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.  +Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,  +which references the value <b>currently bound</b> to <tt>x</tt>. That value is  +symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.  +Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,  +and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When  +we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,  +and create a new <tt>SVal</tt> that represents their multiplication (which in  +this case is a new symbolic expression, which we might call <tt>$1</tt>). When we  +evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),  +and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)  +to the <tt>MemRegion</tt> in the symbolic store. +<br> +The second line is similar. When we evaluate <tt>x</tt> again, we do the same  +dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>  +might reference the same underlying value. + +<p> +To summarize, MemRegions are unique names for blocks of memory. Symbols are  +unique names for abstract symbolic values. Some MemRegions represent abstract  +symbolic chunks of memory, and thus are also based on symbols. 
SVals are just  +references to values, and can reference MemRegions, Symbols, or concrete  +values (e.g., the number 1). + +  <!--  +  TODO: Add a picture. +  <br> +  Symbols<br> +  FunctionalObjects are used throughout.   +  --> +<h2 id=idea>Idea for a Checker</h2> +  Here are several questions which you should consider when evaluating your  +  checker idea: +  <ul> +    <li>Can the check be effectively implemented without path-sensitive  +    analysis? See <a href="#ast">AST Visitors</a>.</li> +     +    <li>How high is the false positive rate going to be? Looking at the occurrences  +    of the issue you want to write a checker for in the existing code bases might  +    give you some ideas. </li> +     +    <li>How will the current limitations of the analysis affect the false alarm  +    rate? Currently, the analyzer only reasons about one procedure at a time (no  +    inter-procedural analysis). Also, it uses a simple range-tracking-based  +    solver to model symbolic execution.</li> +     +    <li>Consult the <a +    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>  +    to get some ideas for new checkers and consider starting with improving/fixing   +    bugs in the existing checkers.</li> +  </ul> + +<h2 id=registration>Checker Registration</h2> +  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt>  +  folder. Follow the steps below to register a new checker with the analyzer. 
+<ol> +  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt> +<pre class="code_example"> +using namespace clang; +using namespace ento; + +namespace { +class NewChecker: public Checker< check::PreStmt<CallExpr> > { +public: +  void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {} +}; +} +void ento::registerNewChecker(CheckerManager &mgr) { +  mgr.registerChecker<NewChecker>(); +} +</pre> + +<li>Pick the package name for your checker and add the registration code to  +<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should  +first be developed as experimental. If our new checker performs  +security-related checks, we should add the following lines under the  +<tt>SecurityExperimental</tt> package:  +<pre class="code_example"> +let ParentPackage = SecurityExperimental in { +... +def NewChecker : Checker<"NewChecker">, +  HelpText<"This text should give a short description of the checks performed.">, +  DescFile<"NewChecker.cpp">; +... +} // end "security.experimental" +</pre> + +<li>Make the source code file visible to CMake by adding it to  +<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>. + +<li>Compile and see your checker in the list of available checkers by running:<br> +<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt> +</ol> +    + +<h2 id=skeleton>Checker Skeleton</h2> +  There are two main decisions you need to make: +  <ul> +    <li> Which events the checker should be tracking.  +    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>  +    for the list of available checker callbacks.</li> +    <li> What data you want to store as part of the checker-specific program  +    state. Try to minimize the checker state as much as possible. </li> +  </ul> + +<h2 id=bugs>Bug Reports</h2> + +<h2 id=ast>AST Visitors</h2> +  Some checks might not require path-sensitivity to be effective. 
A simple AST walk  +  might be sufficient. If that is the case, consider implementing a Clang  +  compiler warning. On the other hand, a check might not be acceptable as a compiler  +  warning; for example, because of a relatively high false positive rate. In this  +  situation, AST callbacks <tt><b>checkASTDecl</b></tt> and  +  <tt><b>checkASTCodeBody</b></tt> are your best friends.  + +<h2 id=testing>Testing</h2> +  Every patch should be well tested with Clang regression tests. The checker tests  +  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,  +  execute the following from the <tt>clang</tt> build directory: +    <pre class="code"> +    $ <b>TESTDIRS=Analysis make test</b> +    </pre> + +<h2 id=commands>Useful Commands/Debugging Hints</h2> +<ul> +<li> +While investigating a checker-related issue, instruct the analyzer to only  +execute a single checker: +<br><tt> +$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b> +</tt> +</li> +<li> +To dump the AST: +<br><tt> +$ <b>clang -cc1 -ast-dump test.c</b> +</tt> +</li> +<li> +To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers: +<br><tt> +$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b> +</tt>  +</li> +<li> +To see all available debug checkers: +<br><tt> +$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b> +</tt> +</li> +<li> +To see which function is failing while processing a large file, use the  +<tt>-analyzer-display-progress</tt> option. +</li> +<li> +While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>  +instead of <tt>clang --analyze</tt>, as the latter would call the compiler  +in a separate process. 
+</li> +<li> +To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while  +debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and  +execute: +<br><tt>  +(gdb) <b>p ViewGraph(0)</b> +</tt> +</li> +<li> +To see the <tt>ProgramState</tt> while debugging, use the following command:  +<br><tt> +(gdb) <b>p State->dump()</b> +</tt>  +</li> +<li> +To see a <tt>clang::Expr</tt> while debugging, use the following command. If you  +pass in a SourceManager object, it will also dump the corresponding line in the  +source code. +<br><tt> +(gdb) <b>p E->dump()</b> +</tt>  +</li> +<li> +To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to: +<br><tt> +(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b> +(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b> +</tt> +</li> +</ul> + +</div> +</div> +</body> +</html> | 
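The "Representing Values" walkthrough in the manual added by this commit (evaluating `int y = x * 2; int z = x;`) can be modeled with a small standalone sketch. It is a toy illustration, not analyzer code: `Symbol`, `Region`, `Concrete`, `Value`, `ToyState`, `symbolId`, and `walkthrough` are all hypothetical names invented here, and the real `SVal`, `SymExpr`, and `MemRegion` classes are considerably richer. In particular, the toy names the product with a fresh symbol, whereas the manual only says the result is a new symbolic expression "which we might call `$1`".

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy stand-ins only; these are NOT the real clang::ento classes.
struct Symbol   { int id; };            // abstract named value such as $0
struct Region   { std::string name; };  // unique name for a block of memory
struct Concrete { long value; };        // concrete integer such as 2

// An SVal-like discriminated union over the three kinds of value.
using Value = std::variant<Symbol, Region, Concrete>;

class ToyState {
  std::map<std::string, Value> store_;  // region name -> value currently bound
  int nextSymbol_ = 0;

public:
  // Hand out a fresh, uniquely numbered symbol.
  Symbol freshSymbol() { return Symbol{nextSymbol_++}; }

  // Bind a value to a region (what an assignment does to the store).
  void bind(const Region &r, const Value &v) {
    store_.insert_or_assign(r.name, v);
  }

  // Lvalue-to-rvalue conversion: read the value currently bound to a region.
  Value load(const Region &r) const { return store_.at(r.name); }

  // Fold two concrete operands; otherwise name the result with a new symbol.
  Value multiply(const Value &lhs, const Value &rhs) {
    const Concrete *a = std::get_if<Concrete>(&lhs);
    const Concrete *b = std::get_if<Concrete>(&rhs);
    if (a && b) return Concrete{a->value * b->value};
    return freshSymbol();
  }
};

// Returns the symbol number held by v, or -1 if v is not a symbol.
int symbolId(const Value &v) {
  const Symbol *s = std::get_if<Symbol>(&v);
  return s ? s->id : -1;
}

struct Walkthrough { int x, y, z; };  // symbol numbers bound to x, y and z

// Replays `int y = x * 2; int z = x;` for a function parameter x.
Walkthrough walkthrough() {
  ToyState st;
  Region x{"x"}, y{"y"}, z{"z"};
  st.bind(x, st.freshSymbol());                          // x holds $0 on entry
  Value product = st.multiply(st.load(x), Concrete{2});  // x * 2 becomes $1
  st.bind(y, product);                                   // y = x * 2
  st.bind(z, st.load(x));                                // z = x
  // Two distinct values may reference the same underlying symbol.
  assert(symbolId(st.load(z)) == symbolId(st.load(x)));
  return {symbolId(st.load(x)), symbolId(st.load(y)), symbolId(st.load(z))};
}
```

Calling `walkthrough()` from a `main` function replays the two statements: `x` and `z` end up bound to the same symbol, while `y` is bound to a different one, mirroring the `$0` and `$1` naming used in the manual's walkthrough.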
