Diffstat (limited to 'clang/www/analyzer/checker_dev_manual.html')
-rw-r--r-- | clang/www/analyzer/checker_dev_manual.html | 346 |
1 files changed, 346 insertions, 0 deletions
diff --git a/clang/www/analyzer/checker_dev_manual.html b/clang/www/analyzer/checker_dev_manual.html
new file mode 100644
index 0000000..fc9adf3
--- /dev/null
+++ b/clang/www/analyzer/checker_dev_manual.html
@@ -0,0 +1,346 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+          "http://www.w3.org/TR/html4/strict.dtd">
+<html>
+<head>
+  <title>Checker Developer Manual</title>
+  <link type="text/css" rel="stylesheet" href="menu.css">
+  <link type="text/css" rel="stylesheet" href="content.css">
+  <script type="text/javascript" src="scripts/menu.js"></script>
+</head>
+<body>
+
+<div id="page">
+<!--#include virtual="menu.html.incl"-->
+
+<div id="content">
+
+<h1 style="color:red">This Page Is Under Construction</h1>
+
+<h1>Checker Developer Manual</h1>
+
+<p>The static analyzer engine performs symbolic execution of the program and
+relies on a set of checkers to implement the logic for detecting and
+constructing bug reports. This page provides hints and guidelines for anyone
+who is interested in implementing their own checker. The static analyzer is a
+part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
+and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
+for general developer guidelines and information.
+</p>
+
+    <ul>
+      <li><a href="#start">Getting Started</a></li>
+      <li><a href="#analyzer">Analyzer Overview</a></li>
+      <li><a href="#idea">Idea for a Checker</a></li>
+      <li><a href="#registration">Checker Registration</a></li>
+      <li><a href="#skeleton">Checker Skeleton</a></li>
+      <li><a href="#node">Exploded Node</a></li>
+      <li><a href="#bugs">Bug Reports</a></li>
+      <li><a href="#ast">AST Visitors</a></li>
+      <li><a href="#testing">Testing</a></li>
+      <li><a href="#commands">Useful Commands</a></li>
+    </ul>
+
+<h2 id=start>Getting Started</h2>
+  <ul>
+    <li>To check out the source code and build the project, follow steps 1-4 of
+    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
+  page.</li>
+
+    <li>The analyzer source code is located under the Clang source tree:
+    <br><tt>
+    $ <b>cd llvm/tools/clang</b>
+    </tt>
+    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
+     <tt>test/Analysis</tt>.</li>
+
+    <li>The analyzer regression tests can be executed from Clang's build
+    directory:
+    <br><tt>
+    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
+    </tt></li>
+
+    <li>Analyze a file with the specified checker:
+    <br><tt>
+    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
+    </tt></li>
+
+    <li>List the available checkers:
+    <br><tt>
+    $ <b>clang -cc1 -analyzer-checker-help</b>
+    </tt></li>
+
+    <li>See the analyzer help for different output formats, fine tuning, and
+    debug options:
+    <br><tt>
+    $ <b>clang -cc1 -help | grep "analyzer"</b>
+    </tt></li>
+
+  </ul>
+
+<h2 id=analyzer>Static Analyzer Overview</h2>
+  The analyzer core performs symbolic execution of the given program. All the
+  input values are represented with symbolic values; further, the engine deduces
+  the values of all the expressions in the program based on the input symbols
+  and the path. The execution is path sensitive and every possible path through
+  the program is explored.
+  The explored execution traces are represented with an
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
+  Each node of the graph is an
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
+  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
+  <p>
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
+  represents the corresponding location in the program (or the CFG).
+  <tt>ProgramPoint</tt> is also used to record additional information on
+  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
+  kind means that the state is the result of purging dead symbols, the
+  analyzer's equivalent of garbage collection.
+  <p>
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
+  represents the abstract state of the program. It consists of:
+  <ul>
+    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
+    values
+    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
+    <li><tt>GenericDataMap</tt> - constraints on symbolic values and checker-defined data
+  </ul>
+
+  <h3>Interaction with Checkers</h3>
+  Checkers are not merely passive receivers of the analyzer core changes; they
+  actively participate in the <tt>ProgramState</tt> construction through the
+  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part
+  of the state. Each time the analyzer engine explores a new statement, it
+  notifies each checker registered to listen for that statement, giving it an
+  opportunity to either report a bug or modify the state. (As a rule of thumb,
+  the checker itself should be stateless.) The checkers are called one after another
+  in the predefined order; thus, calling all the checkers adds a chain to the
+  <tt>ExplodedGraph</tt>.
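As a rough illustration of the relationships described above (a node pairs a program point with an immutable state, checkers keep their data in the state rather than in themselves, and calling the checkers in order extends the graph by a chain of nodes), here is a small self-contained C++ toy model. It is a sketch under stated assumptions, not the analyzer's implementation: `ToyState`, `ToyNode`, `ToyGraph`, `ToyChecker`, `countingChecker`, and `runCheckers` are hypothetical names, and the real `ProgramState`, `ExplodedNode`, and checker interfaces look different. For simplicity, the model adds one node per checker call even when a checker leaves the state unchanged.

```cpp
// Hypothetical toy model; none of these types are part of the Clang API.
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for ProgramState. States are shared as pointers to const, so a
// checker "modifies" the state only by producing a new one.
struct ToyState {
  std::map<std::string, std::string> env;    // expression -> value (Environment)
  std::map<std::string, std::string> store;  // region -> value (Store)
  std::map<std::string, int> gdm;            // checker-defined data (GenericDataMap)
};
using StateRef = std::shared_ptr<const ToyState>;

// Stand-in for ExplodedNode: a program point plus a state.
struct ToyNode {
  std::string point;    // stand-in for ProgramPoint
  StateRef state;
  const ToyNode *pred;  // predecessor node, nullptr for the root
};

// A checker is modeled as a stateless function from state to state.
using ToyChecker = std::function<StateRef(const StateRef &)>;

class ToyGraph {
public:
  const ToyNode *addNode(std::string point, StateRef state,
                         const ToyNode *pred) {
    nodes_.push_back(std::make_unique<ToyNode>(
        ToyNode{std::move(point), std::move(state), pred}));
    return nodes_.back().get();
  }
  std::size_t size() const { return nodes_.size(); }

private:
  std::vector<std::unique_ptr<ToyNode>> nodes_;
};

// Hypothetical checker that remembers data in the state, not in itself.
StateRef countingChecker(const StateRef &state) {
  auto next = std::make_shared<ToyState>(*state);  // copy; never mutate input
  next->gdm["count"] += 1;
  return next;
}

// Calls the checkers one after another in a fixed order. Each call appends
// one node whose predecessor is the previous node, so together the checkers
// add a chain to the graph. Returns the last node of the chain.
const ToyNode *runCheckers(ToyGraph &graph, const ToyNode *pred,
                           const std::string &point,
                           const std::vector<ToyChecker> &checkers) {
  assert(pred != nullptr && "chain must start from an existing node");
  const ToyNode *current = pred;
  for (const ToyChecker &check : checkers)
    current = graph.addNode(point, check(current->state), current);
  return current;
}
```

Running a pass-through checker followed by `countingChecker` from a root node yields a three-node chain; the original state stays untouched, and the new count lives only in the state of the last node, which mirrors the advice that the checker itself should be stateless.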
+
+  <h3>Representing Values</h3>
+  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
+  objects are used to represent the semantic evaluation of expressions. They can
+  represent things like concrete integers, symbolic values, or memory locations
+  (which are memory regions). They are a discriminated union of "values",
+  symbolic and otherwise.
+  <p>
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
+  is meant to represent an abstract, but named, symbolic value.
+  Symbolic values can have constraints associated with them. A symbol represents
+  an actual (immutable) value. We might not know what its specific value is, but
+  we can associate constraints with that value as we analyze a path.
+  <p>
+
+  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
+  It is used to provide a lexicon of how to describe abstract memory. Regions can
+  layer on top of other regions, providing a layered approach to representing memory.
+  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
+  while a <tt>FieldRegion</tt>, which is a subregion of the <tt>VarRegion</tt>, could
+  be used to represent the memory associated with a specific field of that object.
+  So how do we represent symbolic memory regions? That's what <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
+  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
+  symbol is unique and has a unique name, that symbol names the region.
+  <p>
+  Let's see how the analyzer processes the expressions in the following example:
+  <p>
+  <pre class="code_example">
+  int foo(int x) {
+    int y = x * 2;
+    int z = x;
+    ...
+  }
+  </pre>
+  <p>
+Let's look at how <tt>x*2</tt> gets evaluated.
+When <tt>x</tt> is evaluated,
+we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
+this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
+Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
+which references the value <b>currently bound</b> to <tt>x</tt>. That value is
+symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
+Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
+and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
+we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
+and create a new <tt>SVal</tt> that represents their multiplication (which in
+this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
+evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
+and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
+to the <tt>MemRegion</tt> in the symbolic store.
+<br>
+The second line is similar. When we evaluate <tt>x</tt> again, we do the same
+dance and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
+might reference the same underlying value.
+
+<p>
+To summarize, MemRegions are unique names for blocks of memory. Symbols are
+unique names for abstract symbolic values. Some MemRegions represent abstract
+symbolic chunks of memory, and thus are also based on symbols. SVals are just
+references to values, and can reference either MemRegions, Symbols, or concrete
+values (e.g., the number 1).
+
+  <!--
+  TODO: Add a picture.
+  <br>
+  Symbols<br>
+  FunctionalObjects are used throughout.
+  -->
+<h2 id=idea>Idea for a Checker</h2>
+  Here are several questions which you should consider when evaluating your
+  checker idea:
+  <ul>
+    <li>Can the check be effectively implemented without path-sensitive
+    analysis?
+    See <a href="#ast">AST Visitors</a>.</li>
+
+    <li>How high is the false positive rate going to be? Looking at the occurrences
+    of the issue you want to write a checker for in the existing code bases might
+    give you some ideas.</li>
+
+    <li>How will the current limitations of the analysis affect the false alarm
+    rate? Currently, the analyzer only reasons about one procedure at a time (no
+    inter-procedural analysis). Also, it uses a simple range-tracking-based
+    solver to model symbolic execution.</li>
+
+    <li>Consult the <a
+    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
+    to get some ideas for new checkers and consider starting with improving/fixing
+    bugs in the existing checkers.</li>
+  </ul>
+
+<h2 id=registration>Checker Registration</h2>
+  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt>
+  folder. Follow the steps below to register a new checker with the analyzer.
+<ol>
+  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
+<pre class="code_example">
+#include "ClangSACheckers.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+// Called before each CallExpr is evaluated; the empty body is a placeholder
+// for the checker's logic.
+class NewChecker: public Checker&lt; check::PreStmt&lt;CallExpr&gt; &gt; {
+public:
+  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
+};
+} // end anonymous namespace
+
+void ento::registerNewChecker(CheckerManager &amp;mgr) {
+  mgr.registerChecker&lt;NewChecker&gt;();
+}
+</pre>
+
+<li>Pick the package name for your checker and add the registration code to
+<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should
+first be developed as experimental. Suppose our new checker performs security-related
+checks; then we should add the following lines under the
+<tt>SecurityExperimental</tt> package:
+<pre class="code_example">
+let ParentPackage = SecurityExperimental in {
+...
+def NewChecker : Checker&lt;"NewChecker"&gt;,
+  HelpText&lt;"This text should give a short description of the checks performed."&gt;,
+  DescFile&lt;"NewChecker.cpp"&gt;;
+...
+} // end "security.experimental"
+</pre>
+
+<li>Make the source code file visible to CMake by adding it to
+<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
+
+<li>Compile and see your checker in the list of available checkers by running:<br>
+<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
+</ol>
+
+
+<h2 id=skeleton>Checker Skeleton</h2>
+  There are two main decisions you need to make:
+  <ul>
+    <li> Which events the checker should be tracking.
+    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>
+    for the list of available checker callbacks.</li>
+    <li> What data you want to store as part of the checker-specific program
+    state. Try to minimize the checker state as much as possible. </li>
+  </ul>
+
+<h2 id=bugs>Bug Reports</h2>
+
+<h2 id=ast>AST Visitors</h2>
+  Some checks might not require path-sensitivity to be effective. A simple AST walk
+  might be sufficient. If that is the case, consider implementing a Clang
+  compiler warning. On the other hand, a check might not be acceptable as a compiler
+  warning; for example, because of a relatively high false positive rate. In this
+  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
+  <tt><b>checkASTCodeBody</b></tt> are your best friends.
+
+<h2 id=testing>Testing</h2>
+  Every patch should be well tested with Clang regression tests. The checker tests
+  live in the <tt>clang/test/Analysis</tt> folder.
+  To run all of the analyzer tests,
+  execute the following from the <tt>clang</tt> build directory:
+    <pre class="code">
+    $ <b>TESTDIRS=Analysis make test</b>
+    </pre>
+
+<h2 id=commands>Useful Commands/Debugging Hints</h2>
+<ul>
+<li>
+While investigating a checker-related issue, instruct the analyzer to only
+execute a single checker:
+<br><tt>
+$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
+</tt>
+</li>
+<li>
+To dump the AST:
+<br><tt>
+$ <b>clang -cc1 -ast-dump test.c</b>
+</tt>
+</li>
+<li>
+To view/dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
+<br><tt>
+$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
+</tt>
+</li>
+<li>
+To see all available debug checkers:
+<br><tt>
+$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
+</tt>
+</li>
+<li>
+To see which function is failing while processing a large file, use the
+<tt>-analyzer-display-progress</tt> option.
+</li>
+<li>
+While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
+instead of <tt>clang --analyze</tt>, as the latter would call the compiler
+in a separate process.
+</li>
+<li>
+To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
+debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
+execute:
+<br><tt>
+(gdb) <b>p ViewGraph(0)</b>
+</tt>
+</li>
+<li>
+To see the <tt>ProgramState</tt> while debugging, use the following command.
+<br><tt>
+(gdb) <b>p State->dump()</b>
+</tt>
+</li>
+<li>
+To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
+pass in a SourceManager object, it will also dump the corresponding line in the
+source code.
+<br><tt>
+(gdb) <b>p E->dump()</b>
+</tt>
+</li>
+<li>
+To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
+<br><tt>
+(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
+<br>(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
+</tt>
+</li>
+</ul>
+
+</div>
+</div>
+</body>
+</html>