summaryrefslogtreecommitdiff
path: root/clang/www/analyzer/checker_dev_manual.html
diff options
context:
space:
mode:
Diffstat (limited to 'clang/www/analyzer/checker_dev_manual.html')
-rw-r--r--clang/www/analyzer/checker_dev_manual.html346
1 files changed, 346 insertions, 0 deletions
diff --git a/clang/www/analyzer/checker_dev_manual.html b/clang/www/analyzer/checker_dev_manual.html
new file mode 100644
index 0000000..fc9adf3
--- /dev/null
+++ b/clang/www/analyzer/checker_dev_manual.html
@@ -0,0 +1,346 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+ "http://www.w3.org/TR/html4/strict.dtd">
+<html>
+<head>
+ <title>Checker Developer Manual</title>
+ <link type="text/css" rel="stylesheet" href="menu.css">
+ <link type="text/css" rel="stylesheet" href="content.css">
+ <script type="text/javascript" src="scripts/menu.js"></script>
+</head>
+<body>
+
+<div id="page">
+<!--#include virtual="menu.html.incl"-->
+
+<div id="content">
+
+<h1 style="color:red">This Page Is Under Construction</h1>
+
+<h1>Checker Developer Manual</h1>
+
+<p>The static analyzer engine performs symbolic execution of the program and
+relies on a set of checkers to implement the logic for detecting and
+constructing bug reports. This page provides hints and guidelines for anyone
+who is interested in implementing their own checker. The static analyzer is a
+part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
+and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
+for general developer guidelines and information. </p>
+
+ <ul>
+ <li><a href="#start">Getting Started</a></li>
+ <li><a href="#analyzer">Analyzer Overview</a></li>
+ <li><a href="#idea">Idea for a Checker</a></li>
+ <li><a href="#registration">Checker Registration</a></li>
+ <li><a href="#skeleton">Checker Skeleton</a></li>
+ <li><a href="#node">Exploded Node</a></li>
+ <li><a href="#bugs">Bug Reports</a></li>
+ <li><a href="#ast">AST Visitors</a></li>
+ <li><a href="#testing">Testing</a></li>
+ <li><a href="#commands">Useful Commands</a></li>
+ </ul>
+
+<h2 id=start>Getting Started</h2>
+ <ul>
+ <li>To check out the source code and build the project, follow steps 1-4 of
+ the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
+ page.</li>
+
+ <li>The analyzer source code is located under the Clang source tree:
+ <br><tt>
+ $ <b>cd llvm/tools/clang</b>
+ </tt>
+ <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
+ <tt>test/Analysis</tt>.</li>
+
+ <li>The analyzer regression tests can be executed from the Clang's build
+ directory:
+ <br><tt>
+ $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
+ </tt></li>
+
+ <li>Analyze a file with the specified checker:
+ <br><tt>
+ $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
+ </tt></li>
+
+ <li>List the available checkers:
+ <br><tt>
+ $ <b>clang -cc1 -analyzer-checker-help</b>
+ </tt></li>
+
+ <li>See the analyzer help for different output formats, fine tuning, and
+ debug options:
+ <br><tt>
+ $ <b>clang -cc1 -help | grep "analyzer"</b>
+ </tt></li>
+
+ </ul>
+
+<h2 id=analyzer>Static Analyzer Overview</h2>
+ The analyzer core performs symbolic execution of the given program. All the
+ input values are represented with symbolic values; further, the engine deduces
+ the values of all the expressions in the program based on the input symbols
+ and the path. The execution is path sensitive and every possible path through
+ the program is explored. The explored execution traces are represented with
+ <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplidedGraph</a> object.
+ Each node of the graph is
+ <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
+ which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
+ <p>
+ <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
+ represents the corresponding location in the program (or the CFG graph).
+ <tt>ProgramPoint</tt> is also used to record additional information on
+ when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt>
+ kind means that the state is the result of purging dead symbols - the
+ analyzer's equivalent of garbage collection.
+ <p>
+ <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
+ represents abstract state of the program. It consists of:
+ <ul>
+ <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
+ values
+ <li><tt>Store</tt> - a mapping from memory locations to symbolic values
+ <li><tt>GenericDataMap</tt> - constraints on symbolic values
+ </ul>
+
+ <h3>Interaction with Checkers</h3>
+ Checkers are not merely passive receivers of the analyzer core changes - they
+ actively participate in the <tt>ProgramState</tt> construction through the
+ <tt>GenericDataMap</tt> which can be used to store the checker-defined part
+ of the state. Each time the analyzer engine explores a new statement, it
+ notifies each checker registered to listen for that statement, giving it an
+ opportunity to either report a bug or modify the state. (As a rule of thumb,
+ the checker itself should be stateless.) The checkers are called one after another
+ in the predefined order; thus, calling all the checkers adds a chain to the
+ <tt>ExplodedGraph</tt>.
+
+ <h3>Representing Values</h3>
+ During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
+ objects are used to represent the semantic evaluation of expressions. They can
+ represent things like concrete integers, symbolic values, or memory locations
+ (which are memory regions). They are a discriminated union of "values",
+ symbolic and otherwise.
+ <p>
+ <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
+ is meant to represent abstract, but named, symbolic value.
+ Symbolic values can have constraints associated with them. Symbols represent
+ an actual (immutable) value. We might not know what its specific value is, but
+ we can associate constraints with that value as we analyze a path.
+ <p>
+
+ <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
+ It is used to provide a lexicon of how to describe abstract memory. Regions can
+ layer on top of other regions, providing a layered approach to representing memory.
+ For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
+ but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
+ be used to represent the memory associated with a specific field of that object.
+ So how do we represent symbolic memory regions? That's what <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
+ is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
+ symbol is unique and has a unique name; that symbol names the region.
+ <p>
+ Let's see how the analyzer processes the expressions in the following example:
+ <p>
+ <pre class="code_example">
+ int foo(int x) {
+ int y = x * 2;
+ int z = x;
+ ...
+ }
+ </pre>
+ <p>
+Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
+we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in
+this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
+Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
+which references the value <b>currently bound</b> to <tt>x</tt>. That value is
+symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
+Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
+and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
+we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
+and create a new <tt>SVal</tt> that represents their multiplication (which in
+this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
+evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
+and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
+to the <tt>MemRegion</tt> in the symbolic store.
+<br>
+The second line is similar. When we evaluate <tt>x</tt> again, we do the same
+dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt>
+might reference the same underlying values.
+
+<p>
+To summarize, MemRegions are unique names for blocks of memory. Symbols are
+unique names for abstract symbolic values. Some MemRegions represents abstract
+symbolic chunks of memory, and thus are also based on symbols. SVals are just
+references to values, and can reference either MemRegions, Symbols, or concrete
+values (e.g., the number 1).
+
+ <!--
+ TODO: Add a picture.
+ <br>
+ Symbols<br>
+ FunctionalObjects are used throughout.
+ -->
+<h2 id=idea>Idea for a Checker</h2>
+ Here are several questions which you should consider when evaluating your
+ checker idea:
+ <ul>
+ <li>Can the check be effectively implemented without path-sensitive
+ analysis? See <a href="#ast">AST Visitors</a>.</li>
+
+ <li>How high the false positive rate is going to be? Looking at the occurrences
+ of the issue you want to write a checker for in the existing code bases might
+ give you some ideas. </li>
+
+ <li>How the current limitations of the analysis will effect the false alarm
+ rate? Currently, the analyzer only reasons about one procedure at a time (no
+ inter-procedural analysis). Also, it uses a simple range tracking based
+ solver to model symbolic execution.</li>
+
+ <li>Consult the <a
+ href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
+ to get some ideas for new checkers and consider starting with improving/fixing
+ bugs in the existing checkers.</li>
+ </ul>
+
+<h2 id=registration>Checker Registration</h2>
+ All checker implementation files are located in <tt>clang/lib/StaticAnalyzer/Checkers</tt>
+ folder. Follow the steps below to register a new checker with the analyzer.
+<ol>
+ <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
+<pre class="code_example">
+using namespace clang;
+using namespace ento;
+
+namespace {
+class NewChecker: public Checker< check::PreStmt&lt;CallExpr> > {
+public:
+ void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
+}
+}
+void ento::registerNewChecker(CheckerManager &amp;mgr) {
+ mgr.registerChecker&lt;NewChecker>();
+}
+</pre>
+
+<li>Pick the package name for your checker and add the registration code to
+<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note, all checkers should
+first be developed as experimental. Suppose our new checker performs security
+related checks, then we should add the following lines under
+<tt>SecurityExperimental</tt> package:
+<pre class="code_example">
+let ParentPackage = SecurityExperimental in {
+...
+def NewChecker : Checker<"NewChecker">,
+ HelpText<"This text should give a short description of the checks performed.">,
+ DescFile<"NewChecker.cpp">;
+...
+} // end "security.experimental"
+</pre>
+
+<li>Make the source code file visible to CMake by adding it to
+<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
+
+<li>Compile and see your checker in the list of available checkers by running:<br>
+<tt><b>$clang -cc1 -analyzer-checker-help</b></tt>
+</ol>
+
+
+<h2 id=skeleton>Checker Skeleton</h2>
+ There are two main decisions you need to make:
+ <ul>
+ <li> Which events the checker should be tracking.
+ See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>
+ for the list of available checker callbacks.</li>
+ <li> What data you want to store as part of the checker-specific program
+ state. Try to minimize the checker state as much as possible. </li>
+ </ul>
+
+<h2 id=bugs>Bug Reports</h2>
+
+<h2 id=ast>AST Visitors</h2>
+ Some checks might not require path-sensitivity to be effective. Simple AST walk
+ might be sufficient. If that is the case, consider implementing a Clang
+ compiler warning. On the other hand, a check might not be acceptable as a compiler
+ warning; for example, because of a relatively high false positive rate. In this
+ situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
+ <tt><b>checkASTCodeBody</b></tt> are your best friends.
+
+<h2 id=testing>Testing</h2>
+ Every patch should be well tested with Clang regression tests. The checker tests
+ live in <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
+ execute the following from the <tt>clang</tt> build directory:
+ <pre class="code">
+ $ <b>TESTDIRS=Analysis make test</b>
+ </pre>
+
+<h2 id=commands>Useful Commands/Debugging Hints</h2>
+<ul>
+<li>
+While investigating a checker-related issue, instruct the analyzer to only
+execute a single checker:
+<br><tt>
+$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
+</tt>
+</li>
+<li>
+To dump AST:
+<br><tt>
+$ <b>clang -cc1 -ast-dump test.c</b>
+</tt>
+</li>
+<li>
+To view/dump CFG use <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
+<br><tt>
+$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
+</tt>
+</li>
+<li>
+To see all available debug checkers:
+<br><tt>
+$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
+</tt>
+</li>
+<li>
+To see which function is failing while processing a large file use
+<tt>-analyzer-display-progress</tt> option.
+</li>
+<li>
+While debugging execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
+instead of <tt>clang --analyze</tt>, as the later would call the compiler
+in a separate process.
+</li>
+<li>
+To view <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
+debugging, goto a frame that has <tt>clang::ento::ExprEngine</tt> object and
+execute:
+<br><tt>
+(gdb) <b>p ViewGraph(0)</b>
+</tt>
+</li>
+<li>
+To see the <tt>ProgramState</tt> while debugging use the following command.
+<br><tt>
+(gdb) <b>p State->dump()</b>
+</tt>
+</li>
+<li>
+To see <tt>clang::Expr</tt> while debugging use the following command. If you
+pass in a SourceManager object, it will also dump the corresponding line in the
+source code.
+<br><tt>
+(gdb) <b>p E->dump()</b>
+</tt>
+</li>
+<li>
+To dump AST of a method that the current <tt>ExplodedNode</tt> belongs to:
+<br><tt>
+(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
+(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
+</tt>
+</li>
+</ul>
+
+</div>
+</div>
+</body>
+</html>