
This Page Is Under Construction


Checker Developer Manual


The static analyzer engine performs symbolic execution of the program and relies on a set of checkers to implement the logic for detecting and constructing bug reports. This page provides hints and guidelines for anyone who is interested in implementing their own checker. The static analyzer is a part of the Clang project, so consult Hacking on Clang and LLVM Programmer's Manual for general developer guidelines and information.


Getting Started
Analyzer Overview
Idea for a Checker
Checker Registration
Checker Skeleton
Exploded Node
Bug Reports
AST Visitors
Testing
Useful Commands


Getting Started

To check out the source code and build the project, follow steps 1-4 of the Clang Getting Started page.
The analyzer source code is located under the Clang source tree:
    $ cd llvm/tools/clang
See: include/clang/StaticAnalyzer, lib/StaticAnalyzer, test/Analysis.
The analyzer regression tests can be executed from Clang's build directory:
    $ cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test
Analyze a file with the specified checker:
    $ clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c
List the available checkers:
    $ clang -cc1 -analyzer-checker-help
See the analyzer help for different output formats, fine tuning, and debug options:
    $ clang -cc1 -help | grep "analyzer"


Static Analyzer Overview

The analyzer core performs symbolic execution of the given program. All the input values are represented with symbolic values; further, the engine deduces the values of all the expressions in the program based on the input symbols and the path. The execution is path sensitive and every possible path through the program is explored. The explored execution traces are represented by an ExplodedGraph object. Each node of the graph is an ExplodedNode, which consists of a ProgramPoint and a ProgramState.

ProgramPoint represents the corresponding location in the program (or the CFG). ProgramPoint is also used to record additional information on when/how the state was added. For example, the PostPurgeDeadSymbolsKind kind means that the state is the result of purging dead symbols - the analyzer's equivalent of garbage collection.

ProgramState represents the abstract state of the program. It consists of:

Environment - a mapping from source code expressions to symbolic values
Store - a mapping from memory locations to symbolic values
GenericDataMap - constraints on symbolic values
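
The relationship between these pieces can be sketched with a small standalone model. The types below (ToyProgramPoint, ToyProgramState, ToyExplodedNode, ToyExplodedGraph) are hypothetical stand-ins written for illustration, not the analyzer's classes, and the string-keyed maps are an assumed simplification of the real data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for the analyzer's concepts; none of these are Clang classes.
struct ToyProgramPoint {
  int StmtId;       // which statement (location in the CFG) this point refers to
  std::string Kind; // how the state was reached, e.g. "PostStmt"
};

struct ToyProgramState {
  std::map<std::string, std::string> Environment; // expression -> symbolic value
  std::map<std::string, std::string> Store;       // memory location -> symbolic value
  std::map<std::string, std::string> GenericData; // constraints and other per-state data
};

// A node pairs a program point with the state that holds at that point.
struct ToyExplodedNode {
  ToyProgramPoint Point;
  ToyProgramState State;
  std::vector<const ToyExplodedNode *> Preds; // nodes this one was derived from
};

class ToyExplodedGraph {
  std::vector<std::unique_ptr<ToyExplodedNode>> Nodes;

public:
  // Adds a node and links it to its predecessor, if one is given.
  const ToyExplodedNode *addNode(ToyProgramPoint Point, ToyProgramState State,
                                 const ToyExplodedNode *Pred = nullptr) {
    assert(!Point.Kind.empty() && "every point records how it was reached");
    auto Node = std::make_unique<ToyExplodedNode>();
    Node->Point = std::move(Point);
    Node->State = std::move(State);
    if (Pred)
      Node->Preds.push_back(Pred);
    Nodes.push_back(std::move(Node));
    return Nodes.back().get();
  }

  std::size_t size() const { return Nodes.size(); }
};
```

In this model, exploring a branch means adding two successor nodes that share one predecessor, each carrying its own copy of the state with a different constraint recorded. The predecessor's state is never modified.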


Interaction with Checkers

Checkers are not merely passive receivers of the analyzer core changes - they actively participate in the ProgramState construction through the GenericDataMap, which can be used to store the checker-defined part of the state. Each time the analyzer engine explores a new statement, it notifies each checker registered to listen for that statement, giving it an opportunity to either report a bug or modify the state. (As a rule of thumb, the checker itself should be stateless.) The checkers are called one after another in the predefined order; thus, calling all the checkers adds a chain to the ExplodedGraph.
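
The idea of stateless checkers threading their data through the program state can be illustrated with a standalone sketch. Everything here is hypothetical (ToyState, ToyChecker, runCheckers, and the two example checkers), and statements are modeled as plain strings; the real engine dispatches on AST nodes through the callbacks listed in CheckerDocumentation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Checker-defined data lives in the state, not in the checker object.
using ToyState = std::map<std::string, int>;

struct ToyChecker {
  std::string Name;
  // Receives the current state and returns a (possibly updated) copy.
  std::function<ToyState(const ToyState &, const std::string &)> OnStmt;
};

// Calls the checkers in order; each result becomes the next link in the chain.
std::vector<ToyState> runCheckers(const std::vector<ToyChecker> &Checkers,
                                  const ToyState &Initial,
                                  const std::string &Stmt) {
  std::vector<ToyState> Chain{Initial};
  for (const ToyChecker &C : Checkers) {
    assert(C.OnStmt && "checker has no callback");
    Chain.push_back(C.OnStmt(Chain.back(), Stmt));
  }
  return Chain;
}

// Example checker: counts call statements seen along the path.
ToyChecker makeCallCounter() {
  return ToyChecker{"CallCounter",
                    [](const ToyState &State, const std::string &Stmt) {
                      ToyState Next = State;
                      if (Stmt.rfind("call", 0) == 0) // Stmt starts with "call"
                        ++Next["calls"];
                      return Next;
                    }};
}

// Example checker: remembers whether a lock is currently held.
ToyChecker makeLockTracker() {
  return ToyChecker{"LockTracker",
                    [](const ToyState &State, const std::string &Stmt) {
                      ToyState Next = State;
                      if (Stmt == "call lock")
                        Next["locked"] = 1;
                      else if (Stmt == "call unlock")
                        Next["locked"] = 0;
                      return Next;
                    }};
}
```

Running both checkers on the statement "call lock" from an empty state yields a chain of three states: the initial one, one with the call counted, and one that additionally records the lock. Because the checkers hold no data of their own, the same checker objects can be reused on every path.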

Representing Values

During symbolic execution, SVal objects are used to represent the semantic evaluation of expressions. They can represent things like concrete integers, symbolic values, or memory locations (which are memory regions). They are a discriminated union of "values", symbolic and otherwise.

SymExpr (symbol) is meant to represent an abstract, but named, symbolic value. Symbolic values can have constraints associated with them. Symbols represent an actual (immutable) value. We might not know what its specific value is, but we can associate constraints with that value as we analyze a path.

MemRegion is similar to a symbol. It is used to provide a lexicon of how to describe abstract memory. Regions can layer on top of other regions, providing a layered approach to representing memory. For example, a struct object on the stack might be represented by a VarRegion, but a FieldRegion which is a subregion of the VarRegion could be used to represent the memory associated with a specific field of that object. So how do we represent symbolic memory regions? That's what SymbolicRegion is for. It is a MemRegion that has an associated symbol. Since the symbol is unique and has a unique name, that symbol names the region.

Let's see how the analyzer processes the expressions in the following example:

    int foo(int x) {
      int y = x * 2;
      int z = x;
      ...
    }

Let's look at how x*2 gets evaluated. When x is evaluated, we first construct an SVal that represents the lvalue of x; in this case it is an SVal that references the MemRegion for x. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new SVal, which references the value currently bound to x. That value is symbolic; it's whatever x was bound to at the start of the function. Let's call that symbol $0. Similarly, we evaluate the expression for 2, and get an SVal that references the concrete number 2. When we evaluate x*2, we take the two SVals of the subexpressions, and create a new SVal that represents their multiplication (which in this case is a new symbolic expression, which we might call $1). When we evaluate the assignment to y, we again compute its lvalue (a MemRegion), and then bind the SVal for the RHS (which references the symbolic value $1) to the MemRegion in the symbolic store.

The second line is similar. When we evaluate x again, we do the same dance, and create an SVal that references the symbol $0. Note that two SVals might reference the same underlying value.

To summarize, MemRegions are unique names for blocks of memory. Symbols are unique names for abstract symbolic values. Some MemRegions represent abstract symbolic chunks of memory, and thus are also based on symbols. SVals are just references to values, and can reference either MemRegions, Symbols, or concrete values (e.g., the number 1).
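
The walk-through of foo can be replayed with a toy model. This is a sketch under simplifying assumptions: ToySVal, ToyStore, and evalMul are hypothetical names, std::variant stands in for the discriminated union, and every non-constant product simply gets a fresh symbol rather than a structured symbolic expression.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>

// The three kinds of value a toy SVal can reference.
struct ToyConcreteInt { long Value; };
struct ToySymbol { std::string Name; }; // e.g. "$0"
struct ToyRegion { std::string Name; }; // e.g. the region of variable "x"
using ToySVal = std::variant<ToyConcreteInt, ToySymbol, ToyRegion>;

class ToyStore {
  std::map<std::string, ToySVal> Bindings; // region name -> bound value
  int NextSymbol = 0;

public:
  // Hands out uniquely named symbols: $0, $1, ...
  ToySymbol freshSymbol() {
    return ToySymbol{"$" + std::to_string(NextSymbol++)};
  }

  // Assignment: bind a value to the memory named by the region.
  void bind(const ToyRegion &R, ToySVal V) {
    Bindings.insert_or_assign(R.Name, std::move(V));
  }

  // Lvalue-to-rvalue conversion: read what is currently bound to the region.
  ToySVal load(const ToyRegion &R) const {
    assert(Bindings.count(R.Name) == 1 && "region has no binding");
    return Bindings.at(R.Name);
  }
};

// Multiplies two rvalues: constants are folded, anything else yields a
// fresh symbol standing for the product.
ToySVal evalMul(ToyStore &S, const ToySVal &L, const ToySVal &R) {
  const auto *LC = std::get_if<ToyConcreteInt>(&L);
  const auto *RC = std::get_if<ToyConcreteInt>(&R);
  if (LC && RC)
    return ToyConcreteInt{LC->Value * RC->Value};
  return S.freshSymbol();
}
```

Binding a fresh symbol to x, evaluating x * 2 into y, and then copying x into z leaves x and z both referencing $0 while y references $1, which mirrors the description above.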

Idea for a Checker

Here are several questions which you should consider when evaluating your checker idea:

Can the check be effectively implemented without path-sensitive analysis? See AST Visitors.
How high is the false positive rate going to be? Looking at occurrences of the issue you want to write a checker for in existing code bases might give you some ideas.
How will the current limitations of the analysis affect the false alarm rate? Currently, the analyzer only reasons about one procedure at a time (no inter-procedural analysis). Also, it uses a simple range-tracking-based solver to model symbolic execution.
Consult the Bugzilla database to get some ideas for new checkers and consider starting with improving/fixing bugs in the existing checkers.
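
To get a feel for what a range-tracking solver can and cannot decide, here is a deliberately small model. It is an assumption-laden sketch, not the analyzer's constraint manager: ToyRange and ToyRangeConstraints are hypothetical names, each symbol has a single closed interval, and only comparisons against constants are supported.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <optional>
#include <string>

// Closed interval of values a symbol may still take on the current path.
struct ToyRange {
  long Lo = LONG_MIN;
  long Hi = LONG_MAX;
};

class ToyRangeConstraints {
  std::map<std::string, ToyRange> Ranges; // symbol name -> known range

public:
  // Unconstrained symbols default to the full range.
  ToyRange get(const std::string &Sym) const {
    auto It = Ranges.find(Sym);
    return It == Ranges.end() ? ToyRange{} : It->second;
  }

  // Assume Sym >= Bound. Returns the narrowed constraints, or nullopt if the
  // assumption contradicts what is already known (an infeasible path).
  std::optional<ToyRangeConstraints> assumeGE(const std::string &Sym,
                                              long Bound) const {
    ToyRange R = get(Sym);
    assert(R.Lo <= R.Hi && "stored ranges are never empty");
    if (R.Hi < Bound)
      return std::nullopt;
    ToyRangeConstraints Next = *this;
    Next.Ranges[Sym] = ToyRange{std::max(R.Lo, Bound), R.Hi};
    return Next;
  }

  // Assume Sym <= Bound, with the same conventions as assumeGE.
  std::optional<ToyRangeConstraints> assumeLE(const std::string &Sym,
                                              long Bound) const {
    ToyRange R = get(Sym);
    assert(R.Lo <= R.Hi && "stored ranges are never empty");
    if (R.Lo > Bound)
      return std::nullopt;
    ToyRangeConstraints Next = *this;
    Next.Ranges[Sym] = ToyRange{R.Lo, std::min(R.Hi, Bound)};
    return Next;
  }
};
```

A model like this can rule out a path that assumes $0 >= 1 and later $0 <= 0, but it has no way to relate two symbols to each other (for example $0 < $1). Conditions it cannot express are one source of false positives to keep in mind when evaluating a checker idea.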


Checker Registration

All checker implementation files are located in the clang/lib/StaticAnalyzer/Checkers folder. Follow the steps below to register a new checker with the analyzer.

Create a new checker implementation file, for example ./lib/StaticAnalyzer/Checkers/NewChecker.cpp

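#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
// Minimal checker that is notified before every call expression is evaluated.
class NewChecker : public Checker< check::PreStmt<CallExpr> > {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {}
};
} // end anonymous namespace

// Called by the analyzer to register the checker. Its declaration comes from
// the checker registration header of your Clang version.
void ento::registerNewChecker(CheckerManager &mgr) {
  mgr.registerChecker<NewChecker>();
}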


Pick the package name for your checker and add the registration code to ./lib/StaticAnalyzer/Checkers/Checkers.td. Note that all checkers should first be developed as experimental. If our new checker performs security-related checks, we should add the following lines under the SecurityExperimental package:
```
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker<"NewChecker">,
  HelpText<"This text should give a short description of the checks performed.">,
  DescFile<"NewChecker.cpp">;
...
} // end "security.experimental"
```
Make the source code file visible to CMake by adding it to ./lib/StaticAnalyzer/Checkers/CMakeLists.txt.
Compile and see your checker in the list of available checkers by running:
    $ clang -cc1 -analyzer-checker-help


Checker Skeleton

There are two main decisions you need to make:

Which events the checker should be tracking. See CheckerDocumentation for the list of available checker callbacks.
What data you want to store as part of the checker-specific program state. Try to minimize the checker state as much as possible.


Bug Reports


AST Visitors

Some checks might not require path-sensitivity to be effective. A simple AST walk might be sufficient. If that is the case, consider implementing a Clang compiler warning. On the other hand, a check might not be acceptable as a compiler warning, for example because of a relatively high false positive rate. In this situation, the AST callbacks checkASTDecl and checkASTCodeBody are your best friends.
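
As an illustration of a check that needs no path information, the sketch below walks a tiny expression tree and flags divisions whose right-hand side is the literal 0. The tree type and helpers (ToyExpr, lit, var, bin, findDivByLiteralZero) are hypothetical and unrelated to Clang's AST classes; a real check would be written against the AST inside checkASTCodeBody or as a compiler warning.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A minimal expression tree: integer literals, variables, binary operators.
struct ToyExpr {
  enum Kind { IntLiteral, VarRef, BinaryOp } K = IntLiteral;
  long Value = 0;   // used by IntLiteral
  std::string Name; // variable name, or operator spelling for BinaryOp
  std::unique_ptr<ToyExpr> LHS, RHS; // used by BinaryOp
};

std::unique_ptr<ToyExpr> lit(long V) {
  auto E = std::make_unique<ToyExpr>();
  E->K = ToyExpr::IntLiteral;
  E->Value = V;
  return E;
}

std::unique_ptr<ToyExpr> var(std::string Name) {
  auto E = std::make_unique<ToyExpr>();
  E->K = ToyExpr::VarRef;
  E->Name = std::move(Name);
  return E;
}

std::unique_ptr<ToyExpr> bin(std::string Op, std::unique_ptr<ToyExpr> L,
                             std::unique_ptr<ToyExpr> R) {
  auto E = std::make_unique<ToyExpr>();
  E->K = ToyExpr::BinaryOp;
  E->Name = std::move(Op);
  E->LHS = std::move(L);
  E->RHS = std::move(R);
  return E;
}

// Purely syntactic walk: looks only at the shape of the tree, never at
// which values a variable might hold on a particular path.
void findDivByLiteralZero(const ToyExpr &E,
                          std::vector<std::string> &Warnings) {
  if (E.K != ToyExpr::BinaryOp)
    return;
  assert(E.LHS && E.RHS && "binary operators have two operands");
  if (E.Name == "/" && E.RHS->K == ToyExpr::IntLiteral && E.RHS->Value == 0)
    Warnings.push_back("division by literal zero");
  findDivByLiteralZero(*E.LHS, Warnings);
  findDivByLiteralZero(*E.RHS, Warnings);
}
```

The walk reports x / 0 but stays silent on x / y even when y is always zero at that point. Catching the second case requires the path-sensitive engine.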

Testing

Every patch should be well tested with Clang regression tests. The checker tests live in the clang/test/Analysis folder. To run all of the analyzer tests, execute the following from the clang build directory:

    $ TESTDIRS=Analysis make test


Useful Commands/Debugging Hints

While investigating a checker-related issue, instruct the analyzer to only execute a single checker:
    $ clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c
To dump the AST:
    $ clang -cc1 -ast-dump test.c
To view/dump the CFG, use the debug.ViewCFG or debug.DumpCFG checkers:
    $ clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c
To see all available debug checkers:
    $ clang -cc1 -analyzer-checker-help | grep "debug"
To see which function is failing while processing a large file, use the -analyzer-display-progress option.
While debugging, execute clang -cc1 -analyze -analyzer-checker=core instead of clang --analyze, as the latter would call the compiler in a separate process.
To view the ExplodedGraph (the state graph explored by the analyzer) while debugging, go to a frame that has a clang::ento::ExprEngine object and execute:
    (gdb) p ViewGraph(0)
To see the ProgramState while debugging, use the following command:
    (gdb) p State->dump()
To see a clang::Expr while debugging, use the following command. If you pass in a SourceManager object, it will also dump the corresponding line in the source code.
    (gdb) p E->dump()
To dump the AST of the method that the current ExplodedNode belongs to:
    (gdb) p C.getPredecessor()->getCodeDecl().getBody()->dump()
    (gdb) p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())
