From 222e2a7620e6520ffaf4fc4e69d79c18da31542e Mon Sep 17 00:00:00 2001
From: "Zancanaro; Carlo"
Date: Mon, 24 Sep 2012 09:58:17 +1000
Subject: Add the clang library to the repo (with some of my changes, too).

---
 clang/www/analyzer/checker_dev_manual.html | 346 +++++++++++++++++++++++++++++
 1 file changed, 346 insertions(+)
 create mode 100644 clang/www/analyzer/checker_dev_manual.html

diff --git a/clang/www/analyzer/checker_dev_manual.html b/clang/www/analyzer/checker_dev_manual.html
new file mode 100644
index 0000000..fc9adf3
--- /dev/null
+++ b/clang/www/analyzer/checker_dev_manual.html
@@ -0,0 +1,346 @@

This Page Is Under Construction


Checker Developer Manual


The static analyzer engine performs symbolic execution of the program and relies on a set of checkers to implement the logic for detecting and constructing bug reports. This page provides hints and guidelines for anyone who is interested in implementing their own checker. The static analyzer is a part of the Clang project, so consult Hacking on Clang and the LLVM Programmer's Manual for general developer guidelines and information.


Getting Started

  • To check out the source code and build the project, follow steps 1-4 of the Clang Getting Started page.
  • The analyzer source code is located under the Clang source tree:
      $ cd llvm/tools/clang
    See: include/clang/StaticAnalyzer, lib/StaticAnalyzer, test/Analysis.
  • The analyzer regression tests can be executed from Clang's build directory:
      $ cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test
  • Analyze a file with the specified checker:
      $ clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c
  • List the available checkers:
      $ clang -cc1 -analyzer-checker-help
  • See the analyzer help for different output formats, fine tuning, and debug options:
      $ clang -cc1 -help | grep "analyzer"

Static Analyzer Overview

The analyzer core performs symbolic execution of the given program. All input values are represented with symbolic values; the engine then deduces the values of all expressions in the program from the input symbols and the path being explored. The execution is path-sensitive, and every feasible path through the program is explored. The explored execution traces are represented by an ExplodedGraph object. Each node of the graph is an ExplodedNode, which consists of a ProgramPoint and a ProgramState.

ProgramPoint represents the corresponding location in the program (or in the CFG). ProgramPoint is also used to record additional information about when and how the state was added. For example, the PostPurgeDeadSymbolsKind kind means that the state is the result of purging dead symbols, which is the analyzer's equivalent of garbage collection.

ProgramState represents the abstract state of the program. It consists of:

  • Environment - a mapping from source code expressions to symbolic values
  • Store - a mapping from memory locations to symbolic values
  • GenericDataMap - a generic map that holds constraints on symbolic values, as well as checker-defined state
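To make this decomposition concrete, here is a minimal C++ sketch of an exploded node that pairs a program point with a program state. It is a hypothetical simplification: the ProgramPoint::Kind values, the string-keyed maps, StateRef, and bindValue are invented for illustration and are not Clang's classes or API. The one property the sketch aims to show is that states are treated as immutable: a transition builds a new state instead of editing the old one.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily simplified model of the structures described above.
// These are NOT Clang's classes; names and fields are illustrative only.

// Where in the program (CFG) a node sits, plus how the state was produced.
struct ProgramPoint {
  enum Kind { PreStmt, PostStmt, PostPurgeDeadSymbols } kind;
  std::string location;  // e.g. "foo:2", an illustrative label
};

// The three components of a program state, keyed by strings for brevity.
struct ProgramState {
  std::map<std::string, std::string> environment;  // expression -> value
  std::map<std::string, std::string> store;        // location -> value
  std::map<std::string, std::string> gdm;          // constraints, checker data
};

using StateRef = std::shared_ptr<const ProgramState>;

// A node pairs a program point with an immutable, shareable state.
struct ExplodedNode {
  ProgramPoint point;
  StateRef state;
  const ExplodedNode *pred;  // predecessor on the explored path
};

// Binds a value in the store by building a new state; the old state, and
// every node that refers to it, is left untouched.
StateRef bindValue(const StateRef &old, const std::string &location,
                   const std::string &value) {
  assert(old && "cannot transition from a null state");
  auto next = std::make_shared<ProgramState>(*old);
  next->store[location] = value;
  return next;
}
```

For instance, bindValue(s0, "y", "$1") yields a second state while s0 keeps its empty store, so a node built on s0 still sees exactly the state it was created with.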

Interaction with Checkers

Checkers are not merely passive receivers of changes from the analyzer core; they actively participate in the ProgramState construction through the GenericDataMap, which can be used to store the checker-defined part of the state. Each time the analyzer engine explores a new statement, it notifies each checker registered to listen for that statement, giving it an opportunity to either report a bug or modify the state. (As a rule of thumb, the checker itself should be stateless; per-path data belongs in the program state.) The checkers are called one after another in a predefined order; thus, calling all the checkers adds a chain of nodes to the ExplodedGraph.
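The following toy C++ model sketches this interaction, assuming a program state can be reduced to a map of checker-owned entries. The ToyChecker, CallCounter, Node, and runCheckers names are hypothetical and do not correspond to Clang's checker API. The sketch illustrates two points from the text: callbacks are const, so a checker keeps its data in the state rather than in member variables, and running the checkers in order appends one node per checker to the graph.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model of checker callbacks; not Clang's checker API.
// The checker-defined part of the state, keyed by "Checker.entry".
using GDM = std::map<std::string, int>;

struct Node {
  GDM state;
  const Node *pred;  // previous node in the chain
};

// Callbacks are const: a checker keeps no mutable members and instead
// returns the (possibly updated) state for the engine to record.
struct ToyChecker {
  virtual ~ToyChecker() = default;
  virtual GDM checkPreStmt(const std::string &stmt, const GDM &in) const = 0;
};

// Counts the calls seen on the current path, storing the count in the state.
struct CallCounter : ToyChecker {
  GDM checkPreStmt(const std::string &stmt, const GDM &in) const override {
    GDM out = in;
    if (stmt == "call")
      ++out["CallCounter.calls"];
    return out;
  }
};

// Notifies each checker in order; every callback appends one node, so the
// checkers together add a chain of nodes after `pred`.
const Node *runCheckers(const std::vector<const ToyChecker *> &checkers,
                        const std::string &stmt, const Node *pred,
                        std::vector<std::unique_ptr<Node>> &graph) {
  assert(pred && "the chain must start from an existing node");
  for (const ToyChecker *checker : checkers) {
    GDM next = checker->checkPreStmt(stmt, pred->state);
    graph.push_back(std::make_unique<Node>(Node{next, pred}));
    pred = graph.back().get();
  }
  return pred;
}
```

Because the count lives in the state rather than in CallCounter, two different paths through the program can carry different counts without the checker object itself ever changing.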

Representing Values

During symbolic execution, SVal objects are used to represent the semantic evaluation of expressions. They can represent things like concrete integers, symbolic values, or memory locations (which are memory regions). They are a discriminated union of "values", symbolic and otherwise.

SymExpr (symbol) is meant to represent an abstract, but named, symbolic value. Symbolic values can have constraints associated with them. A symbol represents an actual (immutable) value. We might not know what that specific value is, but we can associate constraints with it as we analyze a path.
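As a rough illustration of how constraints accumulate along a path, the sketch below tracks a single integer range per symbol and narrows it when a branch condition is assumed true or false. This is a deliberate simplification: Clang's range-based constraint manager keeps sets of ranges and handles many more operators, and the Range and assumeGreaterThan names are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <optional>

// Hypothetical toy constraint tracking: one closed range [lo, hi] per symbol.
// Clang's range-based solver keeps sets of ranges; one range keeps this short.
struct Range {
  long long lo, hi;
};

// Assumes "sym > k" (assumption == true) or "sym <= k" (assumption == false)
// on the current path. Returns std::nullopt if that path is infeasible.
std::optional<Range> assumeGreaterThan(Range r, long long k, bool assumption) {
  assert(r.lo <= r.hi && "input range must be non-empty");
  Range out = r;
  if (assumption)
    out.lo = std::max(r.lo, k + 1);
  else
    out.hi = std::min(r.hi, k);
  if (out.lo > out.hi)
    return std::nullopt;
  return out;
}
```

Starting from the full int range for a symbol $0, assuming $0 > 0 leaves [1, INT_MAX] on the true path and [INT_MIN, 0] on the false path; on the true path, a later test of $0 <= 0 is then known to be infeasible, so that branch is not explored.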

MemRegion is similar to a symbol. It provides a lexicon for describing abstract memory. Regions can layer on top of other regions, providing a layered approach to representing memory. For example, a struct object on the stack might be represented by a VarRegion, while a FieldRegion, which is a subregion of the VarRegion, could be used to represent the memory associated with a specific field of that object. So how do we represent symbolic memory regions? That's what SymbolicRegion is for. It is a MemRegion that has an associated symbol. Since the symbol is unique and has a unique name, that symbol names the region.
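A small C++ sketch of the layering idea, assuming only the three region kinds mentioned above. The Region struct and the varRegion, symbolicRegion, fieldRegion, describe, and baseRegion helpers are hypothetical; Clang's MemRegion hierarchy is far richer. The "SymRegion{...}" label is just a readable name chosen for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy model of layered memory regions; Clang's MemRegion
// hierarchy is far richer. Only the three kinds mentioned above appear.
struct Region {
  enum Kind { Var, Field, Symbolic } kind;
  std::string name;                     // variable, field, or symbol name
  std::shared_ptr<const Region> super;  // enclosing region; null for roots
};

using RegionRef = std::shared_ptr<const Region>;

RegionRef varRegion(const std::string &var) {
  return std::make_shared<Region>(Region{Region::Var, var, nullptr});
}

// A region named by a symbol, e.g. the memory an unknown pointer refers to.
RegionRef symbolicRegion(const std::string &symbol) {
  return std::make_shared<Region>(Region{Region::Symbolic, symbol, nullptr});
}

// A field layered on top of an existing region.
RegionRef fieldRegion(const RegionRef &base, const std::string &field) {
  assert(base && "a field region needs a super-region");
  return std::make_shared<Region>(Region{Region::Field, field, base});
}

// Walks up the layers to build a readable name such as "s.f".
std::string describe(const RegionRef &r) {
  switch (r->kind) {
  case Region::Var:
    return r->name;
  case Region::Symbolic:
    return "SymRegion{" + r->name + "}";
  case Region::Field:
    return describe(r->super) + "." + r->name;
  }
  return "";
}

// Returns the outermost region that a (possibly layered) region is built on.
RegionRef baseRegion(RegionRef r) {
  while (r->super)
    r = r->super;
  return r;
}
```

The same fieldRegion helper works whether the base is a variable or a symbolic region, which is the point of the layered design: a field of memory reached through an unknown pointer is described exactly like a field of a local struct.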

Let's see how the analyzer processes the expressions in the following example:

  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }

Let's look at how x*2 gets evaluated. When x is evaluated, we first construct an SVal that represents the lvalue of x; in this case, it is an SVal that references the MemRegion for x. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new SVal, which references the value currently bound to x. That value is symbolic; it's whatever x was bound to at the start of the function. Let's call that symbol $0. Similarly, we evaluate the expression for 2 and get an SVal that references the concrete number 2. When we evaluate x*2, we take the two SVals of the subexpressions and create a new SVal that represents their multiplication (which in this case is a new symbolic expression, which we might call $1). When we evaluate the assignment to y, we again compute its lvalue (a MemRegion), and then bind the SVal for the RHS (which references the symbolic value $1) to the MemRegion in the symbolic store.
The second line is similar. When we evaluate x again, we do the same dance and create an SVal that references the symbol $0. Note that two SVals might reference the same underlying value.

To summarize, MemRegions are unique names for blocks of memory. Symbols are unique names for abstract symbolic values. Some MemRegions represent abstract symbolic chunks of memory, and thus are also based on symbols. SVals are just references to values, and can reference either MemRegions, Symbols, or concrete values (e.g., the number 1).
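The walkthrough of foo can be replayed with a toy model in which SVal is a std::variant over a concrete integer, a symbol, and a memory location. Everything here (ConcreteInt, Symbol, Loc, Store, show, load, multiply, evaluateFoo) is a hypothetical stand-in, not the analyzer's API. One deliberate simplification: instead of creating a fresh symbol such as $1 for x*2, the sketch names the product by its structure, "$0 * 2".

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Hypothetical toy model of the walkthrough above; not the analyzer's API.
struct ConcreteInt { long long value; };
struct Symbol { std::string name; };  // e.g. "$0" or "$0 * 2"
struct Loc { std::string region; };   // an lvalue: a named memory region

// Like the analyzer's SVal, a discriminated union of kinds of values.
using SVal = std::variant<ConcreteInt, Symbol, Loc>;

// The symbolic store maps memory regions to the values bound to them.
using Store = std::map<std::string, SVal>;

std::string show(const SVal &v) {
  if (const auto *c = std::get_if<ConcreteInt>(&v))
    return std::to_string(c->value);
  if (const auto *s = std::get_if<Symbol>(&v))
    return s->name;
  return "&" + std::get<Loc>(v).region;
}

// Lvalue-to-rvalue conversion: read the value currently bound to a region.
SVal load(const Store &store, const SVal &lvalue) {
  const Loc *loc = std::get_if<Loc>(&lvalue);
  assert(loc && "only lvalues can be loaded");
  return store.at(loc->region);
}

// Folds two concrete operands; otherwise builds a new symbolic expression.
SVal multiply(const SVal &lhs, const SVal &rhs) {
  const auto *l = std::get_if<ConcreteInt>(&lhs);
  const auto *r = std::get_if<ConcreteInt>(&rhs);
  if (l && r)
    return ConcreteInt{l->value * r->value};
  return Symbol{show(lhs) + " * " + show(rhs)};
}

// Evaluates "int y = x * 2; int z = x;" with x bound to $0 on entry.
Store evaluateFoo() {
  Store store;
  store["x"] = Symbol{"$0"};                        // unknown argument value
  SVal xValue = load(store, Loc{"x"});              // lvalue of x, then load: $0
  SVal product = multiply(xValue, ConcreteInt{2});  // symbolic "$0 * 2"
  store["y"] = product;                             // bind RHS to y's region
  store["z"] = load(store, Loc{"x"});               // second line: $0 again
  return store;
}
```

After evaluateFoo returns, y holds the symbolic product and z holds $0, the same underlying value that x was bound to on entry, which matches the note that two SVals can reference the same value.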

Idea for a Checker

Here are several questions which you should consider when evaluating your checker idea:

  • Can the check be effectively implemented without path-sensitive analysis? See AST Visitors.
  • How high is the false positive rate going to be? Looking at occurrences of the issue you want to write a checker for in existing code bases might give you some ideas.
  • How will the current limitations of the analysis affect the false alarm rate? Currently, the analyzer only reasons about one procedure at a time (no inter-procedural analysis). Also, it uses a simple range-tracking constraint solver to reason about symbolic values.
  • Consult the Bugzilla database to get some ideas for new checkers, and consider starting with improving or fixing bugs in the existing checkers.

Checker Registration

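All checker implementation files are located in the clang/lib/StaticAnalyzer/Checkers folder. Follow the steps below to register a new checker with the analyzer.

  1. Create a new checker implementation file, for example ./lib/StaticAnalyzer/Checkers/NewChecker.cpp:
    #include "ClangSACheckers.h"  // declares the register*Checker functions
    #include "clang/StaticAnalyzer/Core/Checker.h"
    #include "clang/StaticAnalyzer/Core/CheckerManager.h"
    #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

    using namespace clang;
    using namespace ento;

    namespace {
    // Subscribes to the PreStmt<CallExpr> event: checkPreStmt is called
    // before the analyzer evaluates each function call.
    class NewChecker : public Checker< check::PreStmt<CallExpr> > {
    public:
      void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {}
    };
    } // end anonymous namespace

    // Registers the checker with the analyzer's CheckerManager.
    void ento::registerNewChecker(CheckerManager &mgr) {
      mgr.registerChecker<NewChecker>();
    }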
  2. Pick the package name for your checker and add the registration code to ./lib/StaticAnalyzer/Checkers/Checkers.td. Note that all checkers should first be developed as experimental. For example, if our new checker performs security-related checks, we should add the following lines under the SecurityExperimental package:
    let ParentPackage = SecurityExperimental in {
    ...
    def NewChecker : Checker<"NewChecker">,
      HelpText<"This text should give a short description of the checks performed.">,
      DescFile<"NewChecker.cpp">;
    ...
    } // end "security.experimental"
  3. Make the source code file visible to CMake by adding it to ./lib/StaticAnalyzer/Checkers/CMakeLists.txt.
  4. Compile and see your checker in the list of available checkers by running:
    $ clang -cc1 -analyzer-checker-help

Checker Skeleton

There are two main decisions you need to make:

  • Which events the checker should be tracking. See CheckerDocumentation for the list of available checker callbacks.
  • What data you want to store as part of the checker-specific program state. Try to keep the checker state as small as possible.

Bug Reports


AST Visitors

Some checks might not require path-sensitivity to be effective; a simple AST walk might be sufficient. If that is the case, consider implementing a Clang compiler warning instead. On the other hand, a check might not be acceptable as a compiler warning, for example because of a relatively high false positive rate. In this situation, the AST callbacks checkASTDecl and checkASTCodeBody are your best friends.
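For contrast with path-sensitive checking, here is a hedged sketch of a purely syntactic check over a toy expression tree. The Expr type and the lit, var, divide, and findDivByLiteralZero helpers are hypothetical stand-ins for Clang's AST and for the kind of walk an AST callback might perform. The walk visits every node once and has no notion of paths or symbolic state, which is why it can only catch divisions by the literal 0.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST standing in for Clang's AST classes.
struct Expr {
  enum Kind { IntLiteral, VarRef, Divide } kind = IntLiteral;
  long long value = 0;             // IntLiteral only
  std::string name;                // VarRef only
  std::unique_ptr<Expr> lhs, rhs;  // Divide only
};

std::unique_ptr<Expr> lit(long long v) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::IntLiteral;
  e->value = v;
  return e;
}

std::unique_ptr<Expr> var(const std::string &n) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::VarRef;
  e->name = n;
  return e;
}

std::unique_ptr<Expr> divide(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::Divide;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Path-insensitive check: visit every node once and flag divisions whose
// divisor is the literal 0. No program state or paths are involved, so a
// divisor that is merely known to be 0 on some path goes unnoticed.
void findDivByLiteralZero(const Expr &e, std::vector<const Expr *> &found) {
  if (e.kind != Expr::Divide)
    return;
  assert(e.lhs && e.rhs && "a division needs two operands");
  if (e.rhs->kind == Expr::IntLiteral && e.rhs->value == 0)
    found.push_back(&e);
  findDivByLiteralZero(*e.lhs, found);
  findDivByLiteralZero(*e.rhs, found);
}
```

A check this simple would usually be a better fit for a compiler warning; catching "x / y" where y is 0 only on some paths is what requires the path-sensitive engine.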

Testing

Every patch should be well tested with Clang regression tests. The checker tests live in the clang/test/Analysis folder. To run all of the analyzer tests, execute the following from the clang build directory:
    $ TESTDIRS=Analysis make test
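Below is a minimal sketch of what a regression test in clang/test/Analysis might look like. It assumes the lit RUN line and -verify conventions used by Clang's test suite. The function names are hypothetical, and the expected message assumes that the core.DivideZero checker reports "Division by zero", so confirm the actual diagnostic text before relying on it.

```cpp
// RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s

// With -verify, each expected-warning comment must match a diagnostic that
// the analyzer emits on that line, and any unexpected diagnostic fails the
// test.
int divide_by_zero(int x) {
  int zero = 0;
  return x / zero; // expected-warning{{Division by zero}}
}

// Guarded division: the analyzer is expected to stay silent here.
int divide_checked(int x, int y) {
  if (y == 0)
    return 0;
  return x / y; // no-warning
}
```

Pairing a positive case with a guarded negative case like this checks both that the warning fires and that the checker does not produce false positives on the safe path.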

Useful Commands/Debugging Hints

  • While investigating a checker-related issue, instruct the analyzer to execute only a single checker:
    $ clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c
  • To dump the AST:
    $ clang -cc1 -ast-dump test.c
  • To view or dump the CFG, use the debug.ViewCFG or debug.DumpCFG checkers:
    $ clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c
  • To see all available debug checkers:
    $ clang -cc1 -analyzer-checker-help | grep "debug"
  • To see which function is failing while processing a large file, use the -analyzer-display-progress option.
  • While debugging, execute clang -cc1 -analyze -analyzer-checker=core instead of clang --analyze, as the latter invokes the compiler in a separate process.
  • To view the ExplodedGraph (the state graph explored by the analyzer) while debugging, go to a frame that has a clang::ento::ExprEngine object and execute:
    (gdb) p ViewGraph(0)
  • To see the ProgramState while debugging, use the following command:
    (gdb) p State->dump()
  • To see a clang::Expr while debugging, use the following command. If you pass in a SourceManager object, it will also dump the corresponding line in the source code.
    (gdb) p E->dump()
  • To dump the AST of the method that the current ExplodedNode belongs to:
    (gdb) p C.getPredecessor()->getCodeDecl().getBody()->dump()
    (gdb) p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())