From 222e2a7620e6520ffaf4fc4e69d79c18da31542e Mon Sep 17 00:00:00 2001 From: "Zancanaro; Carlo" Date: Mon, 24 Sep 2012 09:58:17 +1000 Subject: Add the clang library to the repo (with some of my changes, too). --- clang/docs/DriverInternals.html | 523 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 523 insertions(+) create mode 100644 clang/docs/DriverInternals.html (limited to 'clang/docs/DriverInternals.html') diff --git a/clang/docs/DriverInternals.html b/clang/docs/DriverInternals.html new file mode 100644 index 0000000..ce707b9 --- /dev/null +++ b/clang/docs/DriverInternals.html @@ -0,0 +1,523 @@ + + + + Clang Driver Manual + + + + + + + + +
+ +

Driver Design & Internals

+ + + + + +

Introduction

+ + +

This document describes the Clang driver. The purpose of this + document is to describe both the motivation and design goals + for the driver, as well as details of the internal + implementation.

+ + +

Features and Goals

+ + +

The Clang driver is intended to be a production quality + compiler driver providing access to the Clang compiler and + tools, with a command line interface which is compatible with + the gcc driver.

+ +

Although the driver is part of and driven by the Clang + project, it is logically a separate tool which shares many of + the same goals as Clang:

+ +

Features:

+ + + +

GCC Compatibility

+ + +

The number one goal of the driver is to ease the adoption of + Clang by allowing users to drop Clang into a build system + which was designed to call GCC. Although this makes the driver + much more complicated than might otherwise be necessary, we + decided that being very compatible with the gcc command line + interface was worth it in order to allow users to quickly test + clang on their projects.

+ + +

Flexible

+ + +

The driver was designed to be flexible and easily accommodate + new uses as we grow the clang and LLVM infrastructure. As one + example, the driver can easily support the introduction of + tools which have an integrated assembler; something we hope to + add to LLVM in the future.

+ +

Similarly, most of the driver functionality is kept in a + library which can be used to build other tools which want to + implement or accept a gcc like interface.

+ + +

Low Overhead

+ + +

The driver should have as little overhead as possible. In + practice, we found that the gcc driver by itself incurred a + small but meaningful overhead when compiling many small + files. The driver doesn't do much work compared to a + compilation, but we have tried to keep it as efficient as + possible by following a few simple principles:

+ + + +

Simple

+ + +

Finally, the driver was designed to be "as simple as + possible", given the other goals. Notably, trying to be + completely compatible with the gcc driver adds a significant + amount of complexity. However, the design of the driver + attempts to mitigate this complexity by dividing the process + into a number of independent stages instead of a single + monolithic task.

+ + +

Internal Design and Implementation

+ + + + + +

Internals Introduction

+ + +

In order to satisfy the stated goals, the driver was designed + to completely subsume the functionality of the gcc executable; + that is, the driver should not need to delegate to gcc to + perform subtasks. On Darwin, this implies that the Clang + driver also subsumes the gcc driver-driver, which is used to + implement support for building universal images (binaries and + object files). This also implies that the driver should be + able to call the language specific compilers (e.g. cc1) + directly, which means that it must have enough information to + forward command line arguments to child processes + correctly.

+ + +

Design Overview

+ + +

The diagram below shows the significant components of the + driver architecture and how they relate to one another. The + orange components represent concrete data structures built by + the driver, the green components indicate conceptually + distinct stages which manipulate these data structures, and + the blue components are important helper classes.

+ +
+ + Driver Architecture Diagram + +
+ + +

Driver Stages

+ + +

The driver functionality is conceptually divided into five stages:

+ +
    +
  1. + Parse: Option Parsing + +

    The command line argument strings are decomposed into + arguments (Arg instances). The driver expects to + understand all available options, although there is some + facility for just passing certain classes of options + through (like -Wl,).

    + +

    Each argument corresponds to exactly one + abstract Option definition, which describes how + the option is parsed along with some additional + metadata. The Arg instances themselves are lightweight and + merely contain enough information for clients to determine + which option they correspond to and their values (if they + have additional parameters).

    + +

    For example, a command line like "-Ifoo -I foo" would + parse to two Arg instances (a JoinedArg and a SeparateArg + instance), but each would refer to the same Option.

    + +

    Options are lazily created in order to avoid populating + all Option classes when the driver is loaded. Most of the + driver code only needs to deal with options by their + unique ID (e.g., options::OPT_I),

    + +

    Arg instances themselves do not generally store the + values of parameters. In many cases, this would + simply result in creating unnecessary string + copies. Instead, Arg instances are always embedded inside + an ArgList structure, which contains the original vector + of argument strings. Each Arg itself only needs to contain + an index into this vector instead of storing its values + directly.

    + +

    The clang driver can dump the results of this + stage using the -ccc-print-options flag (which + must precede any actual command line arguments). For + example:

    +
    +            $ clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c
    +            Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
    +            Option 1 - Name: "-Wa,", Values: {"-fast"}
    +            Option 2 - Name: "-I", Values: {"foo"}
    +            Option 3 - Name: "-I", Values: {"foo"}
    +            Option 4 - Name: "<input>", Values: {"t.c"}
    +          
    + +

    After this stage is complete the command line should be + broken down into well defined option objects with their + appropriate parameters. Subsequent stages should rarely, + if ever, need to do any string processing.

    +
  2. + +
  3. + Pipeline: Compilation Job Construction + +

    Once the arguments are parsed, the tree of subprocess + jobs needed for the desired compilation sequence are + constructed. This involves determining the input files and + their types, what work is to be done on them (preprocess, + compile, assemble, link, etc.), and constructing a list of + Action instances for each task. The result is a list of + one or more top-level actions, each of which generally + corresponds to a single output (for example, an object or + linked executable).

    + +

    The majority of Actions correspond to actual tasks, + however there are two special Actions. The first is + InputAction, which simply serves to adapt an input + argument for use as an input to other Actions. The second + is BindArchAction, which conceptually alters the + architecture to be used for all of its input Actions.

    + +

    The clang driver can dump the results of this + stage using the -ccc-print-phases flag. For + example:

    +
    +            $ clang -ccc-print-phases -x c t.c -x assembler t.s
    +            0: input, "t.c", c
    +            1: preprocessor, {0}, cpp-output
    +            2: compiler, {1}, assembler
    +            3: assembler, {2}, object
    +            4: input, "t.s", assembler
    +            5: assembler, {4}, object
    +            6: linker, {3, 5}, image
    +          
    +

    Here the driver is constructing seven distinct actions, + four to compile the "t.c" input into an object file, two to + assemble the "t.s" input, and one to link them together.

    + +

    A rather different compilation pipeline is shown here; in + this example there are two top level actions to compile + the input files into two separate object files, where each + object file is built using lipo to merge results + built for two separate architectures.

    +
    +            $ clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c
    +            0: input, "t0.c", c
    +            1: preprocessor, {0}, cpp-output
    +            2: compiler, {1}, assembler
    +            3: assembler, {2}, object
    +            4: bind-arch, "i386", {3}, object
    +            5: bind-arch, "x86_64", {3}, object
    +            6: lipo, {4, 5}, object
    +            7: input, "t1.c", c
    +            8: preprocessor, {7}, cpp-output
    +            9: compiler, {8}, assembler
    +            10: assembler, {9}, object
    +            11: bind-arch, "i386", {10}, object
    +            12: bind-arch, "x86_64", {10}, object
    +            13: lipo, {11, 12}, object
    +          
    + +

    After this stage is complete the compilation process is + divided into a simple set of actions which need to be + performed to produce intermediate or final outputs (in + some cases, like -fsyntax-only, there is no + "real" final output). Phases are well known compilation + steps, such as "preprocess", "compile", "assemble", + "link", etc.

    +
  4. + +
  5. + Bind: Tool & Filename Selection + +

    This stage (in conjunction with the Translate stage) + turns the tree of Actions into a list of actual subprocess + to run. Conceptually, the driver performs a top down + matching to assign Action(s) to Tools. The ToolChain is + responsible for selecting the tool to perform a particular + action; once selected the driver interacts with the tool + to see if it can match additional actions (for example, by + having an integrated preprocessor). + +

    Once Tools have been selected for all actions, the driver + determines how the tools should be connected (for example, + using an inprocess module, pipes, temporary files, or user + provided filenames). If an output file is required, the + driver also computes the appropriate file name (the suffix + and file location depend on the input types and options + such as -save-temps). + +

    The driver interacts with a ToolChain to perform the Tool + bindings. Each ToolChain contains information about all + the tools needed for compilation for a particular + architecture, platform, and operating system. A single + driver invocation may query multiple ToolChains during one + compilation in order to interact with tools for separate + architectures.

    + +

    The results of this stage are not computed directly, but + the driver can print the results via + the -ccc-print-bindings option. For example:

    +
    +            $ clang -ccc-print-bindings -arch i386 -arch ppc t0.c
    +            # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
    +            # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
    +            # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
    +            # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
    +            # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
    +            # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
    +            # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"
    +          
    + +

    This shows the tool chain, tool, inputs and outputs which + have been bound for this compilation sequence. Here clang + is being used to compile t0.c on the i386 architecture and + darwin specific versions of the tools are being used to + assemble and link the result, but generic gcc versions of + the tools are being used on PowerPC.

    +
  6. + +
  7. + Translate: Tool Specific Argument Translation + +

    Once a Tool has been selected to perform a particular + Action, the Tool must construct concrete Jobs which will be + executed during compilation. The main work is in translating + from the gcc style command line options to whatever options + the subprocess expects.

    + +

    Some tools, such as the assembler, only interact with a + handful of arguments and just determine the path of the + executable to call and pass on their input and output + arguments. Others, like the compiler or the linker, may + translate a large number of arguments in addition.

    + +

    The ArgList class provides a number of simple helper + methods to assist with translating arguments; for example, + to pass on only the last of arguments corresponding to some + option, or all arguments for an option.

    + +

    The result of this stage is a list of Jobs (executable + paths and argument strings) to execute.

    +
  8. + +
  9. + Execute +

    Finally, the compilation pipeline is executed. This is + mostly straightforward, although there is some interaction + with options + like -pipe, -pass-exit-codes + and -time.

    +
  10. + +
+ + +

Additional Notes

+ + +

The Compilation Object

+ +

The driver constructs a Compilation object for each set of + command line arguments. The Driver itself is intended to be + invariant during construction of a Compilation; an IDE should be + able to construct a single long lived driver instance to use + for an entire build, for example.

+ +

The Compilation object holds information that is particular + to each compilation sequence. For example, the list of used + temporary files (which must be removed once compilation is + finished) and result files (which should be removed if + compilation fails).

+ +

Unified Parsing & Pipelining

+ +

Parsing and pipelining both occur without reference to a + Compilation instance. This is by design; the driver expects that + both of these phases are platform neutral, with a few very well + defined exceptions such as whether the platform uses a driver + driver.

+ +

ToolChain Argument Translation

+ +

In order to match gcc very closely, the clang driver + currently allows tool chains to perform their own translation of + the argument list (into a new ArgList data structure). Although + this allows the clang driver to match gcc easily, it also makes + the driver operation much harder to understand (since the Tools + stop seeing some arguments the user provided, and see new ones + instead).

+ +

For example, on Darwin -gfull gets translated into two + separate arguments, -g + and -fno-eliminate-unused-debug-symbols. Trying to write Tool + logic to do something with -gfull will not work, because Tool + argument translation is done after the arguments have been + translated.

+ +

A long term goal is to remove this tool chain specific + translation, and instead force each tool to change its own logic + to do the right thing on the untranslated original arguments.

+ +

Unused Argument Warnings

+

The driver operates by parsing all arguments but giving Tools + the opportunity to choose which arguments to pass on. One + downside of this infrastructure is that if the user misspells + some option, or is confused about which options to use, some + command line arguments the user really cared about may go + unused. This problem is particularly important when using + clang as a compiler, since the clang compiler does not support + anywhere near all the options that gcc does, and we want to make + sure users know which ones are being used.

+ +

To support this, the driver maintains a bit associated with + each argument of whether it has been used (at all) during the + compilation. This bit usually doesn't need to be set by hand, + as the key ArgList accessors will set it automatically.

+ +

When a compilation is successful (there are no errors), the + driver checks the bit and emits an "unused argument" warning for + any arguments which were never accessed. This is conservative + (the argument may not have been used to do what the user wanted) + but still catches the most obvious cases.

+ + +

Relation to GCC Driver Concepts

+ + +

For those familiar with the gcc driver, this section provides + a brief overview of how things from the gcc driver map to the + clang driver.

+ + +
+ + -- cgit v1.2.3