How to extract source files when using a special compiler (e.g. TMS320C2000 C/C++ Compiler)? #8453

Open
li-xin-yi opened this issue Mar 15, 2022 · 18 comments
Labels
question Further information is requested

Comments

@li-xin-yi

When I tried to create a CodeQL database with the CodeQL CLI for C/C++ embedded-system code generated by Code Composer Studio, it failed to extract any source files, even though the gmake build itself ran successfully.

I found that the root cause is the makefile's use of the TMS320C2000 C/C++ Compiler as the project compiler. The problem occurs even with a minimal helloworld.c:

int main() {
   return 0;
}

build.sh as

"C:\ti\ccs1100\ccs\tools\compiler\ti-cgt-c2000_21.6.0.LTS\bin\cl2000" helloworld.c

And run the command as:

codeql database create cpp-database-01 --language=cpp --command="bash build.sh"

No source file can be extracted and the error message shows:

No source code was seen and extracted to ****
This can occur if the specified build commands failed to compile or process any code.
 - Confirm that there is some source code for the specified language in the project.
 - For codebases written in Go, JavaScript, TypeScript, and Python, do not specify
   an explicit --command.
 - For other languages, the --command must specify a "clean" build which compiles
   all the source code files without reusing existing build artefacts.

I guess the extractor in CodeQL fails to monitor the compilation process of the TMS320C2000 C/C++ Compiler, although in principle it could be traced, since it takes source, library, and include files much as other compilers (e.g., gcc, clang) do.

I really want to apply CodeQL to code compiled by special compilers like this one to do some security analysis, but I got stuck at source-file extraction. How can I fix the problem? Can I change some extractor configuration? Is there any way to modify the CodeQL source code to relax the compiler requirement? Or can I add the source files to the database manually?

Thank you very much.

@li-xin-yi li-xin-yi added the question Further information is requested label Mar 15, 2022
@aibaars
Contributor

aibaars commented Mar 15, 2022

The problem is that CodeQL does not recognize cl2000 as a C/C++ compiler.

The configuration file codeql/cpp/tools/win64/compiler-tracing.spec defines rules for the CodeQL tracer to detect compilers. It contains rules like

**/cl.exe:
**/clang-cl.exe:
  invoke ${config_dir}/extractor.exe
  order compiler, extractor
  prepend --mimic
  prepend "${compiler}"

The rule above informs CodeQL that it should run the C/C++ "extractor" whenever a binary named cl.exe or clang-cl.exe is run.

You could copy this file and try to add a rule for **/cl2000 (or **/cl2000.exe) and use the --compiler-spec flag to point to the modified configuration file.

codeql database create --help -v
...
     --compiler-spec=<spec-file>
                             [Advanced] The path to a compiler specification
                               file. It may be used to pick out compiler
                               processes that run as part of the build command,
                               and trigger the execution of other tools. The
                               extractors will provide default compiler
                               specifications that should work in most
                               situations.
...

Note that compiler specifications are an advanced feature. Adding a rule will cause the CodeQL tracer to intercept cl2000 and run the extractor. However, if cl2000 behaves differently from the well-supported compilers, things will probably still fail. The best way to debug any problem is to inspect the build-tracer.log file. It contains detailed information on which processes were intercepted and which of those are considered "compilers" by CodeQL.
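For illustration, the copy-and-extend step could be scripted roughly as follows. This is only a sketch: the rule body is copied from the cl.exe rule quoted above, while the helper names and the choice to append the new rule at the end of the file are assumptions (whether rule order matters to the tracer is not stated in this thread).

```python
from pathlib import Path

# Rule body copied from the cl.exe / clang-cl.exe rule quoted above.
RULE_BODY = [
    "  invoke ${config_dir}/extractor.exe",
    "  order compiler, extractor",
    "  prepend --mimic",
    '  prepend "${compiler}"',
]


def add_compiler_rule(spec_text: str, patterns: list[str]) -> str:
    """Return spec_text with one extra rule matching the given patterns."""
    header = [f"{pattern}:" for pattern in patterns]
    rule = "\n".join(header + RULE_BODY)
    return spec_text.rstrip("\n") + "\n\n" + rule + "\n"


def write_custom_spec(default_spec: Path, custom_spec: Path) -> None:
    """Copy the default spec to custom_spec and append a cl2000 rule.

    Both paths are supplied by the caller; nothing is assumed about
    where the CodeQL CLI is installed.
    """
    text = default_spec.read_text(encoding="utf-8")
    extended = add_compiler_rule(text, ["**/cl2000", "**/cl2000.exe"])
    custom_spec.write_text(extended, encoding="utf-8")
```

The resulting file would then be the one passed via --compiler-spec.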

@li-xin-yi
Author

Hi @aibaars, thank you for replying. I modified the compiler-tracing file and the database was built successfully. However, when I ran the LineOfCode query, it returned 0, which means no lines of code were extracted from the project. That's strange, because I can see many source files in the database's source archive.

[Screenshot: terminal output showing the Metric table]

How could this happen? Did I do anything wrong when defining the compiler rule? Can you help me with that?

@aibaars
Contributor

aibaars commented Mar 24, 2022

@li-xin-yi It looks like CodeQL is now intercepting the compiler calls and running the "extractor" on the source files. The "extractor" copies each analysed source file to the "source archive" (this bit is working) and parses the source file to produce "trap" files (trap files are later imported into the database). Most likely, the extractor fails to parse the source files, causing the trap files to be incomplete or even empty. In the screenshot you attached I see the Metric table in the Terminal. There should be other diagnostic tables too that report how many files were extracted correctly and how many with failures. There should be error messages in the log/build-tracer.log file if there were any extractor failures.

As I said earlier:

However, if cl2000 behaves differently from the well-supported compilers, things will probably still fail. The best way to debug any problem is to inspect the build-tracer.log file. It contains detailed information on which processes were intercepted and which of those are considered "compilers" by CodeQL.

It would be good to know what causes the failures. It could be something simple like a cl2000 compiler flag that the extractor does not understand. However, compilers from hardware manufacturers often have non-standard extensions to the C or C++ language and supporting them is much harder. Even if your helloworld.c is not using any extensions, it's likely that some of the system header files do.

Could you please file a separate feature request issue for CodeQL support for the TMS320C2000 C/C++ Compiler? If you have an enterprise account, it's probably best to file the feature request through enterprise support. We keep track of all feature requests, but when/if a feature gets implemented depends on priorities.

In the short term you might want to try the following:

**/cl2000:
**/cl2000.exe:
  invoke C:/my-extractor-wrapper.cmd
  order compiler, extractor
  prepend ${config_dir}/extractor.exe
  prepend --mimic
  prepend "${compiler}"

The my-extractor-wrapper.cmd is run with the following arguments: the path of the extractor binary, the flag --mimic, the path of the compiler, followed by all the command line arguments that were passed to the intercepted compiler call. You could inspect the command line arguments, drop or rewrite some of them, and even rewrite source files to strip out any syntactic constructs that CodeQL does not understand.
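As a rough sketch of such a wrapper, assuming the .cmd file simply forwards its arguments to a Python script: the argument layout follows the description above, while the names split_wrapper_args, build_extractor_command, and main are made up for illustration. The rewrite hook is where flags would be dropped or translated.

```python
import subprocess
from typing import Callable, NamedTuple


class WrapperCall(NamedTuple):
    extractor: str            # path of the extractor binary
    mimic_flag: str           # the literal flag --mimic
    compiler: str             # path of the intercepted compiler
    compiler_args: list[str]  # arguments of the intercepted compiler call


def split_wrapper_args(args: list[str]) -> WrapperCall:
    """Split the wrapper's arguments according to the layout described above."""
    if len(args) < 3 or args[1] != "--mimic":
        raise ValueError(f"unexpected wrapper arguments: {args!r}")
    return WrapperCall(args[0], args[1], args[2], list(args[3:]))


def build_extractor_command(
    call: WrapperCall,
    rewrite: Callable[[list[str]], list[str]] = lambda args: args,
) -> list[str]:
    """Rebuild the extractor command line, rewriting only the compiler arguments."""
    return [call.extractor, call.mimic_flag, call.compiler, *rewrite(call.compiler_args)]


def main(argv: list[str]) -> int:
    """Forward the (possibly rewritten) call to the extractor and return its exit code."""
    call = split_wrapper_args(argv[1:])
    return subprocess.run(build_extractor_command(call)).returncode
```

Calling sys.exit(main(sys.argv)) at the bottom of the script would turn it into the entry point that the wrapper invokes.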

@li-xin-yi
Author

Thank you @aibaars, I looked into log/build-tracer.log and found tons of errors after extractor.exe --trapfolder runs. All of the errors occurred while archiving the compiler's library files into the database. For example, here is a snippet of the log:

[E 00:10:44 17572] Starting compilation TRAP C:\lab\BMS5\bms-database\trap\cpp/compilations/48/33743317_0.trap.br
[E 00:10:44 17572] Archiving C:\lab\BMS5\bms-database\src/C_/lab/BMS5/BMS5.c
[E 00:10:44 17572] Archiving C:\lab\BMS5\bms-database\src/C_/lab/BMS5/device/driverlib.h
[E 00:10:44 17572] Archiving C:\lab\BMS5\bms-database\src/C_/ti/c2000/C2000Ware_4_00_00_00/driverlib/f28002x/driverlib/inc/hw_memmap.h
[E 00:10:44 17572] Archiving C:\lab\BMS5\bms-database\src/C_/ti/c2000/C2000Ware_4_00_00_00/driverlib/f28002x/driverlib/adc.h
[E 00:10:44 17572] Archiving C:\lab\BMS5\bms-database\src/C_/ti/ccs1100/ccs/tools/compiler/ti-cgt-c2000_21.6.0.LTS/include/stdbool.h
[E 00:10:44 17572] Archiving C:\lab\BMS5\bms-database\src/C_/ti/ccs1100/ccs/tools/compiler/ti-cgt-c2000_21.6.0.LTS/include/_ti_config.h
"C:/ti/ccs1100/ccs/tools/compiler/ti-cgt-c2000_21.6.0.LTS/include/_ti_config.h", line 54: error: expected a type specifier
  _TI_PROPRIETARY_PRAGMA("diag_push")
                         ^

"C:/ti/ccs1100/ccs/tools/compiler/ti-cgt-c2000_21.6.0.LTS/include/_ti_config.h", line 54: error: unnamed prototyped parameters not allowed when body is present
  _TI_PROPRIETARY_PRAGMA("diag_push")
  ^

"C:/ti/ccs1100/ccs/tools/compiler/ti-cgt-c2000_21.6.0.LTS/include/_ti_config.h", line 55: error: expected a "{"
  _TI_PROPRIETARY_PRAGMA("CHECK_MISRA(\"-19.4\")")
  ^

The extractor treats the compiler's include files as code containing syntax errors, even though cl2000 compiles them without errors. I don't know how the extractor checks for syntax errors. Judging from the log entries, I don't think it follows the standard specified by the compile commands or by the compiler itself. I am now wondering how to force the extractor to ignore those "syntax errors" while extracting and archiving source files.
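To see which headers account for most of the failures, the log can be summarised with a short script. This is a sketch based only on the diagnostic format visible in the snippet above ("file", line N: error: message); the "warning" and "catastrophic error" severities are assumed to use the same layout.

```python
import re
from collections import Counter

# Matches diagnostics such as:
#   "C:/ti/.../_ti_config.h", line 54: error: expected a type specifier
DIAGNOSTIC = re.compile(
    r'"(?P<file>[^"]+)", line (?P<line>\d+): '
    r'(?P<severity>catastrophic error|error|warning): (?P<message>.*)'
)


def summarize_diagnostics(log_text: str) -> Counter:
    """Count diagnostics per (file, severity) pair in build-tracer.log text."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = DIAGNOSTIC.search(line)
        if match:
            counts[(match["file"], match["severity"])] += 1
    return counts
```

Feeding it the contents of log/build-tracer.log and printing counts.most_common() lists the files with the most diagnostics first.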

By the way, I really like CodeQL and applied it on other high-level development projects. I just want to extend the usage to some works involving low-level embedded systems as well.

@aibaars
Contributor

aibaars commented Mar 25, 2022

The extractor treats the compiler's include files as code containing syntax errors, even though cl2000 compiles them without errors. I don't know how the extractor checks for syntax errors. Judging from the log entries, I don't think it follows the standard specified by the compile commands or by the compiler itself. I am now wondering how to force the extractor to ignore those "syntax errors" while extracting and archiving source files.

The extractor is a C/C++ compiler frontend that parses and "compiles" the source files into "trap" output. The extractor's parser does not understand those non-standard bits of code and reports them as syntax errors; cl2000 is of course fine with its own non-standard extensions.

One way to make this particular error disappear is to redefine the _TI_PROPRIETARY_PRAGMA macro to an empty string or similar. Perhaps

append -D_TI_PROPRIETARY_PRAGMA 
append ""

would do the trick. Otherwise, it should be possible to let a my-extractor-wrapper program rewrite the command line to redefine that macro.
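If redefining the macro does not work out, a wrapper could instead strip the invocations from a copy of the header before the extractor sees it. The following is a minimal sketch of just the text rewrite; the function name is made up, and how the rewritten copy is placed ahead of the original on the include path is left open.

```python
import re

# Matches one invocation such as _TI_PROPRIETARY_PRAGMA("CHECK_MISRA(\"-19.4\")"),
# allowing backslash-escaped characters inside the string literal.
# A "#define _TI_PROPRIETARY_PRAGMA(x) ..." line is not matched, because its
# argument is not a string literal.
PRAGMA_CALL = re.compile(r'_TI_PROPRIETARY_PRAGMA\s*\(\s*"(?:[^"\\]|\\.)*"\s*\)')


def strip_ti_pragmas(source: str) -> str:
    """Remove _TI_PROPRIETARY_PRAGMA("...") invocations; line count is preserved."""
    return PRAGMA_CALL.sub("", source)
```

Because only the invocation text is removed and newlines are kept, line numbers in later diagnostics still match the original header.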

Another thing to try would be to add the following rule line. It should make the extractor carry on regardless of any errors. The drawback is that your database may have many "gaps".

prepend --expect_errors

This post seems related:
https://e2e.ti.com/support/tools/code-composer-studio-group/ccs/f/code-composer-studio-forum/935313/compiler-arm-cgt---preproc_only-creates-weired-style-pragma

@TomShirley

I'm trying to get the same thing working with the TMS320C2000 C/C++ Compiler but not having any luck using codeql v2.15.1.

Have things moved on since this post? I don't see a --compiler-spec option on codeql database create.

fwiw i'm attempting to get the extractor working on linux. It would be wonderful if someone could guide me on how to get codeql to successfully parse and extract using this compiler 😍

I've tried following this thread #10132 (comment) and it seems to trip up when there are command line arguments on the intercepted c2000 compiler command:

[T 05:23:17 23909] Initializing tracer.
[T 05:23:17 23909] Initialising tags...
[T 05:23:17 23909] ID set to 0000000000005D65_0000000000000001 (parent 0000000000005D5E_0000000000000001)
[E 05:23:17 23909] CodeQL C/C++ Extractor 2.15.1
[E 05:23:17 23909] Current directory: /src/Debug
[E 05:23:17 23909] Command: /root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic /opt/ti/ccs/tools/compiler/c2000_x/bin/cl2000 -c '"/opt/ti/ccs/tools/compiler/c2000_x/bin/cl2000" -v28 -ml -mt -O3 -g --optimize_with_debug=on --include_path="/opt/ti/ccs/tools/compiler/c2000_x/include" --include_path="/src/Debug"  --define="_DEBUG" "./custom.c"'
[E 05:23:17 23909] Checking whether C compilation already happened.
[E 05:23:17 23909] Checking for tag c-compilation-happened
[E 05:23:17 23909] Checking CODEQL_TRACER_DB_ID 0000000000005D5E_0000000000000001
[E 05:23:17 23909] Locking DB /src/codeql-dbs/device-firmware/working/tags.db
[E 05:23:17 23909] Locked DB /src/codeql-dbs/device-firmware/working/tags.db
[E 05:23:17 23909] Unlocking DB
[E 05:23:17 23909] Unlocked DB
[E 05:23:17 23909] Looks like C compilation didn't already happen.
[E 05:23:17 23909] Checking whether C compilation has been attempted.
[E 05:23:17 23909] Checking for tag c-compilation-attempted
[E 05:23:17 23909] Checking CODEQL_TRACER_DB_ID 0000000000005D5E_0000000000000001
[E 05:23:17 23909] Locking DB /src/codeql-dbs/device-firmware/working/tags.db
[E 05:23:17 23909] Locked DB /src/codeql-dbs/device-firmware/working/tags.db
[E 05:23:17 23909] Unlocking DB
[E 05:23:17 23909] Unlocked DB
[E 05:23:17 23909] Marking C compilation as attempted.
[E 05:23:17 23909] Setting tag c-compilation-attempted
[E 05:23:17 23909] Starting from CODEQL_TRACER_DB_ID 0000000000005D5E_0000000000000001
[E 05:23:17 23909] Locking DB /src/codeql-dbs/device-firmware/working/tags.db
[E 05:23:17 23909] Locked DB /src/codeql-dbs/device-firmware/working/tags.db
[E 05:23:17 23909] Set tag for 0000000000005D5E_0000000000000001
[E 05:23:17 23909] Set tag for 0000000000005CBC_0000000000000001
[E 05:23:17 23909] Set tag for 0000000000005C90_0000000000000001
[E 05:23:17 23909] Set tag for 0000000000005C8E_0000000000000001
[E 05:23:17 23909] Set tag for 0000000000005C81_0000000000000003
[E 05:23:17 23909] Set tag for 0000000000005C81_0000000000000002
[E 05:23:17 23909] Set tag for root
[E 05:23:17 23909] Unlocking DB
[E 05:23:17 23909] Unlocked DB
[E 05:23:17 23909] Warning[extractor-c++]: In canonicalise_path: realpath failed
Excluded "/opt/ti/ccs/tools/compiler/c2000_x/bin/cl2000" -v28 -ml -mt -O3 -g --optimize_with_debug=on --include_path="/opt/ti/ccs/tools/compiler/c2000_x/include" --include_path="/src/Debug" --define="_DEBUG" "./custom.c" because it is an object
[E 05:23:17 23909] 0 file groups; exiting.
[T 05:23:17 23902] Extractor /root/bin/codeql/codeql/cpp/tools/linux64/extractor terminated with exit code 0.

@aibaars
Contributor

aibaars commented Nov 13, 2023

The "compiler specification" files have been superseded by tracing configuration Lua scripts.

The flag is --extra-tracing-config=<tracing-config.lua> .

This is an example of a custom tracing script used for C# to inject some custom flag into an msbuild command line: https://github.com/aibaars/WebGoat.Net/pull/1/files

Another example of a tracing specification: https://github.com/github/codeql/blob/main/go/codeql-tools/tracing-config.lua

I don't think there is any public documentation at the moment on how to write extra tracing configuration files. You can use the following template as a starting point: save it as, for example, custom-tracing-config.lua, and look at the default Lua configuration files found in the CodeQL CLI folder at codeql/cpp/tools/tracing-config.lua and codeql/tools/tracer/base.lua to get an idea of how to define a command matcher.

function GetCompatibleVersions() return {'1.0.0'} end
function RegisterExtraConfig()
    return {
        ['cpp'] = {
            DEFINE_MATCHERS_HERE,
            table.unpack(_RegisteredMatchers['cpp']) -- include default matchers, if needed
        }
    }
end

You can run codeql database create --extra-tracing-config=custom-tracing-config.lua my/database. The my/database/log/build-tracer.log file should contain information about which commands were intercepted by the tracer configuration, as well as any problems encountered when interpreting the custom-tracing-config.lua file.

You could make a tracing configuration file that intercepts cl2000 processes and rewrites their command line arguments, either dropping or translating any compiler flags that CodeQL does not understand before passing the arguments to the extractor.
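The translation itself is independent of how the command is intercepted, so it can be prototyped separately. Below is a sketch in Python rather than Lua, using the cl2000 flags from the log earlier in this thread. The mapping to -I and -D and the drop lists are assumptions to be checked against build-tracer.log, not documented extractor behaviour.

```python
# Hypothetical drop lists: cl2000-specific flags seen in the intercepted command.
DROPPED_EXACT = {"-v28", "-ml", "-mt"}
DROPPED_PREFIXES = ("--optimize_with_debug=",)


def translate_cl2000_args(args: list[str]) -> list[str]:
    """Translate or drop cl2000 flags, leaving everything else unchanged."""
    translated = []
    for arg in args:
        if arg.startswith("--include_path="):
            # --include_path=DIR -> -IDIR
            translated.append("-I" + arg[len("--include_path="):])
        elif arg.startswith("--define="):
            # --define=NAME[=VALUE] -> -DNAME[=VALUE]
            translated.append("-D" + arg[len("--define="):])
        elif arg in DROPPED_EXACT or arg.startswith(DROPPED_PREFIXES):
            continue
        else:
            translated.append(arg)
    return translated
```

Once the mapping looks right, the same logic can be ported into the Lua matcher.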

@TomShirley

TomShirley commented Nov 14, 2023

Hi @aibaars thanks for the detailed reply.

I can intercept cl2000 commands with a custom tracing lua script, albeit by intercepting /bin/sh pattern as that's the binary that the interceptor is seeing when I run make all.
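To illustrate what matching on /bin/sh involves: the real compiler command arrives as a single string after -c and has to be tokenised again. This is a sketch in Python (not the Lua matcher itself). The function name is made up, and it only handles a plain command, not strings containing &&, pipes, or environment assignments.

```python
import os
import shlex


def find_compiler_in_shell_command(shell_argv, compiler_names=("cl2000", "cl2000.exe")):
    """Return the tokenised compiler command run via `sh -c '<command>'`, or None."""
    if "-c" not in shell_argv:
        return None
    index = shell_argv.index("-c")
    if index + 1 >= len(shell_argv):
        return None
    # shlex.split applies POSIX shell quoting rules, so --define="_DEBUG"
    # becomes the single token --define=_DEBUG.
    tokens = shlex.split(shell_argv[index + 1])
    if tokens and os.path.basename(tokens[0]) in compiler_names:
        return tokens
    return None
```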

The next hurdle is getting the extractor to do the right thing and pass arguments along in the right way. Because I couldn't find the extractor source code (and there isn't any help documentation on the binary), i've tried to tinker with sending commands directly to the extractor.

For example:

Building a single source file with the custom cl2000 compiler requires several include paths for my domain:

/opt/ti/ccs/tools/compiler/c2000_x/bin/cl2000 -v28 --include_path=/opt/ti/ccs/tools/compiler/c2000_x/include --include_path=/src --include_path=/opt/ti/ccs/bios/packages/ti/bios/include --include_path=/opt/ti/ccs/bios_x/packages/ti/rtdx/include/c2000 --include_path=/opt/ti/ccs/xdais ./custom.c

Trying to translate this command into something the extractor can process correctly, I noticed that there's a --compiler option, so I attempted to use it to make the extractor include these paths in the generated command that it attempts to run:

/root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic /opt/ti/ccs/tools/compiler/c2000/bin/cl2000 -c './custom.c' --compiler -I/opt/ti/ccs/tools/compiler/c2000/include --compiler -I/src --compiler -I/opt/ti/ccs/bios/packages/ti/bios/include --compiler -I/opt/ti/ccs/bios/packages/ti/rtdx/include/c2000 --compiler -I/opt/ti/ccs/xdais

with this command, the extractor attempts the following:

[E 06:33:32 5840] Processed command line: /root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic_config compiler_mimic_cache/fdf7fe580b4c -w --error_limit 1000 --disable_system_macros --variadic_macros --c89 --ti '-D__signed_chars__=1    /* Predefined */' '-D__DATE__="Nov 14 2023"     /* Predefined */' '-D__TIME__="06:32:09"        /* Predefined */' '-D__edg_front_end__=1   /* Predefined */' '-D__TI_COMPILER_VERSION__=6001003    /* Predefined */' '-D__COMPILER_VERSION__=6001003       /* Predefined */' '-D__TMS320C2000__=1  /* Predefined */' '-D_TMS320C2000=1        /* Predefined */' '-D__TMS320C28XX__=1  /* Predefined */' '-D_TMS320C28XX=1     /* Predefined */' '-D__TMS320C28X__=1   /* Predefined */' '-D_TMS320C28X=1 /* Predefined */' '-D__SIZE_T_TYPE__=unsigned long      /* Predefined */' '-D__PTRDIFF_T_TYPE__=long    /* Predefined */' '-D__WCHAR_T_TYPE__=unsigned int      /* Predefined */' '-D__little_endian__=1   /* Predefined */' '-D__TI_STRICT_ANSI_MODE__=1  /* Predefined */' '-D__TI_WCHAR_T_BITS__=16     /* Predefined */' '-D__TI_GNU_ATTRIBUTE_SUPPORT__=0        /* Predefined */' '-D__TI_STRICT_FP_MODE__=1    /* Predefined */' -D_INLINE=1 -D_OPTIMIZE_FOR_SPACE=1 -I/src -I/opt/ti/ccs/bios/packages/ti/bios/include -I/opt/ti/ccs/bios9/packages/ti/rtdx/include/c2000 -I/opt/ti/ccs/xdais/packages/ti/xdais -I/opt/ti/ccs/tools/compiler/c2000/bin/../include/ -- ./custom.c

which doesn't compile (several undefined identifiers but even so, it did manage to include those directories which solved one breaking issue). I'm not sure, but is there a way to pass an argument to the extractor and have it pass that as-is on to the compiler? For instance attempting --compiler -v28 - i don't see -v28 passed through in the command output above?

I'm unclear still however, is the extractor running the gcc compiler with the above arguments? If it is, is there any way to swap out the gcc compiler for the cl2000 compiler and tell the extractor to run that instead?

Or are you saying the only way forward is to write middleware that drops any non-standard gcc arguments and patches out any non-standard C code in every file in the repo before the extractor runs? That is probably too big a challenge to be worth the effort, I'm afraid, given that it's an existing complex app with many dependencies on cl2000 constructs.

As an aside: this Lua script has a lot of useful components if you need to transform commands before they're executed by the extractor: https://github.com/microsoft/codeql/blob/d9364c060e8897bb907b05feef458c3892ee38a2/csharp/tools/tracing-config.lua

@aibaars
Contributor

aibaars commented Nov 14, 2023

@TomShirley Unfortunately, the C++ extractor is not open source and there is no external documentation on how to configure it. The C# extractor is open source, but it is different from the C++ extractor and has different command line arguments. I don't think the C++ extractor has a --compiler flag; that flag is defined in the C# extractor.

I'm afraid I don't know the details of the C++ extractor myself. Roughly, the CodeQL tracer intercepts all processes, matches their commands and arguments against the tracer configuration (Lua script), and decides from that configuration what to do with each command. If a command matches the pattern of a compiler, the tracer typically lets the compiler process proceed as normal, but in addition starts an "extractor" process based on the compiler's command and arguments. An "extractor" behaves a lot like a compiler, but instead of generating code, it produces "trap" files, which are text files that describe the information that should go into the CodeQL database.

The C++ extractor has a ton of command line flags to make it behave as closely as possible to the intercepted compiler. It needs to know the include paths of the compiler's standard libraries, information about the architecture, the definitions of all sorts of macros, the variant of C/C++, any syntactic extensions of the compiler, and many more things. This is a lot of work to configure by hand. Therefore, the "extractor" typically runs in two steps: first in "mimic" mode, which tries to run the compiler's command in various ways to detect how it behaves, and then in "real" mode, which runs the "extractor" with the original arguments of the compiler and all the additional configuration flags derived in the "mimic" phase.

You might want to perform a few experiments with some small programs compiled with the supported compilers and have a look at the contents of the "mimic cache". You can do the same for the cl2000 compiler. If the "mimic cache" for the cl2000 compiler looks sensible, you can use the mimic approach as-is. If you feel that things are wrong or missing, you can copy the cached file and tweak it manually. In that case you could stop using the --mimic flag and supply your tweaked file directly as --mimic_config.

The "extractor" knows about (most of) the flags of the supported compilers and whether to interpret them or ignore them. For an unsupported compiler you need to drop or rewrite any of the command line arguments that the extractor does not understand.

This is roughly all I know about how things work ;-) . It will be quite a bit of trial and error, but you can probably get a prototype implementation working this way. Consider contacting the Expert Services team for additional help.

@TomShirley

@aibaars interesting!

I've tried --mimic_config and it does indeed appear to be a valid argument, however I don't know how to use it:

/root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic_config '/src/codeql-dbs/8a85eb605ece' './custom.c' seems to not read the file and just performs a default gcc build.

an example mimic_cache file:

ti configuration 
tries 
/opt/ti/ccs/tools/compiler/c2000/bin/cl2000
c-header
with arguments:
-----
prepend --c89
ienvironment CPATH
ienvironment C_INCLUDE_PATH
append -I/opt/ti/ccs/ccs/tools/compiler/c2000/bin/../include/
predefine __signed_chars__=1	/* Predefined */
predefine __DATE__="Nov 13 2023"	/* Predefined */
predefine __TIME__="22:52:14"	/* Predefined */
predefine __edg_front_end__=1	/* Predefined */
predefine __TI_COMPILER_VERSION__=6001003	/* Predefined */
predefine __COMPILER_VERSION__=6001003	/* Predefined */
predefine __TMS320C2000__=1	/* Predefined */
predefine _TMS320C2000=1	/* Predefined */
predefine __TMS320C27XX__=1	/* Predefined */
predefine _TMS320C27XX=1	/* Predefined */
predefine __TMS320C27X__=1	/* Predefined */
predefine _TMS320C27X=1	/* Predefined */
predefine __SIZE_T_TYPE__=unsigned long	/* Predefined */
predefine __PTRDIFF_T_TYPE__=long	/* Predefined */
predefine __WCHAR_T_TYPE__=unsigned int	/* Predefined */
predefine __little_endian__=1	/* Predefined */
predefine __TI_STRICT_ANSI_MODE__=1	/* Predefined */
predefine __TI_WCHAR_T_BITS__=16	/* Predefined */
predefine __TI_GNU_ATTRIBUTE_SUPPORT__=0	/* Predefined */
predefine __TI_STRICT_FP_MODE__=1	/* Predefined */
predefine _OPTIMIZE_FOR_SPACE=1
prepend --ti

The extractor doesn't appear to do anything with this file; even if i pass in a blank file to the extractor it will just compile with gcc and not use any of the contents of this file. If you know any more about how to instruct the extractor to use a mimic_cache file please share

@aibaars
Contributor

aibaars commented Nov 15, 2023

The extractor doesn't appear to do anything with this file; even if i pass in a blank file to the extractor it will just compile with gcc and not use any of the contents of this file. If you know any more about how to instruct the extractor to use a mimic_cache file please share

Ah yeah, you're right. I thought the extractor would take the information from that file and interpret it somehow. However, when I look at the example command line you gave earlier, it looks like the contents of that file have somehow been injected in the command line mostly as '-D.... /* Predefined */' arguments:

[E 06:33:32 5840] Processed command line: /root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic_config compiler_mimic_cache/fdf7fe580b4c -w --error_limit 1000 --disable_system_macros --variadic_macros --c89 --ti '-D__signed_chars__=1    /* Predefined */' '-D__DATE__="Nov 14 2023"     /* Predefined */' '-D__TIME__="06:32:09"        /* Predefined */' '-D__edg_front_end__=1   /* Predefined */' '-D__TI_COMPILER_VERSION__=6001003    /* Predefined */' '-D__COMPILER_VERSION__=6001003       /* Predefined */' '-D__TMS320C2000__=1  /* Predefined */' '-D_TMS320C2000=1        /* Predefined */' '-D__TMS320C28XX__=1  /* Predefined */' '-D_TMS320C28XX=1     /* Predefined */' '-D__TMS320C28X__=1   /* Predefined */' '-D_TMS320C28X=1 /* Predefined */' '-D__SIZE_T_TYPE__=unsigned long      /* Predefined */' '-D__PTRDIFF_T_TYPE__=long    /* Predefined */' '-D__WCHAR_T_TYPE__=unsigned int      /* Predefined */' '-D__little_endian__=1   /* Predefined */' '-D__TI_STRICT_ANSI_MODE__=1  /* Predefined */' '-D__TI_WCHAR_T_BITS__=16     /* Predefined */' '-D__TI_GNU_ATTRIBUTE_SUPPORT__=0        /* Predefined */' '-D__TI_STRICT_FP_MODE__=1    /* Predefined */' -D_INLINE=1 -D_OPTIMIZE_FOR_SPACE=1 -I/src -I/opt/ti/ccs/bios/packages/ti/bios/include -I/opt/ti/ccs/bios9/packages/ti/rtdx/include/c2000 -I/opt/ti/ccs/xdais/packages/ti/xdais -I/opt/ti/ccs/tools/compiler/c2000/bin/../include/ -- ./custom.c

These flags look specific to the mimicked compiler (I suppose --ti stands for Texas Instruments), so things are probably partially working.
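To compare a mimic cache file with the processed command line more systematically, the file can be parsed into its directive groups. The sketch below relies only on the layout of the example file posted above (a "-----" separator followed by prepend / append / predefine / ienvironment lines); the file format is undocumented, so the parser and its function names are assumptions.

```python
def parse_mimic_cache(text: str) -> dict[str, list[str]]:
    """Group the directive lines that follow the '-----' separator by keyword."""
    directives = {"prepend": [], "append": [], "predefine": [], "ienvironment": []}
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "-----":
            in_body = True
            continue
        if not in_body or not line:
            continue
        keyword, _, value = line.partition(" ")
        if keyword in directives:
            directives[keyword].append(value.strip())
    return directives


def predefines_as_flags(directives: dict[str, list[str]]) -> list[str]:
    """Render predefine entries the way they appear in the log: as -D arguments."""
    return ["-D" + value for value in directives["predefine"]]
```

Diffing predefines_as_flags(...) against the -D arguments in the "Processed command line" entry shows which cache entries actually reached the extractor.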

@TomShirley

Yes it does seem that the extractor with --mimic will indeed run the cl2000 compiler as a first pass to glean info like the -D arguments, but then I assume it runs the gcc next. Without a way to instruct the extractor to not use gcc for the actual compile step but rather use the provided compiler I might be at a dead end.

Regardless thanks for your helpful replies to get to this point 🙏

@aibaars
Contributor

aibaars commented Nov 15, 2023

Without a way to instruct the extractor to not use gcc for the actual compile step but rather use the provided compiler I might be at a dead end.

The extractor does not do the actual compile step. The CodeQL tracer intercepts the "exec" system call that would run the actual compile step. A typical tracer configuration would run the actual compile step first without changing any of its arguments, followed by an "extractor" call with arguments based on the ones from the intercepted "exec". As a result the build would run as it normally would, but in addition it also runs the extractor for each compilation step.

@TomShirley

I agree that the normal (non codeql) compile step is being run with the ti compiler, which originates from the command="" that's passed into codeql database create, however the output when running the extractor contains these sorts of errors:

>/root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic /opt/ti/ccs/tools/compiler/c2000/bin/cl2000 -I/src  -I/opt/ti/bios/packages/ti/bios/include -I/opt/ti/ccs/bio/packages/ti/rtdx/include/c2000 -I/opt/ti/ccs/xdais/packages/ti/xdais -I/opt/ti/ccs/tools/compiler/c2000/include  ./custom.c
...
[E 22:42:09 7145] Warning[extractor-c++]: In construct_message: "/opt/ti/ccs1220/bios/packages/ti/bios/include/log.h", line 33: error: identifier "Arg" is undefined
      Arg val1;
      ^
...

which are the same errors i get when running gcc:

>gcc -I/src  -I/opt/ti/bios/packages/ti/bios/include -I/opt/ti/ccs/bio/packages/ti/rtdx/include/c2000 -I/opt/ti/ccs/xdais/packages/ti/xdais -I/opt/ti/ccs/tools/compiler/c2000/include  ./custom.c
/opt/ti/ccs1220/bios_5_42_01_09/packages/ti/bios/include/log.h:90:80: error: unknown type name 'Arg'
   90 | extern Void LOG_event5(LOG_Handle log, Arg arg0, Arg arg1, Arg arg2, Arg arg3, Arg arg4);
...

whereas running the ti compiler with

>/opt/ti/ccs1220/ccs/tools/compiler/c2000_6.1.3/bin/cl2000 -v28 --include_path="/opt/ti/ccs/tools/compiler/c2000/include" --include_path="/src" --include_path="/opt/ti/ccs/bios/packages/ti/bios/include" --include_path="/opt/ti/ccs/bios/packages/ti/rtdx/include/c2000" --include_path="/opt/ti/ccs/xdais/packages/ti/xdais"  ./custom.c

Runs fine and compiles. So this leads me to believe that the extractor as part of doing the trap compilations is using gcc and not my passed in --mimic binary. Maybe that's not right but I can't seem to force the extractor to use the ti compiler for the trap compilation using --mimic

@aibaars
Contributor

aibaars commented Nov 16, 2023

I agree that the normal (non codeql) compile step is being run with the ti compiler, which originates from the command="" that's passed into codeql database create, however the output when running the extractor contains these sorts of errors:

I understand the confusion. If you look at the messages, they are similar to gcc's but not exactly the same. The extractor is actually a C/C++ compiler frontend, so it parses and type checks the source code as a normal C/C++ compiler would, and prints similar errors and warnings. The extractor has a ton of different flags to make it behave as closely as possible to the real compiler. For example, it has gcc, clang, and icc modes and many flags to toggle C/C++ language versions and features on and off.

Where is this Arg type defined? Is it in some header file of the cl2000 compiler, or is it some sort of built-in? You could try inserting additional include flags or -D definitions into the extractor's command to fix it. Note that it is merely a warning, so it might not be worth fixing at this point. The extractor is pretty resilient against errors and is able to carry on extracting information from the source files even if there are errors. The quality will degrade if there are too many errors, and "catastrophic errors" in the log are always bad (these indicate an unrecoverable error, causing the rest of a file to be ignored).

@TomShirley

ah that's interesting!

The missing Arg type is defined in std.h that's shipped with the ti ide. Which appears in their header as:

#if defined(_54_) || defined(_6x_)
typedef Int Arg;
#elif defined(_55_) || defined(_28_)
typedef void *Arg;
#else

I've since been able to solve these sorts of missing types by adding -D_TMS320XX to the extractor call (which defines internal TI constants so that the necessary types are defined at compile time).

The next issue I'm hitting is that TI has this sort of preprocessor code throughout many of their core header files:

#ifdef __cplusplus
extern "C" namespace std {
#endif 

It seems that the compiler the extractor calls is internally running in C++ mode (with this constant defined), and this breaks compilation: I'm building C code, so I need the compiler not to define it. Essentially, I need a way to instruct the extractor to run in C mode.

I've tried passing in the option -U__cpluscplus to the extractor but that doesn't seem to work. I've naively tried -D__cplusplus="#undefine __cplusplus" but the stdout has a message

[E 12:14:47 42445] Warning[extractor-c++]: In construct_message: Warning: "__cplusplus" is predefined; attempted redefinition ignored

getting closer..

@aibaars
Contributor

aibaars commented Nov 19, 2023

Getting closer indeed. There is probably some flag to toggle C or C++ mode. Not sure what it is though. You could try comparing a simple gcc vs g++ command and look in the build-tracer.log for ideas

@TomShirley

TomShirley commented Nov 19, 2023

ok there seems to be a way to do this via --mimic

 /root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic /usr/bin/gcc -I/path -D__PTRDIFF_T_TYPE__=long -D '__SIZE_T_TYPE__=unsigned long' -D__LARGE_MODEL__=1 -D_TMS320XX=1 -D__WCHAR_T_TYPE__=long ./custom.c

or without --mimic, the extractor will use --gcc to force C mode.
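For anyone scripting this, the command above can be assembled from lists rather than hand-quoted. This is a small sketch that reproduces only the --mimic variant shown above; the function name and parameters are made up, and the paths are the ones from this thread.

```python
import shlex


def mimic_gcc_command(extractor, mimic_compiler, include_dirs, defines, source):
    """Build the extractor argv: --mimic <compiler>, -I dirs, -D defines, source file."""
    command = [extractor, "--mimic", mimic_compiler]
    command += [f"-I{directory}" for directory in include_dirs]
    command += [f"-D{name}={value}" for name, value in defines.items()]
    command.append(source)
    return command


command = mimic_gcc_command(
    "/root/bin/codeql/codeql/cpp/tools/linux64/extractor",
    "/usr/bin/gcc",
    ["/path"],
    {
        "__PTRDIFF_T_TYPE__": "long",
        "__SIZE_T_TYPE__": "unsigned long",
        "__LARGE_MODEL__": "1",
        "_TMS320XX": "1",
        "__WCHAR_T_TYPE__": "long",
    },
    "./custom.c",
)
# shlex.join quotes the define that contains a space, for pasting into a shell.
printable = shlex.join(command)
```

Passing the list to subprocess.run avoids shell quoting issues with values such as "unsigned long".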
