CDS Extractor Rewrite Phase 2 : Improve Performance and Precision #195

Open
wants to merge 29 commits into
base: main
29 commits
5bafe3d
Refactor CDS extractor for dedicated "cds" package
data-douser May 15, 2025
e4c1ff0
Fix CDS extractor findPackageJsonDirs
data-douser May 15, 2025
7e58207
Rename CDS extractor entrypoint and refactor args
data-douser May 16, 2025
af80a68
Add self-parser.test.ts for CDS extractor
data-douser May 16, 2025
ea19649
CDS extractor tests for compiler & packageManager
data-douser May 17, 2025
f9e41aa
Fix CDS extractor environment setup
data-douser May 18, 2025
6412400
Improve CDS extractor logging
data-douser May 18, 2025
009fe42
First attempt at project-aware CDS compilation
data-douser May 18, 2025
2c68c8a
Refactor CDS extractor for dedicated "cds" package
data-douser May 15, 2025
8e09758
Rename CDS extractor entrypoint and refactor args
data-douser May 16, 2025
1dd464b
Add self-parser.test.ts for CDS extractor
data-douser May 16, 2025
729dd2e
CDS extractor tests for compiler & packageManager
data-douser May 17, 2025
0c75133
Fix CDS extractor environment setup
data-douser May 18, 2025
09fa955
Improve CDS extractor logging
data-douser May 18, 2025
c865d94
First attempt at project-aware CDS compilation
data-douser May 18, 2025
d629d1e
Merge branch 'data-douser/cds-ts-rewrite-2' of github.com:data-douser…
data-douser May 18, 2025
5aa2d54
Update node dependencies for CDS extractor
data-douser May 18, 2025
86b5572
Update CDS extractor flowchart diagram
data-douser May 18, 2025
d6a99da
Merge branch 'main' into data-douser/cds-ts-rewrite-2
data-douser Jun 8, 2025
bc82815
Fixes CDS extractor project-aware file detection
data-douser Jun 9, 2025
6315a49
Remove "--parse" from CDS compile command
data-douser Jun 10, 2025
bc4a2cd
Merge branch 'advanced-security:main' into data-douser/cds-ts-rewrite-2
data-douser Jun 10, 2025
27743ba
Simplify CDS extractor logic and refactor
data-douser Jun 10, 2025
7a05f80
Fix project-aware CDS compile file paths
data-douser Jun 11, 2025
af066d7
Merge branch 'advanced-security:main' into data-douser/cds-ts-rewrite-2
data-douser Jun 11, 2025
0ba67d5
Fix code-scanning alerts for insecure tmp files
data-douser Jun 11, 2025
0f1ac9e
Improve testing of CDS extractor graph
data-douser Jun 11, 2025
aaab73b
Update CDS extractor node dependencies
data-douser Jun 11, 2025
e4bbc1f
Implement cdsExtractorLog for consistent logging
data-douser Jun 11, 2025
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -71,3 +71,5 @@ tmp/
**.testproj
dbs
*.cds.json
.cds-extractor-cache

50 changes: 32 additions & 18 deletions extractors/README.md
@@ -30,16 +30,20 @@ pre-finalize.sh`"]
 JSE[[javascript extractor]]
 DTRAC[codeql database<br>trace-command]
 SPF[[pre-finalize.sh]]
-DIDX[codeql database index-files<br> --language=cds<br>--include-extension=.cds]
-SIF[[index-files.sh]]
-SIT[[index-files.ts/js]]
-NPM[[npm install & build]]
-DETS[[Determine CDS command]]
-FIND[[Find package.json dirs]]
-INST[[Install dependencies]]
-CC[[cds compiler]]
+ABCMD[[autobuild.sh/cmd]]
+ABT[[cds-extractor.ts/js]]
+ENV[[setup & validate<br>environment]]
+PDG[[build project<br>dependency graph]]
+INSTC[[install dependencies<br>with caching]]
+PROC[[process CDS files<br>to JSON]]
+PMAP[[project-aware<br>dependency resolution]]
+FIND[[find project for<br>CDS file]]
+CDCMD[[determine CDS<br>command for project]]
+COMP[[compile CDS<br>to JSON]]
 CDJ([.cds.json files])
+FILT[[configure LGTM<br>index filters]]
 JSA[[javascript extractor<br>autobuild script]]
+DIAG[[add compilation<br>diagnostics]]
 TF([CodeQL TRAP files])
 DBF[codeql database finalize<br> -- /path/to/database]

@@ -54,20 +58,30 @@ pre-finalize.sh`"]
 JSE ==> |run autobuild within<br>the javascript extractor| DTRAC

 DTRAC ==> |run the build --command| SPF
-SPF ==> |run codeql index-files<br>for CDS files| DIDX
-DIDX ==> |invoke script via<br>--search-path| SIF
-SIF ==> |runs TypeScript version<br>after npm install| NPM
-NPM ==> |executes compiled<br>index-files.js| SIT
+SPF ==> |run autobuilder<br>for CDS files| ABCMD
+ABCMD ==> |runs TypeScript version<br>of CDS extractor| ABT

-SIT ==> |finds project directories<br>with package.json| FIND
-FIND ==> |install CDS dependencies<br>in project directories| INST
-SIT ==> |determines which<br>cds command to use| DETS
-DETS ==> |processes each CDS file| CC
+ABT ==> |setup and validate<br>environment first| ENV
+ABT ==> |build project dependency<br>graph for source root| PDG
+PDG ==> |analyze CDS projects<br>structure & relationships| PMAP
+
+ABT ==> |efficiently install<br>required dependencies| INSTC
+INSTC ==> |use cached approach for<br>dependency installation| PMAP
+
+ABT ==> |process each CDS file<br>to generate JSON files| PROC
+PROC ==> |find which project<br>contains this CDS file| FIND
+FIND ==> |uses project-aware<br>dependency resolution| PMAP
+FIND ==> |determine appropriate<br>CDS command for project| CDCMD
+
+CDCMD ==> |compile CDS file to JSON<br>with project context| COMP
+COMP ==> |generate JSON representation<br>with project awareness| CDJ
+COMP --x |if compilation fails,<br>report diagnostics| DIAG
+DIAG -.-> |diagnostics stored<br>in database| DB

-CC ==> |compile .cds files to<br>create .cds.json files| CDJ
 CDJ -.-> |stored in same location<br>as original .cds files| DB

-SIT ==> |configures extraction<br>filters for JSON files| JSA
+ABT ==> |configure extraction<br>filters for JSON files| FILT
+ABT ==> |run JavaScript extractor<br>to process JSON files| JSA
 JSA ==> |processes .cds.json files<br>via javascript extractor| CDJ

 CDJ ==> |javascript extractor<br>generates TRAP files| TF
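
Review note: the `FIND` step in the diagram ("find which project contains this CDS file") can be illustrated with a longest-prefix match over the known project directories. This is a minimal sketch and not the PR's `findProjectForCdsFile`. The name `findOwningProject` is hypothetical, and the sketch assumes every path is relative to the source root and uses forward slashes.

```typescript
// Hypothetical helper, for illustration only (not the PR's implementation).
// Assumption: `cdsFile` and every entry of `projectDirs` are paths relative
// to the source root, written with forward slashes.
function findOwningProject(
  cdsFile: string,
  projectDirs: Iterable<string>,
): string | undefined {
  let bestDir: string | undefined;
  let bestPrefixLength = -1;

  for (const dir of projectDirs) {
    // Treat '.' or '' as the source root, which contains every file.
    const prefix = dir === '.' || dir === '' ? '' : `${dir.replace(/\/+$/, '')}/`;

    // Prefer the longest matching prefix so nested projects win over parents.
    if (cdsFile.startsWith(prefix) && prefix.length > bestPrefixLength) {
      bestDir = dir;
      bestPrefixLength = prefix.length;
    }
  }

  return bestDir;
}
```

Appending the trailing `/` before comparing keeps a directory such as `apps` from claiming files under a sibling such as `apps2`.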
3 changes: 3 additions & 0 deletions extractors/cds/tools/.gitignore
@@ -1,3 +1,6 @@
# Ignore files created just for debugging the CDS extractor.
debug/

# Ignore the entire "out" directory as this is for the .js and .js.map files
# which are generated by the `tsc` build process. In the current project config,
# we require the platform-specific "index-files" shell/cmd script to run the
182 changes: 182 additions & 0 deletions extractors/cds/tools/cds-extractor.ts
@@ -0,0 +1,182 @@
import {
  buildCdsProjectDependencyGraph,
  compileCdsToJson,
  determineCdsCommand,
  findProjectForCdsFile,
} from './src/cds';
import { CdsProjectMapWithDebugSignals } from './src/cds/parser/types';
import { runJavaScriptExtractor } from './src/codeql';
import { addCompilationDiagnostic } from './src/diagnostics';
import { configureLgtmIndexFilters, setupAndValidateEnvironment } from './src/environment';
import { cdsExtractorLog, setSourceRootDirectory } from './src/logging';
import { installDependencies } from './src/packageManager';
import { RunMode } from './src/runMode';
import { validateArguments } from './src/utils';

// Validate arguments to this script.
// The first argument we pass is the expected run mode, which will be extracted from process.argv[2]
// This will determine the correct minimum argument count for validation
const validationResult = validateArguments(process.argv, RunMode.AUTOBUILD);
if (!validationResult.isValid) {
  console.warn(validationResult.usageMessage);
  // Exit with an error code on invalid use of this script.
  process.exit(1);
}

// Get the validated and sanitized arguments
const { runMode, sourceRoot } = validationResult.args!;

// Initialize the unified logging system with the source root directory
setSourceRootDirectory(sourceRoot);

// Check for autobuild mode
if (runMode === (RunMode.AUTOBUILD as string)) {
  cdsExtractorLog('info', 'Autobuild mode is not implemented yet.');
  process.exit(1);
}

// Setup the environment and validate all requirements first, before changing directory
// This ensures we can properly locate the CodeQL tools
const {
  success: envSetupSuccess,
  errorMessages,
  codeqlExePath,
  autobuildScriptPath,
  platformInfo,
} = setupAndValidateEnvironment(sourceRoot);

if (!envSetupSuccess) {
  const codeqlExe = platformInfo.isWindows ? 'codeql.exe' : 'codeql';
  cdsExtractorLog(
    'warn',
    `'${codeqlExe} database index-files --language cds' terminated early due to: ${errorMessages.join(
      ', ',
    )}.`,
  );
  // Exit with an error code when environment setup fails.
  process.exit(1);
}

// Force this script, and any process it spawns, to use the project (source) root
// directory as the current working directory.
process.chdir(sourceRoot);

cdsExtractorLog(
  'info',
  `CodeQL CDS extractor using run mode '${runMode}' for scan of project source root directory '${sourceRoot}'.`,
);

// Using the new project-aware approach to find CDS projects and their dependencies
cdsExtractorLog('info', 'Detecting CDS projects and analyzing their structure...');

// Build the project dependency graph using the project-aware parser
// Pass the script directory (__dirname) to support debug-parser mode internally
const projectMap = buildCdsProjectDependencyGraph(sourceRoot, runMode, __dirname);

// Cast to the interface with debug signals to properly handle debug mode
const typedProjectMap = projectMap as CdsProjectMapWithDebugSignals;

// Check if we're in debug-parser mode and should exit (based on signals from buildCdsProjectDependencyGraph)
if (typedProjectMap.__debugParserSuccess) {
  cdsExtractorLog('info', 'Debug parser mode completed successfully.');
  process.exit(0);
} else if (typedProjectMap.__debugParserFailure) {
  cdsExtractorLog('warn', 'No CDS projects found. Cannot generate debug information.');
  process.exit(1);
}

// Install dependencies of discovered CAP/CDS projects
cdsExtractorLog(
  'info',
  'Ensuring dependencies are installed in cache for required CDS compiler versions...',
);
const projectCacheDirMap = installDependencies(projectMap, sourceRoot, codeqlExePath);

const cdsFilePathsToProcess: string[] = [];

cdsExtractorLog('info', 'Extracting CDS files from discovered projects...');

// Use the project map to collect all `.cds` files from each project.
// We want to "extract" all `.cds` files from all projects so that we have a copy
// of each `.cds` source file in the CodeQL database.
for (const [, project] of projectMap.entries()) {
  cdsFilePathsToProcess.push(...project.cdsFiles);
}

cdsExtractorLog('info', 'Processing CDS files to JSON ...');

// Collect files that need compilation, handling project-level compilation
const cdsFilesToCompile: string[] = [];
const projectsForProjectLevelCompilation = new Set<string>();

for (const [projectDir, project] of projectMap.entries()) {
  if (project.cdsFilesToCompile.includes('__PROJECT_LEVEL_COMPILATION__')) {
    // This project needs project-level compilation
    projectsForProjectLevelCompilation.add(projectDir);
    // We'll only compile one file per project to trigger project-level compilation
    // Use the first CDS file as a representative
    if (project.cdsFiles.length > 0) {
      cdsFilesToCompile.push(project.cdsFiles[0]);
    }
  } else {
    // Normal individual file compilation
    cdsFilesToCompile.push(...project.cdsFilesToCompile);
  }
}

cdsExtractorLog(
  'info',
  `Found ${cdsFilePathsToProcess.length} total CDS files, ${cdsFilesToCompile.length} files to compile (${projectsForProjectLevelCompilation.size} project-level compilations)`,
);

// Evaluate each `.cds` source file that should be compiled to JSON.
for (const rawCdsFilePath of cdsFilesToCompile) {
  try {
    // Find which project this CDS file belongs to, to use the correct cache directory
    const projectDir = findProjectForCdsFile(rawCdsFilePath, sourceRoot, projectMap);
    const cacheDir = projectDir ? projectCacheDirMap.get(projectDir) : undefined;

    // Determine the CDS command to use based on the cache directory for this specific file
    const cdsCommand = determineCdsCommand(cacheDir);

    // Use resolved path directly instead of passing through getArg
    // Pass the project dependency information to enable project-aware compilation
    const compilationResult = compileCdsToJson(
      rawCdsFilePath,
      sourceRoot,
      cdsCommand,
      cacheDir,
      projectMap,
      projectDir,
    );

    if (!compilationResult.success && compilationResult.message) {
      cdsExtractorLog(
        'error',
        `adding diagnostic for source file=${rawCdsFilePath} : ${compilationResult.message} ...`,
      );
      addCompilationDiagnostic(rawCdsFilePath, compilationResult.message, codeqlExePath);
    }
  } catch (errorMessage) {
    cdsExtractorLog(
      'error',
      `adding diagnostic for source file=${rawCdsFilePath} : ${String(errorMessage)} ...`,
    );
    addCompilationDiagnostic(rawCdsFilePath, String(errorMessage), codeqlExePath);
  }
}

// Configure the "LGTM" index filters for proper extraction.
configureLgtmIndexFilters();

// Run CodeQL's JavaScript extractor to process the compiled JSON files.
const extractorResult = runJavaScriptExtractor(sourceRoot, autobuildScriptPath, codeqlExePath);
if (!extractorResult.success && extractorResult.error) {
  cdsExtractorLog('error', `Error running JavaScript extractor: ${extractorResult.error}`);
}

// Use the `cds-extractor.js` name in the log message as that is the name of the script
// that is actually run by the `codeql database index-files` command. This TypeScript
// file is where the code/logic is edited/implemented, but the runnable script is
// generated by the TypeScript compiler and is named `cds-extractor.js`.
console.log(`Completed run of cds-extractor.js script for CDS extractor.`);
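
Review note: the file-selection loop in `cds-extractor.ts` could be isolated into a pure function, which would make the `__PROJECT_LEVEL_COMPILATION__` branch straightforward to unit test. The following is a minimal sketch under a simplified project shape. `CdsProjectSketch`, `CompilationPlan`, and `planCompilation` are hypothetical names that do not appear in the PR; only the sentinel string and the selection rule are taken from the loop in this file.

```typescript
// Hypothetical, simplified project shape (the PR's real types live under
// './src/cds/parser/types' and may differ).
interface CdsProjectSketch {
  cdsFiles: string[];
  cdsFilesToCompile: string[];
}

// Hypothetical result type for this sketch.
interface CompilationPlan {
  filesToCompile: string[];
  projectLevelDirs: Set<string>;
}

// Sentinel value copied from the loop in cds-extractor.ts.
const PROJECT_LEVEL_COMPILATION = '__PROJECT_LEVEL_COMPILATION__';

function planCompilation(projectMap: Map<string, CdsProjectSketch>): CompilationPlan {
  const filesToCompile: string[] = [];
  const projectLevelDirs = new Set<string>();

  for (const [projectDir, project] of projectMap) {
    if (project.cdsFilesToCompile.includes(PROJECT_LEVEL_COMPILATION)) {
      projectLevelDirs.add(projectDir);
      // One representative file is used to trigger project-level compilation.
      if (project.cdsFiles.length > 0) {
        filesToCompile.push(project.cdsFiles[0]);
      }
    } else {
      // Otherwise each listed file is compiled individually.
      filesToCompile.push(...project.cdsFilesToCompile);
    }
  }

  return { filesToCompile, projectLevelDirs };
}
```

A test can then build a small `Map` by hand and assert on the returned plan without touching the file system or the CDS compiler.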