CDS Extractor Rewrite Phase 2 : Improve Performance and Precision #195
This commit:
- Implements an initial version of a project-aware CDS parser.
- Creates a dedicated "cds" package at "extractors/cds/tools/src/cds".
- Converts existing unit tests to use the new path for functions related to parsing and/or compiling .cds files.
This commit:
- fixes a typo in a comment, as identified in a previous PR ( advanced-security#188 );
- updates the logic of the CDS extractor's `findPackageJsonDirs` function;
- fixes a regression in the CDS extractor where a "project directory" was not properly recognized when its path was the same as the "source root" directory for the CDS extractor scan;
- adds unit tests to cover edge cases identified for the `findPackageJsonDirs` function.
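The regression described above (a project directory equal to the source root) can be illustrated with a minimal sketch. This is not the extractor's actual `findPackageJsonDirs`; the name `findPackageJsonDirsSketch`, its parameters, and the injectable `hasPackageJson` predicate are assumptions made for illustration. The point it shows is that the upward walk must check the source root itself before stopping.

```typescript
import { existsSync } from "node:fs";
import { dirname, isAbsolute, join, relative, resolve, sep } from "node:path";

// Hypothetical sketch: NOT the extractor's actual findPackageJsonDirs.
// For each file, walk upward to the nearest directory that has a package.json,
// never leaving the source root, and treating the root itself as a candidate.
function findPackageJsonDirsSketch(
  filePaths: string[],
  sourceRoot: string,
  // Injectable for testing; by default checks the real file system.
  hasPackageJson: (dir: string) => boolean = (dir) =>
    existsSync(join(dir, "package.json")),
): Set<string> {
  const root = resolve(sourceRoot);
  const isInsideRoot = (dir: string): boolean => {
    const rel = relative(root, dir);
    return rel === "" || (rel.split(sep)[0] !== ".." && !isAbsolute(rel));
  };

  const found = new Set<string>();
  for (const filePath of filePaths) {
    let dir = dirname(resolve(root, filePath));
    while (isInsideRoot(dir)) {
      if (hasPackageJson(dir)) {
        found.add(dir); // nearest enclosing project directory wins
        break;
      }
      if (dir === root) break; // the root has been checked; do not go higher
      dir = dirname(dir);
    }
  }
  return found;
}
```

With a `package.json` only at the source root, every `.cds` file under it maps to the root itself, which is the case the commit reports as previously unrecognized.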
Renames the entrypoint to the CDS extractor script and refactors its arguments in order to support using different "run modes" for the extractor, including:
- "autobuild" : work-in-progress, just a stub right now;
- "debug-parser" : used for debugging CDS project & file parsing;
- "index-files" : legacy mode, useful for backwards compatibility;

Updates the usage (help) message for the script to represent the required arguments for each of the currently planned run modes.

Adds support for the "debug-parser" run mode, which writes debug output to a file under the `extractors/cds/tools/out/debug/` directory. Useful for the in-progress rewrite of the CDS extractor, which aims to make the extractor faster to run and to yield a CodeQL database that allows for high-precision query results for CDS projects/queries.
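A minimal sketch of how a run-mode argument might be validated is shown below. The three mode names come from the commit message; the function name `parseRunMode`, the error wording, and the assumption that the mode arrives as a single string are illustrative only and do not describe the real argument order of `cds-extractor.ts`.

```typescript
// Run modes named in the commit message above.
const RUN_MODES = ["autobuild", "debug-parser", "index-files"] as const;
type RunMode = (typeof RUN_MODES)[number];

// Hypothetical helper: narrows a raw CLI argument to a known run mode,
// or fails with a usage-style error listing the accepted values.
function parseRunMode(arg: string | undefined): RunMode {
  const match = RUN_MODES.find((mode) => mode === arg);
  if (match === undefined) {
    throw new Error(
      `Invalid run mode '${arg ?? ""}'. Expected one of: ${RUN_MODES.join(", ")}`,
    );
  }
  return match;
}
```

Typing the modes as a union lets the compiler check that every mode is handled wherever the script later branches on it.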
Adds extended unit tests for the "parser" component of the CDS extractor, using the CDS projects nested under this repository's `javascript/frameworks/cap/test/queries` directory as testing targets and reference points for test cases.
Adds more extensive unit tests of CDS extractor code related to the use of the `cds` compiler. Adds unit tests for CDS extractor functions in "projectMapping.ts".
Fixes the setup of the CDS extractor environment to ensure that the codeql CLI can be reliably found and to avoid duplicate runs of the CDS parser's graph building process for "debug-parser" versus other run modes.
Cleans up DEBUG logging and improves existing CDS extractor logging in order to provide more useful indications of the CDS compiler version used to compile a given `*.cds.json` file.
Initial attempt to use the `cds compile` CLI command in a way that allows for de-duplication of individual `.cds` files that are already included by another `.cds` file in the project.
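The de-duplication idea can be sketched independently of the `cds` CLI: given a map from each `.cds` file to the files it imports, only files that no other file imports need to be passed to the compiler. The function name `findRootCdsFiles` and the shape of the import map are assumptions for illustration, not the extractor's actual data model.

```typescript
// Hypothetical sketch: pick the "top-level" .cds files of one project, i.e.
// those that are not imported by any other file in the same project.
// `imports` maps a project-relative file path to the paths it imports.
function findRootCdsFiles(imports: Map<string, string[]>): string[] {
  const importedByOthers = new Set<string>();
  imports.forEach((targets, file) => {
    for (const target of targets) {
      if (target !== file) importedByOthers.add(target); // ignore self-imports
    }
  });
  return Array.from(imports.keys())
    .filter((file) => !importedByOthers.has(file))
    .sort();
}

// Example: app/index.cds pulls in the service, which pulls in the schema,
// so only app/index.cds would need to be compiled directly.
const exampleImports = new Map<string, string[]>([
  ["app/index.cds", ["srv/service.cds"]],
  ["srv/service.cds", ["db/schema.cds"]],
  ["db/schema.cds", []],
]);
const exampleRoots = findRootCdsFiles(exampleImports);
```

One caveat of this simple approach: if every file in a project takes part in an import cycle, no root is found, so a real implementation would need a fallback for that case.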
…/codeql-sap-js into data-douser/cds-ts-rewrite-2
Updates the mermaid flowchart for the CDS extractor in order to reflect recent changes to how the CDS extractor actually works.
Fixes detection of .cds file in CDS projects by ensuring that "node_modules" subdirectories are explicitly ignored and "srv" and "db" subdirectories are explicitly included. Migrates some logic from cds-extractor.ts (entrypoint) script to testable functions under extractors/cds/tools/src/ directory. Adds and improves unit tests related to code changes from this commit.
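As a rough illustration of the file-detection rule, the sketch below filters a list of project-relative paths so that `.cds` files under any `node_modules` directory are dropped while files under `srv` and `db` are kept. The function name `selectCdsFiles` and the choice to operate on an already-collected path list are assumptions; the extractor's real detection logic may walk the file system differently.

```typescript
// Hypothetical sketch: keep .cds files, excluding anything that lives under a
// node_modules directory at any depth. Handles both "/" and "\" separators.
function selectCdsFiles(relativePaths: string[]): string[] {
  return relativePaths.filter((path) => {
    const segments = path.split(/[\\/]/);
    return path.endsWith(".cds") && segments.indexOf("node_modules") === -1;
  });
}

const examplePaths = [
  "srv/service.cds",
  "db/schema.cds",
  "package.json",
  "node_modules/@sap/cds/common.cds",
  "app/node_modules/some-dep/index.cds",
];
const selectedCdsFiles = selectCdsFiles(examplePaths);
```

Matching on whole path segments rather than on a substring avoids excluding a directory that merely contains `node_modules` as part of a longer name.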
Removes an unintended change in CDS compile (to .cds.json) behavior due to the (mis)use of the "--parse" command. Fixes a regression in the expected query results in at least one case: `javascript/frameworks/cap/src/sensitive-exposure/SensitiveExposure.ql`
Refactors cds extractor `src/cds/compiler` and `src/cds/parser` packages for improved maintainability. Simplifies the main logic of the CDS extractor such that we always build a graph that maps CDS projects to their imports / dependencies, which is part of the longer process of deprecating the "index-files" run mode of the CDS extractor (in favor of autobuild, eventually). Attempts to fix CDS file and project parsing for test projects such as: `javascript/frameworks/cap/test/queries/loginjection/log-injection-without-protocol-none`
Fixes a regression where the project base directory was being used to set the `cwd` of the process spawned for running the CDS compiler for "project-aware" compilation. Adds unit tests to ensure the `cwd` is always set to the value of the `sourceRoot` directory. Further refactoring of the `cds/compiler` and `cds/parser` packages within the source code of the CDS extractor. This commit is expected to actually cause more problems with existing queries, despite fixing the relative-file-path problem / regression. Some changes to existing CodeQL queries and/or expected results may be required as, at this point, the JSON data generated by the CDS compiler (via the CDS extractor) seems valid.
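The `cwd` fix can be sketched as a small planning function: file arguments are expressed relative to the source root, and the spawned process is given the source root (not the project directory) as its working directory. The name `planProjectCompile`, the `CompileInvocation` shape, and the bare `cds compile <files>` argument list are illustrative assumptions; the extractor's actual command line and options are not shown in this PR page.

```typescript
import { relative, resolve } from "node:path";

// Hypothetical description of a compiler invocation; not the extractor's type.
interface CompileInvocation {
  command: string;
  args: string[];
  options: { cwd: string };
}

// Plans a project-aware compile so that cwd is always the source root and all
// file arguments are relative to that same directory.
function planProjectCompile(
  sourceRoot: string,
  projectDir: string, // relative to sourceRoot
  cdsFiles: string[], // relative to projectDir
): CompileInvocation {
  const root = resolve(sourceRoot);
  const fileArgs = cdsFiles.map((file) =>
    relative(root, resolve(root, projectDir, file)),
  );
  return {
    command: "cds",
    args: ["compile", ...fileArgs],
    options: { cwd: root }, // source root, never the project directory
  };
}
```

The returned `options` object could be passed to `spawnSync` from `node:child_process`. Keeping `cwd` and the path base identical is what makes the relative file paths in the compiler output consistent across projects.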
Pull Request Overview
This PR implements the Phase 2 rewrite of the CDS extractor to improve performance and precision by introducing a project-aware parsing and compilation workflow, and by unifying scripts into a single `cds-extractor.ts` entry point with multiple run modes.
- Added a new `cds-extractor.ts` script with `index-files`, `debug-parser`, and `autobuild` modes
- Refactored parsing and compilation logic into modular `src/cds/parser` and `src/cds/compiler` packages
- Updated shell and batch wrappers, ESLint config, `package.json`, and documentation to reference the new script and modes
Reviewed Changes
Copilot reviewed 46 out of 46 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
extractors/cds/tools/src/cdsCompiler.ts | Removed legacy compiler utility in favor of new modules |
extractors/cds/tools/src/cds/parser/types.ts | Added parser data types for entities, imports, services |
extractors/cds/tools/src/cds/parser/index.ts | Exported parser APIs |
extractors/cds/tools/src/cds/parser/graph.ts | Implemented project dependency graph builder |
extractors/cds/tools/src/cds/parser/debug.ts | Added debug-info output for parser |
extractors/cds/tools/src/cds/index.ts | Re-exported compiler and parser |
extractors/cds/tools/src/cds/compiler/version.ts | Added function to retrieve CDS compiler version |
extractors/cds/tools/src/cds/compiler/types.ts | Extended compilation result type |
extractors/cds/tools/src/cds/compiler/project.ts | Added project lookup helper |
extractors/cds/tools/src/cds/compiler/index.ts | Exported compiler APIs |
extractors/cds/tools/src/cds/compiler/compile.ts | Core project-aware and individual compile logic |
extractors/cds/tools/src/cds/compiler/command.ts | Updated determineCdsCommand to accept cache dir |
extractors/cds/tools/package.json | Updated main script, dependencies, and scripts |
extractors/cds/tools/index-files.ts | Removed legacy TypeScript entry |
extractors/cds/tools/index-files.sh | Renamed to use cds-extractor.js and added run-mode |
extractors/cds/tools/index-files.cmd | Same updates for Windows batch wrapper |
extractors/cds/tools/eslint.config.mjs | Refactored imports and updated file patterns |
extractors/cds/tools/cds-extractor.ts | New unified entry point for all run modes |
extractors/cds/tools/.gitignore | Ignored debug output |
extractors/README.md | Updated diagram to reflect new script flow |
Comments suppressed due to low confidence (1)
extractors/cds/tools/src/cds/parser/graph.ts:22
- Add unit tests for `buildCdsProjectDependencyGraph` to cover scenarios with multiple nested projects and various import types, ensuring parser accuracy and preventing regressions.
`export function buildCdsProjectDependencyGraph(`
Fixes newly introduced code-scanning alerts due to insecure use of files created under the system `/tmp/` directory in some recently implemented unit tests.
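One common way to avoid this class of alert in Node.js tests is to create a uniquely named directory with `fs.mkdtempSync` instead of writing to a fixed path under `/tmp/`. The sketch below shows that pattern. Whether the commit used exactly this approach is not stated here, and the helper name `withTempDir` is hypothetical.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical test helper: runs `fn` with a freshly created, uniquely named
// temporary directory and removes that directory afterwards, even on failure.
function withTempDir<T>(prefix: string, fn: (dir: string) => T): T {
  // mkdtempSync appends random characters to the prefix, so the final path
  // cannot be predicted (and pre-created) by another local user.
  const dir = mkdtempSync(join(tmpdir(), prefix));
  try {
    return fn(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage: write a fixture file inside the private directory.
const fixtureExisted = withTempDir("cds-extractor-test-", (dir) => {
  const fixture = join(dir, "model.cds");
  writeFileSync(fixture, "entity Books { key ID : Integer; }\n");
  return existsSync(fixture);
});
```

Cleaning up in `finally` keeps repeated test runs from accumulating stale fixtures in the system temp directory.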
cds-extractor.debug.log-injection-without-protocol-none.tgz The attached From what I can tell, we have the same data in the "old" versus "new", except that the "new" representation collapses all of that data into 1 file instead of 3.
I noticed some tool-level errors related to node dependency installation failures. I suspect that many of these failures may actually be due to outdated (now deprecated) dependency versions in the associated project's I also noticed that these errors do not seem to prevent (this version of) the CDS extractor from continuing to try to use the This seems like odd behavior, but I am not sure it is wrong. The purpose of (code scanning) diagnostic errors (afaik) is to indicate that a problem was encountered that may cause some code results to be missing, and we probably want to make a best effort at compiling
Creates tests and code for a new, unified `cdsExtractorLog` function and integrates this function throughout the CDS extractor code. Updates `test/jest.setup.ts` config for the CDS extractor in order to simplify setup of the source root directory config required by the new `cdsExtractorLog` function.
For the expected Code Scanning results that are currently missing for the The
After further research, I think the Code Scanning results may be missing because the For testing of this
This PR implements the planned "Phase 2" of the full rewrite of the CDS extractor, focusing on improving performance and precision. It introduces significant changes to the CodeQL CDS extractor, including a major refactor of the extraction process, updates to scripts, and improvements to configuration and debugging. Throughout this multi-phase rewrite process, the approach has been documented in the `extractors/cds/tools/autobuild.md` file.

The changes in this PR do not fully implement the "autobuild" run mode for the CDS extractor, but they get reasonably close. New "run modes" were added to the renamed `cds-extractor.ts` script (formerly `index-files.ts`), and the arguments to the script have been updated to allow for run modes such as `index-files` (legacy), `debug-parser` (new), and `autobuild` (planned / WIP).

While staying within the limitations of the `index-files` approach, the changes in this PR are an attempt to integrate parsing and compiling of `.cds` files in a manner that is "project aware", meaning that we try to compile only the top-level `.cds` files in an effort to avoid duplication of both compilation work and indexed `.cds.json` files.

Key Changes:

New Features and Functionality:
- `cds-extractor.ts`: Added a new script to handle CDS file processing, including project dependency graph building, environment setup, and integration with CodeQL tools. This script replaces the previous `index-files.js` script.
- `cds-extractor.ts` script with different "run mode" values, including `autobuild`, `debug-parser`, and `index-files`.
- `cds-extractor.ts` script has been rewritten to include features like project dependency graph building, project-aware compilation, and diagnostic handling for CDS files. This enables more efficient and context-aware processing of CDS files.
- `debug-parser` run mode of the `cds-extractor` (node) script.

Script Updates:
- `index-files.cmd` updates: Updated references from `index-files.js` to `cds-extractor.js`, added `_run_mode` parameter, and adjusted logging and execution commands to align with the new script.
- `index-files.sh` updates: Similar updates as the batch script, including parameter additions and script name changes for consistency.

Configuration Improvements:
- `eslint.config.mjs`: Refactored imports for better readability, updated rules and plugin configurations, and added comments to clarify TypeScript and JavaScript-specific settings.

Miscellaneous:
- `.gitignore` update: Added an entry to ignore `debug/` files created during debugging of the CDS extractor.

Documentation Updates:
- `README.md` now reflects the new `cds-extractor` workflow, replacing outdated references to `index-files` with the new process and steps for project-aware compilation.