Skip to content

Commit 686feb6

Browse files
committed
Fix1 for issues identified in peer review
Removes the unused '@sap/cds-dk` dependency from `cds/tools/package.json` file. Fixes typos in comments and documentation for the CDS extractor. Removes crossed-out wording from `autobuild.md` documentation and improves descriptions of goals.
1 parent deade1c commit 686feb6

File tree

6 files changed

+44
-41
lines changed

6 files changed

+44
-41
lines changed

extractors/README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,10 @@
22

33
## CodeQL CDS Extractor : Overview
44

5-
The CodeQL CDS Extractor is a specialized component designed to process and analyze Core Data Services (CDS) files used in SAP Cloud Application Programming (CAP) model applications. This extractor enables CodeQL's static analysis capabilities to detect security vulnerabilities, bugs, and quality issues in CDS files.
5+
The CodeQL CDS Extractor is a specialized component designed to process and analyze Core Data Services (CDS) files used in SAP Cloud Application Programming (CAP) model applications. This extractor expands CodeQL's static analysis capabilities to detect security vulnerabilities, bugs, and quality issues in CDS files.
66

77
Key capabilities of the extractor include:
8+
89
- Compiling `.cds` files to an intermediate JSON representation
910
- Handling SAP CAP dependencies and managing compiler versions
1011
- Integrating with the JavaScript extractor for comprehensive analysis

extractors/cds/tools/autobuild.md

Lines changed: 21 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -14,25 +14,24 @@ This document is meant to be a common reference and a project guide while the it
1414

1515
The current extractor for [CDS] is based on `index-files`, which has several limitations and challenges:
1616

17-
1. **Testability**
18-
19-
The current extractor is difficult to test, and especially difficult to troubleshoot when tests fail, because the current implementation lacks unit tests and relieas heavily on integration tests that are performed in a post-commit workflow that runs via GitHub Actions, which makes it more difficult to track errors back to the source of the problem and adds significant delay to the development process.
20-
21-
2. **Performance**
17+
1. **Performance**
2218

2319
The current extractor is slow and inefficient, especially when dealing with large projects or complex [CDS] files. This is due to the way `index-files` processes files, which can lead to long processing times and increased resource usage. There are several performance improvements that could be made to the extractor, but they are all related to avoid work that we either do not need to do or that has already been done.
2420

2521
- As one example of a performance problem, using the `index-files` approach means that we are provided with a list of all `.cds` files in the project and are expected to index them all, which makes sense for CodeQL (as we want our database to have a copy of every in-scope source code file) but is horribly inefficient from a [CDS] perspective as the [CDS] format allows for a single file to contain multiple [CDS] definitions. The extractor is expected to be able to handle this by parsing the declarative syntax of the `.cds` file in order to understand which other `.cds` files are to be imported as part of that top-level file, meaning that we are expected to avoid duplicate imports of files that are already (and only) used as library-style imports in top-level (project-level) [CDS] files. This is a non-trivial task, and the current extractor does not even try to parse the contents of the `.cds` files to determine which files are actually used in the project. Instead, it simply imports all `.cds` files that are found in the project, which can lead to duplicate imports and increased processing times.
2622

2723
- Another example of a performance problem is that the current `index-files`-based extractor spends a lot of time installing node dependencies because it runs a `npm install` command in every "CDS project directory" that it finds, which is every directory that contains a `package.json` file and either directly contains a `.cds` file (as a sibling of the `package.json` file) or contains some subdirectory that contains either a `.cds` file or a subdirectory that contains a `.cds` file. This means that the extractor will install these dependencies in a directory that we would rather not make changes in just to be able to use a specific version of `@sap/cds` and/or `@sap/cds-dk` (the dependencies that are needed to run the extractor). This also means that if we have five project that all use the same version of `@sap/cds` and/or `@sap/cds-dk`, we will install that version five separate times in five separate locations, which is both a waste of time and creates a cleanup challenge as the install makes changes to the `package-lock.json` file in each of those five project directories (and also makes changes to the `node_modules` subdirectory of each project directory).
2824

29-
3. **Modularity**
25+
2. **Precision**
26+
27+
The root-causes of the `Performance` problems can also cause CDS-specific CodeQL queries to produce false-positives in some cases.
28+
The `.cds` files for a given project must be parsed as a set of related configurations, rather than as independent definitions, in order to avoid false-positives in some CodeQL queries. For example:
3029

31-
~~The current extractor is mostly just one giant script, aka [index-files.js](./index-files.js), which is surrounded by a collection of small wrapper scripts (aka [index-files.sh](./index-files.sh) and [index-files.cmd](./index-files.cmd)) that are used to allow the JavaScript code to be run in different environments (i.e. Windows and Unix-like environments). While we cannot really get away from the wrapper scripts. we should refactor the "one giant script" (in a single `index-files.js` file) into a more modular design that allows us to break the extractor into smaller, more manageable pieces.~~
30+
- [bookshop/srv/admin-service.cds](https://github.com/SAP-samples/cloud-cap-samples/blob/main/bookshop/srv/admin-service.cds) is reported by [EntityExposedWithoutAuthn.ql](https://github.com/advanced-security/codeql-sap-js/blob/main/javascript/frameworks[…]hn-authz/EntityExposedWithoutAuthn/EntityExposedWithoutAuthn.ql) as unprotected. This result is actually a false-positive as the service (flagged in the query result) is annotated as `@requires: 'admin'` in a separate [bookshop/srv/access-control.cds](https://github.com/SAP-samples/cloud-cap-samples/blob/main/bookshop/srv/access-control.cds) file (from the same project).
3231

33-
4. **Maintainability**
32+
- Running the current implementation of the CDS extractor for the `bookshop` project will create `admin-service.cds.json` (from `admin-service.cds`) -- where the service is represented without access control; and will also create `access-control.cds.json` (from `access-control.cds`) -- which represent the service again but with access control.
3433

35-
~~The current implementation is lacking in terms of mandating consistent code style and best practices. For example, there are no linting rules applied or any scripts for applying consistent code style. This makes it difficult to maintain the code at a consistent level of quality, where it would be much better to have basic linting applied as a pre-commit task (i.e. to be performed in the developer's IDE). The current implementation also lacks documentation, which makes it difficult for new developers to understand how the extractor works and how to contribute to it.~~
34+
- In an improved CDS extractor, compiling the whole of the `bookshop` project together should allow us to produce a single `.cds.json` file -- with a single representation of the admin service that it is correctly annotated as having access control.
3635

3736
## Goals for the Future Extractor (using `autobuild`)
3837

@@ -41,43 +40,34 @@ The main goals for the `autobuild`-based [CDS] extractor are to:
4140
1. **Improve the Performance of Running the [CDS] Extractor on Large Codebases**:
4241
The performance problems with the current `index-files`-based [CDS] extractor are compounded when running the extractor on large codebases, where the duplicate import problem is magnified in large projects that make heavy use of library-style imports. The `autobuild`-based extractor will be able to avoid this problem by using a more efficient approach to parsing the `.cds` files and determining which files are actually used in the project. This will allow us to avoid duplicate imports and reduce processing times.
4342

44-
2. **Improve the Testability of the [CDS] Extractor**:
45-
The `autobuild`-based extractor will be designed to be more testable, with a focus on unit tests and integration tests that can be run in a pre-commit workflow. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. The new extractor will also be designed to be more modular, with a focus on breaking the code into smaller, more manageable pieces that can be tested independently.
43+
2. **Improve the Precision of Query Results for [CDS] Services**:
44+
The precision problems of the current [CDS] extractor are also compounded when running the extractor for complex [CAP] projects and/or large codebases, where a lack of project-aware-parsing has a cascading effect as some projects may be imported by other projects and/or may contain multiple `.cds` files that are related to each other. The `autobuild`-based extractor will be able to avoid this problem by using a more efficient approach to parsing the `.cds` files and determining which files are actually used in the project. This will allow us to avoid false-positives in some CodeQL queries and improve the precision of query results for [CDS] services.
4645

47-
All other goals are secondary to and/or in support of the above two goals.
46+
All other goals are secondary to and/or in support of the above goals.
4847

4948
## Expected Technical Changes
5049

5150
- The `autobuild.ts` script/code will need to be able to determine its own list of `.cds` files to process when given a "source root" directory to be scanned (recursively) for `.cds` files and will have to maintain some form of state while determining the most efficient way to process all of the applicable [CDS] statements without duplicating work. This will be done by using a combination of parsing the `.cds` files and using a cache to keep track of which files have already been processed. The cache will be stored in a JSON file that will be created and updated as the extractor runs. This will allow the extractor to avoid re-processing files that have already been processed, which will improve performance and reduce resource usage.
5251

53-
- Keep track of the unique set of `@sap/cds` and `@sap/cds-dk` dependency combinations that are used by any "project directory" found under the "source root" directory. Also, create a temporary directory structure for storing the `package.json`, `package-lock.json`, and `node_modules` subdirectory for each unique combination of `@sap/cds` and `@sap/cds-dk` dependencies. This will allow us to avoid installing the same version of these dependencies multiple times in different project directories, which will improve performance and reduce resource usage. The temporary directory structure will be created in a subdirectory of the "source root" directory, and will be cleaned up after the extractor has finished running. This will allow us to be much more efficient in terms of installing [CDS] compiler dependencies, much more explicit about which version of the [CDS] compiler we are using for a given (sub-)project, will allow us to avoid making changes to the `package.json` and `package-lock.json` files in the project directories, and will allow us to avoid installing the same version of these dependencies multiple times in different project directories.
52+
- Instead of installing node dependencies directly in each CDS project directory, the CDS extractor should keep track of the unique set of `@sap/cds` and `@sap/cds-dk` dependency combinations that are used by any "project" directory found under the "source root" directory. For each unique combination of `@sap/cds` and `@sap/cds-dk` dependencies, the CDS extractor should also create a (.hidden) directory structure to cache the associated `package.json`, `package-lock.json`, and `./node_modules/`. This will allow the CDS extractor to:
53+
- be much more efficient in terms of installing [CDS] compiler dependencies;
54+
- be much more explicit about which version of the [CDS] compiler we are using for a given (sub-)project;
55+
- avoid making changes to the `package.json` and `package-lock.json` and `node_modules/` within the project directories;
56+
- avoid installing the same version of these dependencies multiple times;
57+
- avoid installing project dependencies that we do not actually need for the purpose of running the [CDS] compiler;
58+
- reduce the overall time it takes to run the [CDS] extractor;
59+
- minimize and restrict any changes made on the system where the [CDS] extractor is run.
5460

5561
- Use a new `autobuild.ts` script as the main entry point for the extractor's TypeScript code, meaning that the build process will compile the TypeScript code in `autobuild.ts` to JavaScript code in `autobuild.js`, which will then be run as the main entry point for the extractor. Instead of `index-files.cmd` and `index-files.sh`, we will have wrapper scripts such as `autobuild.cmd` and `autobuild.sh` that will be used to run the `autobuild.js` script in different environments (i.e. Windows and Unix-like environments).
5662

5763
- The new [autobuild.ts](./autobuild.ts) script will be a kept as minimal as possible, with object-oriented code patterns used to encapsulate the functionality of the extractor in `.ts` files stored in a new `src` directory (project path would be `extractors/cds/tools/src`). This will allow us to break the extractor into smaller, more manageable pieces, and will also make it easier to test and maintain the code over time. The new `src` directory will contain all of the TypeScript code for the extractor, and will be organized into subdirectories based on functionality. For example, we might have a `parsers` subdirectory for parsing code, a `utils` subdirectory for utility functions, and so on. This will allow us to keep the code organized and easy to navigate.
5864

59-
- ~~Use TypeScript as the primary language for the extractor, rather than JavaScript. This will allow us to take advantage of TypeScript's type system and other features that make it easier to write and maintain code. Ultimately, we will still be using JavaScript when running the extractor, but we will use TypeScript to develop the extractor and then compile it to JavaScript for use in the CodeQL extractor. This will allow us to take advantage of TypeScript's type system and other features that make it easier to write, test, and maintain code. This will also allow us to use TypeScript's type system to catch errors at compile time rather than runtime, which will make the extractor more robust and easier to maintain.~~
60-
61-
- Add unit tests for everything that can be unit tested. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. We will use a combination of testing frameworks to test the extractor as part of the pre-commit build process. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. ~~Setting up such unit tests will require modifications to the `package.json` file to include the necessary dependencies and scripts for running the tests. We will also need to set up a testing framework, such as Jest or Mocha, to run the tests and report the results. To support all of this, we will create unit tests under a new `test` directory (project path would be `extractors/cds/tools/test`) that will contain all of the unit tests for the extractor. This will allow us to keep the tests organized and easy to navigate. The test directory will be organized into subdirectories based on functionality and mirroring the structure of the `src` directory. For example, if we add a `src/parsers/cdsParser.ts` file, we will also add a `test/parsers/cdsParser.test.ts` file that contains the unit tests for the `cdsParser.ts` file.~~ This will allow us to keep the tests organized and easy to navigate.
62-
63-
## Examples of Improved [CDS] Parsing
64-
65-
TODO
66-
67-
### Example 1: Parsing an `index.cds` [CDS] File with Multiple Definitions
68-
69-
```cds
70-
```
71-
72-
### Example 2: Parsing a `schema.cds` [CDS] File with Multiple Definitions
73-
74-
```cds
75-
```
76-
7765
## References
7866

7967
[CAP]: https://cap.cloud.sap/docs/about/
68+
[CDL]: https://cap.cloud.sap/docs/cds/cdl
8069
[CDS]: https://cap.cloud.sap/docs/cds/
8170

8271
- The [Cloud Application Programming][CAP] Model.
72+
- The [Conceptual Definition Language][CDL] [CDL] is a human-readable language for defining [CDS] models.
8373
- [Core Data Services][CDS] (CDS) in the Cloud Application Programming (CAP) Model.

extractors/cds/tools/index-files.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ then
2121
fi
2222

2323
# Set the _cwd variable to the present working directory (PWD) as the directory
24-
# from which this script was called, which we assume is the "sourc rooot" directory
24+
# from which this script was called, which we assume is the "source root" directory
2525
# of the project that to be scanned / indexed.
2626
_cwd="$PWD"
2727
_response_file_path="$1"

extractors/cds/tools/index-files.ts

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -74,10 +74,10 @@ if (
7474
}
7575

7676
// Read and validate response files.
77-
let responseFiles: string[] = [];
77+
let cdsFilePathsToProcess: string[] = [];
7878
try {
79-
responseFiles = readResponseFile(responseFile);
80-
if (!responseFiles.length) {
79+
cdsFilePathsToProcess = readResponseFile(responseFile);
80+
if (!cdsFilePathsToProcess.length) {
8181
const codeqlExe = osPlatform === 'win32' ? 'codeql.exe' : 'codeql';
8282
console.warn(
8383
`'${codeqlExe} database index-files --language cds' terminated early as response file '${responseFile}' is empty. This is because no CDS files were selected or found.`,
@@ -95,7 +95,7 @@ try {
9595
}
9696

9797
// Find all package.json directories that have a `@sap/cds` node dependency.
98-
const packageJsonDirs = findPackageJsonDirs(responseFiles);
98+
const packageJsonDirs = findPackageJsonDirs(cdsFilePathsToProcess);
9999

100100
// Install node dependencies in each directory.
101101
console.log('Pre-installing required CDS compiler versions ...');
@@ -107,7 +107,7 @@ const cdsCommand = determineCdsCommand();
107107
console.log('Processing CDS files to JSON ...');
108108

109109
// Compile each CDS file to JSON
110-
for (const rawCdsFilePath of responseFiles) {
110+
for (const rawCdsFilePath of cdsFilePathsToProcess) {
111111
try {
112112
// Use resolved path directly instead of passing through getArg
113113
const compilationResult = compileCdsToJson(rawCdsFilePath, sourceRoot, cdsCommand);

extractors/cds/tools/package.json

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,6 @@
1616
"test:coverage": "jest --coverage --collectCoverageFrom='src/**/*.ts'"
1717
},
1818
"dependencies": {
19-
"@sap/cds-dk": "^8.9.3",
2019
"child_process": "^1.0.2",
2120
"fs": "^0.0.1-security",
2221
"os": "^0.1.2",

extractors/cds/tools/src/cdsCompiler.ts

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -27,8 +27,21 @@ export function determineCdsCommand(): string {
2727
// files for that sub-directory / project.
2828
try {
2929
execFileSync('cds', ['--version'], { stdio: 'ignore' });
30-
} catch {
31-
// If 'cds' command is not available, use npx to run it
30+
} catch (error) {
31+
// Check if the error is specifically about the command not being found
32+
const errorMsg = String(error);
33+
if (errorMsg.includes('command not found')) {
34+
// If 'cds' command is not available, use npx to run it
35+
console.log('CDS command not found, falling back to npx...');
36+
} else if (errorMsg.includes('ENOENT') || errorMsg.includes('not recognized')) {
37+
// If the error is related to the command not being recognized, use npx
38+
console.log('CDS command not recognized, falling back to npx...');
39+
} else {
40+
// For other errors, log them but still fall back to npx
41+
console.warn(
42+
`WARN: determining CDS command failed with error: ${errorMsg}. Falling back to npx...`,
43+
);
44+
}
3245
cdsCommand = 'npx -y --package @sap/cds-dk cds';
3346
}
3447
return cdsCommand;

0 commit comments

Comments
 (0)