@lcartey lcartey commented Aug 27, 2025

Description

This pull request improves the identification of reserved names across the supported languages. I originally worked on this last year, and am putting it up now as a draft PR for visibility. The main contributions are:

  • Adopting extensible predicates and data extension files to store the list of names defined by a given language standard - models-as-data is a clearer, more performant and more extensible way to describe the list of names defined in a language standard.
  • Creating a data extension file generator for C standard library names - this generator takes content copied from the C standard documentation and processes it to produce a data extension file with the identified APIs. As some cases are not fully scrapable from the documentation, we augment the generated output with a manually maintained file containing a few additional APIs.
  • Updating the existing C++ generator to produce data extension files - we had an existing generator for C++ which produced a hard-coded .qll file. This has been adapted to produce data extension files instead. It uses a "real" codebase to identify which APIs are available.
  • Modifying the C++ data extension generator to filter out more internal and non-relevant APIs - many of the APIs previously produced by the generator are not considered reserved names by the standard - for example, internal names or names outside the std namespace. These are now filtered out.
  • Improving the reporting of the "owning" header for the C++ data extension generator - internally, standard libraries often implement APIs in private headers which are then included by the header that should define them. The generator attempts to find the "best" header for any given API name in the standard library to improve reporting.
  • Reimplementing reserved name detection for C - a ReservedName.qll has been created which implements reserved name detection/reuse for C, based on the C Standard and the MISRA specific rules. This is adopted by MISRA Rules 21.1 and 21.2, AUTOSAR A17-0-1 and CERT C DCL37-C.
  • Reimplementing reserved name detection for C++ - ReservedName.qll is extended to support C++ reserved name detection. This has not yet been adopted by the various C++ reserved name queries - I recall that it was challenging to determine how to handle many of the edge cases, due to a lack of clarity in both the language standard and the various Coding Standards.
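
For reviewers unfamiliar with the C generator, here is a minimal sketch of the general idea: take prototype lines copied from the standard's library summary and reduce them to rows of names. It is not the script in this PR. The regular expression, the helper names `extract_function_name` and `to_rows`, and the row shape `(standard, header, name, kind)` are all assumptions made for illustration; the real data extension schema may differ.

```python
import re

# Hypothetical pattern: treat the identifier directly before the first
# parameter list as the function name. The lazy prefix absorbs the return
# type, including any trailing '*' for pointer return types.
PROTOTYPE = re.compile(
    r"^(?P<ret>.+?)\b(?P<name>[A-Za-z_]\w*)\s*\((?P<params>.*)\)\s*;\s*$"
)


def extract_function_name(line):
    """Return the declared function name, or None if the line is not a prototype."""
    match = PROTOTYPE.match(line.strip())
    return match.group("name") if match else None


def to_rows(standard, header, synopsis_lines):
    """Build illustrative (standard, header, name, kind) rows for one header."""
    rows = []
    for line in synopsis_lines:
        name = extract_function_name(line)
        if name is not None:
            rows.append((standard, header, name, "function"))
    return rows
```

Lines that are not function prototypes (macros, types, member variables) are skipped by this sketch, which mirrors why a small manual file is still needed alongside the generated one.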

Existing Issue references:

Change request type

  • Release or process automation (GitHub workflows, internal scripts)
  • Internal documentation
  • External documentation
  • Query files (.ql, .qll, .qls or unit tests)
  • External scripts (analysis report or other code shipped as part of a release)

Rules with added or modified queries

  • No rules added
  • Queries have been added for the following rules:
  • Queries have been modified for the following rules:
    • RULE-21-1
    • RULE-21-2
    • A17-0-1
    • DCL37-C

Release change checklist

A change note (development_handbook.md#change-notes) is required for any pull request which modifies:

  • The structure or layout of the release artifacts.
  • The evaluation performance (memory, execution time) of an existing query.
  • The results of an existing query in any circumstance.

If you are only adding new rule queries, a change note is not required.

Author: Is a change note required?

  • Yes
  • No

🚨🚨🚨
Reviewer: Confirm that the format of shared queries (not the .qll file, the
.ql file that imports it) is valid by running them within VS Code.

  • Confirmed

Reviewer: Confirm that either a change note is not required or the change note is required and has been added.

  • Confirmed

Query development review checklist

For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:

Author

  • Have all the relevant rule package description files been checked in?
  • Have you verified that the metadata properties of each new query are set appropriately?
  • Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
  • Are the alert messages properly formatted and consistent with the style guide?
  • Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
    As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
  • Does the query have an appropriate level of in-query comments/documentation?
  • Have you considered/identified possible edge cases?
  • Does the query not reinvent features in the standard library?
  • Can the query be simplified further (not golfed!)?

Reviewer

  • Have all the relevant rule package description files been checked in?
  • Have you verified that the metadata properties of each new query are set appropriately?
  • Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
  • Are the alert messages properly formatted and consistent with the style guide?
  • Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
    As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
  • Does the query have an appropriate level of in-query comments/documentation?
  • Have you considered/identified possible edge cases?
  • Does the query not reinvent features in the standard library?
  • Can the query be simplified further (not golfed!)?

lcartey added 30 commits August 27, 2025 10:43
This commit adds a module for representing names from the C/C++
standard libraries. It uses models-as-data to represent the names, and
provides modules for accessing names in C99, C11 and C++14.

This adds a Python script for taking Appendix B of the C Standard
Library and converting it to the StandardLibraryNames models-as-data
format.

This repurposes the existing module generator to instead generate a mad
file for the C++ standard library. It makes the following changes:
 * Omits names outside the `std` namespace (as they cannot be
   distinguished from system headers).
 * Removes the macro query, and adds member variable and type models
   instead.
 * Moves to a new generator directory.
 * Updates the script to generate a mad file instead of a .qll file.

Ensure models-as-data are output to the correct location.

Exclude more internal and non `std` models.

The library generator did not correctly parse function prototypes with
pointer return types. These are now appropriately parsed.

Appendix B of the spec doesn't include the member variables, but there
are only a few of them, so we specify them by hand.

 * Update message
 * Require appropriate include in the non-external linkage case.
 * Add extra tests

Library macros are not under user control

Remove unnecessary filter on start locations.

Exclude NDEBUG which is not actually defined by any header, and update
the generated files (include the outdated C99 file).

MISRA has slightly different rules to CERT, so unshare the rule.

Determine more accurately which header a declaration belongs to:
 * Identify "closest" imported header
 * Use manual mapping to disambiguate
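
The "closest header" selection described in the last commit can be sketched as a breadth-first walk over the include graph. This is only an illustration under stated assumptions, not the generator's actual logic: the `included_by` map (file to the files that directly include it), the set of public headers, the manual mapping keyed by API name, and both function names are hypothetical.

```python
def closest_public_headers(declaring_file, included_by, public_headers):
    """Return the public headers nearest to declaring_file, walking includers outward."""
    seen = {declaring_file}
    frontier = [declaring_file]
    while frontier:
        # All public headers found at the current distance are equally close.
        hits = sorted(f for f in frontier if f in public_headers)
        if hits:
            return hits
        next_frontier = []
        for current in frontier:
            for includer in included_by.get(current, ()):
                if includer not in seen:
                    seen.add(includer)
                    next_frontier.append(includer)
        frontier = next_frontier
    return []


def owning_header(name, declaring_file, included_by, public_headers, manual_mapping):
    """Pick one owning header, using the manual mapping only to break ties."""
    candidates = closest_public_headers(declaring_file, included_by, public_headers)
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous or unreachable: fall back to the hand-written mapping, if any.
    return manual_mapping.get(name)
```

In this sketch a private header included by exactly one public header resolves automatically, while a private header shared by several public headers returns `None` unless the manual mapping names an owner.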