@MathiasVP
Collaborator

Turns out all the .NET methods are documented in a nice and parsable XML format here, and the PowerShell SDK is similarly documented here. With the help of Copilot I managed to write a Python script to extract all the methods and store them as data extensions so that they can be consumed by models-as-data to provide type information.
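For readers who have not seen such a script, the extraction step could look roughly like the sketch below. This is not the script from this PR: the XML snippet is a simplified, hand-written stand-in for the real documentation files (its element names are an assumption about their layout), and `extract_type_model_rows` and `format_row` are hypothetical helper names.

```python
import json
import xml.etree.ElementTree as ET

# Simplified, hand-written sample; the real documentation files are assumed
# to look broadly like this but contain many more elements.
SAMPLE_XML = """
<Type Name="XPathExpression" FullName="System.Xml.XPath.XPathExpression">
  <Members>
    <Member MemberName="Clone">
      <MemberType>Method</MemberType>
      <ReturnValue>
        <ReturnType>System.Xml.XPath.XPathExpression</ReturnType>
      </ReturnValue>
    </Member>
    <Member MemberName="Expression">
      <MemberType>Property</MemberType>
      <ReturnValue>
        <ReturnType>System.String</ReturnType>
      </ReturnValue>
    </Member>
  </Members>
</Type>
"""


def extract_type_model_rows(xml_text):
    """Return (return type, declaring type, action) tuples for each method."""
    root = ET.fromstring(xml_text.strip())
    declaring_type = root.get("FullName")
    rows = []
    for member in root.findall("./Members/Member"):
        # This sketch only handles methods; properties are skipped.
        if member.findtext("MemberType") != "Method":
            continue
        return_type = member.findtext("./ReturnValue/ReturnType")
        # Methods without a useful return type carry no type information.
        if not return_type or return_type == "System.Void":
            continue
        name = member.get("MemberName")
        rows.append((return_type, declaring_type, f"Method[{name}].ReturnValue"))
    return rows


def format_row(row):
    """Render one tuple as a line of the `data:` list in a model.yml file."""
    return "    - " + json.dumps(list(row))


rows = extract_type_model_rows(SAMPLE_XML)
for row in rows:
    print(format_row(row))
```

Only the `Clone` method produces a row here; the `Expression` property is ignored because the sketch filters on methods.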

@dilanbhalla
Collaborator

Where's the Python script? Also, that second link in your PR description doesn't work for me. Asking about the script btw because I would like to get it automated and running regularly so that these model.yml files do not become stale.

@MathiasVP
Collaborator Author

Where's the python script?

Uh, good point. Happy to add this script somewhere. Do you have any preference for where to put it? Maybe a misc folder in https://github.com/microsoft/codeql/tree/main/powershell? Or in https://github.com/microsoft/codeql/tree/main/misc/codegen/generators (which also exists on github/codeql so we risk them changing the folder structure in the future).

Also, that second link in your PR description doesn't work for me. Asking about the script btw because I would like to get it automated and running regularly so that these model.yml files do not become stale.

You need to join the MicrosoftDocs org. I can send you a link offline 🙂

@MathiasVP
Collaborator Author

I've added the Python script as well as a new folder with some instructions on how to run it. Does that look good to you @dilanbhalla? Happy to move this somewhere else if you prefer!

@bdrodes

bdrodes commented Feb 19, 2025

Chatting offline about this, but I honestly don't know how to read these CodeQL models. It might be nice to pull out an example (or a few) and explain what it provides. I've not really used or developed these before, so it's hard for me to take in what new capabilities now exist and what their limits are.

@MathiasVP
Collaborator Author

Happy to provide more context on this in a call at some point (and even write stuff down in a more permanent place). What I can quickly say is:

What is a data extension?

You can read about data extensions in general here (and there is a separate document for other languages which provide models-as-data support since that's the killer feature for data extensions).

In our case, we can read a data extension such as:

extensions:
  - addsTo:
      pack: microsoft-sdl/powershell-all
      extensible: typeModel
    data:
    - ["System.Xml.XPath.XPathExpression", "System.Xml.XPath.XPathExpression", "Method[Clone].ReturnValue"]
  • It's an extension to the predicate typeModel, which is declared in QL here and defined in the microsoft-sdl/powershell-all qlpack.
  • It adds one result to the predicate, so that after this data extension has been consumed by the CLI the following holds:
    • typeModel("System.Xml.XPath.XPathExpression", "System.Xml.XPath.XPathExpression", "Method[Clone].ReturnValue").

What does this mean?

The rows don't really have any meaning per se. Instead, we assign them meaning through the way we use the typeModel predicate, which now has lots of results. The meaning we assign to typeModel(type2, type1, action) is the following:

If you have a variable x of type type1 and "perform" action then you obtain something of type type2.

At the moment action can be a method call or a property/field read in PowerShell. So for instance, with the above interpretation this:
typeModel("System.Xml.XPath.XPathExpression", "System.Xml.XPath.XPathExpression", "Method[Clone].ReturnValue") can be interpreted as:

If you have a variable x of type System.Xml.XPath.XPathExpression and you look up the method Clone and call it, you'll get something back of type "System.Xml.XPath.XPathExpression".

And thus, the analysis is now able to know that the return value of Clone has type System.Xml.XPath.XPathExpression when the qualifier has type System.Xml.XPath.XPathExpression.
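To make the reading of typeModel(type2, type1, action) concrete, here is a minimal Python sketch of the lookup. It only mimics the interpretation described above; it is not how the QL library is implemented, and `result_types` and `type_after_call` are hypothetical names.

```python
# Each row mirrors the (type2, type1, action) columns of typeModel.
TYPE_MODEL = {
    (
        "System.Xml.XPath.XPathExpression",
        "System.Xml.XPath.XPathExpression",
        "Method[Clone].ReturnValue",
    ),
}


def result_types(type1, action, rows=TYPE_MODEL):
    """Types obtained by performing `action` on a value of type `type1`."""
    return {t2 for (t2, t1, a) in rows if t1 == type1 and a == action}


def type_after_call(qualifier_type, method_name, rows=TYPE_MODEL):
    """Types of the value returned by calling `method_name` on the qualifier."""
    return result_types(qualifier_type, f"Method[{method_name}].ReturnValue", rows)


# $copy = $expr.Clone(), where $expr is an XPathExpression
clone_types = type_after_call("System.Xml.XPath.XPathExpression", "Clone")
print(clone_types)
```

A lookup for a type or method with no matching row returns an empty set, which corresponds to the analysis having no type information for that call.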

Why is this useful?

Having type information is very useful when working with flow sources. For example, it's pretty clear that reading data from a UDP connection is a source of user input. So this method call returns user-controlled data: https://learn.microsoft.com/en-us/dotnet/api/system.net.sockets.udpclient.receive?view=net-9.0#system-net-sockets-udpclient-receive(system-net-ipendpoint@)

But in order to know that we're calling that method, we need to know that the qualifier of the call is a UdpClient. Type information to the rescue!
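As a rough illustration of that last step, the sketch below uses a typeModel-style row to work out the qualifier type and then checks the call against a list of sources. Everything here is an assumption made for the example: `Example.ConnectionFactory` and its `CreateUdp` method are made up, and `REMOTE_SOURCES` is a hypothetical stand-in, not the source model used by the queries.

```python
# (result type, qualifier type, action): same column order as typeModel.
TYPE_MODEL = {
    # Hypothetical row for a made-up factory type, for illustration only.
    (
        "System.Net.Sockets.UdpClient",
        "Example.ConnectionFactory",
        "Method[CreateUdp].ReturnValue",
    ),
}

# Hypothetical source list: (qualifier type, method name).
REMOTE_SOURCES = {("System.Net.Sockets.UdpClient", "Receive")}


def return_types(qualifier_type, method):
    """Possible types of the value returned by `method` on the qualifier."""
    action = f"Method[{method}].ReturnValue"
    return {t2 for (t2, t1, a) in TYPE_MODEL if t1 == qualifier_type and a == action}


def is_remote_source(qualifier_types, method):
    """True if the call is a known source for any possible qualifier type."""
    return any((t, method) in REMOTE_SOURCES for t in qualifier_types)


# $client = $factory.CreateUdp()
# $data = $client.Receive([ref]$endpoint)
client_types = return_types("Example.ConnectionFactory", "CreateUdp")
print(is_remote_source(client_types, "Receive"))
```

Without the row in `TYPE_MODEL`, `client_types` would be empty and the `Receive` call could not be matched to the source.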

@dilanbhalla
Collaborator

Thanks @MathiasVP, appreciate you adding the script! I can set up a workflow for that now so this data stays synced, appreciate it. Approving.

@MathiasVP MathiasVP merged commit 61796da into main Feb 19, 2025
2 checks passed