PS: Add .NET and PowerShell SDK type models. #171

MathiasVP · 2025-02-19T00:10:35Z

Turns out all the .NET methods are documented in a nice and parsable XML format here, and the PowerShell SDK is similarly documented here. With the help of Copilot I managed to write a Python script to extract all the methods and store them as data extensions so that they can be consumed by models-as-data to provide type information.
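To make the extraction step concrete, here is a minimal sketch of the idea. It is not the script from this PR. It assumes a simplified XML shape (a `Type` element with a `FullName` attribute, and `Member` children carrying `MemberType` and `ReturnValue/ReturnType`); the real documentation files may be laid out differently. The names `SAMPLE_XML`, `extract_type_model_rows` and `render_data_extension` are hypothetical, and the emitted rows use the typeModel layout discussed later in this thread.

```python
import json
import xml.etree.ElementTree as ET

# Assumed, simplified sample of one documented type; the real files may differ.
SAMPLE_XML = """
<Type Name="XPathExpression" FullName="System.Xml.XPath.XPathExpression">
  <Members>
    <Member MemberName="Clone">
      <MemberType>Method</MemberType>
      <ReturnValue>
        <ReturnType>System.Xml.XPath.XPathExpression</ReturnType>
      </ReturnValue>
    </Member>
    <Member MemberName="Expression">
      <MemberType>Property</MemberType>
      <ReturnValue>
        <ReturnType>System.String</ReturnType>
      </ReturnValue>
    </Member>
  </Members>
</Type>
"""


def extract_type_model_rows(xml_text):
    """Return [return_type, declaring_type, path] rows for documented methods."""
    root = ET.fromstring(xml_text.strip())
    declaring_type = root.get("FullName")
    rows = []
    for member in root.iter("Member"):
        if member.findtext("MemberType") != "Method":
            continue  # this sketch only handles methods
        return_type = member.findtext("ReturnValue/ReturnType")
        if not return_type or return_type == "System.Void":
            continue  # nothing useful to model
        name = member.get("MemberName")
        rows.append([return_type, declaring_type, f"Method[{name}].ReturnValue"])
    return rows


def render_data_extension(rows):
    """Render rows as a typeModel data extension (JSON lists are valid YAML)."""
    lines = [
        "extensions:",
        "  - addsTo:",
        "      pack: microsoft-sdl/powershell-all",
        "      extensible: typeModel",
        "    data:",
    ]
    lines += [f"    - {json.dumps(row)}" for row in rows]
    return "\n".join(lines)


rows = extract_type_model_rows(SAMPLE_XML)
print(render_data_extension(rows))
```

Writing each row with `json.dumps` avoids a YAML dependency in this sketch, since a JSON list of strings is also a valid YAML flow sequence.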

dilanbhalla · 2025-02-19T00:18:12Z

Where's the python script? Also that second link in your PR description doesn't work for me. Asking about the script btw because I would like to get it automated/running regularly so that these model.yml files do not become stale.

MathiasVP · 2025-02-19T13:29:22Z

Where's the python script?

Uh, good point. Happy to add this script somewhere. Do you have any preference for where to put it? Maybe a misc folder in https://github.com/microsoft/codeql/tree/main/powershell? Or in https://github.com/microsoft/codeql/tree/main/misc/codegen/generators (which also exists on github/codeql so we risk them changing the folder structure in the future).

Also that second link in your PR description doesn't work for me. Asking about the script btw because I would like to get it automated/running regularly so that these model.yml files do not become stale.

You need to join the MicrosoftDocs org. I can send you a link offline 🙂

MathiasVP · 2025-02-19T14:11:14Z

I've added the Python script as well as a new folder with some instructions on how to run it. Does that look good to you @dilanbhalla? Happy to move this somewhere else if you prefer!

bdrodes · 2025-02-19T15:01:51Z

Chatting offline about this, but I honestly don't know how to read these codeql models. It might be nice to have an example pulled out (or a few) and explain what it is providing? I've not really used/developed these before so it's hard for me to take in what new capabilities now exist and the limits on those capabilities.

MathiasVP · 2025-02-19T15:29:39Z

Happy to provide more context on this in a call at some point (and even write stuff down in a more permanent place). What I can quickly say is:

What is a data extension?

You can read about data extensions in general here (and there is a separate document for other languages which provide models-as-data support since that's the killer feature for data extensions).

In our case, we can read a data extension such as:

extensions:
  - addsTo:
      pack: microsoft-sdl/powershell-all
      extensible: typeModel
    data:
    - ["System.Xml.XPath.XPathExpression", "System.Xml.XPath.XPathExpression", "Method[Clone].ReturnValue"]

It's an extension to the predicate typeModel, which is declared in QL here and defined in the microsoft-sdl/powershell-all qlpack.
It adds one result to the predicate, so that after this data extension has been consumed by the CLI the following holds:
- typeModel("System.Xml.XPath.XPathExpression", "System.Xml.XPath.XPathExpression", "Method[Clone].ReturnValue").

What does this mean?

The rows don't really have any meaning per se. Instead, we assign them meaning by the way we use the typeModel predicate, which now has lots of results. The meaning we assign to typeModel(type2, type1, action) is the following:

If you have a variable x of type type1 and "perform" action then you obtain something of type type2.

At the moment action can be a method call or a property/field read in PowerShell. So for instance, with the above interpretation this:
typeModel("System.Xml.XPath.XPathExpression", "System.Xml.XPath.XPathExpression", "Method[Clone].ReturnValue") can be interpreted as:

If you have a variable x of type System.Xml.XPath.XPathExpression and you look up the method Clone and call it, you'll get something back of type "System.Xml.XPath.XPathExpression".

And thus, the analysis is now able to know that the return value of Clone has type System.Xml.XPath.XPathExpression when the qualifier has type System.Xml.XPath.XPathExpression.
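As a rough illustration of that interpretation, here is a small Python sketch that treats typeModel as a set of (type2, type1, action) tuples and looks up the result type of a call. It only mirrors the reading described above and is not how the QL library is implemented; `TYPE_MODEL`, `result_types` and `type_of_call_chain` are hypothetical names.

```python
# Rows follow the (type2, type1, action) order used by typeModel above.
TYPE_MODEL = {
    (
        "System.Xml.XPath.XPathExpression",
        "System.Xml.XPath.XPathExpression",
        "Method[Clone].ReturnValue",
    ),
}


def result_types(qualifier_type, action, type_model=TYPE_MODEL):
    """Return every type2 such that typeModel(type2, qualifier_type, action) holds."""
    return {t2 for (t2, t1, act) in type_model if t1 == qualifier_type and act == action}


def type_of_call_chain(start_type, method_names, type_model=TYPE_MODEL):
    """Follow a chain of method calls, e.g. x.Clone().Clone(), tracking possible types."""
    current = {start_type}
    for name in method_names:
        action = f"Method[{name}].ReturnValue"
        current = {t2 for t in current for t2 in result_types(t, action, type_model)}
    return current


print(result_types("System.Xml.XPath.XPathExpression", "Method[Clone].ReturnValue"))
print(type_of_call_chain("System.Xml.XPath.XPathExpression", ["Clone", "Clone"]))
```

A call with no matching row yields an empty set, which corresponds to the analysis having no type information for that result.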

Why is this useful?

Having type information is very useful when working with flow sources. For example, it's pretty clear that reading data from a UDP connection is a source of user input. So this method call returns user-controlled data: https://learn.microsoft.com/en-us/dotnet/api/system.net.sockets.udpclient.receive?view=net-9.0#system-net-sockets-udpclient-receive(system-net-ipendpoint@)

But in order to know that we're calling that method we need to know that the qualifier of the call is a UdpClient. Type information to the rescue!
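A toy Python sketch of why the qualifier type matters: two variables both have a method named Receive called on them, but only the one known to be a UdpClient should count as a source. Everything here is hypothetical and for illustration only (`REMOTE_SOURCES`, `is_remote_source`, and the unrelated type `Contoso.JobQueue`); it does not reflect how the CodeQL libraries model sources.

```python
# Hypothetical source specification: (qualifier type, method name).
REMOTE_SOURCES = {("System.Net.Sockets.UdpClient", "Receive")}


def is_remote_source(variable_types, variable, method):
    """True if calling `method` on `variable` is a modeled flow source."""
    qualifier_type = variable_types.get(variable)  # None when the type is unknown
    return (qualifier_type, method) in REMOTE_SOURCES


# Inferred types for two variables that both have a method named Receive.
variable_types = {
    "$client": "System.Net.Sockets.UdpClient",
    "$queue": "Contoso.JobQueue",  # hypothetical, unrelated type
}

print(is_remote_source(variable_types, "$client", "Receive"))
print(is_remote_source(variable_types, "$queue", "Receive"))
print(is_remote_source({}, "$client", "Receive"))  # no type information at all
```

Matching on the method name alone would either flag the unrelated `Receive` call or miss the real one, which is the gap the type models are meant to close.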

dilanbhalla · 2025-02-19T18:01:27Z

Thanks @MathiasVP, appreciate you adding the script! I can set up a workflow for that now so this data stays synced. Approving.

PS: Add .NET and PowerShell SDK type models.

6ef0941

PS: Add the type model generation script and add a short readme.

3dbe7f4

dilanbhalla approved these changes Feb 19, 2025

MathiasVP merged commit 61796da into main Feb 19, 2025
2 checks passed
