Improving the OpenSSH variant analysis. #560

Merged: 6 commits, merged Aug 6, 2024. Showing changes from 1 commit.
Improving the OpenSSH variant analysis.
pgoodman committed Jul 8, 2024
commit 111b508b4373e569808c0d842644afb518f1dd6e
7 changes: 4 additions & 3 deletions README.md
@@ -21,6 +21,9 @@ We like to say that with its APIs, *you can get everywhere from anywhere*.
* [Getting and building the code](docs/BUILD.md)
* [Installing a pre-built release](docs/INSTALLING.md)
* [How to index a codebase](docs/INDEXING.md)
* Writeups
* [regreSSHion OpenSSH variant analysis](docs/openssh-variant-analysis.md)
* [PHP variant analysis](docs/php-variant-analysis.md)
* Included tools
* [Find function calls inside macro argument lists](docs/mx-find-calls-in-macro-expansions.md)
* [Find possible divergent representations](docs/mx-find-divergent-candidates.md)
@@ -29,6 +32,7 @@ We like to say that with its APIs, *you can get everywhere from anywhere*.
* [Find "sketchy" casts flowing to function arguments and to return sites](docs/mx-find-sketchy-casts.md)
* [Extract an entity, e.g. a function, and all of its dependencies into a file](docs/mx-harness.md)
* [Highlight a specific entity within its surrounding code](docs/mx-highlight-entity.md)
* [Highlight all references to an entity](docs/mx-highlight-references.md)
* [Print a call graph](docs/mx-print-call-graph.md)
* [Print the reference graph](docs/mx-print-reference-graph.md)
* [Print a graph relating source code, macros, parsed tokens, and AST nodes](docs/mx-print-token-graph.md)
@@ -42,9 +46,6 @@ We like to say that with its APIs, *you can get everywhere from anywhere*.
* [List all indexed structures/unions/classes/enums](docs/mx-list-structures.md)
* [List all indexed variables](docs/mx-list-variables.md)
* [Search the code with regular expressions](docs/mx-regex-query.md)

# License

77 changes: 73 additions & 4 deletions docs/openssh-variant-analysis.md
@@ -1,6 +1,20 @@
# Variant analysis of regreSSHion (CVE-2024-6387)

This variant analysis looks for calls to signal-unsafe functions by signal handlers. Note that the scope of this analysis is limited to identifying apparently reachable unsafe paths; it does not verify whether the reachability conditions can actually be met, nor whether those paths are exploitable.

This analysis starts by showing how to check out and index the relevant version of the code, then how to discover the unsafe paths through the call graph manually using the example tools, and finally how to use the Python API to script a simple but generic checker for this kind of issue.

* [Getting and indexing the code](#getting-and-indexing-the-code)
* [Configuring the build](#configuring-the-build)
* [Building the code](#building-the-code)
* [Indexing the code](#indexing-the-code)
* [Manually finding the issue](#manually-finding-the-issue)
* [Finding `SIGALRM` handlers](#finding-sigalrm-handlers)
* [Finding paths from signal handlers to `free`](#finding-paths-from-signal-handlers-to-free)
* [Confirming reachability](#confirming-reachability)
* [Automating the analysis](#automating-the-analysis)
* [Setting up a virtual environment](#setting-up-a-virtual-environment)
* [Interacting with a database](#interacting-with-a-database)

## Getting and indexing the code

@@ -79,7 +93,9 @@ The [description](https://www.qualys.com/2024/07/01/cve-2024-6387/regresshion.tx

> The `SIGALRM` handler of this OpenSSH version calls `packet_close()`, which calls `buffer_free()`, which calls `xfree()` and hence `free()`, which is not async-signal-safe.
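Written down as data, the reported chain makes the check mechanical: walk the calls and stop at the first function that is not async-signal-safe. This is a minimal sketch: the chain is copied by hand from the quote, the handler name is a placeholder, and the `UNSAFE` set is a small illustrative sample. POSIX actually specifies the opposite, an allowlist of async-signal-safe functions, and `free` is not on it.

```python
# Call chain from the Qualys description, outermost caller first.
# "<SIGALRM handler>" is a placeholder; the quote does not name the handler.
CHAIN = ["<SIGALRM handler>", "packet_close", "buffer_free", "xfree", "free"]

# Illustrative sample of libc functions that are NOT async-signal-safe.
# POSIX specifies the inverse: an allowlist of functions that ARE safe.
UNSAFE = {"free", "malloc", "syslog", "printf"}


def first_unsafe(chain, unsafe):
    """Return (index, name) of the first unsafe call in the chain, or None."""
    for i, name in enumerate(chain):
        if name in unsafe:
            return i, name
    return None


print(first_unsafe(CHAIN, UNSAFE))  # (4, 'free')
```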

We'll start by checking this with the test tools provided in the SDK. This is not how I would recommend doing things in practice: these tools are designed as examples of how to use the API and as functionality tests of the API; they are not designed for productivity or composition. Later we'll use scripting to turn this into an automated checker.

### Finding `SIGALRM` handlers

We'll start by trying to understand the specific `SIGALRM` signal. First, let's locate the entity:

@@ -103,9 +119,11 @@ So this says there's a function, `ssh_signal`, taking in a signal number `signum
![Registering sig_alarm for SIGALRM](images/openssh-variant-analysis-sigalrm-ref-1.png)
![Registering grace_alarm_handler for SIGALRM](images/openssh-variant-analysis-sigalrm-ref-2.png)

### Finding paths from signal handlers to `free`

So next we can look for paths between `sig_alarm` or `grace_alarm_handler` and an async-signal-unsafe function, such as `free`.
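Conceptually, "is there a path from a handler to `free`?" is a graph search over caller-to-callee edges. The sketch below runs a breadth-first search over a small hand-written call graph. The edges are hypothetical and only loosely mirror function names mentioned in this writeup; they are not extracted from the OpenSSH index.

```python
from collections import deque

# Hypothetical call graph: caller -> set of direct callees.
# Illustrative only; NOT extracted from the OpenSSH database.
CALLS = {
    "grace_alarm_handler": {"xmalloc", "get_sock_port"},
    "xmalloc": {"sshfatal"},
    "get_sock_port": {"sshfatal"},
    "sshfatal": {"cleanup_exit"},
    "cleanup_exit": {"free"},
    "sig_alarm": set(),
}


def shortest_call_path(graph, src, dst):
    """Breadth-first search; return the shortest src->dst call path or None."""
    prev = {src: None}  # predecessor map, doubles as the visited set
    queue = deque([src])
    while queue:
        fn = queue.popleft()
        if fn == dst:
            path = []
            while fn is not None:  # walk predecessors back to src
                path.append(fn)
                fn = prev[fn]
            return path[::-1]
        for callee in sorted(graph.get(fn, ())):  # sorted for determinism
            if callee not in prev:
                prev[callee] = fn
                queue.append(callee)
    return None


for handler in ("sig_alarm", "grace_alarm_handler"):
    print(handler, shortest_call_path(CALLS, handler, "free"))
```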

Next, we'll find `free`:

```bash
% mx-find-symbol --db /tmp/openssh.db --name free --exact
@@ -141,6 +159,8 @@ Next, lets see if we can find a path from `sig_alarm` or `grace_alarm_handler` t

This creates the call graphs of `free` rooted at `sig_alarm` and `grace_alarm_handler`, respectively. The output of `mx-print-call-graph` is a [DOT digraph](https://graphviz.org/doc/info/lang.html). There are no edges in the `sig_alarm`-to-`free` graph, so we'll focus on the `grace_alarm_handler`-to-`free` graph:

### Confirming reachability

```bash
% xdot /tmp/grace_alarm_handler_to_free.dot
```
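To check a graph for edges without opening a viewer (for example, the `sig_alarm` graph, which has none), a few lines of Python are enough. This is a minimal sketch that assumes each edge sits on its own line in the simple quoted form `"a" -> "b"`. The actual layout of `mx-print-call-graph` output may differ (nodes could be numeric IDs with `label` attributes, for instance), in which case a real DOT parser such as `pydot` is the safer choice.

```python
import re

# Assumes one edge per line in the simple quoted form  "a" -> "b"
# (optionally followed by [attrs] and ";"). Real DOT allows much more.
EDGE_RE = re.compile(r'^\s*"([^"]+)"\s*->\s*"([^"]+)"')


def dot_edges(text):
    """Return (src, dst) pairs for simple quoted edges in DOT text."""
    return [m.groups() for m in map(EDGE_RE.match, text.splitlines()) if m]


SAMPLE = '''digraph g {
  "grace_alarm_handler" -> "xmalloc";
  "xmalloc" -> "sshfatal" [color=red];
}'''

print(len(dot_edges(SAMPLE)))  # 2
```

Usage would look like `dot_edges(open("/tmp/sig_alarm_to_free.dot").read())`, where that file name is a hypothetical stand-in for whatever path you wrote the `sig_alarm` graph to.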
@@ -152,3 +172,52 @@ With output looking like this:
We can see that from `grace_alarm_handler`, we can reach `sshfatal` via `xmalloc` or `get_sock_port`, and from there `cleanup_exit` provides paths to `free`.
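Because there are several routes (via `xmalloc` or via `get_sock_port`), it can help to list every call path rather than just one. Here is a small depth-first sketch over a hypothetical graph that mirrors the prose above; it is not extracted from the OpenSSH database. Note that the number of cycle-free paths can grow exponentially on real call graphs, so this only suits small, pruned graphs like the one viewed above.

```python
# Hypothetical call graph (caller -> callees), mirroring the prose above;
# illustrative only, not extracted from the OpenSSH database.
CALLS = {
    "grace_alarm_handler": ["xmalloc", "get_sock_port"],
    "xmalloc": ["sshfatal"],
    "get_sock_port": ["sshfatal"],
    "sshfatal": ["cleanup_exit"],
    "cleanup_exit": ["free"],
}


def all_call_paths(graph, src, dst, path=None):
    """Yield every cycle-free call path from src to dst (depth-first)."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for callee in graph.get(src, []):
        if callee not in path:  # skip recursion cycles
            yield from all_call_paths(graph, callee, dst, path)


for p in all_call_paths(CALLS, "grace_alarm_handler", "free"):
    print(" -> ".join(p))
```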

We have now manually confirmed the rough reachability details of the CVE, i.e. that an async-signal-unsafe function can potentially be invoked by a signal handler in OpenSSH.
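The handler-to-`free` graphs inspected in this section contain only the functions that lie between a root and a target. One plausible way to compute such a graph is to keep nodes that are both reachable from the root and able to reach the target, then emit the surviving edges as DOT. This is a guess at the idea, not `mx-print-call-graph`'s actual algorithm, and the call graph below is hypothetical.

```python
# Hypothetical call graph (caller -> callees); illustrative, not OpenSSH data.
CALLS = {
    "grace_alarm_handler": {"xmalloc", "get_sock_port", "unrelated_helper"},
    "xmalloc": {"malloc", "sshfatal"},
    "get_sock_port": {"sshfatal"},
    "sshfatal": {"cleanup_exit"},
    "cleanup_exit": {"free"},
    "sig_alarm": {"unrelated_helper"},
}


def reachable(graph, start):
    """All nodes reachable from start (including start), iterative DFS."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Flip every caller->callee edge into callee->caller."""
    rev = {}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, set()).add(caller)
    return rev


def path_subgraph_dot(graph, root, target):
    """DOT digraph of the edges lying on at least one root->target path."""
    keep = reachable(graph, root) & reachable(reverse(graph), target)
    edges = sorted((a, b) for a in keep for b in graph.get(a, ()) if b in keep)
    lines = [f'digraph "{root}_to_{target}" {{']
    lines += [f'  "{a}" -> "{b}";' for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


print(path_subgraph_dot(CALLS, "sig_alarm", "free"))  # no edges
print(path_subgraph_dot(CALLS, "grace_alarm_handler", "free"))
```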

## Automating the analysis

We can automate the analysis using Multiplier's Python API.

### Setting up a virtual environment

If you have unpacked a [release](https://github.com/trailofbits/multiplier/releases) of Multiplier to `/path/to/multiplier`, then inspect the `lib` subdirectory to see the Python version against which the API is built:

```bash
% ls /path/to/multiplier/lib
cmake libLTO.so.18.1 libRemarks.so.18.1 libgap-coro.a libmultiplier.so python3.11
```

Above we see a `python3.11` subdirectory inside of `lib`. Next, create and enter a virtual environment using a Python interpreter with a matching version number:

```bash
% python3.11 -m venv /path/to/multiplier
% source /path/to/multiplier/bin/activate
(multiplier) %
```
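Before importing the API, it can save time to confirm that the interpreter is the one you expect. This sketch reports whether Python is running inside a virtual environment, which `pythonX.Y` directory the install's `lib` folder contains, and whether a `multiplier` module is findable. `MX_LIB` reuses the placeholder path from above, and the helper names are hypothetical. If the setup worked, the bundled and running versions should presumably match and `multiplier` should be found.

```python
import importlib.util
import re
import sys
from pathlib import Path

# Placeholder install location, as in the text above; adjust to your setup.
MX_LIB = Path("/path/to/multiplier/lib")


def parse_python_dir(name):
    """Map a directory name like 'python3.11' to (3, 11); else None."""
    m = re.fullmatch(r"python(\d+)\.(\d+)", name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def bundled_python_version(lib_dir):
    """Return the first pythonX.Y version found under lib_dir, or None."""
    if not lib_dir.is_dir():
        return None
    for entry in sorted(lib_dir.iterdir()):
        version = parse_python_dir(entry.name)
        if version and entry.is_dir():
            return version
    return None


def in_virtualenv():
    """True when running inside a venv created by `python -m venv`."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("in venv:         ", in_virtualenv())
    print("bundled python:  ", bundled_python_version(MX_LIB))
    print("running python:  ", sys.version_info[:2])
    print("multiplier found:", importlib.util.find_spec("multiplier") is not None)
```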

### Interacting with a database

Inside your virtual environment, open your Python interpreter and try the following:

```bash
% python
Python 3.11 ...
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiplier as mx
>>>
```

Next, we'll open a connection to the OpenSSH database. We do two things here: we open the database by its path, then we wrap that connection in an in-memory cache. In practice, you always want to wrap the connection in a cache. Having these as separate steps may seem unintuitive; however, the Python API is derived from the C++ API, where this separation lets users decide how many caches they want to have, giving them a measure of concurrency control (e.g. when there are multiple analysis threads).

```python
>>> index = mx.Index.in_memory_cache(mx.Index.from_database("/tmp/openssh.db"))
```
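The layering described above, with a backing index wrapped by a separate cache object and possibly one cache per thread, can be illustrated with a toy analogy in plain Python. `SlowBackend` and `CachedIndex` are hypothetical names, and this is not how Multiplier implements its cache; it only shows why keeping the cache as a separate wrapper lets callers choose how many caches exist.

```python
import threading


class SlowBackend:
    """Stand-in for a database-backed index (hypothetical)."""

    def __init__(self):
        self.lookups = 0
        self._lock = threading.Lock()

    def entity(self, eid):
        with self._lock:  # the shared backend must be thread-safe
            self.lookups += 1
        return f"entity-{eid}"


class CachedIndex:
    """Wraps any object with an entity(eid) method in a private dict cache."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def entity(self, eid):
        if eid not in self._cache:
            self._cache[eid] = self._backend.entity(eid)
        return self._cache[eid]


backend = SlowBackend()
shared = CachedIndex(backend)
for _ in range(3):
    shared.entity(7)
print(backend.lookups)  # 1: later calls hit the cache


def worker():
    # One private cache per worker thread, so the cache dict needs no lock.
    local = CachedIndex(backend)
    for eid in range(5):
        local.entity(eid)


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(backend.lookups)  # 11 = 1 + 2 threads * 5 distinct ids
```

The trade-off is memory versus contention: per-thread caches duplicate entries, but no cache is ever shared between threads.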

Let's verify that we can indeed find `SIGALRM` as before:

```python
>>> sigalrm = next(index.query_entities("SIGALRM"))
>>> sigalrm
<multiplier.frontend.DefineMacroDirective object at 0x104328c90>
>>> "".join(t.data for t in sigalrm.use_tokens.file_tokens)
'#define SIGALRM 14'
```
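Joining the token data gives back the directive's source text. If you only need the macro's name and replacement text from such a string, a regular expression is enough for simple object-like macros. This is a minimal sketch that works on the joined string rather than on Multiplier's token API. `parse_object_macro` is a hypothetical helper, and the replacement text is returned verbatim, not evaluated by a preprocessor.

```python
import re

# Simple object-like macros: "#define NAME [replacement]". A function-like
# macro such as "#define F(x) x" fails to match, because the name must be
# followed by whitespace or end of line.
DEFINE_RE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)(?:\s+(.*?))?\s*$")


def parse_object_macro(text):
    """Return (name, replacement) for an object-like #define, else None."""
    m = DEFINE_RE.match(text)
    if m is None:
        return None
    return m.group(1), (m.group(2) or "")


name, value = parse_object_macro("#define SIGALRM 14")
print(name, int(value))  # SIGALRM 14
```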