|
| 1 | +[Table of contents](../README.md#contents) |
| 2 | + |
| 3 | +# Appendix: Fitness for purpose: Automatic bug filing |
| 4 | + |
| 5 | +## Introduction |
| 6 | + |
| 7 | +This Appendix describes how to produce SARIF log files that are <a href="Glossary.md#fit-for-purpose">_fit for purpose_</a> for automatic bug filing. It is an example of the "playbook" for producing fit-for-purpose SARIF log files, described in the Appendix [Fitness for purpose: Overview](Fitness-for-purpose-overview.md). |
| 8 | + |
| 9 | +The standard described here works well for a particular bug filing system used internally at Microsoft. |
| 10 | +You might be able to use it without change, or you might have to adapt it to your company's engineering system. |
| 11 | +In that case, do read the [Overview](Fitness-for-purpose-overview.md) to understand the purpose of each component of the playbook given here. |
| 12 | + |
| 13 | +## The standard |
| 14 | + |
| 15 | +For automatic bug filing to be effective, the engineering system must compare the SARIF results from the current build to those from a previous build, to identify those results that are new in the current build. |
| 16 | +This process is known as <a href="Glossary.md#result-matching">_result matching_</a>. |
| 17 | +Some of the requirements of this standard help to ensure that result matching is accurate, reducing the number of duplicate bugs that are filed (and ensuring that bugs _are_ filed for truly new results). |
| 18 | +Other requirements ensure that the filed bugs contain enough information to be understood and acted upon. |
| 19 | + |
| 20 | +Every requirement in the the following list is a "must", meaning that the validator must produce an `"error"`-level message for all of them, and the bug filing system must refuse to accept a file that violates any of them. |
| 21 | + |
| 22 | +1. Result `message` objects must provide `id` and `arguments` (as opposed to only providing `text`). |
| 23 | + The result matching algorithm (implemented in the [SARIF Multitool's](Multitool.md) `match-results-forward` command) includes `id` and `arguments` in its matching criteria. |
| 24 | + |
| 25 | +2. The `run` object must provide `versionControlProvenance`. |
| 26 | + The bug filing system uses this information (which includes the URL of the repo containing the files that were analyzed) to decide which team should be assigned the filed bugs. |
| 27 | + |
| 28 | +3. The `run.tool.driver` object must provide at least one of `version`, `semanticVersion`, or `dottedQuadFileVersion`. |
| 29 | + The bug filing system uses this information to decide whether the current run was analyzed with a different tool version than the previous run. |
| 30 | + If the tool version has changed, the bug filing system must re-analyze the sources as they existed during the previous run with the new version of the tool. |
| 31 | + This prevents a discontinuity due to any new analysis rules that the new tool version might have introduced. |
| 32 | + The bug filing system will only file bugs for results introduced since the last run -- whether those results come from previously existing or newly introduced rules. |
| 33 | + |
| 34 | +4. The `run.tool.driver` object must provide `informationUri`. This helps the developer responsible for fixing the bug by providing a way to learn more about the tool. |
| 35 | + |
| 36 | +5. All result location URIs must be expressed as relative references with respect to the repo root. |
| 37 | + |
| 38 | + **TODO** Understand why it's not good enough to have absolute URIs if they start with repo root (as specified by `versionControlDetails.mappedTo`). |
| 39 | + |
| 40 | +6. Each rule must provide `helpUri`. This helps the developer responsible for fixing the bug by providing a way to learn more about the rule violation. |
| 41 | + |
| 42 | +7. Each result must enable the user to view the problem in context. |
| 43 | + The result must include a code snippet for a region containing a few lines on either side of the actual result (`result.locations[].physicalLocation.contextRegion.snippet`). |
| 44 | + It is not enough to provide `...physicalLocation.region.snippet`: the region designating the actual error might be only a few characters long, or even just an insertion point. |
| 45 | + |
| 46 | + In addition, the log file must embed the entire contents of every file in which the tool detected a result so that the appropriate file can be attached to each filed bug. |
| 47 | + This is important for two reasons. |
| 48 | + First, the user assessing the bug might not yet have an enlistment, and if they do, they might not have checked out the branch that was analyzed. |
| 49 | + In some source control systems, switching branches on a large code base can be time consuming. |
| 50 | + Second, the user assessing the bug might not have permission to enlist. For example, the user might be a security analyst in an organization where only developers can enlist. |
| 51 | + |
| 52 | +## The analysis rules |
| 53 | + |
| 54 | +The analysis rules that enforce these standards are: |
| 55 | + |
| 56 | +- `SARIF2002.ProvideMessageArguments`: Enforces Requirement <span>#</span>1 by ensuring that `result.message.id` and `.arguments` are present. |
| 57 | +- `SARIF2003.ProvideVersionControlProvenance`: Enforces Requirement <span>#</span>2 by ensuring that `run.versionControlProvenance` is present. |
| 58 | +- `SARIF2005.ProvideToolProperties`: Enforces Requirement <span>#</span>3 by ensuring that at least one of `version`, `semanticVersion`, and `dottedQuadFileVersion` is present, and <span>#</span>4 by ensuring that `informationUri` is present.<sup><a href="#note-1">1</a></sup> |
| 59 | +- `SARIF2007.ExpressPathsRelativeToRepoRoot`: Enforces Requirement <span>#</span>5 by ensuring that all result location URIs are expressed as relative references with respect to the repo root. |
| 60 | +- `SARIF2012.ProvideHelpUris`: Enforces Requirement <span>#</span>6 by ensuring that `helpUri` is present. |
| 61 | +- `SARIF2010.ProvideSupplementalCodeContext`: Enforces Requirement <span>#</span>7 by ensuring that context region snippets and embedded file content are present.<sup><a href="#note-2">2</a></sup> |
| 62 | + |
| 63 | +## The fixers |
| 64 | + |
| 65 | +The following procedure will address requirements <span>#</span>2 (provide `run.versionControlProvenance`), <span>#</span>5 (express URIs relative to repo roots), and <span>#</span>7 (embed file contents, and provide code snippets for context regions). |
| 66 | +To embed file content and provide code snippets, the code we describe here must be run from the root directory of the repository. |
| 67 | + |
| 68 | +There is no way to automatically enforce the rest of the criteria (for example, there's no way to provide a tool information URI if the tool itself didn't provide one). Also, this procedure will only provide code snippets for context regions if the tool itself provided context regions. |
| 69 | + |
| 70 | +1. In whatever program you are using to prepare SARIF for bug filing, add a NuGet reference to `Sarif.Multitool.Library`, version 2.3.5 or later. |
| 71 | + |
| 72 | +2. Write the following code: |
| 73 | + |
| 74 | +```C# |
| 75 | +var command = new RewriteCommand(); |
| 76 | +var options = new RewriteOptions |
| 77 | +{ |
| 78 | + InputFilePath = "<input-file>", |
| 79 | + OutputFilePath = "<output-file>", |
| 80 | + DataToInsert = new List<OptionallyEmittedData> |
| 81 | + { |
| 82 | + OptionallyEmittedData.ContextRegionSnippets, |
| 83 | + OptionallyEmittedData.TextFiles, |
| 84 | + OptionallyEmittedData.BinaryFiles, |
| 85 | + OptionallyEmittedData.VersionControlInformation |
| 86 | + } |
| 87 | +}; |
| 88 | + |
| 89 | +int exitCode = command.Run(options); |
| 90 | +``` |
| 91 | + |
| 92 | +This is equivalent to the SARIF Multitool command line |
| 93 | + |
| 94 | +``` |
| 95 | +sarif rewrite --insert ContextRegionSnippets,TextFiles,BinaryFiles,VersionControlInformation --output <output-file> input-file |
| 96 | +``` |
| 97 | + |
| 98 | +## The configuration file |
| 99 | + |
| 100 | +This is a standard validation XML configuration file, which explicitly enables all the analysis rules mentioned above, and sets their levels as `"error"`. |
| 101 | + |
| 102 | +## Notes |
| 103 | + |
| 104 | +<a id="note-1">1.</a> At the time of writing, `SARIF2005` is not satisfied by `dottedQuadFileVersion`, and it does not require `informationUri`. Issue [microsoft/sarif-sdk#2040](https://github.com/microsoft/sarif-sdk/issues/2040), "SARIF2005.ProvideToolProperties: Allow dottedQuadFileVersion; require informationUri", tracks this work. |
| 105 | + |
| 106 | +<a id="note-2">2.</a> At the time of writing, `SARIF2010` covers code snippets, `SARIF2011` covers context regions, and `SARIF2013` covers embedded file content. Issue [microsoft/sarif-sdk#2041](https://github.com/microsoft/sarif-sdk/issues/2041), "Combine 'code context' rules", tracks this work. |
| 107 | + |
| 108 | +<a id="note-3">3.</a> This should probably be just an option `--rebase-uris` to the `rewrite` command. |
| 109 | + |
| 110 | +[Table of contents](../README.md#contents) |
0 commit comments