[error-tracking] Apply general doc update #28724
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -24,6 +24,12 @@ Each item listed in the Error Tracking Explorer is an issue that contains high-l | |||||
| - Graph of occurrences over time | ||||||
| - Number of occurrences in the selected time period | ||||||
|
|
||||||
| Issues are also tagged as: | ||||||
| - `New` if the issue was first seen less than two days ago and is in state **FOR REVIEW** (see [Issue States][5]) | ||||||
| - `Regression` if the issue was **RESOLVED** and occurred again in a newer version (see [Regression Detection][6]) | ||||||
| - `Crash` if the application crashed | ||||||
| - Having a [Suspected Cause][3] | ||||||
|
|
||||||
| ### Time range | ||||||
|
|
||||||
| {{< img src="real_user_monitoring/error_tracking/time_range.png" alt="Error Tracking Time Range" style="width:80%;" >}} | ||||||
|
|
@@ -48,6 +54,36 @@ Click the Edit icon to see the list of available facets that you can show or hid | |||||
|
|
||||||
| {{< img src="/error_tracking/error-tracking-facets.png" alt="Click the pencil icon to hide or show available Error Tracking facets from view." style="width:100%;" >}} | ||||||
|
|
||||||
| ### Issue level filters | ||||||
|
|
||||||
| In addition to error events, Error Tracking offers issue level filters to refine the list of displayed issues. | ||||||
|
|
||||||
| {{< img src="error_tracking/issue-level-filters.png" alt="Issue level filters in Error Tracking" style="width:100%;" >}} | ||||||
|
|
||||||
| #### Sources | ||||||
|
|
||||||
| Error Tracking consolidates errors from multiple Datadog products (RUM, Logs, APM) into a unified view, allowing you to monitor and troubleshoot errors across your entire stack. You can choose to display **All**, **Browser**, **Mobile**, or **Backend** issues in the explorer. | ||||||
|
|
||||||
| For more granular filtering, you can narrow down issues by specific log sources or by SDK and scope to a programming language. | ||||||
|
|
||||||
| #### Fix available | ||||||
|
|
||||||
| Display only issues that have an AI-generated fix available, so you can remediate problems quickly. | ||||||
|
|
||||||
| #### Teams filters | ||||||
|
|
||||||
|
Contributor
Should we also mention service owners here?
||||||
| Issue Team Ownership helps you quickly identify issues and focus on relevant errors by using Git `CODEOWNERS`. Datadog will automatically filter your issues so your team can cut through noise and prioritize what matters. | ||||||
|
|
||||||
| Issue ownership is derived from the `CODEOWNERS` files of your repositories. To use this feature, you need to link your Datadog teams to their GitHub counterparts. All errors coming from RUM and APM are eligible for Team Ownership. | ||||||
|
|
||||||
| #### Assigned to | ||||||
|
|
||||||
| Track and assign issues to yourself or the most knowledgeable team members, and easily refine the issue list by assignee. | ||||||
|
|
||||||
| #### Suspected Cause | ||||||
|
|
||||||
| [Suspected Cause][3] enables quicker filtering and prioritization of errors, empowering teams to address potential root causes more effectively. | ||||||
|
|
||||||
| ## Inspect an issue | ||||||
|
|
||||||
| Click on any issue to open the issue panel and see more information about it. | ||||||
|
|
@@ -64,21 +100,19 @@ The lower part of the issue panel gives you the ability to navigate error sample | |||||
|
|
||||||
| ## Get alerted on new errors | ||||||
|
|
||||||
|
|
||||||
| Seeing a new issue as soon as it happens gives you the chance to proactively identify and fix it before it becomes critical. Error Tracking generates a [Datadog event][1] whenever an issue is first seen in a given service and environment and, as a result, gives you the ability to be alerted in such cases by configuring [Event Monitors][2]. | ||||||
| Seeing a new issue as soon as it happens gives you the chance to proactively identify and fix it before it becomes critical. Error Tracking monitors allow you to track new issues and issues that have a high impact on your systems or your users (see [Error Tracking Monitors][7]). | ||||||
|
|
||||||
| Each event generated is tagged with the version, the service, and the environment so that you have a fine-grained control over issues you want to be alerted for. You can directly export your search query from the explorer to create an event monitor on the related scope: | ||||||
| You can directly export your search query from the explorer to create an Error Tracking Monitor on the related scope: | ||||||
|
|
||||||
| {{< img src="/error_tracking/create-monitor.mp4" alt="Export your search query to an Error Tracking monitor" video=true >}} | ||||||
|
|
||||||
| ## Suspected Cause | ||||||
|
|
||||||
| [Suspected Cause][3] enables quicker filtering and prioritization of errors, empowering teams to address potential root causes more effectively. | ||||||
|
|
||||||
| ## Further Reading | ||||||
|
|
||||||
| {{< partial name="whats-next/whats-next.html" >}} | ||||||
|
|
||||||
| [1]: /events | ||||||
| [2]: /monitors/types/event/ | ||||||
| [3]: /error_tracking/suspected_causes | ||||||
| [4]: /real_user_monitoring/explorer/search/#event-types | ||||||
| [4]: /real_user_monitoring/explorer/search/#event-types | ||||||
| [5]: /error_tracking/issue_states | ||||||
| [6]: /error_tracking/regression_detection | ||||||
| [7]: /monitors/types/error_tracking | ||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -19,15 +19,15 @@ You can define what data is included in Error Tracking in two ways: | |||||
| - [Rules](#rules-inclusion) | ||||||
| - [Rate limits](#rate-limits) | ||||||
|
|
||||||
| You can configure both rules and rate limits on the [**Error Tracking** > **Settings**][1] page. | ||||||
| You can configure both rules and rate limits on the [**Error Tracking** > **Settings**][1] page. | ||||||
|
|
||||||
| ## Rules | ||||||
|
|
||||||
| Rules allow you to select which errors are ingested into Error Tracking. They apply to both billable and non-billable errors. | ||||||
| Rules allow you to select which errors are ingested into Error Tracking. They apply to both billable and non-billable errors. | ||||||
|
|
||||||
| Each rule consists of: | ||||||
| - A scope: an inclusion filter, which contains a search query, such as `service:my-web-store`. | ||||||
| - Optionally, one or more nested exclusion filters to further refine the rule. For example, an exclusion filter might use the `env:staging` query to exclude staging errors. | ||||||
| - Optionally, one or more nested exclusion filters to further refine the rule and ignore some of the matching events. For example, an exclusion filter might use the `env:staging` query to exclude staging errors. | ||||||
|
|
||||||
| A given rule can be toggled on or off. An error event is included if it matches a query in one of the active inclusion filters _and_ it does not match any active nested exclusion queries. | ||||||
|
|
||||||
|
|
@@ -39,6 +39,33 @@ Each error event is checked against the rules in order. The event is processed o | |||||
|
|
||||||
| Rules are evaluated in order, with the evaluation stopping at the first matching rule. The priority of the rules and their nested filters depends on their order in the list. | ||||||
|
|
||||||
| {{% collapse-content title="Example" level="p" %}} | ||||||
| Given a list of rules: | ||||||
| - Rule 1: `env:prod` | ||||||
| - Exclusion filter 1-1: `service:api` | ||||||
| - Exclusion filter 1-2: `status:warn` | ||||||
| - Rule 2: `service:web` | ||||||
| - Rule 3 (this rule is disabled): `team:security` | ||||||
| - Rule 4: `service:foo` | ||||||
|
|
||||||
|
|
||||||
| {{< img src="error_tracking/error-tracking-filters-example.png" alt="Error Tracking Filters example of setup" style="width:75%;" >}} | ||||||
|
|
||||||
| The processing flow is as follows: | ||||||
|
Contributor
Suggested change
In this section, it might be helpful to explain the result of the example given. In essence, describe the diagram in words (that is, what is happening as the flow progresses, and what is the end result for this specific event?).
Contributor
Author
I can do that! I thought it was pretty clear; we used to have nothing.
Contributor
Oh I see! I think the image and steps are wonderful additions. An explanation around them might help when users need to create their own steps to satisfy their use cases. Sort of like this explanation for workflows ( but less in-depth
||||||
| {{< img src="error_tracking/error-tracking-filters-diagram-brand-design.png" alt="Error Tracking Filters" style="width:90%;" >}} | ||||||
|
|
||||||
|
|
||||||
| An event with `env:prod service:my-service status:warn`: | ||||||
| - Matches rule 1, so it proceeds to that rule's exclusion filters. | ||||||
| - Does not match exclusion filter 1-1, so it proceeds to exclusion filter 1-2. | ||||||
| - Matches exclusion filter 1-2, so the event is discarded. | ||||||
|
|
||||||
| An event with `env:staging service:web`: | ||||||
| - Does not match rule 1, so it proceeds to rule 2. | ||||||
| - Matches rule 2, so the event is kept. | ||||||
|
|
||||||
| {{% /collapse-content %}} | ||||||
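The evaluation order described in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Datadog's implementation: the `Rule` class and the `parse_query`, `matches`, and `is_ingested` helpers are hypothetical names, and queries are reduced to space-separated `key:value` terms with no wildcards or boolean operators. It reuses the four rules from the example above.

```python
from dataclasses import dataclass, field


def parse_query(query: str) -> dict[str, str]:
    """Split a space-separated `key:value` query into a dict (simplified syntax)."""
    return dict(term.split(":", 1) for term in query.split())


def matches(query: str, event: dict[str, str]) -> bool:
    """Return True if every `key:value` term of the query is present on the event."""
    return all(event.get(key) == value for key, value in parse_query(query).items())


@dataclass
class Rule:
    """Hypothetical model of a rule: one inclusion filter plus nested exclusion filters."""
    inclusion: str
    exclusions: list[str] = field(default_factory=list)
    enabled: bool = True


def is_ingested(event: dict[str, str], rules: list[Rule]) -> bool:
    """Evaluate rules in order; the first active rule whose inclusion filter matches decides."""
    for rule in rules:
        if not rule.enabled or not matches(rule.inclusion, event):
            continue  # disabled or non-matching rules are skipped
        # Evaluation stops here: the event is kept unless a nested exclusion matches.
        return not any(matches(exclusion, event) for exclusion in rule.exclusions)
    return False  # no active rule matched, so the event is not ingested


RULES = [
    Rule("env:prod", exclusions=["service:api", "status:warn"]),
    Rule("service:web"),
    Rule("team:security", enabled=False),
    Rule("service:foo"),
]

discarded = is_ingested(parse_query("env:prod service:my-service status:warn"), RULES)
kept = is_ingested(parse_query("env:staging service:web"), RULES)
print(discarded, kept)  # False True
```

Because evaluation stops at the first matching rule, an event discarded by an exclusion filter of rule 1 is never checked against rule 2, even if it would match there.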
|
|
||||||
| ### Default rules | ||||||
|
|
||||||
| By default, Error Tracking has an `*` inclusion filter and no exclusion filters. This means all errors that meet the [requirements][2] for fingerprinting are ingested into Error Tracking. | ||||||
|
|
@@ -54,7 +81,7 @@ To add a rule (inclusion filter): | |||||
| 6. Click **Save Changes** | ||||||
| 7. Optionally, reorder the rules to change their [evaluation order](#evaluation-order). Click and drag the six-dot icon on a given rule to move the rule up or down in the list. | ||||||
|
|
||||||
| {{< img src="logs/error_tracking/reorder_filters.png" alt="On the right side of each rule is a six-dot icon, which you can drag vertically to reorder rules." style="width:80%;">}} | ||||||
| {{< img src="error_tracking/reorder-filters.png" alt="On the right side of each rule is a six-dot icon, which you can drag vertically to reorder rules." style="width:80%;">}} | ||||||
|
|
||||||
|
|
||||||
| ## Rate limits | ||||||
|
|
@@ -71,22 +98,55 @@ To set a rate limit: | |||||
| 1. Edit the **errors/month** field. | ||||||
| 1. Click **Save Rate Limit**. | ||||||
|
|
||||||
| {{< img src="logs/error_tracking/rate_limit.png" alt="On the left side of this page, under 'Set your Rate Limit below,' is a drop-down menu where you can set your rate limit." style="width:70%;">}} | ||||||
| {{< img src="error_tracking/rate-limit.png" alt="On the left side of this page, under 'Set your Rate Limit below,' is a drop-down menu where you can set your rate limit." style="width:70%;">}} | ||||||
|
|
||||||
| A `Rate limit applied` event is generated when you reach the rate limit. See the [Event Management documentation][4] for details on viewing and using events. | ||||||
|
|
||||||
| {{< img src="logs/error_tracking/rate_limit_reached_event.png" alt="Screenshot of a 'Rate limit applied' event in the Event Explorer. The event's status is INFO, the source is Error Tracking, the timestamp reads '6h ago', and the title is 'Rate limit applied.' The event is tagged 'source:error_tracking'. The message reads 'Your rate limit has been applied as more than 60000000 logs error events were sent to Error Tracking. Rate limit can be changed from the ingestion control page. " style="width:70%;">}} | ||||||
|
|
||||||
| ## Monitoring usage | ||||||
|
|
||||||
| You can monitor your Error Tracking on Logs usage by setting up monitors and alerts for the `datadog.estimated_usage.error_tracking.logs.events` metric, which tracks the number of ingested error logs. | ||||||
| You can monitor your Error Tracking on Logs usage by setting up monitors and alerts for the `datadog.estimated_usage.error_tracking.logs.events` metric, which tracks the number of ingested error logs. | ||||||
|
|
||||||
| This metric is available by default at no additional cost, and its data is retained for 15 months. | ||||||
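As a rough sketch of what a monitor on this metric might look like, the following Python snippet builds a monitor definition as a plain dictionary and prints it as JSON. Only the metric name comes from this page; the `build_usage_monitor` helper, the monitor name, the message, the one-day window, and the threshold value are illustrative assumptions that you would adapt to your own usage. The snippet does not call any API.

```python
import json

# Metric name taken from this page; everything else below is an assumption.
USAGE_METRIC = "datadog.estimated_usage.error_tracking.logs.events"


def build_usage_monitor(daily_threshold: int) -> dict:
    """Build a hypothetical metric monitor definition for Error Tracking usage."""
    # Alert when the summed metric over the last day exceeds the threshold.
    query = f"sum(last_1d):sum:{USAGE_METRIC}{{*}} > {daily_threshold}"
    return {
        "name": "Error Tracking on Logs usage is high",  # hypothetical name
        "type": "query alert",
        "query": query,
        "message": "Ingested error logs exceeded the expected daily volume.",
        "options": {"thresholds": {"critical": daily_threshold}},
    }


monitor = build_usage_monitor(daily_threshold=1_000_000)
print(json.dumps(monitor, indent=2))
```

The resulting definition could then be created through the monitor UI or whichever tooling you already use to manage monitors.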
|
|
||||||
| ## Dynamic Sampling | ||||||
|
|
||||||
| Because Error Tracking billing is based on the number of errors, large increases in the errors for a single issue can quickly consume your Error Tracking budget. Dynamic Sampling protects you by establishing a threshold for the error rate per issue based on your daily rate limit and historical error volumes, sampling errors when that threshold is reached. Dynamic Sampling automatically deactivates when the error rate of your issue decreases below the given threshold. | ||||||
|
|
||||||
| ### Setup | ||||||
|
|
||||||
| Dynamic Sampling is enabled automatically with Error Tracking, using a default intake threshold based on your daily rate limit and historical volume. | ||||||
|
|
||||||
| For best results, set up a daily rate limit on the [Error Tracking Rate Limits page][5]: click **Edit Rate Limit** and enter a new value. | ||||||
|
|
||||||
| {{< img src="error_tracking/dynamic-sampling-rate-limit.png" alt="Error Tracking Rate Limit" style="width:90%" >}} | ||||||
|
|
||||||
| ### Disable Dynamic Sampling | ||||||
|
|
||||||
| Dynamic Sampling can be disabled on the [Error Tracking Settings page][1]. | ||||||
|
|
||||||
| {{< img src="error_tracking/dynamic-sampling-settings.png" alt="Error Tracking Dynamic Sampling Settings" style="width:90%" >}} | ||||||
|
|
||||||
| ### Monitor Dynamic Sampling | ||||||
|
|
||||||
| A `Dynamic Sampling activated` event is generated when Dynamic Sampling is applied to an issue. See the [Event Management documentation][4] for details on viewing and using events. | ||||||
|
|
||||||
| {{< img src="error_tracking/dynamic-sampling-event.png" alt="Error Tracking Dynamic Sampling event" style="width:90%" >}} | ||||||
|
|
||||||
| #### Investigation and mitigation steps | ||||||
|
|
||||||
| When Dynamic Sampling is applied, the following steps are recommended: | ||||||
|
|
||||||
| - Check which issue is consuming your quota. The issue to which Dynamic Sampling is applied is linked in the event generated in Event Management. | ||||||
| - If you'd like to collect additional samples for this issue, raise your daily quota on the [Error Tracking Rate Limits page][5]. | ||||||
| - If you'd like to avoid collecting samples for this issue in the future, consider creating an exclusion filter to prevent additional events from being ingested into Error Tracking. | ||||||
|
|
||||||
| ## Further Reading | ||||||
|
|
||||||
| {{< partial name="whats-next/whats-next.html" >}} | ||||||
|
|
||||||
| [1]: https://app.datadoghq.com/error-tracking/settings/rules | ||||||
| [2]: /error_tracking/troubleshooting/?tab=java#errors-are-not-found-in-error-tracking | ||||||
| [4]: /service_management/events/ | ||||||
| [5]: https://app.datadoghq.com/error-tracking/settings/rate-limits | ||||||