Adding blog post to show case the new PPL capabilities and CLI tool #3994
Signed-off-by: Anas Alkouz <[email protected]>
|
Thank you for submitting a blog post! The blog post review process is: Submit a PR -> (Optional) Peer review -> Doc review -> Editorial review -> Marketing review -> Published. |
Signed-off-by: Anas Alkouz <[email protected]>
Thanks for drafting this! The examples are technically solid, there's some great coverage here. I learned some new stuff about commands I didn't directly work with.
I think we could stand to restructure it a bit to be more story-driven and less encyclopedic. Especially for a blog I'd like to see something more beginner-friendly.
| has_science_table: false | ||
| --- | ||
|
|
||
| OpenSearch's Piped Processing Language (PPL) evolves significantly with new and enhanced capabilities that reshape how you handle log analytics and observability workflows. This comprehensive update streamlines how you troubleshoot applications, monitor system performance, and analyze security events, providing essential tools to extract meaningful insights from your observability data. Through enhanced features and refined functionality, teams can navigate complex log analysis with greater precision and clarity. |
This intro is very abstract. It talks a lot about reshaping workflows without any point of reference.
Some readers might not know what PPL is, and those that use it might not have a great feeling for what the current state is. Could we maybe start with a brief introduction of what PPL is and what problem it solves? Something like "PPL is OpenSearch's query language for..."
|
|
||
| ## What's new in OpenSearch PPL? | ||
| Let's explore the new PPL commands and functions through practical examples of common log analytics use cases. These examples demonstrate how PPL enhanced capabilities can help you analyze logs more effectively, from combining multiple data sources to processing unstructured log data and performing time-series analysis. We'll also cover significant performance improvements in this release, including the integration with Apache Calcite as the query engine. |
I think before getting straight to what's new, we should cover some historic pain points we've resolved.
For people who have tried it before and found it unsatisfactory, this might be a good opportunity to win them back.
typo: "(PPL) evolves" -> "(PPL) evolved"
As a point of preference, I would also rather avoid hedging with "look at these examples," just start with the examples.
If we want to show how much we've improved, let's take a case that we previously couldn't do, that's now easy.
We also hedge a second time in the next paragraph, with "Below are scenarios where new commands..."
|
|
||
| ### 1. New commands and functions | ||
| The OpenSearch 3.3 (https://opensearch.org/blog/explore-opensearch-3-3/) release marks a substantial expansion of PPL functionality with the introduction of 9 new commands, 7 evaluation functions, and 8 statistical functions. The syntax of existing commands has also been refined for improved usability, creating a more intuitive experience for users across various analytical scenarios. Below are scenarios where new commands and functions can help you analyze your data: |
"evaluation functions" and "statistical functions" is a strange distinction to make here, I'd just say "15 functions."
The exact semantics of evaluation vs statistics can be left to the PPL reference.
|
|
||
| ``` | ||
| # Combines web log data with geographical IP data. | ||
| # Which allowing you to see which countries generate the most traffic. |
typo: "Which allowing" -> "Which allows"
| ``` | ||
|
|
||
| #### Time-series analysis #### | ||
| PPL introduces streamlined temporal and distribution analysis with new `timechart`, `bin` and `eventstats` commands. The `timechart` command aggregates data over time intervals with flexible span controls, automatically handling time gap filling and result ordering for time-series analysis. It provides visualization-ready formatting with time as the primary axis and supports grouping by additional fields. The `bin` command automatically groups numeric data into ranges or buckets, facilitating distribution analysis for understanding data spread and frequency patterns. The `eventstats` command which is essential for generating summary statistics from fields in events while *preserving* the original events. |
needs a comma: "the eventstats command, which is"
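For anyone reworking this paragraph, here is a rough Python sketch of the two behaviors it describes: `eventstats` attaching a summary statistic to every event without dropping any, and `bin` mapping a number to a fixed-width bucket. This is not PPL and not how OpenSearch implements either command; the helper names, the `avg_<field>` output name, and the sample events are all made up for illustration.

```python
from collections import defaultdict


def eventstats_avg(events, field, by):
    """Attach the per-group average of `field` to every event.

    Unlike a plain aggregation, no event is dropped: each input event
    comes back with one extra key (hypothetical name: avg_<field>).
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for event in events:
        bucket = totals[event[by]]
        bucket[0] += event[field]
        bucket[1] += 1
    return [
        {**event, f"avg_{field}": totals[event[by]][0] / totals[event[by]][1]}
        for event in events
    ]


def bin_value(value, span):
    """Return the lower bound of the fixed-width bucket containing `value`."""
    return (value // span) * span


# Made-up sample events, purely for illustration.
events = [
    {"host": "a", "latency": 100},
    {"host": "a", "latency": 300},
    {"host": "b", "latency": 50},
]
enriched = eventstats_avg(events, "latency", "host")
```

The point worth carrying into the post is that `enriched` has the same number of rows as `events`, which is what "preserving the original events" means in practice.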
|  | ||
|
|
||
| #### Unstructured log processing at query time #### | ||
| Text processing features have been included to PPL with the addition of `regex`, `rex`, and `spath` commands. These features enable users to filter, extract, and parse unstructured text directly at query time without requiring data preprocessing. The `regex` command provides pattern-based filtering to isolate relevant log entries, while `rex` extracts structured fields from raw text using regular expressions. The `spath` command extracts fields from JSON data, enabling access to nested objects and arrays. Together, these commands enable instant adaptation to new log formats without requiring reindexing operations, allowing users to analyze previously unstructured data immediately. |
typo: "have been included to PPL" -> "have been included in PPL"
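A small Python sketch may also help readers see the difference between filtering (`regex`) and extraction (`rex`) that this paragraph draws. It only imitates the described behavior with Python's `re` module and Python's named-group syntax; the helper names, the pattern, and the log lines are assumptions for illustration, not PPL syntax or OpenSearch internals.

```python
import re

# Hypothetical pattern: an IPv4-looking prefix and a trailing 3-digit status.
LINE_PATTERN = re.compile(r"(?P<ip>\d+\.\d+\.\d+\.\d+) .* (?P<status>\d{3})$")


def regex_filter(events, field, pattern):
    """Keep only events whose `field` matches `pattern` (filtering, no new fields)."""
    return [event for event in events if re.search(pattern, event[field])]


def rex_extract(events, field, pattern):
    """Add one key per named group to each matching event; keep non-matching events as-is."""
    extracted = []
    for event in events:
        match = pattern.search(event[field])
        extracted.append({**event, **match.groupdict()} if match else dict(event))
    return extracted


# Made-up raw log lines, purely for illustration.
logs = [
    {"message": "10.0.0.1 GET /index.html 200"},
    {"message": "10.0.0.2 GET /missing 404"},
    {"message": "service started"},
]
client_errors = regex_filter(logs, "message", r" 4\d\d$")
extracted = rex_extract(logs, "message", LINE_PATTERN)
```

Filtering shrinks the result set, while extraction keeps every row and widens the rows that match, all without touching the stored documents.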
|
|
||
|
|
||
| #### Complex data type support | ||
| With these latest upgrades, customers have the ability to perform complex data transformations at *search time* rather than *index time*. The existing PPL function set works well for primitive data types (e.g., strings, numbers, timestamps). We increase the support to cover complex data types with multi-value statistics aggregation functions `(list, values)`. The `list` and `values` functions collect multiple values into structured arrays during aggregation operations with `list` preserving duplicates while `values `returns unique values. The `mvjoin` function combines multi-value fields into single strings using specified delimiters, enabling array manipulation within queries. |
typo: "`values `returns" -> "`values` returns"
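The `list` versus `values` distinction in this paragraph is easy to misread, so here is a hedged Python sketch of the semantics as the draft describes them. It is not PPL; the function names are stand-ins, and the ordering of the de-duplicated result (first-seen order here) is an assumption the draft does not specify.

```python
def collect_list(items):
    """Collect every value, duplicates included (the behavior described for `list`)."""
    return list(items)


def collect_values(items):
    """Collect unique values only (the behavior described for `values`).

    Assumption: first-seen order is kept; the real function may order differently.
    """
    return list(dict.fromkeys(items))


def mvjoin(values, delimiter):
    """Join a multi-value field into one string with the given delimiter."""
    return delimiter.join(str(value) for value in values)


# Made-up page views for one user, purely for illustration.
pages = ["/home", "/cart", "/home", "/checkout"]
all_pages = collect_list(pages)
unique_pages = collect_values(pages)
journey = mvjoin(unique_pages, " > ")
```

A user-journey example like the one in the draft reads more clearly when the post shows both results side by side, since the only difference is whether the repeated `/home` visit survives.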
| Example for using `values`: | ||
|
|
||
| ``` | ||
| # Analyize the User journey while they navigate across pages, |
typo: Analyize -> Analyze
|
@kolchfa-aws @natebower - Adding you both to push this into review. |
Description
Technical blog post to showcase OpenSearch's new Piped Processing Language (PPL) capabilities.
Issues Resolved
Closes #3974
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.