@anasalkouz
Member

Description

Technical blog post to showcase OpenSearch's new Piped Processing Language capabilities

Issues Resolved

Closes #3974

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

@github-actions

Thank you for submitting a blog post!

The blog post review process is: Submit a PR -> (Optional) Peer review -> Doc review -> Editorial review -> Marketing review -> Published.

@Swiddis left a comment

Thanks for drafting this! The examples are technically solid, there's some great coverage here. I learned some new stuff about commands I didn't directly work with.

I think we could stand to restructure it a bit to be more story-driven and less encyclopedic. Especially for a blog I'd like to see something more beginner-friendly.

has_science_table: false
---

OpenSearch's Piped Processing Language (PPL) evolves significantly with new and enhanced capabilities that reshape how you handle log analytics and observability workflows. This comprehensive update streamlines how you troubleshoot applications, monitor system performance, and analyze security events, providing essential tools to extract meaningful insights from your observability data. Through enhanced features and refined functionality, teams can navigate complex log analysis with greater precision and clarity.

This intro is very abstract. It talks a lot about reshaping workflows without any point of reference.

Some readers might not know what PPL is, and those that use it might not have a great feeling for what the current state is. Could we maybe start with a brief introduction of what PPL is and what problem it solves? Something like "PPL is OpenSearch's query language for..."

## What's new in OpenSearch PPL?
Let's explore the new PPL commands and functions through practical examples of common log analytics use cases. These examples demonstrate how PPL enhanced capabilities can help you analyze logs more effectively, from combining multiple data sources to processing unstructured log data and performing time-series analysis. We'll also cover significant performance improvements in this release, including the integration with Apache Calcite as the query engine.

I think before getting straight to what's new, we should cover some historic pain points we've resolved.

For people who have tried it before and found it unsatisfactory, this might be a good opportunity to win them back.

has_science_table: false
---

OpenSearch's Piped Processing Language (PPL) evolves significantly with new and enhanced capabilities that reshape how you handle log analytics and observability workflows. This comprehensive update streamlines how you troubleshoot applications, monitor system performance, and analyze security events, providing essential tools to extract meaningful insights from your observability data. Through enhanced features and refined functionality, teams can navigate complex log analysis with greater precision and clarity.

typo: "(PPL) evolves significantly" -> "(PPL) evolved significantly"

## What's new in OpenSearch PPL?
Let's explore the new PPL commands and functions through practical examples of common log analytics use cases. These examples demonstrate how PPL enhanced capabilities can help you analyze logs more effectively, from combining multiple data sources to processing unstructured log data and performing time-series analysis. We'll also cover significant performance improvements in this release, including the integration with Apache Calcite as the query engine.

As a point of preference, I would also rather avoid hedging with "look at these examples," just start with the examples.

If we want to show how much we've improved, let's take a case that we previously couldn't do, that's now easy.

We also hedge a second time in the next paragraph, with "Below are scenarios where new commands..."

### 1. New commands and functions
The OpenSearch 3.3 (https://opensearch.org/blog/explore-opensearch-3-3/) release marks a substantial expansion of PPL functionality with the introduction of 9 new commands, 7 evaluation functions, and 8 statistical functions. The syntax of existing commands has also been refined for improved usability, creating a more intuitive experience for users across various analytical scenarios. Below are scenarios where new commands and functions can help you analyze your data:

"evaluation functions" and "statistical functions" is a strange distinction to make here, I'd just say "15 functions."

The exact semantics of evaluation vs statistics can be left to the PPL reference.


```
# Combines web log data with geographical IP data.
# Which allowing you to see which countries generate the most traffic.

typo: "Which allowing" -> "Which allows"

```

#### Time-series analysis ####
PPL introduces streamlined temporal and distribution analysis with new `timechart`, `bin` and `eventstats` commands. The `timechart` command aggregates data over time intervals with flexible span controls, automatically handling time gap filling and result ordering for time-series analysis. It provides visualization-ready formatting with time as the primary axis and supports grouping by additional fields. The `bin` command automatically groups numeric data into ranges or buckets, facilitating distribution analysis for understanding data spread and frequency patterns. The `eventstats` command which is essential for generating summary statistics from fields in events while *preserving* the original events.

needs a comma: "the eventstats command, which is"
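
The distinction the quoted paragraph draws for `eventstats` is that it attaches a summary statistic to every event while keeping the original events, rather than collapsing them into one row per group. A minimal Python sketch of that behavior follows. It is a model of the semantics only, not PPL and not OpenSearch code, and the `host` and `latency` field names are hypothetical.

```python
from collections import defaultdict


def eventstats_avg(events, field, by):
    """Attach the per-group average of `field` to every event, keeping all events."""
    totals = defaultdict(lambda: [0.0, 0])  # group key -> [running sum, count]
    for event in events:
        acc = totals[event[by]]
        acc[0] += event[field]
        acc[1] += 1
    # Every original event survives, with one extra field holding its group's average.
    return [
        {**event, f"avg_{field}": totals[event[by]][0] / totals[event[by]][1]}
        for event in events
    ]


# Hypothetical events, made up for illustration.
events = [
    {"host": "a", "latency": 10},
    {"host": "a", "latency": 30},
    {"host": "b", "latency": 5},
]
enriched = eventstats_avg(events, "latency", "host")
```

A plain aggregation over the same input would return two rows (one per host); here all three events remain, each carrying its group's average.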

![Visualization tab in Discover page](/assets/media/blog-images/2025-10-29-opensearch-new-ppl-capabilities/timechart.png)

#### Unstructured log processing at query time ####
Text processing features have been included to PPL with the addition of `regex`, `rex`, and `spath` commands. These features enable users to filter, extract, and parse unstructured text directly at query time without requiring data preprocessing. The `regex` command provides pattern-based filtering to isolate relevant log entries, while `rex` extracts structured fields from raw text using regular expressions. The `spath` command extracts fields from JSON data, enabling access to nested objects and arrays. Together, these commands enable instant adaptation to new log formats without requiring reindexing operations, allowing users to analyze previously unstructured data immediately.

typo: "have been included to PPL" -> "have been included in PPL"
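
The quoted paragraph separates two roles: pattern-based filtering (`regex`) and field extraction from raw text (`rex`). The same split can be sketched with Python's standard `re` module. This is an analogy under stated assumptions, not PPL syntax: the log format and field names are invented, and Python writes named groups as `(?P<name>...)`, which may differ from the syntax PPL accepts.

```python
import re

# Hypothetical raw log lines; the format is made up for illustration.
logs = [
    "2025-10-29 ERROR user=alice status=500",
    "2025-10-29 INFO user=bob status=200",
    "2025-10-29 ERROR user=carol status=503",
]

# Filtering (the role the paragraph assigns to `regex`): keep only matching lines.
errors = [line for line in logs if re.search(r"\bERROR\b", line)]

# Extraction (the role assigned to `rex`): named groups become structured fields.
pattern = re.compile(r"user=(?P<user>\w+) status=(?P<status>\d+)")
fields = [m.groupdict() for line in errors if (m := pattern.search(line))]
```

Both steps run when the data is read, which is the point of the paragraph: the stored log lines are never changed or reindexed.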



#### Complex data type support
With these latest upgrades, customers have the ability to perform complex data transformations at *search time* rather than *index time*. The existing PPL function set works well for primitive data types (e.g., strings, numbers, timestamps). We increase the support to cover complex data types with multi-value statistics aggregation functions `(list, values)`. The `list` and `values` functions collect multiple values into structured arrays during aggregation operations with `list` preserving duplicates while `values `returns unique values. The `mvjoin` function combines multi-value fields into single strings using specified delimiters, enabling array manipulation within queries.

typo: "`values `returns" -> "`values` returns"
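
The difference between `list` and `values` described in the quoted paragraph, and the way `mvjoin` flattens a multi-value field, can be modeled in a few lines of Python. This is a sketch of the described semantics only. The function names below are hypothetical stand-ins, and sorting the unique values is an assumption made to keep the output deterministic, not something the draft states.

```python
def collect_list(items):
    """Model of `list`: keep every value, duplicates included, in arrival order."""
    return list(items)


def collect_values(items):
    """Model of `values`: keep unique values only (sorted here by assumption)."""
    return sorted(set(items))


def mvjoin(items, delimiter):
    """Model of `mvjoin`: combine a multi-value field into one delimited string."""
    return delimiter.join(str(item) for item in items)


# Hypothetical page views for one user, made up for illustration.
pages = ["home", "cart", "home", "checkout"]

journey = mvjoin(collect_list(pages), " > ")
unique_pages = collect_values(pages)
```

Here `journey` keeps the repeated visit to `home`, while `unique_pages` drops it.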

Example for using `values`:

```
# Analyize the User journey while they navigate across pages,

typo: Analyize -> Analyze

@pajuric

pajuric commented Nov 5, 2025

@kolchfa-aws @natebower - Adding you both to push this into review.

@pajuric added the "Tech review" label (The blog is under tech review) on Nov 5, 2025

Labels

Tech review The blog is under tech review

Development

Successfully merging this pull request may close these issues.

[BLOG] Better observability , deeper insights: OpenSearch's new Piped Processing Language Capabilities

3 participants