SPDX-FileCopyrightText: © 2024 Menacit AB <[email protected]>
SPDX-License-Identifier: CC-BY-SA-4.0
title: "Logging course: AI/ML"
author: Joel Rangsmo <[email protected]>
footer: © Course authors (CC BY-SA 4.0)
description: Introduction to AI/ML's role in log analysis
keywords:
  - logging
  - siem
  - course
color: "#ffffff"
class: invert
style: |
  section.center { text-align: center; }
  table strong { color: #d63030; }
  table em { color: #2ce172; }

AI and ML

Can it help us analyze logs?


Our centralized logging solution can act as a data source for machine learning.

But can it help us improve searching and analysis?

Let's look at common use-cases and how they're implemented in OpenSearch!


Example use-cases

  • Anomaly detection
  • Semantic queries
  • Conversational searching


Anomaly detection

Human brains are trained to identify things out of the ordinary.

We can do the same for computers.

Enables us to sift through enormous amounts of logs and act before a nuisance becomes a catastrophe.


Help us identify things like...

  • Unusually high API latency
  • Web server spawning shell process
  • User from finance department logging in to database in the middle of the night
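To make the first example concrete, here is a minimal sketch that flags latency samples far from the mean. It assumes a simple z-score rule; the function name, the threshold and the sample values are made up for illustration, and this is not the algorithm OpenSearch uses.

```python
from statistics import mean, stdev


def find_latency_anomalies(samples_ms, threshold=3.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    if len(samples_ms) < 2:
        return []  # not enough data to estimate a spread
    mu = mean(samples_ms)
    sigma = stdev(samples_ms)
    if sigma == 0:
        return []  # all samples identical, nothing stands out
    return [(i, v) for i, v in enumerate(samples_ms) if abs(v - mu) / sigma > threshold]


# Hypothetical API latencies in milliseconds: a steady baseline plus one spike.
baseline = [100, 102, 98, 101, 99, 103, 97, 100] * 3
samples = baseline + [950]
print(find_latency_anomalies(samples))
```

A rule like this only works for numeric series with a stable baseline, which is part of why more adaptive methods are attractive.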


And as usual...

Computationally expensive and quite opaque process.

Shit in, shit out.

Use it as guidance when developing static detection rules.


Semantic queries

Traditionally, we've relied on lexical/keyword-based searching.

Natural Language Processing helps us fetch more relevant results.

Requires better understanding of the data we've stored/indexed.
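The difference between the two approaches can be sketched with toy data. The vectors below are hand-made stand-ins for embeddings (an assumption; real ones come from a trained language model), and the function names are hypothetical.

```python
import math

# Hand-made toy "embeddings"; in practice a model would generate these.
TOY_VECTORS = {
    "login failed for user alice": [0.9, 0.1, 0.0],
    "authentication error: bad password": [0.8, 0.2, 0.1],
    "disk usage at 91 percent": [0.0, 0.1, 0.9],
}


def keyword_match(query, documents):
    """Lexical search: keep documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in documents if terms & set(doc.lower().split())]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_rank(query_vector, vectors):
    """Semantic search: order documents by similarity to the query vector."""
    return sorted(vectors, key=lambda doc: cosine(query_vector, vectors[doc]), reverse=True)


# Keyword search misses the "authentication error" line entirely.
print(keyword_match("login failed", TOY_VECTORS))
# A (made-up) query vector close to the "failed login" concept ranks it near the top.
print(semantic_rank([0.85, 0.15, 0.05], TOY_VECTORS))
```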


Conversational searching

Takes NLP one step further by applying the same language processing to the search results, not just the query.

Made popular by Large Language Models like ChatGPT.

Context/previous dialog should be considered to improve experience.
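A rough sketch of what "considering previous dialog" could look like: keep the last few question/answer pairs and prepend them, together with the retrieved log lines, to the new question before it is sent to a language model. The class, its prompt layout and the example data are all hypothetical.

```python
class Conversation:
    """Keeps the most recent question/answer pairs so follow-ups have context."""

    def __init__(self, max_turns=3):
        self.max_turns = max_turns
        self.turns = []  # list of (question, answer) tuples

    def add_turn(self, question, answer):
        self.turns.append((question, answer))
        # Drop the oldest turns so the prompt stays within a fixed size.
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, question, search_hits):
        """Combine earlier dialog, retrieved log lines and the new question."""
        lines = ["Previous dialog:"]
        for asked, answered in self.turns:
            lines.append(f"Q: {asked}")
            lines.append(f"A: {answered}")
        lines.append("Relevant log lines:")
        lines.extend(f"- {hit}" for hit in search_hits)
        lines.append(f"Question: {question}")
        return "\n".join(lines)


# Made-up dialog: the follow-up question only makes sense with the earlier turn.
chat = Conversation(max_turns=2)
chat.add_turn("Which host had failed logins last night?", "web-01 had 42 failed logins.")
prompt = chat.build_prompt(
    "Which accounts were targeted?",
    ["login failed for user alice on web-01"],
)
print(prompt)
```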


With the terms defined, let's look at how OpenSearch can help!


Managing machine learning

Most functionality provided by the "ML Commons" plugin.

Ability to run (pre-trained) models on searches and indexed documents.

Provides plumbing for using remote models/providers.

Supports node tagging to optimize things like I/O performance and GPU access.
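As a hedged sketch, these are the kinds of REST requests involved. The code only builds and prints them; it sends nothing. It assumes that "node tagging" refers to dedicated nodes with the `ml` role, and the model name and version are example values that should be checked against the ML Commons documentation for your OpenSearch release. The helper function names are hypothetical.

```python
import json


def build_ml_node_setting(only_ml_nodes=True):
    """Return (method, path, body) restricting model execution to nodes with the `ml` role."""
    body = {"persistent": {"plugins.ml_commons.only_run_on_ml_node": only_ml_nodes}}
    return "PUT", "/_cluster/settings", body


def build_register_request(model_name, version, model_format="TORCH_SCRIPT"):
    """Return (method, path, body) for registering a pre-trained model."""
    body = {"name": model_name, "version": version, "model_format": model_format}
    return "POST", "/_plugins/_ml/models/_register", body


requests_to_send = [
    build_ml_node_setting(),
    # Example values only: verify the model name and version in the documentation.
    build_register_request("huggingface/sentence-transformers/all-MiniLM-L6-v2", "1.0.1"),
]
for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```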


Anomaly detection

Provided as a high-level feature accessible through OpenSearch Dashboards.

Relies on the unsupervised Random Cut Forest algorithm to compute anomaly grades/confidence scores.

Let's take it for a spin!
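Detectors can also be created through the anomaly detection plugin's REST API. The sketch below only builds a request body for `POST /_plugins/_anomaly_detection/detectors`; the index pattern, field names and intervals are assumptions for illustration, and the helper function is hypothetical.

```python
import json


def build_detector(name, index_pattern, time_field, field, interval_minutes=10):
    """Build a detector definition that tracks the average of one numeric field."""
    feature = f"avg_{field}"
    return {
        "name": name,
        "description": f"Average of {field} per {interval_minutes} minutes",
        "time_field": time_field,
        "indices": [index_pattern],
        "feature_attributes": [
            {
                "feature_name": feature,
                "feature_enabled": True,
                # An ordinary aggregation: the detector needs a number per interval.
                "aggregation_query": {feature: {"avg": {"field": field}}},
            }
        ],
        "detection_interval": {"period": {"interval": interval_minutes, "unit": "Minutes"}},
        # Allow for ingestion lag before an interval is evaluated.
        "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
    }


# Hypothetical index and field names.
detector = build_detector("api-latency", "api-logs-*", "@timestamp", "latency_ms")
print(json.dumps(detector, indent=2))
```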


Things to consider

If you can't represent it as a number or an aggregation, the built-in anomaly detection won't help you.

Still needs quite a bit of guidance. In many cases that effort could be better spent on statically configured thresholds/outliers.
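Both points can be sketched in a few lines: turn log events into a number (here a count per minute), then apply a static threshold. The event layout, field names and limit are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def failed_logins_per_minute(events):
    """Aggregate events into a count of failed logins per minute."""
    counts = Counter()
    for event in events:
        if event["action"] == "login_failed":
            ts = datetime.fromisoformat(event["timestamp"])
            counts[ts.replace(second=0, microsecond=0)] += 1
    return counts


def over_threshold(counts, limit):
    """Static detection: return the minutes whose count exceeds `limit`."""
    return sorted(minute for minute, n in counts.items() if n > limit)


# Hypothetical events.
events = [
    {"timestamp": "2024-05-01T03:12:05", "action": "login_failed"},
    {"timestamp": "2024-05-01T03:12:17", "action": "login_failed"},
    {"timestamp": "2024-05-01T03:12:40", "action": "login_failed"},
    {"timestamp": "2024-05-01T03:12:59", "action": "login_ok"},
    {"timestamp": "2024-05-01T03:13:10", "action": "login_failed"},
]
counts = failed_logins_per_minute(events)
print(over_threshold(counts, limit=2))
```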


Semantic queries

Provides pre-trained sentence transformer models.

Processing implemented through OpenSearch ingest and search pipelines (not to be confused with Logstash pipelines).

Can be combined with traditional keyword-based approaches to create "hybrid queries".

If you wanna play around, check out the semantic search tutorial.
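For a feel of what a hybrid query looks like, here is a sketch that builds the two request bodies involved: a search pipeline that normalizes and combines scores, and the query itself. Nothing is sent to a cluster. The field names, weights and model ID are placeholders, and the helper functions are hypothetical.

```python
import json


def build_hybrid_pipeline(keyword_weight=0.3, semantic_weight=0.7):
    """Body for PUT /_search/pipeline/<name>: normalize and combine sub-query scores."""
    return {
        "description": "Combine keyword and semantic scores",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [keyword_weight, semantic_weight]},
                    },
                }
            }
        ],
    }


def build_hybrid_query(text, model_id, text_field="message",
                       vector_field="message_embedding", k=5):
    """Body for a search that runs a keyword and a neural sub-query together."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {text_field: {"query": text}}},
                    {"neural": {vector_field: {"query_text": text, "model_id": model_id, "k": k}}},
                ]
            }
        }
    }


# "<model-id>" is a placeholder for the ID returned when a model is deployed.
pipeline = build_hybrid_pipeline()
query = build_hybrid_query("failed login attempts", model_id="<model-id>")
print(json.dumps(pipeline, indent=2))
print(json.dumps(query, indent=2))
```

The weights follow the order of the sub-queries, so the first value applies to the keyword match and the second to the neural query.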


Conversational searching

Currently provided as an experimental feature.

Integrates with ChatGPT, Amazon Bedrock and Cohere ($$$ but self-hostable) APIs.

Would be neat to see support for free LLMs like Llama 2 to aid querying of private data.


Just an appetizer, I'm far from an expert!

The features are right there, especially anomaly detection - take them for a spin if you're interested.
