This demo shows how to use LogDelta to analyze the Hadoop log dataset hosted by LogHub.

- YouTube Playlist: A comprehensive walkthrough of the LogDelta tool.
- Script with Visuals and Timestamps: A detailed script of the video, including visuals and timestamped links for easy navigation.
  - File: `script_with_visuals.md`
- Anomaly Detection: Supports both run-level and line-level anomaly analysis. Use visualizations and anomaly detection models to identify mislabeled logs.
- Visualization Options: File names, textual content, UMAP projections, anomaly scores.
- LogDelta offers a variety of tools for visual and machine learning analysis of log data, making it easier to uncover patterns and anomalies.
- Using LogDelta, multiple incorrect labels were found in the Hadoop dataset.
- Corrected Labels:

| Run ID | Original Label | Fixed Label |
|---|---|---|
| 1445144423722_0024 | Normal | Disk Full |
| 1445182159119_0017 | Machine Down | Normal |
| 1445062781478_0020 | Machine Down | Normal |
| 1445182151478_0015 | Machine Down | Disk Full |
| 1445182159119_0013 | Disk Full | Machine Down |
| 1445182159119_0011 | Disk Full | Machine Down |
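
If you want to reuse these corrections in your own scripts, for example when cross-checking LogDelta output against the dataset labels, the table can be encoded as a simple lookup. The following is a minimal sketch; `corrected_label` is a hypothetical helper, not part of LogDelta, and it only covers the six runs listed in the table.

```bash
# Hypothetical helper (not part of LogDelta): print the corrected label for a
# run ID, or fall back to the original label ($2) if the run was not relabeled.
corrected_label() {
  case "$1" in
    # Originally "Normal" / "Machine Down", corrected to "Disk Full".
    1445144423722_0024|1445182151478_0015) printf '%s\n' "Disk Full" ;;
    # Originally "Machine Down", corrected to "Normal".
    1445182159119_0017|1445062781478_0020) printf '%s\n' "Normal" ;;
    # Originally "Disk Full", corrected to "Machine Down".
    1445182159119_0013|1445182159119_0011) printf '%s\n' "Machine Down" ;;
    # Any other run keeps its original label.
    *) printf '%s\n' "$2" ;;
  esac
}
```

For example, `corrected_label 1445182159119_0013 "Disk Full"` prints `Machine Down`, while a run ID that is not in the table prints the original label passed as the second argument. A `case` statement is used instead of an associative array so the sketch also works in plain POSIX `sh`.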

Follow these steps to get started with LogDelta:

1. Clone the repository:

   ```bash
   git clone https://github.com/EvoTestOps/LogDelta.git
   cd LogDelta/demo/label_investigation
   ```

2. Create a new virtual environment and install LogDelta:

   ```bash
   conda create -n logdelta python=3.11
   conda activate logdelta
   pip install logdelta
   ```

3. Download the Hadoop dataset and extract it (the URL is quoted so that shells such as zsh do not treat `?` as a glob character):

   ```bash
   wget -O Hadoop.zip "https://zenodo.org/records/8196385/files/Hadoop.zip?download=1"
   unzip Hadoop.zip -d Hadoop
   ```

4. Rename the runs with their labels using the provided script:

   ```bash
   python label_hadoop_runs_orig.py
   ```

5. Run the demo configurations:

   - For file name visualization:

     ```bash
     python -m logdelta.config_runner -c 1_viz_file_names.yml
     ```

   - For textual content visualization:

     ```bash
     python -m logdelta.config_runner -c 2_viz_run_content.yml
     ```

   - For anomaly detection with run content:

     ```bash
     python -m logdelta.config_runner -c 3_ano_run_content.yml
     ```

   - For line-level anomaly detection:

     ```bash
     python -m logdelta.config_runner -c 4_ano_line_content.yml
     ```

Outputs will be saved in the `out_1`, `out_2`, `out_3`, and `out_4` folders, respectively.
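
To run all four demo configurations in one go, the `config_runner` commands above can be wrapped in a small shell function. This is a hypothetical convenience sketch (`run_demo_configs` is not part of LogDelta); it assumes you are in `LogDelta/demo/label_investigation` with the `logdelta` environment active, and it stops at the first configuration that fails.

```bash
# Hypothetical helper (not part of LogDelta): run the four demo configurations
# in order. Assumes the current directory is LogDelta/demo/label_investigation
# and the logdelta conda environment is active.
run_demo_configs() {
  local cfg
  for cfg in 1_viz_file_names.yml 2_viz_run_content.yml \
             3_ano_run_content.yml 4_ano_line_content.yml; do
    echo "Running ${cfg}"
    # Stop at the first configuration that exits with a non-zero status.
    python -m logdelta.config_runner -c "${cfg}" || return 1
  done
}
```

Calling `run_demo_configs && ls -d out_1 out_2 out_3 out_4` then lists the output folders only if every configuration completed successfully. Running the configurations sequentially keeps the output order identical to running the four commands by hand.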