This project integrates the Global Cybersecurity Threats (2015-2024) dataset from Kaggle.
pnpm install

This will install:
- axios - For downloading the dataset from the Kaggle API
- csv-parser - For parsing CSV files
- Go to Kaggle Account Settings
- Scroll to the "API" section
- Click "Create New API Token"
- Download the kaggle.json file
Add to your .env file:
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_KEY=your_kaggle_api_key

You can find these values in the downloaded kaggle.json file:

{
  "username": "your_kaggle_username",
  "key": "your_kaggle_api_key"
}

The dataset will be automatically downloaded and loaded when you start the server:

pnpm dev:server

Or run both frontend and backend:

pnpm dev:all

If you prefer to download the dataset manually:
- Go to the dataset page
- Click "Download" and extract the files
- Place the CSV file(s) in the ./data/global-cybersecurity-threats-2015-2024/ directory
- The server will automatically detect and load the CSV file(s)
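As a rough illustration of how the automatic detection step might work, the sketch below lists the CSV files found directly inside the data directory. The helper name `findCsvFiles` is hypothetical and not taken from this project; the server's actual detection logic may differ.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Directory named in the manual download steps above.
const DATA_DIR = "./data/global-cybersecurity-threats-2015-2024";

// Hypothetical helper: returns the paths of all .csv files directly inside
// `dir`, sorted by path, or an empty array if the directory does not exist.
function findCsvFiles(dir: string = DATA_DIR): string[] {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && entry.name.toLowerCase().endsWith(".csv"))
    .map((entry) => path.join(dir, entry.name))
    .sort();
}
```

Calling `findCsvFiles()` with no argument checks the default directory; an empty result suggests the files were extracted to a different location.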
Once the dataset is loaded, the AI assistant can use the following tools:
- Tool: queryCybersecurityThreats
- Capabilities:
  - Filter by year (2015-2024)
  - Filter by threat type (malware, phishing, ransomware, DDoS, etc.)
  - Filter by severity (low, medium, high, critical)
  - Filter by country
  - Limit results (default: 10, max: 50)
Example queries:
- "Show me all critical threats from 2023"
- "What phishing threats occurred in the United States?"
- "List ransomware incidents from 2022"
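The filtering behavior described above could be sketched roughly as follows. This is not the project's implementation: the record shape, the field names, and the function name `queryThreats` are assumptions made for illustration. Only the filters and the limit values (default 10, max 50) come from the list above.

```typescript
// Assumed record shape; the real CSV columns may differ.
interface ThreatRecord {
  year: number;
  threatType: string;
  severity: string;
  country: string;
}

// All filters are optional; omitted filters match every record.
interface ThreatQuery {
  year?: number;
  threatType?: string;
  severity?: string;
  country?: string;
  limit?: number;
}

const DEFAULT_LIMIT = 10;
const MAX_LIMIT = 50;

// Hypothetical query function: applies each filter, then caps the result count.
function queryThreats(records: ThreatRecord[], query: ThreatQuery = {}): ThreatRecord[] {
  const requested = query.limit ?? DEFAULT_LIMIT;
  // Clamp the limit to the range 1..MAX_LIMIT; fall back to the default on NaN.
  const limit = Number.isFinite(requested)
    ? Math.min(Math.max(Math.trunc(requested), 1), MAX_LIMIT)
    : DEFAULT_LIMIT;
  // Text filters compare case-insensitively and ignore surrounding whitespace.
  const matchesText = (actual: string, wanted?: string): boolean =>
    wanted === undefined || actual.trim().toLowerCase() === wanted.trim().toLowerCase();
  return records
    .filter(
      (r) =>
        (query.year === undefined || r.year === query.year) &&
        matchesText(r.threatType, query.threatType) &&
        matchesText(r.severity, query.severity) &&
        matchesText(r.country, query.country),
    )
    .slice(0, limit);
}
```

Under these assumptions, a request such as "Show me all critical threats from 2023" would map to `queryThreats(records, { year: 2023, severity: "critical" })`.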
- Tool: getCybersecurityStats
- Returns:
  - Total number of records
  - Year range (min/max)
  - Number of unique threat types
  - Number of unique countries
  - Sample record structure
Example queries:
- "What's the overview of the cybersecurity dataset?"
- "How many records are in the dataset?"
- "What years does the dataset cover?"
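A minimal sketch of how these statistics could be computed from loaded rows is shown below. The row shape and the name `summarizeDataset` are illustrative assumptions; the actual tool may compute or name these values differently.

```typescript
// Assumed row shape; the real CSV columns may differ.
interface ThreatRow {
  year: number;
  threatType: string;
  country: string;
}

interface DatasetStats {
  totalRecords: number;
  yearRange: { min: number; max: number } | null;
  uniqueThreatTypes: number;
  uniqueCountries: number;
  sampleRecord: ThreatRow | null;
}

// Hypothetical summary function covering the values listed above.
function summarizeDataset(rows: ThreatRow[]): DatasetStats {
  let min = Infinity;
  let max = -Infinity;
  const threatTypes = new Set<string>();
  const countries = new Set<string>();
  for (const row of rows) {
    min = Math.min(min, row.year);
    max = Math.max(max, row.year);
    // Normalize case so "Phishing" and "phishing" count as one type.
    threatTypes.add(row.threatType.trim().toLowerCase());
    countries.add(row.country.trim().toLowerCase());
  }
  return {
    totalRecords: rows.length,
    yearRange: rows.length > 0 ? { min, max } : null,
    uniqueThreatTypes: threatTypes.size,
    uniqueCountries: countries.size,
    sampleRecord: rows[0] ?? null,
  };
}
```

The first row doubles as the sample record here, which is enough to show the field names without returning the whole dataset.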
The dataset contains global cybersecurity threats from 2015-2024. The exact structure may vary, but common fields include:
- Year
- Threat Type
- Severity
- Country/Location
- Description
- Date
The query functions automatically adapt to the actual CSV structure.
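One way such adaptation could work is to match CSV headers against a list of known aliases, ignoring case and punctuation. The sketch below illustrates that idea only; the alias table and the names `resolveColumn` and `getField` are hypothetical and do not describe the project's actual code.

```typescript
type CsvRow = Record<string, string>;
type LogicalField = "year" | "threatType" | "severity" | "country";

// Hypothetical alias table; entries are written in normalized form
// (lowercase, words separated by single spaces).
const FIELD_ALIASES: Record<LogicalField, string[]> = {
  year: ["year"],
  threatType: ["threat type", "attack type", "type"],
  severity: ["severity", "severity level"],
  country: ["country", "location", "country location"],
};

// "Threat_Type" and "Threat Type" both normalize to "threat type".
const normalizeHeader = (name: string): string =>
  name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

// Returns the actual CSV header that represents `field`, if any.
function resolveColumn(headers: string[], field: LogicalField): string | undefined {
  const wanted = FIELD_ALIASES[field];
  return headers.find((header) => wanted.includes(normalizeHeader(header)));
}

// Reads a logical field from a row regardless of how its column is spelled.
function getField(row: CsvRow, field: LogicalField): string | undefined {
  const column = resolveColumn(Object.keys(row), field);
  return column === undefined ? undefined : row[column];
}
```

With this approach, a filter on threat type keeps working whether the column is called "Threat Type" or "Attack Type", and a missing column yields `undefined` rather than an error.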
- Check Kaggle credentials: Ensure KAGGLE_USERNAME and KAGGLE_KEY are set in .env
- Check file location: CSV files should be in ./data/global-cybersecurity-threats-2015-2024/
- Check file format: Ensure the file is a valid CSV
- Check server logs: Look for error messages in the console
- The dataset may still be downloading
- Check that the CSV file exists in the data directory
- Verify the file is not corrupted
- Check server logs for specific error messages
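Several of the checks above can be automated. The sketch below is a hypothetical diagnostic helper (`diagnoseDataset` is not part of this project) that reports missing credentials, a missing data directory, and missing or empty CSV files. It assumes the values from .env have already been loaded into the environment object passed in.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns a list of human-readable problems,
// or an empty array if nothing obvious is wrong.
function diagnoseDataset(env: Env, dataDir: string): string[] {
  const problems: string[] = [];
  // Credentials named in the setup steps.
  for (const name of ["KAGGLE_USERNAME", "KAGGLE_KEY"]) {
    if (!env[name]) problems.push(`${name} is not set`);
  }
  if (!fs.existsSync(dataDir)) {
    problems.push(`Directory not found: ${dataDir}`);
    return problems;
  }
  const csvNames = fs
    .readdirSync(dataDir)
    .filter((name) => name.toLowerCase().endsWith(".csv"));
  if (csvNames.length === 0) problems.push(`No CSV files in ${dataDir}`);
  // A zero-byte file usually indicates an interrupted download or extraction.
  for (const name of csvNames) {
    if (fs.statSync(path.join(dataDir, name)).size === 0) {
      problems.push(`${name} is empty`);
    }
  }
  return problems;
}
```

A typical call would be `diagnoseDataset(process.env, "./data/global-cybersecurity-threats-2015-2024")`. The size check only catches empty files; it does not validate the CSV contents.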
The Kaggle API has rate limits. If you hit them:
- Wait a few minutes and try again
- Consider downloading the dataset manually
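The "wait and try again" advice can also be applied in code. The following is a minimal retry sketch with exponential backoff, not the project's download logic. It assumes a rate-limited request fails with an error carrying `response.status === 429` (the shape axios uses for HTTP errors); whether Kaggle signals rate limiting this way is an assumption. The names `withRetry`, `isRateLimited`, and `backoffDelays` are hypothetical.

```typescript
// Assumption: rate limiting surfaces as an HTTP 429 on `error.response.status`.
function isRateLimited(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const response = (error as { response?: { status?: number } }).response;
  return response?.status === 429;
}

// Delay before each retry: base, 2x base, 4x base, ...
function backoffDelays(retries: number, baseDelayMs: number): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseDelayMs * 2 ** attempt);
}

interface RetryOptions {
  retries?: number;
  baseDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so callers can skip real waits
}

// Runs `task`, retrying only on rate-limit errors; other errors are rethrown at once.
async function withRetry<T>(task: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const retries = options.retries ?? 3;
  const baseDelayMs = options.baseDelayMs ?? 60000;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const delays = backoffDelays(retries, baseDelayMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      const delay = delays[attempt];
      if (delay === undefined || !isRateLimited(error)) throw error;
      await sleep(delay);
    }
  }
}
```

With the defaults, the waits are one, two, and four minutes. If every retry fails, downloading the dataset manually remains the fallback.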
data/
└── global-cybersecurity-threats-2015-2024/
    └── [CSV files from the dataset]
The data/ directory is gitignored to avoid committing large dataset files.