An AI-powered security chatbot that provides vulnerability analysis and security recommendations using Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).
This chatbot analyzes security risks based on:
- Real-time CVE (Common Vulnerabilities and Exposures) data from the NIST National Vulnerability Database (NVD)
- Infrastructure details (OS versions, services, applications)
- Security best practices and threat intelligence
The system uses RAG to retrieve relevant security information and generates contextual responses using LLMs.
- Real-time CVE Database: Fetches latest vulnerability data from NIST NVD
- RAG Pipeline: Uses FAISS vector store and semantic search for accurate retrieval
- Dual LLM Support:
  - OpenAI GPT-3.5/4 (API-based)
  - Ollama (free, local LLM)
- Interactive Web Interface: Streamlit-based UI
- Infrastructure Analysis: Assesses risks based on your system configurations
- Actionable Recommendations: Provides specific mitigation strategies
```
User Query → Streamlit UI → Security Chatbot
                     │
                RAG Pipeline
                     │
         ┌───────────┴───────────┐
         │                       │
 FAISS Vector Store      LLM (OpenAI/Ollama)
         │                       │
 Retrieved Context       Generated Response
         └───────────┬───────────┘
                     │
                Final Answer
```
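As a minimal sketch, the retrieve-then-generate flow in the diagram reduces to a few lines of Python. The names below (`retrieve`, `answer`) and the keyword-overlap scoring are illustrative stand-ins, not the project's actual API; the real pipeline ranks documents with FAISS embeddings instead of keyword overlap:

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    score: float

def retrieve(query: str, store: list[str], k: int = 5) -> list[Document]:
    # Stand-in for FAISS semantic search: rank documents by naive
    # keyword overlap with the query and keep the top-k.
    q_terms = set(query.lower().split())
    scored = [
        Document(text=d, score=len(q_terms & set(d.lower().split())))
        for d in store
    ]
    return sorted(scored, key=lambda d: d.score, reverse=True)[:k]

def answer(query: str, store: list[str]) -> str:
    # Retrieved context is prepended to the question to form the prompt.
    context = "\n".join(d.text for d in retrieve(query, store))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return prompt  # in the real pipeline this prompt is sent to the LLM

docs = ["CVE-2024-0001 affects Apache httpd", "CVE-2024-0002 affects nginx"]
print(answer("Which Apache vulnerability applies?", docs))
```

The key design point the diagram captures: retrieval and generation are separate stages, so the knowledge base can be refreshed without retraining or reconfiguring the LLM.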
- Python 3.13 (3.11 or 3.12 also work if 3.13 causes package issues)
- Windows 11
- Internet connection (for CVE data fetching)
- OpenAI API key OR Ollama installation
```shell
# Navigate to your project directory
cd security_chatbot

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
```

Option A: One-Click Install (Recommended)
```shell
# Windows users
install.bat

# Linux/Mac users
chmod +x install.sh
./install.sh
```

The script will automatically:
- Check Python version
- Create virtual environment
- Install all packages
- Verify installation
Option B: Manual Install
```shell
pip install -r requirements.txt
```

Note for Python 3.13 users: If you encounter issues with faiss-cpu or langchain, requirements.txt has been updated with compatible versions. If problems persist, see INSTALLATION_FIX.md for detailed solutions.
Verify installation:
```shell
python test_imports.py
```

Create a .env file in the project root:
```shell
# Copy the example file
copy .env.example .env
```

Edit .env with your settings:
Option A: Using OpenAI (Recommended)

```
OPENAI_API_KEY=sk-your-api-key-here
USE_OLLAMA=false
```

Option B: Using Ollama (Free, Local)

```
USE_OLLAMA=true
OLLAMA_MODEL=llama2
```

To get an OpenAI API key:
- Create an account at OpenAI Platform
- Generate an API key at the API Keys page
- Add the API key to your .env file
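As an illustration of how the chatbot might branch on these variables, here is a sketch; `select_backend` is a hypothetical helper, and the actual configuration logic in chatbot.py may differ:

```python
def select_backend(env: dict) -> str:
    # Mirror the .env options above: USE_OLLAMA decides which LLM backend runs.
    if env.get("USE_OLLAMA", "false").lower() == "true":
        return "ollama:" + env.get("OLLAMA_MODEL", "llama2")
    key = env.get("OPENAI_API_KEY", "")
    if not key.startswith("sk-"):
        raise ValueError("OPENAI_API_KEY is missing or malformed")
    return "openai:gpt-3.5-turbo"

# Example:
# select_backend({"USE_OLLAMA": "true", "OLLAMA_MODEL": "llama2"})  # -> "ollama:llama2"
```

Failing fast on a missing or malformed key surfaces configuration mistakes at startup rather than on the first chat request.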
To set up Ollama:
- Download Ollama from ollama.ai
- Install Ollama on Windows
- Open a terminal and run:

```shell
# Download the model (this may take a few minutes)
ollama pull llama2

# Start Ollama server
ollama serve
```

- Keep the Ollama server running in the background
```shell
# Activate virtual environment first (if not already activated)
venv\Scripts\activate

# Run the Streamlit app
streamlit run app.py
```

The application will open in your browser at http://localhost:8501.
- Configure LLM:
  - Choose OpenAI or Ollama in the sidebar
  - Enter API key if using OpenAI
- Initialize Chatbot: Click "Initialize Chatbot"
- Wait for Setup: The system will:
  - Fetch CVE data from NIST NVD
  - Build vector embeddings
  - Initialize the knowledge base
- Start Chatting: Ask security questions!
- What are the most critical vulnerabilities in our infrastructure?
- How can we protect against recent Apache vulnerabilities?
- Tell me about CVE-2024-XXXXX and its impact
- What security measures should we implement for our web servers?
- Perform a risk assessment of our Windows servers
- What are the high-severity CVEs from the last month?
```
security_chatbot/
├── app.py              # Streamlit web interface
├── chatbot.py          # Main chatbot logic and LLM integration
├── rag_pipeline.py     # RAG implementation with FAISS
├── cve_collector.py    # CVE data collection from NVD API
├── requirements.txt    # Python dependencies
├── .env.example        # Environment variables template
├── .env                # Your environment configuration (create this)
├── data/               # CVE data storage (created automatically)
│   └── cve_data.json
└── vector_store/       # FAISS index storage (created automatically)
    ├── faiss_index.bin
    └── documents.pkl
```
Test CVE Data Collection:

```shell
python cve_collector.py
```

Test RAG Pipeline:

```shell
python rag_pipeline.py
```

Test Chatbot:

```shell
python chatbot.py
```

The system includes sample infrastructure data for testing:
- Web Server Cluster (Ubuntu, Apache, WordPress)
- Database Server (RHEL, PostgreSQL)
- Application Server (Windows Server, .NET, IIS)
- Network Infrastructure (Cisco firewalls, routers)
- CVE Data: NIST National Vulnerability Database (NVD) API
- Infrastructure: Sample data included (can be customized)
- Update Frequency: Can be refreshed on-demand via UI
- Embedding Model: all-MiniLM-L6-v2 (HuggingFace)
- Vector Store: FAISS (Facebook AI Similarity Search)
- Chunk Size: 800 characters with 100-character overlap
- Retrieval: Top-5 most relevant documents
- Temperature: 0.3 (for more factual responses)
- Max Tokens: 1000
- System Prompt: Configured as cybersecurity expert
The system can be evaluated on:
- Retrieval Quality
  - Relevance of retrieved documents
  - Coverage of relevant CVEs
  - Precision of semantic search
- Response Quality
  - Accuracy of vulnerability assessments
  - Actionability of recommendations
  - Citation of sources
- Performance
  - Query response time
  - Embedding generation speed
  - Knowledge base update time
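Retrieval quality can be quantified with, for example, precision@k over a labeled query set. The metric below is illustrative and not part of the project code:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    # Fraction of the top-k retrieved CVE IDs that are in the relevant set.
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for cve_id in top if cve_id in relevant) / len(top)
```

Averaging this over a set of test queries (with hand-labeled relevant CVEs) gives a single number to track as the chunking, embedding model, or top-k setting changes.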
To fetch latest CVE data:
- Click "Update CVE Database" in the sidebar
- System will fetch CVEs from the last 30 days
- Knowledge base will be automatically rebuilt
- Continue chatting with updated information
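The on-demand refresh maps to a request against the public NVD 2.0 API. Below is a sketch of how the request parameters might be built; the real cve_collector.py may differ, and the exact date format should be checked against the NVD API documentation:

```python
from datetime import datetime, timedelta, timezone

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def build_nvd_params(days: int = 30, max_results: int = 100) -> dict:
    # NVD 2.0 takes extended ISO-8601 publication-date bounds; a single
    # request's date window may span at most 120 days.
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%S.000"
    return {
        "pubStartDate": start.strftime(fmt),
        "pubEndDate": end.strftime(fmt),
        "resultsPerPage": max_results,
    }

# Unauthenticated clients are rate-limited (roughly 5 requests per rolling
# 30-second window); sending an apiKey header (NVD_API_KEY) raises the limit.
# resp = requests.get(NVD_API, params=build_nvd_params(days=30), timeout=30)
```

This is why the troubleshooting section below suggests adding NVD_API_KEY: the higher authenticated rate limit makes bulk refreshes far less likely to fail.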
If you encounter faiss-cpu version errors:
- See INSTALLATION_FIX.md for detailed solutions
- Run python test_imports.py to verify installation
- Consider using Python 3.11 or 3.12 if Python 3.13 has issues
- Ensure Ollama is installed and running
- Run ollama serve in a terminal
- Check if http://localhost:11434 is accessible
- Verify the API key is correct in .env
- Check if you have available credits
- Ensure internet connection is stable
- Check internet connection
- NIST NVD API may be temporarily down
- Try again after a few minutes
- Consider adding NVD_API_KEY for higher rate limits
- Ensure virtual environment is activated
- Run pip install -r requirements.txt again
- Verify your Python version matches the prerequisites
- NIST NVD
- OpenAI API Documentation
- Ollama Documentation
- LangChain Documentation
- FAISS Documentation
- Sentence Transformers
- ✅ Data Collection: Automated CVE fetching from NVD API
- ✅ Knowledge Base Construction: Vector embeddings with FAISS
- ✅ LLM Integration: Support for both API and local models
- ✅ RAG Implementation: Retrieval-augmented generation pipeline
- ✅ User Interface: Interactive Streamlit web application
- ✅ Testing & Validation: Sample scenarios and test cases
Edit rag_pipeline.py, function create_sample_infrastructure():

```python
{
    'name': 'Your Server Name',
    'type': 'server_type',
    'description': '''
    Your infrastructure description including:
    - Operating System
    - Applications
    - Versions
    - Network exposure
    - Critical assets
    '''
}
```

Edit in cve_collector.py or in your code:
```python
cves = collector.fetch_recent_cves(
    days=60,          # Look back 60 days
    max_results=200   # Fetch up to 200 CVEs
)
```

This is an academic project. For improvements:
- Test thoroughly
- Document changes
- Maintain code quality
- Cite any external resources used
This project is created for educational purposes as part of a university course assignment.
Created as part of LLM Security Project coursework.
- NIST National Vulnerability Database for CVE data
- OpenAI for GPT models
- Ollama team for local LLM infrastructure
- HuggingFace for embedding models
- Facebook Research for FAISS
- Streamlit for the web framework