
imouiche/Threat-Intelligence-Knowledge-Graphs


🛡️ Threat Intelligence Knowledge Graphs (TiKG)

Entity & Relation Extraction for Cyber Threat Intelligence

From Unstructured Threat Reports ➜ Structured Cybersecurity Knowledge Graphs

🌟 Overview

TiKG (Threat Intelligence Knowledge Graphs) is a pipeline-based entity and relation extraction framework that transforms unstructured cyber threat intelligence (CTI) into structured, machine-readable knowledge graphs. The framework addresses fundamental challenges in CTI automation, including:

❌ Error propagation in traditional extraction pipelines

❌ Domain ambiguity in generic language models

❌ Noisy entity spans and inconsistent relations

❌ Limited explainability in downstream threat analysis

TiKG enables high-fidelity knowledge graph construction, supporting threat detection, attribution, correlation, and investigation.

📄 Reference (How to Cite)

If you use TiKG, please cite the following peer-reviewed journal paper:

@article{mouiche2025tikg,
  title={Entity and relation extractions for threat intelligence knowledge graphs},
  author={Mouiche, Inoussa and Saad, Sherif},
  journal={Computers \& Security},
  volume={148},
  pages={104120},
  year={2025},
  issn={0167-4048},
  doi={10.1016/j.cose.2024.104120}
}

🔗 DOI: https://doi.org/10.1016/j.cose.2024.104120

🧠 Key Contributions

🔹 Context-Aware Pipeline Architecture

TiKG introduces a security-aware extraction pipeline that integrates:

SecureBERT / SecureBERT⁺ embeddings (cybersecurity-adapted transformers)

Sequence modeling (BiLSTM / BiGRU)

CRF decoding for valid entity boundaries

Ontology-guided error control to reduce cascading extraction errors
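To illustrate what "valid entity boundaries" means at the decoding step, here is a minimal sketch of Viterbi decoding with hard BIO constraints. It is not the TiKG implementation: a trained CRF also learns soft transition scores, whereas this sketch only forbids illegal transitions. The label set, the `allowed` helper, and the emission scores are all hypothetical.

```python
import math

# Hypothetical label set in BIO format (assumption, not taken from TiKG).
LABELS = ["O", "B-Malware", "I-Malware", "B-Actor", "I-Actor"]


def allowed(prev, curr):
    """BIO rule: I-X may only follow B-X or I-X of the same type."""
    if curr.startswith("I-"):
        etype = curr[2:]
        return prev in ("B-" + etype, "I-" + etype)
    return True


def viterbi(emissions, labels=LABELS):
    """Return the best-scoring label path that respects the BIO rule.

    emissions: one dict per token mapping every label to a score.
    """
    neg = -math.inf
    # A sequence cannot start inside an entity.
    score = {l: (neg if l.startswith("I-") else emissions[0][l]) for l in labels}
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for curr in labels:
            best_prev, best = None, neg
            for prev in labels:
                if allowed(prev, curr) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[curr] = best + em[curr] if best_prev is not None else neg
            ptr[curr] = best_prev
        score = new_score
        backpointers.append(ptr)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda l: score[l])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy scores for three tokens; a per-token argmax would start with "O"
# and then jump straight to "I-Actor", which is an invalid boundary.
emissions = [
    {"O": 1.0, "B-Malware": 0.0, "I-Malware": 0.0, "B-Actor": 0.8, "I-Actor": 0.0},
    {"O": 0.1, "B-Malware": 0.0, "I-Malware": 0.0, "B-Actor": 0.2, "I-Actor": 2.0},
    {"O": 2.0, "B-Malware": 0.0, "I-Malware": 0.0, "B-Actor": 0.0, "I-Actor": 0.0},
]
greedy = [max(em, key=em.get) for em in emissions]
decoded = viterbi(emissions)
print(greedy)   # ['O', 'I-Actor', 'O']
print(decoded)  # ['B-Actor', 'I-Actor', 'O']
```

The constrained path trades a slightly lower score on the first token for a well-formed span, which is the kind of boundary repair a CRF layer provides over independent per-token classification.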

🔹 High-Quality Knowledge Graph Construction

Extracted entities and relations are stored as structured triples, enabling:

Threat actor–tool–malware correlation

Campaign attribution

Cross-report reasoning

Query-driven threat investigation
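As a rough illustration of how triples support the queries listed above, the sketch below stores (subject, relation, object) tuples in memory and matches them against a pattern. The `TripleStore` class, the relation names, and the entity names are hypothetical; the repository may use a graph database or a different schema.

```python
class TripleStore:
    """Minimal in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def match(self, subject=None, relation=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (subject, relation, obj)
        return sorted(
            t for t in self.triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


# Hypothetical entities and relations, for illustration only.
kg = TripleStore()
kg.add("ActorA", "uses", "ToolT")
kg.add("ActorA", "uses", "MalwareX")
kg.add("ActorB", "uses", "MalwareX")
kg.add("MalwareX", "targets", "SectorS")

# One hop: what does ActorA use?
used_by_a = kg.match(subject="ActorA", relation="uses")

# Two hops: what do the things ActorA uses target?
targets = [
    t
    for (_, _, used) in used_by_a
    for t in kg.match(subject=used, relation="targets")
]
print(used_by_a)
print(targets)
```

Chaining pattern matches like this is the simplest form of query-driven investigation; a production graph store would add indexing and a query language.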

🏗️ Architecture

Pipeline Flow:

Tokenization & contextual embeddings

Sequence modeling for NER

CRF-based entity decoding

Relation extraction with pooled entity representations

Knowledge graph population
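The relation extraction step in the flow above can be sketched as follows. This is an assumption-laden illustration: the README does not say which pooling operation is used, so mean pooling over each entity span and concatenation of the two pooled vectors are choices made here for clarity. The function names and toy vectors are hypothetical.

```python
def mean_pool(token_vectors, start, end):
    """Average the token vectors in the half-open span [start, end)."""
    span = token_vectors[start:end]
    dim = len(span[0])
    return [sum(vec[i] for vec in span) / len(span) for i in range(dim)]


def pair_features(token_vectors, head_span, tail_span):
    """Concatenate pooled head and tail entity vectors into one feature vector.

    A relation classifier (not shown) would take this vector as input.
    """
    return mean_pool(token_vectors, *head_span) + mean_pool(token_vectors, *tail_span)


# Toy 2-dimensional "contextual embeddings" for four tokens.
token_vectors = [
    [1.0, 0.0],  # token 0, first token of the head entity
    [3.0, 2.0],  # token 1, second token of the head entity
    [0.0, 0.0],  # token 2, context
    [4.0, 4.0],  # token 3, tail entity
]
features = pair_features(token_vectors, head_span=(0, 2), tail_span=(3, 4))
print(features)  # [2.0, 1.0, 4.0, 4.0]
```

Pooling gives every entity a fixed-size representation regardless of how many tokens it spans, so one classifier can score any candidate entity pair.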

📊 Experimental Validation

TiKG was extensively evaluated across multiple cybersecurity datasets, including:

DNRTI

CyNER

STUCCO

✔️ Key Results

Strong improvements in Precision, Recall, and F1

Robust generalization across heterogeneous CTI sources

Reduced noise propagation compared to baseline pipelines
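For readers who want to compute the same kind of metrics on their own predictions, here is a minimal sketch of entity-level Precision, Recall, and F1. It assumes exact-match scoring, where a predicted entity counts only if its span and type both match a gold entity; the paper's evaluation protocol may differ. The spans and types below are made up.

```python
def precision_recall_f1(gold, pred):
    """Exact-match entity-level scores over sets of (start, end, type) tuples."""
    gold, pred = set(gold), set(pred)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted entities: (start, end, type).
gold = [(0, 2, "Actor"), (3, 4, "Malware"), (6, 7, "Tool")]
pred = [(0, 2, "Actor"), (3, 4, "Tool"), (6, 7, "Tool"), (8, 9, "Malware")]

p, r, f1 = precision_recall_f1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Note that the prediction with the right span but the wrong type, (3, 4, "Tool"), counts as both a false positive and a missed gold entity under exact matching.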

The table below (from the paper) highlights consistent performance gains across datasets.

[Image: performance comparison table from the paper]

🧩 Sample Knowledge Graph Visualization

🔍 Example insights enabled by TiKG:

Malware reuse across campaigns

Shared infrastructure among threat actors

Hidden relations not obvious from raw reports
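One way such insights could surface once triples from several reports are merged is to group them by object and keep the objects linked to more than one subject. The sketch below is hypothetical: the `shared_objects` helper, the relation name, and the actor names are illustrative, and the IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict


def shared_objects(triples, relation):
    """Map each object to the subjects that share it through `relation`.

    Only objects linked to more than one distinct subject are returned.
    """
    by_object = defaultdict(set)
    for subject, rel, obj in triples:
        if rel == relation:
            by_object[obj].add(subject)
    return {
        obj: sorted(subjects)
        for obj, subjects in by_object.items()
        if len(subjects) > 1
    }


# Triples as they might look after merging two separate reports.
triples = [
    ("ActorA", "uses_infrastructure", "198.51.100.7"),  # from report 1
    ("ActorB", "uses_infrastructure", "198.51.100.7"),  # from report 2
    ("ActorB", "uses_infrastructure", "203.0.113.9"),   # from report 2
    ("ActorA", "uses", "MalwareX"),
]
overlap = shared_objects(triples, "uses_infrastructure")
print(overlap)  # {'198.51.100.7': ['ActorA', 'ActorB']}
```

Neither report mentions both actors, so the overlap only becomes visible once both sets of triples live in the same graph.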

🔧 Use Cases

TiKG supports real-world cybersecurity workflows, including:

🚨 Threat detection & alert enrichment

🕵️ Threat attribution & campaign tracking

📊 Knowledge-driven SOC dashboards

🤖 Automated CTI reasoning & analysis

🌍 Beyond Cybersecurity

Although TiKG was designed for CTI, its pipeline architecture is domain-agnostic and can be transferred to:

🧬 Biomedical text mining

💰 Financial intelligence

🏥 Healthcare analytics

🔐 Safety-critical AI systems

📦 Code & Reproducibility

Sample experiments and configurations are provided

Datasets are publicly available

Extended work (CTiKG) builds on this framework with enhanced context awareness

📩 Questions, collaborations, or industry partnerships are welcome.

👨‍💻 Authors

Inoussa Mouiche: PhD Candidate, Computer Science (Cybersecurity | AI | ML | Knowledge Graphs)

Sherif Saad: Associate Professor, Computer Science

⭐ If this repository is useful to your work, please star it and cite the paper!
