Large Language Models for Cyber Security: A Systematic Literature Review
Abstract
This systematic literature review analyzes the emerging intersection of Large Language Models (LLMs) and cybersecurity, synthesizing current research trends, effective methodologies, and common challenges. It maps the landscape of LLM adoption in defensive tasks such as vulnerability identification, anomaly detection, and automated threat response, as well as in offensive security. The review critically evaluates current implementation limitations, including model trustworthiness and the high computational requirements of robust, real-world deployment.
Key Highlights
- Comprehensive Mapping: The review systematically organizes and categorizes the proliferation of LLM applications across the cybersecurity pipeline, including network security, software vulnerability analysis, malware detection, and human factor security (e.g., phishing).
- Core Applications: The most frequently studied areas involve code generation/auditing, automated threat intelligence gathering (summarization and correlation), and the development of intelligent security assistants.
- Identified Challenges: Key technical hurdles include the risk of data poisoning and adversarial attacks against LLMs, ensuring model explainability and transparency in security decisions, and mitigating hallucinations that could lead to false positives or critical security oversights.
- Focus on Adaptation: The review highlights the importance of fine-tuning techniques (e.g., instruction tuning and cyber-specific Reinforcement Learning from Human Feedback) for tailoring general-purpose LLMs to highly specialized security tasks, as sketched below.
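
To make the adaptation point concrete, below is a minimal sketch of parameter-efficient instruction tuning with LoRA, using the Hugging Face transformers and peft libraries. The base model (gpt2) is a stand-in for the larger models the review discusses, and the two cyber-instruction examples and all hyperparameters are illustrative assumptions rather than details taken from the review.

```python
# Minimal sketch: LoRA instruction tuning of a small causal LM on
# cyber-security instruction pairs. The two training examples below are
# illustrative placeholders, not data from the reviewed papers.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the review discusses GPT variants, BERT, CodeGen
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the base model with low-rank adapters so only a small fraction
# of the weights is trained.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["c_attn"],  # GPT-2's fused attention projection
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Hypothetical instruction-tuning pairs in a cyber context.
examples = [
    "### Instruction: Summarize CVE-2021-44228.\n### Response: Log4Shell is a "
    "remote code execution flaw in Apache Log4j 2's JNDI lookup handling.",
    "### Instruction: Classify this log line as benign or suspicious: "
    "'GET /admin/../../etc/passwd'.\n### Response: Suspicious: path traversal.",
]

batch = tokenizer(examples, return_tensors="pt", padding=True, truncation=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

model.train()
for step in range(3):  # a few toy steps; real runs use a Trainer and a dataset
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {out.loss.item():.3f}")
```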
Technical Details
- Model Architectures: The review likely covers the adaptation and performance of prominent Transformer-based architectures (e.g., BERT, GPT variants, CodeGen) trained on cyber-specific corpora such as exploit databases, vulnerability reports (CVEs), and sanitized malware code.
- Methodology Focus: Emphasis is placed on transfer learning, few-shot learning, and specialized prompt-engineering methods designed to elicit actionable security insights, such as detecting zero-day vulnerabilities or generating effective intrusion prevention rules (a prompt-construction sketch follows this list).
- Evaluation Metrics: Discussion centers on custom metrics necessary for security contexts, such as minimizing the False Positive Rate (FPR), detection efficacy against polymorphic threats, and the inference latency required for real-time security operations centers (SOCs); a minimal FPR computation also follows this list.
- Data Requirements: A significant technical section addresses the necessity and difficulty of creating high-quality, sanitized, and domain-specific datasets required for pre-training and fine-tuning robust LLMs in security environments.
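
To illustrate the prompt-engineering methodologies mentioned above, here is a minimal few-shot prompt-construction sketch. The example shots and the triage_prompt helper are hypothetical illustrations, not prompts from the review; the assembled string could be sent to any chat-completion endpoint.

```python
# Minimal sketch: building a few-shot prompt to elicit a structured
# vulnerability triage from a general-purpose LLM. The shots and the
# target snippet are hypothetical.
FEW_SHOTS = [
    ("char buf[8]; strcpy(buf, user_input);",
     "VULNERABLE | CWE-120 buffer overflow | use strncpy or bounds check"),
    ("query = db.execute('SELECT * FROM t WHERE id = ?', (uid,))",
     "SAFE | parameterized query | no action needed"),
]

def triage_prompt(snippet: str) -> str:
    """Assemble a few-shot prompt asking for VERDICT | CWE | REMEDIATION."""
    lines = ["You are a security auditor. Answer as: VERDICT | CWE | REMEDIATION.", ""]
    for code, answer in FEW_SHOTS:
        lines += [f"Code: {code}", f"Answer: {answer}", ""]
    lines += [f"Code: {snippet}", "Answer:"]
    return "\n".join(lines)

print(triage_prompt("os.system('ping ' + request.args['host'])"))
```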
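Likewise, since SOC viability hinges on the false-positive budget, here is a minimal sketch of the FPR computation discussed above; the alert labels are invented for illustration.

```python
# Minimal sketch: false positive rate over binary alert labels,
# where 1 = malicious and 0 = benign. Labels below are invented.
def false_positive_rate(y_true, y_pred):
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0

y_true = [0, 0, 0, 0, 1, 1, 0, 1]   # ground truth: mostly benign traffic
y_pred = [0, 1, 0, 0, 1, 0, 0, 1]   # detector output
print(f"FPR = {false_positive_rate(y_true, y_pred):.2%}")  # 1 FP / 5 negatives = 20%
```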
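Finally, as a small illustration of the sanitization step in dataset construction, a regex-based redaction sketch; the patterns and the sample record are assumptions, not the review's actual pipeline.

```python
# Minimal sketch: redacting secrets and network identifiers from raw text
# before it enters a security fine-tuning corpus. Patterns are illustrative.
import re

REDACTIONS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),           # IPv4 addresses
    (re.compile(r"\b[A-Fa-f0-9]{32,64}\b"), "<HASH>"),              # MD5/SHA digests
    (re.compile(r"(?i)(api[_-]?key|token)\s*[:=]\s*\S+"), r"\1=<SECRET>"),
]

def sanitize(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

raw = "Beacon to 203.0.113.7, payload sha256 " + "a" * 64 + ", api_key=XYZ123"
print(sanitize(raw))
# -> Beacon to <IP>, payload sha256 <HASH>, api_key=<SECRET>
```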
Implications
- Hardware Acceleration Demand: The review underscores that the successful operationalization of complex LLM-based security systems necessitates immense computational power. This drives urgent demand for highly efficient, customized AI accelerators (NPUs/DSPs) and high-throughput memory interfaces, creating a critical market opportunity for optimized hardware solutions.
- RISC-V Customization Potential: The findings directly support the use case for the RISC-V architecture. Its open standard and instruction-set extensibility (e.g., the V vector and P packed-SIMD extensions) enable highly specialized processor cores and security modules optimized for the rapid matrix operations and token processing that LLM inference at the edge requires (e.g., embedded intrusion detection systems in IoT devices); a sketch of the underlying matrix workload follows this list.
- Enhanced Security Assurance: LLMs, if robustly implemented, offer the potential to dramatically improve the automation of security verification and vulnerability scanning in hardware and firmware. This is particularly relevant for new, rapidly evolving platforms like RISC-V, allowing faster identification of bugs and architectural flaws during the design and deployment phases.
- Trust and Resilience: The challenges identified (e.g., adversarial attacks on LLMs) stress the need for secure enclave architectures and verifiable computing mechanisms, which RISC-V platforms are uniquely positioned to integrate, ensuring the integrity of the AI models running critical security tasks.
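
To ground the hardware discussion, the sketch below shows the quantized matrix kernel that dominates LLM inference at the edge, i.e., the inner loop a RISC-V vector unit or NPU datapath would accelerate. The shapes, scales, and data are arbitrary assumptions chosen only for illustration.

```python
# Minimal sketch: the int8 matrix-vector product at the heart of edge LLM
# inference; this is the loop a RISC-V V-extension or NPU datapath would
# accelerate. Shapes, scales, and data are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=(256, 256), dtype=np.int8)   # quantized weights
x = rng.integers(-128, 128, size=256, dtype=np.int8)          # quantized activations
w_scale, x_scale = 0.02, 0.05                                 # per-tensor scales

# Accumulate in int32 (as vector MAC units do), then dequantize to float.
acc = W.astype(np.int32) @ x.astype(np.int32)
y = acc.astype(np.float32) * (w_scale * x_scale)

print(y[:4])  # dequantized output for one projection of one token
```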