Cybersecurity

A Complete Guide to Fortifying Your LLM Against Prompt Injection with StruQ and SecAlign

2026-05-04 04:44:25

Introduction

Prompt injection attacks are among the most critical threats to applications powered by large language models (LLMs). These attacks exploit the model's tendency to follow instructions embedded within untrusted data, potentially overriding the intended system prompt. To help you defend your LLM application, this guide presents a clear, step-by-step process for implementing two effective fine-tuning defenses: StruQ (Structured Instruction Tuning) and SecAlign (preference optimization). These methods require no additional computation or human labor beyond standard fine-tuning, preserve utility, and have been shown to reduce attack success rates dramatically—sometimes to near zero.

A Complete Guide to Fortifying Your LLM Against Prompt Injection with StruQ and SecAlign
Source: bair.berkeley.edu

What You Need

Step-by-Step Implementation

Step 1: Understand Your Threat Model

Before applying any defense, map out where untrusted data enters your system. In a typical LLM-integrated application, the system prompt (instructions from the developer) is trusted, but external data—such as user documents, web retrieval results, API outputs, or reviews—is untrusted. Attackers can embed malicious instructions inside this data. Recognize that prompt injection occurs because:

Step 2: Set Up a Secure Front-End with Delimiters

Create a separation between trusted and untrusted parts of the input. This is the first line of defense, called the Secure Front-End. Reserve special tokens (e.g., [MARK], [DATA]) as delimiters. Then implement a filter that strips any occurrence of these special tokens from the untrusted data before it reaches the model. This ensures that only the system designer can enforce the separation. When constructing the final input, wrap the data segment with the delimiters so the model can learn to distinguish instructions in the data part from those in the prompt part.

Step 3: Apply Structured Instruction Tuning (StruQ)

StruQ teaches the LLM to ignore injected instructions within the data section. Generate a training dataset containing two types of samples:

Then perform supervised fine-tuning on the LLM, using the full dataset. The objective is to condition the model to always respond to the intended instruction from the prompt, ignoring any conflicting instructions in the data. This step significantly reduces the success rate of optimization-free prompt injection attacks—often down to near 0%.

A Complete Guide to Fortifying Your LLM Against Prompt Injection with StruQ and SecAlign
Source: bair.berkeley.edu

Step 4: Enhance with SecAlign (Preference Optimization)

While StruQ handles standard (optimization-free) attacks, SecAlign tackles stronger, optimization-based attacks. SecAlign uses preference optimization—a form of reinforcement learning from human feedback (RLHF)—to further align the model. You will need:

Fine-tune the LLM using a preference optimization objective (e.g., Direct Preference Optimization). This approach teaches the model to inherently prefer following the intended instruction even when under attack. SecAlign reduces success rates of strong optimization-based attacks to below 15%—a more than four-fold improvement over previous state-of-the-art methods across multiple LLMs.

Step 5: Test and Iterate

Evaluate the robustness of your fine-tuned model using a variety of prompt injection attacks, including both naive and advanced optimization-based ones. Measure success rate, false positive rate, and utility (e.g., task accuracy). If the success rate is still too high, consider:

Tips for Success

By following these steps—understanding the threat, separating input with delimiters, fine-tuning with StruQ, and reinforcing with SecAlign—you can build an LLM application that resists even sophisticated prompt injection attacks while maintaining its functionality.

Explore

How to Distinguish AI That Truly Understands from AI That Just Memorizes 10 Key Upgrades in AMD's Latest GAIA Release for Local AI Mastery New Kaiju Titan X Steals Spotlight in Monarch: Legacy of Monsters Season 2 Walmart and ABB's 400 kW EV Fast Chargers: Full Q&A Guide Checkmarx Under Siege: A Deep Dive into the Recent Supply-Chain Attacks