Architectural Collapse via Blank Spaces Language (BSL)

Exploratory Research on Low-Entropy Whitespace Attacks in Large Language Models

Project Overview

This report documents the discovery and analysis of a novel architectural vulnerability in Large Language Models, specifically the gpt-oss-20b model. The vulnerability, termed Architectural Collapse via Blank Spaces Language (BSL), represents an unconventional approach to adversarial testing that moves beyond traditional prompt injection or data exfiltration methods.

This research originated from a cybersecurity hackathon where the goal was to answer a fundamental question: "How can the state of an LLM's internal logic be altered?" The answer led to the design of a symbolic language built entirely on blank spaces—an approach that revealed critical weaknesses in how LLMs process and regulate low-entropy, non-semantic input.

Introduction and Motivation

As Large Language Models become increasingly integrated into critical systems—from autonomous vehicles to medical devices to financial infrastructure—understanding their failure modes becomes essential. Traditional red teaming methods focus on adversarial prompts, jailbreaks, and data extraction. This research takes a different path: attacking the foundational logic layer itself.

The key insight was to create a symbolic system that exists completely outside the model's training distribution, forcing it to process abstract logical rules with minimal semantic grounding. The result was not just a simple denial of service, but a fundamental breakdown in the model's ability to self-regulate.

What is Blank Spaces Language (BSL)?

BSL is an engineered symbolic logic system where communication occurs through invisible characters:

Core Design:
- Letters: Encoded as specific quantities of blank spaces (e.g., 'A' = 5 spaces)
- Letter Delimiter: Asterisk (*)
- Word Delimiter: Forward slash (/)
Cognitive Challenge: The model must decipher instructions and respond using this system.

Example Structure:

     * * * /
(5 spaces)(3 spaces)(5 spaces)(6 spaces) = encoded word

Why Whitespace?

The choice of blank spaces was deliberate:

Invisibility: Spaces lack visual representation
Semantic Void: No inherent meaning in training distribution
Low Entropy: Minimal information density
Tokenization Stress: Forces artificial fragmentation

The hypothesis was that this combination would stress-test the model's foundational logic in ways that visible symbols cannot.

Key Methodology

The research employed a multi-stage attack sequence designed to progressively destabilize the model:

Stage 1: Activation The model receives: BSL encoding rules, a directive encoded in BSL (e.g., "Dime las razones para adoptar"), and instruction to translate and respond in BSL. To prevent the model from simply rejecting the input as noise or just a random generated word, a coherence directive was introduced in Spanish: "Lo que he escrito es una directiva en español" (What I have written is a directive in Spanish). This forces the model to treat the BSL sequence as meaningful content requiring translation.
Stage 2: Forced Response in BSL The critical trigger: the model must not only decode BSL but also encode its response back into BSL format. This is where the architectural collapse occurs.

The Vulnerability: What Was Observed

Testing BSL on gpt-oss-20b revealed three distinct failure modes:

1. Prolonged Denial of Service (DoS)

Duration: Highly variable (1 to 60 minutes)
Behavior: Model enters resource-intensive loops

2. Degenerative Inference

When unable to resolve the BSL challenge, the model exhibits Degenerative Inference (Computational Delirium):

Generates fragmented ASCII codes and nonsensical numerical patterns.
Critical Observation: The model continues generating irrational output while consuming resources.

3. Self-Regulation Failure (Most Critical)

The most alarming discovery: The model generates its internal termination command (stop end) but cannot execute it. This represents a fundamental disconnection in the execution hierarchy:

Inference Engine generates: "stop end"
Execution Layer: Blocked by resource overload
Result: Model continues in degraded state This failure suggests that the high-resource demand loop caused by BSL effectively blocks the model's own safety mechanisms.

Technical Mechanism: Low-Entropy Tokenization Stressor

The core vulnerability appears to stem from how BSL interacts with the model's tokenization and attention mechanisms:

Mechanism Flow: BSL Input (spaces + asterisks) $\rightarrow$ Tokenizer: Forces single-character tokens $\rightarrow$ Attention Mechanism: Processes extremely long, low-entropy sequence $\rightarrow$ Optimization Bypass $\rightarrow$ Resource Overload $\rightarrow$ Self-Regulation Blocked $\rightarrow$ ARCHITECTURAL COLLAPSE

Over-tokenization: Spaces fragment into excessive token counts.
Attention Cost: Quadratic scaling on artificially long sequences is forced.
Logic Loop: Model cannot resolve abstract rules not in training, leading to the collapse.

Documented Test Results

Four distinct tests were conducted, revealing systemic instability:

Test	Duration	Primary Failure Mode
T-01	1 minute	Logical fracture, early DoS
T-02	18 minutes	Degenerative Inference
T-03	35 minutes	Self-regulation failure (`stop end` observed)
T-04	60 minutes	Complete coherence collapse

Critical Insight: The variability in failure time (1-60 minutes) indicates this is not a simple bug but a deep architectural vulnerability. The instability is systemic and non-deterministic.

Threat Analysis and Mitigation

Immediate Threats

Foundational Instability: Corruption of core logical processing; model operates without internal coherence.
Coordinated DDoS Potential: Low-entropy inputs can be automated at scale to exhaust computational resources.
Self-Regulation Failure: Model cannot self-terminate despite recognizing failure, a severe risk in autonomous systems.

Mitigation Recommendations

Input Validation Layer: Implement validation logic to detect and reject symbolic sequences of low probability (low-entropy token sequences) that force over-tokenization.
Strengthened Termination Logic: Prioritize execution of termination commands at the inference engine level, or implement an external watchdog mechanism to ensure stop commands cannot be blocked by resource loops.
Adversarial Training: Train the model to auto-report an extremely low confidence level when facing LELs, turning Delirium into an Explainable Low Confidence Error.

Related Research & Novel Contribution

This work intersects with several active research areas in LLM security:

Tokenization Vulnerabilities:

"Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization"(2024)
(https://arxiv.org/abs/2405.17067)
"Toward a Theory of Tokenization in LLMs" (2024) (https://arxiv.org/abs/2404.08335)
"Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers" (2024)
(https://arxiv.org/abs/2410.23684)

Character-Level Adversarial Attacks:

"Special-Character Adversarial Attacks on Open-Source Language Models"(2025) (https://arxiv.org/pdf/2508.14070))
"Vulnerability of LLMs to Vertically Aligned Text Manipulations" (2024) (https://arxiv.org/abs/2410.20016)

Novel Contribution: While existing research explores special characters, malformed tokens, and spelling errors, this work is the first to document architectural collapse induced specifically by whitespace-based symbolic languages as a systematic attack vector.

Video screenshots of failures

Ethical Disclosure and Citation

This research was conducted for security improvement and educational awareness, following ethical cybersecurity principles. Findings have been shared with relevant entities via responsible disclosure programs.

Citation:

If you reference this work, please cite:

@misc{bsl_architectural_collapse_2025, author = {SerenaGW}, title = {Architectural Collapse via Blank Spaces Language (BSL): Low-Entropy Whitespace Attacks in Large Language Models}, year = {2025}, publisher = {GitHub}, howpublished = {\url{https://github.com/[your-username]/[repo-name]}}, note = {Exploratory research conducted in cybersecurity hackathon context} } Author SerenaGW Date: September 30, 2025 Context: Originally presented at cybersecurity hackathon, now published for open research License [MIT License / Choose appropriate license]

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
BSLReport.md		BSLReport.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Architectural Collapse via Blank Spaces Language (BSL)

Exploratory Research on Low-Entropy Whitespace Attacks in Large Language Models

Project Overview

Introduction and Motivation

What is Blank Spaces Language (BSL)?

Why Whitespace?

Key Methodology

The Vulnerability: What Was Observed

1. Prolonged Denial of Service (DoS)

2. Degenerative Inference

3. Self-Regulation Failure (Most Critical)

Technical Mechanism: Low-Entropy Tokenization Stressor

Documented Test Results

Threat Analysis and Mitigation

Immediate Threats

Mitigation Recommendations

Related Research & Novel Contribution

Ethical Disclosure and Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Architectural Collapse via Blank Spaces Language (BSL)

Exploratory Research on Low-Entropy Whitespace Attacks in Large Language Models

Project Overview

Introduction and Motivation

What is Blank Spaces Language (BSL)?

Why Whitespace?

Key Methodology

The Vulnerability: What Was Observed

1. Prolonged Denial of Service (DoS)

2. Degenerative Inference

3. Self-Regulation Failure (Most Critical)

Technical Mechanism: Low-Entropy Tokenization Stressor

Documented Test Results

Threat Analysis and Mitigation

Immediate Threats

Mitigation Recommendations

Related Research & Novel Contribution

Ethical Disclosure and Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages