
# Théophile Lafargue

20 · Founder · Paris.

I build AI infrastructure for constrained environments — the edges where most systems stop working.

Patent FR2511116 (Sep 2025) — stateful LLM communication over 2G / SMS / LoRa / satellite. End-to-end PoC validated: real 2G SIM → Flask gateway → Qwen 122B → response in under 10 s on a €50 phone.
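The "stateful" part of that pipeline boils down to keeping a per-number conversation history on the gateway and fitting each reply into a single SMS segment. The sketch below is purely illustrative — `handle_sms`, the in-memory `SESSIONS` dict, and the stub `run_llm` are hypothetical names, not the patented design, and the real gateway wraps this logic behind a Flask route:

```python
# Illustrative sketch only: names (handle_sms, run_llm) and the in-memory
# session store are hypothetical, not the FR2511116 design.
SESSIONS = {}          # phone number -> conversation history
GSM7_SINGLE_SMS = 160  # max chars in a single-segment GSM-7 SMS

def run_llm(history):
    # Placeholder for the real model call (e.g. a local Qwen server).
    return "echo: " + history[-1]

def handle_sms(sender, body):
    """Route one inbound SMS through the stateful session for `sender`."""
    history = SESSIONS.setdefault(sender, [])
    history.append(body)
    reply = run_llm(history)[:GSM7_SINGLE_SMS]  # cap reply at one SMS segment
    history.append(reply)
    return reply
```

The point of the state store is that the LLM sees the whole exchange even though each hop carries at most 160 characters.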

First institutional validation: Gendarmerie Nationale (DGGN), Apr 2026. Active defense pipeline with the DGA and the état-major des armées (French joint staff).

Student-Entrepreneur — Pépite PEIPS, Université Paris-Saclay (SNEE).

ORCID: 0009-0001-5727-2475


## Open-source contributions

| PR | What it does | Status |
|---|---|---|
| llama.cpp #20075 | Fix state corruption in speculative decoding on hybrid SSM/MoE models; +45% inference speed on Apple Silicon (Metal). | Merged · cited as prior work in #20428 and #20649 |
| llama.cpp #20649 | Diagnostic and flake8 fix for Mistral Small 4 (119B MoE). | Merged · cited alongside ggerganov and ngxson |
| unslothai/unsloth #4901 | Fix RoPE offset cast crashing Gemma 4 inference on Apple Silicon. | Merged by danielhanchen (creator) |
| StepFun Cookbook #14 | Local deployment architecture for Step-3.5-Flash on Apple Silicon. | Merged |
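For context on the speculative-decoding path that #20075 hardens, here is a toy greedy accept loop — a pure-Python sketch under my own simplified assumptions, not llama.cpp code. A cheap draft model proposes several tokens at once, and the target model keeps them only while they match its own greedy choice:

```python
def speculative_accept(draft_tokens, target_next):
    """Toy greedy speculative decoding.

    target_next(prefix) returns the target model's greedy next token for a
    prefix. Draft tokens are accepted while they agree with the target;
    on the first disagreement the target's token is taken instead.
    """
    accepted = []
    for tok in draft_tokens:
        expected = target_next(accepted)
        if tok == expected:
            accepted.append(tok)       # draft agreed: keep its token
        else:
            accepted.append(expected)  # disagreement: take target token, stop
            break
    else:
        accepted.append(target_next(accepted))  # all accepted: free bonus token
    return accepted
```

The speedup comes from verifying a whole draft batch with one target forward pass; the state-corruption bug class arises when a rejected draft leaves stale entries in the target's recurrent (SSM) or KV state.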

## Projects

| Project | Description |
|---|---|
| mlx-dflash | Native MLX port of DFlash speculative decoding. 3.41× faster inference on Apple Silicon (Qwen3-8B bf16, M3 Max 128 GB, 1024 tokens); acceptance 8.75/16. Single `mx.eval()` per step, intra-GPU `verify_ids`. |
| LACE | Semantic compression under LoRa/SMS physical constraints. Cognitive Emergence Law: N/K < C·d_cog, with C_emp = 0.391 ≈ 1/e. K = 16 is the optimal deployment parameter (p = 0.0034). Preprint: HAL hal-05596229 · Zenodo 10.5281/zenodo.19664121 |
| mythos-distillation | Behavioral distillation of Anthropic Mythos into Gemma 4 26B MoE via LoRA (r=64, 30 layers). 551 pairs, validation loss 1.398; 7/7 out-of-distribution questions generalized without a system prompt. 80 t/s on M3 Max. |
| patent-low-bandwidth-ai | Hybrid RAG gateway for AI over 2G/SMS; production backend for FR2511116. Tested with a real SIM. |
| Phantom | On-device behavioral AI OS. Two-Tower model (256-d LSTM + action embeddings), full RLHF loop with a local Qwen 122B as reward model. MLX, zero cloud. |
| VoxTape | Local voice dictation for macOS. MLX Whisper on the Metal GPU: 8.3 s of audio transcribed in 0.4 s (20× real-time). Open-source alternative to SuperWhisper. |
| benchmark-422-qec | 11 LLMs (cloud + local on M3 Max) evaluated on the [[4,2,2]] CritPt QEC problem. 0/11 correct; failure patterns documented. |
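As a reading aid for the LACE row above, the emergence criterion N/K < C·d_cog can be checked numerically. This is my own hedged interpretation of the inequality as written — the precise definitions of N, K, and d_cog live in the preprint — with the reported empirical constant C_emp = 0.391 ≈ 1/e:

```python
import math

C_EMP = 0.391  # empirical constant reported for LACE; close to 1/e ≈ 0.368

def emergence_holds(n, k, d_cog, c=C_EMP):
    """Numeric check of the Cognitive Emergence Law inequality N/K < C * d_cog.

    The semantics of n, k, d_cog follow the preprint; this helper only
    evaluates the inequality as stated, nothing more.
    """
    return n / k < c * d_cog
```

One thing the inequality makes explicit: for the reported K = 16 deployment setting, the admissible N scales linearly with d_cog.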

## HuggingFace (ox-ox)

| Repository | Description | Downloads |
|---|---|---|
| ox-ox/MiniMax-M2.7-GGUF | First GGUF quants of MiniMax-M2.7 (229B MoE), Q3_K_L + Q8_0. PPL 8.44 · 28.52 t/s · 7.4k views on r/LocalLLaMA. | 522 |
| ox-ox/MiniMax-M2.5-GGUF | First GGUF quants of MiniMax-M2.5 (229B MoE). PPL 8.79 · 28.7 t/s · 19k views on r/LocalLLaMA; recommended by the llama.cpp community. | 437 |
| ox-ox/lace-semantic-compression | 198 operational tasks (defense/medical/industrial), VQ codebook; LACE v2 dataset. | 40 |
| ox-ox/mythos-character-distillation | 551 behavioral pairs for Mythos-style LoRA distillation. | 66 |
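The PPL figures above are token-level perplexities: exp of the mean per-token negative log-likelihood. A minimal sketch of the metric (my own toy helper, not the llama.cpp `perplexity` tool):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Intuitively, PPL 8.44 means the quantized model is on average about as uncertain as a uniform choice over ~8.4 candidate tokens per step, so a small PPL gap between quants indicates little quality loss.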

## Patent

**FR2511116** — Hybrid State-Preserving Gateway for LLM Inference over Low-Bandwidth Protocols (2G / SMS / LoRa / satellite).
Filed: Sep 27, 2025 · INPI · 11 claims · Examination in progress.


## Stack

- **Inference** — llama.cpp · MLX · Metal · GGUF · speculative decoding
- **AI/ML** — LoRA · Transformers · VQ-VAE · RAG (ChromaDB) · Whisper · RLHF
- **Protocols** — LoRa / 2G / SMS / satellite · Flask gateway
- **Languages** — Python · C++
- **Infra** — Tailscale · bare-metal homelab · M3 Max 128 GB


LinkedIn · Substack · HuggingFace · ORCID

## Pinned repositories

1. **enigma-shell** (JavaScript · ★ 3) — An experimental web shell to control a full Linux OS (v86) with natural language via local LLMs.
2. **speech-to-speech-pipeline** (Python · ★ 5) — A real-time, interruptible (barge-in) conversational AI pipeline (STT → LLM → TTS) running locally. Optimized for Apple Silicon (MLX).
3. **gui-agent** (Python · ★ 1) — A two-layer GUI agent for macOS: a VLM (Vision Language Model) handles perception, while a separate LLM (Qwen) manages high-level strategy and decision-making. Built with Python, OpenAI API, and Py…
4. **patent-low-bandwidth-ai** (Python · ★ 1) — Backend for my 'Stateful AI over Low-Bandwidth Networks' patent (FR2511116). A hybrid RAG pipeline (SmolDocling, ChromaDB, Reranker) with SMS support, separating local VLM (perception) from the mai…