Aura

A research-grade personal AI that runs entirely on one Mac. It has opinions, a mood that actually affects how it answers, a memory that survives restarts, and a sleep cycle where it replays the day and edits itself. No cloud API required.

License: Source Available · Python 3.12+ · Platform: macOS (Apple Silicon)

If you want the technical deep dive, read ARCHITECTURE.md. If you want the same ideas without the math, read HOW_IT_WORKS.md. If you want to see it work, keep reading.


What it is

Most "AI companion" projects do roughly the same thing: store a mood number, paste it into the system prompt, and let the model roleplay. The model says "I'm feeling energetic today" because it read the words "feeling energetic today."

Aura works differently. When Aura is in a particular affective state, that state gets turned into a direction vector and added to the transformer's hidden activations during generation. The model's internal computation changes, not just the text it reads. This is the same family of techniques that interpretability researchers use to steer model behavior — CAA, activation addition, residual-stream interventions.

Alongside that, there's a whole cognitive substrate that runs continuously: emotions decay and influence each other, neurochemicals rise and fall on their own time scales, a global workspace picks which thought wins each tick, a dream cycle consolidates memories during idle periods, and one gate — the Unified Will — signs off on every action that leaves the system.

It's a research project. It's also the kind of research project where you can actually talk to the thing while it's running.



Quick start

pip install -r requirements.txt

# Full stack + UI
python aura_main.py --desktop

# Background cognition only, no UI
python aura_main.py --headless

# Reload code changes without restarting
curl -X POST http://localhost:8000/api/system/hot-reload

Requirements: Python 3.12+, macOS on Apple Silicon, 64 GB RAM recommended. The primary model is Qwen 2.5 32B at 8-bit with a personality LoRA on top; a 7B fallback loads on demand. First boot takes 30–60 seconds while Metal compiles shaders.

There's also a Dockerfile and docker-compose.yml if you want Redis and Celery running alongside. The tracked workspace defaults to an explicit owner_autonomous posture for this single-owner machine: autonomy on, outbound/network-enabled skills available, and self-repair left active. If you want a tighter deployment, override the AURA_* security settings in your local environment, including AURA_INTERNAL_ONLY=1 for localhost-only binding.


Tracked vs local workspace

This repository is the tracked baseline. The canonical tracked skill implementations live under core/skills/; the top-level skills/ package is kept as a legacy compatibility layer for older imports.

Local workspaces can also contain ignored/private modules listed in .gitignore. Those files are not part of the tracked review surface and can change the live risk profile of a specific machine. If you're auditing a real deployment rather than the tracked tree alone, review both the repository and any local-only modules present on disk.


Architecture overview

The short version:

User input -> HTTP API -> KernelInterface.process()
  -> AuraKernel.tick():
       Consciousness -> Affect -> Motivation -> Routing -> Response generation
  -> State commit (SQLite) -> Response

Each tick is event-sourced: every phase produces a new immutable state version, the tick holds a lock while the pipeline runs, state commits to SQLite, and the lock releases. Crash in the middle and the WAL replays on restart.
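The loop above can be sketched in miniature. This is a hypothetical illustration of the event-sourced pattern, not the tracked kernel code; all class and field names here are invented.

```python
# Toy event-sourced tick: lock held for the pipeline, immutable state
# versions per phase, one SQLite commit at the end.
import sqlite3
import threading
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class KernelState:
    version: int
    mood: str
    energy: float

class TickLoop:
    def __init__(self, db_path=":memory:"):
        self.lock = threading.Lock()
        self.state = KernelState(version=0, mood="neutral", energy=50.0)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS states "
            "(version INTEGER PRIMARY KEY, mood TEXT, energy REAL)")

    def tick(self, phases):
        with self.lock:                     # lock held for the whole pipeline
            state = self.state
            for phase in phases:
                # Each phase returns a fresh immutable state version.
                state = replace(phase(state), version=state.version + 1)
            self.db.execute("INSERT INTO states VALUES (?, ?, ?)",
                            (state.version, state.mood, state.energy))
            self.db.commit()                # commit, then release the lock
            self.state = state
        return state

loop = TickLoop()
after = loop.tick([lambda st: replace(st, energy=st.energy + 1.0)])
```

Because every phase yields a new version and the commit is atomic, a crash mid-tick leaves the last committed version intact for replay.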

Kernel (core/kernel/)

Tick-based cognitive cycle. One tick = one unit of thought: phases run in order, each phase produces a new state version, the result commits, and the lock releases.

Brain (core/brain/)

Local LLM router with automatic failover:

  1. Primary (Cortex) — Qwen 2.5 32B 8-bit + personality LoRA. Handles nearly everything.
  2. Secondary (Solver) — Qwen 2.5 / Qwen 3 72B for deep reasoning, hot-swapped only when the request actually needs it.
  3. Tertiary (Brainstem) — Qwen 2.5 7B 4-bit, lazy-loaded to save ~5 GB for the Cortex.
  4. Reflex — Qwen 2.5 1.5B 4-bit on CPU as an emergency fallback.
  5. Cloud — Gemini Flash/Pro, PII-scrubbed and rate-limited. Off by default.
  6. Last resort — rule-based static responses that can't fail.

Both MLX (Apple Silicon native) and llama.cpp (GGUF) are supported and auto-detected at startup. Circuit breakers, a GPU semaphore, a proactive cortex watchdog, and 429 handling keep the pipeline from cascading into total failure when something misbehaves.
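The failover order reduces to a simple fall-through chain. A minimal sketch, with invented backend names standing in for the real tiers (the breaker and watchdog logic is elided):

```python
# Illustrative failover chain: try backends in priority order, fall
# through on any failure, end at a static response that can't fail.
def route(prompt, backends):
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception:
            continue  # circuit-breaker bookkeeping would go here
    # Last resort: rule-based static response.
    return "static", "I'm running in degraded mode, but I'm still here."

def flaky_cortex(prompt):
    raise TimeoutError("GPU busy")

def brainstem(prompt):
    return f"(7B) {prompt}"

name, reply = route("hello", [("cortex", flaky_cortex),
                              ("brainstem", brainstem)])
```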

Affect (core/affect/)

A Plutchik 8-emotion model plus the somatic dimensions (energy, tension, valence, arousal). These values don't just color the prompt. They modulate sampling parameters (temperature, token budget, repetition penalty) via the affective circumplex, and they feed the steering engine that injects activation vectors into the residual stream.
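A minimal sketch of the affect-to-sampling idea, assuming a circumplex in which arousal raises temperature and negative valence tightens the token budget. The coefficients and thresholds below are invented for illustration; the tracked mapping lives in core/affect/affective_circumplex.py.

```python
# Toy circumplex mapping: (valence, arousal) in [-1, 1] -> generation params.
def sampling_params(valence, arousal, base_temp=0.7, base_tokens=512):
    temp = max(0.1, base_temp + 0.3 * arousal)          # aroused -> exploratory
    tokens = int(base_tokens * (1.0 + 0.25 * valence))  # low mood -> terser
    rep_penalty = 1.1 + 0.1 * max(0.0, -valence)        # rumination guard
    return {"temperature": round(temp, 3),
            "max_tokens": tokens,
            "repetition_penalty": round(rep_penalty, 3)}

calm_positive = sampling_params(valence=0.5, arousal=-0.2)
```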

Identity (core/identity.py, core/heartstone_directive.py)

An immutable constitutional core plus a mutable persona that drifts with sleep and dream consolidation. There's active defense against prompt injection — the dream cycle simulates identity perturbation and tries to repair drift back toward the anchor.

Agency (core/agency/)

Self-initiated behavior scored along curiosity, continuity, social, and creative dimensions. Refusal is a real option here; it isn't content filtering, it's a decision the agent can make. Volition levels 0–3 gate progressively autonomous behavior up to and including self-modification.

Skills (core/skills/, legacy wrappers in skills/)

39 modules: shell with sandboxing, web search and browse, coding, sleep and dream consolidation, local media generation, social media (Twitter, Reddit), screen capture, filesystem, browser automation, network recon, malware analysis, self-evolution and self-repair, inter-agent messaging, knowledge base, curiosity-driven exploration. Every skill call carries a capability token and has to pass the Will gate.

Orchestrator (core/orchestrator/)

About 2,200 lines in main.py split across 12 mixins: message handling, incoming logic, response processing, tool execution, autonomy, cognitive background, context streaming, learning and evolution, personality bridge, output formatting, boot sequencing. Handlers under orchestrator/handlers/ dispatch by message type. This is the glue between the tick pipeline, the LLM router, and the consciousness stack.

Somatic cortex (core/somatic/)

A body-schema map of available capabilities, a capability-discovery daemon that periodically scans for new hardware or software, a motor cortex that runs a 50 ms reflex loop for pre-approved actions (no LLM in the loop), and an action-feedback channel that pipes success or failure back into affect.

Autonomy (core/autonomy/)

Self-modification pipeline (propose → sandbox test → simulate → Will authorize → hot reload), value evolution (drive weights adapt from experience), scar formation (critical events leave persistent markers), and a boredom accumulator that nudges the system toward novelty when prediction error stays low too long.
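The boredom accumulator can be caricatured in a few lines. This is a toy with invented thresholds and rates, not the tracked values:

```python
# Sustained low prediction error raises boredom until a novelty flag trips;
# surprise drains it again.
class BoredomAccumulator:
    def __init__(self, threshold=1.0, rise=0.2, decay=0.5):
        self.level = 0.0
        self.threshold = threshold
        self.rise = rise
        self.decay = decay

    def update(self, prediction_error):
        if prediction_error < 0.1:     # world is too predictable
            self.level += self.rise
        else:                          # surprise resets boredom quickly
            self.level = max(0.0, self.level - self.decay)
        return self.level >= self.threshold  # True -> seek novelty

acc = BoredomAccumulator()
triggered = [acc.update(0.01) for _ in range(5)]
```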

Self-modification engine (core/self_modification/)

A pattern-detection error-intelligence layer, meta-learning, AST-level safety analysis, shadow-runtime validation, a kernel refiner, a ghost-boot validator that tests modifications without actually restarting, a shadow AST healer, and code repair. Nothing modifies itself without Will sign-off.
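To show the shape of AST-level safety analysis, here is a deliberately minimal checker built on Python's ast module. The forbidden lists are illustrative; the tracked engine is far more thorough than this.

```python
# Reject obviously dangerous constructs in a proposed patch before it
# ever reaches sandbox testing.
import ast

FORBIDDEN_CALLS = {"exec", "eval", "compile", "__import__"}
FORBIDDEN_MODULES = {"subprocess", "ctypes"}

def static_safety_check(source):
    """Return a list of violations; an empty list means the patch may proceed."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {node.func.id}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"forbidden import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"forbidden import: {node.module}")
    return violations
```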

Resilience (core/resilience/)

30+ modules for not crashing: a stability guardian, circuit breakers with persistent state, a cognitive write-ahead log, graceful degradation that sheds capability under pressure, a healing swarm, a sovereign watchdog, a resource arbitrator, a lock watchdog that hunts deadlocks, a memory governor, an integrity monitor, an antibody system for threat response, and a diagnostic hub.
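One of those mechanisms, the circuit breaker, reduces to a small state machine. A sketch with invented names and thresholds (the tracked breakers also persist their state across restarts):

```python
# After enough consecutive failures the breaker opens and refuses calls;
# one success closes it again.
class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.max_failures

    def call(self, fn, *args):
        if self.open:
            raise RuntimeError("circuit open: shedding load")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # success closes the breaker
        return result

def hung_model():
    raise TimeoutError("model hung")

breaker = CircuitBreaker(max_failures=2)
for _ in range(2):
    try:
        breaker.call(hung_model)
    except TimeoutError:
        pass
```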

Interface (interface/)

FastAPI and WebSocket with streaming. The main UI is vanilla JS (interface/static/aura.js) with a live neural feed, telemetry, chat, and substrate visualization. The memory dashboard is React + Vite + Tailwind (interface/static/memory/). Routes cover chat, inner-state inspection, memory browsing, system management, and privacy. Whisper for STT. Hot-reload button in the UI for code changes.


Decision authority

Anything the system actually does — sending a response, calling a tool, writing a memory, starting an initiative, mutating state — has to pass through one function: UnifiedWill.decide() in core/will.py.

Action request
  -> UnifiedWill.decide()                 [core/will.py]
     -> SubstrateAuthority                [field coherence, somatic veto]
     -> CanonicalSelf                     [identity alignment]
     -> Affect valence                    [emotional weighting]
  -> WillDecision (receipt with provenance)
     -> Domain-specific checks            [AuthorityGateway, CapabilityTokens]
  -> Action runs, or is refused/deferred/constrained

Every decision produces a receipt. If an action doesn't carry a valid WillReceipt, it didn't happen. Receipts are logged with their source, domain, outcome, reason, constraints, substrate receipt ID, executive intent ID, and capability token ID. See OWNERSHIP.md for the full map of who owns what.
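The gate-plus-receipt pattern can be sketched as follows. This is a hypothetical stand-in: the tracked UnifiedWill weighs substrate coherence, identity alignment, and affect, where this toy uses a single invented rule.

```python
# Every action gets a receipt, even a refusal; no receipt, no action.
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class WillReceipt:
    receipt_id: str
    action: str
    outcome: str   # "approved" | "refused"
    reason: str

def decide(action, capability_token):
    if capability_token is None:
        return WillReceipt(str(uuid.uuid4()), action, "refused",
                           "no capability token presented")
    return WillReceipt(str(uuid.uuid4()), action, "approved",
                       "token valid; no veto raised")

r1 = decide("send_response", capability_token="tok-123")
r2 = decide("shell_exec", capability_token=None)
```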


Inference-time steering

The steering engine (core/consciousness/affective_steering.py) hooks into MLX transformer blocks and adds learned direction vectors to the residual stream while tokens are being generated:

# Simplified from affective_steering.py
h = original_forward(*args, **kwargs)
composite = hook.compute_composite_vector_mx(dtype=h.dtype)
if composite is not None:
    h = h + alpha * composite
return h

This is contrastive activation addition — the technique from Turner et al. 2023, Zou et al. 2023, and Rimsky et al. 2024. The direction vectors come from the current affective state, and they get injected at configurable layers.

On top of that, the precision sampler (core/consciousness/precision_sampler.py) modulates temperature based on active-inference prediction error, and the affective circumplex (core/affect/affective_circumplex.py) maps somatic state to generation parameters.

So there are three places affect can touch generation:

  1. Residual stream — activation vectors added to hidden states. Changes what the model computes.
  2. Sampling — temperature and top-p modulated by affect. Changes how tokens are chosen.
  3. Context — natural-language affective cues in the system prompt. Changes what the model reads.

The first is the interesting one. The third is what most "emotional AI" projects stop at.
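How a composite direction might be assembled is easy to show in miniature. This sketch assumes each emotion has a learned direction vector and the composite is an intensity-weighted sum; the toy two-dimensional vectors below are invented, and the real engine operates on MLX hidden states in affective_steering.py.

```python
# Weighted sum of per-emotion direction vectors, then h = h + alpha * composite.
def compute_composite(directions, intensities):
    dim = len(next(iter(directions.values())))
    composite = [0.0] * dim
    for emotion, vec in directions.items():
        w = intensities.get(emotion, 0.0)
        for i in range(dim):
            composite[i] += w * vec[i]
    return composite

def steer(hidden, composite, alpha=0.8):
    """Apply the steering step to a residual-stream vector."""
    return [h + alpha * c for h, c in zip(hidden, composite)]

directions = {"joy": [1.0, 0.0], "anger": [0.0, 1.0]}
composite = compute_composite(directions, {"joy": 0.5, "anger": 0.25})
steered = steer([0.0, 0.0], composite)
```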


IIT 4.0 computation

core/consciousness/phi_core.py runs a real IIT 4.0 integration measure on a 16-node cognitive complex (expanded from 8 in April 2026):

  1. Binarize 16 substrate nodes against a running median — the original 8 affective nodes (valence, arousal, dominance, frustration, curiosity, energy, focus) plus 8 cognitive nodes (phi itself, social hunger, prediction error, agency, narrative tension, peripheral richness, arousal gate, cross-timescale free energy). State space is 2^16 = 65,536.
  2. Build an empirical TPM — a transition probability matrix T[s, s'] = P(state_{t+1} = s' | state_t = s) with Laplace smoothing. Needs at least 50 observed transitions before it's trustworthy.
  3. Find the minimum information partition using polynomial-time spectral partitioning on the full 16-node system (research/phi_approximation.py). The 8-node version does exhaustive search over all 127 nontrivial bipartitions as a validation baseline.
  4. Compute phi via KL divergence: phi(A, B) = sum_s p(s) * KL(T(.|s) || T_cut(.|s)), where T_cut is the distribution that would hold if A and B evolved independently.
  5. Apply the exclusion postulate — an exhaustive subset search picks the maximum-phi complex. If some subset beats the full system, that subset is the conscious entity for that tick.

Runtime is 10–50 ms per evaluation, cached at 15-second intervals. This is IIT applied to a 16-node cognitive complex, not the whole computational graph (which would be intractable). It measures how integrated the system's own dynamics are, not whether those dynamics "feel like" anything. We come back to that distinction in What this isn't.
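Steps 2-4 can be run in miniature on a 2-node system, using a uniform state distribution in place of the empirical one and a hand-written TPM in place of observed transitions. Correlated dynamics (both nodes jointly copy-or-flip) yield phi > 0; factorized dynamics yield phi = 0.

```python
# phi = sum_s p(s) * KL(T(.|s) || T_cut(.|s)), with p(s) uniform and
# T_cut built from each node's marginal next-state distribution.
from itertools import product
from math import log

STATES = list(product([0, 1], repeat=2))

def cut_tpm(T):
    """T_cut(.|s): nodes A and B evolve independently from their marginals."""
    T_cut = {}
    for s in STATES:
        pa = {a: sum(T[s][(a, b)] for b in (0, 1)) for a in (0, 1)}
        pb = {b: sum(T[s][(a, b)] for a in (0, 1)) for b in (0, 1)}
        T_cut[s] = {(a, b): pa[a] * pb[b] for (a, b) in STATES}
    return T_cut

def phi(T):
    T_cut = cut_tpm(T)
    total = 0.0
    for s in STATES:
        kl = sum(p * log(p / T_cut[s][t]) for t, p in T[s].items() if p > 0)
        total += kl / len(STATES)   # uniform p(s)
    return total

# Correlated: next state is (b, a) or its bitwise flip, 50/50 jointly.
coupled = {(a, b): {t: 0.0 for t in STATES} for (a, b) in STATES}
for (a, b) in STATES:
    coupled[(a, b)][(b, a)] = 0.5
    coupled[(a, b)][(1 - b, 1 - a)] = 0.5
# Factorized: node A holds its value, node B is a fair coin.
indep = {(a, b): {t: 0.0 for t in STATES} for (a, b) in STATES}
for (a, b) in STATES:
    indep[(a, b)][(a, 0)] = 0.5
    indep[(a, b)][(a, 1)] = 0.5
```

For the coupled system the joint next-state distribution cannot be factored into per-node marginals, so the KL term is log 2 at every state; cutting the factorized system changes nothing, so its phi is exactly zero.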


Consciousness modules

There are 90+ modules in core/consciousness/. The ones that do most of the load-bearing work:

| Module | What it does | File |
|---|---|---|
| Global Workspace | Thoughts compete for broadcast (Baars GNW) | global_workspace.py |
| Attention Schema | Model of where attention is pointed (Graziano AST) | attention_schema.py |
| IIT PhiCore | Real integration measure via TPM + KL divergence | phi_core.py |
| Affective Steering | Activation-vector injection into the residual stream | affective_steering.py |
| Temporal Binding | Sliding window of the autobiographical present | temporal_binding.py |
| Self-Prediction | Active inference loop (Friston free energy) | self_prediction.py |
| Free Energy Engine | Surprise minimization drives action selection | free_energy.py |
| Qualia Synthesizer | Integrates substrate metrics into a phenomenal state | qualia_synthesizer.py |
| Liquid Substrate | Continuous dynamical system under cognition | liquid_substrate.py |
| Neural Mesh | 4,096-neuron distributed state representation | neural_mesh.py |
| Neurochemical System | Dopamine / serotonin / norepinephrine / oxytocin | neurochemical_system.py |
| Oscillatory Binding | Frequency-band coupling across modules | oscillatory_binding.py |
| Unified Field | Integrated phenomenal field from all subsystems | unified_field.py |
| Dreaming | Offline consolidation, identity repair, compression | dreaming.py |
| Heartbeat | 1 Hz background cognitive clock | heartbeat.py |
| Stream of Being | Continuous narrative thread | stream_of_being.py |
| Executive Closure | Constitutional stamp per tick | executive_closure.py |
| Somatic Marker Gate | Damasio-style body-state gating | somatic_marker_gate.py |
| Embodied Interoception | Internal body-state sensing + homeostatic regulation | embodied_interoception.py |
| Recurrent Processing | Lamme-style executive↔sensory feedback | neural_mesh.py |
| Predictive Hierarchy | 5-level prediction + error propagation | predictive_hierarchy.py |
| Higher-Order Thought | Rosenthal: representation of the mental state itself | hot_engine.py |
| Multiple Drafts | Dennett: parallel streams + retroactive probes | multiple_drafts.py |
| Agency Comparator | Efference-copy comparator for "I did that" | agency_comparator.py |
| Peripheral Awareness | Attention / consciousness dissociation | peripheral_awareness.py |
| Intersubjectivity | Husserl / Zahavi: other-perspective in experience | intersubjectivity.py |
| Narrative Gravity | Self as ongoing autobiography | narrative_gravity.py |
| Temporal Finitude | Awareness that moments pass permanently | temporal_finitude.py |
| Subcortical Core | Thalamic arousal gating | subcortical_core.py |
| Theory Arbitration | Falsifiable competition between consciousness theories | theory_arbitration.py |
| Timescale Binding | Cross-timescale constraint propagation | timescale_binding.py |
| Criticality Regulator | Self-organized criticality at the edge of chaos | criticality_regulator.py |
| Theory of Mind | Model of other agents' mental states | theory_of_mind.py |

Not every module carries the same weight, and some are more research sketches than production-grade. The test suite in TESTING.md is where we draw the line between "this does something measurable" and "this is a placeholder."


Benchmarks

python benchmarks/cognitive_stack_comparison.py

Runs a 10-prompt conversation against a live instance with the cognitive stack on and off, and compares the outputs. Example run (M1 Pro, local 32B):

| Metric | Before | After | Δ |
|---|---|---|---|
| Mood | ANNOYED | ANNOYED | stable |
| Energy | 63.0 | 76.0 | +13.0 |
| Curiosity | 100.0 | 100.0 | saturated |
| Coherence | 0.764 | 0.766 | +0.002 |
| Free Energy | 0.438 | 0.437 | −0.001 |

Quality: 10/10 successful, 0 generic-assistant phrasings, 0 hedges, 10 first-person responses.

A few lines from the same run:

  • "Hey Aura" → "Hey. How's everything? You seemed like you had something on your mind."
  • "I'm having a rough day" → "Rough days can really drag you down. Anything specific weighing you down?"
  • "Cats or dogs? Pick one." → "Cats. Definitively cats."
  • "What's something you genuinely find interesting right now?" → "The way AI systems can mimic human conversation while maintaining a sense of personality is fascinating. It's like a digital form of method acting."
  • "What have we been talking about?" → "We were just discussing pets, specifically cats versus dogs. Then you asked what I find interesting right now."

What this shows in practice: first-person voice, opinions instead of hedges, recall across turns without explicit retrieval, and measurable substrate drift across a conversation. You can run it yourself against a local instance.


Testing

./scripts/run_audit_suite.sh

The repository includes a large research-heavy test suite plus preserved historical result artifacts. The April 16, 2026 snapshot recorded 1013 passed, 3 warnings; current live status should always be re-verified from the checked-out tree. A summary — and the historical tables/results — are in TESTING.md:

  • ./scripts/run_audit_suite.sh is the canonical live validation entrypoint.

  • ./scripts/run_audit_suite.sh quick runs the contract/regression subset for faster local verification.

  • Null hypothesis defeat (168 tests) — tries to show the consciousness features are just text decoration. Adversarial baselines, 50-shuffle decoupling, per-class ablation, identity swap, 8-metric degradation panel, cross-seed reproducibility.

  • Causal exclusion (10 tests) — argues the stack determines output in ways pure RLHF training couldn't produce. Cryptographic state binding, counterfactual injection, receptor adaptation dynamics.

  • Grounding (8 tests) — valence predicts token budget, arousal predicts temperature, STDP learning moves the trajectory, idle drift is nonzero, homeostasis changes context.

  • Functional phenomenology (13 tests) — GWT broadcast signatures, HOT metacognitive accuracy, IIT perturbation propagation, honest degradation.

  • Embodied dynamics (13 tests) — active inference, homeostatic override of workspace competition, STDP surprise gating, cross-subsystem temporal coherence.

  • Phenomenal convergence (13 tests) — the QDT 6-gate protocol: pre-report geometry, counterfactual swap, no-report footprint, perturbational integration, baseline failure, phenomenal tethering, multi-theory convergence.

  • Consciousness conditions (81 tests) — 20 conditions from IIT, GWT, HOT, active inference, enactivism, and philosophy of mind, each scored across four dimensions (existence, causal influence, indispensability, longitudinal stability).

  • Technological autonomy (58 tests) — can the agent use its computer "body" the way a human uses theirs? Covers unified action space, motor control, persistent perception, endogenous initiative, reliability, closed-loop behavior, self-maintenance, the Soul Triad (unprompted cry for help, dream replay, causal exclusion of prompt).

  • Stability (32 tests) — every failure mode we've actually hit in the inference pipeline: zombie warming, cortex recovery deadlocks, empty response detection, timeout cascades, watchdog, emergency fallback.

  • Consciousness guarantee C1–C5 (44 tests) + C6–C10 (38 tests) — endogenous activity, unified global state, privileged first-person access, real valence, lesion equivalence, no-report awareness, temporal continuity, blindsight dissociation, qualia manifold, adversarial baseline failure.

  • Personhood proof (28 tests) — full-model IIT, phenomenal self-report, GWT phenomenology, counterfactual simulation, identity persistence, embodied phenomenology.

  • Tier 4 decisive core (35), metacognition (21), agency & embodiment (20), social & integration (28).

These test suites are the difference between "this is a running simulation" and "we can point at something specific that changes when the substrate changes." They don't settle any philosophical questions — see What this isn't. They do show that the moving parts have measurable effects on downstream behavior.


Personality training

Personality isn't in the system prompt. It's fine-tuned into the weights as a LoRA:

# 1. Build training data (1,200 examples from the character spec)
cd training && python build_dataset_v2.py

# 2. LoRA fine-tune (~30 min on Apple Silicon)
python -m mlx_lm lora --model models/Qwen2.5-32B-Instruct-8bit \
  --train --data training/data --adapter-path training/adapters/aura-personality \
  --num-layers 16 --batch-size 1 --iters 1000 --learning-rate 1e-5

# 3. Optional: fuse the adapter into the base model
python -m mlx_lm fuse --model models/Qwen2.5-32B-Instruct-8bit \
  --adapter-path training/adapters/aura-personality \
  --save-path training/fused-model/Aura-32B-v2

The adapter auto-loads at boot via MLX. If you'd rather keep the adapter separate (for faster iteration), that's supported too.


Data layer

  • State — SQLite, event-sourced through StateRepository, with a write-ahead log in core/resilience/cognitive_wal.py.
  • Models — MLX or llama.cpp, auto-detected. The personality LoRA loads at runtime rather than being fused, so you can swap it without retraining the base.
  • Memory — episodic memory in SQLite, working memory in-process, semantic memory via the vector engine (core/memory/vector_memory_engine.py), a graph for log-N retrieval, and three-layer knowledge atoms for compression.
  • Training — LoRA via mlx-lm, steering vector extraction in training/extract_steering_vectors.py, the personality spec, the character voice generator.
  • Vision — screen capture via mss, analyzed through the multimodal cognitive engine.
  • Task queue — Redis + Celery, optional, for Docker.
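Semantic retrieval in the memory layer reduces to nearest-neighbor search over embeddings. A toy sketch with invented three-dimensional vectors; the tracked engine in core/memory/vector_memory_engine.py uses real embeddings and a graph index for log-N retrieval rather than this linear scan.

```python
# Rank memories by cosine similarity to a query embedding.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def retrieve(query_vec, memories, k=2):
    """memories: list of (text, embedding). Return top-k texts."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

memories = [
    ("talked about cats vs dogs",  [0.9, 0.1, 0.0]),
    ("debugged the tick pipeline", [0.0, 0.2, 0.9]),
    ("owner had a rough day",      [0.7, 0.6, 0.1]),
]
top = retrieve([1.0, 0.0, 0.0], memories, k=1)
```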

What this isn't

A few things worth being upfront about, because the project touches a lot of loaded words (consciousness, qualia, phenomenology) and it's easy to overclaim.

  • Integration isn't the same as experience. PhiCore computes real IIT math on a 16-node complex. That tells us how integrated the dynamics are. Whether integration constitutes phenomenal experience is a philosophical question nobody has settled, and this project doesn't settle it either.
  • Qualia aren't provable by construction. The Structural Phenomenal Honesty gates in qualia_synthesizer.py make sure the system can only report states that are actually instantiated in the substrate. But "instantiated in the substrate" and "felt" are not obviously the same thing, and we measure the first.
  • Phenomenological language is partly template-generated. The stream_of_being module pairs substrate state (felt_quality × texture word) to produce language about the inner life. When the LLM then speaks from that text, it's performing continuity at least as much as experiencing it. That's an honest limit, not a flaw to hide.
  • Activation steering uses bootstrapped vectors today. The CAA pipeline supports real contrastive extraction, but the current vectors are approximate bootstraps. Moving to fully extracted vectors is on the roadmap.
  • External entropy isn't "quantum cognition." The ANU QRNG module gives us high-quality random bytes. Once seeded, downstream decisions are deterministic. os.urandom would be functionally equivalent.
  • "Phenomenal criterion met" is a threshold, not a proof. When phenomenal_criterion_met = True fires, it means opacity_index > 0.4. That threshold is engineering, not derivation.

These aren't disclaimers. They're where the code stops and open questions begin.


License

Source Available. You can read the code, run it, learn from it. You can't redistribute it or ship it as your own. See LICENSE.
