From 746073e0024352ee42a746eba126b5dee20ed1e9 Mon Sep 17 00:00:00 2001 From: Tim Black Date: Thu, 26 Feb 2026 21:54:39 -0800 Subject: [PATCH 1/4] docs: rewrite tests/emulation/README.md from 912 to 120 lines Remove vsim-era content: two mermaid diagrams, session history table, stale path references (hosts/n100-*, modules/hardware/, modules/roles/, VSIM-INTEGRATION-PLAN.md, tests/integration/), redundant architecture sections, oversized platform/troubleshooting/ARM64/resource-control sections, and "Integrated from vsim project" footer. Retain: framework purpose with clear "not primary test infra" callout, platform requirements, directory structure, build/run commands, inner-VM usage (virsh, OVS, tc profiles), compact ASCII architecture diagram, traffic control profile table, flake outputs, and references. --- tests/emulation/README.md | 926 +++----------------------------------- 1 file changed, 67 insertions(+), 859 deletions(-) diff --git a/tests/emulation/README.md b/tests/emulation/README.md index 9019bab..e474c02 100644 --- a/tests/emulation/README.md +++ b/tests/emulation/README.md @@ -1,912 +1,120 @@ -# n3x Emulation Testing Framework - -**Status**: Operational (Session 12 - Headless Mode) -**Branch**: `main` -**Last Updated**: 2025-12-12 -**Platform**: Native Linux only (see [Platform Compatibility](#platform-compatibility)) - -A declarative NixOS-based nested virtualization platform for testing n3x's production k3s cluster configurations before bare-metal deployment. - ---- - -## Executive Summary - -The emulation testing framework enables comprehensive validation of n3x deployments using nested virtualization. 
It provides: - -- **Nested virtualization** - Run production n3x configs as libvirt VMs within a hypervisor VM -- **Network simulation** - OVS switch fabric with QoS, traffic control, and constraint profiles -- **ARM64 emulation** - Test Jetson configs via QEMU TCG on x86_64 hosts -- **Resource constraints** - Validate behavior under embedded system limits -- **Production parity** - Tests use actual n3x modules from `hosts/` and `modules/` - -**Primary Use Case**: Test automation for k3s cluster deployments that require realistic multi-node simulation without cloud dependencies or physical hardware. - -### Emulated Architecture - -```mermaid -flowchart TB - subgraph host["Physical Host (Laptop / Server / Cloud VM)"] - direction TB - subgraph outer["Outer VM — NixOS Hypervisor Layer"] - direction TB - - subgraph services["System Services"] - libvirt["libvirtd"] - ovs_daemon["openvswitch"] - systemd_net["systemd-networkd"] - tc["tc (traffic control)"] - end - - subgraph switch["OVS Switch Fabric: ovsbr0"] - direction TB - - subgraph mgmt_port["Management Port"] - vnet0["vnet0
192.168.100.1/24"] - end - - switch_fabric["OVS Bridge
VLAN / QoS / tc shaping"] - - subgraph port_1["Port 1"] - tap_1["vnet-server-1"] - end - - subgraph port_2["Port 2"] - tap_2["vnet-server-2"] - end - - subgraph port_3["Port 3"] - tap_3["vnet-agent-1"] - end - end - - subgraph vms["Inner VMs (k3s Cluster)"] - direction TB - - subgraph vm_1["server-1 (x86_64)"] - vm_1_info["k3s Server
4GB RAM · 2 vCPU"] - end - - subgraph vm_2["server-2 (x86_64)"] - vm_2_info["k3s Server
4GB RAM · 2 vCPU"] - end - - subgraph vm_3["agent-1 (x86_64 or arm64)"] - vm_3_info["k3s Agent
2GB RAM · 2 vCPU"] - end - end - - vnet0 <--> switch_fabric - switch_fabric <--> tap_1 - switch_fabric <--> tap_2 - switch_fabric <--> tap_3 - - tap_1 <--> vm_1 - tap_2 <--> vm_2 - tap_3 <--> vm_3 - - libvirt --> vm_1 - libvirt --> vm_2 - libvirt --> vm_3 - ovs_daemon --> switch_fabric - tc --> switch_fabric - systemd_net --> vnet0 - end - end - - style host fill:#f5f5f5,stroke:#333 - style outer fill:#e1f5ff,stroke:#1976d2 - style switch fill:#fff4e1,stroke:#f57c00 - style services fill:#e8f5e9,stroke:#388e3c - style vms fill:#fce4ec,stroke:#c2185b - style vm_1 fill:#e3f2fd,stroke:#1565c0 - style vm_2 fill:#fff3e0,stroke:#ef6c00 - style vm_3 fill:#f3e5f5,stroke:#7b1fa2 - style switch_fabric fill:#90EE90,stroke:#2e7d32 -``` - -Each inner VM connects to the OVS bridge through a dedicated tap interface, emulating physical switch ports. The outer VM's system services — libvirtd for VM lifecycle, openvswitch for the network fabric, and tc for traffic shaping — provide the simulation control plane. systemd-networkd manages the host-side management interface. - ---- - -## Platform Compatibility - -> **Important**: This emulation framework uses **nested virtualization** (VMs inside VMs). -> It requires a platform that supports at least 2 levels of virtualization depth. 
- -### Platform Support Matrix - -| Platform | Emulation Framework | Reason | -|----------|---------------------|--------| -| **Native Linux** (bare metal) | YES | Full KVM nested virtualization support | -| **Native Linux** (cloud VM with nested virt) | YES | Works on AWS metal instances, GCP with `--enable-nested-virtualization` | -| **WSL2** (Windows 10/11) | NO | Hyper-V Enlightened VMCS limits to 2 levels; inner VMs hang indefinitely | -| **Docker Desktop** | NO | No nested virtualization support | -| **macOS** (Intel) | PARTIAL | Requires VMware Fusion or Parallels with nested virt enabled | -| **macOS** (Apple Silicon) | NO | No nested x86_64 virtualization; ARM64-only via UTM/Virtualization.framework | - -### Why WSL2 Doesn't Work - -The emulation framework attempts 3-level nesting: -``` -Hyper-V (L0) → WSL2 (L1) → Outer VM (L2) → Inner VMs (L3) - ↑ BLOCKED -``` - -Hyper-V's Enlightened VMCS architecture does not support L3 guests: -- eVMCS v1 disables Shadow VMCS required for deeper nesting -- Microsoft's TLFS defines no L3 terminology—it stops at L2 -- Inner VMs hang indefinitely with no boot output +# Emulation Testing Framework -See: [docs/hyper-v-enlightened-vmcs-caps-nested-virt-at-2-levels.md](../../docs/hyper-v-enlightened-vmcs-caps-nested-virt-at-2-levels.md) +A NixOS-based nested virtualization platform for interactive debugging and exploration of n3x k3s cluster configurations. It runs production n3x configs as libvirt VMs inside a hypervisor VM, connected by an OVS switch fabric with traffic control. -### Recommended Alternatives +> **This is NOT the primary test infrastructure.** Automated CI/CD testing uses `nixosTest` multi-node — see [tests/README.md](../README.md). The emulation framework is for interactive debugging on native Linux only. -For k3s cluster testing on platforms without nested virtualization support: +## Platform Requirements -1. 
**nixosTest multi-node** (Recommended for CI/CD) - - Each nixosTest "node" IS a k3s cluster node—no inner VMs needed - - Works on WSL2, Darwin, and cloud platforms - - See: `tests/integration/k3s-*.nix` - ```bash - nix build '.#checks.x86_64-linux.k3s-cluster-formation' - nix build '.#checks.x86_64-linux.k3s-storage' - nix build '.#checks.x86_64-linux.k3s-network' - ``` +This framework requires nested virtualization (VMs inside VMs). It works on: -2. **Cloud-based testing** - - Run tests on AWS metal instances (e.g., `c5.metal`, `i3.metal`) - - Use GCP with nested virtualization enabled - - GitLab CI with self-hosted KVM-enabled runners +- **Native Linux** (bare metal or cloud VMs with nested virt enabled) -3. **Native Linux workstation** - - Dual-boot or dedicated Linux machine - - Full nested virtualization support +It does **not** work on WSL2 (Hyper-V caps nesting at 2 levels), Docker Desktop, or macOS Apple Silicon. See [docs/hyper-v-enlightened-vmcs-caps-nested-virt-at-2-levels.md](../../docs/hyper-v-enlightened-vmcs-caps-nested-virt-at-2-levels.md) for details. -### Verifying Your Platform +Verify nested virtualization before use: ```bash -# Check if nested virtualization is available cat /sys/module/kvm_intel/parameters/nested # Intel: Y or 1 cat /sys/module/kvm_amd/parameters/nested # AMD: 1 - -# On WSL2 (will show nested=1 but L3 still won't work due to eVMCS) -cat /sys/module/kvm_intel/parameters/nested # Shows: Y -# However, L3 guests still hang—this is an architectural limitation - -# Verify you're NOT in WSL2 before using emulation framework -if [ -n "$WSL_DISTRO_NAME" ]; then - echo "WARNING: WSL2 detected. Emulation framework will NOT work." 
- echo "Use nixosTest multi-node instead: nix build '.#checks.x86_64-linux.k3s-cluster-formation'" -fi ``` ---- - -## Quick Start - -### Prerequisites - -Verify nested virtualization is enabled on your host: - -```bash -# Intel CPUs -cat /sys/module/kvm_intel/parameters/nested -# Expected: Y or 1 - -# AMD CPUs -cat /sys/module/kvm_amd/parameters/nested -# Expected: 1 +## Directory Structure -# Enable if needed (Intel example) -echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm-nested.conf -sudo modprobe -r kvm_intel && sudo modprobe kvm_intel +``` +tests/emulation/ +├── README.md # This file +├── embedded-system.nix # Main emulator: VM topology, outer VM services +└── lib/ + ├── inner-vm-base.nix # Base NixOS module for inner VMs (virtio, serial, auth) + ├── mkInnerVM.nix # n3x host configs → libvirt domain XML + ├── mkInnerVMImage.nix # n3x host configs → bootable qcow2 images + ├── mkOVSBridge.nix # OVS switch fabric + systemd-networkd host interface + └── mkTCProfiles.nix # Traffic control constraint profiles (tc/netem) ``` -**System Requirements**: -- CPU with nested virtualization support (Intel VT-x/AMD-V) -- 16GB+ RAM recommended (outer VM uses 12GB by default) -- 60GB+ disk space for VM images and build artifacts - -### Build and Run +## Build and Run ```bash -# Build the emulation VM package -nix build .#packages.x86_64-linux.emulation-vm +# Build the emulation VM (includes pre-built inner VM images) +nix build '.#packages.x86_64-linux.emulation-vm' -# Option 1: Interactive mode (foreground, console on stdio) +# Interactive mode (foreground, serial console on stdio) ./result/bin/run-nixos-vm -# Option 2: Background mode (daemon, connect via socat) -nix run .#emulation-vm-bg -# Then connect with: socat -,raw,echo=0 unix-connect:$XDG_RUNTIME_DIR/n3x-emulation/serial.sock +# Background mode (daemon, connect via socat) +nix run '.#emulation-vm-bg' +socat -,raw,echo=0 unix-connect:$XDG_RUNTIME_DIR/n3x-emulation/serial.sock # Stop background VM 
echo 'quit' | socat - unix-connect:$XDG_RUNTIME_DIR/n3x-emulation/monitor.sock ``` -**Note**: The VM runs in **headless mode** (no graphics window). Serial console is available via stdio in interactive mode, or via Unix socket in background mode. +System requirements: CPU with nested virt (VT-x/AMD-V), 16GB+ RAM, 60GB+ disk. -### Inside the Outer VM +## Inside the Outer VM ```bash -# List all defined inner VMs +# List and start inner VMs virsh list --all +for vm in n100-1 n100-2 n100-3; do virsh start "$vm"; done -# Start VMs -virsh start n100-1 -virsh start n100-2 -virsh start n100-3 - -# Start all VMs at once -for vm in n100-1 n100-2 n100-3; do virsh start $vm; done - -# Access VM console (press Ctrl+] to exit) +# Console access (Ctrl+] to exit) virsh console n100-1 -# View OVS switch topology +# OVS switch topology ovs-vsctl show -# Apply network constraints -/etc/tc-simulate-constraints.sh constrained -/etc/tc-simulate-constraints.sh lossy -/etc/tc-simulate-constraints.sh status -/etc/tc-simulate-constraints.sh default # Remove constraints +# Traffic control constraint profiles +/etc/tc-simulate-constraints.sh constrained # Embedded system limits +/etc/tc-simulate-constraints.sh lossy # Packet loss + jitter +/etc/tc-simulate-constraints.sh status # Show current config +/etc/tc-simulate-constraints.sh default # Remove all constraints ``` ---- +Inner VMs boot directly to NixOS with pre-built disk images. Login: `root` / `test`. 
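If something looks off after boot, the outer VM's services and inner-VM networking can be spot-checked with standard tools. A quick sanity pass, using the default VM names and addresses from `embedded-system.nix`:

```bash
# Hypervisor services that must be active for inner VMs to work
systemctl is-active libvirtd openvswitch dnsmasq

# Tap interface each VM was attached to on ovsbr0
virsh domiflist n100-1

# Host-side management interface (should hold 192.168.100.1/24)
ip addr show vnet0

# Reachability once a VM has booted and taken its DHCP lease
ping -c 3 192.168.100.10
```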
## Architecture -### Nested Virtualization Structure - -``` -Physical Host (Laptop / Server / Cloud VM) -└── Outer VM (NixOS Hypervisor Layer - 12GB RAM, 8 vCPU) - ├── libvirtd (VM management) - ├── openvswitch (ovsbr0 bridge - simulates switch fabric) - ├── dnsmasq (DHCP/DNS for inner VMs) - ├── systemd-networkd (host interface: vnet0 @ 192.168.100.1/24) - └── Inner VMs (n3x production configs): - ├── n100-1 (x86_64) - k3s Server - 192.168.100.10 (4GB, 2 vCPU) [KVM] - ├── n100-2 (x86_64) - k3s Server - 192.168.100.11 (4GB, 2 vCPU) [KVM] - ├── n100-3 (x86_64) - k3s Agent - 192.168.100.12 (2GB, 2 vCPU, +10GB disk) [KVM] - └── jetson-1 (arm64) - k3s Agent - 192.168.100.20 (2GB, 2 vCPU) [TCG] -``` - -### Architecture Diagram - -```mermaid -flowchart TB - subgraph host["Physical Host (Laptop / Server / Cloud VM)"] - direction TB - subgraph outer["Outer VM - NixOS Hypervisor Layer"] - direction TB - - subgraph services["System Services"] - libvirt["libvirtd"] - ovs_daemon["openvswitch"] - dnsmasq["dnsmasq"] - tc["tc (traffic control)"] - end - - subgraph switch["OVS Switch Fabric: ovsbr0"] - direction TB - vnet0["vnet0
192.168.100.1/24"] - switch_fabric["Switch Fabric
QoS / tc shaping"] - end - - subgraph vms["Inner VMs (n3x Production Configs)"] - direction TB - n100_1["n100-1 (x86)
k3s Server
192.168.100.10"] - n100_2["n100-2 (x86)
k3s Server
192.168.100.11"] - n100_3["n100-3 (x86)
k3s Agent
192.168.100.12"] - jetson_1["jetson-1 (arm64)
k3s Agent
192.168.100.20"] - end - - vnet0 <--> switch_fabric - switch_fabric <--> n100_1 - switch_fabric <--> n100_2 - switch_fabric <--> n100_3 - switch_fabric <--> jetson_1 - - libvirt --> vms - ovs_daemon --> switch_fabric - tc --> switch_fabric - dnsmasq --> vnet0 - end - end -``` - ---- - -## When to Use Emulation Testing - -### Testing Hierarchy - -n3x uses a multi-layer testing approach. Choose the right layer for your needs: - -| Layer | Tool | Speed | Use Case | Platforms | -|-------|------|-------|----------|-----------| -| **1. Fast Automated** | `nixosTest` multi-node | Seconds-Minutes | Multi-node k3s clusters, CI/CD, VLAN testing | All (WSL2, Darwin, Cloud) | -| **2. Emulation** | This framework (OVS + nested virt) | Minutes | Interactive testing, OVS topology visualization | Native Linux only | -| **3. Manual VMs** | `tests/vms/` | Minutes | Interactive debugging, exploration | All platforms | -| **4. Bare-metal** | Physical hardware | Hours | Final production validation | N/A | - -### Use OVS Emulation Framework When You Need To: - -1. **Interactive network testing** - SSH into VMs, run commands manually, explore behavior -2. **OVS topology visualization** - See switch fabric with `ovs-vsctl show` -3. **Test specific switch configurations** - Validate OpenFlow rules, VLAN trunk behavior -4. **ARM64 cross-architecture validation** - Test Jetson configs via QEMU TCG (very slow) -5. **Resource constraint scenarios** - Test behavior under extreme memory/CPU limits - -**Platform Requirement**: Native Linux with KVM nested virtualization only. - -### Use nixosTest Multi-Node For: - -1. **Automated CI/CD testing** - Fast, reproducible, works on all platforms -2. **Multi-node k3s cluster validation** - 2+ servers, agents, cluster formation -3. 
**VLAN tagging testing** - Validate 802.1Q VLANs before hardware deployment - ```bash - nix build '.#checks.x86_64-linux.k3s-cluster-vlans' # VLAN tagging - nix build '.#checks.x86_64-linux.k3s-cluster-bonding-vlans' # Bonding + VLANs - ``` -4. **Network constraints testing** - tc/netem directly on nodes (no OVS needed) - ```bash - nix build '.#checks.x86_64-linux.k3s-network-constraints' - ``` -5. **Storage and networking integration** - PVC, Longhorn prerequisites, CoreDNS - ```bash - nix build '.#checks.x86_64-linux.k3s-storage' - nix build '.#checks.x86_64-linux.k3s-network' - ``` - -**Platform Support**: WSL2, Darwin (via Lima/UTM), Cloud VMs, Native Linux. - -### Don't Use Emulation Framework For: - -- **CI/CD pipelines** - Use `nixosTest` multi-node instead (faster, works everywhere) -- **VLAN testing** - Use `k3s-cluster-vlans` nixosTest (production parity without nesting) -- **Performance benchmarking** - Nested virtualization adds significant overhead -- **Final production validation** - Use bare-metal hardware -- **WSL2 or macOS development** - Use `nixosTest` multi-node (emulation won't work) - -### Key Insight: Both Approaches Are Complementary - -The emulation framework (OVS + nested virt) and nixosTest multi-node serve **different purposes**: - -| Feature | OVS Emulation | nixosTest Multi-Node | -|---------|---------------|---------------------| -| **Primary Use** | Interactive testing, visualization | Automated CI/CD, validation | -| **Network Sim** | OVS switch fabric with topology | Direct node connections | -| **VLAN Support** | OVS VLAN trunks (manual setup) | 802.1Q VLAN tagging (automated) | -| **Platform** | Native Linux only | All platforms | -| **Speed** | Slower (3-level nesting) | Faster (1-2 levels) | -| **Best For** | Exploration, debugging | Continuous integration | - -**Use both**: Run `k3s-cluster-vlans` in CI, then use emulation framework for interactive debugging on bare metal Linux. 
- ---- - -## Cluster Configuration - -### Default VM Definitions - -The framework creates four VMs using n3x production configurations: - -| VM | Arch | Role | IP | Resources | Emulation | -|----|------|------|-----|-----------|-----------| -| **n100-1** | x86_64 | k3s Server | 192.168.100.10 | 4GB RAM, 2 vCPU | KVM (fast) | -| **n100-2** | x86_64 | k3s Server | 192.168.100.11 | 4GB RAM, 2 vCPU | KVM (fast) | -| **n100-3** | x86_64 | k3s Agent | 192.168.100.12 | 2GB RAM, 2 vCPU, +10GB disk | KVM (fast) | -| **jetson-1** | aarch64 | k3s Agent | 192.168.100.20 | 2GB RAM, 2 vCPU | TCG (slow) | - -### Customizing VMs - -Edit `tests/emulation/embedded-system.nix` to modify VM definitions: - -```nix -innerVMs = [ - (mkInnerVM { - hostname = "n100-1"; # Uses hosts/n100-1/configuration.nix - mac = "52:54:00:12:34:01"; - ip = "192.168.100.10"; - memory = 4096; # 4GB RAM - vcpus = 2; - qosProfile = "default"; # Full gigabit speed - }) - - # ARM64 Jetson is included by default for cross-architecture testing: - (mkInnerVM { - hostname = "jetson-1"; # Uses hosts/jetson-1/configuration.nix - mac = "52:54:00:12:34:10"; - ip = "192.168.100.20"; - memory = 2048; - vcpus = 2; - arch = "aarch64"; # Forces QEMU TCG emulation (slow!) - qosProfile = "constrained"; # Simulate embedded ARM limits - }) -]; -``` - -### ARM64 Notes - -The `jetson-1` VM uses QEMU TCG (Tiny Code Generator) for software emulation of ARM64. -This is automatically configured when `arch = "aarch64"` is specified: - -- **UEFI firmware**: Uses EDK2 aarch64 firmware from QEMU package -- **CPU model**: Cortex-A57 (compatible with Jetson Orin Nano) -- **GIC**: Version 3 (Generic Interrupt Controller) -- **Performance**: Expect 10-20x slower than native KVM - -Use jetson-1 for validating ARM64 compatibility, not for performance testing. 
- ---- - -## Network Simulation - -### OVS Switch Fabric - -The framework uses Open vSwitch to simulate the network fabric: - -```bash -# View switch topology -ovs-vsctl show - -# Expected output: -# Bridge ovsbr0 -# Port ovsbr0 -# Interface ovsbr0 -# type: internal -# Port vnet0 -# Interface vnet0 -# type: internal -``` - -### Traffic Control Profiles - -The `/etc/tc-simulate-constraints.sh` script applies network constraints: - -| Profile | Effect | Use Case | -|---------|--------|----------| -| `default` | No constraints (full speed) | Normal operation | -| `constrained` | 10-100Mbps + latency | Embedded system simulation | -| `lossy` | Packet loss + jitter | Network resilience testing | -| `status` | Show current config | Debugging | - -#### Profile Details - -**Constrained Profile** (embedded system limits): -- Server nodes: 10Mbps, 100ms latency -- Agent nodes: 100Mbps, 10ms latency - -**Lossy Profile** (resilience testing): -- Server nodes: 2% packet loss, 50±20ms delay (normal distribution) -- Agent nodes: 0.5% packet loss, 20±10ms delay - -#### Usage Examples - -```bash -# Apply constrained profile -/etc/tc-simulate-constraints.sh constrained - -# Check bandwidth with constraints -iperf3 -c 192.168.100.10 -t 10 - -# Apply lossy profile for resilience testing -/etc/tc-simulate-constraints.sh lossy - -# Verify packet loss -ping -c 100 192.168.100.10 | tail -2 - -# Remove all constraints -/etc/tc-simulate-constraints.sh default - -# View current tc rules -/etc/tc-simulate-constraints.sh status -``` - -### QoS Profiles - -VM network interfaces have libvirt QoS settings in addition to tc rules: - -| Profile | Bandwidth | Peak | Description | -|---------|-----------|------|-------------| -| `default` | 1 Gbps | 2 Gbps | Full speed | -| `constrained` | 100 Mbps | 200 Mbps | Embedded limits | -| `lossy` | 50 Mbps | 100 Mbps | Unreliable network | - ---- - -## Directory Structure - ``` -tests/emulation/ -├── README.md # This documentation -├── embedded-system.nix # 
Main emulator configuration -└── lib/ - ├── mkInnerVM.nix # Generator: n3x configs → libvirt VMs - ├── mkInnerVMImage.nix # Generator: n3x configs → bootable qcow2 images - ├── mkOVSBridge.nix # OVS switch fabric configuration - ├── mkTCProfiles.nix # Traffic control profile generator - └── inner-vm-base.nix # Base NixOS module for inner VMs +Physical Host (bare metal / cloud VM with nested virt) +└── Outer VM (NixOS Hypervisor — 12GB RAM, 8 vCPU) + ├── libvirtd VM lifecycle management + ├── openvswitch ovsbr0 bridge (simulated switch fabric) + │ └── vnet0 Host management interface (192.168.100.1/24) + ├── dnsmasq DHCP/DNS for inner VMs + ├── tc Traffic shaping on VM tap interfaces + └── Inner VMs + ├── n100-1 x86_64 k3s Server 192.168.100.10 4GB/2vCPU [KVM] + ├── n100-2 x86_64 k3s Server 192.168.100.11 4GB/2vCPU [KVM] + ├── n100-3 x86_64 k3s Agent 192.168.100.12 2GB/2vCPU [KVM] + └── jetson-1 arm64 k3s Agent 192.168.100.20 2GB/2vCPU [TCG] ``` -### Module Descriptions - -**`embedded-system.nix`** - Main emulator configuration -- Imports library functions -- Defines VM cluster topology -- Configures outer VM services (libvirtd, OVS, dnsmasq) -- Sets up inner VM initialization service - -**`lib/mkInnerVM.nix`** - VM generator function -- Converts n3x host configs to libvirt VM definitions -- Generates libvirt domain XML -- Handles architecture detection (x86_64 vs aarch64) -- Applies QoS profiles and resource limits - -**`lib/mkOVSBridge.nix`** - OVS bridge configuration -- Creates OVS switch with internal host interface -- Configures systemd-networkd for host connectivity -- Returns NixOS module configuration - -**`lib/mkTCProfiles.nix`** - Traffic control script generator -- Generates bash script for tc rule management -- Supports multiple constraint profiles -- Dynamically detects VM interfaces - -**`lib/mkInnerVMImage.nix`** - VM image builder -- Imports actual n3x host configs from `hosts/` -- Overlays `inner-vm-base.nix` for VM-specific settings -- Uses NixOS 
`make-disk-image` to create bootable qcow2 images -- Handles both x86_64 and aarch64 architectures - -**`lib/inner-vm-base.nix`** - Base NixOS module for inner VMs -- VM-specific hardware settings (virtio, serial console, QEMU guest support) -- Simplified storage (no disko partitioning, just single root disk) -- Network configuration via systemd-networkd (DHCP from dnsmasq) -- Test-friendly authentication (root/test) - ---- - -## Integration with n3x - -### Production Module Usage - -The emulation framework uses n3x's production modules directly: +The ARM64 `jetson-1` VM uses QEMU TCG (software emulation), which is 10-20x slower than native. Use it for cross-architecture validation only. -| Component | Source | -|-----------|--------| -| Host configs | `hosts/n100-*/configuration.nix` | -| Hardware modules | `modules/hardware/n100.nix`, `modules/hardware/jetson-orin-nano.nix` | -| Role modules | `modules/roles/k3s-server.nix`, `modules/roles/k3s-agent.nix` | -| Network modules | `modules/network/dual-ip-bonding.nix` | -| Storage modules | `modules/storage/*` | +## Traffic Control Profiles -This ensures tests validate the exact configurations that will be deployed to hardware. +| Profile | Effect | +|---------|--------| +| `default` | No constraints (full speed) | +| `constrained` | 10-100 Mbps bandwidth limits + latency | +| `lossy` | Packet loss + jitter for resilience testing | -### Flake Outputs +Profiles apply tc/netem rules to inner VM tap interfaces. VMs also have libvirt QoS bandwidth limits configured in their domain XML. 
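Under the hood, the profiles are ordinary tc invocations against the VM tap interfaces. A representative pair is sketched below with an illustrative interface name; the script detects interfaces dynamically, and the exact rates vary per node role:

```bash
# constrained-style: bandwidth cap plus latency via token bucket filter
tc qdisc replace dev vnet-n100-1 root tbf rate 100mbit latency 50ms burst 1540

# lossy-style: packet loss plus jittered delay via netem
tc qdisc replace dev vnet-n100-1 root netem loss 1% delay 20ms 5ms

# inspect whatever is currently applied
tc qdisc show dev vnet-n100-1
```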
-The emulation environment is exposed via several flake outputs: +## Flake Outputs ```bash -# Show available outputs -nix flake show | grep emulation - -# Build the emulation VM package -nix build .#packages.x86_64-linux.emulation-vm - -# Run VM interactively (foreground) -nix run .#emulation-vm - -# Run VM in background with connection info -nix run .#emulation-vm-bg - -# Build and run the check (verifies boot) -nix build .#checks.x86_64-linux.emulation-vm-boots - -# Build the nixosConfiguration -nix build .#nixosConfigurations.emulator-vm.config.system.build.vm -``` - -### Running Automated Checks - -```bash -# Run all flake checks including emulation -nix flake check - -# Run only the emulation boot test -nix build .#checks.x86_64-linux.emulation-vm-boots +nix build '.#packages.x86_64-linux.emulation-vm' # VM package +nix run '.#emulation-vm' # Interactive run +nix run '.#emulation-vm-bg' # Background run +nix build '.#checks.x86_64-linux.emulation-vm-boots' # Automated boot check +nix build '.#nixosConfigurations.emulator-vm.config.system.build.vm' # Raw VM build ``` ---- - -## ARM64 Emulation - -### Jetson/ARM64 Testing - -The framework supports ARM64 emulation via QEMU TCG for testing Jetson configurations: - -```nix -(mkInnerVM { - hostname = "jetson-1"; - mac = "52:54:00:12:34:10"; - ip = "192.168.100.20"; - memory = 2048; - vcpus = 2; - arch = "aarch64"; # Forces TCG emulation - qosProfile = "constrained"; # Simulate embedded limits -}) -``` - -### Performance Expectations - -ARM64 emulation via QEMU TCG is significantly slower than native: - -| Operation | Native ARM64 | Emulated (TCG) | Slowdown | -|-----------|--------------|----------------|----------| -| Boot time | 30-60s | 5-10 minutes | 10-20x | -| k3s startup | 15-30s | 3-5 minutes | 10-15x | -| General compute | baseline | 10-20x slower | 10-20x | -| Network I/O | baseline | 2-5x slower | 2-5x | - -### Development Strategy - -- Use x86_64 VMs for rapid iteration and most testing -- Use ARM64 
emulation for final cross-architecture validation -- Consider native ARM64 hosts for intensive ARM64 testing: - - Apple Silicon Mac with Virtualization.framework - - AWS Graviton instances - - Raspberry Pi 4/5 cluster - ---- - -## Troubleshooting - -### Common Issues - -| Issue | Solution | -|-------|----------| -| VM extremely slow | If ARM64: Expected (TCG emulation). If x86_64: Check nested virt is enabled on host | -| `virsh list` shows no VMs | `systemctl restart setup-inner-vms` | -| VMs don't get IP addresses | Check `systemctl status dnsmasq` and `journalctl -u dnsmasq` | -| VM won't start | Check `/dev/kvm` exists; verify nested virtualization enabled | -| OVS bridge missing | `systemctl status openvswitch`; check `networking.vswitches` config | -| tc rules not applying | VMs must be running first (interfaces created dynamically) | -| Build fails with merge conflict | Check for duplicate module options; use `lib.mkMerge` | - -### Debugging Commands - -```bash -# Verify services are running (inside outer VM) -systemctl is-active libvirtd openvswitch setup-inner-vms dnsmasq - -# Check service logs -journalctl -u libvirtd -journalctl -u openvswitch -journalctl -u setup-inner-vms -journalctl -u dnsmasq - -# Restart VM setup if VMs don't appear -systemctl restart setup-inner-vms - -# Check VM network interface assignments -virsh domiflist n100-1 -virsh domiflist n100-2 -virsh domiflist n100-3 - -# View VM details -virsh dominfo n100-1 - -# Check host network interface -ip addr show vnet0 # Should have 192.168.100.1/24 - -# Test connectivity (after VMs are started and booted) -ping -c 3 192.168.100.10 -ping -c 3 192.168.100.11 -ping -c 3 192.168.100.12 -``` - -### Checking Nested Virtualization - -```bash -# On physical host (before running outer VM) -cat /sys/module/kvm_intel/parameters/nested # Intel: Y or 1 -cat /sys/module/kvm_amd/parameters/nested # AMD: 1 - -# Inside outer VM -ls -la /dev/kvm # Should exist and be accessible -``` - ---- - -## Inner VM 
Installation - -### Current State - -Inner VMs are created with empty disks. For full testing, you need to install NixOS into each VM. - -### Installation Options - -**Option 1: Manual Installation via Console** - -```bash -# Download NixOS ISO (from host or inside outer VM) -curl -LO https://channels.nixos.org/nixos-24.11/latest-nixos-minimal-x86_64-linux.iso - -# Attach ISO to VM -virsh attach-disk n100-1 /path/to/nixos.iso sdc --type cdrom --mode readonly - -# Boot from CD and install -virsh destroy n100-1 # Stop if running -virsh start n100-1 -virsh console n100-1 -``` - -**Option 2: nixos-anywhere (Recommended for Automation)** - -```bash -# From a machine that can SSH to the VMs -nixos-anywhere --flake .#n100-1 root@192.168.100.10 -``` - -**Option 3: Pre-built Disk Images (Implemented)** - -The emulation framework now builds pre-installed NixOS disk images automatically: - -```bash -# Build emulation VM (includes inner VM image generation) -nix build .#packages.x86_64-linux.emulation-vm - -# Run outer VM - images are copied on first boot -./result/bin/run-emulator-vm-vm - -# Inner VMs boot directly to NixOS! -virsh start n100-1 && sleep 10 && virsh console n100-1 -# Login: root / test -``` - -**How It Works**: -1. `mkInnerVMImage.nix` imports actual n3x host configs from `hosts/` -2. Overlays `inner-vm-base.nix` for VM-specific settings -3. Uses NixOS `make-disk-image` to create bootable qcow2 images -4. Images are copied from Nix store to `/var/lib/libvirt/images/` at boot - -**Note**: ARM64 image building is disabled by default (TCG emulation is slow). -Enable in `embedded-system.nix` by uncommenting `innerVMImages.jetson-1`. 
- -### Post-Installation Verification - -After installing NixOS on inner VMs and deploying k3s: - -```bash -# From outer VM, check k3s cluster -kubectl --kubeconfig=/path/to/kubeconfig get nodes -o wide - -# Expected output: -# NAME STATUS ROLES AGE VERSION -# n100-1 Ready control-plane,master 10m v1.28+k3s1 -# n100-2 Ready control-plane,master 8m v1.28+k3s1 -# n100-3 Ready 5m v1.28+k3s1 -``` - ---- - -## Resource Control - -### VM-Level Controls (libvirt) - -Each VM has resource limits defined in the libvirt domain XML: - -```xml -4096 -2 - - - 2048 - - - - 4224 - 4096 - - - - - - -``` - -### Traffic Control (tc) - -Additional constraints via Linux tc: - -```bash -# Bandwidth limit with latency -tc qdisc replace dev vnet-n100-1 root tbf rate 100mbit latency 50ms burst 1540 - -# Packet loss and delay -tc qdisc replace dev vnet-n100-1 root netem loss 1% delay 20ms 5ms - -# View current rules -tc qdisc show dev vnet-n100-1 -``` - -### Kubernetes-Level Controls (k3s) - -After k3s deployment, pods have their own resource constraints: - -```yaml -resources: - limits: - cpu: "500m" - memory: "256Mi" - requests: - cpu: "100m" - memory: "128Mi" -``` - ---- - -## Implementation Status - -### Completed Sessions - -| Session | Description | Status | -|---------|-------------|--------| -| 0 | Branch cleanup | Complete | -| 1 | Fix flake check issues | Complete | -| 2 | Create directory structure | Complete | -| 3 | Implement mkInnerVM.nix | Complete | -| 4 | Implement network simulation modules | Complete | -| 5 | Refactor embedded-system emulator | Complete | -| 6 | Integrate with flake outputs | Complete | -| 7 | Create documentation (this file) | Complete | -| 8 | Update main project documentation | Complete | -| 9 | Create network resilience tests | Complete | -| 10 | ARM64 emulation (jetson-1 via TCG) | Complete | -| 11 | Inner VM installation automation | Complete | -| 12 | Headless mode with background runner | Complete | - -### Planned Sessions - -| Session | 
Description | Status | -|---------|-------------|--------| -| 13 | K3s cluster formation testing | Planned | -| 14 | End-to-end provisioning tests | Planned | - -See `CLAUDE.md` in the project root for the detailed roadmap. - ---- - ## References -### Project Documentation - -- **VSIM-INTEGRATION-PLAN.md** - Complete integration roadmap -- **CLAUDE.md** - Project guidelines and task tracking -- **README.md** - Main project documentation -- **tests/integration/** - Fast automated tests using nixosTest -- **tests/vms/** - Manual VM testing configurations - -### External Documentation - -- [NixOS Manual](https://nixos.org/manual/nixos/stable/) -- [libvirt Domain XML](https://libvirt.org/formatdomain.html) -- [Open vSwitch Documentation](https://docs.openvswitch.org/) -- [k3s Documentation](https://docs.k3s.io/) -- [QEMU ARM System Emulation](https://www.qemu.org/docs/master/system/target-arm.html) -- [Linux Traffic Control (tc)](https://man7.org/linux/man-pages/man8/tc.8.html) - -### Related Projects - -- [k3s-nix](https://github.com/rorosen/k3s-nix) - Reproducible k3s clusters in Nix -- [NixOS on ARM](https://nixos.wiki/wiki/NixOS_on_ARM) -- [jetpack-nixos](https://github.com/anduril/jetpack-nixos) - Jetson support for NixOS - ---- - -**Author**: Integrated from vsim project -**License**: MIT (same as n3x) -**Support**: See main project documentation +- [tests/README.md](../README.md) — Primary test infrastructure (nixosTest multi-node) +- [embedded-system.nix](embedded-system.nix) — Header comments describe full architecture and usage +- [docs/hyper-v-enlightened-vmcs-caps-nested-virt-at-2-levels.md](../../docs/hyper-v-enlightened-vmcs-caps-nested-virt-at-2-levels.md) — Why WSL2 doesn't work From 22f460b04f69dd131024f9bfa6e58ceabb2b92cb Mon Sep 17 00:00:00 2001 From: Tim Black Date: Fri, 27 Feb 2026 07:09:28 -0800 Subject: [PATCH 2/4] fix: use stable kas-container mount points for DL_DIR/SSTATE_DIR base.yml used ${HOME}/.cache/yocto/{downloads,sstate} in 
local_conf_header, but inside kas-container, kas overrides HOME to an ephemeral tmpdir (/tmp/tmpXXXXXX). BitBake expanded ${HOME} to that tmpdir, so all downloads were destroyed when the container exited. Neither devShell nor kas-build exported DL_DIR/SSTATE_DIR as host environment variables, so kas-container never mounted a persistent cache directory. Fix: - kas-build wrapper (both Darwin and Linux): export DL_DIR and SSTATE_DIR with defaults of ~/.cache/yocto/{downloads,sstate}, using ${VAR:-default} so CI can override - base.yml: hardcode /downloads and /sstate (the stable kas-container mount points that correspond to the host DL_DIR/SSTATE_DIR) - Delete ci-cache.yml overlay (existed solely to override the broken ${HOME} paths for CI; now redundant) - Remove ci-cache.yml from build-matrix.nix mkCiKasCommand - Remove .git-downloads workaround from isar-build-all.sh and CI workflow (DL_DIR now resolves to /downloads which is stable across container sessions, so .git-downloads symlinks no longer go stale) - Update CLAUDE.md: replace stale workaround docs with new caching docs References: siemens/kas#52, siemens/kas#148 --- .github/workflows/ci.yml | 21 +++++++++--------- CLAUDE.md | 26 +++++------------------ backends/debian/kas/base.yml | 21 +++++++++++++----- backends/debian/kas/opt/ci-cache.yml | 26 ----------------------- backends/debian/scripts/isar-build-all.sh | 3 --- flake.nix | 18 ++++++++++++++++ lib/debian/build-matrix.nix | 7 +++--- 7 files changed, 52 insertions(+), 70 deletions(-) delete mode 100644 backends/debian/kas/opt/ci-cache.yml diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d24dfa6..6fe18f2 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -12,8 +12,8 @@ # ISAR builds use kas-container with --privileged (required for mmdebstrap's # unshare(2) mount namespace isolation). 
Build cache (downloads + sstate) is # persisted via actions/cache and forwarded into the container via kas-container's -# DL_DIR/SSTATE_DIR mount mechanism, with the ci-cache.yml kas overlay ensuring -# BitBake uses the container-mounted paths. +# DL_DIR/SSTATE_DIR environment variable mount mechanism. base.yml hardcodes +# /downloads and /sstate (the stable kas-container mount points). name: CI @@ -232,6 +232,9 @@ jobs: env: KAS_CONTAINER_ENGINE: docker KAS_CONTAINER_IMAGE: ghcr.io/siemens/kas/kas-isar:5.1 + # CRITICAL: kas-container mounts these host paths into the container at + # /downloads and /sstate. Without these, builds use ephemeral container + # storage and lose all cached downloads between runs. DL_DIR: ${{ github.workspace }}/.ci-cache/downloads SSTATE_DIR: ${{ github.workspace }}/.ci-cache/sstate MACHINE: ${{ matrix.machine }} @@ -240,8 +243,7 @@ jobs: set -euo pipefail # Get all variants for this machine with CI overlays applied. - # mkCiKasCommand appends ci-cache.yml always, native-build.yml - # when host arch matches target (replaces shell case statement). + # mkCiKasCommand appends native-build.yml when host arch matches target. HOST_ARCH=$(uname -m) VARIANTS=$(nix eval --json '.#lib.debian.buildMatrix' --apply " matrix: builtins.map (v: { @@ -260,9 +262,6 @@ jobs: echo "::group::[$((IDX + 1))/$VARIANT_COUNT] Building variant: $ID" echo "kas command: $FULL_CMD" - # Clean stale .git-downloads symlink (kas-container tmpdir changes between sessions) - rm -f build/tmp/work/debian-trixie-*/.git-downloads 2>/dev/null || true - kas-container --isar build "$FULL_CMD" echo "::endgroup::" done @@ -383,6 +382,9 @@ jobs: env: KAS_CONTAINER_ENGINE: docker KAS_CONTAINER_IMAGE: ghcr.io/siemens/kas/kas-isar:5.1 + # CRITICAL: kas-container mounts these host paths into the container at + # /downloads and /sstate. Without these, builds use ephemeral container + # storage and lose all cached downloads between runs. 
DL_DIR: ${{ github.workspace }}/.ci-cache/downloads SSTATE_DIR: ${{ github.workspace }}/.ci-cache/sstate VARIANTS: ${{ matrix.variants }} @@ -401,11 +403,8 @@ jobs: ") echo "kas command: $FULL_CMD" - # Clean stale .git-downloads symlink - cd backends/debian - rm -f build/tmp/work/debian-trixie-*/.git-downloads 2>/dev/null || true - # Build + cd backends/debian kas-container --isar build "$FULL_CMD" cd ../.. diff --git a/CLAUDE.md b/CLAUDE.md index 0eb7ef1..db1a074 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -251,28 +251,12 @@ nix develop -c bash -c "cd backends/debian && kas-container --isar cleansstate k nix develop -c bash -c "cd backends/debian && kas-container --isar cleanall kas/machine/.yml:..." ``` -**Stale `.git-downloads` symlink** (common issue): -- Each kas-container session creates a new tmpdir (`/tmp/tmpXXXXXX`); `.git-downloads` symlink in the build work dir points to the previous session's tmpdir -- **Fix**: Remove before EVERY new build after a container session change: - ```bash - rm -f backends/debian/build/tmp/work/debian-trixie-arm64/.git-downloads - rm -f backends/debian/build/tmp/work/debian-trixie-amd64/.git-downloads - ``` -- Integrate into build command: `rm -f backends/debian/build/tmp/work/debian-trixie-*/.git-downloads && nix develop -c bash -c "cd backends/debian && kas-build ..."` - -**Download cache collision** (multi-arch): -- k3s recipe uses `downloadfilename=k3s` for BOTH architectures — x86_64 and arm64 binaries share the same cache key -- If switching architectures (e.g., qemuamd64 → jetson-orin-nano), the cached `k3s` binary is the wrong architecture -- **Fix**: Delete the cached binary AND its fetch stamps: - ```bash - rm -f ~/.cache/yocto/downloads/k3s ~/.cache/yocto/downloads/k3s.done - rm -f backends/debian/build/tmp/stamps/debian-trixie-arm64/k3s-server/1.32.0-r0.do_fetch* - rm -f backends/debian/build/tmp/stamps/debian-trixie-arm64/k3s-agent/1.32.0-r0.do_fetch* - ``` -- TODO: Fix k3s recipe to use arch-specific 
`downloadfilename` (e.g., `k3s-arm64` or `k3s-amd64`) - ### ISAR Build Cache -- Shared cache: `DL_DIR="${HOME}/.cache/yocto/downloads"`, `SSTATE_DIR="${HOME}/.cache/yocto/sstate"` +- The kas-build wrapper (flake.nix) exports `DL_DIR` and `SSTATE_DIR` with defaults of `~/.cache/yocto/{downloads,sstate}` +- kas-container mounts these host paths at `/downloads` and `/sstate` inside the container +- base.yml hardcodes `DL_DIR = "/downloads"` and `SSTATE_DIR = "/sstate"` (the stable mount points) +- **WARNING**: Do NOT use `${HOME}` in kas `local_conf_header` — inside the container, kas overrides HOME to an ephemeral tmpdir (`/tmp/tmpXXXXXX`), so any path referencing `${HOME}` is destroyed on exit (siemens/kas#148) +- CI sets its own `DL_DIR`/`SSTATE_DIR` values which the wrapper respects via `${VAR:-default}` pattern ### ZFS Replication Limitations diff --git a/backends/debian/kas/base.yml b/backends/debian/kas/base.yml index c433f82..51bc408 100644 --- a/backends/debian/kas/base.yml +++ b/backends/debian/kas/base.yml @@ -25,11 +25,22 @@ target: isar-image-base # Common variables local_conf_header: shared-cache: | - # Shared user-level cache directories for all Yocto/ISAR projects - # This enables cache reuse across multiple projects/builds - # BitBake auto-creates these directories if they don't exist - DL_DIR = "${HOME}/.cache/yocto/downloads" - SSTATE_DIR = "${HOME}/.cache/yocto/sstate" + # Download and sstate cache paths inside the kas-container. + # + # kas-container mounts host DL_DIR at /downloads and SSTATE_DIR at /sstate + # when these environment variables are set on the host. The kas-build wrapper + # (flake.nix mkKasBuildWrapper) exports them with a default of + # ~/.cache/yocto/{downloads,sstate}, enabling cross-project cache sharing. + # + # WARNING: Do NOT use ${HOME} here. Inside the container, kas overrides HOME + # to an ephemeral tmpdir for build isolation (siemens/kas#148). 
Any path + # referencing ${HOME} resolves to /tmp/tmpXXXXXX and is destroyed on exit. + # + # If DL_DIR/SSTATE_DIR are not set on the host (e.g., running kas directly + # without the wrapper), these paths won't exist and BitBake will fall back + # to creating them inside the container (ephemeral). Always use the wrapper. + DL_DIR = "/downloads" + SSTATE_DIR = "/sstate" n3x-base: | # n3x common configuration diff --git a/backends/debian/kas/opt/ci-cache.yml b/backends/debian/kas/opt/ci-cache.yml deleted file mode 100644 index 09ee450..0000000 --- a/backends/debian/kas/opt/ci-cache.yml +++ /dev/null @@ -1,26 +0,0 @@ -# CI Build Cache Configuration -# -# Overrides DL_DIR and SSTATE_DIR to use kas-container's forwarded mount -# paths. When these environment variables are set on the host, kas-container -# mounts the host directories at /downloads and /sstate inside the container. -# This overlay ensures BitBake uses those paths instead of the defaults -# from base.yml (which reference ${HOME}/.cache/yocto/). -# -# This overlay must appear AFTER base.yml in the kas config chain so that -# its local_conf_header entries override base.yml's assignments. -# -# Usage (CI only): -# DL_DIR=/host/cache/downloads SSTATE_DIR=/host/cache/sstate \ -# kas-container --isar build kas/base.yml:...:kas/opt/ci-cache.yml - -header: - version: 14 - -local_conf_header: - # Key MUST sort alphabetically after base.yml's "shared-cache" key. - # kas merges local_conf_header entries by key name and writes them - # in sorted order. BitBake uses last-assignment-wins, so this key - # ("zzz-ci-cache") ensures our DL_DIR/SSTATE_DIR override takes effect. 
- zzz-ci-cache: | - DL_DIR = "/downloads" - SSTATE_DIR = "/sstate" diff --git a/backends/debian/scripts/isar-build-all.sh b/backends/debian/scripts/isar-build-all.sh index d6172f1..aadbb38 100755 --- a/backends/debian/scripts/isar-build-all.sh +++ b/backends/debian/scripts/isar-build-all.sh @@ -164,9 +164,6 @@ process_variant() { if ${dry_run}; then echo " [DRY-RUN] Would run: nix develop '.' -c bash -c \"cd backends/debian && kas-build ${full_kas_cmd}\"" else - # Clean stale .git-downloads symlink - rm -f backends/debian/build/tmp/work/debian-trixie-*/.git-downloads 2>/dev/null || true - nix develop '.' -c bash -c "cd backends/debian && kas-build ${full_kas_cmd}" fi fi diff --git a/flake.nix b/flake.nix index 5b9de81..a61b236 100644 --- a/flake.nix +++ b/flake.nix @@ -394,6 +394,15 @@ log_warn "Could not detect machine from kas config — skipping arch detection" fi + # Persistent download and sstate cache directories. + # kas-container reads these from the environment and mounts them into the + # container at /downloads and /sstate respectively. Without these exports, + # BitBake's DL_DIR inside the container resolves to kas's ephemeral tmpdir + # HOME (/tmp/tmpXXXXXX) and all downloads are lost between builds. + # See: siemens/kas#52, siemens/kas#148 + export DL_DIR="''${DL_DIR:-''${HOME}/.cache/yocto/downloads}" + export SSTATE_DIR="''${SSTATE_DIR:-''${HOME}/.cache/yocto/sstate}" + export KAS_CONTAINER_IMAGE="ghcr.io/siemens/kas/kas-isar:5.1" log_info "Starting kas-container build (engine: $KAS_CONTAINER_ENGINE)..." @@ -603,6 +612,15 @@ log_info "Config: $kas_config" echo + # Persistent download and sstate cache directories. + # kas-container reads these from the environment and mounts them into the + # container at /downloads and /sstate respectively. Without these exports, + # BitBake's DL_DIR inside the container resolves to kas's ephemeral tmpdir + # HOME (/tmp/tmpXXXXXX) and all downloads are lost between builds. 
+ # See: siemens/kas#52, siemens/kas#148 + export DL_DIR="''${DL_DIR:-''${HOME}/.cache/yocto/downloads}" + export SSTATE_DIR="''${SSTATE_DIR:-''${HOME}/.cache/yocto/sstate}" + # ISAR commit 27651d51 (Sept 2024) requires bubblewrap for rootfs sandboxing # kas-isar:4.7 does NOT have bwrap; kas-isar:5.1+ does # Use KAS_CONTAINER_IMAGE to override the full image path (not KAS_CONTAINER_IMAGE_NAME) diff --git a/lib/debian/build-matrix.nix b/lib/debian/build-matrix.nix index 75013fb..8a1da2c 100644 --- a/lib/debian/build-matrix.nix +++ b/lib/debian/build-matrix.nix @@ -167,15 +167,14 @@ let # CI and release helpers # =========================================================================== - # CI-aware kas command: appends ci-cache.yml always, native-build.yml when - # the runner's host architecture matches the target machine's architecture. + # CI-aware kas command: appends native-build.yml when the runner's host + # architecture matches the target machine's architecture. mkCiKasCommand = { hostArch }: variant: let machineInfo = machines.${variant.machine}; baseCommand = mkKasCommand variant; isNative = hostArch == machineInfo.arch; - ciOverlays = [ "kas/opt/ci-cache.yml" ] - ++ lib.optional isNative "kas/opt/native-build.yml"; + ciOverlays = lib.optional isNative "kas/opt/native-build.yml"; in lib.concatStringsSep ":" ([ baseCommand ] ++ ciOverlays); From 6618a5c6d1293f6275e2d5edc6c55a89c91fda8d Mon Sep 17 00:00:00 2001 From: Tim Black Date: Fri, 27 Feb 2026 07:22:38 -0800 Subject: [PATCH 3/4] fix: point ISAR repo at kyosaku-kai fork with symlink fix MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Switch mirrors.yml from upstream ilbers/isar to our fork at kyosaku-kai/isar on the fix/do-adjust-git-dangling-symlinks branch. 
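For context, the symlink behavior at the root of the bug can be reproduced in a few lines of Python (illustrative sketch only; the temp-directory paths here are made up and are not part of the fix):

```python
import os
import tempfile

# Simulate the stale .git-downloads symlink: its target (a previous
# kas-container tmpdir) no longer exists, so the link is dangling.
workdir = tempfile.mkdtemp()
link = os.path.join(workdir, ".git-downloads")
os.symlink(os.path.join(workdir, "tmp-previous-session"), link)

print(os.path.exists(link))   # False -- follows the link to the missing target
print(os.path.lexists(link))  # True  -- checks the link entry itself
```

With exists(), the "already present" check misses the dangling link, so the
subsequent os.symlink() call hits the existing directory entry and raises
FileExistsError; lexists() detects the link so it can be removed first.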
The fork contains a one-line fix: os.path.exists() → os.path.lexists() in dpkg-base.bbclass do_adjust_git(), fixing FileExistsError when DL_DIR mount paths change between kas-container sessions (dangling .git-downloads symlink not detected by exists() which follows symlinks). Revert to upstream once ilbers/isar merges the fix. --- backends/debian/kas/mirrors.yml | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-) diff --git a/backends/debian/kas/mirrors.yml b/backends/debian/kas/mirrors.yml index 407de7e..3c5e1a0 100644 --- a/backends/debian/kas/mirrors.yml +++ b/backends/debian/kas/mirrors.yml @@ -21,32 +21,22 @@ header: version: 14 -# ISAR framework repository +# ISAR framework repository (fork with do_adjust_git symlink fix) # -# IMPORTANT: This will eventually be a FORK, not a plain mirror. +# Fork of ilbers/isar with fix for dangling symlink handling in +# dpkg-base.bbclass do_adjust_git(). os.path.exists() follows symlinks +# and misses dangling ones, causing FileExistsError when DL_DIR mount +# paths change between kas-container sessions. # -# ISAR's dpkg-base.bbclass has a bug in do_adjust_git (line 30-34) where -# os.path.exists() is used to check a symlink, but this follows the symlink -# target. When the target doesn't exist (stale tmpdir from a previous -# kas-container session), the dangling symlink isn't cleaned up, and the -# subsequent os.symlink() call fails with FileExistsError. +# Fix: os.path.exists() → os.path.lexists() on lines 30 and 33. +# Upstream issue: https://github.com/ilbers/isar/issues/PENDING +# Upstream PR: https://github.com/ilbers/isar/pull/PENDING # -# Fix: change os.path.exists() to os.path.lexists() on line 30 and 33 of -# meta/classes-recipe/dpkg-base.bbclass in the ISAR source. 
-# -# Upstream file: https://github.com/ilbers/isar/blob/master/meta/classes-recipe/dpkg-base.bbclass -# Affected function: do_adjust_git() -- symlink creation for .git-downloads -# -# Current workaround: rm -f build/tmp/work/debian-trixie-*/.git-downloads -# before each kas-container build when switching between invocations. -# -# Tracking: When internal GitLab mirror is provisioned, create a fork with -# a patch branch containing the fix, point this URL at the fork, and submit -# the fix upstream to ilbers/isar. +# TODO: Revert to upstream ilbers/isar once fix is merged. repos: isar: - url: https://github.com/ilbers/isar.git - commit: 16b7b7e37b54be969453466e01ce4aa66e8ccb8e + url: https://github.com/kyosaku-kai/isar.git + commit: 62a2db1545a8d5dd811d3ad7caa3107fb51e256e layers: meta: meta-isar: From 46e2fef7241bbf3c220b9aabe381546dc8126a2f Mon Sep 17 00:00:00 2001 From: Tim Black Date: Fri, 27 Feb 2026 07:55:11 -0800 Subject: [PATCH 4/4] fix: add upstream issue/PR references for ISAR symlink fix Update mirrors.yml PENDING placeholders with actual upstream tracking: - Issue: https://github.com/ilbers/isar/issues/122 - PR: https://github.com/ilbers/isar/pull/123 --- backends/debian/kas/mirrors.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/backends/debian/kas/mirrors.yml b/backends/debian/kas/mirrors.yml index 3c5e1a0..a120da0 100644 --- a/backends/debian/kas/mirrors.yml +++ b/backends/debian/kas/mirrors.yml @@ -29,8 +29,8 @@ header: # paths change between kas-container sessions. # # Fix: os.path.exists() → os.path.lexists() on lines 30 and 33. -# Upstream issue: https://github.com/ilbers/isar/issues/PENDING -# Upstream PR: https://github.com/ilbers/isar/pull/PENDING +# Upstream issue: https://github.com/ilbers/isar/issues/122 +# Upstream PR: https://github.com/ilbers/isar/pull/123 # # TODO: Revert to upstream ilbers/isar once fix is merged. repos: