This document compiles the hardware specifications, coprocessor architectures, firmware deployment workflows, and pin multiplexing configurations for the BeagleBone AI-64 (BBAI64). It serves as a self-contained guide for future developers and LLM agents working on this platform to avoid redundant web searches.
The BeagleBone AI-64 is built around the Texas Instruments TDA4VM SoC (Jacinto 7 architecture). It features a heterogeneous processor system:
- Host CPU: Dual-core ARM Cortex-A72 (64-bit, running Linux).
- Deep Learning & DSPs:
  - 1x C7x DSP with Matrix Multiply Accelerator (MMA).
  - 2x C66x floating-point DSPs.
- Real-time Control & Networking (PRUs):
  - 2x Industrial Communication Subsystems (ICSSG0 and ICSSG1).
  - Each ICSSG contains 2 slices (built around PRU0 and PRU1). Each slice has three programmable cores (PRU, RT_PRU, TX_PRU), for a total of 6 cores per ICSSG (2x PRU, 2x RT_PRU, 2x TX_PRU).
- Main Domain R5F Cores: 2x dual-core Cortex-R5F clusters (each configurable for lockstep or split mode) for safety and real-time operations.
On the TDA4VM SoC, the PRU (Programmable Real-Time Unit) cores are organized under two ICSSG instances. The Linux kernel exposes these cores using the remoteproc framework.
Below is the mapping from the Linux /sys/class/remoteproc/remoteprocX interfaces to the physical hardware cores and device tree handles:
| remoteproc ID | Hardware Core | Linux Device Tree Node | Kernel Driver Name / Path |
|---|---|---|---|
| `remoteproc0` | ICSSG0 PRU0 (Coordinator) | `&pru0_0` | `b034000.pru` |
| `remoteproc1` | ICSSG0 RT_PRU0 | `&rtu0_0` | `b034000.rtu` |
| `remoteproc2` | ICSSG0 TX_PRU0 | `&txpru0_0` | `b034000.txpru` |
| `remoteproc3` | ICSSG0 PRU1 (Worker) | `&pru0_1` | `b038000.pru` |
| `remoteproc4` | ICSSG0 RT_PRU1 | `&rtu0_1` | `b038000.rtu` |
| `remoteproc5` | ICSSG0 TX_PRU1 | `&txpru0_1` | `b038000.txpru` |
| `remoteproc6` | ICSSG1 PRU0 (Coordinator) | `&pru1_0` | `b134000.pru` |
| `remoteproc7` | ICSSG1 RT_PRU0 | `&rtu1_0` | `b134000.rtu` |
| `remoteproc8` | ICSSG1 TX_PRU0 | `&txpru1_0` | `b134000.txpru` |
| `remoteproc9` | ICSSG1 PRU1 (Worker) | `&pru1_1` | `b138000.pru` |
| `remoteproc10` | ICSSG1 RT_PRU1 | `&rtu1_1` | `b138000.rtu` |
| `remoteproc11` | ICSSG1 TX_PRU1 | `&txpru1_1` | `b138000.txpru` |
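Since remoteproc indices are assigned in probe order, it can be worth confirming which core a given `remoteprocX` actually is by reading its `name` attribute and decoding it. The sketch below is a host-side helper, not part of any TI or kernel API: `classify_pru_name` and `enum pru_kind` are hypothetical names, and the decoding assumes names of the form `<address>.<kind>` as in the last column of the table, with a leading `b0` for ICSSG0 and `b1` for ICSSG1.

```c
#include <string.h>

enum pru_kind { KIND_UNKNOWN = 0, KIND_PRU, KIND_RTU, KIND_TXPRU };

/*
 * Decode a remoteproc device name such as "b134000.rtu".
 * Returns the core kind and stores the ICSSG instance (0 or 1) in *icssg.
 * *icssg is only meaningful when the result is not KIND_UNKNOWN.
 */
enum pru_kind classify_pru_name(const char *name, int *icssg)
{
    const char *suffix;
    size_t n;

    if (name == NULL || icssg == NULL)
        return KIND_UNKNOWN;

    /* Assumption: ICSSG0 names start with "b0", ICSSG1 names with "b1". */
    if (strncmp(name, "b0", 2) == 0)
        *icssg = 0;
    else if (strncmp(name, "b1", 2) == 0)
        *icssg = 1;
    else
        return KIND_UNKNOWN;

    suffix = strrchr(name, '.');
    if (suffix == NULL)
        return KIND_UNKNOWN;
    suffix++;

    /* sysfs reads end with a newline, so ignore it when comparing. */
    n = strcspn(suffix, "\n");
    if (n == 3 && strncmp(suffix, "pru", 3) == 0)
        return KIND_PRU;
    if (n == 3 && strncmp(suffix, "rtu", 3) == 0)
        return KIND_RTU;
    if (n == 5 && strncmp(suffix, "txpru", 5) == 0)
        return KIND_TXPRU;
    return KIND_UNKNOWN;
}
```

Feeding it the contents of `/sys/class/remoteproc/remoteprocX/name` for each index gives a quick cross-check of the table on a particular image.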
To interact with the PRU cores from user-space, write commands to the sysfs interface under /sys/class/remoteproc/remoteprocX/.
- Copy Firmware: Copy compiled ELF binaries to the target system's firmware directory: `cp my-fw /lib/firmware/`
- Assign Firmware to Core: `echo "my-fw" > /sys/class/remoteproc/remoteprocX/firmware`
- Control Core State:
  - Start: `echo start > /sys/class/remoteproc/remoteprocX/state`
  - Stop: `echo stop > /sys/class/remoteproc/remoteprocX/state`
  - Check Status: `cat /sys/class/remoteproc/remoteprocX/state` (returns `running` or `offline`)
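The same sequence can be driven from a small host program, which makes it easier to check for errors at each step. This is a minimal sketch under a few assumptions: the helper names (`rproc_write_attr`, `rproc_is_running`, `rproc_boot`) are hypothetical, the firmware file is already in `/lib/firmware/`, and the caller passes the full sysfs directory of the core and has permission to write to it.

```c
#include <stdio.h>
#include <string.h>

/* Write `value` to <dir>/<attr>. Returns 0 on success, -1 on failure. */
int rproc_write_attr(const char *dir, const char *attr, const char *value)
{
    char path[256];
    FILE *f;
    int ok;
    int n = snprintf(path, sizeof path, "%s/%s", dir, attr);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fputs(value, f) >= 0);
    /* A rejected sysfs write is reported when the buffer is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Returns 1 if <dir>/state reads "running", 0 otherwise, -1 on error. */
int rproc_is_running(const char *dir)
{
    char path[256];
    char state[32];
    FILE *f;
    int n = snprintf(path, sizeof path, "%s/state", dir);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(state, sizeof state, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return strncmp(state, "running", 7) == 0;
}

/* Stop the core if it is running, select `firmware`, then start it. */
int rproc_boot(const char *dir, const char *firmware)
{
    int running = rproc_is_running(dir);

    if (running < 0)
        return -1;
    if (running && rproc_write_attr(dir, "state", "stop") != 0)
        return -1;
    if (rproc_write_attr(dir, "firmware", firmware) != 0)
        return -1;
    return rproc_write_attr(dir, "state", "start");
}
```

A typical call would be `rproc_boot("/sys/class/remoteproc/remoteproc0", "my-fw")`. Stopping first matters because the sketch assumes the kernel rejects a firmware change or a second `start` while the core is running.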
The BBAI64 includes two expansion headers: P8 (46 pins) and P9 (50 pins).
Caution
- 3.3V Logic Level Max: All expansion pins operate at 3.3V. Connecting any 5V logic signals will cause irreversible electrical damage to the SoC and void the board's warranty.
- Reset Timing: No pins may be driven until after the `SYS_RESET` line goes high during the boot sequence. Do not apply voltage to I/O pins when the board is unpowered.
- Shorted/Double Pins: On some cape header pins, multiple SoC pins are shorted together on the board layout. Only one signal in a shorted group should be multiplexed/active at a time.
Under Linux, you can control raw GPIO pins using the gpiod toolset. Pins are designated by a gpiochip index and a relative line number.
For example, to toggle P8.03 (SoC Ball AH21 mapped to GPIO chip 1, line 20 / GPIO0_20):

    # Drive High (3.3V)
    gpioset 1 20=1
    # Drive Low (0V)
    gpioset 1 20=0

To access the physical pins from the PRUs, the pin control registers must be set to Mode 0 (pruout or pruin). Pin register values can be checked under debugfs:

    /sys/kernel/debug/pinctrl/11c000.pinctrl-pinctrl-single/pins
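Each line of that file includes the raw 32-bit pad register value, and decoding it by hand is error-prone. The following sketch pulls out the fields that matter for the Mode 0 check. The bit positions are assumptions taken from the K3 pinctrl device tree header (mux mode in bits 3:0, pull disable in bit 16, pull type in bit 17, receiver enable in bit 18), and the function names are hypothetical.

```c
#include <stdint.h>

#define PAD_MUXMODE_MASK 0xFu
#define PAD_PULL_DISABLE (1u << 16) /* 1 = internal pull disabled */
#define PAD_PULL_UP      (1u << 17) /* 1 = pull-up, 0 = pull-down */
#define PAD_RX_ENABLE    (1u << 18) /* 1 = input buffer enabled */

/* Mux mode selected for the pad (0 routes the pad to the PRU function). */
unsigned pad_mux_mode(uint32_t reg)
{
    return reg & PAD_MUXMODE_MASK;
}

/* Non-zero when the pad's input buffer is enabled. */
int pad_input_enabled(uint32_t reg)
{
    return (reg & PAD_RX_ENABLE) != 0;
}

/* Returns 0 for no pull, 1 for pull-up, -1 for pull-down. */
int pad_pull(uint32_t reg)
{
    if (reg & PAD_PULL_DISABLE)
        return 0;
    return (reg & PAD_PULL_UP) ? 1 : -1;
}
```

For instance, a value of `0x00050007` decodes as mux mode 7 (GPIO) with the input enabled and no pull, so that pin has not been handed to the PRU yet.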
The table below lists key expansion pins mapped to PRU General Purpose Outputs (__R30):
| Header Pin | SoC Ball | Control Register Address | Mode 0 Function (PRU GPO Mapping) | Linux GPIO |
|---|---|---|---|---|
| P8.11 | AB24 | `0x00011C0F4` | PRG0_PRU0_GPO17 (ICSSG0 PRU0 Bit 17) | GPIO0_60 (Mode 7) |
| P8.41 | AD29 | `0x00011C110` | PRG0_PRU1_GPO4 (ICSSG0 PRU1 Bit 4) | GPIO0_67 (Mode 7) |
| P8.42 | AB27 | `0x00011C114` | PRG0_PRU1_GPO5 (ICSSG0 PRU1 Bit 5) | GPIO0_68 (Mode 7) |
| P8.10 | AC24 | `0x00011C040` | PRG1_PRU0_GPO15 (ICSSG1 PRU0 Bit 15) | GPIO0_16 (Mode 7) |
| P9.11 | AC23 | `0x00011C004` | PRG1_PRU0_GPO0 (ICSSG1 PRU0 Bit 0) | GPIO0_1 (Mode 7) |
| P9.12 | AE27 | `0x00011C0B8` | PRG0_PRU0_GPO2 (ICSSG0 PRU0 Bit 2) | GPIO0_45 (Mode 7) |
| P9.13 | AG22 | `0x00011C008` | PRG1_PRU0_GPO1 (ICSSG1 PRU0 Bit 1) | GPIO0_2 (Mode 7) |
| P9.15 | AD25 | `0x00011C0C0` | PRG0_PRU0_GPO4 (ICSSG0 PRU0 Bit 4) | GPIO0_47 (Mode 7) |
See https://docs.beagleboard.org/boards/beaglebone/ai-64/04-expansion.html
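The register values in the table are absolute addresses, while overlay pinmux entries and the debugfs `pins` file work with offsets and pin indices relative to the pinctrl base. A small conversion helper avoids off-by-four mistakes. This sketch assumes the base is `0x0011C000` (taken from the `11c000.pinctrl` node name) and that each pad register is 4 bytes wide; the function names are hypothetical.

```c
#include <stdint.h>

#define MAIN_PADCFG_BASE 0x0011C000u /* assumed from "11c000.pinctrl" */

/* Offset relative to the pinctrl base, e.g. 0x0F4 for P8.11. */
uint32_t pad_offset(uint32_t reg_addr)
{
    return reg_addr - MAIN_PADCFG_BASE;
}

/* Pin index as listed in the debugfs "pins" file (offset / 4). */
uint32_t pad_index(uint32_t reg_addr)
{
    return pad_offset(reg_addr) / 4u;
}
```

For P8.11 this gives offset `0xF4` and pin index 61, which is the line to look for when checking the mux mode after boot.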
Unlike older BeagleBones, the BBAI64 does not support dynamic pinmux settings via the config-pin utility. Changes must be defined in a Device Tree Overlay source (.dts), compiled to a .dtbo, and loaded at boot time.
To reserve and configure pins, target the pin handle node in the overlay:

    /dts-v1/;
    /plugin/;

    // Disable default kernel-level device bindings for the pin
    &bone_led_P8_11 {
        status = "disabled";
    };

    // Route pins to the PRU subsystem
    &pru0_0 {
        pinctrl-names = "default";
        pinctrl-0 = <&P8_11_pruout_pin>;
    };

Compile the overlay with `dtc`:

    dtc -@ -I dts -O dtb -o my_overlay.dtbo my_overlay.dts

The overlays are parsed and loaded by U-Boot at boot time using the extlinux.conf file (located at /boot/firmware/extlinux/extlinux.conf or /boot/extlinux/extlinux.conf).
To enable:

- Copy the compiled `.dtbo` to `/boot/firmware/overlays/` (or `/boot/overlays/`).
- Append the overlay path to the `fdtoverlays` entry under the active boot label:

      label Linux eMMC
          kernel /Image
          initrd /initrd.img
          fdt /k3-j721e-common-proc-board.dtb
          fdtoverlays /overlays/my_overlay.dtbo

- Reboot the board to apply changes.
Older BeagleBone AI-64 factory images (released around January 2022) ship with an outdated U-Boot bootloader that ignores the fdtoverlays block in extlinux.conf.
If pinmux checks fail (register is not set to Mode 0 after rebooting), connect via SSH and upgrade the bootloader:

    # Update partition bootloader
    sudo /opt/u-boot/bb-u-boot-beagleboneai64/install.sh
    # Update eMMC bootloader
    sudo /opt/u-boot/bb-u-boot-beagleboneai64/install-emmc.sh
    # Reboot to apply
    sudo reboot

Each ICSSG subsystem contains 64KB of Shared RAM starting at local offset 0x10000. This shared memory lets the PRU cores in the same ICSSG exchange data directly, within a few cycles and without involving the host CPU.
ICSSG Subsystem (e.g. ICSSG0)
┌────────────────────────────────────────────────────────┐
│ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ PRU0 Core │ │ PRU1 Core │ │
│ │ (Coordinator) │ │ (Worker) │ │
│ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │
│ └─────────► ┌──────────┐ ◄─────────┘ │
│ │Shared RAM│ │
│ │ (64 KB) │ │
│ └──────────┘ │
└────────────────────────────────────────────────────────┘
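When debugging the flag exchange from Linux, it helps to know where a shared-RAM word sits in the system address map so it can be inspected with a memory-peek tool. The helper below is a sketch under stated assumptions: ICSSG0 is taken to start at `0x0B000000` and ICSSG1 at `0x0B100000` (inferred from device names such as `b034000.pru` and `b134000.pru`), with Shared RAM at offset `0x10000` inside each instance. The function name is hypothetical; check the bases against the device tree before relying on them.

```c
#include <stdint.h>

#define ICSSG0_BASE       0x0B000000u /* assumed from "b034000.pru" */
#define ICSSG_STRIDE      0x00100000u /* assumed from "b134000.pru" */
#define SHARED_RAM_OFFSET 0x00010000u /* local offset of Shared RAM */
#define SHARED_RAM_SIZE   0x00010000u /* 64 KB */

/*
 * Physical address of byte `offset` inside the Shared RAM of ICSSG
 * `instance` (0 or 1). Returns 0 if either argument is out of range.
 */
uint32_t icssg_shared_ram_phys(unsigned instance, uint32_t offset)
{
    if (instance > 1u || offset >= SHARED_RAM_SIZE)
        return 0u;
    return ICSSG0_BASE + instance * ICSSG_STRIDE + SHARED_RAM_OFFSET + offset;
}
```

Under these assumptions, a flag stored at the very start of ICSSG0 Shared RAM would be visible from the host at `0x0B010000`.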
To achieve simultaneous pin toggling across both cores:
- Define a shared volatile flag pointer to the Shared RAM address:

      #define SHARED_RAM_ADDRESS 0x10000
      volatile uint32_t *start_flag = (volatile uint32_t *)SHARED_RAM_ADDRESS;

- Boot Order Execution:
  - Start the Worker core (PRU1) first. It resets `start_flag = 0` and enters a busy-wait spin loop:

        *start_flag = 0;
        while (*start_flag == 0) {
            // Wait for coordinator to set flag
        }

  - Start the Coordinator core (PRU0) second. Once initialized, it raises the flag:

        *start_flag = 1;
When the coordinator releases the flag, the worker core requires a few cycles to break out of the spin loop, load instructions, and perform its first pin write.
- Without Compensation: The worker pin toggling lags behind the coordinator pin toggling by several cycles.
- With Compensation: Add a fixed delay in the coordinator core immediately after setting the flag, using the `__delay_cycles()` compiler intrinsic, to align both writes. The value 7 below is specific to this firmware and needs re-checking whenever either code path changes:

      *start_flag = 1;
      __delay_cycles(7); // Skew offset that aligns both execution paths
To prevent timing jitter and cycle variations in high-speed protocols (e.g. Polar Modulation, RF generation), the execution paths of both cores must contain a constant instruction count.
- Avoid Branching: Conditional logic (`if/else`) compiles to branches, and the two paths usually execute a different number of instructions, so the cycle count depends on the branch outcome.
- Branchless Pin Assignments: Use logical mask operations to set pin values instead of conditional branches.
  - Bad (Variable cycle count):

        if (val) {
            __R30 |= (1 << 17);
        } else {
            __R30 &= ~(1 << 17);
        }

  - Good (Constant cycle count; `val` must be 0 or 1):

        __R30 = (__R30 & ~(1 << 17)) | (val << 17);
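The branchless form can be checked on a host before it goes into firmware. In this sketch a plain `uint32_t` stands in for `__R30`, and both function names are hypothetical. Note one deliberate difference from the branching version: the branchless helper uses only the lowest bit of `val`, so callers that pass values other than 0 or 1 need to normalize them first.

```c
#include <stdint.h>

/* Reference version with a branch, kept only for comparison. */
uint32_t set_bit_branching(uint32_t reg, unsigned bit, uint32_t val)
{
    if (val)
        reg |= (1u << bit);
    else
        reg &= ~(1u << bit);
    return reg;
}

/* Branchless version: clear the target bit, then OR in the new value. */
uint32_t set_bit_branchless(uint32_t reg, unsigned bit, uint32_t val)
{
    uint32_t mask = 1u << bit;

    return (reg & ~mask) | ((val & 1u) << bit);
}
```

Both functions expect `bit` to be below 32. For `val` equal to 0 or 1 they return the same result, which is the property the firmware relies on when swapping one form for the other.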
- Empty Resource Table Requirement: The Linux `remoteproc` driver mandates that any loaded ELF binary contain a `.resource_table` section, even if empty. If missing, the kernel will refuse to load the firmware:

      #include <rsc_types.h>

      struct my_resource_table {
          struct resource_table base;
          uint32_t offset[1];
      };

      #pragma DATA_SECTION(pru_remoteproc_ResourceTable, ".resource_table")
      #pragma RETAIN(pru_remoteproc_ResourceTable)
      struct my_resource_table pru_remoteproc_ResourceTable = {
          { 1, 0, { 0, 0 } }, /* version 1, zero entries, reserved */
          { 0 }
      };
- Compiler: TI's proprietary PRU Compiler (`clpru`) and linker (`lnkpru`).
- Toolchain Library: `rtspruv3_le.lib` (runtime support library).
- Architecture constraints: The `clpru` compiler is a 32-bit ARM binary (compiled for 32-bit ARM Linux hosts).
- Docker/CI Build Environment:
  - Running compilation in Docker containers on x86-64 host architectures requires `qemu-user-static` configuration.
  - In GitHub Actions pipelines, do not restrict the QEMU platform filter strictly to `linux/arm64`. Registering all architectures (specifically `linux/arm/v7` or `arm`) is necessary to execute the 32-bit `clpru` compiler binaries inside the container.
When building complex cooperative real-time applications involving multiple cores (PRU and RTU) and dynamic host interface systems (RPMsg), pay attention to the following architecture-specific details:
- Interrupt-to-Bit Mapping: For RTU (and TX_PRU) cores, the host interrupts come from the 10-19 range rather than 0 and 1, but they are still read on bits 30 and 31 of register `__R31` (the register is 32 bits wide, so there are no bits above 31). Host-10 on RTU0 appears on bit 30, just as host interrupt 0 does on a standard PRU core. If you are listening for kicks on RTU0 using Host-10, check bit 30:

      #define HOST_INT ((uint32_t) 1 << 30) // Host-10 on RTU
      if (__R31 & HOST_INT) { ... }

- System Event Assignments: In the TI kernel, RTU0 uses system event 20 (`TO_ARM_HOST`) and event 21 (`FROM_ARM_HOST`) for VirtIO vring mailboxes.
The RemoteProc driver parses the .pru_irq_map section from the ELF file header to configure the interrupt controller (INTC), but it must not attempt to load this mapping table into the PRU's physical data memory.
- The Failure: If defined as a loadable segment in the linker script, RemoteProc will fail to boot the core, emitting kernel errors:

      remoteproc remoteproc1: PRU memory copy failed for da 0xXXXX memsz 0xYY
      remoteproc remoteproc1: Failed to load program segments: -22

- The Fix: Declare the section with the `(COPY)` attribute in the linker command script (`.cmd`) to mark it as non-loadable (`PT_NULL`):

      .pru_irq_map (COPY) : { *(.pru_irq_map) }
The TI clpru compiler enforces C89 constraints by default. Programmers accustomed to modern C/C++ standards must adjust:
- Scope Declarations: You cannot declare variables inline inside loops (e.g. `for(int i = 0; ...)`) or midway through blocks. All variables must be declared at the beginning of the block scope.
- GCC Inline Assembly Constraint Limitations: GCC-style inline assembly operand constraints (like `: "=r"(val)`) are not supported by `clpru`. Use only simple single-string `__asm("...")` blocks.
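As a quick illustration of the declaration rule, the function below is written so that it should compile under C89 rules: every variable is declared at the top of its enclosing block, including the loop counter. The function itself (`sum_bytes`) is a made-up example, not taken from any firmware.

```c
/* C89-compatible: every declaration sits at the start of its block. */
unsigned long sum_bytes(const unsigned char *data, unsigned int len)
{
    unsigned long total = 0;
    unsigned int i; /* declared before the first statement */

    for (i = 0; i < len; i++) {
        unsigned char b = data[i]; /* allowed: start of the loop body */

        total += b;
    }
    return total;
}
```

Declaring a variable at the top of an inner block, as with `b` here, is still permitted; only declarations that follow a statement in the same block are rejected.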
Each PRU/RTU data memory region is extremely small (typically 2 KB to 4 KB per slice partition).
- Avoid `sprintf`/`atoi`/`sscanf`: Calling standard library parsing and formatting functions imports extensive helper code from `rtspruv3_le.lib`, inflating the firmware binary size and consuming a large amount of stack space, which can easily overflow the 2 KB DMEM stack limit.
- The Alternative: Write custom, lightweight ASCII-to-integer parsers and integer-to-string formatters directly in your source code.
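A pair of such helpers can be very small. The sketch below handles unsigned decimal values only and uses no library calls; the names `parse_u32` and `format_u32` are hypothetical, and overflow on parsing is not detected (the value simply wraps), which is an accepted simplification here.

```c
#include <stdint.h>

/*
 * Parse an unsigned decimal number, stopping at the first non-digit.
 * Returns the number of characters consumed; *out is written only when
 * at least one digit was found. Overflow wraps silently.
 */
unsigned parse_u32(const char *s, uint32_t *out)
{
    uint32_t value = 0;
    unsigned n = 0;

    while (s[n] >= '0' && s[n] <= '9') {
        value = value * 10u + (uint32_t)(s[n] - '0');
        n++;
    }
    if (n > 0)
        *out = value;
    return n;
}

/*
 * Write `value` as decimal text into `buf`, which must hold at least
 * 11 bytes. Returns the number of digits written, excluding the NUL.
 */
unsigned format_u32(uint32_t value, char *buf)
{
    char tmp[10];
    unsigned n = 0;
    unsigned i;

    do {
        tmp[n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0);

    /* Digits were produced least-significant first, so reverse them. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

Returning the consumed length from `parse_u32` lets a command parser step through a message such as `"125 7"` without any tokenizer.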
- RemoteProc IOMMU Constraint: On AM65x/TDA4VM, the PRU subsystem does not have an IOMMU. Attempting to define mappings in the resource table using `TYPE_DEVMEM` will fail during firmware loading, emitting the kernel error `remoteproc remoteproc0: Failed to process resources: -22`.

- Manual RAT Configuration: Instead of resource table entries, program the hardware RAT (Region Address Translator) registers directly from the PRU C code. The RAT registers are located at local address `0x00008000` (mapped to Constant Register 22 in `J721E_PRU0.cmd`). Configure them through the C compiler's `cregister` table feature so that it generates `SBCO` instructions (a standard memory pointer `SBBO` access to local address `0x8000` is not mapped by the PRU core's memory interface). The regions begin at offset `0x20` inside the RAT register slice. For example, to map local `0x60000000` to system physical `0x03000000` (1 MB):

      typedef struct {
          volatile uint32_t CTRL;
          volatile uint32_t BASE;
          volatile uint32_t TRANS_L;
          volatile uint32_t TRANS_H;
      } rat_region;

      typedef struct {
          volatile uint32_t PID;
          volatile uint32_t CONFIG;
          uint32_t rsvd8[6]; /* Offset 0x08 to 0x1f */
          volatile rat_region REGION[16];
      } my_rat;

      volatile __far my_rat CT_RAT __attribute__ ((cregister("PRU_RTU_RAT0", far), peripheral));

      /* Map region 1 (place inside main() or an init function) */
      CT_RAT.REGION[1].BASE = 0x60000000;
      CT_RAT.REGION[1].TRANS_L = 0x03000000;
      CT_RAT.REGION[1].TRANS_H = 0;
      CT_RAT.REGION[1].CTRL = (1U << 31) | 20; /* Enable; size field is log2(bytes), 20 = 1 MB */

- Peripheral Clock Gating Aborts: System peripherals (like EHRPWM) are clock-gated by default. If the PRU attempts to write to a gated register range, it triggers a bus abort exception that freezes the PRU core. To avoid this, set the peripheral's device tree status to `okay` (which binds it to the Linux driver), and export/enable at least one channel in Linux user-space (e.g. via sysfs `pwmchip`) before starting the PRU core, so that the system controller keeps the clock active.

- EHRPWM Clocking & Time-Base (TBCLK) Frequency:
  - The EHRPWM modules on the TDA4VM SoC have their functional clock (`fck`) provided by the K3 clock controller (device ID `83` for `EHRPWM0` as defined in `arch/arm64/boot/dts/ti/k3-j721e-main.dtsi`).
  - By default, the system controller sets this clock rate to 125 MHz (not 100 MHz as often assumed in generic TI eHRPWM documentation).
  - This actual clock rate can be verified under Linux at `/sys/kernel/debug/clk/clk_summary` by inspecting the entry for `clk:83:0` (which corresponds to `k3_clks 83 0` in the device tree).
  - When writing custom PRU/RTU firmware to directly configure the eHRPWM registers, you must use `125000000` (125 MHz) as the base frequency to compute the `TBPRD` period register values correctly:

    $$\text{TBPRD} = \frac{125{,}000{,}000}{f_{\text{target}} \times \text{divider}} - 1$$

    Using 100 MHz in calculations will result in all output frequencies being scaled up by a factor of 1.25x (e.g., requesting 10 MHz will yield a physical output of 12.5 MHz). The maximum achievable frequency at a 50% duty cycle with `TBPRD = 1` is $\frac{125\text{ MHz}}{2} = 62.5\text{ MHz}$.
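The period calculation is easy to get subtly wrong in integer arithmetic, so a host-side check can help before values are hardcoded into firmware. This sketch assumes up-count mode (which matches the `- 1` in the formula), a 16-bit `TBPRD` register, and that `divider` is the combined time-base prescale used in the formula. The function names are hypothetical.

```c
#include <stdint.h>

#define EHRPWM_FCK_HZ 125000000u /* functional clock stated above */

/*
 * TBPRD for the requested frequency, rounded to the nearest count.
 * Returns -1 when the result does not fit the 16-bit register or the
 * inputs are zero.
 */
int32_t ehrpwm_tbprd(uint32_t target_hz, uint32_t divider)
{
    uint64_t denom = (uint64_t)target_hz * divider;
    uint64_t counts;

    if (denom == 0)
        return -1;
    counts = (EHRPWM_FCK_HZ + denom / 2u) / denom;
    if (counts < 1u || counts > 65536u)
        return -1;
    return (int32_t)(counts - 1u);
}

/* Output frequency actually produced by a given TBPRD and divider. */
uint32_t ehrpwm_actual_hz(uint32_t tbprd, uint32_t divider)
{
    if (divider == 0)
        return 0;
    return EHRPWM_FCK_HZ / ((tbprd + 1u) * divider);
}
```

As a cross-check of the 1.25x effect: a `TBPRD` of 9 is what a 100 MHz assumption yields for a 10 MHz target, and `ehrpwm_actual_hz(9, 1)` shows that the hardware would output 12.5 MHz instead.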
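For the manual RAT configuration, the `CTRL` value can be derived rather than hand-written. The sketch below assumes that the low bits of `CTRL` hold log2 of the region size in bytes, that bit 31 is the enable bit, and that the region base must be aligned to the region size; these are assumptions to verify against the TRM, and the function names are hypothetical.

```c
#include <stdint.h>

#define RAT_CTRL_ENABLE (1u << 31)

/* Returns log2(size_bytes) for a power of two, or -1 otherwise. */
int rat_size_field(uint32_t size_bytes)
{
    int bits = 0;

    if (size_bytes == 0u || (size_bytes & (size_bytes - 1u)) != 0u)
        return -1;
    while ((size_bytes >>= 1) != 0u)
        bits++;
    return bits;
}

/*
 * CTRL value for an enabled region, or 0 when the size is not a power
 * of two or `base` is not aligned to the size.
 */
uint32_t rat_ctrl_value(uint32_t base, uint32_t size_bytes)
{
    int bits = rat_size_field(size_bytes);

    if (bits < 0 || (base & (size_bytes - 1u)) != 0u)
        return 0u;
    return RAT_CTRL_ENABLE | (uint32_t)bits;
}
```

Under this encoding, a 1 MB window at local `0x60000000` corresponds to a size field of 20, and a size field of 19 would describe a 512 KB window.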
- The Problem: Appending multiple device tree overlays to the `fdtoverlays` line in `extlinux.conf` over time can cause pinmux and remoteproc conflicts, leading to boot failures where the board responds to ping but SSH and USB (keyboard) are disabled because the system fails during driver probing.
- The Solution: Ensure your overlay enablement scripts clean up any previously registered overlays of the same tutorial/category before adding the new one.
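One way to implement that cleanup is to rebuild the value of the `fdtoverlays` line instead of appending to it. The sketch below assumes a naming convention that is not part of the original setup: overlays of one tutorial/category share a file-name prefix (for example `pru_`), and the new overlay uses the same prefix. The function names are hypothetical, and reading and writing `extlinux.conf` itself is left to the caller.

```c
#include <string.h>

/* Append `len` bytes of `tok` to `out`, space-separated. 0 or -1. */
static int append_token(char *out, size_t out_size, const char *tok, size_t len)
{
    size_t used = strlen(out);
    size_t need = len + (used > 0 ? 1 : 0);

    if (used + need + 1 > out_size)
        return -1;
    if (used > 0)
        out[used++] = ' ';
    memcpy(out + used, tok, len);
    out[used + len] = '\0';
    return 0;
}

/*
 * Build a new fdtoverlays value from `current`: entries whose file name
 * starts with `prefix` are dropped, all others are kept in order, and
 * `new_overlay` is appended. Returns 0 on success, -1 if `out` is too
 * small.
 */
int rebuild_fdtoverlays(const char *current, const char *prefix,
                        const char *new_overlay, char *out, size_t out_size)
{
    const char *p = current;
    size_t plen = strlen(prefix);

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (;;) {
        const char *base;
        const char *q;
        size_t len;

        p += strspn(p, " \t");
        len = strcspn(p, " \t");
        if (len == 0)
            break;

        /* The file name starts after the last '/' of the entry. */
        base = p;
        for (q = p; q < p + len; q++)
            if (*q == '/')
                base = q + 1;

        /* Keep entries from other categories only. */
        if ((size_t)(p + len - base) < plen || strncmp(base, prefix, plen) != 0) {
            if (append_token(out, out_size, p, len) != 0)
                return -1;
        }
        p += len;
    }
    return append_token(out, out_size, new_overlay, strlen(new_overlay));
}
```

With `current` set to `/overlays/pru_blink.dtbo /overlays/uart.dtbo`, prefix `pru_` and new overlay `/overlays/pru_pwm.dtbo`, the rebuilt value keeps the UART overlay and carries exactly one overlay from the `pru_` category.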