fernicmarotta
orbtop-rtos Technical Documentation

Real-time RTOS Thread Profiling via ITM/DWT for ARM Cortex-M

Overview

orbtop-rtos is a real-time thread profiling tool that monitors RTOS thread execution on ARM Cortex-M targets using ITM (Instrumentation Trace Macrocell) and DWT (Data Watchpoint and Trace) hardware debugging features. It provides live CPU usage statistics without modifying the target firmware.

Architecture

graph TD
    subgraph "Target MCU (STM32H7)"
        RTOS[RTOS Kernel<br/>RTX5]
        TCB[osRtxInfo.thread.run.curr]
        DWT[DWT Comparator 1]
        ITM[ITM]
        TPIU[TPIU/SWO]
        
        RTOS -->|writes| TCB
        TCB -->|monitored by| DWT
        DWT -->|HW event| ITM
        ITM -->|ITM packets| TPIU
    end
    
    subgraph "Debug Probe"
        STLINK[ST-Link<br/>or J-Link]
    end
    
    subgraph "Host PC"
        OpenOCD[OpenOCD<br/>telnet:4444<br/>ITM/SWO:46000]
        OrbtopRTOS[orbtop-rtos]
        
        subgraph "Output Formats"
            Console[Console Output]
            JSON[JSON File/UDP]
            FTrace[FTrace Text File]
        end
    end
    
    TPIU -->|SWO pin| STLINK
    STLINK -->|USB| OpenOCD
    OpenOCD -->|"tcp:46000<br/>(ITM stream)"| OrbtopRTOS
    OrbtopRTOS <-->|"telnet:4444<br/>(memory reads)"| OpenOCD
    
    OrbtopRTOS --> Console
    OrbtopRTOS --> JSON
    OrbtopRTOS --> FTrace

How It Works

sequenceDiagram
    participant App as orbtop-rtos
    participant Telnet as OpenOCD Telnet
    participant Kernel as RTOS Kernel
    participant TCB as osRtxInfo.thread.run.curr
    participant DWT as DWT Hardware
    participant ITM as ITM Stream
    
    App->>Telnet: Find osRtxInfo symbol
    Telnet-->>App: Address 0x20001234
    
    App->>Telnet: rtos_dwt_config 0x20001248
    Note over Telnet,DWT: Configure DWT_COMP1 to watch<br/>address 0x20001248<br/>(osRtxInfo.thread.run.curr)
    
    Note over ITM: ITM constantly generates<br/>timestamp packets that<br/>App accumulates
    
    loop Thread Context Switch
        Note over Kernel: Context switch occurs
        Kernel->>TCB: Write new TCB address<br/>to monitored location
        TCB->>DWT: Memory write detected
        DWT->>ITM: Generate HW event (comp match)
        Note over ITM: HW packet includes:<br/>- Comparator number<br/>- Data value (new TCB addr)<br/>Timing comes from separate<br/>timestamp packets
        ITM->>App: HW packet (data trace)
        App->>App: Use accumulated timestamp<br/>from ITM stream
        App->>Telnet: Read TCB at new address
        Telnet-->>App: Thread name, priority, func
        App->>App: Update thread statistics<br/>with timestamp
    end
    
    loop Every interval (1000ms)
        App->>App: Calculate CPU percentages
        App->>Console: Display thread statistics
    end
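
In the sequence above, the osRtxInfo symbol sits at 0x20001234 and the watched location is 0x20001248, an offset of 0x14. A minimal C sketch of that calculation follows. It assumes that the RTX5 osRtxInfo_t structure begins with os_id, version, kernel and tick_irqn before thread.run.curr, which matches the example addresses but should be checked against rtx_os.h for the RTX5 version in use. The struct and function names are hypothetical and are not taken from the orbtop-rtos sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the first members of RTX5's osRtxInfo_t, with
 * target pointers stored as uint32_t so the layout is the same on a
 * 64-bit host. Check it against rtx_os.h for the RTX5 version in use. */
struct rtx_info_head {
    uint32_t os_id;        /* const char *os_id on the target */
    uint32_t version;
    struct {
        uint8_t  state;
        uint8_t  blocked;
        uint8_t  pendSV;
        uint8_t  reserved;
        uint32_t tick;
    } kernel;
    int32_t  tick_irqn;
    struct {
        struct {
            uint32_t curr; /* osRtxInfo.thread.run.curr */
            uint32_t next;
        } run;
    } thread;
};

/* The example addresses in this document imply an offset of 0x14. */
static_assert(offsetof(struct rtx_info_head, thread.run.curr) == 0x14,
              "unexpected offset of thread.run.curr");

/* Address that DWT comparator 1 should watch. */
uint32_t run_curr_address(uint32_t os_rtx_info_addr)
{
    return os_rtx_info_addr
         + (uint32_t)offsetof(struct rtx_info_head, thread.run.curr);
}

/* Format the telnet command that configures the comparator.
 * Returns the length written, or -1 if the buffer is too small. */
int format_dwt_config_cmd(char *buf, size_t len, uint32_t os_rtx_info_addr)
{
    int n = snprintf(buf, len, "rtos_dwt_config 0x%08x\n",
                     (unsigned)run_curr_address(os_rtx_info_addr));
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Storing target pointers as uint32_t keeps the host-side layout independent of the host pointer size, so the compile-time check holds on both 32-bit and 64-bit build machines.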

Key Components

1. ITM Configuration Requirements

The target must be configured with specific ITM settings:

| Register | Setting | Purpose |
|----------|---------|---------|
| ITM_TCR.TSENA | 1 | Enable timestamps for accurate timing |
| ITM_TCR.DWTENA | 1 | Route DWT events through ITM |
| ITM_TCR.SYNCENA | 1 | Generate SYNC packets for stream sync |
| ITM_TER | 0x80000000 | Enable ITM port 31 for HW events |
| DWT_CTRL.CYCCNTENA | 1 | Enable cycle counter for timestamps |
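
As a rough illustration of how the ITM_TCR bits in the table combine, the sketch below builds the register value from its fields. The bit positions are the ARMv7-M ones (CMSIS names them ITM_TCR_ITMENA, TSENA, SYNCENA, DWTENA and TraceBusID). The macro and function names here are hypothetical. With a trace bus ID of 1 the result is 0x0001000F, the value the project's stm32h74x.cfg writes to ITM_TCR.

```c
#include <assert.h>
#include <stdint.h>

/* ITM_TCR bit positions as defined for ARMv7-M. */
#define ITM_TCR_ITMENA      (1u << 0)   /* enable the ITM */
#define ITM_TCR_TSENA       (1u << 1)   /* local timestamps */
#define ITM_TCR_SYNCENA     (1u << 2)   /* synchronization packets */
#define ITM_TCR_DWTENA      (1u << 3)   /* forward DWT packets */
#define ITM_TCR_BUSID_POS   16
#define ITM_TCR_BUSID_MASK  (0x7Fu << ITM_TCR_BUSID_POS)

/* Compose an ITM_TCR value with the ITM, timestamps, SYNC packets and
 * DWT forwarding enabled for the given 7-bit trace bus ID. */
uint32_t itm_tcr_value(uint32_t trace_bus_id)
{
    assert(trace_bus_id <= 0x7Fu);
    return ITM_TCR_ITMENA | ITM_TCR_TSENA | ITM_TCR_SYNCENA | ITM_TCR_DWTENA
         | ((trace_bus_id << ITM_TCR_BUSID_POS) & ITM_TCR_BUSID_MASK);
}
```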

2. DWT Configuration (via OpenOCD Telnet)

The rtos_dwt_config function in stm32h74x.cfg configures DWT Comparator 1:

proc rtos_dwt_config {address} {
    # Enable trace subsystem
    mmw 0xE000EDFC 0x01000000 0  ;# DEMCR.TRCENA = 1
    
    # Unlock ITM and DWT
    mww 0xE0000FB0 0xC5ACCE55     ;# ITM_LAR unlock
    mww 0xE0001FB0 0xC5ACCE55     ;# DWT_LAR unlock
    
    # Configure DWT Comparator 1
    mww 0xE0001030 $address       ;# DWT_COMP1 = watch address
    mww 0xE0001034 0x00000000     ;# DWT_MASK1 = no masking
    mww 0xE0001038 0x00000814     ;# DWT_FUNC1 = data write, 4 bytes
    
    # Enable cycle counter and SYNC
    mww 0xE0001000 0x40000001     ;# DWT_CTRL.CYCCNTENA | SYNCTAP
    
    # Enable DWT events, timestamps, SYNC in ITM
    mmw 0xE0000E80 0x01010001 0   ;# ITM_TCR settings
}

3. RTX5 Thread Control Block Structure

The tool reads RTX5 TCB structures from target memory:

| Offset (bytes) | Field | Description | Usage |
|----------------|-------|-------------|-------|
| 0x00 | id | Object ID (0xF1 = thread) | Validate TCB |
| 0x04 | name | Pointer to thread name | Display name |
| 0x20 | priority | Thread priority (0-56) | Priority display |
| 0x3C | thread_addr | Entry function address | Function name lookup |
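
To make the offsets concrete, here is a minimal sketch of decoding those four fields from a raw little-endian copy of a TCB. The struct tcb_info type and the parse_tcb function are hypothetical names for illustration. The tool itself reads the fields word by word over telnet, so this shows the layout rather than the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets taken from the table above. */
#define TCB_OFF_ID           0x00
#define TCB_OFF_NAME         0x04
#define TCB_OFF_PRIORITY     0x20
#define TCB_OFF_THREAD_ADDR  0x3C
#define TCB_MIN_SIZE         0x40
#define TCB_ID_THREAD        0xF1

/* Hypothetical host-side view of the fields the tool displays. */
struct tcb_info {
    uint32_t name_ptr;     /* target address of the name string */
    int8_t   priority;
    uint32_t thread_addr;  /* entry function address */
};

/* Read a little-endian 32-bit word from a raw memory image. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the image is too short or is not a thread object. */
int parse_tcb(const uint8_t *image, size_t len, struct tcb_info *out)
{
    assert(image != NULL && out != NULL);
    if (len < TCB_MIN_SIZE || image[TCB_OFF_ID] != TCB_ID_THREAD) {
        return -1;
    }
    out->name_ptr    = read_le32(image + TCB_OFF_NAME);
    out->priority    = (int8_t)image[TCB_OFF_PRIORITY];
    out->thread_addr = read_le32(image + TCB_OFF_THREAD_ADDR);
    return 0;
}
```

Checking the id byte first means a stale or corrupted pointer captured from the trace stream is rejected before any other field is interpreted.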

Important Note on Priority Changes:
Thread priority is only read when a new TCB is first detected (cache miss). If your RTOS supports dynamic priority changes at runtime (priority inheritance, priority ceiling, manual changes), these updates will NOT be reflected in the display. The tool shows the priority at the time the thread was first discovered. To see updated priorities, you would need to restart the monitoring session.

4. Memory Reading via Telnet with Caching

The tool uses OpenOCD's telnet interface with an intelligent cache system:

// From telnet_client.c - Cache structure using uthash
struct memCache {
    uint32_t addr;           // Key: memory address
    uint32_t value;          // Cached value
    uint64_t timestamp;      // When cached
    UT_hash_handle hh;       // Hash table handle
};

uint32_t telnet_read_memory_word(uint32_t address) {
    // First check cache
    struct memCache *cached;
    HASH_FIND_INT(_memCache, &address, cached);
    if (cached) {
        return cached->value;  // Cache hit!
    }
    
    // Cache miss - read from target
    char cmd[64];
    snprintf(cmd, sizeof(cmd), "mdw 0x%08x 1\n", address);
    send(_telnetSocket, cmd, strlen(cmd), 0);
    
    // Parse response into `value` and set `found` (elided in this excerpt), then cache it
    if (found) {
        cached = malloc(sizeof(struct memCache));
        cached->addr = address;
        cached->value = value;
        cached->timestamp = genericsTimestampuS();
        HASH_ADD_INT(_memCache, addr, cached);
    }
    return value;
}

// Cache invalidation when a new TCB is detected
void telnet_clear_cache_for_tcb(uint32_t tcb_addr) {
    struct memCache *cached, *tmp;
    // Clear all cached entries for this TCB (256 byte range)
    HASH_ITER(hh, _memCache, cached, tmp) {
        if (cached->addr >= tcb_addr && 
            cached->addr < (tcb_addr + 256)) {
            HASH_DEL(_memCache, cached);
            free(cached);
        }
    }
}

This caching is critical because reading TCB fields (name, priority, function) for each thread switch would otherwise require 3+ telnet round-trips per switch.

Note: Cache is cleared ONLY when a NEW TCB is detected (not on every switch to an existing thread). This means thread properties are read once and cached indefinitely.

Output Formats

When exception tracking is enabled with the -E option, additional exception statistics are displayed in the console:

=== Exception Statistics ===
|-------------------|----------|-------|-------------|-------|------------|------------|------------|------------|
| Exception         |   Count  | MaxD  | TotalTicks  |   %   |  AveTicks  |  minTicks  |  maxTicks  |  maxWall   |
|-------------------|----------|-------|-------------|-------|------------|------------|------------|------------|
| 15 (SysTick)      |     1000 |     1 |    1234567  |  2.5  |       1234 |       1000 |       2000 |       2500 |
| 37 (IRQ 21)       |      500 |     2 |     567890  |  1.2  |       1135 |        900 |       1500 |       1800 |
| 53 (IRQ 37)       |      250 |     1 |     234567  |  0.5  |        938 |        800 |       1200 |       1400 |
|-------------------|----------|-------|-------------|-------|------------|------------|------------|------------|

Console Output (Default)

=== RTOS Thread Statistics (RTX5) ===
|----------------|------------|----------------|------------------|----------|-------|-------|----------|
| Thread Name    | Address    | Function       | Priority         | Time(ms) | CPU%  | Max%  | Switches |
|----------------|------------|----------------|------------------|----------|-------|-------|----------|
| main           | 0x20001234 | main_thread    | osPriorityNormal |      451 | 45.123| 48.567|     1234 |
| sensor_task    | 0x20001456 | sensor_loop    | osPriorityHigh   |      234 | 23.456| 25.890|      567 |
| network        | 0x20001678 | net_handler    | osPriorityNormal |      101 | 10.123| 12.345|      890 |
|----------------|------------|----------------|------------------|----------|-------|-------|----------|
| idle           | 0x20001000 | os_idle        | osPriorityIdle   |      214 | 21.400| 22.100|     2345 |
|----------------|------------|----------------|------------------|----------|-------|-------|----------|
Interval: 1000 ms, CPU Usage: 78.600%,  Max: 82.345%, CPU Freq: 480000000Hz

Features:

  • Dynamic column widths: Adjusts to longest values
  • Thread details: Name, TCB address, entry function, priority name
  • Timing metrics: Time in ms, CPU%, Max CPU%, context switches
  • Idle thread separation: Idle thread shown below separator line
  • CPU calculation: Active CPU% (excluding idle), with max tracking
  • Warnings: Shows [ITM OVERFLOW DETECTED!] or timing warnings

JSON Output Modes

RTOS Threads Output (-j output.json)

{
  "threads": [
    {
      "tcb": "0x20001234",
      "name": "main",
      "func": "main_thread",
      "prio": 24,
      "time_ms": 451,
      "cpu": 45.123,
      "max": 48.567,
      "switches": 1234
    },
    {
      "tcb": "0x20001456",
      "name": "sensor_task",
      "func": "sensor_loop",
      "prio": 40,
      "time_ms": 234,
      "cpu": 23.456,
      "max": 25.890,
      "switches": 567
    },
    {
      "tcb": "0x20001000",
      "name": "idle",
      "func": "os_idle",
      "prio": 1,
      "time_ms": 214,
      "cpu": 21.400,
      "max": 22.100,
      "switches": 2345
    }
  ],
  "interval_ms": 1000,
  "cpu_usage": 78.600,
  "cpu_max": 82.345,
  "cpu_freq": 480000000,
  "overflow": false
}

Exceptions Output (when using -E)

{
  "exceptions": [
    {
      "num": 15,
      "name": "SysTick",
      "count": 1000,
      "maxd": 1,
      "total": 1234567,
      "pct": 2.5,
      "ave": 1234,
      "min": 1000,
      "max": 2000,
      "maxwall": 2500
    },
    {
      "num": 37,
      "name": "IRQ 21",
      "count": 500,
      "maxd": 2,
      "total": 567890,
      "pct": 1.2,
      "ave": 1135,
      "min": 900,
      "max": 1500,
      "maxwall": 1800
    },
    {
      "num": 53,
      "name": "IRQ 37",
      "count": 250,
      "maxd": 1,
      "total": 234567,
      "pct": 0.5,
      "ave": 938,
      "min": 800,
      "max": 1200,
      "maxwall": 1400
    }
  ],
  "timestamp": 1699123456789
}

UDP Streaming (-j udp:46006)

  • Sends JSON packets to localhost:46006
  • No console output when using UDP mode
  • One JSON object per line (newline delimited)
  • Receive with: nc -lu 46006

Example UDP stream received:

$ nc -lu 46006
{"threads":[{"tcb":"0x20001234","name":"main","func":"main_thread","prio":24,"time_ms":451,"cpu":45.123,"max":48.567,"switches":1234},{"tcb":"0x20001456","name":"sensor_task","func":"sensor_loop","prio":40,"time_ms":234,"cpu":23.456,"max":25.890,"switches":567},{"tcb":"0x20001678","name":"network","func":"net_handler","prio":24,"time_ms":101,"cpu":10.123,"max":12.345,"switches":890},{"tcb":"0x20001000","name":"idle","func":"os_idle","prio":1,"time_ms":214,"cpu":21.400,"max":22.100,"switches":2345}],"interval_ms":1000,"cpu_usage":78.600,"cpu_max":82.345,"cpu_freq":480000000,"overflow":false}
{"threads":[{"tcb":"0x20001234","name":"main","func":"main_thread","prio":24,"time_ms":502,"cpu":50.234,"max":50.234,"switches":1245},{"tcb":"0x20001456","name":"sensor_task","func":"sensor_loop","prio":40,"time_ms":198,"cpu":19.823,"max":25.890,"switches":578},{"tcb":"0x20001678","name":"network","func":"net_handler","prio":24,"time_ms":95,"cpu":9.500,"max":12.345,"switches":901},{"tcb":"0x20001000","name":"idle","func":"os_idle","prio":1,"time_ms":205,"cpu":20.500,"max":22.100,"switches":2389}],"interval_ms":1000,"cpu_usage":79.557,"cpu_max":82.345,"cpu_freq":480000000,"overflow":false}

When exceptions are enabled (-E), separate exception packets are also sent:

{"ex":1,"num":15,"name":"SysTick","count":1000,"maxd":1,"total":1234567,"pct":2.5,"ave":1234,"min":1000,"max":2000,"maxwall":2500}
{"ex":1,"num":37,"name":"IRQ 21","count":500,"maxd":2,"total":567890,"pct":1.2,"ave":1135,"min":900,"max":1500,"maxwall":1800}
{"ex":1,"num":53,"name":"IRQ 37","count":250,"maxd":1,"total":234567,"pct":0.5,"ave":938,"min":800,"max":1200,"maxwall":1400}

Each packet arrives as a complete JSON object on a single line, making it easy to parse in real-time.
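
Because every datagram is one JSON object on one line, a consumer can decide how to handle a packet before fully parsing it. The sketch below tells thread packets from exception packets by their leading key. It relies on the key order shown in the examples above, which is an assumption about the output rather than a documented guarantee, and the enum and function names are hypothetical. A production consumer would more likely hand each line to a real JSON parser.

```c
#include <assert.h>
#include <string.h>

enum packet_kind { PACKET_UNKNOWN, PACKET_THREADS, PACKET_EXCEPTION };

/* Classify one newline-delimited JSON packet by its leading key.
 * Prefix matching only works while the key order stays as shown in
 * the example stream. */
enum packet_kind classify_packet(const char *line)
{
    static const char threads_prefix[]   = "{\"threads\":";
    static const char exception_prefix[] = "{\"ex\":";

    assert(line != NULL);
    if (strncmp(line, threads_prefix, sizeof threads_prefix - 1) == 0) {
        return PACKET_THREADS;
    }
    if (strncmp(line, exception_prefix, sizeof exception_prefix - 1) == 0) {
        return PACKET_EXCEPTION;
    }
    return PACKET_UNKNOWN;
}
```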

FTrace Output (--ftrace trace.txt)

Generates Linux kernel ftrace text format for analysis with Eclipse TraceCompass:

# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:1
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
main|main_thread-536875572 [000] ....     0.001234: sched_switch: prev_comm=idle|os_idle prev_pid=536875008 prev_prio=1 prev_state=S ==> next_comm=main|main_thread next_pid=536875572 next_prio=24
sensor|sensor_loop-536876118 [000] ....     0.002456: sched_switch: prev_comm=main|main_thread prev_pid=536875572 prev_prio=24 prev_state=S ==> next_comm=sensor|sensor_loop next_pid=536876118 next_prio=40

Key features:

  • Text format: Plain text ftrace format, not binary
  • Real-time capture: Thread switches logged as they happen via DWT events
  • Thread naming: Format is thread_name|entry_function for easy identification
  • PID mapping: TCB address used as PID for unique identification
  • Timestamp accuracy: Microsecond precision from ITM timestamps
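
The record layout above can be sketched as a single formatting function. This is an illustration of the text format only, with hypothetical type and function names. It assumes the PID is the TCB address printed in decimal and that the timestamp is given in microseconds, and it does not claim to reproduce the tool's actual writer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of one side of a context switch. */
struct switch_side {
    const char *name;      /* thread name */
    const char *func;      /* entry function name */
    uint32_t    tcb_addr;  /* used as the PID */
    int         prio;
};

/* Format one sched_switch record in the layout shown above.
 * Returns the length written, or -1 if the buffer is too small. */
int format_sched_switch(char *buf, size_t len, uint64_t time_us,
                        const struct switch_side *prev,
                        const struct switch_side *next)
{
    assert(buf != NULL && prev != NULL && next != NULL);
    int n = snprintf(buf, len,
        "%s|%s-%u [000] .... %5llu.%06llu: sched_switch: "
        "prev_comm=%s|%s prev_pid=%u prev_prio=%d prev_state=S ==> "
        "next_comm=%s|%s next_pid=%u next_prio=%d\n",
        next->name, next->func, (unsigned)next->tcb_addr,
        (unsigned long long)(time_us / 1000000u),
        (unsigned long long)(time_us % 1000000u),
        prev->name, prev->func, (unsigned)prev->tcb_addr, prev->prio,
        next->name, next->func, (unsigned)next->tcb_addr, next->prio);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Splitting the microsecond count into whole seconds and a six-digit remainder keeps the timestamp in the seconds.microseconds form that ftrace text traces use.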

Visualization with Eclipse TraceCompass

  1. Install TraceCompass: Download from https://www.eclipse.org/tracecompass/
  2. Install ftrace plugin:
  3. Import trace:
    • File → Open Trace
    • Select your trace.txt file
    • Choose "ftrace" as trace type
  4. Analyze:
    • View thread scheduling timeline
    • Identify priority inversions
    • Measure context switch latencies

Usage:

# Capture to ftrace text file
orbtop-rtos ... --ftrace trace.txt

# Open with TraceCompass
# File -> Open Trace -> select trace.txt -> Type: ftrace

The FTrace output shows actual thread context switches in real-time, making it ideal for analyzing scheduling behavior, finding priority inversions, and understanding system timing.

Usage Examples

Prerequisites: OpenOCD Configuration

Use the provided stm32h74x.cfg from the project. Key parts for ITM/DWT configuration:

# From stm32h74x.cfg - ITM configuration in examine-end event
$_CHIPNAME.cpu0 configure -event examine-end {
    # Enable clock for tracing
    # DBGMCU_CR |= TRACECLKEN
    stm32h7x_dbgmcu_mmw 0x004 0x00100000 0

    # Configure ITM with SYNC packets enabled
    mww 0xE0000E80 0x0001000F   ;# TCR: enable ITM with TraceBusID=1, SYNCENA=1, TSENA=1
    mww 0xE0000E00 0x00000001   ;# TER: enable ITM channel 0
}

# DWT registers for RTOS monitoring (defined in file)
set DWT_CTRL    0xE0001000
set DWT_CYCCNT  0xE0001004
set DWT_COMP1   0xE0001030
set DWT_MASK1   0xE0001034
set DWT_FUNC1   0xE0001038

# DWT configuration function for RTOS
proc rtos_dwt_config {address} {
    # Enable trace in DEMCR
    mmw 0xE000EDFC 0x01000000 0  ;# DEMCR.TRCENA = 1
    
    # Unlock ITM and DWT
    mww 0xE0000FB0 0xC5ACCE55     ;# ITM_LAR unlock
    mww 0xE0001FB0 0xC5ACCE55     ;# DWT_LAR unlock
    
    # Reset and enable cycle counter
    mww 0xE0001004 0              ;# DWT_CYCCNT = 0
    mmw 0xE0001000 0x40000001 0   ;# DWT_CTRL.CYCCNTENA | SYNCTAP
    
    # Enable DWT events, timestamps, SYNC in ITM
    mmw 0xE0000E80 0x01010001 0   ;# ITM_TCR settings
    
    # Configure DWT Comparator 1 for data write tracking
    mww 0xE0001030 $address       ;# DWT_COMP1 = watch address
    mww 0xE0001034 0x00000000     ;# DWT_MASK1 = no masking
    mww 0xE0001038 0x00000814     ;# DWT_FUNC1 = data write, 4 bytes
}

# Exception trace functions
proc exception_trace_enable {} {
    echo "DWT: enabling exception trace"
    mmw 0xE000EDFC 0x01000000 0  ;# DEMCR.TRCENA
    mmw 0xE0001000 0x00001000 0  ;# DWT_CTRL.EXCTRCENA
}

Start OpenOCD with the provided cfg file:

openocd -f openocd/stm32h74x.cfg

That is all the setup required. The cfg file automatically:

  • Configures ITM with timestamps and SYNC packets
  • Enables trace clocks
  • Creates SWO object and configures it
  • Outputs ITM stream on TCP port 46000

From the cfg file:

# Line 212-213: ITM auto-configured in examine-end event
mww 0xE0000E80 0x0001000F   ;# TCR: enable ITM with TraceBusID=1, SYNCENA=1, TSENA=1

# Line 335-336: SWO auto-configured and enabled
$_CHIPNAME.swo configure -protocol uart -traceclk 480000000 -pin-freq 2000000 -formatter on -output :46000
$_CHIPNAME.swo enable

With this configuration, OpenOCD is already serving the ITM stream on port 46000.

The data flow is simply:

  • OpenOCD port 46000: ITM stream ready to use
  • OpenOCD port 4444: Telnet for memory reads and DWT config
  • orbtop-rtos: Connects DIRECTLY to OpenOCD port 46000

No manual telnet commands are needed for ITM; everything is configured automatically when OpenOCD starts with the cfg.

In addition, the cfg file defines helper functions that orbtop-rtos calls via telnet:

# These functions are available via telnet for orbtop-rtos to call:
proc rtos_dwt_config {address}    # Called by orbtop-rtos to monitor thread switches!
proc exception_trace_enable {}     # Called when using -E option
proc pc_sampling_config {freq}     # For PC sampling
proc sync_config {rate}           # For SYNC packet control

When orbtop-rtos starts, it:

  1. Connects to OpenOCD telnet (port 4444)
  2. Finds the osRtxInfo symbol address and derives the address of osRtxInfo.thread.run.curr
  3. Calls rtos_dwt_config 0xXXXXXXXX via telnet to configure DWT Comparator 1
  4. The DWT then monitors that address for thread switches!

So the cfg provides both:

  • Automatic ITM/SWO setup when OpenOCD starts
  • Helper functions that orbtop-rtos calls via telnet

Basic RTOS Monitoring

# orbtop-rtos connects DIRECTLY to OpenOCD's ITM stream on port 46000
#   -s  OpenOCD's SWO/ITM output port
#   -p  Protocol: ITM
#   -e  ELF file with symbols
#   -T  RTOS type
#   -W  OpenOCD telnet port
#   -F  CPU frequency (480MHz)
#   -I  Update interval (1 second)
orbtop-rtos \
  -s localhost:46000 \
  -p ITM \
  -e /path/to/firmware.elf \
  -T rtxv5 \
  -W 4444 \
  -F 480000000 \
  -I 1000

Real-world Example

orbtop-rtos \
  -s localhost:46000 \
  -p ITM \
  -e /home/fnicolas/Documentos/GIT/nb_combiner/build/arm-none-eabi/Debug/NUBE_CB_APP_6.0.0.0.elf \
  -T rtxv5 \
  -W 4444 \
  -F 480000000 \
  -I 1000

JSON UDP Output (No Console)

orbtop-rtos \
  -s localhost:46000 \
  -p ITM \
  -e firmware.elf \
  -T rtxv5 \
  -W 4444 \
  -F 480000000 \
  -j udp:46006          # JSON via UDP, console disabled

# Receive JSON in another terminal
nc -lu 46006

With Exception Tracking

orbtop-rtos \
  -s localhost:46000 \
  -p ITM \
  -e firmware.elf \
  -T rtxv5 \
  -W 4444 \
  -F 480000000 \
  -E                    # Enable exception statistics

FTrace Output for TraceCompass

orbtop-rtos \
  -s localhost:46000 \
  -p ITM \
  -e firmware.elf \
  -T rtxv5 \
  -W 4444 \
  -F 480000000 \
  --ftrace trace.txt    # Generate ftrace text output

# Open with Eclipse TraceCompass
# File -> Open Trace -> select trace.txt
# Choose "ftrace" as trace type

Thread Switch Detection and Processing

When a new TCB is detected via DWT:

// From rtos_support.c - Thread switch handling
void rtosHandleDWTMatchWithTimestamp(..., uint32_t value, 
                                     uint64_t itm_timestamp, ...) {
    // value = new TCB address that was written
    
    // 1. Check if this is a new thread
    struct rtosThread *thread;
    HASH_FIND_INT(rtos->threads, &value, thread);
    
    if (!thread) {
        // New thread! Allocate and add to hash
        thread = calloc(1, sizeof(struct rtosThread));
        thread->tcb_addr = value;
        HASH_ADD_INT(rtos->threads, tcb_addr, thread);
        
        // Clear cache for this TCB range
        rtosClearMemoryCacheForTCB(value);
    }
    
    // 2. Read thread info from target (uses cache)
    rtos->ops->read_thread_info(rtos, symbols, thread, value);
    
    // 3. Update timing for previous thread
    if (rtos->current_thread && rtos->current_thread != value) {
        struct rtosThread *prev;
        HASH_FIND_INT(rtos->threads, &rtos->current_thread, prev);
        if (prev) {
            // Calculate how long previous thread ran
            uint64_t delta = itm_timestamp - rtos->last_switch_time;
            prev->accumulated_time_us += delta;
            prev->accumulated_cycles += delta * cpu_freq / 1000000;
        }
    }
    
    // 4. Switch to new thread
    rtos->current_thread = value;
    rtos->last_switch_time = itm_timestamp;
    thread->context_switches++;
    thread->window_switches++;
}

Adding Support for Other RTOS

To add FreeRTOS or other RTOS support, implement the rtosOps interface:

struct rtosOps {
    // Read thread info from TCB
    int (*read_thread_info)(struct rtosState *rtos,
                           struct SymbolSet *symbols,
                           struct rtosThread *thread,
                           uint32_t tcb_addr);
    
    // Detect RTOS from ELF symbols
    bool (*detect)(struct SymbolSet *symbols,
                   struct rtosDetection *result);
    
    // Initialize RTOS (find current thread pointer)
    int (*init)(struct rtosState *rtos,
                struct SymbolSet *symbols);
    
    // Get priority name string
    const char* (*get_priority_name)(int8_t priority);
};

An example FreeRTOS implementation would:

  1. Find pxCurrentTCB symbol instead of osRtxInfo
  2. Map FreeRTOS TCB offsets (different from RTX5)
  3. Configure DWT to watch pxCurrentTCB address
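
The first of these steps, choosing the RTOS from the symbols present in the ELF file, might look roughly like the sketch below. It uses a plain array of symbol names as a simplified, hypothetical stand-in for the tool's struct SymbolSet, so the signature differs from the detect callback in rtosOps. Only the symbol names osRtxInfo and pxCurrentTCB come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum rtos_kind { RTOS_NONE, RTOS_RTX5, RTOS_FREERTOS };

/* Hypothetical, simplified stand-in for the tool's symbol lookup:
 * decide the RTOS from a plain list of ELF symbol names. */
enum rtos_kind detect_rtos(const char *const *symbols, size_t count)
{
    assert(symbols != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(symbols[i], "osRtxInfo") == 0) {
            return RTOS_RTX5;       /* watch osRtxInfo.thread.run.curr */
        }
        if (strcmp(symbols[i], "pxCurrentTCB") == 0) {
            return RTOS_FREERTOS;   /* watch pxCurrentTCB itself */
        }
    }
    return RTOS_NONE;
}
```

For FreeRTOS the watched address is the pxCurrentTCB variable itself, so no structure offset has to be added as it is for osRtxInfo.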

Implementation Flow

flowchart TD
    Start([orbtop-rtos start])
    
    Start --> LoadELF[Load ELF symbols]
    LoadELF --> DetectRTOS{Detect RTOS type}
    
    DetectRTOS -->|RTX5 found| FindSymbol[Find osRtxInfo symbol]
    DetectRTOS -->|Not found| Error[Exit: RTOS not supported]
    
    FindSymbol --> CalcAddr[Calculate thread.run.curr address]
    CalcAddr --> ConnectTelnet[Connect to OpenOCD telnet]
    
    ConnectTelnet --> ConfigDWT[Call rtos_dwt_config via telnet]
    ConfigDWT --> ConnectITM[Connect to ITM stream]
    
    ConnectITM --> MainLoop{Process ITM packets}
    
    MainLoop --> PacketType{Packet type?}
    
    PacketType -->|HW Event| ReadTCB[Read TCB via telnet]
    PacketType -->|Timestamp| UpdateTime[Update timestamp]
    
    ReadTCB --> UpdateStats[Update thread statistics]
    
    UpdateStats --> CheckInterval{Interval complete?}
    UpdateTime --> CheckInterval
    
    CheckInterval -->|No| MainLoop
    CheckInterval -->|Yes| Output[Generate output]
    
    Output --> ResetCounters[Reset interval counters]
    ResetCounters --> MainLoop

Technical Details

DWT Comparator Configuration

The DWT comparator monitors writes to osRtxInfo.thread.run.curr:

  • DWT_COMP1: Set to address of current thread pointer
  • DWT_MASK1: 0x00000000 (no masking, exact match)
  • DWT_FUNC1: 0x00000814
    • Bits 0-3: 0x4 = Generate watchpoint debug event
    • Bit 4: 1 = EMITRANGE
    • Bits 10-11: 0x2 = Data write of size 4 bytes

ITM Timestamp Handling

ITM timestamps are incremental, not absolute. The tool accumulates them to track real time:

// From orbtop_rtos.c - Timestamp handling
struct TSMsg {
    uint32_t timeInc;    // Incremental timestamp from ITM
    enum timeDelay timeStatus;
};

void _handleTS(struct TSMsg *m, struct ITMDecoder *i) {
    // Accumulate incremental timestamps
    _r.timeStamp += m->timeInc;
}

// When DWT event arrives with thread switch
void _handleDataAccessWP(struct wptMsg *m, struct ITMDecoder *i) {
    // Use accumulated timestamp for thread timing
    rtosHandleDWTMatchWithTimestamp(_r.rtos, _r.s, 
                                    m->comp, 0, m->data, 
                                    _r.timeStamp,  // Accumulated!
                                    options.telnetPort);
}

The ITM generates timestamp packets:

  • Local timestamps: Small increments between packets
  • Global timestamps: Periodic full timestamp sync
  • Prescaler affects resolution (typically /4 or /16 of CPU clock)

With proper prescaler settings:

  • Resolution: ~microsecond level
  • Used to calculate accurate thread execution times
  • Window-based statistics reset every interval
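
To put numbers on the resolution claim, the sketch below converts accumulated timestamp ticks into microseconds. It assumes the timestamp counter runs at the CPU clock divided by the prescaler, which is how the list above describes it. The function names are hypothetical and the tool's own conversion may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Convert accumulated ITM timestamp ticks to microseconds.
 * Assumes the timestamp counter runs at cpu_freq_hz / prescaler.
 * Whole seconds and the remainder are converted separately so the
 * multiplication by 1e6 cannot overflow on long captures. */
uint64_t ticks_to_us(uint64_t ticks, uint32_t prescaler, uint32_t cpu_freq_hz)
{
    assert(prescaler != 0 && cpu_freq_hz != 0);
    uint64_t cycles  = ticks * prescaler;
    uint64_t whole_s = cycles / cpu_freq_hz;
    uint64_t rest    = cycles % cpu_freq_hz;
    return whole_s * 1000000u + (rest * 1000000u) / cpu_freq_hz;
}

/* Time represented by a single timestamp tick, in nanoseconds. */
double tick_resolution_ns(uint32_t prescaler, uint32_t cpu_freq_hz)
{
    assert(cpu_freq_hz != 0);
    return (double)prescaler * 1e9 / (double)cpu_freq_hz;
}
```

At 480 MHz with a prescaler of 16, one tick is about 33 ns, comfortably finer than the microsecond figures the tool reports.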

Thread Statistics Calculation

// Per thread, per interval:
cpu_percent = (accumulated_time_us * 100.0) / window_time_us;
accumulated_cycles = (accumulated_time_us * cpu_freq) / 1000000;
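
The same formulas can be written as small C functions and checked against the sample console output, where the idle thread shows 21.400% and the summary line reports 78.600%. Treating the summary figure as 100% minus the idle share is one reading that is consistent with that sample. It is an assumption, not something confirmed from the tool's source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CPU share of one thread within a measurement window, in percent. */
double cpu_percent(uint64_t accumulated_time_us, uint64_t window_time_us)
{
    assert(window_time_us != 0);
    return ((double)accumulated_time_us * 100.0) / (double)window_time_us;
}

/* Active CPU usage for the summary line: everything except idle.
 * Assumption: matches the sample output, not verified in the source. */
double active_cpu_percent(uint64_t idle_time_us, uint64_t window_time_us)
{
    return 100.0 - cpu_percent(idle_time_us, window_time_us);
}

/* Cycles corresponding to the accumulated time at the given CPU frequency. */
uint64_t time_us_to_cycles(uint64_t accumulated_time_us, uint32_t cpu_freq_hz)
{
    return (accumulated_time_us * cpu_freq_hz) / 1000000u;
}
```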

ITM Overflow and Its Impact on CPU Measurements

The Problem

When ITM overflow occurs, packets are LOST, including:

  • HW packets: Thread switch events from DWT
  • Timestamp packets: Relative timing information

Since ITM timestamps are incremental (not absolute), losing packets means:

  1. Lost thread switches: The tool doesn't know a thread ran
  2. Lost time intervals: Can't calculate how long threads executed
  3. Incorrect CPU percentages: Missing data leads to wrong calculations

How It Shows in Output

Interval: 1000 ms, CPU Usage: 78.600%,  Max: 82.345%, CPU Freq: 480000000Hz [ITM OVERFLOW DETECTED!]

Or when total doesn't add up to ~100%:

Interval: 1000 ms, CPU Usage: 45.234%,  Max: 82.345%, CPU Freq: 480000000Hz [WARNING: Low total - possible lost DWT events]

Why This Happens

  • Too much ITM traffic: Exception trace + thread switches + SW packets
  • SWO bandwidth limit: 2MHz pin frequency can't handle all data
  • Buffer overruns: ITM internal buffers overflow

Solutions

  1. Reduce ITM traffic:

    • Disable exception trace if not needed (don't use -E)
    • Increase update interval (-I 2000 for 2 seconds)
    • Disable SW ITM output in firmware
  2. Increase SWO bandwidth (in cfg file):

    $_CHIPNAME.swo configure -protocol uart -traceclk 480000000 -pin-freq 4000000
  3. Monitor overflow counter: Watch the Ovf counter in output

IMPORTANT: When overflow occurs, CPU usage percentages are UNRELIABLE! The tool shows warnings but continues running with incorrect data.
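
A back-of-the-envelope estimate helps judge whether a given pin frequency can keep up. The sketch below assumes SWO in UART mode, where each byte costs 10 bit times (start bit, 8 data bits, stop bit), and takes the number of trace bytes per context switch as a caller-supplied assumption, for example 5 bytes for a 32-bit data trace packet plus a couple of timestamp bytes. It ignores TPIU formatter and SYNC overhead, so real capacity is somewhat lower. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SWO in UART (NRZ) mode frames each byte as start bit + 8 data bits + stop bit. */
#define SWO_UART_BITS_PER_BYTE 10u

/* Raw byte throughput of the SWO pin at a given pin frequency. */
uint32_t swo_bytes_per_second(uint32_t pin_freq_hz)
{
    return pin_freq_hz / SWO_UART_BITS_PER_BYTE;
}

/* Upper bound on thread switches per second that fit in the SWO link.
 * bytes_per_switch is an assumption supplied by the caller. */
uint32_t max_switches_per_second(uint32_t pin_freq_hz, uint32_t bytes_per_switch)
{
    assert(bytes_per_switch != 0);
    return swo_bytes_per_second(pin_freq_hz) / bytes_per_switch;
}
```

With 7 bytes per switch, a 2 MHz pin allows at most about 28,000 switches per second before any exception trace or software ITM output is added; doubling the pin frequency doubles that budget.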

Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| No thread data | DWT not configured | Verify telnet connection and osRtxInfo symbol |
| Wrong thread names | Memory cache stale | Cache is cleared only when a new TCB is first detected; restart the monitoring session to re-read thread properties |
| Missing timestamps | ITM timestamps disabled | Set ITM_TCR.TSENA=1 in target config |
| High CPU usage shown | Interval too short | Increase -I parameter (default 1000ms) |
| ITM OVERFLOW warning | Too much ITM traffic | Reduce traffic or increase SWO bandwidth |
| Low CPU total (<95%) | Lost DWT events/overflow | Check for overflow, reduce ITM load |

Performance Considerations

  • Memory caching: Reduces telnet round-trips
  • Batch telnet commands: Multiple reads in single transaction
  • UDP mode: Eliminates console rendering overhead
  • DWT overhead: Single comparator has minimal target impact
