This repository contains the RVVTS test sets, reports, and categorized failure cases discussed in the paper "From Generation to Failure Categorization: An Open-Source automated RTL Verification Framework for RVV" by Manfred Schlägl, Jonas Reichhardt, and Daniel Große, presented at the ACM Great Lakes Symposium on VLSI (GLSVLSI) 2026.
This work is based on the RVVTS RISC-V Vector Test Framework. RVVTS provides the coverage-guided test generation, automated execution, failure minimization, single-instruction isolation and automated failure categorization used for these results.
RVVTS executed each test case on Spike as the reference and on the verilated PULP Ara as the DUT, compared the resulting machine states (registers, CSRs, trap counts, etc.), minimized each detected deviation to the triggering instruction, and assigned each case to one of the failure categories listed below.
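The comparison step can be pictured with a minimal sketch. It is not the RVVTS implementation: the `MachineState` class, its fields, and `diff_states` are hypothetical names chosen for illustration, and the real machine state contains more components than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class MachineState:
    """Hypothetical snapshot of architectural state after a test case."""
    xregs: dict = field(default_factory=dict)  # integer registers
    vregs: dict = field(default_factory=dict)  # vector registers
    csrs: dict = field(default_factory=dict)   # CSRs such as vtype, vl, fcsr
    trap_count: int = 0                        # number of traps taken


def diff_states(ref: MachineState, dut: MachineState) -> dict:
    """Return, per state component, the entries where REF and DUT disagree."""
    deviations = {}
    for name in ("xregs", "vregs", "csrs"):
        ref_part, dut_part = getattr(ref, name), getattr(dut, name)
        keys = sorted(set(ref_part) | set(dut_part))
        changed = {
            key: (ref_part.get(key), dut_part.get(key))
            for key in keys
            if ref_part.get(key) != dut_part.get(key)
        }
        if changed:
            deviations[name] = changed
    if ref.trap_count != dut.trap_count:
        deviations["trap_count"] = (ref.trap_count, dut.trap_count)
    return deviations


# Illustrative values only: the reference sets vill (bit 63 of vtype on RV64),
# the DUT leaves vtype at zero.
ref = MachineState(vregs={"v1": 0x10}, csrs={"vtype": 1 << 63})
dut = MachineState(vregs={"v1": 0x10}, csrs={"vtype": 0})
result = diff_states(ref, dut)
print(result)
```

An empty result means the two states agree; any non-empty result is a deviation that would be passed on to minimization and categorization.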
- Reference model (REF): Spike / riscv-isa-sim, commit 1553a2a896.
- DUT: PULP Ara, commit a6436df6ad.
- Test framework: RVVTS, commit 2fe4c04d00 (tag RVVTSv2_AFC_GLSVLSI_2026).
- Target: RV64 with RVV 1.0 and VLEN = 4096 bit.
- Test sets: TestSets_RVV_RefSpike_RV64_VLEN4096_v1. VS contains valid non-trapping sequences; IVS contains invalid/trap-triggering sequences for negative testing.
- RVV test case split: 25 code fragments.
The following table reproduces Table 2 from the paper. The VS/IVS entries link to the corresponding generated test sets.
Note: The pre-generated test sets can also be used to test other DUTs with a similar configuration (e.g., VLEN = 4096).
| Test set | Million Instructions | Million RVV Instr. (% w.r.t. all Instr.) | Functional Coverage | Detected Fails | Minimized Cases (% w.r.t. detected) | Isolated Failing Instructions |
|---|---|---|---|---|---|---|
| Valid Sequences (VS) | 2.17 | 0.95 (43.78%) | 31,403 / 33,076 (94.94%) | 48,973 | 46,601 (95.16%) | 584 overall, 13 exclusive |
| Invalid+Valid Sequences (IVS) | 2.04 | 1.00 (49.02%) | 31,950 / 33,076 (96.60%) | 33,630 | 33,391 (99.29%) | 598 overall, 27 exclusive |
| Merged Sequences (MS = VS + IVS) | 4.21 | 1.95 (46.32%) | 31,951 / 33,076 (96.60%) | 82,603 | 79,992 (96.84%) | 611 overall |
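The percentages in the table can be recomputed from the absolute counts. The following sketch does this for functional coverage and for the share of minimized cases, and checks that the merged (MS) fail counts are the sums of the VS and IVS counts. The dictionary layout and the helper name `percent` are illustrative choices, not part of RVVTS.

```python
# Counts copied from the table above.
TOTAL_COVERAGE_POINTS = 33_076
test_sets = {
    "VS": {"covered": 31_403, "detected": 48_973, "minimized": 46_601},
    "IVS": {"covered": 31_950, "detected": 33_630, "minimized": 33_391},
    "MS": {"covered": 31_951, "detected": 82_603, "minimized": 79_992},
}


def percent(part: int, whole: int) -> float:
    """Percentage of part relative to whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


for name, row in test_sets.items():
    coverage = percent(row["covered"], TOTAL_COVERAGE_POINTS)
    minimized = percent(row["minimized"], row["detected"])
    print(f"{name}: coverage {coverage:.2f}%, minimized {minimized:.2f}%")

# Detected fails and minimized cases are additive across VS and IVS.
merged_detected = test_sets["VS"]["detected"] + test_sets["IVS"]["detected"]
merged_minimized = test_sets["VS"]["minimized"] + test_sets["IVS"]["minimized"]
print(merged_detected == test_sets["MS"]["detected"])
print(merged_minimized == test_sets["MS"]["minimized"])
```

Functional coverage is not additive in the same way: the merged value of 31,951 is only one point above the IVS value, presumably because both test sets hit largely the same coverage points.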
The results are too large to host in a GitHub repository.
They can be downloaded from an external source by running the download_extract_results.sh script at the top level of the repository (download: 1.5 GiB compressed; 25 GiB decompressed).
The following table repeats the deviation categories and adds a short description of the architectural symptom captured by the AFC rules. The descriptions summarize the rule intent; each categorized failure is still further grouped by its isolated instruction in the result directories.
| ID | Failure Category | Category Description | VS Detected Fails (% w.r.t. detected VS) | IVS Detected Fails (% w.r.t. detected IVS) | MS Detected Fails (% w.r.t. detected MS) | MS Minimized Cases (% w.r.t. detected MS in Cat.) | MS Isolated Instructions | RVV Instruction Classes in MS (class: #cases / #instr) |
|---|---|---|---|---|---|---|---|---|
| 1 | VREG_ONLY | Only vector register contents differ between reference and DUT. This usually points to wrong vector instruction results or vector register handling. | 34,315 (70.07%) | 3,654 (10.87%) | 37,969 (45.97%) | 37,108 (97.73%) | 448 | integer: 10,405 / 132; mask: 10,290 / 13; fixed-point: 6,324 / 32; floating-point: 4,135 / 89; permutation: 2,478 / 15; reduction: 1,774 / 16; load: 1,702 / 151 |
| 2 | VTYPE_VILL_SET_ERROR | The reference sets the vill bit in vtype, but the DUT does not. The DUT therefore fails to mark an illegal vector configuration as illegal. | 90 (0.18%) | 19,858 (59.05%) | 19,948 (24.15%) | 19,948 (100.00%) | 1 | config: 19,948 / 1 |
| 3 | ARA_HANG | Ara does not complete the test case and the DUT machine state records an invalid last committed PC. This captures deadlocks or execution stalls rather than a normal architectural mismatch. | 8,384 (17.12%) | 1,606 (4.78%) | 9,990 (12.09%) | 8,521 (85.30%) | 291 | permutation: 3,650 / 10; store: 1,404 / 104; load: 1,349 / 134; fixed-point: 1,204 / 6; mask: 514 / 2; integer: 298 / 9; floating-point: 57 / 22; reduction: 45 / 4 |
| 4 | MSTATUS_EXT_DUT | The mstatus.fs/vs extension-state bits differ, with more bits set on the DUT than on the reference. This suggests that the DUT invalidly enables or dirties floating-point or vector extension state. | 6 (0.01%) | 4,303 (12.80%) | 4,309 (5.22%) | 4,297 (99.72%) | 101 | floating-point: 3,277 / 91; reduction: 589 / 6; permutation: 431 / 4 |
| 5 | EXC_INVALID_REJECT | The DUT reports more traps than the reference. The DUT rejected an instruction or configuration that the reference considers valid. | 1,174 (2.40%) | 1,242 (3.69%) | 2,416 (2.92%) | 2,415 (99.96%) | 65 | reduction: 1,043 / 16; mask: 825 / 10; integer: 511 / 27; floating-point: 13 / 8; permutation: 11 / 2; store: 7 / 1; load: 5 / 1 |
| 6 | EXC_INVALID_ACCEPT | The reference reports more traps than the DUT. The DUT accepted an instruction or configuration that the reference considers invalid. | 3 (0.01%) | 2,309 (6.87%) | 2,312 (2.80%) | 2,310 (99.91%) | 413 | floating-point: 750 / 91; load: 367 / 107; integer: 332 / 92; fixed-point: 240 / 32; permutation: 199 / 15; store: 164 / 63; reduction: 148 / 8; mask: 110 / 5 |
| 7 | DMEM_ONLY | Only the dedicated data-memory hash differs. The failure manifests as an unexpected data-memory update, typically from store-side behavior. | 2,159 (4.41%) | 139 (0.41%) | 2,298 (2.78%) | 2,076 (90.34%) | 124 | store: 2,076 / 124 |
| 8 | VCSR_ONLY | Only vector CSR state such as vxrm, vxsat, or vcsr differs. No other architectural state mismatch is present. | 1,442 (2.94%) | 272 (0.81%) | 1,714 (2.07%) | 1,714 (100.00%) | 2 | Zicsr config: 1,714 / 2 |
| 9 | IREG_ONLY | Only integer register contents differ. This can indicate a wrong scalar result or an unintended write to an integer register. | 819 (1.67%) | 147 (0.44%) | 966 (1.17%) | 925 (95.76%) | 3 | permutation: 543 / 1; mask: 382 / 2 |
| 10 | FREG_ONLY | Only floating-point register contents differ. This captures failures whose visible symptom is limited to floating-point register values. | 536 (1.09%) | 26 (0.08%) | 562 (0.68%) | 561 (99.82%) | 1 | permutation: 561 / 1 |
| 11 | VTYPE_INVALID_ACCEPT | vtype differs because the reference marks a non-zero illegal encoding with vill, while the DUT does not. This is an invalid vector-type acceptance symptom. | 4 (0.01%) | 51 (0.15%) | 55 (0.07%) | 55 (100.00%) | 2 | config: 55 / 2 |
| 12 | FCSR_FFLAGS_ONLY | Only fcsr differs and the mismatch is confined to the floating-point exception flags fflags. Other fcsr fields and architectural state match. | 28 (0.06%) | 5 (0.01%) | 33 (0.04%) | 33 (100.00%) | 5 | floating-point: 32 / 4; reduction: 1 / 1 |
| 13 | VREGFCSR_ONLY | Only vector registers and fcsr differ. This combines a vector result mismatch with floating-point status side effects, with no other state deviations. | 10 (0.02%) | 5 (0.01%) | 15 (0.02%) | 15 (100.00%) | 6 | floating-point: 14 / 5; reduction: 1 / 1 |
| 14 | VCSR | A vector CSR mismatch is present together with other architectural deviations. This separates mixed failures from pure VCSR_ONLY cases. | 3 (0.01%) | 6 (0.02%) | 9 (0.01%) | 9 (100.00%) | 7 | fixed-point: 9 / 7 |
| 15 | VSTART_WEXC | vstart differs while matching non-zero exception counts show that a trap occurred. This points to vector restart-state handling around exceptions. | 0 (0.00%) | 5 (0.01%) | 5 (0.01%) | 5 (100.00%) | 5 | load: 5 / 5 |
| 16 | VALREG_ONLY | All deviations are confined to architectural value registers across integer, floating-point, and vector registers. CSRs, memory, and PC-related state match. | 0 (0.00%) | 2 (0.01%) | 2 (0.00%) | 0 (0.00%) | 0 | |
| 17 | PC_DIFF | The program counter or last committed PC differs, but no Ara hang was classified. This captures control-flow or commit-PC mismatches. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 18 | VLENB | The vlenb CSR differs. The DUT and reference therefore disagree on the vector-register byte length reported to software. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 19 | MSTATUS_EXT_REF | The mstatus.fs/vs extension-state bits differ, with more bits set on the reference than on the DUT. This suggests that the DUT failed to enable or dirty expected extension state. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 20 | MSTATUS_DIFF | The mstatus.fs/vs bits differ, but neither side simply has more enabled or dirty extension-state bits. This captures other mstatus.fs/vs pattern mismatches. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 21 | VTYPE_VILL_CLEAR_ERROR | The DUT sets vtype.vill while the reference clears it for an otherwise zero vector type. The DUT marks a legal vector configuration as illegal. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 22 | VTYPE_INVALID_REJECT | vtype differs because the DUT marks a non-zero vector-type encoding as illegal while the reference accepts it. This is the counterpart of invalid vector-type acceptance. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 23 | VTYPE_DIFF | vtype differs in a way not covered by the specific vill set, clear, accept, or reject categories. This captures remaining vector-type CSR mismatches. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 24 | VL_ONLY | Only the vl CSR differs. The DUT and reference agree on all other state but compute or retain a different active vector length. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 25 | VSTART_WOEXC | vstart differs while the matching exception count is zero. This points to an unexpected vstart update or reset during normal, non-trapping execution. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 26 | FCSR_ONLY | Only fcsr differs, but the mismatch is not limited to fflags. This includes rounding-mode or other floating-point CSR differences. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 27 | XMEM_ONLY | Only the dedicated instruction-memory hash differs. The visible symptom is a memory-side effect outside the dedicated data-memory region. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 28 | FREGFCSR_ONLY | Only floating-point registers and fcsr differ. This combines floating-point value deviations with floating-point status side effects. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| 29 | UNKNOWN | Fallback category for deviations that do not match any AFC rule. No Ara result in this data set falls into this category. | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 | |
| | Total | Summary over all listed categories. | 48,973 (100.00%) | 33,630 (100.00%) | 82,603 (100.00%) | 79,992 (96.84%) | 611 | |
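To illustrate how category descriptions of this kind can be turned into rules, the following sketch maps a deviation to a category name. It is an assumption-laden illustration, not the AFC rule set shipped with RVVTS: the function `categorize`, the component labels (`"vreg"`, `"dmem"`, and so on), the subset of categories covered, and the order in which the checks are applied are all hypothetical.

```python
def categorize(diff, ref_traps, dut_traps, dut_hang=False):
    """Map a deviation to a failure category name.

    diff      -- set of state components that differ (hypothetical labels)
    ref_traps -- trap count observed on the reference
    dut_traps -- trap count observed on the DUT
    dut_hang  -- True if the DUT did not complete the test case
    """
    # Assumed ordering: hangs first, then trap-count mismatches,
    # then the "only these components differ" rules.
    if dut_hang:
        return "ARA_HANG"
    if dut_traps > ref_traps:
        return "EXC_INVALID_REJECT"  # DUT trapped where the reference did not
    if ref_traps > dut_traps:
        return "EXC_INVALID_ACCEPT"  # reference trapped where the DUT did not

    only_rules = [
        ({"vreg"}, "VREG_ONLY"),
        ({"dmem"}, "DMEM_ONLY"),
        ({"ireg"}, "IREG_ONLY"),
        ({"freg"}, "FREG_ONLY"),
        ({"vreg", "fcsr"}, "VREGFCSR_ONLY"),
        ({"freg", "fcsr"}, "FREGFCSR_ONLY"),
    ]
    for components, name in only_rules:
        if set(diff) == components:
            return name

    # Fallback for deviations not matched by any rule above.
    return "UNKNOWN"


print(categorize({"vreg"}, ref_traps=0, dut_traps=0))
print(categorize({"vreg", "fcsr"}, ref_traps=0, dut_traps=0))
print(categorize(set(), ref_traps=1, dut_traps=0))
```

Expressing the "only" categories as exact set matches keeps them mutually exclusive, so a deviation that touches additional state falls through to a broader category or to the fallback.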
If you use this material or find it useful, please cite our papers as follows:
- @inproceedings{SRG:2026b, author = {Manfred Schl{\"{a}}gl and Jonas Reichhardt and Daniel Gro{\ss}e}, title = {From Generation to Failure Categorization: An Open-Source automated {RTL} Verification Framework for {RVV}}, booktitle = {ACM Great Lakes Symposium on VLSI (GLSVLSI)}, year = 2026 }
- @inproceedings{SG:2024b, author = {Manfred Schl{\"{a}}gl and Daniel Gro{\ss}e}, booktitle = {International Conference on Computer-Aided Design}, title = {Single Instruction Isolation for {RISC-V} Vector Test Failures}, year = {2024}, }