This repository provides materials (assembly sources and analysis scripts) for the paper ``Efficiently Detecting Masking Flaws in Software Implementations'' (co-authored by Nima Mahdion and Elisabeth Oswald). The power traces that we utilised are archived on Zenodo and can be accessed from Data set (traces) for Software implementations of Multiplication Gadgets: SEAL.
- All_implementations_C_ASM_multiplication_Gadgets
- python_experiments
- References
- Acknowledgement
The assembly implementations target the ARM Cortex-M3.
c = a * b, where a, b and c are each a single share.
The function gfmul is used in all gadgets for computing a[i] * b[i].
gf_mul.S implements multiplication in GF(2^8) (gfmul(a,b,c), c = a * b) for ARM Cortex-M0/3 in GNU assembly, with THUMB-16 instructions. The multiplication is based on Log_Exp lookup tables.
gf_mul.S can be compiled for any ARM Cortex-M0/3.
gf_mul.h contains the table of Log_Exp.
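As a rough illustration of table-based Log_Exp multiplication, the following Python sketch builds the tables and multiplies two field elements. It assumes the AES polynomial 0x11B and the generator 0x03, neither of which is stated in this README, so the table in gf_mul.h may differ. The names build_log_exp_tables and gfmul_reference are hypothetical helpers for this sketch only.

```python
# Assumption: AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)
# and generator 0x03. The table in gf_mul.h may be built differently.
POLY = 0x11B


def build_log_exp_tables():
    """Return (log, exp) tables for GF(2^8) with generator 0x03."""
    log = [0] * 256          # log[0] is a dummy entry, zero has no logarithm
    exp = [0] * 510          # doubled so log[a] + log[b] needs no modulo
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        # x <- x * 0x03 = (x * 0x02) xor x, reduced modulo POLY
        doubled = x << 1
        if doubled & 0x100:
            doubled ^= POLY
        x = doubled ^ x
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return log, exp


LOG, EXP = build_log_exp_tables()


def gfmul(a, b):
    """Multiply two bytes in GF(2^8) using the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gfmul_reference(a, b):
    """Bitwise shift-and-add multiplication, used only as a cross-check."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result
```

Under these assumptions, gfmul(0x57, 0x83) gives 0xC1, the worked example from the AES specification.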
The implementation is tested by calling the gfmul(a,b,c) function in gf_mul.c.
test_GFMUL.py generates random inputs, sends them from the PC to the microcontroller, and receives the output from the microcontroller via the UART port.
When using the SCALE board, run the following from the scale directory:
./run.sh GFMUL gf_mul
Then, to test, run test_GFMUL.py.
Two- and three-share multiplication: a.b = c
- ISW
- BBPP
- DOM_INDEP
- HPC1_OPT
- PINI1
- PINI2
Leakage detection is conducted for the two and three shares implementations, which are also documented in all_gadgets_leakage_detection.pdf.
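For orientation, here is a minimal Python sketch of the ISW multiplication as it is usually described in the literature, with the randomness passed in externally as in these implementations. The field polynomial (0x11B) and the order in which the random bytes are consumed are assumptions of this sketch; the .S files are authoritative for the actual operation order.

```python
from functools import reduce
from operator import xor

POLY = 0x11B  # assumption: AES polynomial, not stated in this README


def gfmul(a, b):
    """Bitwise GF(2^8) multiplication (stand-in for the table-based gfmul)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def isw_multiply(a_shares, b_shares, rnd):
    """Textbook ISW multiplication on Boolean shares.

    rnd must hold n*(n-1)/2 random bytes for n shares; the consumption
    order used here is hypothetical.
    """
    n = len(a_shares)
    if len(b_shares) != n or len(rnd) != n * (n - 1) // 2:
        raise ValueError("inconsistent number of shares or random bytes")
    c = [gfmul(a_shares[i], b_shares[i]) for i in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = rnd[k]
            k += 1
            # The bracketing matters for security: r_ij is added first.
            r_ji = (r_ij ^ gfmul(a_shares[i], b_shares[j])) ^ gfmul(a_shares[j], b_shares[i])
            c[i] ^= r_ij
            c[j] ^= r_ji
    return c


def unmask(shares):
    """Recombine Boolean shares by XOR."""
    return reduce(xor, shares, 0)
```

The XOR of the output shares equals the product of the unmasked inputs, which is a quick functional check before any leakage measurement.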
- Arm assembly (Thumb-16 instructions), tested on an NXP LPC Cortex-M3.
- Byte-oriented.
- Inputs (shares of a, b, rnd) of the gadgets are generated externally and then sent to the device.
- Inputs (shares of a, b, rnd) and outputs (shares of c) of the gadgets are stored in memory as follows. For example, when n_sh=3:
Mem map: * 00 00 00 a0 * 00 00 00 a1 * 00 00 00 a2
In this way, the horizontal neighbor effects are eliminated.
- Altogether 10 instructions have been used:
  mov
  movs
  ldrb R_x, [R_y, R_z]
  ldrb R_x, [R_y, #imm]
  ldr R_x, =table
  adds
  negs
  asrs
  ands
  eors
  strb R_x, [R_y, #imm]
  pop
  push
- No control-flow instructions: no branch instruction.
- Galois field multiplication for one share is based on Double Log-Exp, using a lookup table.
- In our power-consumption measurement setup, we record the instructions before the rising edge of the trigger. Therefore, the trigger instructions (for the SCALE board) are placed in the .S file of these implementations, as follows:
@ Trigger
#########################################################################
# Separating instructions related to main function and trigger, as there are pipeline stages
@ Trigger
@ scale.c: LPC13XX_GPIO1->GPIODATA &= ~( 0x1 << 0 ) ; // initialise SCALE_GPIO_PIN_TRG = 0
@ SCALE_GPIO_PIN_TRG in scale board: pin33: PIO1_0
@ PIO1_0: https://www.digikey.pl/htmldatasheets/production/660585/0/0/1/lpc1311fhn33-551.html : 9.4 Register description
@ baseaddress: 0x50010000, offset: 0x3ffc
@ address: 0x50013ffc; producing this value needs several instructions:
@ https://developer.arm.com/documentation/den0042/a/Unified-Assembly-Language-Instructions/Instruction-set-basics/Constant-and-immediate-values
@ Start of trigger
ldr r4, =0x50013ffc
movs r5,#1
ldr r6, [r4, #0] @ r6 = 0 : SCALE_GPIO_PIN_TRG = 0
# test: str r6, [r3, #0] @ r6 = 0xfc0f0000
eors r5, r6 @ r5 = 1 @ Start trigger: SCALE_GPIO_PIN_TRG = 1
str r5, [r4, #0]
nop
nop
nop
nop
@ End of trigger
str r6, [r4, #0] @ End trigger: r6 = 0 : SCALE_GPIO_PIN_TRG = 0
If a trigger is not required, these instructions can be omitted.
Within the SCALE board framework, it is also possible to add the trigger into a .c file, as demonstrated below:
scale_gpio_wr( SCALE_GPIO_PIN_TRG, true);
Isw_3(shares_a, shares_b, rnd, shares_ab);
scale_gpio_wr( SCALE_GPIO_PIN_TRG, false);
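Regarding the "no branch instruction" property listed earlier: a log/exp multiplication normally needs a special case for zero operands, and one common way to avoid a branch is to mask the table result. The Python sketch below shows that idea. It is a guess that is merely consistent with negs, asrs and ands appearing in the instruction list; it is not taken from the .S files. The polynomial 0x11B, the generator 0x03 and all helper names are assumptions.

```python
POLY = 0x11B  # assumption: AES polynomial


def build_log_exp_tables():
    """Log/exp tables for generator 0x03; log[0] is a dummy 0."""
    log, exp = [0] * 256, [0] * 510
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        doubled = x << 1
        if doubled & 0x100:
            doubled ^= POLY
        x = doubled ^ x
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return log, exp


LOG, EXP = build_log_exp_tables()


def nonzero_mask(x):
    """Return -1 (all ones) if x != 0, else 0, for 0 <= x < 256.

    Mirrors a negate followed by an arithmetic shift right by 31 on a
    32-bit register; Python's >> on negative ints is also arithmetic.
    """
    return (-x) >> 31


def gfmul_branchless(a, b):
    """Table lookup is always performed; the result is zeroed by masking."""
    t = EXP[LOG[a] + LOG[b]]
    return t & nonzero_mask(a) & nonzero_mask(b)


def gfmul_reference(a, b):
    """Bitwise multiplication used only as a cross-check."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result
```

The same sequence of operations is executed for every input, so the instruction flow does not depend on whether an operand is zero.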
If you wish to use the implementations on the SCALE board, please follow the example below. The same procedure applies to all gadgets (ISW, BBPP, DOM_INDEP, HPC1_OPT, PINI1, PINI2).
ISW_2 file: The implementation of the first-order ISW multiplication (two shares) in Arm assembly, with THUMB-16 instructions. The accompanying C source is for testing the isw_2 function.
RUN_gadgets.py: Runs the gadgets by transferring data through UART to/from the SCALE board.
Testing:
Download SCALE.
$ git clone http://www.github.com/danpage/scale.git
$ cd scale ; export SCALE="${PWD}"
$ git submodule update --init --recursive
Copy the ISW_2 file into the scale/hw directory.
Then:
$ cd scale
$ export SCALE="${PWD}"
$ cd hw
$ export SCALE_HW="${PWD}"
$ export TARGET="${SCALE_HW}/target/lpc1313fbd48"
$ cd ${TARGET}
$ make --no-builtin-rules clean all
$ cd ${SCALE_HW}/isw_2
$ sudo make --no-builtin-rules -f ${TARGET}/build/lib/scale.mk BSP="${TARGET}/build" USB="/dev/ttyUSB0" PROJECT="isw_2" PROJECT_SOURCES="isw_2.c isw_2.S" clean all program
Then, on the SCALE board:
- Press and hold the (right-hand) GPI switch,
- Press and hold the (left-hand) reset switch,
- Release the (left-hand) reset switch,
- Transfer via lpc21isp starts,
- Release the (right-hand) GPI switch.
Finally, run RUN_gadgets.py.
Up to 5 shares.
The suffix _b or _B means that a branch instruction is used (bl gfmul in the .S file).
Leakage detection is not conducted.
Multiplication: a.b = c
- ISW: Up to 5 shares
- HPC1_OPT: Up to 4 shares
- DOM_DEP: 3 to 5 shares
- DOM_INDEP: Up to 5 shares
- BBPP_OPT: 3 to 5 shares
- Arm assembly (Thumb-16 instructions), tested on an NXP LPC Cortex-M3.
- Byte-oriented.
- Inputs (shares of a, b, rnd) of the gadgets are generated externally and then sent to the device.
- Inputs (shares of a, b, rnd) and outputs (shares of c) of the gadgets are stored in memory as follows. For example, when n_sh=4:
Mem map: * a3 a2 a1 a0 * ... * b3 b2 b1 b0 * ... * c3 c2 c1 c0
- Altogether, the following instructions have been used:
  bl
  bx
  mov
  movs
  ldrb R_x, [R_y, R_z]
  ldrb R_x, [R_y, #imm]
  ldr R_x, =table
  adds
  negs
  asrs
  ands
  eors
  strb R_x, [R_y, #imm]
  pop
  push
- gfmul: Galois field multiplication for one share is based on Double Log-Exp, using a lookup table. gfmul is defined as a function that computes the multiplication $a_i.b_j$, and it is called by bl gfmul in the .S file.
- A branch instruction is used to call the function gfmul.
If you wish to use the implementations on the SCALE board, please follow the example below. The same procedure applies to all gadgets (ISW, BBPP, DOM_INDEP, HPC1_OPT, PINI1, PINI2).
ISW_2_B file: The implementation of the first-order ISW multiplication (two shares) in Arm assembly, with THUMB-16 instructions. The suffix _b or _B in ISW_2_B indicates that a branch instruction is used to call the function gfmul.
RUN_gadgets.py: Runs the gadgets by transferring data through UART to/from the SCALE board.
Testing:
Download SCALE.
$ git clone http://www.github.com/danpage/scale.git
$ cd scale ; export SCALE="${PWD}"
$ git submodule update --init --recursive
Copy the ISW_2_B file into the scale/hw directory.
Then:
$ cd scale
$ export SCALE="${PWD}"
$ cd hw
$ export SCALE_HW="${PWD}"
$ export TARGET="${SCALE_HW}/target/lpc1313fbd48"
$ cd ${TARGET}
$ make --no-builtin-rules clean all
$ cd ${SCALE_HW}/isw_2_b
$ sudo make --no-builtin-rules -f ${TARGET}/build/lib/scale.mk BSP="${TARGET}/build" USB="/dev/ttyUSB0" PROJECT="isw_2_b" PROJECT_SOURCES="isw_2_b.c isw_2_b.S" clean all program
Then, on the SCALE board:
- Press and hold the (right-hand) GPI switch,
- Press and hold the (left-hand) reset switch,
- Release the (left-hand) reset switch,
- Transfer via lpc21isp starts,
- Release the (right-hand) GPI switch.
Finally, run RUN_gadgets.py.
C implementations of the gadgets.
Each gadget is encapsulated in its own directory, which includes a header file (Gadget_name.h) and two source files (Gadget_name.c, main.c).
The masking order Mask_ORD (number of shares = Mask_ORD+1) can be adjusted directly in the gadget's header file (Gadget_name.h).
For compilation, using the GCC compiler:
gcc main.c Gadget_name.c
For running:
./a.out
For example, from the ISW directory:
gcc main.c ISW.c
a.out will be generated. For running:
./a.out
Python dependencies are in requirements.txt:
pip install -r requirements.txt
Use: pip install pySerial
The most important packages are:
trsfile==0.3.2
numpy==1.19.5
Please use python 3.12.10 and numpy==1.19.5.
The standard .trs file format is used for storing data (inputs, outputs, trace values).
All I/O data of the gadgets are represented in bytes and stored in a .trs file. Within the .trs file, the variable
Each gadget includes:
- $a$, shares of a (mask_a)
- $b$, shares of b (mask_b)
- $rnd$ (rnd_gadget)
- $c$, shares of c (out_len_gadget)
It is important to note that the I/O stored in .trs for T-test/split-T-test includes an extra byte compared to that for SNR/Template attacks/F-test. This extra byte is referred to as the data_set byte:
in_len_trs = data_set (= rnd_or_fix) + a + b + input_of_gadget (= mask_a + mask_b + rnd_gadget)
in_len_trs = d + 1 + 1 + in_len_gadget
Where
out_len_trs = out_len_gadget = mask_order + 1 = number of shares
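The length bookkeeping above can be written out as a small Python helper. This is only a sketch: it assumes that d is 1 when the data_set byte is present (T-test/split-T-test) and 0 otherwise, that a and b take one byte each, and that mask_a and mask_b each occupy one byte per share. The function name trs_io_lengths and the rnd_len parameter are hypothetical; the amount of randomness depends on the gadget.

```python
def trs_io_lengths(n_shares, rnd_len, with_data_set_byte):
    """Return (in_len_trs, out_len_trs) in bytes.

    n_shares           -- number of shares (mask_order + 1)
    rnd_len            -- bytes of gadget randomness (gadget dependent)
    with_data_set_byte -- True for T-test/split-T-test trace sets
    """
    d = 1 if with_data_set_byte else 0             # assumed meaning of d
    in_len_gadget = n_shares + n_shares + rnd_len  # mask_a + mask_b + rnd_gadget
    in_len_trs = d + 1 + 1 + in_len_gadget         # data_set + a + b + gadget inputs
    out_len_trs = n_shares                         # out_len_gadget
    return in_len_trs, out_len_trs
```

For instance, a two-share gadget with one random byte and the data_set byte present gives trs_io_lengths(2, 1, True) == (8, 2).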
Regarding the value of
Consequently, in all scripts, it is important to accurately set the value for the variable
This script contains common Functions and Classes that are used across multiple scripts in the project.
These scripts were written (by Si) specifically for handling .trs files, a standard format for storing datasets (inputs, outputs, trace values).
It is used for running and testing gadgets on the SCALE board by sending the gadget's inputs and receiving outputs through UART to and from the board.
This code extracts data_set
Captures the power consumption of the ARM Cortex-M3 microprocessor on the SCALE board while it executes multiplication gadgets written in Thumb-16 assembly instructions.
The power-consumption traces are recorded by a Pico oscilloscope 5000a, in the Rapid-mode.
These scripts relate to the implementations located in:
- All_implementations_C_ASS_multiplication_Gadgets/ASS_gadgets_2_3_shares_Leakage_Detection
- All_implementations_C_ASS_multiplication_Gadgets/ASS_gadgets_H_HV_16
The power consumption of the instructions before the rising edge of the trigger is recorded. The trigger instructions (for the SCALE board) are placed in the .S file of these implementations, as follows:
@ Trigger
#########################################################################
# Separating instructions related to main function and trigger, as there are pipeline stages
@ Trigger
@ scale.c: LPC13XX_GPIO1->GPIODATA &= ~( 0x1 << 0 ) ; // initialise SCALE_GPIO_PIN_TRG = 0
@ SCALE_GPIO_PIN_TRG in scale board: pin33: PIO1_0
@ PIO1_0: https://www.digikey.pl/htmldatasheets/production/660585/0/0/1/lpc1311fhn33-551.html : 9.4 Register description
@ baseaddress: 0x50010000, offset: 0x3ffc
@ address: 0x50013ffc; producing this value needs several instructions:
@ https://developer.arm.com/documentation/den0042/a/Unified-Assembly-Language-Instructions/Instruction-set-basics/Constant-and-immediate-values
@ Start of trigger
ldr r4, =0x50013ffc
movs r5,#1
ldr r6, [r4, #0] @ r6 = 0 : SCALE_GPIO_PIN_TRG = 0
# test: str r6, [r3, #0] @ r6 = 0xfc0f0000
eors r5, r6 @ r5 = 1 @ Start trigger: SCALE_GPIO_PIN_TRG = 1
str r5, [r4, #0]
nop
nop
nop
nop
@ End of trigger
str r6, [r4, #0] @ End trigger: r6 = 0 : SCALE_GPIO_PIN_TRG = 0
Furthermore, depending on the trigger setting (instructions), acq_gadget.py script allows specifying the start and end points of the recording window.
This script contains functions related to generating gadget inputs (a, b, shares of a, shares of b, and randomness). These functions are used in acq_gadget.py to generate and send inputs to the SCALE board and receive the output via the serial port in the Rapid-mode.
Generating:
- Random inputs used in computing SNR and F-test.
- Fixed inputs used in template attack.
- Collapsed inputs used in F-test.
- Random/fixed inputs used in first-order T-test.
- Random/fixed inputs, 2-shares out of 3-shares used in split T-test.
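As a sketch of the input generation listed above, the following Python functions split a byte into Boolean shares and draw one random-versus-fixed input. The function names, the use of the secrets module and the encoding of the data_set flag (1 = random, 0 = fixed) are assumptions of this example and are not taken from type_of_execution_gadget.py.

```python
import secrets
from functools import reduce
from operator import xor


def boolean_shares(value, n_shares):
    """Split a byte into n_shares bytes whose XOR equals value."""
    shares = [secrets.randbelow(256) for _ in range(n_shares - 1)]
    # The last share is chosen so that all shares XOR to the value.
    last = reduce(xor, shares, value)
    return shares + [last]


def t_test_input(fixed_value, n_shares):
    """Draw one random-vs-fixed input: (data_set, value, shares).

    Assumption: data_set = 1 marks a random input and 0 a fixed one.
    """
    data_set = secrets.randbelow(2)
    value = secrets.randbelow(256) if data_set else fixed_value
    return data_set, value, boolean_shares(value, n_shares)
```

Interleaving fixed and random inputs at random, as done here, is the usual way to keep slow drifts in the measurement setup from biasing a fixed-versus-random T-test.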
This script measures the power consumption and stores all data (inputs, outputs, trace values) in a .trs file. The traces can be generated for use in SNR, T-test, template attacks, split-T-test, and F-test.
In this code, one can set the gadget_name, the number of shares (2, 3), and the number of cycles to be captured. For the generation of gadget inputs, see the section on type_of_execution_gadget.py.
This code is used to plot the power-consumption traces stored in the .trs file.
Computes the SNR based on the gadget_name and extracts the Points of Interest (POI) by setting a threshold.
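A minimal NumPy sketch of this computation is given below, using the common definition of SNR as the variance of the per-class mean traces divided by the mean of the per-class variances. The function names and the array layout (one trace per row, one intermediate value per trace) are assumptions of this example; snr.py may organise the computation differently.

```python
import numpy as np


def snr(traces, labels):
    """Per-sample SNR of a trace set.

    traces -- array of shape (n_traces, n_samples)
    labels -- intermediate value of each trace, shape (n_traces,)
    """
    traces = np.asarray(traces, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Mean and variance of the traces for each intermediate value.
    means = np.array([traces[labels == c].mean(axis=0) for c in classes])
    variances = np.array([traces[labels == c].var(axis=0) for c in classes])
    signal = means.var(axis=0)      # variance of the class means
    noise = variances.mean(axis=0)  # average within-class variance
    return signal / noise


def points_of_interest(snr_values, threshold):
    """Indices of the samples whose SNR exceeds the threshold."""
    return np.flatnonzero(np.asarray(snr_values) > threshold)
```

The classes are weighted equally here, which is reasonable when the intermediate values are roughly uniformly distributed.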
This script computes the intermediate values for SNR and is utilized in TRS_common_func.py, which is subsequently called in snr.py.
This code calculates the SNR based on the gadget_name and the number of shares (2/3), since they have different intermediate values.
It is possible to use different sets of traces (T-test, F-test, ...), but it is important to set data_set
The intermediate values can be chosen by modifying the function Cal_im_value (called in snr.py) in intermediate_values_n.py.
This is similar to snr.py, but it uses multiprocessing with the Python package parmap.
It calculates SNR for traces that are randomly selected from a .trs file.
For more information, please see [1].
The script is used for conducting uni/multi-variate T-tests using the scipy.stats.ttest_ind function.
Input file format of data_info and traces can be .trs or .npy files.
SNR is used in the context of multivariate T-tests in order to reduce the complexity of the computations.
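For reference, a first-order fixed-versus-random test can be sketched in a few lines of NumPy. The code computes Welch's t-statistic per sample, which is the statistic scipy.stats.ttest_ind returns when called with equal_var=False. The function names, the flag encoding (0 = fixed, 1 = random) and the use of NumPy instead of SciPy are assumptions of this example; the threshold of 4.5 is the value commonly used following [1].

```python
import numpy as np


def welch_t(group_0, group_1):
    """Per-sample Welch t-statistic between two sets of traces."""
    group_0 = np.asarray(group_0, dtype=np.float64)
    group_1 = np.asarray(group_1, dtype=np.float64)
    mean_diff = group_0.mean(axis=0) - group_1.mean(axis=0)
    # Unbiased variances (ddof=1), each divided by its group size.
    var_0 = group_0.var(axis=0, ddof=1) / group_0.shape[0]
    var_1 = group_1.var(axis=0, ddof=1) / group_1.shape[0]
    return mean_diff / np.sqrt(var_0 + var_1)


def first_order_t_test(traces, data_set, threshold=4.5):
    """Return (t_values, indices of samples with |t| above the threshold).

    Assumption: data_set == 0 marks fixed inputs, 1 marks random inputs.
    """
    traces = np.asarray(traces, dtype=np.float64)
    data_set = np.asarray(data_set)
    t_values = welch_t(traces[data_set == 0], traces[data_set == 1])
    return t_values, np.flatnonzero(np.abs(t_values) > threshold)
```

Restricting the columns of traces to the POI found via SNR before calling such a function is what keeps the multivariate variants tractable.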
Using one sample (time) point
Variable
This setup can be used to perform statistical t-tests on traces of:
- Two shares gadget
- split-t-test
- Three shares gadget (in first-order evaluation)
Variable
Using mean-free squared one sample point
This setting can be used to perform statistical t-tests on traces of a three-share gadget (in second-order uni-variate assessment) and also the split-t-test.
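The mean-free squared preprocessing can be sketched as follows. The function name is hypothetical, and whether the scripts centre each group (fixed, random) separately or the whole set at once is not stated here; the sketch simply centres whatever set it is given.

```python
import numpy as np


def mean_free_squared(traces):
    """Subtract the per-sample mean and square: (x - mean)^2.

    traces -- array of shape (n_traces, n_samples)
    """
    traces = np.asarray(traces, dtype=np.float64)
    return (traces - traces.mean(axis=0)) ** 2
```

The preprocessed traces are then fed to the same t-test as in the first-order case, so a difference in variance between the two groups shows up as a difference in means.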
Using two sample (time) points:
Central-product combinations of two sample points
It can perform second-order multi-variate t-test on:
- All cycles (samples) in the trace set
- Two cycles: In this case, the POI can be extracted using the class in snr.py. Alternatively, first run snr.py, obtain the POI, and then put the list of POI in the script t_test_SNR.py.
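A centred-product combination of two sample points can be sketched like this. The function name and the argument layout are assumptions of this example and do not reflect the interface of t_test_SNR.py.

```python
import numpy as np


def centered_product(traces, poi_a, poi_b):
    """Combine two sample points per trace: (x_a - mean_a) * (x_b - mean_b).

    traces       -- array of shape (n_traces, n_samples)
    poi_a, poi_b -- indices of the two sample points to combine
    """
    traces = np.asarray(traces, dtype=np.float64)
    centered = traces - traces.mean(axis=0)
    return centered[:, poi_a] * centered[:, poi_b]
```

Combining all pairs of samples grows quadratically with the trace length, which is why limiting the candidates to a short POI list is attractive.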
This script is used for conducting first/second-order uni-variate T-tests using the scipy.stats.ttest_ind function, utilizing the parmap package for multiprocessing, especially when dealing with a large number of traces.
Setting variable
[1]: Leakage Assessment Methodology - A Clear Roadmap for Side-Channel Evaluations.
[2]: A Novel Framework for Explainable Leakage Assessment.
This research was funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 72504, SEAL).