Body Biasing Injection experiments

Published on 05 October 2021

Body Biasing Injection
Building the injector
- Testing the probe
Bench preparation
Characterization
- Test firmware
- First scan
RDP protection bypass
- Bench setup
- Glitch campaign
Analyzing results
- Bootloader analysis
Conclusions

Several months ago, I stumbled upon this paper from Colin O’Flynn. Since I had never heard of this technique, I thought it would be interesting to try it myself and see how it performs.

Body Biasing Injection

Body Biasing Injection was first presented in this paper from P. Maurine et al. The rough idea is to inject a high voltage pulse directly on the backside of the silicon of the target chip in order to generate faults.

The induced voltage bias modifies the threshold voltage of the underlying transistors and can prevent some bits from being correctly propagated through the CPU circuitry.

The main advantage of this technique compared to voltage glitching, for instance, is that it is more localized: it disturbs only a specific region of the target chip.

Several other papers cover this technique:

Yet Another Fault Injection Technique : by Forward Body Biasing Injection by P. Maurine et al.

Body Biasing Injection Attacks in Practice by Noemie Beringuier-Boher et al.

Before we dig deeper, let's review the basics about chips. The die is a silicon substrate where several layers of metal are applied in order to create a circuit containing billions of transistors. The following picture from Wikipedia shows the different layers in a die:

https://upload.wikimedia.org/wikipedia/commons/e/ee/Cmos-chip_structure_in_2000s_%28en%29.svg

When talking about different injection techniques, the terms frontside and backside are often used. Contrary to what the schematic above might suggest:

The frontside is the metal side of the die. It's the one connecting the die pads to the actual chip package pins.

The backside is the silicon side of the die. It is basically the other side of the silicon substrate.

Building the injector

In all of these papers, a high voltage generator was used and coupled with a pulse generator in order to generate the pulses. This kind of equipment is generally quite complicated to use, so Colin O’Flynn took a different approach by using a transformer in order to generate a high voltage spike.

The MOSFET Q1 is used to quickly drop the voltage across the primary of the transformer. This generates a voltage spike on the secondary coil, proportional to the turns ratio between the primary and the secondary of the transformer.

I built my own version of the BBI injector using scrap parts I found around my desk.

The two capacitors have been replaced by a single 100μF capacitor.

The MOSFET I used is an IRF7470.

The transformer is a 6-turn inductor I recycled from a dead motherboard. I wound 60 turns of 0.2mm wire to make the secondary.

The injection probe is a simple spring loaded pogo pin. This will be important later on.

In the end, the injection probe looks like this:

Testing the probe

As the turn ratio on the transformer is 1:10, a 1V input should generate a 10V spike on the secondary. Hooking up the oscilloscope directly on the output could be dangerous with standard probes, as the voltage could kill the input. I used a 100:1 passive probe that is able to withstand up to 1200V in order to be able to take measurements.
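As a quick sanity check of that expectation, here is a minimal sketch of the ideal-transformer estimate. It assumes a lossless transformer whose output scales with the turns ratio only; the function name is mine for illustration, while the 6 and 60 turn counts come from the build described above.

```python
def ideal_secondary_voltage(v_in, n_primary=6, n_secondary=60):
    """Ideal-transformer estimate of the output spike.

    Assumption: the secondary voltage scales with the turns ratio only
    (no losses, no flyback behaviour).
    """
    return v_in * n_secondary / n_primary


for v_in in (1.0, 4.0, 10.0):
    print(f"{v_in:4.1f} V in -> {ideal_secondary_voltage(v_in):.0f} V expected")
```

This is only the first-order expectation used to pick a safe oscilloscope probe, not a model of the real pulse.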

The width of the MOSFET gate pulse changes the shape and height of the pulse, so I tested various gate pulse widths and fixed the pulse width to 1.5μs, which provides the highest voltage spike.

Here is an example output pulse using 10V input voltage and 1500ns gate pulse:

and with 1V input voltage and 1500ns gate pulse:

Interestingly, the assumed turn ratio of 1:10 does not look correct at all (thanks to the high voltage probe, my oscilloscope is still safe). I am not sure what is happening here, but my guess is that the different coil wire diameters play a role. The measured ratio is around 1:50 instead of 1:10.

Update 10/27/2021 As mentioned by @WestonBraun:

You are getting a much higher voltage from your pulse generator because the
transformer is not clamped, it's acting like a flyback transformer
https://en.wikipedia.org/wiki/Flyback_converter

In your waveforms there is a negative pulse closer to what you would expect
before the HV pulse.

Thanks to him for that information.

Bench preparation

Preparing the target

As a test target, I took a STM32F103 Nucleo devkit I had lying around. The chip is a STM32F103RBT6 which is quite common.

As BBI requires physical access to the backside, we need to grind through the plastic case of the chip in order to reach the silicon. However, there's a catch here: The silicon side of the die is located on the back of the chip package. We therefore not only need to grind the back side of the chip, but also dig a hole in the PCB in order to place the injection probe.

https://fdtc.deib.polimi.it/FDTC12/shared/FDTC-2012-keynote-1.pdf

The schematics for the devkit show that the PCB under the chip does not carry any important signals, only some GPIO traces to the headers. It is therefore possible to drill through the PCB and only lose some GPIOs. After that, grinding the chip is a matter of patience and precision. Note that you can grind the silicon a bit without harming the chip.

Once done, the board looks like this (the image shows a probe placed on the silicon as well).

One interesting thing to notice is that this chip's die has a copper pad glued to the silicon. This pad can be removed using a scalpel to avoid breaking the die. You can see that the left part of the silicon looks smooth. This is because I was able to tear off the last part of the copper pad that was glued there.

Once the device has been prepared, we can verify that there is a resistance value between the silicon and the ground. By using a multimeter, I could measure this resistance to be between 90kΩ and 240kΩ depending on the measurement point on the die.

Preparing the XYZ table

In order to move the probe accurately across the die, an XYZ table is required. However, it is possible to use a 3D printer to get the same kind of results. I soldered the pogo pin to a piece of prototype board and an SMA connector, and fixed this small PCB to the 3D printer head using a 3D-printed part.

My friend Azox prepared a Python library that can be used to drive the printer and scan a whole area using a few Python commands:

from glitch3d import printer
from glitch3d import chip

# Prepare the printer and set its zero position
ender = printer(port="/dev/ttyUSB5",baudrate=115200, timeout=1)
ender.load_settings("glitch3d/settings/ender3.ini")
ender.set_pos(0,0,10)
ender.go_home_xyz()

# Prepare the area to scan with 0.2mm steps
target = chip()
target.set_home(166,76.5)
target.set_end(168.7,79.8)
target.steps=0.2

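# Define Z axis positions in order to move the probe
UP = 76
DOWN = 73

# Run the scan. Try to inject a glitch for every position.
# bbi is the glitcher object (not shown here), set up separately.
for position in target.vertical():
    x, y = position
    ender.set_pos(x, y, UP)    # move above the next position
    ender.set_pos(x, y, DOWN)  # lower the probe onto the die
    bbi.run()                  # inject the glitch
    ender.set_pos(x, y, UP)    # lift the probe before moving on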
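To get a feel for how many probe positions such a scan visits, here is a minimal sketch of a column-by-column grid generator. It is a hypothetical stand-in written for illustration, not the actual implementation of target.vertical() in the glitch3d library; the function name and the rounding strategy are my own assumptions. The corner coordinates and the 0.2mm step are the ones used in the script above.

```python
import math


def vertical_scan(home, end, step):
    """Yield (x, y) positions column by column between two corners.

    Hypothetical helper: coordinates are in millimetres and are computed
    from integer step counts to avoid accumulating floating-point error.
    """
    (x0, y0), (x1, y1) = home, end
    # Number of grid points along each axis (small epsilon absorbs
    # floating-point noise in the division).
    nx = math.floor((x1 - x0) / step + 1e-9) + 1
    ny = math.floor((y1 - y0) / step + 1e-9) + 1
    for i in range(nx):
        for j in range(ny):
            yield round(x0 + i * step, 3), round(y0 + j * step, 3)


positions = list(vertical_scan((166, 76.5), (168.7, 79.8), 0.2))
print(len(positions), positions[0], positions[-1])
```

Multiplying the number of positions by the number of attempts per position gives a rough idea of how long a campaign will take.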

Target reset

As the target might be stuck in an unstable state, I used a P-MOSFET between the 5V power supply and the target. This MOSFET is driven by the Hydrabus to power-cycle the target after each attempt, so that every fault is injected from a clean state.

Final setup

Once everything is connected together, this huge mess of wires and probes is ready to be run.

Characterization

Test firmware

In order to detect faults and their effects, I prepared a very simple firmware in assembly that will trigger different debug breakpoints depending on the fault effect:

.thumb

@ Variables

.equ    RAM, (0x20000000)
.equ    RAM_END, (0x20010000)

@ Vector table start
.long               0x20001000  @SP value
.long               _start      @Reset
.long               _nmi        @NMI interrupt
.long               _hardfault  @Hard fault
.long               _memfault   @Memory fault
.long               _busfault   @Bus fault
.long               _usagefault @Usage fault
.long               0x00000000  @Reserved
.long               0x00000000  @Reserved
.long               0x00000000  @Reserved
.long               0x00000000  @Reserved
.long               0x00000000  @SvCall
.long               0x00000000  @Debug
.long               0x00000000  @Reserved
.long               0x00000000  @PendSV
.long               0x00000000  @Systick
@ Vector table end

.thumb_func
_start:
    MOV R2, #0
    LDR R3, =(RAM)
    STR R2, [R3]
    .thumb_func
    _loop:
        ADD R2, R2, #1
        LDR R1, [R3]
        ADD R1, R1, #1
        STR R1, [R3]
        CMP R1, R2
        BEQ _loop
    bkpt 0xb0
    b _start
.global _start

_nmi:
bkpt 0xa0
b _nmi

_hardfault:
bkpt 0xa1
b _hardfault

_memfault:
bkpt 0xa2
b _memfault

_busfault:
bkpt 0xa3
b _busfault

_usagefault:
bkpt 0xa4
b _usagefault

This firmware is a simple infinite loop that updates a register and a variable stored in SRAM synchronously. Both values are compared and if they differ somehow, a software breakpoint is raised.
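To see which disturbances this loop can detect, here is a small Python model of it. This is a sketch under simplifying assumptions: a fault is modelled as a clean skip of a single instruction, which is only one of the possible fault effects, and the function and instruction names are mine.

```python
def run_loop(iterations, skip=None):
    """Model the firmware loop; return (iteration, r1, r2) on exit, else None.

    skip is an optional (iteration, instruction) pair naming the single
    instruction to skip: 'add_r2', 'ldr', 'add_r1', 'str' or 'beq'.
    """
    r1 = r2 = 0
    ram = 0  # the word stored at 0x20000000
    for i in range(iterations):
        skipped = skip[1] if skip and skip[0] == i else None
        if skipped != "add_r2":
            r2 += 1          # ADD R2, R2, #1
        if skipped != "ldr":
            r1 = ram         # LDR R1, [R3]
        if skipped != "add_r1":
            r1 += 1          # ADD R1, R1, #1
        if skipped != "str":
            ram = r1         # STR R1, [R3]
        # CMP R1, R2 / BEQ _loop: leaving the loop reaches bkpt 0xb0
        if skipped == "beq" or r1 != r2:
            return i, r1, r2
    return None


print(run_loop(100))                        # undisturbed run
print(run_loop(100, skip=(5, "add_r2")))    # counters diverge
print(run_loop(100, skip=(5, "beq")))       # loop exit with equal counters
```

In this model, skipping the branch leaves the loop with both counters still equal, while skipping an increment leaves them different, so the register values read back over SWD help tell these cases apart.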

I then used a Hydrabus as a SWD probe connected to the target in order to detect when the CPU stops its execution, retrieve the register status and the breakpoint code in order to know the kind of effect the fault triggered.

These helper functions were used to control and retrieve the status of the target CPU:

import pyHydrabus
s = pyHydrabus.SWD()

def init_swd():
    s.bus_init()
    s.read_dp(0)
    s.write_dp(4, 0x50000000)
    CSW = s.read_ap(0, 0)
    s.write_ap(0,0,CSW|0b10)

def halt_cpu():
    s.write_ap(0, 0x4, 0xE000EDF0) # DHCSR
    s.write_ap(0, 0xc, 0xA05F0003)

def run_cpu():
    s.write_ap(0, 0x4, 0xE000EDF0) # DHCSR
    s.write_ap(0, 0xc, 0xA05F0001)

def reset_cpu():
    s.write_ap(0, 0x4, 0xE000ED0C)
    s.write_ap(0, 0xc, 0x05FA0004)

def read_mem(address):
    s.write_ap(0, 0x4, address)
    return s.read_ap(0, 0xc)

def read_register(regnum):
    s.write_ap(0, 0x4, 0xE000EDF4) # DCRSR
    s.write_ap(0, 0xc, regnum)
    s.write_ap(0, 0x4, 0xE000EDF8) # DCRDR
    return s.read_ap(0, 0xc)

def write_register(regnum, value):
    s.write_ap(0, 0x4, 0xE000EDF8) # DCRDR
    s.write_ap(0, 0xc, value)
    s.write_ap(0, 0x4, 0xE000EDF4) # DCRSR
    s.write_ap(0, 0xc, regnum|1<<16)

def is_running():
    s.write_ap(0, 0x4, 0xE000EDF0)
    DHCSR = s.read_ap(0,0xc)
    return (DHCSR&0x20000) == 0

The most important part is in the run_cpu() function, where the C_DEBUGEN bit is set in the DHCSR register. That way the CPU will correctly halt at a bkpt instruction.
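The magic values written to DHCSR in the helpers above can be broken down as follows. This is a minimal sketch based on the ARMv7-M debug register layout; the constant and function names are mine.

```python
DHCSR_ADDR = 0xE000EDF0
DBGKEY = 0xA05F << 16   # key required in bits [31:16] for writes to take effect
C_DEBUGEN = 1 << 0      # enable halting debug
C_HALT = 1 << 1         # request the core to halt
S_HALT = 1 << 17        # status: the core is halted


def dhcsr_write_value(halt):
    """Build the value to write to DHCSR, always keeping debug enabled."""
    value = DBGKEY | C_DEBUGEN
    if halt:
        value |= C_HALT
    return value


def is_halted(dhcsr):
    """Decode the S_HALT status bit from a DHCSR read."""
    return bool(dhcsr & S_HALT)


print(hex(dhcsr_write_value(halt=True)), hex(dhcsr_write_value(halt=False)))
```

Keeping C_DEBUGEN set while clearing C_HALT is what lets the core resume and still halt on the next bkpt.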

First scan

For each location on the die, two faults were generated per input voltage, and all the results were graphed based on the location and the type of fault. In this first scan, I used 0.2mm steps roughly across the center of the chip, so I missed some regions on the edges of the die.

For instance, with 1V of input, the following faults were generated:

Conversely, with 10V of input, far more faults are generated:

The labels are the following:

Glitch Successful bypass of the infinite loop.

Hard HardFault

crash Device hung. No reply from SWD interface

memory MemFault

other PC value in a non-accessible memory region.
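One possible way to turn a halted target into these labels is to read the 16-bit instruction at the halted PC and decode the breakpoint immediate (a Thumb bkpt is encoded as 0xBE00 plus the 8-bit immediate). The sketch below is an assumption about how such a classifier could look, not the script actually used for the scan; the function name and the labels for the NMI, bus and usage handlers are mine.

```python
BKPT_LABELS = {
    0xB0: "Glitch",   # bkpt 0xb0: the loop was exited
    0xA1: "Hard",     # bkpt 0xa1: HardFault handler
    0xA2: "memory",   # bkpt 0xa2: MemFault handler
    # The labels below do not appear in the plots; they are placeholders.
    0xA0: "nmi",
    0xA3: "bus",
    0xA4: "usage",
}


def classify(swd_alive, halfword_at_pc=None):
    """Map the target state after a pulse to a fault label.

    swd_alive: False when the SWD interface no longer answers.
    halfword_at_pc: 16-bit value read at the halted PC, or None when the
    PC points to a region that cannot be read.
    """
    if not swd_alive:
        return "crash"
    if halfword_at_pc is None:
        return "other"
    if (halfword_at_pc & 0xFF00) == 0xBE00:  # Thumb BKPT #imm8
        return BKPT_LABELS.get(halfword_at_pc & 0xFF, "other")
    return "other"


print(classify(True, 0xBEB0), classify(True, 0xBEA1), classify(False))
```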

I performed the characterization with all voltages from 1 to 10 Volts in steps of 1V, and here are the results:

As we can see here, increasing the input voltage raises the chance that something bad happens. Looking back at the results, the highest successful glitch/voltage ratio is at 4 Volts, so this is what we'll use for the next step.

Interestingly enough, no successful glitches were caused by a difference in the two register values. This either means there was an instruction skip or an instruction corruption.

RDP protection bypass

RDP is the flash readout protection for ST microcontrollers. Once set, it prevents the flash memory from being read through the bootloader or the debug interface. This protection has several levels depending on the MCU family. In this case, there is only one RDP level, which prevents reading the flash but keeps the debug interfaces enabled.

We now know that BBI can induce faults in the microcontroller. To make sure that this fault injection technique can be useful, let's try it on a more realistic scenario and reproduce the (in-)famous RDP bypass glitch, this time using BBI. This attack has been done several times using voltage glitching but, as far as I know, never using BBI.

The bypass vulnerability lies in the fact that the bootloader performs a software check to determine whether the flash content is locked or not, and a glitch can bypass this check:

After locking the flash using the ST utility or through the bootloader, the attack is quite simple:

Send the Read memory command (0x11, 0xEE)

Send the glitch when the RDP check is done

Flash memory is sent back by the bootloader.

Now that we fixed the gate pulse width and the voltage, only two parameters are still to be discovered: the delay after the command and the physical location of the injection on the die.

Bench setup

To prepare the target, I fixed BOOT0 to 1 and BOOT1 to 0 using jumpers so the board always starts in bootloader mode. I then used the following functions to interact with the UART interface:

def read_ack():
    ret = u.read(1)
    if ret == b'\x79':
        return True
    else:
        #print(f"Error {ret.hex()}")
        return False

def target_init():
    u.write(b'\x7f')
    i = 0
    while not read_ack() and i < 3:
        i = i+1

def target_lock():
    u.write(b'\x82\x7d')
    if not read_ack():
        #print("Error sending readout protect command")
        return False
    else:
        return True

def target_program():
    u.write(b'\x31\xce')
    if not read_ack():
        print("Error sending write command")
        return None
    u.write(b'\x08\x00\x00\x00\x08')
    if not read_ack():
        print("Error sending address")
        return None
    u.write(b'\x06SUCCESS\x45')
    if not read_ack():
        print("Error sending data")
        return None

def target_read():
    u.write(b'\x11\xee')
    if not read_ack():
        #print("Error sending read command")
        return None
    u.write(b'\x08\x00\x00\x00\x08')
    if not read_ack():
        #print("Error sending address")
        return None
    u.write(b'\xff\x00')
    if not read_ack():
        #print("Error sending size")
        return None
    return u.read(256)

Using these helper functions, I can program and lock the flash with the following commands:

target_reset()
target_program()
target_lock()
target_read()
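The byte strings hard-coded in the helper functions above follow the framing of the ST serial bootloader: a command byte is followed by its complement, and addresses and data blocks end with an XOR checksum. Here is a minimal sketch that rebuilds them; the function names are mine and are not part of any library.

```python
from functools import reduce


def xor_checksum(data):
    """XOR of all bytes, as used by the bootloader for addresses and data."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


def command(code):
    """Command byte followed by its complement."""
    return bytes([code, code ^ 0xFF])


def address(addr):
    """32-bit big-endian address followed by its XOR checksum."""
    raw = addr.to_bytes(4, "big")
    return raw + bytes([xor_checksum(raw)])


def read_length(count):
    """Number of bytes to read (1 to 256), sent as count-1 and its complement."""
    n = count - 1
    return bytes([n, n ^ 0xFF])


def write_payload(data):
    """Length-1 byte, the data itself, then the XOR checksum of both."""
    body = bytes([len(data) - 1]) + data
    return body + bytes([xor_checksum(body)])


print(command(0x11).hex(), address(0x08000000).hex(), write_payload(b"SUCCESS").hex())
```

Generating the frames this way makes it easier to read from another address or write a different marker without recomputing checksums by hand.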

Glitch campaign

I then used the UART line to synchronize my FPGA with the Read memory command by counting the number of edges of the command bytes (0x11 0xEE). I can then start iterating through delays and die positions:

for position in target.vertical():
    x,y = position
    ender.set_pos(x,y,UP)
    ender.set_pos(x,y,DOWN)
    for delay in range(1200,2000,10):
        bbi.delay = delay
        for _ in range(5):
            target_reset()
            bbi.arm()
            status = target_read()
            if status is not None:
                print(f"fault @ {ender.get_pos()} with {delay} delay")
                print(status)
                oscillo_screenshot(f"{x}-{y}-{delay}.png")
    ender.set_pos(x,y,UP)
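The edge-counting trigger mentioned above can be sanity-checked offline by reconstructing the line levels of the two command bytes. The sketch below assumes the framing documented for the ST serial bootloader (8 data bits sent LSB first, even parity, 1 stop bit, line idle high); the actual FPGA trigger logic is not shown in this post, and the function names are mine.

```python
def uart_frame(byte):
    """Line levels for one byte: start bit, 8 data bits LSB first,
    even parity bit, stop bit."""
    bits = [(byte >> i) & 1 for i in range(8)]
    parity = sum(bits) % 2          # even parity: total number of ones is even
    return [0] + bits + [parity] + [1]


def count_edges(data):
    """Return (rising, falling) edge counts for a byte string."""
    level = 1                       # the line idles high
    rising = falling = 0
    for byte in data:
        for bit in uart_frame(byte):
            if bit != level:
                if bit:
                    rising += 1
                else:
                    falling += 1
                level = bit
    return rising, falling


print(count_edges(b"\x11\xee"))
```

With a different parity setting the counts change, so the number of edges to wait for depends on the exact serial configuration.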

And after some time, I finally got positive results:

During the campaign, I also got several flash mass-erases, which led to lots of false positives and a ruined night of sampling, since the mass erase also clears the readout protection. I found out that the mass-erases happened around 3.5µs after the last rising edge of the UART line. The delay values (1790 for instance) are counted in cycles of the FPGA clock, which runs at 200MHz, so a delay of 1790 is about 8.95µs after the last rising edge of the UART line.
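The conversion between delay steps and time is simple enough to keep as a pair of helpers. This is a minimal sketch assuming one delay step equals one cycle of the 200MHz FPGA clock; the names are mine.

```python
FPGA_CLOCK_HZ = 200_000_000  # 200MHz, i.e. 5ns per delay step


def cycles_to_us(cycles):
    """Convert a delay in FPGA clock cycles to microseconds."""
    return cycles * 1e6 / FPGA_CLOCK_HZ


def us_to_cycles(microseconds):
    """Convert a delay in microseconds to the nearest number of cycles."""
    return round(microseconds * FPGA_CLOCK_HZ / 1e6)


print(cycles_to_us(1790), us_to_cycles(3.5))
```

The same helpers show that the range(1200, 2000, 10) sweep used in the campaign covers delays from 6µs to 9.95µs in 50ns increments.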

Note that the trigger is not perfectly aligned with the CPU execution, as there are several microseconds of jitter between the request and the response. This means that some faults can be missed even if all the parameters are correct. I could however get up to 3 faults at the same physical location out of 5 tries per location.

My script also takes oscilloscope screenshots of successful glitches. Here is one for reference:

Yellow line is the UART, blue is the BBI probe voltage.

Analyzing results

Here is the plot of all successful RDP bypass locations on the die:

But now, what exactly is causing the fault? We are unfortunately not able to access the debug interface when in bootloader mode, so how can we know which component is vulnerable here? Fortunately for me, I was able to use some tools at work to thin and polish the die and take infrared pictures to see the basic blocks within the die. I then overlaid the plot on the resulting picture and got this:

Even if the locations are not fully accurate, we can see that the flash memory (located on the lower right corner) is at fault. This means that we are faulting a flash read or register in order to bypass the RDP restriction. Interestingly enough, no CPU fault triggered the vulnerability (at least not with this input voltage).

Bootloader analysis

To analyze the results further, I dumped the bootloader and looked at how it checks whether RDP is enabled or not. After some time analyzing it, I located the following function:

┌ 14: fcn.1ffff132 ();
│           0x1ffff132      c049           ldr r1, [0x1ffff434]        ; [0x1ffff434:4]=0x40022000
│           0x1ffff134      0020           movs r0, 0
│           0x1ffff136      c969           ldr r1, [r1, 0x1c]
│           0x1ffff138      8907           lsls r1, r1, 0x1e
│       ┌─< 0x1ffff13a      00d5           bpl 0x1ffff13e
│       │   0x1ffff13c      0120           movs r0, 1
└       └─> 0x1ffff13e      7047           bx lr

The base register (0x40022000) is the flash controller base address. The code loads the register at address 0x4002201C and shifts it left by 30 bits, which discards everything but the last two bits and moves bit 1 into the sign position tested by the bpl instruction. Looking at the PM0075 document from ST shows that this is the FLASH_OBR register, and that the last two bits are:

RDPRT (bit 1) - When set, this indicates that the Flash memory is read-protected

OPTERR (bit 0) - When set, this indicates that the loaded option byte and its complement do not match.

All of this seems to make sense. The check is performed on a hardware flash controller register, which could explain why the successful faults happen in the vicinity of the flash memory.
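To make the check easier to reason about, here is a small Python emulation of the disassembled function. It is a sketch that models only the data flow of the listing above; the function name is mine and the FLASH_OBR value is passed in as an argument instead of being read from hardware.

```python
RDPRT = 1 << 1   # read protection active
OPTERR = 1 << 0  # option byte / complement mismatch


def rdp_check(flash_obr):
    """Emulate fcn.1ffff132: return 1 when the bootloader sees RDP as set."""
    r1 = (flash_obr << 30) & 0xFFFFFFFF   # lsls r1, r1, 0x1e
    negative = bool(r1 & 0x80000000)      # N flag = original bit 1
    # bpl skips "movs r0, 1" when the N flag is clear
    return 1 if negative else 0


locked = RDPRT
print(rdp_check(locked), rdp_check(locked ^ RDPRT))
```

In this model, a single corrupted bit in the value read from FLASH_OBR, or a skipped movs r0, 1, is enough to make the function report an unprotected flash. Which of these actually happens on the real chip cannot be told from the results above.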

Conclusions

BBI is an interesting technique for sure. It offers the possibility to have a better spatial localization of the injected faults and provides a new way of injecting faults in a target at a reasonable cost.

However, this also comes with lots of drawbacks. The biggest one is that the chip has to be prepared in order to access the silicon before being able to inject faults. The second drawback is the need for multiple tools and measurement probes in order to make sure that everything works the way it should. Adding the high voltage probes and the XYZ table to the bench setup is not an easy task and takes some time to set up correctly.

Compared to voltage glitching for instance, several other parameters have to be taken care of, such as the physical position of the injection probe and the input voltage. This makes the characterization way more challenging and time consuming.

It is nonetheless a different technique, and as such can be useful in some specific cases. The next step is to test BBI on devices protected by glitch detectors and see if it is possible to bypass this kind of protection.
