These articles are written by Codalogic empowerees as a way of sharing knowledge with the programming community. They do not necessarily reflect the opinions of Codalogic.

Raspberry Pi Pico Assembly Programming (Part 1)

By: Pete, January 2023

(Code for this post can be found on Github at https://github.com/codalogic/Pico-Assembler)

This post describes programming the Raspberry Pi Pico using ARM Thumb Assembler.

This is an educational exercise rather than a sensible way to do large amounts of Pico programming. It does, however, cover some details of the Pico that you might not discover when programming in C via the Pico SDK, and it also shows some aspects of Arm Thumb assembler.

The end goal is to flash a bunch of red-green LEDs as shown in the video below.

The red-green LEDs I purchased (by mistake; I meant to get 3-leg LEDs!) light up red if you feed current through them one way and green if you feed current through them the other way. I therefore connected each LED (along with a series resistor) between two pins on the Pico. Driving one of those pins high and the other low makes the LED light red, and reversing the drive (first pin low, second pin high) makes it light green.

The Pico's RP2040 is a complex chip (see https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf). Getting it up and running involves multiple steps, including releasing peripherals from reset and setting up the system clocks via the phase-locked loops (PLLs).

As this is less interesting than flashing LEDs, I cheated and installed the Pico C SDK to do this initial heavy lifting. (At some point I hope to go back and do all of this initialisation in assembler so that I have assembler-only code.)

To get the Pico SDK installed I followed Gary Explains' instructions shown here and here. My setup differed from Gary's instructions in two ways: I didn't need to build CMake from source, as the version in Ubuntu 20.04 is sufficiently up to date, and instead of configuring the X server using export DISPLAY=127.0.0.1:0 I had to use export DISPLAY=:0.

When creating the new project via ./pico_project.py --gui I (unimaginatively) set the project name to ASM1.

I created an initially empty myasm.s file to contain my assembly code and changed the following line in the generated CMakeLists.txt from:

add_executable(ASM1 ASM1.c )

to:

add_executable(ASM1 ASM1.c myasm.s)

This was sufficient to get the build chain to process my assembly file correctly. (As Gary Explains, the makefile was generated by running cd build; cmake .. and the actual build was invoked by running make -j4 while in the build directory. Rather than dragging and dropping the built ASM1.uf2 onto the Pico using Windows Explorer as shown in Gary's video, I opened a CMD window in the build directory and ran copy ASM1.uf2 e: where e: is the drive letter that the Pico appeared as in Windows. This is easier when doing it repeatedly.)

Back to the code. I edited the generated ASM1.c file to include a prototype for the assembly function I would write (my_main) and to call it from the main() function. The result is as follows:

#include <stdio.h>
#include "pico/stdlib.h"

void my_main(void);

int main()
{
    stdio_init_all();
    puts("Hello, world!");

    my_main();

    return 0;
}

(The stdio_init_all() call and associated puts() are not needed, but I left them in because I thought they might help with debugging. In the end I didn't need them, and they could be removed.)

There is a lot of code that runs before main() is called to set up the Pico. This can be seen after successful compilation in the file ASM1.dis. The relevant section is:

1000021a <platform_entry>:
1000021a:    4919          ldr    r1, [pc, #100]    ; (10000280 <__get_current_exception+0x1a>)
1000021c:    4788          blx    r1
1000021e:    4919          ldr    r1, [pc, #100]    ; (10000284 <__get_current_exception+0x1e>)
10000220:    4788          blx    r1
10000222:    4919          ldr    r1, [pc, #100]    ; (10000288 <__get_current_exception+0x22>)
10000224:    4788          blx    r1
10000226:    be00          bkpt    0x0000
10000228:    e7fd          b.n    10000226 <platform_entry+0xc>
...
10000280:    10001e45     .word    0x10001e45
10000284:    1000035d     .word    0x1000035d
10000288:    10001e01     .word    0x10001e01

In the above, the first blx r1 is a call to runtime_init, the C source of which is available on GitHub here.

The second blx r1 is the call to main(). (The third is to the exit() function.)

(The addresses in the jump table at 10000280, such as 10001e45, are odd because the least significant bit is set to indicate to the processor that the called function is Thumb code. To find the called function in the disassembly file, subtract one, i.e. look for 10001e44.)
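
As a small host-side illustration of this rule, the C sketch below decodes one of the table entries. The helper names is_thumb_entry and thumb_target_address are hypothetical; they are not part of the Pico SDK. (On the Cortex-M0+, which only executes Thumb code, branching with bx or blx to an address whose bit 0 is clear causes a fault.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for decoding an entry from the address table in
 * ASM1.dis, e.g. 0x10001e45. Bit 0 of a BX/BLX target selects the
 * instruction set (1 = Thumb); it is not part of the code address. */
bool is_thumb_entry(uint32_t entry)
{
    return (entry & 1u) != 0;
}

uint32_t thumb_target_address(uint32_t entry)
{
    return entry & ~1u;  /* clear the Thumb bit to get the label address */
}
```

For example, thumb_target_address(0x10001e45) returns 0x10001e44, which is the address to search for in ASM1.dis.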

I mention this to show that there is quite a lot of setup involved, and that is without considering the multi-stage boot from flash. It is complex and needs to be done right, but it is not particularly exciting or especially informative.

If you're happy accepting this shortcut, we can get to the actual assembly code, which starts in the my_main assembly function. Once my_main is entered, no further use is made of the C SDK.

Below is the whole assembly code so you can give it a quick scan. After this I will break it down to explain each part.

.syntax unified
.cpu cortex-m0plus
.thumb

.data

gpio_all_pins_mask: .word 0
gpio_even_pins_mask: .word 0

.text

@ RP2040 Section: 2.19.6.1. IO - User Bank
.equ IO_BANK0_BASE, 0x40014000
.equ IO_BANK0_GPIO_CTRL_BASE, IO_BANK0_BASE + 0x04

.equ FUNCSEL_SIO, 5
.equ GPIO_FUNCSEL, FUNCSEL_SIO
.equ GPIO_INIT_VALUE, (GPIO_FUNCSEL<<0)

@ RP2040 Section: 2.19.6.3. Pad Control - User Bank
.equ PADS_BANK0_BASE, 0x4001c000 
.equ PADS_BANK0_GPIO_OFFSET, 0x04 
.equ PADS_BANK0_GPIO_BASE, PADS_BANK0_BASE + PADS_BANK0_GPIO_OFFSET
@ Bit 7 - OD Output disable. Has priority over output enable from peripherals RW 0x0
@ Bit 6 - IE Input enable RW 0x1
@ Bits 5:4 - DRIVE Drive strength.
@           0x0 → 2mA
@           0x1 → 4mA
@           0x2 → 8mA
@           0x3 → 12mA
@           RW 0x1
@ Bit 3 - PUE Pull up enable RW 0x0
@ Bit 2 - PDE Pull down enable RW 0x1
@ Bit 1 - SCHMITT Enable schmitt trigger RW 0x1
@ Bit 0 - SLEWFAST Slew rate control. 1 = Fast, 0 = Slow RW 0x0
@          bit nums:   76543210
.equ PAD_INIT_VALUE, 0b00010110

@ RP2040 Section: 2.3.1. SIO
@ 2.3.1.7. List of Registers
.equ SIO_BASE, 0xd0000000
.equ GPIO_OE,     SIO_BASE + 0x020 @ GPIO output enable
.equ GPIO_OE_SET, SIO_BASE + 0x024 @ GPIO output enable set
.equ GPIO_OE_CLR, SIO_BASE + 0x028 @ GPIO output enable clear
.equ GPIO_OE_XOR, SIO_BASE + 0x02c @ GPIO output enable XOR
.equ GPIO_OUT,     SIO_BASE + 0x010 @ GPIO output value
.equ GPIO_OUT_SET, SIO_BASE + 0x014 @ GPIO output value set
.equ GPIO_OUT_CLR, SIO_BASE + 0x018 @ GPIO output value clear
.equ GPIO_OUT_XOR, SIO_BASE + 0x01c @ GPIO output value XOR

@ RP2040 Section: 4.6. Timer
@ Use TIMERAWL raw timer because we don't need the latching behaviour of TIMELR
.equ TIMER_BASE, 0x40054000
.equ TIMER_TIMERAWL, TIMER_BASE + 0x28

LEDS: .byte 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 26
LEDS_LEN = . - LEDS

.global my_main
.thumb_func
my_main:
    push    {r0-r7, lr}     @ For now we don't ever plan to return to the C world, but play it safe anyway and protect the C environment
    bl      init_all_gpios
    bl      compute_led_masks
    bl      flash_leds
    pop     {r0-r7, pc}

.thumb_func
init_all_gpios:
    push    {r0-r2, LR}
    @ Usage:
    @ r0 - reserved for calling init_single_gpio
    @ r1 - base address of LED numbers array
    @ r2 - initially offset of last value in LED numbers array, then counted down
    ldr     r1, =LEDS
    movs    r2, #LEDS_LEN-1
_init_all_gpios_loop:
    ldrb    r0, [r1, r2]
    bl      init_single_gpio
    subs    r2, r2, #1
    bpl     _init_all_gpios_loop
    pop     {r0-r2, PC}

.thumb_func
init_single_gpio:
    @ Input:
    @ r0 - LED number
    push    {r1-r3, LR}
    @ Usage:
    @ r1 - base addresses
    @ r2 - computed address offsets
    @ r3 - (computed) values to set in register

    @ init IO_BANK0 register
    ldr     r1, =IO_BANK0_GPIO_CTRL_BASE
    ldr     r3, =GPIO_INIT_VALUE    @ Load early to reduce stall cycles (Probably makes no difference on Cortex M0+!)
    lsls    r2, r0, #3      @ Multiply LED number by 8 because IO_BANK0 config is pairs of 32 bit, 4 byte registers
    str     r3, [r1, r2]

    @ init PAD
    ldr     r1, =PADS_BANK0_GPIO_BASE
    ldr     r3, =PAD_INIT_VALUE    @ Load early to reduce stall cycles (Probably makes no difference on Cortex M0+!)
    lsls    r2, r0, #2      @ Multiply LED number by 4 as 4 bytes per register
    str     r3, [r1, r2]

    @ init SIO (Set low, output enable)
    movs    r3, #1
    lsls    r3, r3, r0
    ldr     r1, =GPIO_OUT_CLR
    str     r3, [r1]
    ldr     r1, =GPIO_OE_SET
    str     r3, [r1]

    pop     {r1-r3, PC}

.thumb_func
compute_led_masks:
    push    {r0-r6, LR}
    @ Usage:
    @ r0 - base address of LED numbers array
    @ r1 - initially offset of last value in LED numbers array, then counted down
    @ r2 - computed gpio_all_pins_mask
    @ r3 - computed gpio_even_pins_mask
    @ r4 - value read from array
    @ r5 - update value
    @ r6 - immediate value #1!
    ldr     r0, =LEDS
    movs    r1, #LEDS_LEN-1
    movs    r2, #0
    movs    r3, #0
    movs    r6, #1
_compute_led_masks_loop:
    ldrb    r4, [r0, r1]                @ Load LED number
    movs    r5, #1                      @ Compute bit to update
    lsls    r5, r4                      @ ...
    orrs    r2, r2, r5                  @ Insert bit into gpio_all_pins_mask
    tst     r1, r6                      @ Test if even/odd
    bne     _compute_led_masks_skip_odd
    orrs    r3, r3, r5                  @ Conditionally insert bit into gpio_even_pins_mask
_compute_led_masks_skip_odd:
    subs    r1, r1, #1                  @ Move on to next LED number
    cmp     r1, #0                      @ Loop if more LEDs to do
    bpl     _compute_led_masks_loop     @ ...
    ldr     r5, =gpio_all_pins_mask     @ Save computed gpio_all_pins_mask value
    str     r2, [r5]                    @ ...
    ldr     r5, =gpio_even_pins_mask    @ Save computed gpio_even_pins_mask value
    str     r3, [r5]                    @ ...
    pop     {r0-r6, PC}

.thumb_func
flash_leds:
    push    {r0-r2, LR}
    ldr     r0, =gpio_even_pins_mask    @ Set even pins high
    ldr     r0, [r0]                    @ ...
    ldr     r1, =GPIO_OUT_SET
    str     r0, [r1]
    ldr     r0, =gpio_all_pins_mask     @ Load all pins mask
    ldr     r0, [r0]                    @ ...
    ldr     r1, =GPIO_OUT_XOR           @ Load GPIO XOR register address
_flash_leds_loop:
    bl      wait_1_second
    str     r0, [r1]                    @ Invert output pins
    b       _flash_leds_loop
    @ Never returns

.thumb_func
wait_1_second:
    push    {r0-r3, LR}
    @ Usage:
    @ r0 - 1 million
    @ r1 - TIMER_TIMERAWL address
    @ r2 - start time
    @ r3 - current time and delta time
    ldr     r0, =#1000000
    ldr     r1, =TIMER_TIMERAWL
    ldr     r2, [r1]
_wait_1_second_loop:
    ldr     r3, [r1]
    subs    r3, r3, r2  @ r3 = delta time
    cmp     r3, r0      @ compare to 1 million
    blt     _wait_1_second_loop
    pop     {r0-r3, PC}

Now let's explain each part...

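The following preamble is important because it tells the assembler which variant of Arm code to generate: .syntax unified selects Arm's unified assembler syntax, .cpu cortex-m0plus limits the assembler to instructions the Cortex-M0+ supports, and .thumb selects Thumb encoding. The Pico's Cortex-M0+ cores implement the Armv6-M architecture, one of the simplest instruction sets that Arm currently offers.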

.syntax unified
.cpu cortex-m0plus
.thumb

The Pico GPIO pins are all controlled by bits in various memory-mapped peripheral registers. Each register has a different function, but within a given register each GPIO pin is represented by its own bit. For example, there is a register that indicates whether GPIO pins are outputs: each GPIO pin has a bit in that register indicating whether the corresponding pin is an output or not. Another register sets whether each GPIO pin should be driven high or low.

The plan is to create two masks. The first mask has a 1 bit for each pin that we wish to drive. Each LED needs a pair of pins to drive it, so the second mask has 1s for only one pin of each pair, allowing the two pins in a pair to be set to opposite values.

The masks are computed by the compute_led_masks function, described later, and it is convenient to store their values in global variables. The following allocates storage for the first mask in gpio_all_pins_mask and the second mask in gpio_even_pins_mask. Naturally, this is in the data section.

.data

gpio_all_pins_mask: .word 0
gpio_even_pins_mask: .word 0

The key peripheral registers are memory mapped and need specific values to set them up correctly. Much of this information can be found in the RP2040 datasheet or by reverse engineering the Pico C SDK. The relevant addresses and values are defined by the code below, and the comments indicate which section of the RP2040 datasheet contains the relevant information.

.text

@ RP2040 Section: 2.19.6.1. IO - User Bank
.equ IO_BANK0_BASE, 0x40014000
.equ IO_BANK0_GPIO_CTRL_BASE, IO_BANK0_BASE + 0x04

.equ FUNCSEL_SIO, 5
.equ GPIO_FUNCSEL, FUNCSEL_SIO
.equ GPIO_INIT_VALUE, (GPIO_FUNCSEL<<0)

@ RP2040 Section: 2.19.6.3. Pad Control - User Bank
.equ PADS_BANK0_BASE, 0x4001c000 
.equ PADS_BANK0_GPIO_OFFSET, 0x04 
.equ PADS_BANK0_GPIO_BASE, PADS_BANK0_BASE + PADS_BANK0_GPIO_OFFSET
@ Bit 7 - OD Output disable. Has priority over output enable from peripherals RW 0x0
@ Bit 6 - IE Input enable RW 0x1
@ Bits 5:4 - DRIVE Drive strength.
@           0x0 → 2mA
@           0x1 → 4mA
@           0x2 → 8mA
@           0x3 → 12mA
@           RW 0x1
@ Bit 3 - PUE Pull up enable RW 0x0
@ Bit 2 - PDE Pull down enable RW 0x1
@ Bit 1 - SCHMITT Enable schmitt trigger RW 0x1
@ Bit 0 - SLEWFAST Slew rate control. 1 = Fast, 0 = Slow RW 0x0
@          bit nums:   76543210
.equ PAD_INIT_VALUE, 0b00010110

@ RP2040 Section: 2.3.1. SIO
@ 2.3.1.7. List of Registers
.equ SIO_BASE, 0xd0000000
.equ GPIO_OE,     SIO_BASE + 0x020 @ GPIO output enable
.equ GPIO_OE_SET, SIO_BASE + 0x024 @ GPIO output enable set
.equ GPIO_OE_CLR, SIO_BASE + 0x028 @ GPIO output enable clear
.equ GPIO_OE_XOR, SIO_BASE + 0x02c @ GPIO output enable XOR
.equ GPIO_OUT,     SIO_BASE + 0x010 @ GPIO output value
.equ GPIO_OUT_SET, SIO_BASE + 0x014 @ GPIO output value set
.equ GPIO_OUT_CLR, SIO_BASE + 0x018 @ GPIO output value clear
.equ GPIO_OUT_XOR, SIO_BASE + 0x01c @ GPIO output value XOR

@ RP2040 Section: 4.6. Timer
@ Use TIMERAWL raw timer because we don't need the latching behaviour of TIMELR
.equ TIMER_BASE, 0x40054000
.equ TIMER_TIMERAWL, TIMER_BASE + 0x28

The masks are created from an array of pin numbers, shown below. Each number is a Pico GPIO pin number, and an LED is connected across each consecutive pair of entries (8 and 9, 10 and 11, and so on, up to 22 and 26).

LEDS: .byte 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 26
LEDS_LEN = . - LEDS

The main entry point for the assembly code is my_main as shown below:

.global my_main
.thumb_func
my_main:
    push    {r0-r7, lr}     @ For now we don't ever plan to return to the C world, but play it safe anyway and protect the C environment
    bl      init_all_gpios
    bl      compute_led_masks
    bl      flash_leds
    pop     {r0-r7, pc}

The .global my_main directive is required so the my_main label is exposed to the linker and it can be called from the C main() function.

The push {r0-r7, lr} instruction pushes the eight low registers r0 to r7 and the LR link register onto the stack in one go, and the corresponding pop {r0-r7, pc} pulls them all off again, loading the saved LR value into the PC program counter, which effectively performs a function return. For an assembly programmer this is a very nice and handy instruction. For a compiler it adds little value. Hence you find this kind of instruction in Thumb and ARM32, but it was dropped from the ARM64 instruction set.

The body of the my_main function calls init_all_gpios to initialise the GPIO pins and compute_led_masks to compute the masks mentioned earlier. The actual flashing of the LEDs is done in flash_leds.

init_all_gpios initialises all the relevant GPIO pins specified by the LEDS array mentioned earlier.

The base address of the LEDS array is placed in r1 and the offset of the last member of the array is placed in r2. A pin number is loaded into r0 from the LEDS array using the ldrb r0, [r1, r2] instruction. init_single_gpio is called to initialise that pin, and then r2 (the offset into the LEDS array) is decremented by subs r2, r2, #1, which also updates the condition flags. If r2 is still non-negative (greater than or equal to zero), the bpl _init_all_gpios_loop instruction loops back to initialise the next pin; once r2 goes from 0 to -1 the N flag is set and the loop ends.

.thumb_func
init_all_gpios:
    push    {r0-r2, LR}
    @ Usage:
    @ r0 - reserved for calling init_single_gpio
    @ r1 - base address of LED numbers array
    @ r2 - initially offset of last value in LED numbers array, then counted down
    ldr     r1, =LEDS
    movs    r2, #LEDS_LEN-1
_init_all_gpios_loop:
    ldrb    r0, [r1, r2]
    bl      init_single_gpio
    subs    r2, r2, #1
    bpl     _init_all_gpios_loop
    pop     {r0-r2, PC}
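
The countdown loop can be mirrored in C to show why bpl is the right exit test. This is a sketch only: record_init_order is a hypothetical stand-in that records the pins in the order init_all_gpios visits them rather than touching any hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Pin numbers copied from the LEDS table in myasm.s. */
static const uint8_t LEDS[] = {8, 9, 10, 11, 12, 13, 14, 15,
                               16, 17, 18, 19, 20, 21, 22, 26};
#define LEDS_LEN (sizeof LEDS / sizeof LEDS[0])

/* Hypothetical stand-in for init_all_gpios: instead of calling
 * init_single_gpio it records each pin number in order_out (which must
 * have room for LEDS_LEN entries), so the visiting order can be checked. */
void record_init_order(uint8_t *order_out)
{
    size_t n = 0;
    /* Start at the last offset and count down. The counter is signed so
     * that the loop ends when it goes from 0 to -1, just as "subs; bpl"
     * stops when the N flag becomes set. An unsigned counter would wrap
     * to a huge value and "offset >= 0" would never be false. */
    for (int32_t offset = (int32_t)LEDS_LEN - 1; offset >= 0; --offset) {
        order_out[n++] = LEDS[offset];
    }
}
```

With this table the pins are visited from 26 down to 8, one call per entry.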

The RP2040 is a complex beast where each pin is multi-functional. To set a pin to be a GPIO pin a number of steps are required.

First, the function associated with a pin must be selected. Instead of being a GPIO pin, a pin could be part of a UART, SPI, I2C, PWM, PIO or USB interface, among others. The registers starting at the IO_BANK0_BASE address configure this (among other things). Each pin has a consecutive pair of 32-bit registers used to configure it, so in ascending address order we have GPIO0_STATUS, GPIO0_CTRL, GPIO1_STATUS, GPIO1_CTRL, GPIO2_STATUS, GPIO2_CTRL and so on. It is the GPIOx_CTRL registers that are needed to configure a pin as GPIO, by writing FUNCSEL_SIO into the function select field. As each pin occupies 8 bytes, lsls r2, r0, #3 multiplies the pin number by 8 to give the offset from IO_BANK0_GPIO_CTRL_BASE (the address of GPIO0_CTRL) to the relevant register.
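
The register address arithmetic can be checked on a PC with a short C sketch. The constants mirror the .equ definitions in myasm.s; the function name gpio_ctrl_address is hypothetical.

```c
#include <stdint.h>

/* Values mirrored from the .equ definitions in myasm.s. */
#define IO_BANK0_BASE           0x40014000u
#define IO_BANK0_GPIO_CTRL_BASE (IO_BANK0_BASE + 0x04u)  /* GPIO0_CTRL */

/* Address of GPIOn_CTRL. Each pin owns a STATUS/CTRL pair of 4-byte
 * registers, so consecutive pins are 8 bytes apart; (pin << 3) is the
 * C equivalent of "lsls r2, r0, #3". */
uint32_t gpio_ctrl_address(uint32_t pin)
{
    return IO_BANK0_GPIO_CTRL_BASE + (pin << 3);
}
```

For example, pin 8 maps to GPIO8_CTRL at 0x40014044 and pin 26 to 0x400140d4.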

Each GPIO is connected to a pad (as in the IC bonding pad, where the chip's internal wiring connects to the external legs of the package). The way the pad is driven, irrespective of the function it is given, is controlled by the registers starting at the PADS_BANK0_GPIO_BASE address; there is one 4-byte register per pin, hence lsls r2, r0, #2. The sorts of things that can be configured are the drive strength, whether the pin should have a pull-up or pull-down resistor, and slew rate control (e.g. to minimise RFI on long lines). The power-on reset defaults are actually OK for this application, but I have re-initialised them anyway in case the C SDK had set other values.
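
The pad settings can be expressed as named bit fields, which makes it easy to compare PAD_INIT_VALUE with the power-on default. This is a hypothetical C sketch: the PAD_* field macro names are made up for illustration, while the bit positions and reset values are taken from the register comments in myasm.s.

```c
#include <stdint.h>

/* Field positions from the PADS_BANK0 GPIO register comments in myasm.s.
 * The macro names are hypothetical. */
#define PAD_OD        (1u << 7)              /* output disable */
#define PAD_IE        (1u << 6)              /* input enable */
#define PAD_DRIVE(x)  (((x) & 3u) << 4)      /* 0=2mA, 1=4mA, 2=8mA, 3=12mA */
#define PAD_PUE       (1u << 3)              /* pull-up enable */
#define PAD_PDE       (1u << 2)              /* pull-down enable */
#define PAD_SCHMITT   (1u << 1)              /* Schmitt trigger enable */
#define PAD_SLEWFAST  (1u << 0)              /* 1 = fast slew rate */

#define PADS_BANK0_GPIO_BASE (0x4001c000u + 0x04u)

/* Reset value implied by the RW defaults listed in the comments. */
#define PAD_RESET_VALUE (PAD_IE | PAD_DRIVE(1) | PAD_PDE | PAD_SCHMITT)

/* PAD_INIT_VALUE from myasm.s (0b00010110), rebuilt from named fields. */
#define PAD_INIT_VALUE  (PAD_DRIVE(1) | PAD_PDE | PAD_SCHMITT)

/* One 4-byte pad register per pin: the C form of "lsls r2, r0, #2". */
uint32_t pad_address(uint32_t pin)
{
    return PADS_BANK0_GPIO_BASE + (pin << 2);
}
```

Built this way, PAD_INIT_VALUE comes out as 0x16 and the default implied by the comments as 0x56, so the only difference is that PAD_INIT_VALUE clears the input enable (IE) bit.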

Finally, a pin needs to be set as an output and its output set low. Each pin has a bit in the GPIO_OUT register that sets its output value. As multiple pins are controlled by the same register, a single output can be set low by writing a 1 to the relevant bit of the GPIO_OUT_CLR register; bits written as 0 are left unchanged. This sets the pin low (assuming its output is enabled) without affecting the other pins. There is also a GPIO_OUT_SET register to set output pins high and a GPIO_OUT_XOR register to invert them.

(This concept is repeated throughout the chip to enable correct multi-threaded, multi-core operation. To allow both cores to write to the GPIO pins simultaneously without corrupting the other core's pins, the Pico SDK writes values to the pins using the expression *GPIO_OUT_XOR = (*GPIO_OUT ^ new_value) & mask;. The *GPIO_OUT ^ new_value part effectively sets the bits that differ from what we want to 1 and the bits that already match to 0, and & mask restricts this to the pins being updated. Writing the result to GPIO_OUT_XOR then flips only the bits that are not what we want and leaves the rest unchanged.)
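
The set/clear/invert registers and the masked-write trick can be modelled on a host with a few lines of C. This is only a simulation: the sim_* names are hypothetical, and on the RP2040 each alias is a separate register address written by a single store, whereas here each write is an ordinary read-modify-write of one variable. The masked write reads back the current output value, as the SDK's gpio_put_masked() does.

```c
#include <stdint.h>

/* Hypothetical host-side model of the SIO output register. */
typedef struct {
    uint32_t out;   /* models GPIO_OUT */
} sim_sio;

/* Models of the alias registers: only bits written as 1 have an effect. */
void sim_out_set(sim_sio *s, uint32_t bits) { s->out |= bits; }   /* GPIO_OUT_SET */
void sim_out_clr(sim_sio *s, uint32_t bits) { s->out &= ~bits; }  /* GPIO_OUT_CLR */
void sim_out_xor(sim_sio *s, uint32_t bits) { s->out ^= bits; }   /* GPIO_OUT_XOR */

/* Drive the pins selected by mask to the matching bits of value without
 * touching any other pin: (current ^ value) & mask has a 1 exactly where
 * a masked pin is wrong, so XOR-ing it in fixes just those pins. */
void sim_put_masked(sim_sio *s, uint32_t mask, uint32_t value)
{
    sim_out_xor(s, (s->out ^ value) & mask);
}
```

Starting from an output value of 0xF0F0, sim_put_masked(&s, 0x00FF, 0x0055) changes only the low byte, giving 0xF055.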

The last part of the pin setup is to make the pin output enabled (OE) by writing the relevant bit to the GPIO_OE_SET register.

.thumb_func
init_single_gpio:
    @ Input:
    @ r0 - LED number
    push    {r1-r3, LR}
    @ Usage:
    @ r1 - base addresses
    @ r2 - computed address offsets
    @ r3 - (computed) values to set in register

    @ init IO_BANK0 register
    ldr     r1, =IO_BANK0_GPIO_CTRL_BASE
    ldr     r3, =GPIO_INIT_VALUE    @ Load early to reduce stall cycles (Probably makes no difference on Cortex M0+!)
    lsls    r2, r0, #3      @ Multiply LED number by 8 because IO_BANK0 config is pairs of 32 bit, 4 byte registers
    str     r3, [r1, r2]

    @ init PAD
    ldr     r1, =PADS_BANK0_GPIO_BASE
    ldr     r3, =PAD_INIT_VALUE    @ Load early to reduce stall cycles (Probably makes no difference on Cortex M0+!)
    lsls    r2, r0, #2      @ Multiply LED number by 4 as 4 bytes per register
    str     r3, [r1, r2]

    @ init SIO (Set low, output enable)
    movs    r3, #1
    lsls    r3, r3, r0
    ldr     r1, =GPIO_OUT_CLR
    str     r3, [r1]
    ldr     r1, =GPIO_OE_SET
    str     r3, [r1]

    pop     {r1-r3, PC}

The gpio_all_pins_mask and gpio_even_pins_mask masks mentioned earlier are computed using the compute_led_masks function.

This uses quite a few registers but is actually quite simple in concept.

The LEDS array is walked through in a similar way to the earlier init_all_gpios function. The gpio_all_pins_mask value is constructed in r2 and the gpio_even_pins_mask value in r3. For each pin number read from the array, a 1 is shifted left in r5 by the pin number. The shifted 1 is then or-ed into the intermediate gpio_all_pins_mask using the orrs r2, r2, r5 instruction. The current offset into the array is tested for being even or odd by checking its least significant bit with tst r1, r6 (r6 holds the value 1 because the Cortex-M0+ Thumb instruction set has no immediate form of tst, so tst r1, #1 is not allowed). If the offset is even, the shifted 1 is also or-ed into the intermediate gpio_even_pins_mask using the orrs r3, r3, r5 instruction. Note that "even" refers to the offset into the LEDS array, not the GPIO number: pin 26 sits at offset 15, so it is not in the even mask.

Once the two masks are computed in their respective registers, they are stored into the global gpio_all_pins_mask and gpio_even_pins_mask variables using instructions ldr r5, =gpio_all_pins_mask, str r2, [r5] and ldr r5, =gpio_even_pins_mask, str r3, [r5].

.thumb_func
compute_led_masks:
    push    {r0-r6, LR}
    @ Usage:
    @ r0 - base address of LED numbers array
    @ r1 - initially offset of last value in LED numbers array, then counted down
    @ r2 - computed gpio_all_pins_mask
    @ r3 - computed gpio_even_pins_mask
    @ r4 - value read from LED numbers array
    @ r5 - update value
    @ r6 - immediate value #1!
    ldr     r0, =LEDS
    movs    r1, #LEDS_LEN-1
    movs    r2, #0
    movs    r3, #0
    movs    r6, #1
_compute_led_masks_loop:
    ldrb    r4, [r0, r1]                @ Load LED number
    movs    r5, #1                      @ Compute bit to update
    lsls    r5, r4                      @ ...
    orrs    r2, r2, r5                  @ Insert bit into gpio_all_pins_mask
    tst     r1, r6                      @ Test if even/odd
    bne     _compute_led_masks_skip_odd
    orrs    r3, r3, r5                  @ Conditionally insert bit into gpio_even_pins_mask
_compute_led_masks_skip_odd:
    subs    r1, r1, #1                  @ Move on to next LED number
    cmp     r1, #0                      @ Loop if more LEDs to do
    bpl     _compute_led_masks_loop     @ ...
    ldr     r5, =gpio_all_pins_mask     @ Save computed gpio_all_pins_mask value
    str     r2, [r5]                    @ ...
    ldr     r5, =gpio_even_pins_mask    @ Save computed gpio_even_pins_mask value
    str     r3, [r5]                    @ ...
    pop     {r0-r6, PC}
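
For comparison, here is a C rendering of compute_led_masks, written as a host-side sketch rather than the code the Pico actually runs. The function name compute_led_masks_c is hypothetical; the comments map each step to the corresponding instruction.

```c
#include <stdint.h>

/* Pin numbers copied from the LEDS table in myasm.s: consecutive entries
 * are the two ends of one LED (8/9, 10/11, ..., 22/26). */
static const uint8_t LEDS[] = {8, 9, 10, 11, 12, 13, 14, 15,
                               16, 17, 18, 19, 20, 21, 22, 26};

/* C rendering of compute_led_masks. "Even" refers to the offset within
 * LEDS, not to the GPIO number itself. */
void compute_led_masks_c(uint32_t *all_mask, uint32_t *even_mask)
{
    uint32_t all = 0, even = 0;
    for (int32_t offset = (int32_t)(sizeof LEDS) - 1; offset >= 0; --offset) {
        uint32_t bit = 1u << LEDS[offset];   /* movs r5, #1; lsls r5, r4 */
        all |= bit;                          /* orrs r2, r2, r5 */
        if ((offset & 1) == 0) {             /* tst r1, r6 (r6 == 1) */
            even |= bit;                     /* orrs r3, r3, r5 */
        }
    }
    *all_mask = all;                         /* str r2 to gpio_all_pins_mask */
    *even_mask = even;                       /* str r3 to gpio_even_pins_mask */
}
```

For the LEDS table in myasm.s this yields 0x047FFF00 for the all pins mask and 0x00555500 for the even pins mask.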

The flashing of the LEDs is done by the flash_leds function.

To start the LEDs flashing, the first step is to load the even pins mask and write it to the GPIO_OUT_SET register to set the relevant pins high (all pins were initialised low earlier). The even pins mask is loaded using ldr r0, =gpio_even_pins_mask (which loads the address of the mask) followed by ldr r0, [r0] (which loads the mask itself), and is written to the GPIO_OUT_SET register using ldr r1, =GPIO_OUT_SET and str r0, [r1].

The all pins mask is then loaded into r0 using ldr r0, =gpio_all_pins_mask and ldr r0, [r0]. The address of the GPIO_OUT_XOR register is loaded into r1 using ldr r1, =GPIO_OUT_XOR.

A one second delay is invoked, after which the all pins mask is written to the GPIO_OUT_XOR register to invert all the pins (thus changing the colour of the LEDs) and a branch made back to the beginning of the loop.

.thumb_func
flash_leds:
    push    {r0-r2, LR}
    ldr     r0, =gpio_even_pins_mask    @ Set even pins high
    ldr     r0, [r0]                    @ ...
    ldr     r1, =GPIO_OUT_SET
    str     r0, [r1]
    ldr     r0, =gpio_all_pins_mask     @ Load all pins mask
    ldr     r0, [r0]                    @ ...
    ldr     r1, =GPIO_OUT_XOR           @ Load GPIO XOR register address
_flash_leds_loop:
    bl      wait_1_second
    str     r0, [r1]                    @ Invert output pins
    b       _flash_leds_loop
    @ Never returns
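
The effect of flash_leds on the output register can be simulated in C. The sketch assumes the mask values that compute_led_masks produces for the LEDS table (0x047FFF00 and 0x00555500); gpio_out_after_toggles is a hypothetical helper.

```c
#include <stdint.h>

/* Masks for the LEDS table in myasm.s, as produced by compute_led_masks. */
#define GPIO_ALL_PINS_MASK   0x047FFF00u
#define GPIO_EVEN_PINS_MASK  0x00555500u

/* Hypothetical host-side model of GPIO_OUT as flash_leds drives it:
 * every LED pin starts low, the even-offset pins are set high once, and
 * each loop iteration writes the all-pins mask to GPIO_OUT_XOR. */
uint32_t gpio_out_after_toggles(unsigned toggles)
{
    uint32_t out = 0;                    /* init_single_gpio cleared all pins */
    out |= GPIO_EVEN_PINS_MASK;          /* write to GPIO_OUT_SET */
    for (unsigned i = 0; i < toggles; ++i) {
        out ^= GPIO_ALL_PINS_MASK;       /* write to GPIO_OUT_XOR */
    }
    return out;
}
```

The output alternates between 0x00555500 and 0x042AAA00. In both states the two pins of every LED are at opposite levels, so each LED is always lit and simply swaps colour on every toggle.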

The one second delay is implemented by the wait_1_second function.

The address of the TIMER_TIMERAWL register is loaded into r1. TIMER_TIMERAWL holds the low 32 bits of a 64-bit timer that increments once every microsecond. (TIMER_TIMERAWL is used because the latching feature of the TIMELR register is not required.) The C SDK version of the delay function reads both the low and high registers to create a 64-bit value (see here, here and here). By using the magic of modulo arithmetic, subtracting the initial low 32-bit time (r2) from the current low 32-bit time (read by ldr r3, [r1] and subtracted using subs r3, r3, r2), we don't need the full 64-bit value: the 32-bit difference is correct even if the counter wraps around between the two reads. We can see whether enough time has elapsed using the cmp r3, r0 instruction, where r0 contains 1000000. If not, blt loops back to read the current time again.

.thumb_func
wait_1_second:
    push    {r0-r3, LR}
    @ Usage:
    @ r0 - 1 million
    @ r1 - TIMER_TIMERAWL address
    @ r2 - start time
    @ r3 - current time and delta time
    ldr     r0, =#1000000
    ldr     r1, =TIMER_TIMERAWL
    ldr     r2, [r1]
_wait_1_second_loop:
    ldr     r3, [r1]
    subs    r3, r3, r2  @ r3 = delta time
    cmp     r3, r0      @ compare to 1 million
    blt     _wait_1_second_loop
    pop     {r0-r3, PC}
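
The wrap-around behaviour of the subtraction can be demonstrated with unsigned 32-bit arithmetic in C. This is a sketch: elapsed_us and one_second_elapsed are hypothetical names, and the counter values in the usage example are made up to force a wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Elapsed microseconds between two reads of the 32-bit TIMERAWL counter.
 * Unsigned subtraction is performed modulo 2^32, so the result is correct
 * even if the counter wrapped between the two reads (it wraps roughly
 * every 71.6 minutes), provided the real interval is under 2^32 us. */
uint32_t elapsed_us(uint32_t start, uint32_t now)
{
    return now - start;                  /* subs r3, r3, r2 */
}

/* Hypothetical helper mirroring the exit condition of wait_1_second. */
bool one_second_elapsed(uint32_t start, uint32_t now)
{
    return elapsed_us(start, now) >= 1000000u;   /* cmp r3, r0 */
}
```

For instance, a start time of 0xFFFFFFF0 and a current time of 999984 give an elapsed time of exactly 1000000 microseconds despite the wrap. The assembly compares with blt, a signed test; that works here because the difference never gets anywhere near 2^31, though an unsigned branch such as blo would express the intent more precisely.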

That completes the description of the code. As you can see, it's quite involved just to flash a few LEDs. However, the example does illustrate programming the Cortex-M0+ in assembly code and various aspects of the inner workings of the Pico's RP2040 chip. As such, it acts as a good basis for future projects.
