Building for and debugging an ARM Cortex-M

I recently bought an ST NUCLEO-L432KC, a small dev board with an STM32L432KC, a Cortex-M4 microcontroller designed for ultra-low-power applications.1 These STM32 MCUs come with a proprietary code generation tool called STM32CubeMX. Unfortunately, my experience with it was very poor: I couldn’t even generate a functioning base project for this board.2

Since ST doesn’t provide ready-to-use template projects I had to look for an alternative. PlatformIO is a popular tool nowadays for these sorts of things. I had never used it before but I was able to go from zero to blinking LED in less than 5 minutes, literally. My only issue with it was that this felt like cheating, and I was looking to learn a bit more about how these binaries are built, so I decided to build it “from scratch”.

First you will need to install a GDB and GCC cross-compiler that can target ARM, the Newlib C standard library, and OpenOCD. This will depend on your distro but in Arch Linux you’ll find everything in the official repo:

sudo pacman -Syu arm-none-eabi-gcc arm-none-eabi-newlib arm-none-eabi-gdb openocd

The next step is to obtain CMSIS Core (standard APIs for Cortex processors), BSP and HAL drivers for our MCU’s family, which is all provided by ST on GitHub:

mkdir nucleo-l432kc
cd nucleo-l432kc
git init
git submodule add --depth 1 https://github.com/STMicroelectronics/cmsis-device-l4.git
git submodule add --depth 1 https://github.com/STMicroelectronics/stm32l4xx-hal-driver.git
git submodule add --depth 1 https://github.com/STMicroelectronics/stm32l4xx-nucleo-32-bsp.git
git submodule add --depth 1 https://github.com/STMicroelectronics/cmsis-core.git

Now you will need the device vector table (“contains the initialization value for the stack pointer, and the entry point addresses of each exception handler”) and the linker script for our MCU. You can find these files and also a bunch of basic examples in the STM32CubeL4 MCU Firmware Package repo, which is actually quite heavy, so let’s grab just what we need from there:

git clone --depth 1 --filter=blob:none --sparse https://github.com/STMicroelectronics/STM32CubeL4.git
cd STM32CubeL4
git sparse-checkout set Projects/NUCLEO-L432KC/Examples/GPIO/GPIO_IOToggle

The vector table initialisation code is in startup_stm32l432kcux.s, and the linker script for GNU ld is STM32L432KCUX_FLASH.ld. This directory includes project files for IDEs we don’t care about, but you should copy the example code inside Src and Inc for blinking the green LED on the board.

For some reason STM32L432KCUX_FLASH.ld is missing a symbol named __end__ that newlib needs. I haven’t looked too much into this but maybe ST’s fork of newlib does something different. In any case, what I have done is to add the following line to the ._user_heap_stack section, almost at the end of the file:

. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
PROVIDE ( __end__ = . ); /* Add this directive here */
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);

I will save you the pain of having to figure out what exactly you need to compile this simple project with GCC (you can also find all this stuff in my git repo nucleo-l432kc-template):

CC = arm-none-eabi-gcc
DEFS = -DSTM32L432xx
CFLAGS = -Wall -Wextra -g -march=armv7e-m+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 --specs=rdimon.specs
LDFLAGS = -TSTM32L432KCUX_FLASH.ld
CMSIS_DEVICE = cmsis-device-l4/
HAL_DRIVER = stm32l4xx-hal-driver/
NUCLEO_BSP = stm32l4xx-nucleo-32-bsp/
INCLUDES = -Isrc -I$(HAL_DRIVER)Inc -I$(NUCLEO_BSP) -Icmsis-core/Include -I$(CMSIS_DEVICE)Include

all:
	$(CC) $(CFLAGS) $(LDFLAGS) $(DEFS) $(INCLUDES) \
	$(HAL_DRIVER)Src/stm32l4xx_hal.c \
	$(HAL_DRIVER)Src/stm32l4xx_hal_cortex.c \
	$(HAL_DRIVER)Src/stm32l4xx_hal_gpio.c \
	$(HAL_DRIVER)Src/stm32l4xx_hal_pwr.c \
	$(HAL_DRIVER)Src/stm32l4xx_hal_pwr_ex.c \
	$(HAL_DRIVER)Src/stm32l4xx_hal_rcc.c \
	$(NUCLEO_BSP)stm32l4xx_nucleo_32.c \
	$(CMSIS_DEVICE)Source/Templates/system_stm32l4xx.c \
	startup_stm32l432kcux.s \
	src/stm32l4xx_it.c \
	src/main.c

The rdimon specs file is used for semihosting, which you will need later for debugging. Once you run make you should get a new a.out file. You can use the common file command or arm-none-eabi-objdump -f to see that the ELF is for an ARM architecture.

Let’s now flash the MCU with this new file. Plug the dev board into a USB port, then open a new terminal and run:

openocd -f interface/stlink.cfg -c "transport select hla_swd" -f target/stm32l4x.cfg

You should see OpenOCD detecting a Cortex-M4 processor and informing us that GDB is ready to accept new connections on port 3333. Now in another terminal let’s use GDB to flash our device:

arm-none-eabi-gdb -q a.out

target extended-remote :3333
monitor program a.out verify

A message indicating that the programming has finished and been successfully verified should appear on the screen. We won’t see any blinking LED yet because the MCU is halted (you can see this by running monitor targets), so to start our program, run this inside GDB:

monitor reset run

Now you should see the small green LED blinking. You can exit GDB with q and kill OpenOCD with Ctrl-C.

To finish this tutorial, let’s try some basic debugging capabilities. Open the main.c source and look for the while loop where the GPIO port connected to the LED gets toggled. Add a printf("toggled\n"); anywhere inside the loop and include stdio.h at the top of the file. Now recompile and reprogram the MCU with this new binary. You may notice the green LED turns on but doesn’t blink; this is because the MCU is halted, waiting for a host to connect. In the GDB session do:

monitor arm semihosting enable
load
monitor reset run

At this point you will see that in the OpenOCD terminal our message gets printed every 100 milliseconds. You can do all sorts of things now in GDB, like seeing the current point of execution with the l command, or halting the execution and stepping with monitor halt and the s command.


  1. As low as ~10 μA in run mode, and on the order of tens of nanoamps in standby mode.

  2. I won’t go into details here but for instance it just didn’t generate a Makefile for me.

  1. 📆 December 31, 2024
  2. 🏷️ embedded systems

Programming an FPGA with a FOSS toolchain

This is a quick guide on how to program a Sipeed Tang Nano 9K (GOWIN GW1NR-9) using a fully open source toolchain for FPGA programming. I will use Arch Linux but you may find some of the required packages available in your distro’s package repository as well.

Install Yosys and openFPGALoader from the official Arch repository with pacman:

sudo pacman -S yosys openfpgaloader

Now you will need to install nextpnr and Project Apicula from AUR. I had a few problems with these packages so I will explain what I’ve done to make it work for me, but try a normal installation first as these problems will likely get fixed in the near future.

First download and install my apicula-git package from AUR.

Then download the nextpnr-git package, also from AUR, and edit the PKGBUILD file to only build your target:

_ARCHS=('himbaechel')

And comment out this dependency:

    himbaechel)
      #makedepends+=('prjapicula')

Now you should be able to run makepkg -i. This will take some time so you can brew a mate meanwhile. If everything has gone well you should be able to run nextpnr-himbaechel --version successfully.

At this point you have everything needed to synthesise, place and route, generate bitstreams and upload them into your Tang Nano. Let’s try out the LED example from the Sipeed wiki.

In a new directory create a led.v Verilog file with the following code:

module led (
    input clk,              // clk input
    input rst,              // reset input
    output reg [5:0] led    // 6 LEDS pin
);

reg [23:0] counter;

always @(posedge clk or negedge rst) begin
    if (!rst)
        counter <= 24'd0;
    else if (counter < 24'd1349_9999)       // 0.5s delay
        counter <= counter + 1'b1;
    else
        counter <= 24'd0;
end

always @(posedge clk or negedge rst) begin
    if (!rst)
        led <= 6'b111110;
    else if (counter == 24'd1349_9999)       // 0.5s delay
        led[5:0] <= {led[4:0],led[5]};
    else
        led <= led;
end

endmodule

You will need a CST file for this board; you can find an example in Apicula’s repo but you can also use this fragment (name it tangnano9k.cst):

IO_LOC "clk" 52;
IO_LOC "led[0]" 10;
IO_LOC "led[1]" 11;
IO_LOC "led[2]" 13;
IO_LOC "led[3]" 14;
IO_LOC "led[4]" 15;
IO_LOC "led[5]" 16;
IO_LOC "key" 3;
IO_LOC "rst" 4;

You are ready to synthesise this design with the following Yosys command:

yosys -p "read_verilog led.v; synth_gowin -top led -json led.json"

If the synthesis completes successfully there will be a new led.json file. Next let’s use nextpnr:

nextpnr-himbaechel --json led.json --write pnrled.json --device GW1NR-LV9QN88PC6/I5 --vopt family=GW1N-9C --vopt cst=tangnano9k.cst

After a few seconds you should see a message saying Program finished normally and there should be a new pnrled.json file that we will use to generate the bitstream:

gowin_pack -d GW1N-9C -o led.fs pnrled.json

Now it’s time to grab your dev board :D

Connect the Tang Nano 9K to your computer with a USB-C cable and try this:

openFPGALoader --scan-usb

This should list an FT2232 JTAG Debugger. If it doesn’t, try unplugging the board and plugging it in again (and avoid sketchy USB hubs ^_^). If you do see the debugger then you can run:

openFPGALoader --detect

This should list a Gowin device and it means we are ready to upload the bitstream:

openFPGALoader -b tangnano9k -f led.fs

Once it’s finished, the six tiny LEDs next to the FPGA should be turning on progressively.

  1. 📆 October 19, 2024
  2. 🏷️ programmable logic, electronics

The Lawless Guide to Monads

This is an introduction to the concept of monads in Haskell for those who are starting to get familiar with the basics of the language - things like polymorphism, type signatures, higher-order functions and so on. I will use a very simple example that looks useless but will hopefully help you understand this concept.

Start a REPL session with the command ghci and try the following:

ghci> f = (\x -> [2*x])
ghci> f 3
[6]

Here we define a function f that takes a number and gives back a list of number(s). We can use the :type command to check this:

ghci> :type f
f :: Num a => a -> [a]

It can take some time to get used to reading type signatures but this is a fundamental part of programming in Haskell so make sure you understand them.

We will be playing quite a bit with lists so for brevity, let’s create, say, a list of numbers named xs that we can use for trying stuff out:

ghci> xs = [1,2,3]
ghci> :type xs
xs :: Num a => [a]

Since we have a list of numbers and a function that takes a number, we could use the map function to apply f to each element of xs:

ghci> map f xs
[[2],[4],[6]]

As you can see, we have successfully multiplied each element of xs by two, although, because our function returns a list with a single number, we get this funky list of lists of numbers (i.e. Num a => [[a]]). But worry not: as Haskell is such a great language, we can easily flatten it:

ghci> concat (map f xs)
[2,4,6]

If by any chance you are wondering “can we make this shorter?” I’ve got great news for you :D Haskell is such a magnificent language that it provides an operator called “bind” to do the same thing in half the characters!

ghci> xs >>= f
[2,4,6]

Fantastic, isn’t it? Next, try these two:

ghci> xs >>= (\x -> [2*x])
[2,4,6]
ghci> xs >>= (\x -> return (2*x))
[2,4,6]

The first expression is there just to remind you what f is. Now, the second expression is equivalent to the first but it uses a function named return. Note that this name is unfortunately misleading: this is not a statement for “returning” some value to the caller (like it would be for instance in C), this is just an ordinary function, and in this case we are simply applying it to (2*x).

For the sake of clarity, this is an equivalent expression without using >>=:

ghci> concat (map (\x -> return (x*2)) xs)
[2,4,6]

Since we still get the same result, it seems like return is just wrapping a value in a list, no? Hmmm… Let’s check its type signature:

ghci> :type return
return :: Monad m => a -> m a

This says that return is a function that takes a value of some generic type a and gives back a value of type m a, that is, the same value wrapped in some monad m. “But weren’t we expecting a list of a?”, I hear you say. Well, guess what? Lists are monads!

You see, for a type to be a monad it needs to implement the functions >>= (bind) and return that work for its own type, and Haskell lists do that out of the box. This idea of classifying something in terms of what you can do with it might seem a bit strange if you don’t have experience with type classes1. If this is the case for you then you may want to revisit ad hoc polymorphism in Haskell to get more familiar with it.
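To make “implementing >>= and return” a bit more concrete, here is a minimal sketch of a hand-written monad. The names Option, None, Some and halve are made up for this illustration (Option simply mirrors Maybe from the Prelude). One assumption worth stating: current versions of GHC also ask for Functor and Applicative instances before accepting a Monad instance, and return defaults to pure, so the sketch includes them.

```haskell
-- Hypothetical type for illustration; it mirrors Maybe from the Prelude.
data Option a = None | Some a
  deriving (Show, Eq)

-- Current GHC requires Functor and Applicative instances before Monad.
instance Functor Option where
  fmap _ None     = None
  fmap g (Some x) = Some (g x)

instance Applicative Option where
  pure = Some                  -- return defaults to pure
  None   <*> _ = None
  Some g <*> o = fmap g o

instance Monad Option where
  None   >>= _ = None          -- nothing to bind over
  Some x >>= g = g x           -- unwrap the value and apply g to it

-- Hypothetical helper: halve a number only when it is even.
halve :: Int -> Option Int
halve n
  | even n    = Some (n `div` 2)
  | otherwise = None
```

If you save this in a file and load it in ghci, Some 8 >>= halve >>= halve evaluates to Some 2, whereas Some 6 >>= halve >>= halve gives None because the intermediate 3 is odd.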

Back to monads, we haven’t checked the signature of >>= yet:

ghci> :type (>>=)
(>>=) :: Monad m => m a -> (a -> m b) -> m b

Since >>= is an infix function, the first parameter represents the left-hand-side, this is m a, and the second one the right-hand-side, (a -> m b). If we consider that in our case m is a list then you could spell out this signature as “bind is a function that takes two parameters: (i) a list of type A and (ii) a function that takes a value of type A and gives back a list of type B; bind then gives back a list of type B”.
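The spelled-out signature can be written down directly. This is only a sketch: bindList and describe are hypothetical names, and the body reuses the concat and map combination from earlier rather than the actual definition in the standard library. Note that A and B do not have to be the same type; here A is Int and B is String.

```haskell
-- bindList is a hypothetical name: it is (>>=) with m fixed to lists.
bindList :: [a] -> (a -> [b]) -> [b]
bindList xs f = concat (map f xs)

-- Here a is Int and b is String, so the element type changes.
describe :: Int -> [String]
describe x = [show x, show (2 * x)]
```

In ghci, bindList [1,2,3] describe gives ["1","2","2","4","3","6"], which is the same list you get from [1,2,3] >>= describe.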

If there is one thing you should remember from all this it is this >>= function, in particular its type signature (i.e. m a -> (a -> m b) -> m b). As a mnemonic you could say that “a monad represents a type that can be bound over”, similar to the classic “a functor represents a type that can be mapped over”.
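To see the two mnemonics side by side, here is a small sketch using Maybe, another monad that ships with the Prelude. The helper safeDiv and the names mapped, bound and failed are hypothetical and exist only for this example. Mapping takes a plain function (a -> b), while binding takes a function that already returns a wrapped value (a -> m b).

```haskell
-- Hypothetical helper: integer division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Mapping over: the function returns a plain value (a -> b).
mapped :: Maybe Int
mapped = fmap (+ 1) (Just 9)

-- Binding over: the function returns a wrapped value (a -> m b).
bound :: Maybe Int
bound = Just 5 >>= safeDiv 100

-- Binding a zero divisor makes the whole expression fail.
failed :: Maybe Int
failed = Just 0 >>= safeDiv 100
```

Here mapped is Just 10, bound is Just 20 and failed is Nothing. The operator is the same >>= we used on lists; only m has changed.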

Congrats, you now know what a monad is. There are still a few details2 you could learn but you’ll be alright skipping those for now. In my next article I will show you how to use a monad called IO to deal with I/O operations in a purely functional way.


  1. Or similar abstraction mechanisms from other languages, like interfaces, traits, etc.

  2. Mainly monad laws (which inspired this article’s title).