For some future projects i need to bring up a small microcontroller board containing a STM32F103C8T6. But most ARM Cortex-Mx devices are similar for how far this series is going to reach.

Because i like to really know what is running in my microcontroller projects i decided to bring up this device from scratch and document what i needed to do. Where from scratch means i'm allowed to use the basic gnu toolchain (make, g++, ld, as, objcopy), a flash program, the vendor documentation and the register definition headers from STM.

ARM microcontrollers are often said to be complicated and beyond the reach of beginners. I tend to disagree. I started my embedded development with ARM. And i found my trusted gnu toolchain well supported. But i also found big vendor libraries full of cross compiler compatibility and wrappers around simple register manipulation. While these libraries can be a great help if you just want things to be running, they tend to stand in the way of real unterstanding. So here i'm not going to use those libraries.

I made an exception for the hardware register/memory mapped io definitions. Transcibing them from the documentation to C++ code really is just error prone and not very enlightening. For this post i extracted just what is needed from those headers to show how these work. But generelly if the vendor releases them with a truely free license they are nice to use (At least as long as you're no using some C++ meta programming libraries that need definitions in quite a different form. Kvasir comes to mind).

Some of the complexity is also because the ARM ecosystem is a multi vendor ecosystem with actual choice. So much less is hidden in the toolchain, because that is shared among different vendors. Actually having to think about linker scripts is one symptom of this. They do exist for desktop development too, but the toolchain ships with one full of deep knowledge of a well standardized platform. So we tend to ignore them and all the raw power that they contain. Or maybe because the documentation and error handling tends to be a bit rough.

This post details the minimum of c++ code and linker scripts to get the LED on the board to blink. This is actually a bit of code golf. You shouldn't really use code from this post. But it's a good start to have something very minimal to explain and later expand upon.

When starting to bring up a microcontroller the first stop of course is the documentation. In this case consisting of a datasheet(DS5319), a (hardware) reference manual (RM0008), a CPU architecture manual (PM0056) and maybe an errata sheet (learn look at them early to avoid nasty surprises).

Highlights from the datasheet for now are:

  • ARM Cortex-M3 CPU, that's what we need to bring up to execute our program.
  • Boots with internal 8 MHz RC oscillator selected as clock
  • 3.3V with 5V tolerance on many pins, we might need to attach an LED and power to it.
  • A block diagram, always handy to have for an quick overview
  • The memory map, needed for the linker script
  • It has a boot loader mode, so we can get software into the flash without JTAG or similar

As i am using a ready made board with the microcontroller i can ignore most of the pinout for now. Of course i need to find out which pin and thus which GPIO (general purpose input/output) the LED i want to controll is attached to. The LED is helpfully labeled PC13 which ends up as the port with index 13 (so zero based counting) of GPIO port C.

All the other details and charateristics are not needed for now. Great 100+ pages i don't need to keep in my head for now.

Ok, now what actually happens when this part is powered on? Well actually, no. In what state is it when is starts to execute code and which code will it start to execute? The datasheet told me that there is an integrated bootloader, so first i need to know how to enable and disable that. Section 3.4 "Boot configuration" of the Reference Manual has all i need. There are 2 boot pins (called BOOT0 and BOOT1, but BOOT1 is shared with a GPIO on the chip i use) that are sampled after reset. For normal boot from embedded flash BOOT0 needs to be pulled to low. A the datasheet tells me that BOOT0 is pin 35 and a good light and sharp eyes trace that pin to the upper jumper on the board i use (which came completly without documentation).

With the rom bootloader out of the way i can focus on the environment my early bootup code has to run. ARM Cortex Mx is a very C/C++ friendly architecture so my goal is to boot directly using C++ code. While this is certainly not 100% portable C++ and might even not be actually guaranteed to work with gcc's g++, in pratice this works well.

The boot configuration section also has some information about the initial state of the cpu. But it uses a bit confusing language. So let's refer to the more general documentation. Section 4 of the datasheet contains the memory map in great detail. For now look at the big picture:

  • SRAM goes from 0x2000 0000 up
  • Flash memory goes from 0x0800 0000 up to 0x0801 FFFF
  • Memory from 0x0000 0000 up is aliased depending on the boot pins

ARM memory maps are quite sparse. Bootup state of the cpu is described in the STM Cortex-M3 programming manual. Sections 2.1.1, 2.1.2 and 2.1.3 detail the basic cpu registers and their reset state. The most relevant part is that the initial value of the stack pointer is read from the 32 bit word at 0x00000000 and the initial value of the program counter is read from 0x0000 0004. This is actually the start of the vector table (2.3.4), but for now the other vectors can be ignored. Also it states that the processor starts in privileged(2.3.2) thread mode using the main stack. By the way, the arm stack grows towards lower addresses.

Interrupts are disabled on start. Some arm cores have a memory protection unit (MPU). Section 4.2.6 shows that the reset value of its enable register is off.

So back to the Reference Manual, using BOOT0 pulled low, the addresses starting from 0x0000 0000 are setup to alias 0x0800 0000, that is the main flash area. So that's enough to start writing some code. And some linker configuration. So lets start with the linker script. As i said above the gnu arm toolchain doesn't have special knowledge about specific microcontrollers.


This tells ld that it's going to process arm object files using the elf format as general object file format. The microcontroller of course doesn't use elf, but the elf bits will get stripped off as the very last build step.

    FLASH : ORIGIN = 0x08000000, LENGTH = 64K

The MEMORY command declares memory regions that the linker can use for allocating specific usages. For now we only need one region for the flash part of the memory map. FLASH is just an identifier to refer to this region later. ORIGIN and LENGTH specify the regions location in the address space.

    .vectors : {
    } > FLASH

    .text : {
    } > FLASH


ELF uses named sections for various parts of code and data. For now the SECTIONS command just instructs ld to put the .vectors section at the start of the flash filling it with the contents of the .vectors sections of the input files. The second part does similarly with sections whose name starts with .text, adding them just after the vectors section.

As the flash memory gets aliased to 0x0000 0000 on boot, the vectors section will be readable from 0x0000 0000 and 0x0000 0004 to setup the stack and program counter registers to start the actual program.

On to the C++ code:

void mainFn() {
    // code to follow later

extern void (* const vectors[])() __attribute__ ((section(".vectors"))) = {
                (void (*)())0x20000400,

The first part is simple for now. mainFn will be the function where execution starts. But for that to actually happen the vectors table needs to be setup. That's what the second part does. __attribute__ ((section(".vectors"))) instructs g++ to emit this initialized array into the section named .vectors to be picked up later by the linker and placed at the very start of the flash section. As C++ is typed and most of the vector table is later filled with pointers to functions i've choosen to use void (*)() as the basic type of the array. I’ve also made this const so that matches at c++ level with the final placement of the section in flash. As const implies internal linkage in C++ the extern also is needed for g++ to actually emit this at C++ level unused data.

Index 0 is the initial value of the stack register. 0x2000 0400 is 1kbyte from the start of the SRAM in the memory map. As stack grows towards lower addresses if the program overflows the stack it will fault, which should be more useful while debugging then just silently corrupting whatever memory happens to come below the stack. 1kbyte should be ok for experimenting, but likely it's much more than needed most programs. Later this will be supplied from the linker. The second value is the address of the function to run on reset, also known as reset handler.

The goal of this simple program is to blink the LED attached for GPIO C13. So on to setting up GPIO port C. Section 9 of the reference manual covers the GPIOs. But this block starts in disabled (i.e. unclocked) state on boot. So Section 7.3 the register part of "reset and clock control" is our first stop. GPIO port C is controlled by the IOP C EN bit in the APB2 EN R register of the RCC. As most peripherial registers are memory mapped in arm this will be a read–modify–write on a memory address.

Mapping all the needed registers oneself doesn't help much in understanding or fine control of the system. So i'll use the vendor provided definition from STM32CubeF1 (i used version 1.2) which are available in BSD (3 clause) licensed form in the directories STM32Cube_FW_F1_V1.2.0/Drivers/CMSIS/Include and STM32Cube_FW_F1_V1.2.0/Drivers/CMSIS/Device/ST/STM32F1xx/Include But for the program in this post i extracted the relevent code to exlpain the general setup of the hardware mapping. The parts needed for enabling the GPIO Port are:

#define     __IO    volatile

typedef struct
  __IO uint32_t CR;
  __IO uint32_t CFGR;
  __IO uint32_t CIR;
  __IO uint32_t APB2RSTR;
  __IO uint32_t APB1RSTR;
  __IO uint32_t AHBENR;
  __IO uint32_t APB2ENR;
  __IO uint32_t APB1ENR;
  __IO uint32_t BDCR;
  __IO uint32_t CSR;

} RCC_TypeDef;

#define PERIPH_BASE           ((uint32_t)0x40000000)
#define AHBPERIPH_BASE        (PERIPH_BASE + 0x20000)

#define RCC_BASE              (AHBPERIPH_BASE + 0x1000)

#define RCC                   ((RCC_TypeDef *) RCC_BASE)

#define  RCC_APB2ENR_IOPCEN   ((uint32_t)0x00000010)

So access to memory mapped registers used volatile access to prevent to compiler from any kind of reordering of these accesses. As is usually done in ARM register definitions all registers of one component are gathered into one struct and a macro is definied that ultimatly ends up as a typecast of it’s base address to an pointer of this type. Additionally we get a macro that maps bits in the register to values. So the following code in the mainFn now enables the gpio port component:


Back to the gpio configuration. Most gpios start up tristated (that is in high impedance mode). This gpio block has 4 bits of configuration for each pin. Thus configuration is split into 2 32-bit registers. The high part of the configuration register contains configuration of output 13. For now i picked any output mode and set it to push-pull mode. Resulting in 0b0011 as configuration. For the rest of the pins floating input mode is selected which is also the reset state. But this way it's a simple set instead of a read–modify–write and the state is more explicitly visible. As the LED's cathod is connected (via a resistor of course) to the GPIO, setting the state of the GPIO to low will activate the LED. One way to set on pin to low is to write to the port bit reset register with the bit corresponding to the pin index set to one. The code (omitting the definition of GPIOC which is similar to how RCC is defined):

    GPIOC->CRH = 0b0100'0100'0011'0100'0100'0100'0100'0100;
    GPIOC->BRR = 1 << 13;

On reset the second line is not strictly needed, because the reset state already has all output data bits of the GPIO as 0. But here i opted to be explicit.

Ok, on to blink the LED. I want 1 Hz and 50% duty cycle. So next we need a 500ms delay. A simple way, when there are no interrupts to add unpredictable additional delay, is just to use a waiting loop. To get the timing right we need to know the exact generated code and how much cycles are taken by each machine instruction. And we need to tell g++ not to optimize away the loop, because a wait loop looks just useless to it.

    int ctr = 1000;
    while (ctr) {

Apart from the start value for ctr this simple loop is a good delay loop. The asm(""); doesn't emit any code, but g++'s optimizer doesn't considers this asm statement to be removable by optimization, so it’s a easy way to disable code removal. To calculate the right value for ctr we need to look at the generated assembly. I'm using objdump --disassemble here:

  1e:   f44f 737a       mov.w   r3, #1000       ; 0x3e8
  22:   3b01            subs    r3, #1
  24:   d1fd            bne.n   22 <_Z6mainFnv+0x22>

That’s the whole loop, extracted from the 20·ish lines of output from objdump. For most who have read any kind of assembly this looks rather expected. An immediate load of the start value (not part of the loop proper), an substraction of the loop variable and an conditional jump back to the look start. One thing to watch out for when looking at ARM assembly is that the instruction mnemonics used by ARM changed over time(now uses Unified Assembler Language (UAL)), so depending on the tools and documentation it sometimes happens that they mismatch. The subs instruction takes 1 cycle and the bne.n takes 2-4 cycles (e.g. ARM Cortex‑M3 Processor Technical Reference Manual 3.3.1). In simple loops the branch is sufficently easy that in my experience it mostly takes 2 cycles. This all assumes that g++ doesn't start to generated different code. So with the cpu still running on the internal 8MHz clock we get:

        int ctr;
        ctr = (8000000 / 3) / 2;
        // each loop iteration takes 3 cycles to execute.
        while (ctr) {
            asm ("");

Next step is setting the GPIO to high to disable the LED. The GPIO has a register that allows both setting and resetting individual bits that can be used just like BRR to set the bits: BSRR.

So the final blinking code looks like this:

#include "minsys.h"

void mainFn() {
    GPIOC->CRH = 0b0011'0000'0000'0000'0000'0000;
    GPIOC->BRR = 1 << 13;

    while (1) {
        int ctr;
        ctr = (8000000 / 3) / 2;
        // each loop iteration takes 3 cycles to execute.
        while (ctr) {
            asm ("");
        GPIOC->BRR = 1 << 13;
        ctr = (8000000 / 3) / 2;
        // each loop iteration takes 3 cycles to execute.
        while (ctr) {
            asm ("");
        GPIOC->BSRR = 1 << 13;

extern void (* const vectors[])() __attribute__ ((section(".vectors"))) = {
                (void (*)())0x20000400,

minsys.h here is the minimal definitions extracted from the mbed board headers.

The last step is to actually compile this and upload it to the board to test:

arm-none-eabi-g++ -c -mcpu=cortex-m3 -mthumb --std=c++14 -O2 -fno-rtti -fno-exceptions main.cpp -o main.o

Here -mcpu=cortex-m3 -mthumb tells g++ what cpu the code should be generated for. While -mthumb looks redundant for an cpu that only supports thumb code, g++ requires it nevertheless. I use c++14 mode because it’s constexpr support comes in handy even with embedded development. -fno-rtti and -fno-exceptions disable runtime type information and the complete exception infrastructure in the generated code. This is important because both need runtime support code and that infrastructure is rather large and thus hard (or impossible) to fit into small microcontrollers.

arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -Tlinkerscript.ld -nostartfiles main.o -o main.elf

Next step is to link the code. -Tlinkerscript.ld specifies the linker script that should be used for linking, replacing ld’s default linker script. In this case i saved linker script in the file linkerscript.ld. -nostartfiles disables the usage of any toolchain provided object files for the entrypoint and start.

The linker generates a fully linked output that still contains elf metainformation like section names and various other bookkeeping data.

arm-none-eabi-objcopy -O binary main.elf main.bin

This metadata is stripped away by objcopy when copying to a binary (or ihex) format. objcopy in this mode produces a file that starts with the data of the loaded section with the lowest load address and pads space between sections with zero bytes. In this example there are no gaps between the sections.

A hexdump of the result still fits nicely in a few lines:

000000 00 04 00 20 09 00 00 08 0b 4b 70 b4 4f f4 00 54
000010 19 46 22 46 09 4d 4f f4 40 16 a8 69 40 f0 10 00
000020 a8 61 5e 60 5c 61 06 4b 01 3b fd d1 04 4b 4a 61
000030 01 3b fd d1 0a 61 f6 e7 00 10 01 40 00 10 02 40
000040 55 58 14 00

Well not that we are going to analyse this. But we see that this generated 68 bytes of binary.

The binary can now be uploaded to flash. I use stm32flash which is just an apt install stm32flash away with debian. For this we need to set the board to bootloader mode, so pull boot0 up to 3.3V and push reset. Then stm32flash should be able to detect the controller using an 3.3V or 5V logic level serial interface.

$ stm32flash /dev/ttyUSB1
stm32flash 0.4


Interface serial_posix: 57600 8E1
Version      : 0x22
Option 1     : 0x00
Option 2     : 0x00
Device ID    : 0x0410 (Medium-density)
- RAM        : 20KiB  (512b reserved by bootloader)
- Flash      : 128KiB (sector size: 4x1024)
- Option RAM : 16b
- System RAM : 2KiB

To flash and run use:

stm32flash -w main.bin /dev/ttyUSB1
stm32flash -g0 /dev/ttyUSB1

Now the board should blink it’s LED.

For full code see here

more in this series:
2. STM32 from scratch, the hardware
3. STM32 from scratch, serial
4. Makefile for embedded projects
5. Enabling C/C++ features
6. Heap, malloc and new
7. STM32 PLL and Crystal Oscillator
8. Interrupts
« STM32 from scratch, the hardware