on 2016-02-20 in stm32-from-scratch
For some future projects I need to bring up a small microcontroller board containing an STM32F103C8T6. But as far as this series is going to reach, most ARM Cortex-Mx devices are similar.
Because I like to really know what is running in my microcontroller projects, I decided to bring up this device from scratch and document what I needed to do. Here, from scratch means I'm allowed to use the basic GNU toolchain (make, g++, ld, as, objcopy), a flash program, the vendor documentation and the register definition headers from STM.
ARM microcontrollers are often said to be complicated and beyond the reach of beginners. I tend to disagree. I started my embedded development with ARM, and I found my trusted GNU toolchain well supported. But I also found big vendor libraries full of cross-compiler compatibility layers and wrappers around simple register manipulation. While these libraries can be a great help if you just want things to run, they tend to stand in the way of real understanding. So here I'm not going to use those libraries.
I made an exception for the hardware register (memory mapped I/O) definitions. Transcribing them from the documentation to C++ code is just error prone and not very enlightening. For this post I extracted just what is needed from those headers to show how they work. But generally, if the vendor releases them under a truly free license, they are nice to use (at least as long as you're not using C++ metaprogramming libraries that need the definitions in quite a different form; Kvasir comes to mind).
Some of the complexity also comes from ARM being a multi-vendor ecosystem with actual choice. Much less is hidden in the toolchain, because the toolchain is shared among different vendors. Having to think about linker scripts is one symptom of this. Linker scripts exist for desktop development too, but there the toolchain ships with one full of deep knowledge of a well standardized platform, so we tend to ignore them and all the raw power they contain. Or maybe we ignore them because their documentation and error handling tend to be a bit rough.
This post details the minimum of C++ code and linker script needed to get the LED on the board to blink. It is actually a bit of code golf, so you shouldn't really use the code from this post. But it's a good start to have something very minimal to explain and later expand upon.
When starting to bring up a microcontroller, the first stop of course is the documentation. In this case it consists of a datasheet (DS5319), a (hardware) reference manual (RM0008), a CPU architecture manual (PM0056) and maybe an errata sheet (look at it early to avoid nasty surprises).
Highlights from the datasheet for now are:
As I am using a ready made board with the microcontroller, I can ignore most of the pinout for now. Of course I need to find out which pin, and thus which GPIO (general purpose input/output), the LED I want to control is attached to. The LED is helpfully labeled PC13, which ends up as the pin with index 13 (zero-based counting) of GPIO port C.
All the other details and characteristics are not needed for now. Great: 100+ pages I don't need to keep in my head yet.
Ok, now what actually happens when this part is powered on? Well actually, no: in what state is it when it starts to execute code, and which code will it start to execute? The datasheet told me that there is an integrated bootloader, so first I need to know how to enable and disable that.
Section 3.4 "Boot configuration" of the reference manual has all I need. There are 2 boot pins (called BOOT0 and BOOT1, but BOOT1 is shared with a GPIO on the chip I use) that are sampled after reset. For normal boot from embedded flash, BOOT0 needs to be pulled low.
As the datasheet tells me, BOOT0 is pin 35, and good light and sharp eyes trace that pin to the upper jumper on the board I use (which came completely without documentation).
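The boot pin combinations from that section fit into a tiny lookup. The sketch below is host-side C++ that only restates the table (BOOT0 low: main flash; BOOT0 high and BOOT1 low: system memory with the ROM bootloader; both high: embedded SRAM). BootSource and bootSource are made-up names for this illustration, not part of any ST header.

```cpp
#include <cassert>

// Hypothetical names, for illustration only: the boot source that the
// boot configuration table in RM0008 selects from the sampled pins.
enum class BootSource { MainFlash, SystemMemory, EmbeddedSram };

// boot1 and boot0 are the logic levels sampled after reset.
constexpr BootSource bootSource(bool boot1, bool boot0) {
    return !boot0 ? BootSource::MainFlash        // BOOT1 is ignored here
         : !boot1 ? BootSource::SystemMemory     // ROM bootloader
                  : BootSource::EmbeddedSram;
}

static_assert(bootSource(false, false) == BootSource::MainFlash,
              "BOOT0 low boots from main flash");
static_assert(bootSource(true, false) == BootSource::MainFlash,
              "BOOT1 does not matter while BOOT0 is low");
```

Assuming BOOT1 is held low on the board, pulling BOOT0 high and resetting selects the system memory, that is, the ROM bootloader.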
With the ROM bootloader out of the way, I can focus on the environment my early boot code has to run in. ARM Cortex-M is a very C/C++ friendly architecture, so my goal is to boot directly into C++ code. While this is certainly not 100% portable C++ and might not even be guaranteed to work with gcc's g++, in practice it works well.
The boot configuration section also has some information about the initial state of the CPU, but it uses somewhat confusing language. So let's refer to the more general documentation. Section 4 of the datasheet contains the memory map in great detail. For now, look at the big picture:
ARM memory maps are quite sparse. The boot-up state of the CPU is described in the STM Cortex-M3 programming manual. Sections 2.1.1, 2.1.2 and 2.1.3 detail the basic CPU registers and their reset state. The most relevant part is that the initial value of the stack pointer is read from the 32-bit word at 0x0000 0000 and the initial value of the program counter from the word at 0x0000 0004. This is actually the start of the vector table (2.3.4), but for now the other vectors can be ignored. It also states that the processor starts in privileged (2.3.2) thread mode using the main stack. By the way, the ARM stack grows towards lower addresses.
No interrupts are enabled at start. Some ARM cores have a memory protection unit (MPU); section 4.2.6 shows that the reset value of its enable register is off.
Back to the reference manual: with BOOT0 pulled low, the addresses starting from 0x0000 0000 are set up to alias 0x0800 0000, that is, the main flash area. That's enough to start writing some code, and some linker configuration. So let's start with the linker script. As I said above, the GNU ARM toolchain doesn't have special knowledge about specific microcontrollers.
OUTPUT_FORMAT(elf32-littlearm)
OUTPUT_ARCH(arm)
This tells ld that it's going to process ARM object files, using 32-bit little-endian ELF as the general object file format. The microcontroller of course doesn't use ELF, but the ELF bits will get stripped off in the very last build step.
MEMORY {
FLASH : ORIGIN = 0x08000000, LENGTH = 64K
}
The MEMORY command declares memory regions that the linker can use for allocating specific usages. For now we only need one region, for the flash part of the memory map. FLASH is just an identifier to refer to this region later. ORIGIN and LENGTH specify the region's start address and size in the address space.
SECTIONS {
.vectors : {
*(.vectors)
} > FLASH
.text : {
*(.text*)
} > FLASH
}
ELF uses named sections for various parts of code and data. For now the SECTIONS command just instructs ld to put the .vectors output section at the start of flash, filling it with the contents of the .vectors sections of the input files. The second part does the same for input sections whose names start with .text, placing them just after the vectors section.
As the flash memory is aliased to 0x0000 0000 on boot, the vectors section will be readable at 0x0000 0000 and 0x0000 0004, so the CPU can set up the stack pointer and program counter registers and start the actual program.
On to the C++ code:
void mainFn() {
// code to follow later
}
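// Vector table: placed at the start of flash by the linker script.
extern void (* const vectors[])() __attribute__ ((section(".vectors"))) = {
(void (*)())0x20000400, // [0] initial stack pointer
mainFn,                 // [1] reset handler
};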
The first part is simple for now. mainFn will be the function where execution starts. But for that to actually happen, the vector table needs to be set up. That's what the second part does. __attribute__ ((section(".vectors"))) instructs g++ to emit this initialized array into the section named .vectors, to be picked up later by the linker and placed at the very start of the flash region. As C++ is typed and most of the vector table will later be filled with pointers to functions, I've chosen void (*)() as the element type of the array. I've also made it const, so that at the C++ level it matches the final placement of the section in flash. As const at namespace scope implies internal linkage in C++, the extern is also needed; without it g++ would not emit this data, which is unused at the C++ level.
Index 0 is the initial value of the stack pointer. 0x2000 0400 is 1 KiB from the start of the SRAM in the memory map. As the stack grows towards lower addresses, a program that overflows the stack will fault, which should be more useful while debugging than silently corrupting whatever memory happens to lie below the stack. 1 KiB should be fine for experimenting, but it's likely much more than most of these programs need. Later this value will be supplied by the linker. The second value is the address of the function to run on reset, also known as the reset handler.
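The stack pointer value is simple arithmetic on the memory map, and it can be checked at compile time. A small host-side sketch; kSramBase, kStackSize and kInitialSp are hypothetical names, and the 1 KiB size is just the arbitrary choice made above.

```cpp
#include <cassert>
#include <cstdint>

// Start of SRAM in the STM32F103 memory map.
constexpr std::uint32_t kSramBase = 0x20000000;
// Stack size reserved for these experiments: 1 KiB, an arbitrary choice.
constexpr std::uint32_t kStackSize = 1024;

// Cortex-M uses a full descending stack: a push first decrements SP and then
// stores, so the initial SP is the address just past the reserved area and
// the first word stored lands at 0x2000'03FC.
constexpr std::uint32_t kInitialSp = kSramBase + kStackSize;

static_assert(kInitialSp == 0x20000400, "matches vectors[0]");
// The ARM procedure call standard requires an 8-byte aligned SP at public interfaces.
static_assert(kInitialSp % 8 == 0, "initial SP must be 8-byte aligned");
```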
The goal of this simple program is to blink the LED attached to GPIO C13, so on to setting up GPIO port C. Section 9 of the reference manual covers the GPIOs. But this block starts in a disabled (i.e. unclocked) state on boot, so section 7.3, the register part of "reset and clock control" (RCC), is our first stop. GPIO port C is enabled by the IOPCEN bit in the APB2ENR register of the RCC. As most peripheral registers on ARM are memory mapped, this will be a read-modify-write of a memory address.
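Spelled out, such a read-modify-write is one load, one bit operation and one store. In this host-side sketch a plain volatile variable stands in for the register; setBits is a hypothetical helper, the post itself just writes RCC->APB2ENR |= ... directly.

```cpp
#include <cassert>
#include <cstdint>

// A read-modify-write on a memory mapped register, step by step.
// On the chip `reg` would be a register such as RCC->APB2ENR.
inline void setBits(volatile std::uint32_t& reg, std::uint32_t mask) {
    std::uint32_t value = reg;  // read: one volatile load
    value |= mask;              // modify: set the requested bits only
    reg = value;                // write: one volatile store
}
```

The three steps are not atomic: if an interrupt handler changed the same register between the load and the store, its change would be lost. With no interrupts in use here, that doesn't matter yet.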
Mapping all the needed registers oneself doesn't help much in understanding or fine control of the system. So I'll use the vendor provided definitions from STM32CubeF1 (I used version 1.2), which are available in BSD (3-clause) licensed form in the directories STM32Cube_FW_F1_V1.2.0/Drivers/CMSIS/Include and STM32Cube_FW_F1_V1.2.0/Drivers/CMSIS/Device/ST/STM32F1xx/Include. But for the program in this post I extracted the relevant code to explain the general setup of the hardware mapping. The parts needed for enabling the GPIO port are:
#include <stdint.h>
#define __IO volatile
typedef struct
{
__IO uint32_t CR;
__IO uint32_t CFGR;
__IO uint32_t CIR;
__IO uint32_t APB2RSTR;
__IO uint32_t APB1RSTR;
__IO uint32_t AHBENR;
__IO uint32_t APB2ENR;
__IO uint32_t APB1ENR;
__IO uint32_t BDCR;
__IO uint32_t CSR;
} RCC_TypeDef;
#define PERIPH_BASE ((uint32_t)0x40000000)
#define AHBPERIPH_BASE (PERIPH_BASE + 0x20000)
#define RCC_BASE (AHBPERIPH_BASE + 0x1000)
#define RCC ((RCC_TypeDef *) RCC_BASE)
#define RCC_APB2ENR_IOPCEN ((uint32_t)0x00000010)
Memory mapped registers are accessed through volatile, which prevents the compiler from removing these accesses or reordering them relative to each other. As is usual in ARM register definitions, all registers of one component are gathered into one struct, and a macro is defined that ultimately ends up as a typecast of its base address to a pointer to this type. Additionally we get macros that map bits in the register to values. So the following code in mainFn now enables the GPIO port component:
RCC->APB2ENR |= RCC_APB2ENR_IOPCEN;
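The struct approach relies on each member being a 32-bit word with no padding, so the members land at the documented offsets. A compile-time check on the host can confirm that. This is a sketch: RccRegs and the k... constants are hypothetical names, and the expected offsets come from the RCC register map in RM0008.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side copy of the RCC_TypeDef layout (volatile spelled out).
struct RccRegs {
    volatile std::uint32_t CR;
    volatile std::uint32_t CFGR;
    volatile std::uint32_t CIR;
    volatile std::uint32_t APB2RSTR;
    volatile std::uint32_t APB1RSTR;
    volatile std::uint32_t AHBENR;
    volatile std::uint32_t APB2ENR;
    volatile std::uint32_t APB1ENR;
    volatile std::uint32_t BDCR;
    volatile std::uint32_t CSR;
};

// Offsets as given in the RCC register map of RM0008.
static_assert(offsetof(RccRegs, CFGR) == 0x04, "CFGR offset");
static_assert(offsetof(RccRegs, APB2ENR) == 0x18, "APB2ENR offset");
static_assert(offsetof(RccRegs, CSR) == 0x24, "CSR offset");

constexpr std::uint32_t kPeriphBase = 0x40000000;
constexpr std::uint32_t kRccBase = kPeriphBase + 0x20000 + 0x1000;
static_assert(kRccBase == 0x40021000, "RCC base address");

// The absolute address that RCC->APB2ENR ends up accessing.
constexpr std::uint32_t kApb2enrAddress =
    kRccBase + static_cast<std::uint32_t>(offsetof(RccRegs, APB2ENR));
```

Because these are static_asserts, a wrong layout fails the build instead of silently poking the wrong register.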
Back to the GPIO configuration. Most GPIOs start up tristated (that is, in high impedance mode). This GPIO block has 4 bits of configuration for each pin, so the configuration is split into two 32-bit registers: CRL for pins 0 to 7 and CRH for pins 8 to 15. The high register thus contains the configuration of pin 13. For now I picked an output mode (MODE = 0b11, maximum speed 50 MHz) with push-pull output (CNF = 0b00), resulting in 0b0011 as the configuration. For the rest of the pins floating input mode (0b0100) is selected, which is also the reset state. But this way it's a simple store instead of a read-modify-write, and the state is more explicitly visible.
As the LED's cathode is connected (via a resistor, of course) to the GPIO, setting the GPIO low will turn on the LED. One way to set one pin low is to write to the port bit reset register (BRR) with the bit corresponding to the pin index set to one. The code (omitting the definition of GPIOC, which is similar to how RCC is defined):
GPIOC->CRH = 0b0100'0100'0011'0100'0100'0100'0100'0100;
GPIOC->BRR = 1 << 13;
After reset the second line is not strictly needed, because the reset state already has all output data bits of the GPIO at 0. But here I opted to be explicit.
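Instead of spelling out the whole binary literal, the CRH value can be derived by replacing the 4-bit field of one pin. A sketch of that, with the field encoding taken from the GPIO register description in RM0008; withPinConfig and the k... constants are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Per-pin 4-bit field: CNF[1:0] in the upper two bits, MODE[1:0] in the lower two.
constexpr std::uint32_t kInputFloating = 0b0100;        // CNF=01, MODE=00 (reset state)
constexpr std::uint32_t kOutputPushPull50MHz = 0b0011;  // CNF=00, MODE=11
constexpr std::uint32_t kCrReset = 0x44444444;          // all 8 pins floating input

static_assert(kCrReset == 0x11111111u * kInputFloating, "reset value is 8x floating input");

// Returns `crh` with the field of `pin` (8..15, as CRH covers the high pins)
// replaced by `config`.
constexpr std::uint32_t withPinConfig(std::uint32_t crh, unsigned pin,
                                      std::uint32_t config) {
    const unsigned shift = (pin - 8) * 4;
    return (crh & ~(0xFu << shift)) | ((config & 0xFu) << shift);
}

static_assert(withPinConfig(kCrReset, 13, kOutputPushPull50MHz) ==
                  0b0100'0100'0011'0100'0100'0100'0100'0100,
              "same value as the literal written to GPIOC->CRH");
```

Starting from 0 instead of the reset value gives 0x0030 0000, the shorter literal used in the final program, which leaves the other high pins of port C in analog mode (0b0000) rather than floating input.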
Ok, on to blinking the LED. I want 1 Hz at 50% duty cycle, so next we need a 500 ms delay. A simple way, when there are no interrupts to add unpredictable extra delay, is to use a busy-wait loop. To get the timing right we need to know the exact generated code and how many cycles each machine instruction takes. And we need to tell g++ not to optimize away the loop, because a wait loop looks just useless to it.
int ctr = 1000;
while (ctr) {
asm("");
--ctr;
}
Apart from the start value for ctr, this simple loop is a good delay loop. The asm(""); doesn't emit any code, but g++'s optimizer doesn't consider this asm statement removable, so it's an easy way to keep the loop from being optimized away. To calculate the right value for ctr we need to look at the generated assembly. I'm using objdump --disassemble here:
1e: f44f 737a mov.w r3, #1000 ; 0x3e8
22: 3b01 subs r3, #1
24: d1fd bne.n 22 <_Z6mainFnv+0x22>
That's the whole loop, extracted from the 20-ish lines of output from objdump. For anyone who has read any kind of assembly this looks rather as expected: an immediate load of the start value (not part of the loop proper), a subtraction on the loop variable and a conditional jump back to the loop start. One thing to watch out for when looking at ARM assembly is that the instruction mnemonics used by ARM changed over time (the current syntax is the Unified Assembler Language, UAL), so depending on the tools and documentation they sometimes mismatch. The subs instruction takes 1 cycle and the bne.n takes 2-4 cycles when taken (see e.g. the ARM Cortex-M3 Processor Technical Reference Manual, 3.3.1). In simple loops the branch is sufficiently easy that, in my experience, it mostly takes 2 cycles. This all assumes that g++ doesn't start to generate different code. So with the CPU still running on the internal 8 MHz clock we get:
int ctr;
ctr = (8000000 / 3) / 2;
// each loop iteration takes 3 cycles to execute.
while (ctr) {
asm ("");
--ctr;
}
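The same arithmetic can be written as a compile-time function, so the start value can be recomputed for other clock rates or delays. A sketch, assuming the 3 cycles per iteration measured above and no interrupts; delayIterations is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Loop iterations for a busy-wait of `delayMs` milliseconds at `cpuHz`,
// assuming 3 cycles per iteration (subs: 1, taken bne.n: 2).
constexpr std::uint32_t delayIterations(std::uint32_t cpuHz, std::uint32_t delayMs) {
    constexpr std::uint32_t cyclesPerIteration = 3;
    // 64-bit intermediate so cpuHz * delayMs cannot overflow.
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(cpuHz) * delayMs
                                      / 1000 / cyclesPerIteration);
}

static_assert(delayIterations(8000000, 500) == (8000000 / 3) / 2,
              "same start value as the hand-written expression");
```

1333333 iterations at 3 cycles each are 3999999 cycles, one cycle short of 0.5 s at 8 MHz.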
Next step is setting the GPIO high to turn the LED off. The GPIO has a bit set/reset register, BSRR, which allows both setting and resetting individual bits. Its lower 16 bits set the corresponding pins, so it can be used just like BRR, only to set bits.
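The effect of both registers on the output data register (ODR) can be modeled on the host. This sketch restates the behavior described in the GPIO register section of RM0008: writing BSRR sets the pins selected by bits 0-15 and resets those selected by bits 16-31 (set wins if both are given for a pin), writing BRR resets the pins selected by bits 0-15, and all other pins keep their value. The function names are hypothetical, and ODR is modeled as a plain 16-bit value.

```cpp
#include <cassert>
#include <cstdint>

// New ODR value after writing `bsrr` to the bit set/reset register.
constexpr std::uint32_t afterBsrrWrite(std::uint32_t odr, std::uint32_t bsrr) {
    const std::uint32_t set = bsrr & 0xFFFFu;  // bits 0-15: set these pins
    const std::uint32_t reset = bsrr >> 16;    // bits 16-31: reset these pins
    return ((odr & ~reset) | set) & 0xFFFFu;   // set applied last, so it wins
}

// New ODR value after writing `brr` to the bit reset register.
constexpr std::uint32_t afterBrrWrite(std::uint32_t odr, std::uint32_t brr) {
    return odr & ~(brr & 0xFFFFu) & 0xFFFFu;
}
```

Because a single store only affects the selected pins, no read-modify-write of ODR is needed, so these writes cannot race with other code changing other pins of the same port.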
So the final blinking code looks like this:
#include "minsys.h"
void mainFn() {
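RCC->APB2ENR |= RCC_APB2ENR_IOPCEN; // enable the clock of GPIO port C
// PC13: push-pull output (0b0011); the other CRH pins become analog inputs (0b0000)
GPIOC->CRH = 0b0011'0000'0000'0000'0000'0000;
GPIOC->BRR = 1 << 13; // PC13 low: LED on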
while (1) {
int ctr;
ctr = (8000000 / 3) / 2;
// each loop iteration takes 3 cycles to execute.
while (ctr) {
asm ("");
--ctr;
}
GPIOC->BRR = 1 << 13; // PC13 low: LED on
ctr = (8000000 / 3) / 2;
// each loop iteration takes 3 cycles to execute.
while (ctr) {
asm ("");
--ctr;
}
GPIOC->BSRR = 1 << 13; // PC13 high: LED off
}
}
extern void (* const vectors[])() __attribute__ ((section(".vectors"))) = {
(void (*)())0x20000400,
mainFn,
};
minsys.h contains the minimal definitions extracted from the mbed board headers.
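The file itself is not reproduced in the post. A plausible minimal version, assuming it follows the same CMSIS layout as the RCC excerpt above and adds the GPIO port C definitions, might look like the sketch below; it is a reconstruction, not the author's file. Its base addresses can be cross-checked against the hexdump further down, whose literal pool contains 0x40011000 (GPIOC) and 0x40021000 (RCC).

```cpp
// minsys.h: a hypothetical reconstruction, not the author's actual file.
#ifndef MINSYS_H
#define MINSYS_H

#include <stddef.h>  // offsetof, for the layout checks
#include <stdint.h>

#define __IO volatile

typedef struct
{
  __IO uint32_t CR;
  __IO uint32_t CFGR;
  __IO uint32_t CIR;
  __IO uint32_t APB2RSTR;
  __IO uint32_t APB1RSTR;
  __IO uint32_t AHBENR;
  __IO uint32_t APB2ENR;
  __IO uint32_t APB1ENR;
  __IO uint32_t BDCR;
  __IO uint32_t CSR;
} RCC_TypeDef;

typedef struct
{
  __IO uint32_t CRL;   // configuration, pins 0-7
  __IO uint32_t CRH;   // configuration, pins 8-15
  __IO uint32_t IDR;   // input data
  __IO uint32_t ODR;   // output data
  __IO uint32_t BSRR;  // bit set/reset
  __IO uint32_t BRR;   // bit reset
  __IO uint32_t LCKR;  // configuration lock
} GPIO_TypeDef;

#define PERIPH_BASE      ((uint32_t)0x40000000)
#define APB2PERIPH_BASE  (PERIPH_BASE + 0x10000)
#define AHBPERIPH_BASE   (PERIPH_BASE + 0x20000)
#define GPIOC_BASE       (APB2PERIPH_BASE + 0x1000)
#define RCC_BASE         (AHBPERIPH_BASE + 0x1000)

#define GPIOC            ((GPIO_TypeDef *) GPIOC_BASE)
#define RCC              ((RCC_TypeDef *) RCC_BASE)

#define RCC_APB2ENR_IOPCEN ((uint32_t)0x00000010)

// Layout checks against the GPIO register map in RM0008.
static_assert(offsetof(GPIO_TypeDef, BSRR) == 0x10, "BSRR offset");
static_assert(offsetof(GPIO_TypeDef, BRR) == 0x14, "BRR offset");

#endif
```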
The last step is to actually compile this and upload it to the board to test:
arm-none-eabi-g++ -c -mcpu=cortex-m3 -mthumb --std=c++14 -O2 -fno-rtti -fno-exceptions main.cpp -o main.o
Here -mcpu=cortex-m3 -mthumb tells g++ which CPU the code should be generated for. While -mthumb looks redundant for a CPU that only supports Thumb code, g++ requires it nevertheless. I use C++14 mode because its constexpr support comes in handy even in embedded development. -fno-rtti and -fno-exceptions disable runtime type information and the complete exception infrastructure in the generated code. This is important because both need runtime support code, and that infrastructure is rather large and thus hard (or impossible) to fit into small microcontrollers.
arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -Tlinkerscript.ld -nostartfiles main.o -o main.elf
Next step is to link the code. -Tlinkerscript.ld specifies the linker script to use, replacing ld's default linker script. In this case I saved the linker script in the file linkerscript.ld. -nostartfiles disables the use of any toolchain provided object files for the entry point and startup.
The linker generates a fully linked output that still contains elf metainformation like section names and various other bookkeeping data.
arm-none-eabi-objcopy -O binary main.elf main.bin
This metadata is stripped away by objcopy when copying to a binary (or ihex) format. In this mode objcopy produces a file that starts with the data of the loaded section with the lowest load address and pads the space between sections with zero bytes. In this example there are no gaps between the sections.
A hexdump of the result still fits nicely in a few lines:
000000 00 04 00 20 09 00 00 08 0b 4b 70 b4 4f f4 00 54
000010 19 46 22 46 09 4d 4f f4 40 16 a8 69 40 f0 10 00
000020 a8 61 5e 60 5c 61 06 4b 01 3b fd d1 04 4b 4a 61
000030 01 3b fd d1 0a 61 f6 e7 00 10 01 40 00 10 02 40
000040 55 58 14 00
We are not going to analyse this in detail, but we can see that the result is just 68 bytes of binary.
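The first two words can still be decoded to confirm that the vector table ended up where the CPU expects it. A host-side sketch; readLe32 and the k... names are hypothetical, and the bytes are copied from the hexdump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble a little-endian 32-bit word from the image at `offset`.
constexpr std::uint32_t readLe32(const std::uint8_t* image, std::size_t offset) {
    return static_cast<std::uint32_t>(image[offset])
         | static_cast<std::uint32_t>(image[offset + 1]) << 8
         | static_cast<std::uint32_t>(image[offset + 2]) << 16
         | static_cast<std::uint32_t>(image[offset + 3]) << 24;
}

// First 8 bytes of main.bin, from the hexdump: the two vector table entries.
constexpr std::uint8_t kImageStart[] = {0x00, 0x04, 0x00, 0x20, 0x09, 0x00, 0x00, 0x08};

constexpr std::uint32_t kInitialSp = readLe32(kImageStart, 0);    // read into SP on reset
constexpr std::uint32_t kResetVector = readLe32(kImageStart, 4);  // read into PC on reset
```

The reset vector is 0x0800 0009 rather than 0x0800 0008: on Cortex-M, bit 0 of a vector must be set to mark Thumb code, and the toolchain sets it automatically for Thumb functions like mainFn, which starts right after the two-word vector table. The last word of the dump, 55 58 14 00, decodes the same way to 0x0014 5855, that is 1333333, the delay loop start value.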
The binary can now be uploaded to flash. I use stm32flash, which on Debian is just an apt install stm32flash away. For this we need to set the board to bootloader mode, so pull BOOT0 up to 3.3V and push reset. Then stm32flash should be able to detect the controller using a 3.3V or 5V logic level serial interface.
$ stm32flash /dev/ttyUSB1
stm32flash 0.4
http://stm32flash.googlecode.com/
Interface serial_posix: 57600 8E1
Version : 0x22
Option 1 : 0x00
Option 2 : 0x00
Device ID : 0x0410 (Medium-density)
- RAM : 20KiB (512b reserved by bootloader)
- Flash : 128KiB (sector size: 4x1024)
- Option RAM : 16b
- System RAM : 2KiB
To flash and run use:
stm32flash -w main.bin /dev/ttyUSB1
stm32flash -g0 /dev/ttyUSB1
Now the board should blink its LED.
For full code see here