korena's blog

2. First signs of life - Part 1

The boot-up process of an ARM processor is an almost exhausted topic. A hobbyist, however, usually stumbles upon extensive theoretical explanations, and she gets so excited about the knowledge she gains that she overlooks the missing links the explanation assumed she already knew, then finds herself not quite knowing what to do. Now, I'm not saying this post will present all there is to know about the topic (you do need some background on the matter), but I will try my best to be as thorough as possible. If you feel the need to, contact me and I will try my best to answer your questions.

I generally log my work in the form of questions and answers, and most of the posts in this series are refurbished old logs, so if you're planning on following this series, you may want to get used to it.

Required resources

The resources you need to start development on a target board include:

  1. Datasheet and user manual of the target (if you're developing on a platform that's wired by your company, someone across the cubicle hall must have these!).

  2. The technical reference manual and architecture overview document of the core.

  3. SoC documentation (very important!)

  4. Thorough documentation on the software you are about to implement.

You might want to spend a considerable amount of time collecting the relevant documents. It is extremely counterproductive to get stuck in the middle of your work because a requisite document is not available when you need it. Again, proper documentation should be a major criterion in your hunt for a development platform.

The boot up process

  • What does an ARM processor do when it wakes up from a cold reset?

Well, to you, the user, it starts executing instructions from a predefined memory location (the reset vector), e.g. 0x00000000, or 0xFFFF0000 on cores configured for high vectors.

  • That looks like a memory address, shouldn't the memory controller be initialized for the processor to have any concept of memory outside its registers?

True that. In fact, a lot more than memory controller initialization takes place before the processor even looks at the code you pass to it, such as clock and cache initialization. We'll look into initialization in a future post.

From your perspective as a user, a blank SoC is not really blank: an immutable piece of code (firmware) resides in the internal ROM of the chip you are using. This iROM is generally on-chip, but outside the ARM core.
This piece of code is referred to as BL0, or boot code, or (confusingly enough!) startup code, or simply the iROM code. The vendor of your SoC (e.g. Samsung) decides what this code does. In a typical ARM-based processor, it performs the following:

  1. Disable the watchdog (so it doesn't fire during the boot process; it's enabled again later).
  2. Initialize the instruction cache (Rough Post on Cache).
  3. Initialize critical peripherals such as the internal SRAM memory controller, and set up a preliminary stack and heap.
  4. Set various clock and PLL configurations.
  5. Check the operation mode (OM) pins to decide which boot device will supply the instructions to execute once BL0 is done. Examples of such boot devices include NAND(1), NOR, eMMC, SD card, UART and USB. The OM bits are typically readable from some control register; we will see about them in a future post ...
  6. Check the integrity of the BL1 code (checksum), and possibly some secure boot or TrustZone related conditions(2). This is configurable by the SoC manufacturer. The TrustZone technology documentation talks about security for users, and some pretty cool stuff, but some Evil consumer electronics corporations can use it to prevent you, the OWNER of the hardware you bought, from altering the software stack on the device, so you can't run an OS of your choice! How screwed up is that?! (TrustZone)
  7. Now that the iROM code (BL0) knows which boot device holds the code to be executed next, it initializes the controller of that device, loads that code from the OM-selected device into the internal SRAM of the chip, and jumps to the loaded code's first instruction.

This is a general overview of what a masked firmware (unchangeable BL0) could do for you. Some are more capable than others, so you may want to check your processor's documentation for the tasks this code performs. Unfortunately, the information one finds is often vague and unclear. I will present the way I figured out how things are done in the Samsung S5PV210 on the board that I own, which should hopefully give you a good idea of how to figure it out yourself. I must admit though, the documentation Samsung produced for the S5PV210 iROM boot code is unusually good; you may not have access to such documentation for your particular chip.
I should add that, if you have access to a JTAG link, you can actually observe the behavior of the system and the various register values at an early boot stage, and figure out exactly what's happening before the processor is aware of user code.

Now this code, the one loaded in the last step above, is the code you are generally aware of. In systems that use bootloaders, it is called the first stage bootloader, or BL1. If you're a bare-metal kind of person, this is what you would call startup code.

S5PV210 IROM code and boot-up process

Moving on to our Dashboard project, I got a hold of the Application Note (Internal ROM Booting) S5PV210 RISC Microprocessor NOV 23, 2009 Preliminary REV 0.3 document, which details the boot process wonderfully. One thing I am still confused about, though, is that this document is marked Samsung Confidential! I am not sure where I got it from, but I reckon it could not possibly be confidential, for it is impossible to get the processor to boot without the information it presents. I did a Google search, and it turns out that many Chinese (maybe Korean!) blogs actually have it attached. Perhaps that was my source, but I really do not remember.
Now, this document spells out the boot process of this processor in a rather detailed manner. As one would expect, it's a big document, and you can't possibly read every word of the documentation provided by your SoC vendor, so you have to know exactly what you're looking for.

In this project's case, I needed to know the following:

  1. What exactly does the iROM code do?

  2. What booting devices are available, and what does the iROM expect from me, in order to smoothly run the code I intend to write?

  3. Since I will be dealing with the internal SRAM of the SoC in the first phase of the boot-up process (because BL1/startup code is eventually copied to it by BL0), what does the memory map of the internal SRAM look like?

  4. If I get into trouble in the booting process, what is the error handling procedure implemented in the BL0 code? And what error codes will I get if I do something wrong?

Now let's get into the implications of each question. The first question was asked because I am a pretty lazy programmer: I would not want to redo things that the ARM core needs in order to run if they were already done for me in the BL0 stage, like watchdog disabling, cache initialization, etc. In addition to that, if you are a micro-optimization freak, you wouldn't want to lose those precious microseconds unnecessarily.

And so I went on to the section intuitively titled: Operating Sequence

In this section, to my delight, I found the following figure:

V210 boot up process

BL1 / BL2 : It can be variable size copied from boot device to internal SRAM area.

BL1 max. size is 16KB. BL2 max. size is 80KB.

① iROM can do initial boot up : initialize system clock, device specific controller and booting device.

② iROM boot codes can load boot-loader to SRAM. The boot-loader is called BL1.
then iROM verify integrity of BL1 in case of secure boot mode.

③ BL1 will be executed: BL1 will load remained boot loader which is called BL2 on the SRAM
then BL1 verify integrity of BL2 in case of secure boot mode.

④ BL2 will be executed : BL2 initialize DRAM controller then load OS data to SDRAM.

⑤ Finally, jump to start address of OS. That will make good environment to use system.

The document left no room for speculation. In the following section, it informed me that:

2.2 iROM(BL0) boot-up sequence (Refer 2.3 V210 boot-up diagram)
  1. Disable the Watch-Dog Timer
  2. Initialize the instruction cache
  3. Initialize the stack region
  4. Initialize the heap region.
  5. Initialize the Block Device Copy Function.
  6. Initialize the PLL and Set system clock.
  7. Copy the BL1 to the internal SRAM region
  8. Verify the checksum of BL1.
    If checksum fails, iROM will try the second boot up. (SD/MMC channel 2)
  9. Check if it is secure-boot mode or not.
    If the security key value is written in S5PV210, It’s secure-boot mode.
    If it is secure-boot mode, verify the integrity of BL1.
  10. Jump to the start address of BL1

Now that's what I call thorough documentation! So, at this point, I know that my BL1 code need not repeat any of the steps above. I might have to set the processor to supervisor mode, but that's about it(3)!

At this point, I had figured out what the iROM code does for me. Moving on to the next question, the available boot devices are listed in the boot diagram above, so no further in-document digging is required. Having gained this knowledge, I decided to use the MMC card, 'cause I have it!
UART booting could have been a good choice as well, because it needs no checksum verification (this is stated in the section that describes UART booting in the same document), but I decided to go with the MMC card because I would like to demonstrate the process of assembling a binary that abides by the rules stated in the documentation of the target system.

For the second part of the second question, I need to figure out what the iROM code expects from me. From the Memory Map diagram (below) and the rest of the document, you can see that the iROM code expects the following:

  • The first instruction of BL1 is at address 0xD0020010.
  • The BL1 code is at most 16KB in size.
  • Some information is attached to the BL1 binary.
  • This information is 4 words in length.
  • The 4 words form a 'header' to the BL1 binary, meaning they come BEFORE the binary, which starts at 0xD0020010.
  • The contents of this header are:

0x00: The size of BL1 (so that the iROM internal copy function can move BL1 from MMC to SRAM)

0x04: zero word (we don't need to know why)

0x08: precalculated checksum of BL1 (for verification by iROM code)

0x0C: zero word (we don't need to know why)

The above info was collected from various sections of the documentation; it is mentioned multiple times, in great detail, throughout the document.
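As a rough illustration of the header layout listed above, here is a minimal C sketch. The struct, its field names and the macro names are my own (hypothetical), and the header address is simply derived from the 0xD0020010 start address minus the 4-word header; only the offsets and the 16-byte size come from the notes above.

```c
#include <stddef.h>
#include <stdint.h>

/* The first BL1 instruction is expected at 0xD0020010, and the 4-word
 * header sits right before it, so the header address is derived here. */
#define BL1_CODE_ADDR   0xD0020010u
#define BL1_HEADER_SIZE 16u
#define BL1_HEADER_ADDR (BL1_CODE_ADDR - BL1_HEADER_SIZE)

/* Field names are hypothetical; only the offsets come from the notes above. */
struct bl1_header {
    uint32_t bl1_size;   /* 0x00: size of BL1 */
    uint32_t reserved0;  /* 0x04: zero word */
    uint32_t checksum;   /* 0x08: precalculated checksum of BL1 */
    uint32_t reserved1;  /* 0x0C: zero word */
};

/* Compile-time checks that the layout matches the offsets listed above. */
_Static_assert(sizeof(struct bl1_header) == BL1_HEADER_SIZE,
               "header must be exactly 4 words");
_Static_assert(offsetof(struct bl1_header, checksum) == 0x08,
               "checksum must sit at offset 0x08");
```

With this layout, BL1_HEADER_ADDR works out to 0xD0020000, i.e. the header occupies the 16 bytes just below the first BL1 instruction.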

To find an answer to the third question, I looked up the internal memory map section of the document. You need to pay attention to the fact that this memory map is not the global SoC memory map; it shows the iROM and SRAM memory regions only. The document presents the following figure under the Memory Map section:

Internal memory map (S5PV210)

Moving on to the fourth and last question, I found the error code information I needed in the last section, titled 5. ERROR HANDLING. The section tells me that, basically, the iROM code reports errors as PWM signals on XPWMTOUT0, and each error source produces a different duty cycle. I got a bit worried at this point, because I do not have access to a logic analyzer at home, so there isn't a way for me to figure out which error signal I am getting if I ever get one. So I went on to check whether the tinySDK has anything attached to the XPWMTOUT0 pin of the processor (by checking the schematic), and found a buzzer that is driven directly by the pin. I will at least get some squeaking noise if I do something wrong, but I won't really know what went wrong(4).

Building binaries

The first step is to get my environment set up. I will not get into the details of this process; you can probably find hundreds of resources on this topic online.

CrossCompileEnv.sh:

#!/bin/bash
export PATH=/home/korena/Development/Compilers/gcc-arm-none-eabi-4_7-2013q1/bin:$PATH
export ARCH=arm
export CROSS_COMPILE=arm-none-eabi-
 
as='as'
ld='ld'
GNUCROSSCOMPILER='gcc'
ar='ar'
nm='nm'
strip='strip'
objcopy='objcopy'
objdump='objdump'
 
export AS=$CROSS_COMPILE$as
export LD=$CROSS_COMPILE$ld
export AR=$CROSS_COMPILE$ar
export GCC=$CROSS_COMPILE$GNUCROSSCOMPILER
export NM=$CROSS_COMPILE$nm
export STRIP=$CROSS_COMPILE$strip
export OBJCOPY=$CROSS_COMPILE$objcopy
export OBJDUMP=$CROSS_COMPILE$objdump

You should know that the above can only get you started. At some point, you need to write a Makefile and move on to a more productive development 'mode'. We'll get to that once we pass the bumpy phase of BL1 code loading.

We are now ready to write some code. Let's start by creating the simplest BL1 binary ever ... an infinite loop that does absolutely nothing!

ridiculouslySimpleStartup.s:

.text
.code 32
.global _Reset
_Reset:
  B .

The code is placed in the text section and set to ARM mode, the _Reset label is made visible to the linker, and a branch to the current address is the first and only instruction at the _Reset label (the equivalent of while (1);).

simple_linker.lds:

OUTPUT_FORMAT("elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(_Reset)
SECTIONS
{
 . = 0xd0020010;
 .text : {
 ridiculouslySimpleStartup.o
 *(.text)
 }
 .data : { *(.data) }
 .bss : { *(.bss) }
}

This defines the entry point of the linked code to be _Reset and sets the current address (before any text section particulars) to match the start address found in the iROM application note above. The sections of ridiculouslySimpleStartup.o are placed at the beginning of the linker script's text section. The data and bss sections are defined for the heck of it; we have no use for them at the moment. If we did, we would have matched their addresses to the stack region of the internal memory map we found in the iROM document above.

The two files above are all we need to get a working BL1 binary to test the boot-up process. It might seem too simple, but given that I have no way of reading the system's error codes, I wanted to eliminate any possible coding errors that could distract me from the target of the current task, which is to get the processor not to produce any buzzing sound, since a buzz indicates a problem in the BL1 loading process.

Next, we will compile and link the above into a single binary file, and call it BL1.bin:

korena@korena-solid:~/$ chmod +x CrossCompileEnv.sh
korena@korena-solid:~/$ . ./CrossCompileEnv.sh
korena@korena-solid:~/$ $AS ridiculouslySimpleStartup.s -o ridiculouslySimpleStartup.o
korena@korena-solid:~/$ $LD -T simple_linker.lds -o BL1.elf -Map BL1.map ridiculouslySimpleStartup.o
korena@korena-solid:~/$ $OBJCOPY -O binary BL1.elf BL1.bin

Lines (1) and (2) prepare the environment for cross compiling, line (3) assembles the startup assembly code, line (4) links the object file into an ELF, and finally, line (5) copies the binary contents of BL1.elf into a 'stripped' BL1.bin file. A BL1.map file is also produced, only for reference though.

Bootable MMC card

This is by far the most boring process you'll have to go through if your target board's vendor does not extend Linux users the courtesy of providing tools to carry out trivial tasks like this one! I got lucky to have had access to proper documentation from Samsung, so I know exactly what the SoC expects in the bootable image. What's left is writing the code to produce the expected image.

What the documentation says about it:

MMC card block assignment

So we know that block 0 is not to be touched, and that BL0 will look for BL1 in block 1 of the bootable image. The format is FAT32 (because I said so). In addition to the BL1 header information above, that's all we need to know to make a working image.

Building host tools for BL1 fusing

After compiling BL1.bin, we need to do two things. First, we have to calculate its size. One might think that doing something like "ls -l" in the directory of the BL1.bin file is sufficient, as it gives you the size of the bin file, but you do not want to do that by hand every time you change your BL1 code. You should expect this code to go through many, many changes before it rests at its final state, so the lazy approach of "ls -l" will not do. The second thing we need to do is calculate the checksum of BL1.bin, because BL0 expects it. This information is presented in the iROM application note, where an example of the code you need to write to calculate the checksum is given as:

for(count=0;count< dataLength;count+=1){
    buffer = (*(volatile u8*)(uBlAddr+count));
    checkSum = checkSum + buffer;
}

This code simply loads one byte at a time from the binary and accumulates all the bytes into a single number. We'll achieve the same result.
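To make the byte-wise sum concrete, here is a small self-contained sketch of the same loop operating on an in-memory buffer. The function name bl1_checksum is my own (hypothetical), and the 32-bit accumulator is an assumption based on the checksum occupying one word of the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum every byte of the image into a 32-bit accumulator, mirroring the
 * loop from the application note. The name is hypothetical. */
uint32_t bl1_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

As a worked example, the single ARM instruction B . encodes as 0xEAFFFFFE, which is stored little-endian as the bytes FE FF FF EA. Their sum is 0xFE + 0xFF + 0xFF + 0xEA = 0x3E6 (998), so that is the checksum this function returns for a binary containing nothing but that instruction.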

imageMake.c:

#include<stdio.h>
#include<stdint.h>
 
/*=========complimentary defines=========*/
#define REDTEXT  "\x1B[31m"
#define YELLOWTEXT  "\x1B[33m"
 
/*===========Macro defines==============*/
#define max_BL1_length 16384
 
 
#define INPUT  "BL1.bin"
#define OUTPUT  "BL1.boot"
#define BL1_LENGTH (16*1024)
#define DEBUG  1
 
/*===========Prototypes================*/
void checksum_calc(uint32_t *checksum,uint32_t *length);
int makeImage(uint32_t *checksum,uint32_t *length);
void showResults(char *inputFileName,char *outputFileName);
 
 
 
 
int main(int argc,char** argv){
 
  uint32_t checksum = 0;
  uint32_t length = 0;
 
  (void)argc;  /* no command line arguments are used */
  (void)argv;
 
  /*read in BL1.bin,calculate checksum, and get the length while you're at it ...*/
  checksum_calc(&checksum,&length);
  /*check if the file is too big to fit ...*/
  if(length > BL1_LENGTH){
    printf("dude, this thing wont fit! (Size is %u)\n\n",(unsigned)length);
    return 2;
  }
  /*create the output image, bail out if it could not be written*/
  if(makeImage(&checksum,&length) != 0){
    return 1;
  }
   
#ifdef DEBUG
  showResults(INPUT,OUTPUT);
#endif
  return 0;
}
 
void checksum_calc(uint32_t *checksum,uint32_t *length){
  size_t guard = 1;
  uint32_t data = 0;
  FILE* file = fopen(INPUT, "rb");
   
  if(file == NULL){
    printf("error opening file\n");
    return;
  }
   
  /* read one 32-bit word at a time until fread returns 0 (end of file);
     a trailing partial word (size not a multiple of 4) would be ignored */
  while ((guard = fread(&data, sizeof(uint32_t), 1, file)))
  {
    *length += 4;
    /* add the four bytes of the word individually */
    *checksum += ((data >> 0) & 0xff);
    *checksum += ((data >> 8) & 0xff);
    *checksum += ((data >> 16) & 0xff);
    *checksum += ((data >> 24) & 0xff);
  }
  printf("Input File Length: %u Bytes\n", (unsigned)*length);
  printf("Input File Checksum: %u\n", (unsigned)*checksum);
  fclose(file);
}
int makeImage(uint32_t *checksum,uint32_t *length){
  FILE *output = NULL;
  FILE *input = NULL;
  size_t guard = 1;
  uint32_t data = 0;
   
  input = fopen(INPUT, "rb");
  output = fopen(OUTPUT,"wb");
   
  if(output == NULL || input == NULL){
    printf("Error opening file\n\n");
    /* close whichever file did open */
    if (input != NULL) fclose(input);
    if (output != NULL) fclose(output);
    return 1;
  }
  /* header: length (forced to 16KB), zero word, checksum, zero word */
  *length = BL1_LENGTH;
  fwrite(length, sizeof(uint32_t), 1, output);
  fwrite(&data, sizeof(uint32_t), 1, output);
  fwrite(checksum, sizeof(uint32_t), 1, output);
  fwrite(&data, sizeof(uint32_t), 1, output);
  printf("wrote BL1 expected header:\n");
  printf("BL1 length: %u\n",(unsigned)*length);
  printf("Reserved:   %u\n",(unsigned)data);
  printf("BL1 CS:     %u\n",(unsigned)*checksum);
  printf("Reserved:   %u\n",(unsigned)data);
  /* copy the BL1 data word by word, right after the header */
  while ((guard = fread(&data, sizeof(uint32_t), 1, input)))
  {
    fwrite(&data, sizeof(uint32_t), 1, output);
  }
  printf("wrote BL1 data...\n");
   
  fclose(input);
  fclose(output);
   
  return 0;
}
 
void showResults(char *inputFileName,char *outputFileName){
   
  FILE *input = NULL;
  FILE *output = NULL;
  uint32_t data;
  int guard = 10;
   
  input = fopen(inputFileName,"rb");
  output = fopen(outputFileName,"rb");
   
  if(input == NULL || output == NULL){
    printf("ERROR reading files from showResults function ...\n");
    if (input != NULL) fclose(input);
    if (output != NULL) fclose(output);
    return;
  }
   
  printf("showing the first 10 words of each file ...\n");
   
  printf("%s contents:\n",inputFileName);
  /* stop after 10 words or at end of file, whichever comes first,
     and only print a word that was actually read */
  while(guard && fread(&data,sizeof(uint32_t),1,input) == 1){
    printf("0x%08X\n",(unsigned)data);
    guard -= 1;
  }
  guard = 10;
  printf("%s contents:\n",outputFileName);
  while(guard && fread(&data,sizeof(uint32_t),1,output) == 1){
    printf("0x%08X\n",(unsigned)data);
    guard -= 1;
  }
  fclose(input);
  fclose(output);
}

NOTE: If you understand what's happening above, skip the next three paragraphs ...

Here's what's up with imageMake.c. There are three functions. The first, checksum_calc, takes two arguments passed by reference: a pointer to checksum and another to length. The purpose of this function is to calculate these two values. It starts by opening the BL1.bin file, and proceeds to calculate its checksum by reading 4 bytes at a time and adding each byte's value to the contents of the checksum address. Inside the same loop, the contents of the length address are incremented by four, because the final length is expressed in bytes but the loop reads four bytes at a time. The loop breaks when guard equals zero, which denotes end of file. The function wraps up by closing the file pointer.
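If only the size is needed, it can also be obtained without reading the whole file. The sketch below is one way to do it with fseek and ftell; the names file_size, fits_in_bl1 and BL1_MAX_SIZE are my own (hypothetical), and seeking to the end of a binary stream is assumed to work as it does on Linux hosts, which ISO C does not strictly guarantee.

```c
#include <stdio.h>

#define BL1_MAX_SIZE (16L * 1024L)  /* 16KB limit from the iROM notes */

/* Returns the size of the file in bytes, or -1 on error.
 * Hypothetical helper, not part of imageMake.c. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size = -1;

    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}

/* Nonzero when a binary of the given size fits in the BL1 limit. */
int fits_in_bl1(long size)
{
    return size >= 0 && size <= BL1_MAX_SIZE;
}
```

Unlike the word-by-word loop, this reports the exact byte count even when the file size is not a multiple of four.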

The second function is makeImage, which takes the same arguments as checksum_calc, because the values of checksum and length are needed to construct the final boot image with the header included. A new file named BL1.boot is created. The first thing the function does is write the length of BL1 in the header as expected. Note that the length is set to the full maximum length of BL1, which is 16KB, to avoid a mismatch between the actual length and the length expected in the BL1 header. The iROM document states that BL1 can have a variable length that should not exceed 16KB, but after spending some time trying to fit the actual length of the binary into the header, and failing at least 6 times, I decided to just cheat BL0 by passing it 16KB as the length of the binary. I believe the length entry in the header is used by the function that copies from the MMC card to the internal SRAM. The downside is that BL0 will spend extra time loading nonexistent data to fully cover the 16KB it sees in the header, and possibly waste some more time calculating the checksum by adding zero values that do not change it. I am OK with that, mainly because this is just the beginning: we will have more contents in BL1 for low-level initialization later, and the number of zero bytes added will not be as big as it is now.
makeImage proceeds to insert a word of zeros as expected in the header, the checksum follows, and then another word of zeros. Then comes the actual BL1 data, inserted as it is, right after the header info.
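One detail worth spelling out: makeImage writes each header word with fwrite on a uint32_t, so the bytes land in the host's byte order. That happens to match the little-endian target (the linker script asks for elf32-littlearm) on a little-endian host such as x86. A sketch of a host-independent alternative follows; put_le32 and encode_bl1_header are hypothetical names, and the assumption that the header words are little-endian like the rest of the image is mine.

```c
#include <stdint.h>

/* Store a 32-bit value least-significant byte first, regardless of the
 * host's byte order. Hypothetical helper. */
void put_le32(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)(value & 0xffu);
    out[1] = (uint8_t)((value >> 8) & 0xffu);
    out[2] = (uint8_t)((value >> 16) & 0xffu);
    out[3] = (uint8_t)((value >> 24) & 0xffu);
}

/* Fill a 16-byte header buffer: size, zero word, checksum, zero word. */
void encode_bl1_header(uint8_t out[16], uint32_t size, uint32_t checksum)
{
    put_le32(out + 0, size);
    put_le32(out + 4, 0);
    put_le32(out + 8, checksum);
    put_le32(out + 12, 0);
}
```

For example, a size of 16384 (0x4000) and a checksum of 998 (0x3E6) encode as the bytes 00 40 00 00, 00 00 00 00, E6 03 00 00, 00 00 00 00, which could then be written out with a single fwrite of 16 bytes.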

The last function, showResults, is only called if DEBUG is defined. It was written to find out why the target board's buzzer would not shut up every time I tried SD card booting (indicating an error). As stated in the last paragraph, the reason was the file length mismatch. What this function does is basically show the first 10 words of BL1.bin and BL1.boot for me to compare.

Compiling (not CROSS_COMPILING!) and running imageMake.c will produce a bootable BL1 image called BL1.boot.

Formatting the MMC

Time to format the MMC card. We'll be using fdisk for this, but really, you should use cfdisk.

#fdisk /dev/mmcblk0  (careful here!!)
Command (m for help): d
Command (m for help): n
Command (m for help): w

Go with defaults whenever prompted for a choice. Now that we have created a partition, we need to format it. Here's a simple bash script that gets the job done:

fuseBin.sh:

#!/bin/bash
device="/dev/mmcblk0"
partition="/dev/mmcblk0p1"
 
umount $partition 2> /dev/null
echo "mkfs.vfat -F 32 $partition"
mkfs.vfat -F 32 $partition
echo "writing BL1 ..."
dd iflag=dsync oflag=dsync if=BL1.boot of=$device seek=1

The script starts by defining the device and partition. Those are SPECIFIC TO MY SETUP, and you could suffer a great deal of pain if you copy-paste and then run this script! So what's going on in this script? Well, we're basically formatting the partition we created with fdisk earlier, and before we can format it, we have to unmount it. Then we write the BL1.boot binary to block 1 of the MMC card. The seek=1 is important: it makes dd skip one block (512 bytes by default) of the output, because we do not want the write to default to block 0, which is not to be touched according to the iROM documentation.
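To show what seek=1 amounts to, here is a rough C equivalent of that dd step: open the destination for update without truncating it, seek one block in, and copy the image there. The function name write_at_block is hypothetical, and the 512-byte block size is an assumption taken from dd's default. Pointing this at a real device is just as dangerous as the script above, so try it on a plain file first.

```c
#include <stdio.h>

#define BLOCK_SIZE 512L  /* assumption: dd's default block size */

/* Copy the file at src_path into dst_path starting at the given block,
 * leaving earlier blocks untouched. Returns the number of bytes
 * written, or -1 on error. Hypothetical helper. */
long write_at_block(const char *dst_path, const char *src_path, long block)
{
    FILE *src = fopen(src_path, "rb");
    FILE *dst = fopen(dst_path, "r+b");  /* update in place, no truncation */
    long total = -1;

    if (src != NULL && dst != NULL &&
        fseek(dst, block * BLOCK_SIZE, SEEK_SET) == 0) {
        char buf[512];
        size_t n;

        total = 0;
        while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
            if (fwrite(buf, 1, n, dst) != n) {
                total = -1;  /* short write */
                break;
            }
            total += (long)n;
        }
    }
    if (src != NULL) fclose(src);
    if (dst != NULL) fclose(dst);
    return total;
}
```

Calling write_at_block with block set to 1 starts writing at byte offset 512, which is where BL0 is said to look for BL1, while block 0 stays as it was.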

Conclusion

The target board's buzzer finally stopped indicating errors, which was the intended result of this post. One might say that what we did here was extremely unproductive: imagine having to change one line of assembly in the startup file, you would have to go through this whole process again!! This is why you should thoroughly study the development target you intend to buy, because lack of support from the vendor could reduce your productivity by a factor of a million! Luckily enough, the SoC used in the tinySDK I am using is produced by Samsung, so we Linux users have a fairly good number of utilities provided by Samsung for their chip. They aren't as convenient as the ones Windows users have, but they do the job, and they afford you more flexibility because of their extremely simple nature, so you can build your development environment exactly the way you'd like! From now on, the posts in this series will make use of such tools, but keep in mind that even if you have access to none, you can still pave your own way if you have access to good documentation.


(1) Flash memory. Your smartphone is set to this choice by default. All the hold power + volume up/down instructions you get from your phone's vendor basically instruct the BL0 (or BL1, as the vendor chooses) code to choose another device for loading the next executable code, moving the boot-up process to another device.

(2) Some features of the SoC are configurable at the time it is manufactured. Those features basically decide whether a software integrity check is used to establish the authenticity of software before executing it. If the check fails, the software (BL1) will not be executed, some sort of error code will be delivered to you, and some fail procedures will be carried out according to the BL0 implementation by the SoC manufacturer. Otherwise, the system will continue through BL1 (passes integrity check), BL2 (passes integrity check) and finally the signed OS that is provided by the Evil company. Read about ARM's TrustZone technology; there is clear and thorough documentation in the ARM info center ... enjoy!

(3) Some might argue that invalidating the cache is a necessary step whenever one changes execution context. We may come back to this later, but I probably will not :-)

(4) I could have experimented with the buzzer sound under different conditions and memorized the difference between the sounds each error code produces, because I have only three possible errors according to my boot-up choices and the table of error codes in the documentation. But the information in the document was clear enough that I didn't really worry about getting it wrong.

References

S5PV210_IROM_APPLICATION NOTE_REV 0.3.pdf
u-boot source code for s5pcXX Samsung chips.