korena's blog

11. Bootloader implementation - part 1

The goal of this milestone is to implement some essential functionality in our system-specific second stage bootloader, BL2. As a first step, we will get the bootloader to the point where it can load a Linux zImage into DRAM and have it successfully run the decompressor, which in turn jumps to the kernel startup entry point in Linux/arch/arm/kernel/head.S. We want both the decompressor and the kernel startup code to be able to print early debugging messages to our already initialized UART interface, so we have to configure the Linux image to do so. That is the extent of our modifications: we're not porting the Linux kernel to our system just yet, because our bootloader is still too primitive to support us through the porting process.

Compiling Linux

We will be working with linux-4.4.1 because it's the kernel I have been using for an ongoing project. That isn't much of a justification, but it gives us a starting point for getting Linux to work on our Tiny210v1 board, and we can change it if we conclude that it is missing something essential, which is unlikely.

All we need from the Linux kernel at this point is the ability to hook into our initialized UART interface, which means we have to change the kernel's debugging UART setting before compiling.
For cross compiling Linux, I used Linaro's gcc-linaro-5.1-2015.08-x86_64_arm-linux-gnueabihf compiler, and went through the following (standard) steps to compile the kernel:

/h/k/linux-4.4.1$ make mrproper
CLEAN   scripts/basic
CLEAN   scripts/kconfig
CLEAN   .config
/h/k/linux-4.4.1$ ARCH=arm make s5pv210_defconfig
HOSTCC  scripts/basic/fixdep
HOSTCC  scripts/kconfig/conf.o
SHIPPED scripts/kconfig/zconf.tab.c
SHIPPED scripts/kconfig/zconf.lex.c
SHIPPED scripts/kconfig/zconf.hash.c
HOSTCC  scripts/kconfig/zconf.tab.o
HOSTLD  scripts/kconfig/conf

Then I edited only one line of the generated .config file in the root directory: the line CONFIG_DEBUG_S3C_UART=1 becomes CONFIG_DEBUG_S3C_UART=0, so that the debugging console of the compiled kernel matches our initialized UART console. We also need to make sure that the option CONFIG_DEBUG_S3C_UART0=y is set. That's all we're going to do here, since the defconfig we applied takes care of the SoC-specific configuration we need for now. This will not be the case when we want a proper kernel, where we'd have to go through other configuration options for optimization, but it is enough for now.

After modifying the .config file, we proceed to compile the kernel:

/h/k/linux-4.4.1$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make

This will create a bunch of files under /linux-4.4.1/arch/arm/boot. We only care about zImage here, which is the compressed Linux kernel that we intend to load into DRAM. We could have redirected the build output of the last make command by specifying O=, but we did not.

Copying the kernel

To get the compressed kernel image to our bootloader, we're going to ship it along with our bl1 and bl2 on the MMC card we have. We will place it at a specific block address, and have our bl2 code fetch it from that address and load it into memory for execution. Note that this isn't the best way to go: loading the kernel through USB or TFTP is a much better option than shuttling it on an MMC card throughout the development process.

For our bl2 code to be able to copy the zImage we provide, it needs to know the address at which the zImage resides on the MMC card. It also needs to know the size of the image, so it knows when to stop copying.

Let's start by viewing the size of the zImage we have:

/linux-4.4.1/arch/arm/boot$ ls -l
drwxrwxr-x 2 korena korena    4096 Apr 18 16:22 bootp
drwxrwxr-x 2 korena korena    4096 Apr 18 16:22 compressed
drwxrwxr-x 3 korena korena   73728 Apr 18 16:23 dts
-rwxrwxr-x 1 korena korena 3024256 Apr 18 16:22 Image
-rw-rw-r-- 1 korena korena    1648 Apr 18 16:22 install.sh
-rw-rw-r-- 1 korena korena    3137 Apr 18 16:22 Makefile
-rwxrwxr-x 1 korena korena 1516384 Apr 18 16:22 zImage

We see that the size is 1516384 bytes. Since we 'dd' data into our MMC card in blocks of 512 bytes, we will be transferring 1516384/512 = 2961.69, rounded up to 2962 blocks, into the MMC card using our fuseBin.sh script, so we modify it to become:

#!/bin/bash
# for home
device="/dev/mmcblk0"
partition="/dev/mmcblk0p1"
# for work
#device="/dev/sdd"
#partition="/dev/sdd1"

#umount $partition 2> /dev/null
#echo "mkfs.vfat -F 32 $partition"
#mkfs.vfat -F 32 $partition
#echo "padding 16k ..."
#dd iflag=dsync oflag=dsync if=/dev/zero count=32 of=$device seek=1
echo "writing BL1 ..."
dd iflag=dsync oflag=dsync if=./target_bin/BL1.bin.boot of=$device seek=1
echo "writing bootloader ..."
dd iflag=dsync oflag=dsync if=./target_bin/BL2.bin.img of=$device seek=33
echo "clearing previous zImage..."
dd if=/dev/zero of=$device seek=160 count=2999
echo "writing kernel zImage ..."
dd iflag=dsync oflag=dsync if=./kernel/images/active/zImage of=$device seek=160

We're placing the zImage at block 160 of our MMC card (dd's seek=160, counting from block 0). Since our bootloader will use this piece of information to find the zImage, we have to take note of it and make sure we pass it to the bootloader somehow.
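
The block arithmetic behind the script can be written down explicitly. This is a small host-side sketch, not part of the bootloader: the helper names are hypothetical, and the block numbers are simply the seek= values from fuseBin.sh.

```c
#include <assert.h>
#include <stdint.h>

#define MMC_BLOCK_SIZE      512u  /* dd's default block size */
#define BL1_START_BLOCK     1u    /* seek=1 in fuseBin.sh */
#define BL2_START_BLOCK     33u   /* seek=33 in fuseBin.sh */
#define ZIMAGE_START_BLOCK  160u  /* seek=160 in fuseBin.sh */

/* Number of whole blocks needed to hold `bytes` bytes (rounds up). */
static uint32_t blocks_needed(uint32_t bytes, uint32_t block_size)
{
    assert(block_size != 0);
    return bytes / block_size + (bytes % block_size != 0);
}

/* Byte offset on the card at which a given block starts. */
static uint64_t block_to_offset(uint32_t block)
{
    return (uint64_t)block * MMC_BLOCK_SIZE;
}

/* Blocks available to an image starting at `start` before the next image at `next`. */
static uint32_t blocks_available(uint32_t start, uint32_t next)
{
    assert(next >= start);
    return next - start;
}
```

With the values from this post, blocks_needed(1516384, MMC_BLOCK_SIZE) gives 2962, block_to_offset(ZIMAGE_START_BLOCK) gives 81920, and blocks_available(BL2_START_BLOCK, ZIMAGE_START_BLOCK) gives 127, so BL2.bin.img can be at most 127 * 512 = 65024 bytes before it runs into the zImage region.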

Setting up BL2 for Linux

The Linux kernel demands that a couple of things be passed to it, and it also requires the system to be in a certain state before it can boot properly. Instead of repeating all of that here, I'll point you to the file Linux/Documentation/arm/Booting in the kernel source documentation. Following that document (and the other references linked below), specifically the story of ATAGS, we added the following structures and functions to our bl2/src/bootloader.c:

/* list of possible tags */
#define ATAG_NONE       0x00000000
#define ATAG_CORE       0x54410001
#define ATAG_MEM        0x54410002
#define ATAG_VIDEOTEXT  0x54410003
#define ATAG_RAMDISK    0x54410004
#define ATAG_INITRD2    0x54420005
#define ATAG_SERIAL     0x54410006
#define ATAG_REVISION   0x54410007
#define ATAG_VIDEOLFB   0x54410008
#define ATAG_CMDLINE    0x54410009

/* structures for each atag */
typedef struct atag_header {
        uint32_t size; /* length of tag in words including this header */
        uint32_t tag;  /* tag type */
} header_tag;

struct atag_core {
        uint32_t flags;
        uint32_t pagesize;
        uint32_t rootdev;
};

struct atag_mem {
        uint32_t     size;
        uint32_t     start;
};

struct atag_videotext {
        uint8_t              x;
        uint8_t              y;
        uint16_t             video_page;
        uint8_t              video_mode;
        uint8_t              video_cols;
        uint16_t             video_ega_bx;
        uint8_t              video_lines;
        uint8_t              video_isvga;
        uint16_t             video_points;
};

struct atag_ramdisk {
        uint32_t flags;
        uint32_t size;
        uint32_t start;
};

struct atag_initrd2 {
        uint32_t start;
        uint32_t size;
};

struct atag_serialnr {
        uint32_t low;
        uint32_t high;
};

struct atag_revision {
        uint32_t rev;
};

struct atag_videolfb {
        uint16_t             lfb_width;
        uint16_t             lfb_height;
        uint16_t             lfb_depth;
        uint16_t             lfb_linelength;
        uint32_t             lfb_base;
        uint32_t             lfb_size;
        uint8_t              red_size;
        uint8_t              red_pos;
        uint8_t              green_size;
        uint8_t              green_pos;
        uint8_t              blue_size;
        uint8_t              blue_pos;
        uint8_t              rsvd_size;
        uint8_t              rsvd_pos;
};

struct atag_cmdline {
        char    cmdline[1];
};

struct atag {
        struct atag_header hdr;
        union {
                struct atag_core         core;
                struct atag_mem          mem;
                struct atag_videotext    videotext;
                struct atag_ramdisk      ramdisk;
                struct atag_initrd2      initrd2;
                struct atag_serialnr     serialnr;
                struct atag_revision     revision;
                struct atag_videolfb     videolfb;
                struct atag_cmdline      cmdline;
        } u;
};


#define tag_next(t)     ((struct atag *)((uint32_t *)(t) + (t)->hdr.size))
#define tag_size(type)  ((sizeof(header_tag) + sizeof(struct type)) >> 2)
static struct atag *params; /* used to point at the current tag */




/* a set of functions to setup ATAGS ...*/

static void
setup_core_tag(void * address,long pagesize)
{
    params = (struct atag *)address;         /* Initialise parameters to start at given address */

    params->hdr.tag = ATAG_CORE;            /* start with the core tag */
    params->hdr.size = tag_size(atag_core); /* size the tag */

    params->u.core.flags = 1;               /* ensure read-only */
    params->u.core.pagesize = pagesize;     /* systems pagesize (4k) */
    params->u.core.rootdev = 0;             /* zero root device (typically overridden from commandline) */

    params = tag_next(params);              /* move pointer to next tag */
}

static void
setup_ramdisk_tag(uint32_t size)
{
    params->hdr.tag = ATAG_RAMDISK;         /* Ramdisk tag */
    params->hdr.size = tag_size(atag_ramdisk);  /* size tag */

    params->u.ramdisk.flags = 0;            /* Load the ramdisk */
    params->u.ramdisk.size = size;          /* Decompressed ramdisk size */
    params->u.ramdisk.start = 0;            /* Unused */

    params = tag_next(params);              /* move pointer to next tag */
}

static void
setup_initrd2_tag(uint32_t start, uint32_t size)
{
    params->hdr.tag = ATAG_INITRD2;         /* Initrd2 tag */
    params->hdr.size = tag_size(atag_initrd2);  /* size tag */

    params->u.initrd2.start = start;        /* physical start */
    params->u.initrd2.size = size;          /* compressed ramdisk size */

    params = tag_next(params);              /* move pointer to next tag */
}

static void
setup_mem_tag(uint32_t start, uint32_t len)
{
    params->hdr.tag = ATAG_MEM;             /* Memory tag */
    params->hdr.size = tag_size(atag_mem);  /* size tag */

    params->u.mem.start = start;            /* Start of memory area (physical address) */
    params->u.mem.size = len;               /* Length of area */

    params = tag_next(params);              /* move pointer to next tag */
}

static void
setup_cmdline_tag(const char * line)
{
    int linelen = strlen(line);

    if(!linelen)
        return;                             /* do not insert a tag for an empty commandline */

    params->hdr.tag = ATAG_CMDLINE;         /* Commandline tag */
    params->hdr.size = (sizeof(struct atag_header) + linelen + 1 + 4) >> 2;

    strcpy(params->u.cmdline.cmdline,line); /* place commandline into tag */

    params = tag_next(params);              /* move pointer to next tag */
}

static void
setup_end_tag(void)
{
    params->hdr.tag = ATAG_NONE;            /* Empty tag ends list */
    params->hdr.size = 0;                   /* zero length */
}


static void
setup_tags(uint32_t *parameters)
{
    setup_core_tag(parameters, 4096);       /* standard core tag 4k pagesize */
    setup_mem_tag(DRAM_BASE, 0x20000000);    /* 512MB (0x20000000 bytes) at 0x20000000, only DMC0 is connected in Tiny210 board */
    setup_ramdisk_tag(8192);                /* create 8MB ramdisk (size is given in KB) */
    setup_initrd2_tag(INITRD_LOAD_ADDRESS, 0x100000); /* 1MB of compressed data placed 8MB into memory */
    setup_cmdline_tag("root=/dev/ram0");    /* commandline setting root device */
    setup_end_tag();                    /* end of tags */
}
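
To make the layout of the resulting list more concrete, here is a host-side sketch (not part of bl2) that builds a small ATAG list in an ordinary buffer and walks it the way a consumer would. The helper names put_tag, put_end_tag, find_tag and build_demo_list are made up for this illustration. Each tag starts with two 32-bit words, the size in words (header included) and the tag ID, and the list ends with an ATAG_NONE tag of size 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATAG_NONE    0x00000000u
#define ATAG_CORE    0x54410001u
#define ATAG_MEM     0x54410002u
#define ATAG_CMDLINE 0x54410009u

/* Append one tag at p: two header words, then the payload padded to whole
 * words. Returns the position where the next tag goes. */
static uint32_t *put_tag(uint32_t *p, uint32_t tag,
                         const void *payload, uint32_t payload_bytes)
{
    uint32_t words = 2 + (payload_bytes + 3) / 4;

    assert(p != NULL);
    memset(p, 0, words * 4);      /* zero the padding bytes too */
    p[0] = words;                 /* size in 32-bit words, header included */
    p[1] = tag;
    if (payload_bytes)
        memcpy(&p[2], payload, payload_bytes);
    return p + words;
}

/* Terminate the list: tag ATAG_NONE with size 0. */
static void put_end_tag(uint32_t *p)
{
    p[0] = 0;
    p[1] = ATAG_NONE;
}

/* Walk the list and return the first tag with the given ID, or NULL. */
static const uint32_t *find_tag(const uint32_t *list, uint32_t tag)
{
    const uint32_t *p = list;

    while (p[1] != ATAG_NONE) {
        if (p[1] == tag)
            return p;
        if (p[0] < 2)             /* malformed: size must cover the header */
            return NULL;
        p += p[0];                /* same step as the tag_next() macro */
    }
    return NULL;
}

/* Build CORE, MEM and CMDLINE tags with the values used in setup_tags(). */
static void build_demo_list(uint32_t *buf)
{
    const uint32_t core[3] = { 1, 4096, 0 };               /* flags, pagesize, rootdev */
    const uint32_t mem[2]  = { 0x20000000u, 0x20000000u }; /* size, start */
    const char cmdline[]   = "root=/dev/ram0";
    uint32_t *p = buf;

    p = put_tag(p, ATAG_CORE, core, sizeof core);
    p = put_tag(p, ATAG_MEM, mem, sizeof mem);
    p = put_tag(p, ATAG_CMDLINE, cmdline, sizeof cmdline); /* includes the NUL */
    put_end_tag(p);
}
```

For these three tags the list occupies 5 + 4 + 6 words plus the two-word terminator, so a 32-word buffer is more than enough, and find_tag(buf, ATAG_MEM) returns buf + 5. The 6-word command line tag matches what setup_cmdline_tag computes for the same string: (8 + 14 + 1 + 4) >> 2 = 6.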

At the time of writing this code, I was planning to use ATAGS as the more straightforward way of passing information to the kernel, but I think we'd better go with a DTB, an approach I have not yet experimented with. In any case, this is a question for a later post, since we do not intend to have the kernel actually boot just yet. The next bits of code are more important to our quest today, as they contribute directly to our mission of loading the kernel into memory and successfully decompressing it, in bl2/src/bootloader.c:


/*... more code above ...*/

typedef uint32_t (*copy_mmc_to_mem)(uint32_t  channel, uint32_t  start_block, uint16_t num_of_blocks,
                                                            uint32_t  *target, uint32_t  init);

static uint32_t load_image(uint32_t block_start, uint32_t* load_address,uint16_t num_of_blocks){
        copy_mmc_to_mem copy_func = (copy_mmc_to_mem) (*(uint32_t *) 0xD0037F98); //SdMccCopyToMem function from iROM documentation
        uint32_t ret = copy_func(0,block_start, num_of_blocks,load_address, 0);
        if(ret == 0){
                debug_print("Copying failed :-(\n\r\0");
        }
        return ret;
}


#define DRAM_BASE 0x20000000
#define ZIMAGE_LOAD_ADDRESS (uint32_t*) (DRAM_BASE + 0x8000)  // 32k away from the base address of DRAM
#define ZIMAGE_BLOCK_SIZE ((1516384/512)+1) // TODO: no good, find a better way.
#define ZIMAGE_START_BLOCK_NUMBER 160   // the 160th block in MMC storage memory 

int
start_linux(void)
{
    void (*theKernel)(uint32_t zero, uint32_t arch, uint32_t *params);
    uint32_t i = 0, j = 0,ret;
    uint32_t *exec_at =  ZIMAGE_LOAD_ADDRESS;
    uint32_t *parm_at = (uint32_t *)( DRAM_BASE + 0x100) ;  // 256 bytes away from the base address of DRAM
    uint32_t machine_type;

    debug_print("about to copy linux image to load address: ");
        uart_print_address(exec_at);
        ret = load_image((uint32_t)ZIMAGE_START_BLOCK_NUMBER,(uint32_t*)exec_at,(uint16_t)ZIMAGE_BLOCK_SIZE);    /* copy image into RAM */

    debug_print("done copying linux image ...\n\r\0");

    debug_print("setting up ATAGS ...\n\r\0");

    setup_tags(parm_at);                    /* sets up parameters */

    machine_type = 3466;                      /* get machine type */

    theKernel = (void (*)(uint32_t, uint32_t, uint32_t*))exec_at; /* set the kernel address */
    
        debug_print("jumping to the kernel ... brace yourself!\n\r\0");
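 /* The kernel requires the data cache to be off at entry (the instruction cache may be on or off); we disable both */
        asm volatile(
                "mrc p15, 0, r1, c1, c0, 0\n\t" /* Read Control Register configuration data */
                "bic r1, r1, #(0x1 << 12)\n\t"  /* Disable I Cache */
                "bic r1, r1, #(0x1 << 2)\n\t"   /* Disable D Cache */
                "mcr p15, 0, r1, c1, c0, 0\n\t" /* Write Control Register configuration data */
                : : : "r1", "memory");          /* one statement, so the compiler knows r1 is clobbered */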

     theKernel(0, machine_type, parm_at);    /* jump to kernel with register set */
        
    return 0;
}


A couple of things to note about this code. First, we're using the iROM-provided MMC copy function again, but instead of copying to SRAM as we did with BL2 in the last post, we're moving data from the MMC card straight into DRAM. Second, note the seemingly random machine_type we pass to the kernel. This is the machine type I got from a kernel that was ported to some version of the tiny210 at some point but never contributed to the mainline kernel, which means it will not be recognized by the decompressed, running kernel. Since the machine type is one of the first things the kernel startup code looks at, a failure to recognize our machine type is exactly what we should expect, provided that our kernel loading procedure and the decompressor both succeed.
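
The inline assembly just before the jump in start_linux reads the CP15 control register, clears two bits and writes it back. As a host-side illustration of which bits are involved (a sketch; the macro and function names are mine, not from the bootloader source): bit 12 enables the instruction cache, bit 2 the data cache and bit 0 the MMU. Documentation/arm/Booting requires the MMU and the data cache to be off and allows the instruction cache to be on or off, so clearing both cache bits is simply the conservative choice.

```c
#include <stdint.h>

#define SCTLR_I (1u << 12) /* instruction cache enable */
#define SCTLR_C (1u << 2)  /* data cache enable */
#define SCTLR_M (1u << 0)  /* MMU enable */

/* Same effect as the two "bic" instructions: clear I and C, keep all other bits. */
static uint32_t sctlr_caches_off(uint32_t sctlr)
{
    return sctlr & ~(SCTLR_I | SCTLR_C);
}

/* Non-zero if the value meets the kernel's entry requirements: MMU off, D-cache off. */
static int sctlr_ok_for_kernel_entry(uint32_t sctlr)
{
    return (sctlr & (SCTLR_M | SCTLR_C)) == 0;
}
```

For example, sctlr_caches_off(0x00001005) returns 0x00000001: both cache bits are cleared, while the MMU bit is left untouched and would still have to be off for the kernel to be entered safely.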

Another thing to take note of is that we're hard-coding values that describe the kernel size, which is bad for business: you don't want to modify your bootloader every time you compile a new kernel. To avoid that, we're going to use a feature of the zImage our kernel build produced. It turns out that the zImage is fully aware of its own contents and provides, in the form of a header, a description of its length (among other things), which the bootloader can extract and use to copy the kernel from whatever medium it resides on into memory. For the purpose of demonstration, I've written a small piece of code that runs on the development host:
/host_tools/describe_zImage.c:

#include<stdio.h>
#include<stdint.h>
#include <errno.h>
#include <string.h>

#define ZIMAGE_MAGIC_OFFSET 0x24
#define ZIMAGE_START_OFFSET 0x28
#define ZIMAGE_END_OFFSET   0x2c



int main(int argc, char** argv){

        FILE* zImage = NULL;
        
        uint32_t buffer[2];
        uint32_t start_addr = 0;
        uint32_t end_addr = 0;
        int ret = 0;    /* holds fseek()/fread() results, printed with %d */
        if(argc != 2){
                printf("Kernel zImage not supplied, exiting ...\n");
                return 1;
        }
        zImage = fopen(argv[1],"rb");   /* binary mode: we read raw header words */
        if(zImage != NULL){
                // seek into zImage ....
                ret = fseek(zImage,ZIMAGE_START_OFFSET,SEEK_SET);
                if(ret == 0){
                        // read address data ...
                        ret = fread(buffer,4,2,zImage);
                        if(ret == 2){
                                printf("zImage start address is %#0x\n",buffer[0]);     
                                printf("zImage end address is %#0x\n",buffer[1]);       
                                printf("zImage length is %#0x\n",buffer[1] - buffer[0]);        
                                if(zImage != NULL)
                                        fclose(zImage); 
                                return 0;
                        }else{
                                printf("fread returned unexpected result: %d\n",ret);
                                printf("error: %s\n",strerror(errno));
                                if(zImage != NULL)
                                        fclose(zImage); 
                                return errno;
                        }       
                }else{
                if(zImage != NULL)
                        fclose(zImage); 
                printf("error seeking file, fseek returned: %d\n",ret);
                return 1;
                }
        }else{
                printf("error opening file, exiting ...\n");
                printf("error: %s\n",strerror(errno));
                return errno;
        }

return 0;
}


The above code simply reads the values stored at fixed offsets in the zImage header: the start and end addresses of the image, whose difference gives the image length. The defines at the top of the code list the offsets, including that of the magic number, which this small tool does not check.

Compiling (not cross compiling!) this code into descZimage.o and running it produces:

h/k/to_linux/host_tools$ ./descZimage.o ../kernel/images/active/zImage
zImage start address is 0
zImage end address is 0x172360
zImage length is 0x172360

0x172360 is 1516384 in decimal, which matches the size reported by ls earlier.
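
On the bootloader side, the same header could be used to replace the hard-coded ZIMAGE_BLOCK_SIZE. The following is only a sketch of that idea under a few assumptions: the function names are hypothetical, the bootloader is assumed to have already copied the first 512-byte block of the image into memory, and the header words are assumed to be little-endian. The magic value 0x016f2818 is the one the ARM zImage stores at offset 0x24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZIMAGE_MAGIC_OFFSET 0x24
#define ZIMAGE_START_OFFSET 0x28
#define ZIMAGE_END_OFFSET   0x2c
#define ZIMAGE_MAGIC        0x016f2818u /* ARM zImage magic number */
#define MMC_BLOCK_SIZE      512u

/* Read a 32-bit little-endian word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Given the first bytes of a zImage, return the number of 512-byte blocks
 * the whole image occupies, or 0 if the header does not look valid. */
static uint32_t zimage_block_count(const uint8_t *first_block, uint32_t len)
{
    uint32_t start, end, bytes;

    assert(first_block != NULL);
    if (len < ZIMAGE_END_OFFSET + 4)
        return 0;                       /* header not fully present */
    if (read_le32(first_block + ZIMAGE_MAGIC_OFFSET) != ZIMAGE_MAGIC)
        return 0;                       /* not a zImage */

    start = read_le32(first_block + ZIMAGE_START_OFFSET);
    end   = read_le32(first_block + ZIMAGE_END_OFFSET);
    if (end <= start)
        return 0;                       /* nonsensical addresses */

    bytes = end - start;
    return bytes / MMC_BLOCK_SIZE + (bytes % MMC_BLOCK_SIZE != 0); /* round up */
}
```

For the header values printed above (start 0, end 0x172360) this returns 2962, the same block count we computed by hand, so load_image could be called once for the first block and then again with the returned count.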

Compiling and running

Running the following commands while the MMC card is in the host SD slot:

/h/k/car-dashboard$ make clean && make && make fuse

writes the expected BL1, BL2 and zImage to the expected block addresses.

Then inserting the MMC card in the board slot, and turning the power on produces the following output on the serial port debugging interface:

UART 0 Initialization complete ...

calling iROM copy function ...

Copying BL2 started ...

BL2 loading successful, running it ...

bl2 executing ...
                 Clock System initialization complete ...

memory initialization complete ...

about to copy linux image to load address: 20008000

done copying linux image ...

setting up ATAGS ...

jumping to the kernel ... brace yourself!

Uncompressing Linux... done, booting the kernel.



Error: unrecognized/unsupported machine ID (r1 = 0x00000d8a).
                     

                             
Available machine support: 



ID (hex)        NAME

ffffffff        Generic DT based system

ffffffff        Samsung S5PC110/S5PV210-based board



Please check your kernel config and/or bootloader.

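The line "Uncompressing Linux... done, booting the kernel." is emitted by the kernel decompressor, and the lines after it by the first instructions of the kernel itself. As expected, the kernel rejects our machine ID: 0xd8a is 3466, the machine_type we passed in r1.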

Conclusion

Having walked through the kernel loading and decompressing process, we can conclude this post. Imagine having to compile the kernel multiple times through the process of porting, and hitting numerous panics and errors along the way. The way our bootloader works now, you would have to fuse the MMC card every time you make a tiny change to the kernel, and you would also have to modify the bootloader to accommodate the new kernel image size every time it changes, which is definitely not the way you want to go.

A proper way to go about this is to have the bootloader fetch the kernel from your host machine directly. The easiest and most convenient way would be to use USB to transfer the kernel to the bootloader, which then loads it into memory or perhaps saves it to storage. This is how a lot of development boards and production systems (think smartphones) handle the process. What I chose to go with, however, is TFTP loading, which is arguably less convenient and requires a lot more code, but it's a worthy exercise: it gives you an idea of the steps required to port an Ethernet chip in a bare metal environment, and paves the way for network device driver implementation under Linux. If you can do it on a bare metal system, then once you know the internal workings of the ecosystem Linux provides for such devices, writing a network device driver should come much more easily.

Our next post will focus on adding some helper functions and modules to our bootloader, specifically formatting functions and timer delays, so we can move on to the more serious process of porting an Ethernet chip driver from u-boot/Linux to work with our bare metal BL2 code.

References

  1. Booting ARM Linux
  2. Linux/Documentation/arm/Booting