korena's blog

12. Bootloader Implementation - part 2

In this post, we'll add some supporting functions to our bootloader BL2 in order to make the task of developing the Ethernet driver a bit easier.

Being able to print messages to the console with printf is an essential debugging feature that I don't think any serious developer could do without. printf is such a basic C library function in a desktop development environment that we take its availability for granted. Embedded systems are different: instead of printing to a virtual console, printf has to print to a console of our choosing (in my case, the UART 0 port), which means we have to port printf's functionality to our environment. Instead of looking up the source code of printf, reading through it and adapting it, I decided to start with an empty function and write only the code my application needs, which is to:

  1. Print strings
  2. Format hex integers
  3. Format decimal integers
  4. Accept variable arguments
  5. Be safe (enough!)

This is a subset of what the C library's printf offers, but it will do for my purpose.
The following function was added to the C source file bl2/src/terminal.c:


#include <stdarg.h>
#include <terminal.h>
#include <stdint.h>
#include <string.h>

#define OUT_SIZE 512

static char out[OUT_SIZE] = {0};

/*
 * Minimal printf substitute: plain text, %x and %d.
 * Output longer than OUT_SIZE-1 characters is truncated; out is always NUL-terminated.
 */
int print_format(const char* str, ...){
        va_list ap;
        int i = 0, j;
        int arg;
        char charInt[12] = {0}; // large enough for any 32-bit value plus terminator

        if(str == NULL)
                return -1;

        va_start(ap, str);
        memset(out, 0, sizeof(out));

        // keep one byte free for the terminating NUL
        while(*str != '\0' && i < OUT_SIZE - 1){
                if(*str == '%'){
                        switch(*++str){
                                case 'x':
                                case 'd':
                                        arg = va_arg(ap, int);
                                        printnum(charInt, sizeof(charInt), (*str == 'x') ? "%x" : "%d", (uint32_t)arg);
                                        // copy the digits without overrunning out[]
                                        for(j = 0; j < (int)sizeof(charInt) && charInt[j] != '\0' && i < OUT_SIZE - 1; j++)
                                                out[i++] = charInt[j];
                                        str++; // skip the conversion character
                                        break;
                                case '\0':
                                        // lone '%' at the end of the string: print it, the loop then stops
                                        out[i++] = '%';
                                        break;
                                default:
                                        // unsupported conversion: print it verbatim
                                        out[i++] = '%';
                                        if(i < OUT_SIZE - 1)
                                                out[i++] = *str;
                                        str++;
                                        break;
                        }
                }else{
                        out[i++] = *str++;
                }
        }
        va_end(ap);
        out[i] = '\0'; // i <= OUT_SIZE - 1 here, so this is always in bounds
        uart_print(out);
        return 0;
}

Note that I'm compiling with the -nostdlib flag, which means any standard library code I actually need has to be provided to the compilation explicitly. This is why the function isn't simply called printf: GCC treats printf as a built-in function and may rewrite calls to it into calls to other C library functions (puts, for example) that don't exist in our bootloader, producing a bunch of linker errors. Also note that print_format calls uart_print for the final UART output, so uart_print must be declared where print_format can see it; check out the full code on GitHub.
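One way to make a formatter like print_format testable on a development PC is to separate the formatting from the output. The sketch below is an illustration under that assumption, not the code from the repository: fmt_core and fmt_buf are hypothetical names, the output goes into a caller-supplied buffer instead of the UART, and, unlike printnum, %d here prints negative values with a minus sign. Any other conversion character is copied through together with its '%', as print_format does.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-testable core: formats into dst (size n) instead of the UART.
 * Supports plain text, %x (upper-case hex) and %d (signed decimal).
 * Always NUL-terminates when n > 0 and returns the number of characters written. */
static size_t fmt_core(char *dst, size_t n, const char *fmt, va_list ap)
{
    size_t i = 0;

    if (n == 0)
        return 0;
    while (*fmt != '\0' && i + 1 < n) {       /* keep room for the terminator */
        if (*fmt != '%') {
            dst[i++] = *fmt++;
            continue;
        }
        fmt++;                                /* skip the '%' */
        if (*fmt == 'x' || *fmt == 'd') {
            char tmp[10];                     /* 10 digits cover any 32-bit value */
            int len = 0;
            uint32_t u;
            if (*fmt == 'd') {
                int32_t v = (int32_t)va_arg(ap, int);
                if (v < 0)
                    dst[i++] = '-';
                u = v < 0 ? 0u - (uint32_t)v : (uint32_t)v;
                do { tmp[len++] = (char)('0' + u % 10); u /= 10; } while (u != 0);
            } else {
                u = (uint32_t)va_arg(ap, unsigned int);
                do { tmp[len++] = "0123456789ABCDEF"[u & 0xF]; u >>= 4; } while (u != 0);
            }
            while (len > 0 && i + 1 < n)
                dst[i++] = tmp[--len];        /* digits were produced least significant first */
            fmt++;
        } else if (*fmt == '\0') {
            dst[i++] = '%';                   /* lone trailing '%' */
        } else {
            dst[i++] = '%';                   /* unsupported conversion: copy verbatim */
            if (i + 1 < n)
                dst[i++] = *fmt;
            fmt++;
        }
    }
    dst[i] = '\0';
    return i;
}

/* Variadic wrapper, analogous to print_format but writing to a buffer. */
size_t fmt_buf(char *dst, size_t n, const char *fmt, ...)
{
    va_list ap;
    size_t r;

    va_start(ap, fmt);
    r = fmt_core(dst, n, fmt, ap);
    va_end(ap);
    return r;
}
```

On the board, a thin wrapper could format into a static buffer with fmt_core and hand the result to uart_print, while the same core is unit-tested on the host.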

Our print_format function calls another function called printnum(), which basically converts an integer into a character array to be used in the process of formatting:

/**
 * snprintf substitute, barely tested: converts one uint32_t into s according to
 * format ("%x" for upper-case hex, "%d" for decimal).
 * Returns 0 on success, -1 if s is NULL/too small or format is malformed,
 * -2 for an unsupported conversion.
 */
int printnum (char *__restrict__ s, size_t maxlen, const char *__restrict__ format, uint32_t num){
        size_t pos = 0;
        int i;
        uint32_t base = 1;

        if(s == NULL || maxlen == 0 || format == NULL)
                return -1;
        memset(s, 0, maxlen);

        // skip any text before the '%' (otherwise we would loop forever)
        while(*format != '\0' && *format != '%')
                format++;
        if(*format != '%' || *(format+1) == '\0')
                return -1;

        switch(*++format){
        case 'x':
                // find the most significant non-zero nibble; i stops at 0,
                // so num == 0 still prints a single '0'
                for(i = 7; i > 0; i--){
                        if(((num >> i*4) & 0xF) != 0)
                                break; // skipping leading zeros
                }
                for(; i >= 0; i--){
                        uint32_t nibble = (num >> i*4) & 0xF;
                        if(pos + 1 >= maxlen)
                                return -1; // no room for this digit and the terminator
                        s[pos++] = (char)(nibble <= 9 ? nibble + '0' : nibble - 10 + 'A');
                }
                break;
        case 'd':
                // largest power of ten <= num; testing num / base avoids overflowing
                // base for values above 10^9 and handles exact powers of ten (10, 100, ...)
                while(num / base >= 10)
                        base = (base << 3) + (base << 1); // base *= 10
                for(; base >= 1; base /= 10){
                        if(pos + 1 >= maxlen)
                                return -1;
                        s[pos++] = (char)(num / base + '0');
                        num %= base;
                }
                break;
        default:
                return -2;
        }
        s[pos] = '\0'; // terminate string; pos < maxlen is guaranteed above
        return 0;
}

I admit that this function looks terrible and probably has flaws I'm completely unaware of, but it does the job. It is a naive substitute for the C library function snprintf, named differently for the same reason print_format is not named printf: linking problems.

uDelay for microsecond delays

In almost any chip's initialization sequence you'll come across required delays between initialization steps, mainly because your processor runs much faster than the peripheral chips it talks to. Before you even start reading the documentation for the chip you're about to initialize, take the time to write a couple of delay routines. They don't have to be fancy, and accuracy is probably not that important initially, so don't stress about it; just get something that works reasonably well. My approach is to provide one function that generates a 1 microsecond delay and loop over it to generate anything up to a few seconds. This is crude, and it could be done much better by implementing separate base time functions instead of burning processor cycles, but for the purpose of getting things done, we won't care much about efficiency here.
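The loop-over-a-primitive approach can be sketched as follows. mdelay and sdelay are hypothetical names; on the board they would call the real udelay(), but here udelay is replaced by a stub that only adds up the requested microseconds, so the wrappers can be checked on a PC. The overhead of each call is ignored, so real delays built this way will be slightly longer than requested.

```c
#include <stdint.h>

/* Host stub standing in for the bootloader's udelay(): it only records how
 * many microseconds were requested (assumption for off-target testing). */
static uint64_t simulated_us = 0;

static void udelay(int u)
{
    simulated_us += (uint64_t)u;
}

/* Hypothetical millisecond delay: 1000 calls' worth of microseconds per ms. */
void mdelay(uint32_t ms)
{
    while (ms--)
        udelay(1000);
}

/* Hypothetical second delay, built on top of mdelay(). */
void sdelay(uint32_t s)
{
    while (s--)
        mdelay(1000);
}
```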

S5PV210 timer initialization

Although we could have burnt processor cycles and calculated our delays using primitive counting loops, I couldn't stoop so low, so I decided to initialize a hardware timer, for civility.

What the documentation says about it

We're looking at Chapter 2 of the S5PV210 user manual (System Timer). Section 2.1 gives an overview of the system timer; what we can take from it is the block diagram in Figure 2-1, which gives a broad idea of which special function registers we need to touch. Don't spend too much time on it; it becomes more useful once you understand the steps needed to get the timer working. A more useful figure appears in section 2.3, INTERNAL FUNCTION OF SYSTEM TIMER. Looking at both Figure 2-1 and Figure 2-2, we can deduce that:

  • There are two operational 'regions' in our timer. The first (named the clock generation region) is fed directly by the clock divider output (TCLK, the timer clock source), which is the end of a chain that starts with a multiplexer selecting the input clock source (TCLKB), followed by an 8-bit prescaler that scales the selected clock and a clock divider that further divides it to drive the timer system.
  • Two registers, TICNTB and TFCNTB, control this region's tick interval. We don't yet know what the names stand for, but we'll find out when we read further. For now, we know that changing this tick interval requires stopping the timer, which we'll learn to do shortly.
  • We can observe the timer value as it runs through a register called TICNTO, which is decremented with each generated tick. This suggests that to generate basic timer intervals and use them in our bootloader, we may only need to initialize this region, without paying attention to the other region described in Figure 2-2.
  • The second operational region (named the interrupt generation region) is driven by the (observable) tick generated by the first region. Its interval is configured by writing to a register called Interrupt Interval Counter (ICNTB). A notable difference from the first region is that changing this region's interval does not require stopping the timer.
  • We can also observe this region's counter (before it actually generates the tick) through register ICNTO. The actual countdown happens in the internal register INTCNT, but it is observed through ICNTO.

Moving on, Figure 2-3 shows the relationship between the internal counter INTCNT and the user observation register ICNTO, and Figure 2-4 shows the same relationship in one-shot mode (no auto-reload). We'll come back to these figures if we need to.

Fractional divider

Before we get into the special case of the fractional divider, let's have a very quick review of how timers generally operate.

Timers concept

A timer is essentially a counter (typically counting down). The size of the counter register and the frequency of the input signal that drives it determine the timer's resolution and range. How the counter register(s) are implemented is system specific; for example, a counter could be split into high and low byte/word registers.
The general configuration principle follows these steps:

  1. Decide the output tick interval you'd like the timer to produce, in seconds (e.g. produce a tick every 1 ms).
  2. Get the frequency corresponding to that interval, in Hz (e.g. 1 ms gives 1/0.001 = 1 kHz).
  3. Configure the input clock frequency (using prescalers and clock dividers) so that the desired frequency is an integer fraction of it.
  4. Place a value in the counter. This value is decremented once per period of the input clock configured in step 3, so the number you load sets how many input periods make up one output tick. The following pseudo code demonstrates how the desired interval is obtained from the whole system:
loop @ every input_clock_period {
    if(counter_register_value == 0){
        generate_desired_timer_tick; // the desired delay (step 1) is achieved between every two ticks here 
        counter_register_value = value_from_step_4_above;
    }
    counter_register_value--;
}
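To see the pseudo code in action, the sketch below runs it for a given reload value and measures the spacing between ticks; it also computes the step 4 reload value for a chosen input clock. Both functions (periods_between_ticks and reload_for) are hypothetical illustrations of the generic counter model above, not of any particular S5PV210 register.

```c
#include <stdint.h>

/* Runs the pseudo code above and returns how many input clock periods
 * separate two consecutive ticks. A reload value of 0 is rejected, because
 * the counter would wrap around instead of ticking. */
uint32_t periods_between_ticks(uint32_t reload)
{
    uint32_t counter = reload;
    uint32_t first_tick = 0;
    int ticks_seen = 0;

    if (reload == 0)
        return 0;
    for (uint32_t period = 0; ; period++) {   /* one iteration per input clock period */
        if (counter == 0) {
            if (ticks_seen == 1)
                return period - first_tick;   /* second tick: report the spacing */
            first_tick = period;
            ticks_seen = 1;
            counter = reload;                 /* reload the value from step 4 */
        }
        counter--;
    }
}

/* Step 4 reload value, assuming step 3 made tick_hz divide input_hz evenly. */
uint32_t reload_for(uint32_t input_hz, uint32_t tick_hz)
{
    return input_hz / tick_hz;
}
```

For the 1 ms example from step 1 with a 1 MHz input clock, reload_for(1000000, 1000) gives 1000, and the simulation confirms one tick every 1000 input periods, i.e. every 1 ms.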

Now that that's out of the way, let's see how to configure our timer to generate a 1 microsecond delay. Section 2.5, TICK GENERATION WITH FRACTIONAL DIVIDER, shows that the fractional divider in the S5PV210 SoC replaces the prescalers and clock dividers of the conventional peripheral timers found in other SoCs and microcontrollers. To use the fractional divider, you explicitly bypass the divider mux and prescaler by writing bits 0 to 10 of the TCFG register as 0. You also have to respect a basic condition: the clock source frequency must be at least 4 times the frequency we want from the fractional divider (the target frequency). The value to program then follows the formula:
value_to_be_placed_in_the_fractional_divider_registers = input_clock_frequency / 2 / target_frequency

You might be wondering why we would use the fractional divider instead of prescaling and clock dividing, which are what most embedded systems engineers usually use. The answer lies in its name: you can achieve much more finely tuned timer ticks when you aren't constrained to integer divisors. Take the example given in the manual of generating a 2 kHz tick from a 9 kHz input clock: with the fractional divider, the value is 9 / 2 / 2 = 2.25. A fractional value like that is hard to achieve with prescalers and clock dividers, but with the fractional divider you simply write 1 (2-1) to TICNTB and 16384 (0.25 x 65536) to TFCNTB and let the hardware deal with it.
In general, if the value has integer part a and fractional part f (value = a + f), set TICNTB to (a-1) and TFCNTB to (f x 65536). This is given in the documentation, no magic here. So let's calculate the values we need to generate a 1 microsecond tick from a 24 MHz clock source, which is the system clock (we'll get to how we set the timer to use the system clock later):
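The rule above can be expressed as a small calculation. This is a sketch assuming the formula exactly as quoted from section 2.5 (value = input / 2 / target, TICNTB = a-1, TFCNTB = f x 65536, truncated); frac_div_values is a hypothetical helper, and its 64-bit arithmetic is meant for a PC or a build script, since 64-bit division on the bare-metal target would need libgcc.

```c
#include <stdint.h>

/* Hypothetical helper: computes TICNTB/TFCNTB from the input clock and the
 * target tick frequency. Returns -1 if the input clock is not at least 4x the
 * target (the condition from section 2.5), 0 otherwise. */
int frac_div_values(uint32_t input_hz, uint32_t target_hz,
                    uint32_t *ticntb, uint32_t *tfcntb)
{
    uint64_t q16;

    if (target_hz == 0 || (uint64_t)input_hz < 4ull * target_hz)
        return -1;
    /* value = input / 2 / target, kept in 16.16 fixed point */
    q16 = ((uint64_t)input_hz << 16) / (2ull * target_hz);
    *ticntb = (uint32_t)(q16 >> 16) - 1;      /* integer part minus one */
    *tfcntb = (uint32_t)(q16 & 0xFFFF);       /* fractional part x 65536 */
    return 0;
}
```

For the manual's example, frac_div_values(9000, 2000, ...) yields TICNTB = 1 and TFCNTB = 16384, and for a 1 MHz tick from 24 MHz it yields 11 and 0.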

value = 24 MHz / 2 / (1 / 0.000001 s) = 24 MHz / 2 / 1 MHz = 12

In our case we don't really need the fractional divider, because our value is an integer, so all we have to do to generate the desired tick interval is set TICNTB to 11 (12-1). The proper thing to do here is to tell the timer peripheral that we don't need the fractional divider by setting bit 14 of the TCFG register to zero; this should make the peripheral ignore whatever garbage values TFCNTB may hold, if any (this is an assumption).
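Changing a field in a configuration register such as TCFG takes a read-modify-write that masks the old field out first; ORing a zero into a register can never clear a bit. A minimal sketch with a hypothetical set_field helper:

```c
#include <stdint.h>

/* Hypothetical helper: returns reg with the bit field selected by mask
 * (whose lowest bit is at position shift) replaced by val. */
static inline uint32_t set_field(uint32_t reg, uint32_t mask, uint32_t shift, uint32_t val)
{
    return (reg & ~mask) | ((val << shift) & mask);
}
```

For example, *TCFG = set_field(*TCFG, 0x7FF, 0, 0) would clear bits 0 to 10 as section 2.5 requires, and set_field(reg, 1u << 14, 14, 0) clears bit 14 (assuming TCFG is a volatile uint32_t pointer, as its use in the code suggests).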

Referring to the tick generation regions explained above, the fractional divider belongs to the first operational region, so what we've achieved so far is making the second region run on a 1 microsecond tick. To define how many microseconds our uDelay(uint32_t u) function waits, we use the second tick generation region, which is configured through a different set of registers.

Section 2.6, USAGE MODEL, lists a couple of restrictions to keep in mind; refer to it if you're interested. Section 2.6.2, INTERRUPTS, gives important information: there are five interrupt sources:

  • interrupt counter expired (important during operation: it's what tells us that the counter value we set has run out).

  • four SFR write status bits, which we'll use during configuration to determine whether the TCON, TICNTB, TFCNTB and ICNTB registers have been updated with the values we wrote to them.
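The write status bits are meant to be polled after each register write. A bounded poll like the breaker loops in udelay() can be factored into a helper; wait_for_bits is a hypothetical name, and it only waits, leaving the clear-by-writing-1 step to the caller.

```c
#include <stdint.h>

/* Hypothetical helper: polls a memory-mapped status register until every bit
 * in mask reads as 1, giving up after max_polls extra reads.
 * Returns 0 on success, -1 on timeout. */
int wait_for_bits(volatile const uint32_t *reg, uint32_t mask, uint32_t max_polls)
{
    while ((*reg & mask) != mask) {
        if (max_polls-- == 0)
            return -1;
    }
    return 0;
}
```

For instance, assuming INT_CSTAT is a volatile uint32_t pointer, wait_for_bits(INT_CSTAT, 1u << 2, 1000000) would wait for INT_CSTAT[2], the TICNTB write status bit according to the comments in init_timer.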

Section 2.6.4, START TIMER, finally gets to the point: it describes the steps used to initialize the timer. Instead of copying them here, let's jump straight to the initialization function. You can follow the manual and compare it with what the function does; it's a rather straightforward process, and comments explain what's happening.

bl2/src/timer.c:

#include<timer.h>
#include<terminal.h>

#define DEBUG_TIM 1

int init_timer(void){

        // enable all timer interrupt flags ...
        *(INT_CSTAT) |= 0x1;
        if(DEBUG_TIM){
                print_format("The INT_CSTAT register value is: 0x%x\n\r",*INT_CSTAT);     
        }
        //perform software reset :
        *TCFG |= (uint32_t) TCFG_TICK_SWRST_bit;
        // wait for TCFG_TICK_SWRST_bit to auto clear ...
        while((*(TCFG) & TCFG_TICK_SWRST_bit) == TCFG_TICK_SWRST_bit){
                if(DEBUG_TIM)
                        print_format("waiting for software timer reset ...");   
        }

        // select the system clock (TCLKB mux = 00, 24MHz) and set divider and prescaler to 0:
        // ORing in zeros can't clear bits, so mask the fields out instead
        *(TCFG) &= ~((uint32_t) TCFG_TCLKB_MUX_bits | (uint32_t) TCFG_Divider_MUX_bits | (uint32_t) TCFG_Prescaler_bits);

        // set tickgen sel to fractional divider (set it to 1)
        *(TCFG) |= TCFG_TICKGEN_SEL_bit;
        if(DEBUG_TIM)
                print_format("the value of TCFG register is:0x%x\n\r",*(TCFG));

        // set the tick integer count buffer register 
        *(TICNTB) = (uint32_t) 11; // for 1us tick interval
        // set the tick fractional count buffer register
        //*(TFCNTB) = (uint32_t) 0; // for 1us tick interval

        // wait (with a timeout) for the INT_CSTAT[2] write status bit to assert ...
        {
                int breaker = 1000000;
                while((*(INT_CSTAT) & INT_CSTAT_TICNTBWS_bit) != INT_CSTAT_TICNTBWS_bit){
                        if(--breaker == 0){
                                print_format("timed out waiting for INT_CSTAT[2] (0x%x)\n\r",*INT_CSTAT);
                                break;
                        }
                }
        }

        // set INT_CSTAT[2] to one (this is actually clearing!)
        *(INT_CSTAT) |= INT_CSTAT_TICNTBWS_bit;
        // set INT_CSTAT[3] to one (this is actually clearing!)
        //*(INT_CSTAT) &= ~INT_CSTAT_TFCNTBWS_bit;

        // wait until TICNTB & TFCNTB are set ...

        while(*(TICNTB) != (uint32_t)11 || *(TFCNTB) != 0)
                print_format("waiting to confirm integer and fractional set values\n\r");

        // starting the tick generation timer, this is done once only ...
        *(TCON) |= TCON_TIMONOFF_bit ; // setting first bit to one
        // when this function returns, the main timer generator block produces 1 tick every 1us ...

        // test to see if the timer is actually running ...

        if(DEBUG_TIM)
                print_format("the value of TCON register is:0x%x\n\r",*(TCON));

        return 0;
}

And for the actual uDelay function :

void udelay(int u){
        int breaker = 1000000; // shared timeout so a missing status bit can't hang the bootloader
        // set ICNTB register value (assign: ORing would merge in the previous delay's count)
        *(ICNTB) = (uint32_t)(u+1);
        // wait to make sure ICNTB is updated (causes longer delay??)
        while((*(INT_CSTAT) & INT_CSTAT_ICNTBWS_bit) != INT_CSTAT_ICNTBWS_bit){
                if(--breaker == 0)
                        break;
        }
        // status bits are cleared by writing 1, as in init_timer()
        *(INT_CSTAT) |= INT_CSTAT_ICNTBWS_bit;
        *(TCON) |= TCON_INTMANUPDATE_bit; // update ICNTB
        while((*(INT_CSTAT) & INT_CSTAT_TCONWS_bit) != INT_CSTAT_TCONWS_bit){
                if(--breaker == 0)
                        break;
        }
        *(INT_CSTAT) |= INT_CSTAT_TCONWS_bit; // clear TCON write status
        // set interrupt timer mode to one-shot:
        *(TCON) |= TCON_INTTYPE_bit;
        while((*(INT_CSTAT) & INT_CSTAT_TCONWS_bit) != INT_CSTAT_TCONWS_bit){
                if(--breaker == 0)
                        break;
        }
        *(INT_CSTAT) |= INT_CSTAT_TCONWS_bit; // clear the status bit we just waited on
        // start interrupt timer :
        *(TCON) |= TCON_INTONOFF_bit;
        while((*(INT_CSTAT) & INT_CSTAT_TCONWS_bit) != INT_CSTAT_TCONWS_bit){
                if(--breaker == 0)
                        break;
        }
        *(INT_CSTAT) |= INT_CSTAT_TCONWS_bit;

        // do nothing while the interrupt counter expired status is 0
        while((*(INT_CSTAT) & INT_CSTAT_INTCNTES_bit) != INT_CSTAT_INTCNTES_bit){
                doNotOptimize(); // to prevent GCC from optimizing the loop away
        }

        *(INT_CSTAT) |= INT_CSTAT_INTCNTES_bit; // clear expired status (write 1)
        // stop interrupt timer :
        *(TCON) &= ~TCON_INTONOFF_bit;
        return;
}

int get_timer(int u){

        return 0;
}

The above code deals exclusively with the second operational region (the interrupt generation region). Note that we never stop the tick generation region (first region); it keeps running forever. We do stop the interrupt timer, which generates delays in multiples of a microsecond, following section 2.6.5, STOP TIMER, but only its first point.
The rest of the chapter describes the registers involved in detail; straightforward stuff.

Trivial functions

To further simplify the process of developing the bare metal Ethernet driver, I added the following trivial, but useful functions to the bootloader.
bl2/asm/io.s

/*
* these are externed in io.h, because I will forget where they are in three days.
*/

.text
.code 32

.global __raw_writeb
.global __raw_writehw
.global __raw_writel
.global __raw_readb
.global __raw_readhw
.global __raw_readl

__raw_writeb:
        push    {lr}
        strb    r0,[r1]
        pop     {pc}

__raw_writehw:
        push    {lr}
        strh    r0,[r1]
        pop     {pc}

__raw_writel:
        push    {lr}
        str     r0,[r1]
        pop     {pc}

__raw_readb:
        push    {lr}
        ldrb    r0,[r0]
        pop     {pc}

__raw_readhw:
        push    {lr}
        ldrh    r0,[r0]
        pop     {pc}

__raw_readl:
        push    {lr}
        ldr     r0,[r0]
        pop     {pc}

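These functions provide a way to read and write 8-bit (b), 16-bit (hw) and 32-bit (l) values from and to registers. The write functions take the value in r0 and the address in r1; the read functions take the address in r0 and return the value in r0.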
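If you'd rather stay in C, the same accessors can be written with volatile pointers. This is an alternative sketch, not what the bootloader uses; the names are hypothetical and drop the leading underscores to avoid clashing with the assembly versions, while the argument order matches io.s (value first, address second for writes).

```c
#include <stdint.h>

/* Hypothetical C equivalents of the io.s routines. volatile keeps the
 * compiler from caching, merging or dropping these register accesses. */
static inline void raw_writeb(uint8_t v, volatile uint8_t *addr)    { *addr = v; }
static inline void raw_writehw(uint16_t v, volatile uint16_t *addr) { *addr = v; }
static inline void raw_writel(uint32_t v, volatile uint32_t *addr)  { *addr = v; }

static inline uint8_t  raw_readb(const volatile uint8_t *addr)   { return *addr; }
static inline uint16_t raw_readhw(const volatile uint16_t *addr) { return *addr; }
static inline uint32_t raw_readl(const volatile uint32_t *addr)  { return *addr; }
```

Because these are static inline, the call overhead of the assembly versions (the push/pop of lr) disappears in optimized builds.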

Conclusion

In this post, we've added some basic functionality to our bootloader: the most basic functions one needs for any serious bootloader development. Next, we'll take a look at the deep topic of networking layers and packets. We won't be doing any programming; we're merely getting a feel for the realm of networking, so that we have a basic understanding of what we're about to do.