korena's blog

8. Bare metal & low level init - part 3

Before digging out the DRAM-specific information for the S5PV210 SoC, you may want to brush up on the functional blocks involved in a general DRAM memory controller setup; a short, adequately descriptive article can be found here. I realize that explanations of the internal functional blocks of SDRAM chips and the construction of a modern memory controller are important, but covering them would take too much time and delay the progress of this series. More resources are presented at the end of this post, and I will write a detailed theoretical post on the matter some time later, that is, if I fail to cast away the feeling of guilt.
Regarding the memory system at hand, we'll get most of the needed understanding of what is connected to what from CHAPTER 1 DRAM CONTROLLER, Figure 1-1 Overall Block Diagram in the S5PV210 UM document, which will serve as our guide through the initialization process. We'll start by laying down a road map, so we have some sort of plan before diving in head first.

Introduction

Since the core processor is connected to the SDRAM controller through AXI (1), the memory controller implements a Bus Interface Block. This block saves the bus transactions for memory access in a command queue and buffers read/write data. The Scheduler block mainly translates the commands received from the Bus Interface Block into valid memory commands, in addition to using special internal components for more efficient use of memory. Finally, a Memory Interface Block communicates with the actual memory device through the PHY interface. A more elaborate explanation is presented in the paragraphs following Figure 1-1 in CHAPTER 1 DRAM CONTROLLER.

Initialization sequence

The initialization procedure includes:

  • PHY DLL initialization
  • setting the controller registers
  • memory initialization (DDR2).

The steps as presented in the documentation (S5PV210 UM) are (for DDR2):

  1. To provide stable power for controller and memory device, the controller must assert and hold CKE to a logic low level. Then apply stable clock. Note: XDDR2SEL should be High level to hold CKE to low.
  2. Set the PhyControl0.ctrl_start_point and PhyControl0.ctrl_inc bit-fields to correct value according to clock frequency. Set the PhyControl0.ctrl_dll_on bit-field to ‘1’ to turn on the PHY DLL.
  3. DQS Cleaning: Set the PhyControl1.ctrl_shiftc and PhyControl1.ctrl_offsetc bit-fields to correct value according to clock frequency and memory tAC parameters.
  4. Set the PhyControl0.ctrl_start bit-field to ‘1’.
  5. Set the ConControl. At this moment, an auto refresh counter should be off.
  6. Set the MemControl. At this moment, all power down modes should be off.
  7. Set the MemConfig0 register. If there are two external memory chips, set the MemConfig1 register.
  8. Set the PrechConfig and PwrdnConfig registers.
  9. Set the TimingAref, TimingRow, TimingData and TimingPower registers according to memory AC parameters.
  10. If QoS scheme is required, set the QosControl0~15 and QosConfig0~15 registers.
  11. Wait for the PhyStatus0.ctrl_locked bit-fields to change to ‘1’. Check whether PHY DLL is locked.
  12. PHY DLL compensates the changes of delay amount caused by Process, Voltage and Temperature (PVT) variation during memory operation. Therefore, PHY DLL should not be off for reliable operation. It can be off except runs at low frequency. If off mode is used, set the PhyControl0.ctrl_force bit-field to correct value according to the PhyStatus0.ctrl_lock_value[9:2] bit-field to fix delay amount. Clear the PhyControl0.ctrl_dll_on bit-field to turn off PHY DLL.
  13. Confirm whether stable clock is issued minimum 200us after power on
  14. Issue a NOP command using the DirectCmd register to assert and to hold CKE to a logic high level.
  15. Wait for minimum 400ns.
  16. Issue a PALL command using the DirectCmd register.
  17. Issue an EMRS2 command using the DirectCmd register to program the operating parameters.
  18. Issue an EMRS3 command using the DirectCmd register to program the operating parameters.
  19. Issue an EMRS command using the DirectCmd register to enable the memory DLLs.
  20. Issue a MRS command using the DirectCmd register to reset the memory DLL.
  21. Issue a PALL command using the DirectCmd register.
  22. Issue two Auto Refresh commands using the DirectCmd register.
  23. Issue a MRS command using the DirectCmd register to program the operating parameters without resetting the memory DLL.
  24. Wait for minimum 200 clock cycles.
  25. Issue an EMRS command using the DirectCmd register to program the operating parameters. If OCD calibration is not used, issue an EMRS command to set OCD Calibration Default. After that, issue an EMRS command to exit OCD Calibration Mode and to program the operating parameters.
  26. If there are two external memory chips, perform steps 14~25 for chip1 memory device.
  27. Set the ConControl to turn on an auto refresh counter.
  28. If power down modes is required, set the MemControl registers.
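Steps 14 to 25 all go through the same DirectCmd register, so it can help to see how the command words used later in this post are put together. The sketch below is host-side C and is not part of startup.s. The helper name, the enum names and the field positions (command type in bits [27:24], bank address in bits [18:16], address in the low bits) are assumptions read back from the constants in the assembly listing, so check them against the DirectCmd table in the UM before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Command type codes, read back from the constants in mem_ctrl_asm_init
 * (assumption: they sit in bits [27:24] of DirectCmd). */
enum { CMD_MRS_EMRS = 0x0, CMD_PALL = 0x1, CMD_REFA = 0x5, CMD_NOP = 0x7 };

/* Hypothetical helper: pack one DirectCmd word for chip0.
 * bank selects MRS (0), EMRS1 (1), EMRS2 (2) or EMRS3 (3). */
static uint32_t directcmd(uint32_t type, uint32_t bank, uint32_t addr)
{
    return ((type & 0xFu) << 24) | ((bank & 0x7u) << 16) | (addr & 0x7FFFu);
}

/* Rebuild the chip0 command words used by the listing and compare them. */
static void check_directcmd(void)
{
    assert(directcmd(CMD_NOP, 0, 0) == 0x07000000u);          /* step 14 */
    assert(directcmd(CMD_PALL, 0, 0) == 0x01000000u);         /* steps 16, 21 */
    assert(directcmd(CMD_MRS_EMRS, 2, 0) == 0x00020000u);     /* step 17, EMRS2 */
    assert(directcmd(CMD_MRS_EMRS, 3, 0) == 0x00030000u);     /* step 18, EMRS3 */
    assert(directcmd(CMD_MRS_EMRS, 1, 0x400) == 0x00010400u); /* step 19, EMRS */
    assert(directcmd(CMD_MRS_EMRS, 0, 0x552) == 0x00000552u); /* step 20, MRS */
    assert(directcmd(CMD_REFA, 0, 0) == 0x05000000u);         /* step 22 */
}
```

Calling check_directcmd() from a small test program on the development machine is enough to confirm that the packing reproduces the words written in mem_ctrl_asm_init.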

Implementation and initialization procedure

I'll start by posting the complete assembly code, and will explain what's happening bit by bit.

updated startup.s:

/* Standard definitions of mode bits and interrupt (I & F) flags in PSRs */
.equ Mode_USR,   0x10
.equ Mode_FIQ,   0x11
.equ Mode_IRQ,   0x12
.equ Mode_SVC,   0x13
.equ Mode_ABT,   0x17
.equ Mode_UND,   0x1B
.equ Mode_SYS,   0x1F
  
/*useful addresses, fetched from the S5PV210 user manual*/
.equ GPIO_BASE,  0xE0200000
.equ GPJ2CON_OFFSET,  0x280   
.equ GPJ2DAT_OFFSET,   0x284
.equ GPJ2PUD_OFFSET,   0x288
.equ GPJ2DRV_SR_OFFSET,  0x28C
.equ GPJ2CONPDN_OFFSET,  0x290
.equ GPJ2PUDPDN_OFFSET,  0x294
  
  
/*clock configuration registers */
.equ ELFIN_CLOCK_POWER_BASE,  0xE0100000
.equ CLK_SRC6_OFFSET,  0x218
.equ CLK_SRC0_OFFSET,  0x200
  
.equ APLL_CON0_OFFSET,  0x100
.equ APLL_CON1_OFFSET,  0x104
.equ MPLL_CON_OFFSET,  0x108
.equ EPLL_CON_OFFSET,  0x110
.equ VPLL_CON_OFFSET,  0x120
  
.equ CLK_DIV0_OFFSET,  0x300
.equ CLK_DIV0_MASK,  0x7fffffff
.equ CLK_DIV6_OFFSET,  0x318
  
.equ CLK_OUT_OFFSET,  0x500
  
  
/* PMS values constants*/
.equ APLL_MDIV,   0x7D  @125
.equ APLL_PDIV,   0x3   @3
.equ APLL_SDIV,   0x1   @1
  
.equ MPLL_MDIV,   0x29b @667
.equ MPLL_PDIV,   0xc   @12
.equ MPLL_SDIV,   0x1   @1
  
.equ EPLL_MDIV,   0x60  @96
.equ EPLL_PDIV,   0x6   @6
.equ EPLL_SDIV,   0x2   @2
  
.equ VPLL_MDIV,   0x6c  @108
.equ VPLL_PDIV,   0x6   @6
.equ VPLL_SDIV,   0x3   @3
  
/*the next places the MDIV value at bit 16, PDIV at bit 8 and SDIV at bit 0 of each PLL control register (APLL_CON0, MPLL_CON, EPLL_CON, VPLL_CON); it also sets the highest bit to turn the PLL on*/
.equ APLL_VAL,    ((1<<31)|(APLL_MDIV<<16)|(APLL_PDIV<<8)|(APLL_SDIV))
.equ MPLL_VAL,    ((1<<31)|(MPLL_MDIV<<16)|(MPLL_PDIV<<8)|(MPLL_SDIV))
.equ EPLL_VAL,    ((1<<31)|(EPLL_MDIV<<16)|(EPLL_PDIV<<8)|(EPLL_SDIV))
.equ VPLL_VAL,    ((1<<31)|(VPLL_MDIV<<16)|(VPLL_PDIV<<8)|(VPLL_SDIV))
/* Set AFC value */
.equ AFC_ON,    0x00000000
.equ AFC_OFF,    0x10000010
  
  
  
/* CLK_DIV0 constants*/
.equ APLL_RATIO,  0
.equ A2M_RATIO,   4
.equ HCLK_MSYS_RATIO,  8
.equ PCLK_MSYS_RATIO,  12
.equ HCLK_DSYS_RATIO,  16
.equ PCLK_DSYS_RATIO,  20
.equ HCLK_PSYS_RATIO,  24
.equ PCLK_PSYS_RATIO,  28
  
.equ CLK_DIV0_VAL,      ((0<<APLL_RATIO)|(4<<A2M_RATIO)|(4<<HCLK_MSYS_RATIO)|(1<<PCLK_MSYS_RATIO)|(3<<HCLK_DSYS_RATIO)|(1<<PCLK_DSYS_RATIO)|(4<<HCLK_PSYS_RATIO)|(1<<PCLK_PSYS_RATIO))
 
/*For UART*/
.equ GPA0CON_OFFSET,            0x000
.equ GPA1CON_OFFSET,            0x020
.equ UART_BASE,                 0XE2900000
.equ UART0_OFFSET,              0x0000
 
.equ ULCON_OFFSET,                  0x00
.equ UCON_OFFSET,                   0x04
.equ UFCON_OFFSET,                  0x08
.equ UMCON_OFFSET,                  0x0C
.equ UTXH_OFFSET,                   0x20
.equ UBRDIV_OFFSET,                 0x28
.equ UDIVSLOT_OFFSET,       0x2C
.equ UART_UBRDIV_VAL,       34  
.equ UART_UDIVSLOT_VAL,         0xDDDD
 
.equ UART_CONSOLE_BASE, (UART_BASE + UART0_OFFSET)
 
 
 
/*
 * SDRAM Controller
 */
.equ APB_DMC_0_BASE,   0xF0000000
.equ ASYNC_MSYS_DMC0_BASE,  0xF1E00000
 
.equ DMC_CONCONTROL,    0x00
.equ DMC_MEMCONTROL,    0x04
.equ DMC_MEMCONFIG0,    0x08
.equ DMC_MEMCONFIG1,    0x0C
.equ DMC_DIRECTCMD,    0x10
.equ DMC_PRECHCONFIG,   0x14
.equ DMC_PHYCONTROL0,   0x18
.equ DMC_PHYCONTROL1,   0x1C
.equ DMC_RESERVED,    0x20
.equ DMC_PWRDNCONFIG,   0x28
.equ DMC_TIMINGAREF,    0x30
.equ DMC_TIMINGROW,    0x34
.equ DMC_TIMINGDATA,    0x38
.equ DMC_TIMINGPOWER,   0x3C
.equ DMC_PHYSTATUS,    0x40
.equ DMC_CHIP0STATUS,   0x48
.equ DMC_CHIP1STATUS,   0x4C
.equ DMC_AREFSTATUS,    0x50
.equ DMC_MRSTATUS,    0x54
.equ DMC_PHYTEST0,    0x58
.equ DMC_PHYTEST1,    0x5C
 
.equ MP1_0DRV_SR_OFFSET,   0x3CC
.equ MP1_1DRV_SR_OFFSET,   0x3EC
.equ MP1_2DRV_SR_OFFSET,   0x40C
.equ MP1_3DRV_SR_OFFSET,   0x42C
.equ MP1_4DRV_SR_OFFSET,   0x44C
.equ MP1_5DRV_SR_OFFSET,   0x46C
.equ MP1_6DRV_SR_OFFSET,   0x48C
.equ MP1_7DRV_SR_OFFSET,   0x4AC
.equ MP1_8DRV_SR_OFFSET,   0x4CC
 
.equ ram_load_address,          0x20000000
  
.equ I_Bit,      0x80 /* when I bit is set, IRQ is disabled*/
.equ F_Bit,      0x40 /* when F bit is set, FIQ is disabled*/
  
.text
.code 32
.global _start
  
_start:
        b Reset_Handler
        b Undefined_Handler
        b SWI_Handler
        b Prefetch_Handler
        b Data_Handler
        nop /* Reserved vector*/
        b IRQ_Handler
/*FIQ handler would go right here ...*/
  
.globl _bss_start
_bss_start:
 .word bss_start
  
.globl _bss_end
_bss_end:
 .word bss_end
  
.globl _data_start
_data_start:
 .word data_start
  
.globl _rodata
_rodata:
 .word rodata
  
Reset_Handler:
   
/* set the cpu to SVC32 mode and disable IRQ & FIQ */
         msr CPSR_c, #Mode_SVC|I_Bit|F_Bit ;
       /* Disable Caches */
        mrc p15, 0, r1, c1, c0, 0 /* Read Control Register configuration data*/
        bic r1, r1, #(0x1 << 12)  /* Disable I Cache*/
        bic r1, r1, #(0x1 << 2)   /* Disable D Cache*/
        mcr p15, 0, r1, c1, c0, 0 /* Write Control Register configuration data*/
  
        /* Disable L2 cache (too specific, not needed now, but useful later)*/
        mrc p15, 0, r0, c1, c0, 1  /* reading auxiliary control register*/
        bic r0, r0, #(1<<1)
        mcr p15, 0, r0, c1, c0, 1  /* writing auxiliary control register*/
  
        /* Disable MMU */
        mrc p15, 0, r1, c1, c0, 0 /* Read Control Register configuration data*/
        bic r1, r1, #0x1
        mcr p15, 0, r1, c1, c0, 0 /* Write Control Register configuration data*/
  
        /* Invalidate L1 Instruction cache */
        mov r1, #0
        mcr p15, 0, r1, c7, c5, 0
  
        /*Invalidate L1 data cache and L2 unified cache*/
        bl invalidate_unified_dcache_all
  
        /*enable L1 cache*/
        mrc     p15, 0, r1, c1, c0, 0 /* Read Control Register configuration data*/
        orr     r1, r1, #(0x1 << 12)  /* enable I Cache*/
        orr     r1, r1, #(0x1 << 2)   /* enable D Cache*/
        mcr     p15, 0, r1, c1, c0, 0 /* Write Control Register configuration data*/
  
        /*enable L2 cache (in addition to I,D cache on all levels)*/
        mrc     p15, 0, r0, c1, c0, 1
        orr     r0, r0, #(1<<1)
        mcr     p15, 0, r0, c1, c0, 1
       
  
        ldr sp, =0xd0037d80 /* SVC stack top, from irom documentation*/
        sub sp, sp, #12 /* set stack */
       @mov fp, #0
  
        ldr r0,=0x0C
        bl flash_led
  
        bl clock_subsys_init
 
        ldr r0,=0x0F
        bl flash_led
        bl uart_asm_init
 
        ldr r0,=0x0C
        bl flash_led
 
        bl mem_ctrl_asm_init
 
        ldr r0,=0x0F
        bl flash_led
 
        /* copy a block of code from read only memory to ram,
         * and jump to execute it, the executed code should give a
         * certain message if successful*/
 
        bl      copy_To_Mem
        ldr     r0,=0x0C  @ corrupt r0
        ldr     ip,=ram_load_address
        mov     lr, pc
        bx      ip
 
 _end:
        bl      flash_led
        b       _end
 
  
Undefined_Handler:
        b .
SWI_Handler:
        b .
Prefetch_Handler:
        b .
Data_Handler:
        b .
IRQ_Handler:  
        b . 
            
  
  
/*==========================================
* useful routines
============================================ */
  
  
/*clock subsystem initialization code*/
clock_subsys_init:
  
         ldr r0, =ELFIN_CLOCK_POWER_BASE @0xE0100000
  
         ldr r1, =0x0
         str r1, [r0, #CLK_SRC0_OFFSET]
  
         ldr r1, =0x0
         str r1, [r0, #APLL_CON0_OFFSET]
         ldr r1, =0x0
         str r1, [r0, #MPLL_CON_OFFSET]
         ldr r1, =0x0
         str r1, [r0, #MPLL_CON_OFFSET]
   
 /*turn on PLLs and set the PMS values according to the recommendation*/
         ldr r1, =APLL_VAL
         str r1, [r0, #APLL_CON0_OFFSET]
  
         ldr r1, =MPLL_VAL
         str r1, [r0, #MPLL_CON_OFFSET]
  
         ldr r1, =VPLL_VAL
         str r1, [r0, #VPLL_CON_OFFSET]
  
         ldr r1, =AFC_ON
         str r1, [r0, #APLL_CON1_OFFSET]
  
         ldr r1, [r0, #CLK_DIV0_OFFSET]
         ldr r2, =CLK_DIV0_MASK
         bic r1, r1, r2
  
         ldr r2, =CLK_DIV0_VAL
         orr r1, r1, r2
         str r1, [r0, #CLK_DIV0_OFFSET]
  
    /*delay for the PLLs to lock*/
         mov r1, #0x10000
1:       subs r1, r1, #1
         bne  1b
      
 /* Set Mux to PLL (Bus clock) */
  
 /* CLK_SRC0 PLLsel -> APLLout(MSYS), MPLLout(DSYS,PSYS), EPLLout, VPLLout (glitch free)*/
         ldr r1, [r0, #CLK_SRC0_OFFSET]
         ldr r2, =0x10001111
         orr r1, r1, r2
         str r1, [r0, #CLK_SRC0_OFFSET]
  
 /* CLK_SRC6[25:24] -> MUXDMC0 clock select = SCLKMPLL (which is running at 667MHz, needs to be divided to a value below 400MHz)*/
         ldr r1, [r0, #CLK_SRC6_OFFSET]
         bic r1, r1, #(0x3<<24)
         orr r1, r1, #0x01000000
         str r1, [r0, #CLK_SRC6_OFFSET]
  
 /* CLK_DIV6[31:28] -> SCLK_DMC0 = MOUTDMC0 / (DMC0_RATIO + 1) -> 667/(3+1) = 166MHz*/
         ldr r1, [r0, #CLK_DIV6_OFFSET]
         bic r1, r1, #(0xF<<28)
         orr r1, r1, #0x30000000
         str r1, [r0, #CLK_DIV6_OFFSET]
  
        /*the clock output routes one of the configured clocks to an output pin, so you can verify the outcome
         * of your configuration if you have the equipment to probe it. I am at home, and have no access to such hardware at the moment of writing.
        */
 /* CLK OUT Setting */
 /* DIVVAL[23:20], CLKSEL[16:12] */
         ldr r1, [r0, #CLK_OUT_OFFSET]
         ldr r2, =0x00909000
         orr r1, r1, r2
         str r1, [r0, #CLK_OUT_OFFSET]
  
         mov pc, lr
  
/*Massive data/unified cache cleaning to the point of coherency routine, loops all available levels!*/
  
clean_unified_dcache_all:
         mrc p15, 1, r0, c0, c0, 1 /* Read CLIDR into R0*/
         ands r3, r0, #0x07000000
         mov r3, r3, lsr #23 /* Cache level value (naturally aligned)*/
         beq Finished
         mov r10, #0
Loop1:
         add r2, r10, r10, lsr #1 /* Work out 3 x cache level*/
         mov r1, r0, lsr r2 /* bottom 3 bits are the Cache type for this level*/
         and r1, r1, #7 /* get those 3 bits alone*/
         cmp r1, #2
         blt Skip /* no cache or only instruction cache at this level*/
         mcr p15, 2, r10, c0, c0, 0 /* write CSSELR from R10*/
         isb /* ISB to sync the change to the CCSIDR*/
         mrc p15, 1, r1, c0, c0, 0 /* read current CCSIDR to R1*/
         and r2, r1, #7 /* extract the line length field*/
         add r2, r2, #4 /* add 4 for the line length offset (log2 16 bytes)*/
         ldr r4, =0x3FF
         ands r4, r4, r1, lsr #3 /* R4 is the max number on the way size (right aligned)*/
         clz r5, r4 /* R5 is the bit position of the way size increment*/
         mov r9, r4 /* R9 working copy of the max way size (right aligned)*/
Loop2:
         ldr r7, =0x00007FFF
         ands r7, r7, r1, lsr #13 /* R7 is the max num of the index size (right aligned)*/
Loop3:
         orr r11, r10, r9, lsl R5 /* factor in the way number and cache number into R11*/
         orr r11, r11, r7, lsl R2 /* factor in the index number*/
         mcr p15, 0, r11, c7, c10, 2 /* DCCSW, clean by set/way*/
         subs r7, r7, #1 /* decrement the index*/
         bge Loop3
         subs r9, r9, #1 /* decrement the way number*/
         bge Loop2
Skip:
         add r10, r10, #2 /* increment the cache number*/
         cmp r3, r10
         bgt Loop1
         dsb
Finished:
         mov pc, lr
  
  
  
/*Massive data/unified cache invalidation, loops all available levels!*/
invalidate_unified_dcache_all:
         mrc p15, 1, r0, c0, c0, 1 /* Read CLIDR into R0*/
         ands r3, r0, #0x07000000
         mov r3, r3, lsr #23 /* Cache level value (naturally aligned)*/
         beq Finished_
         mov r10, #0
Loop_1:
         add r2, r10, r10, lsr #1 /* Work out 3 x cache level*/
         mov r1, r0, lsr r2 /* bottom 3 bits are the Cache type for this level*/
         and r1, r1, #7 /* get those 3 bits alone*/
         cmp r1, #2
         blt Skip_ /* no cache or only instruction cache at this level*/
         mcr p15, 2, r10, c0, c0, 0 /* write CSSELR from R10*/
         isb /* ISB to sync the change to the CCSIDR*/
         mrc p15, 1, r1, c0, c0, 0 /* read current CCSIDR to R1*/
         and r2, r1, #7 /* extract the line length field*/
         add r2, r2, #4 /* add 4 for the line length offset (log2 16 bytes)*/
         ldr r4, =0x3FF
         ands r4, r4, r1, lsr #3 /* R4 is the max number on the way size (right aligned)*/
         clz r5, r4 /* R5 is the bit position of the way size increment*/
         mov r9, r4 /* R9 working copy of the max way size (right aligned)*/
Loop_2:
         ldr r7, =0x00007FFF
         ands r7, r7, r1, lsr #13 /* R7 is the max num of the index size (right aligned)*/
Loop_3:
         orr r11, r10, r9, lsl R5 /* factor in the way number and cache number into R11*/
         orr r11, r11, r7, lsl R2 /* factor in the index number*/
         mcr p15, 0, r11, c7, c6, 2 /* Invalidate line described by r11*/
         subs r7, r7, #1 /* decrement the index*/
         bge Loop_3
         subs r9, r9, #1 /* decrement the way number*/
         bge Loop_2
Skip_:
         add r10, r10, #2 /* increment the cache number*/
         cmp r3, r10
         bgt Loop_1
         dsb
Finished_:
         mov pc, lr
  
uart_asm_init:
        stmfd sp!,{lr}
        /* set GPIO(GPA) to enable UART */
        @ GPIO setting for UART
        ldr     r0, =GPIO_BASE  @0xE0200000
        ldr     r1, =0x22222222
        str     r1, [r0,#GPA0CON_OFFSET]  @ storing 0010 in all reg fields, which configures PA0 for UART 0 and UART 1.
 
        ldr     r0, =UART_CONSOLE_BASE
 
        mov     r1, #0x0
        str     r1, [r0,#UFCON_OFFSET] @ resetting all bits in UFCON0 register, disabling FIFO, which means the Tx/Rx buffers 
                                       @ offer a single byte to hold transfer/receive data
        str     r1, [r0,#UMCON_OFFSET] @ resetting all bits in UMCON0 register, disabling auto flow control (AFC), and setting 'Request to 
                                            @ Send' bit to zero, which basically means that our UART0 module will tell the other side of the
                                            @ communication line that we can never receive anything, however, it is up to the sender to respect
                                            @ this, depending on its configuration.
 
        mov     r1, #0x3
        str     r1, [r0,#ULCON_OFFSET]     @ setting bits [0:1] to 0b11, which means data packets are 8 bits long, standard stuff.
 
        ldr     r1, =0x3c5                  @ (0011-1100-0101)
        str     r1, [r0,#UCON_OFFSET]       @ Receive Mode: interrupt/polling mode, Transmit Mode: interrupt/polling mode,
                                            @ Send Break Signal: No, Loop-back Mode: disabled, Rx Error Status Interrupt: Enable,
                                            @ Rx Time Out: Enable,Rx Interrupt Type: level, Tx Interrupt Type: level.
                                            @  Clock Selection: PCLK,dont care for the rest.
 
        ldr     r1, =UART_UBRDIV_VAL   @ 34
        str     r1, [r0,#UBRDIV_OFFSET]
 
        ldr     r1, =UART_UDIVSLOT_VAL  @ 0xDDDD
        str     r1, [r0,#UDIVSLOT_OFFSET]
         
        ldr     r0,=init_uart_string
        mov     r1,#init_uart_len
        bl      uart_print_string
 
        ldmfd sp!,{pc}
 
 
/*void uart_print_string(char* string, int size)*/
uart_print_string:
        stmfd sp!,{r0-r4,lr}
        ldr     r2, =UART_CONSOLE_BASE          @0xE2900000
1:
        ldrb    r3,[r0],#1
        mov     r4, #0x10000 @ delay
2:      subs    r4, r4, #1
        bne     2b
        strb    r3,[r2,#UTXH_OFFSET]    
        subs    r1,r1,#1
        bne     1b
        ldmia sp!,{r0-r4, pc}
 
/*void uart_print_hex(uint32_t hexToPrint)*/
uart_print_hex:
 stmfd sp!,{r0-r4,lr}   @ save the working registers; they are restored by the ldmia at the end
 mov r1,r0
 mov r2,#8
loopLag:
 mov r1,r1,ror #28
 and r0,r1,#0x0000000F  @ mask
  
 cmp r0,#10
 bge hexVal
 add r0,r0,#0x30
 bal printIt
hexVal:
 add r0,r0,#0x37
printIt:
 ldr r3,=UART_CONSOLE_BASE 
 strb    r0,[r3,#0x20]   @ This is the UTXH register, the transmit buffer. 
        mov     r4, #0x10000 @ delay
2:      subs    r4, r4, #1
        bne     2b
 
 sub r2,r2,#1
 cmp r2,#0
 bne loopLag 
 mov r0,#0x0D
        strb    r0,[r3,#0x20]
        mov     r4, #0x10000 @ delay
3:      subs    r4, r4, #1
 bne     3b
        mov     r0,#0x0A
 strb r0,[r3,#0x20]
 ldmia sp!,{r0-r4,pc}
 
mem_ctrl_asm_init:
  stmfd sp!,{r4-r11, lr}
 ldr r0, =ASYNC_MSYS_DMC0_BASE
 ldr r1, =0x0
 str r1, [r0, #0x0]
 
 ldr r1, =0x0
 str r1, [r0, #0xC]
         
 /* DMC0 Drive Strength (Setting 4X) */
 ldr r0, =GPIO_BASE
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_0DRV_SR_OFFSET]
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_1DRV_SR_OFFSET]
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_2DRV_SR_OFFSET]
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_3DRV_SR_OFFSET]
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_4DRV_SR_OFFSET]
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_5DRV_SR_OFFSET]
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_6DRV_SR_OFFSET]
 
 ldr r1, =0x0000FFFF
 str r1, [r0, #MP1_7DRV_SR_OFFSET]
 
 /*
 MP1_8[0]   Xm1CSn[0]   [1:0] => [1:1]
 MP1_8[1]   Xm1CSn[1]   [3:2] => [1:1]
 MP1_8[2]   Xm1RASn     [5:4] => [1:1]
 MP1_8[3]   Xm1CASn     [7:6] => [1:1]
 MP1_8[4]   Xm1WEn      [9:8] => [1:1]
 MP1_8[5]   Xm1GateIn   [11:10] => [1:1]
 MP1_8[6]   Xm1GateOut  [13:12] => [1:1]
 */
  
 ldr r1, =0x00003FFF
 str r1, [r0, #MP1_8DRV_SR_OFFSET]
  
 /* DMC0 initialization*/
 ldr r0, =APB_DMC_0_BASE
  
 ldr r1, =0x00101000    @PhyControl0 DLL parameter setting,
 str r1, [r0, #DMC_PHYCONTROL0]
  
 ldr r1, =0x00000086    @PhyControl1 DLL parameter setting
 str r1, [r0, #DMC_PHYCONTROL1]  @ (0000-0000-0000-0000-0000-0000-1000-0-110)
 
 ldr r1, =0x00101002    @PhyControl0 DLL on
 str r1, [r0, #DMC_PHYCONTROL0]
 
 ldr r1, =0x00101003    @PhyControl0 DLL start
 str r1, [r0, #DMC_PHYCONTROL0]
 
find_lock_val:
 ldr r1, [r0, #DMC_PHYSTATUS]  @Load Phystatus register value
 and r2, r1, #0x7    @masking bits [0:2] of PHYSTATUS register.
 cmp r2, #0x7    
 bne find_lock_val    @ Loop until DLL is locked, the loop waits for ctrl_clock, ctrl_flock and ctrl_locked 
       @ to be set
 
  
  /*block of code if we plan to not use DLL--start*/
.if 0 
 and r1, #0x3fc0    @ masking ctrl_lock_value[9:2], number of delay cells for coarse lock
                    @ (mask = 11111111000000, bits [0:5] cleared, bits [6:13] set)
 mov r2, r1, LSL #18    @ r2 = 0xVV000000, bits [0:23] cleared, bits [24:31] hold the value bits
 orr r2, r2, #0x100000   @ r2 = 0xVV100000 (bit 20 set)
 orr r2 ,r2, #0x1000    @ r2 = 0xVV101000 (bit 12 set)
 
 orr r1, r2, #0x3    @ Force Value locking (r1 = 0xVV101003)
 str r1, [r0, #DMC_PHYCONTROL0]  @ Store r1 in PHYCONTROL0, changing only the upper 8 bits [24:31], as explained in the
        @ documentation, This field is used instead of ctrl_lock_value[9:2] from the
                 @ PHY DLL when ctrl_dll_on is LOW. (i.e. If the DLL is off, this field is used to
                                                        @ generate  270' clock and  shift DQS by 90'.)
 
 orr r1, r2, #0x1    @DLL off
 str r1, [r0, #DMC_PHYCONTROL0]
.endif
 /*block of code if we plan to not use DLL--end */
 
 ldr r1, =0x0FFF2010    @ ConControl auto refresh off (1111111111110010000000010000, bits [16:27], [13] and [4]
 str r1, [r0, #DMC_CONCONTROL]  @ set).
       @ Following figure 1-7, we set the rd_fetch time to 2, accounting for the inevitable
       @ existence of tDQSCK (skew between DQS and CK); the rest of the bits are left
       @ unchanged.
       @ This maps to point 5.
 
 
 ldr r1, =0x00202400    @(was DMC0_MEMCONTROL) MemControl BL=4, 1 chip, DDR2 type, dynamic self refresh off,
 str r1, [r0, #DMC_MEMCONTROL]   @ power down modes off.
        @  0x00202400 (0000-0000-0010-0000-0010-0100-0000-0000)
 
 
 ldr r1, =0x20F01422    @ MemConfig0 512MB config, 4 banks, Mapping Method [12:15] 0:linear, 1:interleaved,
 str r1, [r0, #DMC_MEMCONFIG0]  @ 2:Mixed
       @ 0x20F01422 (0010-0000-1111-0000-0001-0100-0010-0010) 0x20E01323
       @ 4 banks => [3:0] = 0x2, row address bits = 14 => [7:4] = 0x2 (found in K4T1G084QF 5.
       @ DDR2 SDRAM Addressing), col address bits = 11 => [11:8] = 0x4 (I think!), memory
       @ address scheme explained in 5.3 Address Mapping Scheme, [15:12] = 0x1 in the value written here.
       @ [23:16] kept at default, [31:24] also kept as default.
 
 ldr r1, =0xFF000000    @ PrechConfig: this maps to point 8 in the sequence. These are choices you
 str r1, [r0, #DMC_PRECHCONFIG]  @ can fine-tune later;
       @ left as defaults for now.
 
 ldr r1, =0x50F   @ Following the formula in the TIMINGAREF table, we will place 7.8 us * 166 MHz =  0x50F
  str r1, [r0, #DMC_TIMINGAREF]  @ in bits [0:15]
                                                  
 ldr r1, =0x18233287   @ TimingRow for @166Mhz  tRfc=105ns tRRD=10ns tRP=15ns trcd = 15ns trc = 60ns tRAS=40ns
 str r1, [r0, #DMC_TIMINGROW]        @ 0x18233287  (0001-1000-0010-0011-0011-001010-000111)  
 
 ldr r1, =0x23230000   @ TimingData CL=3
 str r1, [r0, #DMC_TIMINGDATA]       @   tWTR=2ns tWR=15ns tRTP= CL=3
 
 ldr r1, =0x07150232   @ TimingPower
 str r1, [r0, #DMC_TIMINGPOWER]      @ 0x07150232  (00-000111-00010101-00000010-0011-0010)
  
 ldr r1, =0x07000000    @DirectCmd chip0 NOP
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x01000000    @DirectCmd chip0 PALL
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x00020000    @DirectCmd chip0 EMRS2
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x00030000    @DirectCmd chip0 EMRS3
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x00010400    @DirectCmd chip0 EMRS1 (MEM DLL on, DQS# disable)
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x00000552    @DirectCmd chip0 MRS (MEM DLL reset) CL=3, BL=4
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x01000000    @DirectCmd chip0 PALL
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x05000000    @DirectCmd chip0 REFA
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x05000000    @DirectCmd chip0 REFA
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x00000452    @DirectCmd chip0 MRS (MEM DLL unreset)
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x00010780    @DirectCmd chip0 EMRS1 (OCD default)
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x00010400    @DirectCmd chip0 EMRS1 (OCD exit)
 str r1, [r0, #DMC_DIRECTCMD]
 
 ldr r1, =0x0FF02030    @ConControl auto refresh on
 str r1, [r0, #DMC_CONCONTROL]
 
 ldr r1, =0xFFFF00FF    @PwrdnConfig
 str r1, [r0, #DMC_PWRDNCONFIG]
 
 ldr r1, =0x00202400    @MemControl BL=4, 1 chip, DDR2 type, dynamic self refresh, force precharge, 
 str r1, [r0, #DMC_MEMCONTROL]  @ dynamic power down off
 
 ldr r0,=init_sdram_string
 
 mov r1,#init_sdram_len
 
 bl  uart_print_string
 
 
 ldmfd sp!,{r4-r11, pc}
 
  
.align 4,0x90
flash_led:
     ldr r4,=(GPIO_BASE+GPJ2CON_OFFSET)
     ldr r5,[r4]
     ldr r2,=1
     orr r5,r5,r2
     orr r5,r2,lsl#4
     orr r5,r2,lsl#8
     orr r5,r2,lsl#12
     orr r5,r5,r2
     str r5,[r4]
     ldr r4,=(GPIO_BASE+GPJ2DAT_OFFSET)
     ldr r5,[r4]
     ldr r3,=0xF
     orr r5,r5,r3  @ turn them all off ...
     bic r5,r5,r0
     str r5,[r4]
     mov r1, #0x10000  @ this should be passed meh!
1:  subs r1, r1, #1
  bne 1b
     orr r5,r5,r3  @ turn them all off again ...
     str r5,[r4]
     mov pc, lr
 
.align 4,0x90
copy_To_Mem:
        stmfd sp!,{r4-r11,lr}
 
        ldr     r0,=copy_sdram_start_string
        ldr     r1,=copy_sdram_start_len
        bl      uart_print_string
 
        ldr     r0,=_BL2
        ldr     r1,=ram_load_address
        mov     r2,#copy_lim
1:      ldr     r3,[r0],#4
        str     r3,[r1],#4
        subs    r2,r2,#1
        bne     1b
 /*test the first and last words of the copied segment, if they match, assume successful*/
  
 ldr r3,=_BL2
 ldr r0,[r3]
 ldr     r1,=ram_load_address
 ldr r2,[r1]  @ the contents of the ram load address.
 cmp r0,r2 @ This is test one, if not equal, go to exit_copy.
 bne exit_copy
 ldr r0,[r3,#copy_lim] 
 ldr r4,[r1,#copy_lim]
 cmp r0,r4 @ This is test two, if not equal, go to exit_copy.
 bne exit_copy
        ldr     r0,=copy_sdram_end_string
        ldr     r1,=copy_sdram_end_len
        bl      uart_print_string
 ldmfd sp!,{r4-r11,pc}
exit_copy:
        ldr   r0,=copy_sdram_err_string
        ldr   r1,=copy_sdram_err_len
        bl    uart_print_string
 ldr   r1,=_BL2
 ldr   r0,[r1]
 bl    uart_print_hex
 ldr   r1,=ram_load_address
 ldr   r0,[r1]
 bl    uart_print_hex
 
        ldr   r1,=_BL2
        ldr   r0,[r1,#copy_lim]
        bl    uart_print_hex
        ldr   r1,=ram_load_address
        ldr   r0,[r1,#copy_lim]
        bl    uart_print_hex
 b .  @ loop forever upon failure ...
 
.section .rodata
init_uart_string:
.ascii "UART 0 Initialization complete ...\r\n"
.set init_uart_len,.-init_uart_string
init_clock_string:
.ascii "Clock System initialization complete ...\r\n"
.set init_clock_len,.-init_clock_string
init_sdram_string:
.ascii "memory initialization complete ...\r\n"
.set init_sdram_len,.-init_sdram_string
copy_sdram_start_string:
.ascii "copying code to dram started ...\r\n"
.set copy_sdram_start_len,.-copy_sdram_start_string
copy_sdram_end_string:
.ascii "copying code to dram complete ...\r\n"
.set copy_sdram_end_len,.-copy_sdram_end_string
copy_sdram_err_string:
.ascii "copying code to dram failed ...\r\n"
.set copy_sdram_err_len,.-copy_sdram_err_string
 
 
.align 4,0x90
_BL2:
        ldr     r0,=exec_sdram_string
        ldr     r1,=exec_sdram_len
        ldr     r2, =0xE2900000
1:
        ldrb    r3,[r0],#1
        mov     r4, #0x10000 @ delay
2:      subs    r4, r4, #1
        bne     2b
        strb    r3,[r2,#0x20]   @ This is the UTXH register, the transmit buffer. 
        subs    r1,r1,#1
        bne     1b
  
        ldr     r0,=0xD   @ This is to get a unique LED pattern 
        mov     pc,lr
  
exec_sdram_string:
.ascii "code execution from dram successful! ...\r\n"
.set exec_sdram_len,.-exec_sdram_string
.align 4
.set copy_lim,.-_BL2

Lines [521-558], the block under the "DMC0 Drive Strength (Setting 4X)" comment in mem_ctrl_asm_init, set up the memory port 1 connections to the DDR2 chips. The schematic shows the pin-to-pin connections if you're interested.
From 2.1.5.3 Pin Mux Description (in the S5PV210 UM), you can see the translation of these multiplexed lines. We're basically defining the drive strength of the physical pins connecting the DDR2 chips to the SoC, the same way we did for our hello world LED lighting code in milestone 1. We're simply setting the drive strength to the maximum value for all of them, which is 4x (both bits of every 2-bit field set). How do we decide on the drive strength, you say? Normally, it's done by checking the electrical characteristics in the datasheet of the SDRAM chip; the best documentation spells it out plainly or gives you a chart, but I dug for none of that, for no good reason.
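To make the drive strength values less magical, here is a small host-side C sketch (not part of startup.s) that rebuilds them. It assumes what the MP1_8 comment in the listing shows: each pin owns a 2-bit field, and 0b11 is the code written for 4x. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: give each of `pins` pins the same 2-bit strength code. */
static uint32_t drv_all(unsigned pins, uint32_t code)
{
    uint32_t value = 0;
    for (unsigned pin = 0; pin < pins; pin++)
        value |= (code & 0x3u) << (2u * pin);
    return value;
}

/* Compare against the constants written by mem_ctrl_asm_init. */
static void check_drv(void)
{
    assert(drv_all(8, 0x3u) == 0x0000FFFFu); /* MP1_0 .. MP1_7, 8 pins each */
    assert(drv_all(7, 0x3u) == 0x00003FFFu); /* MP1_8, 7 pins */
}
```

If you later decide on a weaker setting from the SDRAM datasheet, the same helper gives the register value for any other 2-bit code.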

Next, lines [563-564] deal with the setup of the PHYCONTROL0 register. This process maps to the second point in the above initialization sequence. The value we're setting here is given in the documentation: in the PHYCONTROL0 table, it is stated that both PHYCONTROL0 [15:8] (ctrl_start_point) and PHYCONTROL0 [23:16] (ctrl_inc) should be set to 0x10. So we did that.

Moving on, table 1.4.1.8 PHY Control1 Register (PhyControl1, R/W, Address = 0xF000_001C, 0xF140_001C) states that bits [0:2] (ctrl_shiftc) should be 0x6 for DDR2@200MHz and that bit [3] (ctrl_ref) is the Reference Count for DLL Lock Confirmation. I'm not sure what this is. The closest thing I could think of is that this parameter prevents an infinite loop in hardware by representing the worst-case time of a DLL locking routine, after which the locked status bit is set so we can assume locking is achieved. This is a hypothesis backed by absolutely no hard evidence, other than knowledge of the operation of DLLs and wanting to pass this point without having to dig for the truth. This process maps to point three in the initialization sequence above, and is performed in lines [566-567].

From the PHYCONTROL0 table, in the description of PHYCONTROL0 [1]: "DLL On Active High start signal to activate the DLL. This signal should be kept HIGH for normal operation. If this signal becomes LOW, DLL is turned off and ctrl_clock and ctrl_flock become HIGH. This bit should be set before ctrl_start is set to turn on the DLL". Clear enough, so we set bit one in lines [569-570]. This still maps to the second point in the initialization sequence above.

From the PHYCONTROL0 table, in the description of PHYCONTROL0 [0]: "HIGH active start signal to initiate the DLL run and lock. This signal should be kept HIGH during normal operation. If this signal becomes LOW, DLL stops running. To re-run DLL, make this signal HIGH again. In the case of re-running, DLL loses previous lock information. Before ctrl_start is set, make sure that ctrl_dll_on is HIGH.". This is covered in lines [572-573], and maps to the fourth point in the initialization sequence above.
So we started the DLL and jumped into a loop waiting for it to lock. The loop is implemented in lines [575-579].
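To make the order of the PHYCONTROL0 writes easier to follow, here is a rough host-side C sketch of the same sequence. It is not the assembly from the listing. The field positions are the ones quoted from the PHYCONTROL0 table above, while the helper names, the status word and the lock mask passed to `wait_for_lock` are my own placeholders (the lock bit is not quoted in this post, so check the UM before reusing it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions as quoted from the PHYCONTROL0 table in the text. */
#define PHY0_CTRL_START      (1u << 0)                         /* ctrl_start  */
#define PHY0_CTRL_DLL_ON     (1u << 1)                         /* ctrl_dll_on */
#define PHY0_START_POINT(v)  (((uint32_t)(v) & 0xFFu) << 8)    /* [15:8]      */
#define PHY0_INC(v)          (((uint32_t)(v) & 0xFFu) << 16)   /* [23:16]     */

/* First write: ctrl_start_point and ctrl_inc, 0x10 each as the UM asks. */
static inline uint32_t phy0_base_value(void)
{
    return PHY0_START_POINT(0x10) | PHY0_INC(0x10);
}

/* Second write: turn the DLL on before it is started. */
static inline uint32_t phy0_dll_on(uint32_t v)
{
    return v | PHY0_CTRL_DLL_ON;
}

/* Third write: start the DLL; ctrl_dll_on must already be set. */
static inline uint32_t phy0_dll_start(uint32_t v)
{
    assert(v & PHY0_CTRL_DLL_ON);
    return v | PHY0_CTRL_START;
}

/*
 * Poll a status word until every bit in lock_mask is set.
 * Gives up after max_polls reads: returns 1 on lock, 0 on timeout.
 * 'status' and 'lock_mask' are hypothetical parameters, not UM names.
 */
static inline int wait_for_lock(const volatile uint32_t *status,
                                uint32_t lock_mask, unsigned max_polls)
{
    assert(status != NULL);
    while (max_polls-- > 0u) {
        if ((*status & lock_mask) == lock_mask)
            return 1;
    }
    return 0;
}
```

The three value helpers give 0x00101000, 0x00101002 and 0x00101003 in that order. The bounded poll is a design choice of the sketch, so that a DLL that never locks does not hang the boot forever.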

After the DLL locks, we move on to point five in the above initialization procedure. Lines [601-602] are used to fill in the values in the CONCONTROL register; comments in the code explain what's happening.

Point 6 took me too much time, so it deserved a rant tag.

<rant> The documentation (schematic) seems to think that the four DDR2 RAM chips are of type 'K4T1G084QQ-HCE6 or HCF7 (DDR2, 1Gb)'. The 1GB had me worried, because the total RAM size, according to the Tiny210 board user's manual, is 512MB, so I assumed something was wrong with the schematics. I grabbed my flashlight and crossed my eyes gazing (with a bit of a jaw gape) at the writing on the RAM chips, which read K4T1G084QF. Close enough, I thought, the last 'F' must be something related to the package or electrical properties, who cares! But then I looked up the correct datasheet of the K4T1G084QF RAM chip, which still insisted on the 1GB nonsense. Then I found out that what they mean to say is that 8 x K4T1G084QF chips = 1GB (the 1Gb on each chip is one gigabit), and at this point I saw the light! It became obvious that I was looking at a rank of DDR2 RAM consisting of four chips, 128MB each (4 banks). This observation did match the information in K4T1G084QF's documentation, particularly the Ordering Information table on page four and the DDR2 SDRAM ADDRESSING section. You would think that this information would help me figure out what to place in the num_chip bits of MEMCONTROL. NO.
But since the S5PV210 UM states in the MEMCONTROL table that the only supported numbers of chips (now what in the name of Zeus do they mean by 'chip'!) are one or two, I had the diabolical idea of trying both cases. Then it hit me: the schematic might have used the wrong part number for the memory chips, but it might still be capable of showing me what is connected to what. I took a look at the memory section of the schematic: memory port zero is connected to the NAND controller, memory port one is connected to all four chips, and memory port two is not connected to anything. Assuming memory port one maps to DMC0 and memory port two maps to DMC1, I guessed that the meaning of 'chip' in the S5PV210 user's manual is actually rank, and that I have only one rank, controlled by DMC0. </rant>

Lines [609-623] set the MEMCONTROL, MEMCONFIG0 and PRECHCONFIG registers using information found in the DDR2 datasheet; read the comments in the code for a better understanding.

Looking at the datasheet of the memory chip used in the Tiny210, you can see that the DDR2 chip is a DDR2-800 or DDR2-667 (FriendlyARM's documentation should have stated it plainly, but they thought it wasn't important). The first thing we can understand here is that the RAM chip used is overkill, since the bus speed (AXI) cannot exceed 200MHz.
We also realize that the RAM chip is capable of running at 200MHz. If we go back to the clock setup code, we see that, had we chosen to use SCLKA2M as the source of MUXDMC0 (lines 235-238 in the last code listing in the clock subsystem initialization post), we would have been able to divide it by two, achieving a better frequency for the memory clock. But I recklessly chose SCLKMPLL, which has an odd operating frequency, and this poor choice costs us about 34MHz (166MHz instead of 200MHz). Now you'd think I would go back and fix it, but you'd be wrong. Lines [626-627] deal with the setting of the TIMINGAREF register bits [0:15]; read the comments for clarifications.

From 13.1 Refresh Parameters by Device Density in the DDR2 datasheet, we see that trfc for the 512Mb device density is 105ns, and trrd (row-to-row activation delay) is 10ns (the highest number found in the datasheet of the memory device). It is worth mentioning that the TIMINGROW table in S5PV210 UM seems to define trrd as bank to bank activation delay, which is just weird.

From 13.2 Speed Bins and CL, tRCD, tRP, tRC and tRAS for Corresponding Bin in the DDR2 datasheet, we find trp = 15ns, trcd = 15ns, tras = 45ns and trc = 60ns.
Fetching these values was easy enough, but how do you actually initialize the relevant registers? Let's take the example of the t_rfc value in the TIMINGROW table (from S5PV210 UM). The description field says:
"Auto refresh to Active / Auto refresh command period, in cycles
t_rfc * T(mclk) should be greater than or equal to the minimum value of memory tRFC.".
I think we can all agree that this sentence requires some serious cryptography skills to decipher. What this means, in plain English, is that if you multiply the number in these bits (bits [24:31]) by the TIME PERIOD of the clock signal at which the DMC is driving the memory chip (166MHz in our case, so roughly 6ns), the result should be greater than or equal to the trfc value you fetched from the memory datasheet, so:

105ns <= t_rfc * (1/166MHz) ≈ t_rfc * 6ns ==> t_rfc >= 105ns/6ns = 17.5

We will round up to 18, since the product only has to be greater than or equal to the datasheet value, so this is still acceptable. Note that the higher the value, the safer you are from errors, but the more waiting your processor has to do in order to get a response from the memory chip. All other values are calculated the same way:

 t_rrd = 10ns/6ns =  1.66 ==> 2
 t_rp  = 15ns/6ns =  2.5  ==> 3
 t_rcd = 15ns/6ns =  2.5  ==> 3
 t_rc  = 60ns/6ns =  10
 t_ras = 45ns/6ns =  7.5  ==> 8

These values are then stored in the TIMINGROW register; lines [629-630] take care of this step.
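The "divide by the clock period and round up" rule used above is easy to get wrong by hand, so here is a small C sketch of it. It works in picoseconds to stay in integer math, and it uses the same rounded 6ns period as the calculations above. The names `TCK_PS` and `cycles_for_ps` are made up for this illustration and do not appear in the startup code.

```c
#include <assert.h>
#include <stdint.h>

/* Memory clock period in picoseconds: 1/166MHz, rounded to 6ns as in the text. */
#define TCK_PS 6000u

/*
 * Smallest cycle count n such that n * tck_ps >= t_ps.
 * This is a ceiling division, which is exactly the "greater than or
 * equal to the datasheet minimum" condition from the TIMINGROW table.
 */
static inline uint32_t cycles_for_ps(uint32_t t_ps, uint32_t tck_ps)
{
    assert(tck_ps > 0u);
    return (t_ps + tck_ps - 1u) / tck_ps;
}
```

For example, `cycles_for_ps(105000u, TCK_PS)` evaluates to 18 for the 105ns trfc, and `cycles_for_ps(60000u, TCK_PS)` evaluates to 10 for trc, where no rounding is needed.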

Moving on to the TIMINGDATA register setup, the most important property to study in this section is the CL (CAS latency), which is the time it takes the memory chip to make the requested column data available after the column address is placed on the address lines. There are some considerations one should take into account when configuring the CL value. Generally speaking, the lower the CL value, the faster the response. Let's take the example of our memory chip, which is running at 166MHz. Because of the double data rate magic of DDR SDRAM, the effective data rate will be 332MT/s (2 x 166MHz), but CL is counted in cycles of the 166MHz memory clock, roughly 6ns each. Consider a scenario in which the memory controller asks for data that is present in a certain column by issuing a read command and placing the column address on the address lines. If we follow the memory chip's datasheet and use the highest CL value of 6 (remember that we are partially blind here, because FriendlyARM thought we do not really need to know exactly which chip is used), the time it will take for the memory chip to spit out the first burst word will be 6 x 1/166MHz, about 36ns. For a CL value of 3, this time would be reduced to about 18ns. If you're keeping up, you should be wondering why in the seven faced god's name (<- Game of Thrones brilliant reference) we would even consider choosing a higher CL value. Well, the short, non-technical answer is that the datasheet states that, based on the rated clock speeds, the CL value should be as stated in the nameless table of section 2. Key Features (the first table under the section title in the DDR2 datasheet). You are given the option of setting CL to lower values because you may be running the memory chip at a lower speed than the speed the manufacturer recommends, which is exactly what we are doing. The long technical answer requires digging into the electrical construct of the memory chip, which is beyond this post.

The values required for the TIMINGDATA register are found in the memory chip's datasheet, in the tables of section 13.3 Timing Parameters by Speed Grade. The values are: tWTR = 7.5ns, tWR = 15ns, tRTP = 7.5ns. We'll calculate the values of the TIMINGDATA register fields the same way we calculated the values for TIMINGROW.

 t_wtr = 7.5ns/6ns = 1.25 ==> 2
 t_wr  = 15ns/6ns =  2.5  ==> 3
 t_rtp = 7.5ns/6ns = 1.25 ==> 2

Lines [632-633] handle storing these values into the TIMINGDATA register.

For the TIMINGPOWER register, the same tables give us: tFAW = 37.5ns, tXSR = 200 nck, tXP = 2 nck, tCKE = 3 nck, tMRD = 2 nck. For the parameters that are defined in nck (number of clock cycles), we have to calculate the actual time delay in nanoseconds based on our clock frequency, by calculating the time it takes to delay for the specified number of cycles. Using the same rounded 6ns period as before, tXSR would equal 200/166MHz = 1200ns, tXP = 2/166MHz = 12ns, tCKE = 3/166MHz = 18ns, tMRD = 2/166MHz = 12ns. Now we can calculate the values like we did above:

t_faw = 37.5ns/6ns = 6.25 ==> 7
t_xsr = 1200ns/6ns = 200 (0xC8)
t_xp  = 12ns/6ns = 2
t_cke = 18ns/6ns = 3
t_mrd = 12ns/6ns = 2

Lines [635-636] handle storing these values into the TIMINGPOWER register.

For the remaining commands, the description field of the DIRECTCMD register table explains:
"Type of Direct Command:
0x0 = MRS/EMRS (mode register setting),
0x1 = PALL (all banks precharge),
0x2 = PRE (per bank precharge),
0x3 = DPD (deep power down),
0x4 = REFS (self refresh),
0x5 = REFA (auto refresh),
0x6 = CKEL (active/ precharge power down),
0x7 = NOP (exit from active/ precharge power down or deep power down),
0x8 = REFSX (exit from self refresh)
0x9 = MRR (mode register reading),
0xa ~ 0xf = Reserved

If a direct command is issued, AXI masters must not access memory. It is strongly recommended to check the command queue’s state by Concontrol.chip0/1_empty before issuing a direct command. You must disable dynamic power down, dynamic self refresh and force precharge function (MemControl register).
MRS/EMRS and MRR commands should be issued if all banks are in idle state."

So the required initialization steps 14 to 25 are performed sequentially, following the explanation above. The subroutine wraps up by printing a message to the host terminal and returning to the caller routine.
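If you want to see how one of those direct commands turns into a register value, the following C sketch may help. The command codes are copied from the DIRECTCMD table quoted above. The bit layout (cmd_type in [27:24], cmd_chip in [20], cmd_bank in [18:16], cmd_addr in [15:0]) is my assumption and is not quoted in this post, so verify it against the DIRECTCMD table in the S5PV210 UM. The names `dmc_cmd` and `dmc_directcmd` are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Command types exactly as listed in the DIRECTCMD table quoted above. */
enum dmc_cmd {
    DMC_CMD_MRS   = 0x0,  /* MRS/EMRS, mode register setting           */
    DMC_CMD_PALL  = 0x1,  /* all banks precharge                       */
    DMC_CMD_PRE   = 0x2,  /* per bank precharge                        */
    DMC_CMD_DPD   = 0x3,  /* deep power down                           */
    DMC_CMD_REFS  = 0x4,  /* self refresh                              */
    DMC_CMD_REFA  = 0x5,  /* auto refresh                              */
    DMC_CMD_CKEL  = 0x6,  /* active/precharge power down               */
    DMC_CMD_NOP   = 0x7,  /* exit from power down or deep power down   */
    DMC_CMD_REFSX = 0x8,  /* exit from self refresh                    */
    DMC_CMD_MRR   = 0x9   /* mode register reading                     */
};

/*
 * Build a DIRECTCMD word.
 * ASSUMED layout, check the UM before use:
 *   cmd_type [27:24], cmd_chip [20], cmd_bank [18:16], cmd_addr [15:0]
 */
static inline uint32_t dmc_directcmd(enum dmc_cmd type, unsigned chip,
                                     unsigned bank, unsigned addr)
{
    assert(chip <= 1u);       /* the controller knows chip 0 and chip 1 */
    assert(bank <= 7u);       /* three bank bits                        */
    assert(addr <= 0xFFFFu);  /* sixteen address bits                   */
    return ((uint32_t)type << 24) | ((uint32_t)chip << 20) |
           ((uint32_t)bank << 16) | (uint32_t)addr;
}
```

Under that assumed layout, a NOP to chip 0 is `dmc_directcmd(DMC_CMD_NOP, 0, 0, 0)`, which gives 0x07000000, and a precharge-all is 0x01000000. On DDR2 the bank field of an MRS/EMRS command selects which mode register is written.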

Running from memory

Lighting LEDs in response to a successful execution is getting a bit boring, so we'll use the initialized UART interface of the S5PV210 processor, have the processor execute some commands from RAM, and have it tell us about it if successful.
Three more routines were added for this purpose. The first is the copy_To_mem subroutine (line 718), which takes the starting address of the data segment to be copied and the length of this segment as arguments, in addition to the destination start address. It then loops through the data segment, copying a word in each iteration. Note that this subroutine copies data from SRAM to main memory; it is incapable of copying data from SD/MMC cards. There are better ways to do this, most notably the use of NEON's SIMD instructions, which could give superb performance. We aren't looking for performance though, this is just a test, so we won't worry too much about it.
The second subroutine is print_uart_hex (line 479), which takes a hex number and prints it to the UART terminal after converting it to ASCII. Finally, the third subroutine is _BL2 (line 788), which is a piece of code that is saved in read only memory, so that it can be copied and executed in the recently initialized memory as a test.
There's an important point that you shouldn't miss about the read only section in our startup.s. The fact that we called it read only does not really make it read only; it simply maps the included code to a section titled read only. To enforce the read only property, you need to use your linker script (or scatter file) to map this section to the part of memory where you'd like to have read only code/data stored, and enforce it using certain software mechanisms that we'll see in a couple of posts.
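For readers who find assembly loops hard to picture, here is roughly what the first two routines do, written as a C sketch. This is not the code in startup.s. The function names, the argument order and the uppercase, fixed-width hex output are my own choices for illustration, and the real print_uart_hex pushes characters to the UART instead of filling a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len_bytes from src to dst, one 32-bit word per iteration,
 * the way copy_To_mem is described: a plain loop, no SIMD tricks.
 * len_bytes must be a multiple of 4.
 */
static inline void copy_words(volatile uint32_t *dst,
                              const volatile uint32_t *src,
                              size_t len_bytes)
{
    assert((len_bytes % 4u) == 0u);
    for (size_t i = 0; i < len_bytes / 4u; i++)
        dst[i] = src[i];
}

/*
 * Convert a 32-bit value to 8 hex digits in ASCII, most significant
 * nibble first. 'out' needs room for 8 characters plus the NUL.
 */
static inline void u32_to_hex(uint32_t value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 8; i++) {
        out[i] = digits[(value >> 28) & 0xFu];  /* take the top nibble    */
        value <<= 4;                            /* expose the next nibble */
    }
    out[8] = '\0';
}
```

On the target, `dst` would point into the freshly initialized DRAM and `src` into SRAM. Comparing the two regions word by word after the copy is a cheap way to decide between the "complete" and "failed" messages.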

Moment of truth

Now that we've added all the requisite parts, let's put it to the test. You can clone the project from the git repository. After making and fusing, I inserted the card in the development board's SD card slot, connected to the target board using kermit, and observed the results in the video below.

CONCLUSION

Now that we are confident that our memory initialization actually works, we'll move on to a more practical task in the next post, which is still copying BL2 code into RAM, but from a source that is external to the BL1 SD card's 16KB segment. In a real life scenario, unless your second stage bootloader has very limited functionality, it will probably be placed beyond the 16KB allocated for BL1.
Looking at the S5PV210 iROM application note document, you can see that there are copying functions implemented in the internal ROM code that can be used to copy code from one place to another. The implementation that concerns us is the SD/MMC Copy Function. We'll look into that next.


References

Some of these documents are too specific to a certain company's product, but reading them is still a good way to extract the essentials. You should also read the JEDEC specifications if you're into that kind of stuff.

  • MODERN DRAM MEMORY SYSTEMS: PERFORMANCE ANALYSIS AND SCHEDULING ALGORITHM by David Tawei Wang

  • ADSP-21161 SHARC DSP Hardware Reference (chapter 8)

  • HOW TO USE SDRAM USER’S MANUAL (http://www.elpida.com)

  • DDR2 Device Operations & Timing Diagram (SK hynix)

  • AMBA® DDR, LPDDR, and SDR Dynamic Memory Controller DMC-340 Technical Reference Manual

      (1) APB bus has access to the SDRAM controller as well, I reckon it is used for both initialization and direct memory access from certain peripherals, we'll investigate the matter more closely when we need to.