korena's blog

13. Bootloader Implementation - part 3

In this post, we'll walk through the general concepts of a typical networking stack from a practical point of view, and I'll do my best to avoid details we don't need for our purpose. Here's how the post is structured: we'll first describe the format of the data that is transferred between the nodes of a network, briefly explaining what each section of the transferred unit of data means. Then we'll describe this unit of data with data structures, so we can treat it as "just a piece of data" to be processed by software. Throughout the post, we'll point out the OSI layer that is responsible for processing each section of the data units involved in communication. We're restricting ourselves to Ethernet, which is defined at the physical and data link layers, because we're about to write a driver for an Ethernet chip.

What goes on the wire (Data Link Frames)

The Data Link Frame is the unit in which digital data is transmitted through a network. It is what actually leaves the sending node's hardware, travels through the Ethernet cable, and finally arrives at the receiving node's hardware. Of course, what actually goes on the wire is a sequence of bits representing the frame, but you know what I mean.
Naturally, the Data Link Frame has a specific, standardized format, so let's get into it. A data link frame carries the following sections:

![](/content/images/2016/10/OSI_test.png)
OSI Model
A. Frame header:

The frame header section of the Data Link Frame contains the following fields:

  1. The Media Access Control (MAC) header, which carries information such as
  • MAC addresses of source and destination Network Interface Cards (NIC).

  • Preamble (7 bytes of alternating ones and zeros), which lets the receiver synchronize its clock with the sender before any data arrives.

  • Start frame delimiter (SFD), a one-byte pattern that marks the end of the preamble, so the receiver knows where the frame proper begins in the bit stream and can start parsing it.

  2. Logical Link Control (LLC) header, which carries the unique identifier of the network layer (L3) protocol (IPv4, IPv6, IPX, AppleTalk, X.25 PLP, and so on). The receiving end uses it to hand the payload over to the correct network layer (L3) protocol handler, which is a software routine. It also carries other optional fields that we don't care about right now.

Note that in the OSI model, the previous two sub-headers are the responsibility of the data link layer (L2).


B. Frame data:

Carries the payload (packets), formatted the way the network layer (L3) wants it. The maximum length of the frame data is the Maximum Transmission Unit (MTU), which is defined by the properties of the physical layer (L1) and data link layer (L2). The MTU of standard Ethernet 10/100 Mbps links is 1500 bytes.

C. Frame trailer:

Contains a CRC checksum of the frame: the sender calculates it, and the receiver recalculates it to verify that the frame arrived intact. The trailer could also contain an End Frame Delimiter (EFD).
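To make the trailer a little less abstract, here is a minimal sketch of the CRC-32 calculation that Ethernet uses for its frame check sequence. The function name `crc32_ether` is made up for this post, and the bit-by-bit loop is chosen for readability rather than speed. On the wire, the checksum covers the frame from the destination MAC address to the end of the payload, not the preamble or SFD.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF), the variant used by Ethernet.
 * crc32_ether is a hypothetical name, not part of any driver API.
 */
static uint32_t crc32_ether(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			/* When the low bit is set, shift and fold in the polynomial. */
			if (crc & 1u)
				crc = (crc >> 1) ^ 0xEDB88320u;
			else
				crc >>= 1;
		}
	}
	return crc ^ 0xFFFFFFFFu;
}
```

Calling `crc32_ether` on the nine ASCII characters "123456789" gives 0xCBF43926, the usual check value for this CRC. In practice we will rarely run this ourselves, since the Ethernet chip normally appends and verifies the checksum in hardware.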



Note 1: Since we're not using TCP/IP, we're describing everything in terms of the OSI model. Layers often get blurred in implementation, though, especially the higher ones, so keep in mind that, with a slightly different distribution of tasks between layers, the same overall implementation can be developed further to handle TCP. This is in no way meant to suggest that implementing TCP is a simple process!

Note 2: The above description is restricted to the functionality of the lowest two layers. The upper five layers are concerned with the payload, which we refer to here as segments for layer 4 and packets for layer 3.

This has been a very brief description of the most basic structure of a data link frame, which is built and analyzed one step above the lowest layer of the OSI model. Since the lowest layer (the physical layer) is implemented in hardware, our Ethernet chip will be handling its duties. As we will see later, Ethernet chips tend to provide many services that are not strictly hardware related, but that would complicate the development of drivers if left for our driver to handle, such as calculating CRCs.

We're now going to take a data link frame example, and represent it with data structures to get a good grip of what we're dealing with.

Ethernet II frame

The data link frame type we're going to look at is the Ethernet II frame. The Ethernet II frame has the structure:

![](/content/images/2016/10/ethernet_II.png)

The first thing to note here is that we broke away from the theoretical structure of a data link frame. The information is still there, but it is categorized differently. The reason is that, from layer two upwards, you won't see the first two physical-layer-related fields, namely the preamble and SFD. Your Ethernet chip and the driver implementation usually take care of these, as well as the CRC checksum trailer, and pass status flags to tell the upper layers of software about the state of the frame and what to do about it if it has any errors. This is your operating system being helpful. To test this, try a raw packet dumping tool such as tcpdump. You will observe that the first thing you see is indeed the destination MAC address, followed by the source MAC address, a possible 802.1Q tag (4 bytes, where I placed a question mark in the illustration above), then the EtherType, which can be seen as the LLC header we talked about above. This is followed by whatever payload there is, and the contents of this payload vary depending on what packets are actually being sent. Take the example of sending a PING request to a remote host; the data being transmitted will look something like:

19:16:44.733901 IP (tos 0x0, ttl 64, id 50944, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.0.106 > ec2-52-74-128-31.ap-southeast-1.compute.amazonaws.com: ICMP echo request, id 6177, seq 2, length 64
        0x0000:  c8d3 a3de 5c18 846a 4c33 2414 0800 4500  ....\..:K4#...E.
        0x0010:  0054 c700 4000 4001 fe2c c0a8 006a 344a  .T..@.@..,...j4J
        0x0020:  801f 0800 b226 1821 0002 9c59 1b57 0000  .....&.!...Y.W..
        0x0030:  0000 ac32 0b00 0000 0000 1011 1213 1415  ...2............
        0x0040:  1617 1819 1a1b 1c1d 1e1f 2021 2223 2425  ...........!"#$%
        0x0050:  2627 2829 2a2b 2c2d 2e2f 3031 3233 3435  &'()*+,-./012345
        0x0060:  3637                                     67

Looking at line three of the above listing, which is the result of dumping a ping request to some host, you can see that
the destination MAC address is [c8d3 a3de 5c18], the source MAC address is [846a 4c33 2414], and the EtherType is [0800] (which specifies Internet Protocol version 4), followed by the payload.
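The same reading of the dump can be sketched in C. The names `eth_info` and `eth_parse` are invented for this illustration, and the byte array is simply the first 14 bytes of the listing above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN      6
#define ETH_HDR_LEN  14

/* Hypothetical parsed view of an Ethernet II header. */
struct eth_info {
	uint8_t  dest[MAC_LEN];
	uint8_t  src[MAC_LEN];
	uint16_t ethertype;	/* converted to host byte order */
};

/* First 14 bytes of the ping request dump. */
static const uint8_t ping_frame_start[ETH_HDR_LEN] = {
	0xc8, 0xd3, 0xa3, 0xde, 0x5c, 0x18,	/* destination MAC */
	0x84, 0x6a, 0x4c, 0x33, 0x24, 0x14,	/* source MAC */
	0x08, 0x00				/* EtherType: IPv4 */
};

/* Returns 0 on success, -1 if the buffer cannot hold a full header. */
static int eth_parse(const uint8_t *frame, size_t len, struct eth_info *out)
{
	if (len < ETH_HDR_LEN)
		return -1;

	memcpy(out->dest, frame, MAC_LEN);
	memcpy(out->src, frame + MAC_LEN, MAC_LEN);
	/* The EtherType travels big-endian (network byte order). */
	out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	return 0;
}
```

Assembling the EtherType from two bytes, instead of casting the buffer to a `uint16_t` pointer, keeps the sketch independent of the host's endianness.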
Since this is a PING request that involves IPv4 addresses, the payload should conform to the IPv4 packet format, which consists of a header section and a data section. The header section looks like:

![](/content/images/2016/10/IPv4-header.png)

So let's break down the data of our PING request dump into what they actually mean (keep an eye on the header illustration above):

  • The first 4 bits hold the value [4], which is the version of the IP protocol used in this packet.
  • The second 4 bits hold the value [5], which is the Internet Header Length (IHL) in multiples of 4 bytes (5 x 4 = 20 bytes). This count starts from the first byte of the payload, which is [45] in line three of the dump above, and runs until byte [1f] in line five.
  • The next byte shows the ToS as [00], which describes the priority of this packet, irrelevant for now.
  • The next half word shows the total length of the packet in bytes [0054]. This is the entire size of the transmitted IP packet, header included (84 in this case).
  • The next half word shows the Identification tag [c700], which is used to help reconstruct a packet from several fragments. That is not happening here, so just ignore it.
  • The next half word [4000] starts with three flag bits. The first is reserved and is zero. The second is set to one, which indicates that this packet is not to be fragmented. The third indicates whether more fragments follow this packet. The remaining 13 bits define the offset of this fragment in units of 8 bytes, in case this packet actually is a fragment, which it is not in this particular case, so you see only zeros.
  • The next byte defines the TTL to be [40], so if this packet were hopping through routers to get to its destination, it would live through 0x40 (64) hops until it's finally discarded and forgotten.
  • The next byte represents the protocol ID for the contents of this packet. In this case it's [01], which is ICMP.
  • Next you have the Header checksum [fe2c].
  • Then comes a word holding the Source IP address [c0a8 006a] (192.168.0.106).
  • And finally a word holding the Destination IP address [344a 801f] (52.74.128.31).
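The field-by-field walk above can be written down as a handful of small accessors. This is only a sketch: the helper names are made up, and the array holds the 20 header bytes copied from the dump. The checksum routine is the standard ones' complement sum used by IPv4, and running it over a header that still contains its checksum should yield zero.

```c
#include <stddef.h>
#include <stdint.h>

/* The 20 IPv4 header bytes from the ping request dump. */
static const uint8_t ping_ip_hdr[20] = {
	0x45, 0x00, 0x00, 0x54, 0xc7, 0x00, 0x40, 0x00,
	0x40, 0x01, 0xfe, 0x2c, 0xc0, 0xa8, 0x00, 0x6a,
	0x34, 0x4a, 0x80, 0x1f
};

/* Read a big-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical accessors, one per field discussed in the list. */
static unsigned ip_version(const uint8_t *h)   { return h[0] >> 4; }
static unsigned ip_hdr_bytes(const uint8_t *h) { return (h[0] & 0x0f) * 4u; }
static unsigned ip_total_len(const uint8_t *h) { return rd16(h + 2); }
static int      ip_dont_frag(const uint8_t *h) { return (rd16(h + 6) & 0x4000) != 0; }
static int      ip_more_frag(const uint8_t *h) { return (rd16(h + 6) & 0x2000) != 0; }
static unsigned ip_frag_off(const uint8_t *h)  { return (rd16(h + 6) & 0x1fff) * 8u; }
static unsigned ip_ttl(const uint8_t *h)       { return h[8]; }
static unsigned ip_proto(const uint8_t *h)     { return h[9]; }

/*
 * Internet checksum: ones' complement of the ones' complement sum of
 * all 16-bit words. Over an intact header (checksum included) it
 * returns 0.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += rd16(data + i);
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For the dump above, `ip_hdr_bytes(ping_ip_hdr)` is 20, `ip_total_len` is 84, `ip_proto` is 1 (ICMP), and `inet_checksum(ping_ip_hdr, 20)` is 0, which confirms that [fe2c] is the right checksum for this header.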

The next stream of bytes represents possible optional fields, which in our ping request dump are not used (because we know that IHL is only 5), followed by the data section of the IP packet. Since the protocol field of our IP packet header specifies ICMP as the protocol, we should expect the data section of our packet to represent an ICMP segment.

NOTE: If you've been following closely, you see that we're moving from the outer shell, being the frame, which is the responsibility of the data link layer (L2), down to the packet, which is the responsibility of the network layer (L3), down to the segment within the data section of the packet, which is the responsibility of the transport layer (L4).


The structure of the ICMP (Internet Control Message Protocol) segment consists of a header section, made up of a one-byte type, a one-byte code, a two-byte checksum and four more bytes whose meaning depends on the type and code, followed by a data section. Let's match the data in our ping request dump to this header before we move forward. The first 2 bytes of the packet data section (which starts right after the destination IP address in line five of the listing above) are [0800]. These are the type and code fields of our ICMP segment header: referring to the control messages table of ICMP, type [08] with code [00] represents an echo request. Following the type and code fields, we have 2 bytes [b226], which are the checksum used for error detection. Next, we have 2 bytes [1821] representing the identifier (6177 in decimal, matching the id printed by tcpdump), followed by 2 bytes [0002] representing the sequence number (seq 2 in the tcpdump output). Note that the identifier and sequence number are here because they correspond to the type and code we saw before, which identify this ICMP segment as a ping request. With a different type and code, we would have something other than these two half words in this section of the ICMP header. Here's a shameless quote from Wikipedia:

The Identifier and Sequence Number can be used by the client to match the reply with the request that caused the reply. In practice, most Linux systems use a unique identifier for every ping process, and sequence number is an increasing number within that process. Windows uses a fixed identifier, which varies between Windows versions, and a sequence number that is only reset at boot time.

So you get the idea. Following the sequence number, we have the payload, which can consist of various implementation-dependent information. Eventually, when the target host gets this ping request, it should reply with the same payload (in an echo reply), to let the pinging machine know that it's alive and kicking.
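If our stack ever had to answer a ping, the work would boil down to flipping the type field and recomputing the checksum while leaving the identifier, sequence number and payload untouched. Here is a rough sketch of that idea, using the 64 ICMP bytes from the dump. The function name `icmp_make_echo_reply` is hypothetical, and swapping the MAC and IP addresses in the outer headers is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP_ECHO_REPLY    0
#define ICMP_ECHO_REQUEST  8
#define ICMP_HDR_LEN       8

/* The 64-byte ICMP segment from the ping request dump. */
static const uint8_t ping_icmp[64] = {
	0x08, 0x00, 0xb2, 0x26, 0x18, 0x21, 0x00, 0x02,
	0x9c, 0x59, 0x1b, 0x57, 0x00, 0x00, 0x00, 0x00,
	0xac, 0x32, 0x0b, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
	0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
	0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
	0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
	0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37
};

/* Ones' complement checksum over big-endian 16-bit words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	if (i < len)
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Rewrite an echo request into an echo reply in place.
 * Returns 0 on success, -1 if msg is not a valid echo request.
 */
static int icmp_make_echo_reply(uint8_t *msg, size_t len)
{
	uint16_t sum;

	if (len < ICMP_HDR_LEN || msg[0] != ICMP_ECHO_REQUEST || msg[1] != 0)
		return -1;
	if (inet_checksum(msg, len) != 0)
		return -1;		/* request arrived corrupted */

	msg[0] = ICMP_ECHO_REPLY;
	msg[2] = 0;			/* checksum is computed with the field zeroed */
	msg[3] = 0;
	sum = inet_checksum(msg, len);
	msg[2] = (uint8_t)(sum >> 8);
	msg[3] = (uint8_t)(sum & 0xff);
	return 0;
}
```

Applied to a copy of `ping_icmp`, this turns the leading [0800 b226] into [0000 ba26] and keeps the identifier [1821] and sequence number [0002] as they were, which is what lets the sender match the reply to its request.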

What else is out there

We've deliberately ignored some of the variations in the structure of the Ethernet frame that are caused by newer standards and by accommodating more options at the data link layer, such as the possibility of substituting the EtherType bits with the length of the payload, or the inclusion of the IEEE 802.1Q tag to indicate virtual LAN membership and IEEE 802.1p priority. To make matters more complex and painful, other types of Ethernet frames exist, namely Novell raw IEEE 802.3, IEEE 802.2 LLC, and IEEE 802.2 SNAP. We focused on Ethernet version 2 because it is the most commonly used type today, and we won't be paying attention to the rest as they are outside the scope of this project.
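To give a feel for what handling these variations involves, here is a small sketch that tells an EtherType from a length field and steps over an optional 802.1Q tag. All names are invented for the example. The thresholds follow the usual convention: values of 1536 (0x0600) and above are EtherTypes, and values up to 1500 are payload lengths.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_VLAN  0x8100	/* 802.1Q tag identifier */
#define ETHERTYPE_MIN   0x0600	/* 1536 and above: EtherType */
#define ETH_LEN_MAX     1500	/* 1500 and below: payload length */

enum eth_kind { ETH_KIND_INVALID, ETH_KIND_ETHERNET_II, ETH_KIND_8023_LENGTH };

/* Hypothetical result of classifying a frame. */
struct eth_class {
	enum eth_kind kind;
	uint16_t      type_or_len;	/* EtherType or length, host order */
	size_t        payload_off;	/* where the payload starts */
	int           has_vlan;		/* 1 if an 802.1Q tag was skipped */
};

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static struct eth_class eth_classify(const uint8_t *frame, size_t len)
{
	struct eth_class c = { ETH_KIND_INVALID, 0, 0, 0 };
	size_t off = 12;		/* right after the two MAC addresses */
	uint16_t v;

	if (len < 14)
		return c;
	v = rd16(frame + off);
	if (v == ETHERTYPE_VLAN) {
		if (len < 18)
			return c;
		c.has_vlan = 1;
		off += 4;		/* skip the tag identifier and tag control bytes */
		v = rd16(frame + off);
	}
	c.type_or_len = v;
	c.payload_off = off + 2;
	if (v >= ETHERTYPE_MIN)
		c.kind = ETH_KIND_ETHERNET_II;
	else if (v <= ETH_LEN_MAX)
		c.kind = ETH_KIND_8023_LENGTH;
	return c;			/* 1501..1535 stays invalid */
}
```

For our bootloader, anything that does not come back as `ETH_KIND_ETHERNET_II` could simply be dropped.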

Having gone through all the above, you can see how challenging it is to implement a network stack that handles all the possible standards out there, where each protocol has its own specs, and each spec has optional and mandatory fields and bits. For this reason, we're going to implement the absolute minimum functionality required to load the Linux kernel over TFTP, and we won't worry too much about the rest of the networking protocols out there. But just to give you an idea of how deep this topic is, take the EtherType half word we encountered in our Ethernet frame above, which contained the type for IPv4 and hinted that the payload is in IPv4 packet format. This EtherType value could have been any of the many registered EtherType values! Furthermore, the protocol field that told us that the data section of our IPv4 packet holds ICMP could have held any of the assigned IP protocol numbers!
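In code, "the absolute minimum" tends to look like a dispatcher that recognizes a few values and drops everything else. The sketch below is an assumption about how such a receive path might be laid out, not the code we will end up with. It keeps ARP, ICMP, and UDP (the transport TFTP runs on), and the enum and function names are made up.

```c
#include <stdint.h>

/* A few well-known EtherType values. */
#define ETHERTYPE_IPV4  0x0800
#define ETHERTYPE_ARP   0x0806
#define ETHERTYPE_IPV6  0x86DD

/* A few well-known IPv4 protocol numbers. */
#define IP_PROTO_ICMP   1
#define IP_PROTO_TCP    6
#define IP_PROTO_UDP    17

/* Hypothetical actions for a minimal receive path. */
enum rx_action { RX_DROP, RX_ARP, RX_ICMP, RX_UDP };

/*
 * Decide what to do with a received frame. ip_proto is only
 * meaningful when ethertype is ETHERTYPE_IPV4.
 */
static enum rx_action rx_dispatch(uint16_t ethertype, uint8_t ip_proto)
{
	switch (ethertype) {
	case ETHERTYPE_ARP:
		return RX_ARP;
	case ETHERTYPE_IPV4:
		switch (ip_proto) {
		case IP_PROTO_ICMP:
			return RX_ICMP;
		case IP_PROTO_UDP:
			return RX_UDP;	/* TFTP travels over UDP */
		default:
			return RX_DROP;	/* TCP and everything else */
		}
	default:
		return RX_DROP;		/* IPv6 and the rest are ignored */
	}
}
```

Every value not listed here falls through to `RX_DROP`, which is how a small stack can ignore most of those long tables.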

Next, we'll look into representing the Ethernet frame, IP packet and ICMP segment with data structures. In our implementation of the networking stack for the bootloader we're working on, we're going to have to implement a couple more structures, but we'll just do this now, to get a better feel of the tasks ahead.

Data structures

Now that we have an idea about the data units used to pass data around on a network, let's actually represent these data using data structures, so we can move away from the theoretical concepts into implementation. The enclosing data structure we're going to use is of course a C struct:

#include <stdint.h>

/*
 *	Ethernet header (multi-byte fields are in network byte order)
 */

struct ethernet_hdr {
	uint8_t		et_dest[6];	/* Destination node		*/
	uint8_t		et_src[6];	/* Source node			*/
	uint16_t	et_protlen;	/* Protocol or length		*/
};

/* Ethernet header size */
#define ETHER_HDR_SIZE	(sizeof(struct ethernet_hdr))


/*
 *	IPv4 header, without options.
 *	struct in_addr wraps one 32-bit address; on a hosted system it comes
 *	from <netinet/in.h>, a bootloader has to provide its own definition.
 */
struct ip_hdr {
	uint8_t		ip_hl_v;	/* header length and version	*/
	uint8_t		ip_tos;		/* type of service		*/
	uint16_t	ip_len;		/* total length			*/
	uint16_t	ip_id;		/* identification		*/
	uint16_t	ip_off;		/* flags and fragment offset	*/
	uint8_t		ip_ttl;		/* time to live			*/
	uint8_t		ip_p;		/* protocol			*/
	uint16_t	ip_sum;		/* checksum			*/
	struct in_addr	ip_src;		/* Source IP address		*/
	struct in_addr	ip_dst;		/* Destination IP address	*/
};

#define IP_OFFS		0x1fff /* fragment offset, in units of 8 bytes */
#define IP_FLAGS	0xe000 /* first 3 bits */
#define IP_FLAGS_RES	0x8000 /* reserved */
#define IP_FLAGS_DFRAG	0x4000 /* don't fragment */
#define IP_FLAGS_MFRAG	0x2000 /* more fragments */

#define IP_HDR_SIZE		(sizeof(struct ip_hdr))

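/*
 *	ICMP header. The union covers the four bytes after the checksum,
 *	whose layout depends on the type and code.
 */
struct icmp_hdr {
	uint8_t		type;		/* message type, 8 = echo request */
	uint8_t		code;		/* sub-type of the message	*/
	uint16_t	checksum;	/* checksum of header and data	*/
	union {
		struct {
			uint16_t	id;
			uint16_t	sequence;
		} echo;			/* echo request and reply	*/
		uint32_t	gateway;
		struct {
			uint16_t	unused;
			uint16_t	mtu;
		} frag;
		uint8_t data[0];	/* zero-length array (GNU C extension) */
	} un;
};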

#define ICMP_HDR_SIZE		(sizeof(struct icmp_hdr))
#define IP_ICMP_HDR_SIZE	(IP_HDR_SIZE + ICMP_HDR_SIZE)

The data structures represent the Ethernet header, the IP header, and the ICMP header. You can see how things become quite clear once you match the standard formats with structures: each structure maps directly onto the illustration of its header above, so take a look at them and compare. One thing worth noting about the ICMP header is that it contains a union, because the second word of the header depends on the ICMP type and code.
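As a quick sanity check of that mapping, the sketch below copies the first 34 bytes of the ping dump into the Ethernet and IP structures and reads a few fields back. It assumes a `struct in_addr` with a single 32-bit member, which is a stand-in for whatever the bootloader will define, and the helpers `be16_to_host` and `frame_split` are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in definition (assumption): one 32-bit address in network order. */
struct in_addr { uint32_t s_addr; };

struct ethernet_hdr {
	uint8_t		et_dest[6];
	uint8_t		et_src[6];
	uint16_t	et_protlen;
};

struct ip_hdr {
	uint8_t		ip_hl_v;
	uint8_t		ip_tos;
	uint16_t	ip_len;
	uint16_t	ip_id;
	uint16_t	ip_off;
	uint8_t		ip_ttl;
	uint8_t		ip_p;
	uint16_t	ip_sum;
	struct in_addr	ip_src;
	struct in_addr	ip_dst;
};

/* The structs only work as overlays if the compiler adds no padding. */
_Static_assert(sizeof(struct ethernet_hdr) == 14, "Ethernet header must be 14 bytes");
_Static_assert(sizeof(struct ip_hdr) == 20, "IPv4 header must be 20 bytes");

/* Ethernet header plus IPv4 header from the ping request dump. */
static const uint8_t ping_frame[34] = {
	0xc8, 0xd3, 0xa3, 0xde, 0x5c, 0x18, 0x84, 0x6a,
	0x4c, 0x33, 0x24, 0x14, 0x08, 0x00, 0x45, 0x00,
	0x00, 0x54, 0xc7, 0x00, 0x40, 0x00, 0x40, 0x01,
	0xfe, 0x2c, 0xc0, 0xa8, 0x00, 0x6a, 0x34, 0x4a,
	0x80, 0x1f
};

/* Convert a 16-bit field stored in network byte order to host order. */
static uint16_t be16_to_host(uint16_t v)
{
	const uint8_t *p = (const uint8_t *)&v;

	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Copy both headers out of a raw frame. Returns 0 on success, -1 if too short. */
static int frame_split(const uint8_t *frame, size_t len,
		       struct ethernet_hdr *eth, struct ip_hdr *ip)
{
	if (len < sizeof(*eth) + sizeof(*ip))
		return -1;
	memcpy(eth, frame, sizeof(*eth));
	memcpy(ip, frame + sizeof(*eth), sizeof(*ip));
	return 0;
}
```

The copies are deliberate: the IP header starts 14 bytes into the frame, so casting the buffer to a `struct ip_hdr` pointer would leave its 32-bit fields misaligned, which some CPUs do not tolerate.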

Conclusion

We haven't written any of the bootloader's code in this post, but it was necessary to pave the way for what we're going to do next. One thing we can take away from this post is that it would be very counterproductive for anyone to try to implement a network driver from scratch without a compelling reason to do so. You should always start where someone else has finished, so that you can get on with your life. In our case, all we want is to get TFTP to work, which requires a subset of the functionality available in open source drivers out there. So we'll get our hands on an open source driver from Linux/u-boot, and strip it down to the minimum functionality we need for our purpose. Thanks for reading.
