int i = 257;
int *iPtr = &i;
printf("%d %d", *((char*)iPtr), *((char*)iPtr+1) );
The integer value 257 is 00000001 00000001 in binary, so its two lowest-order bytes both hold 1. Casting iPtr to char * lets us read the bytes one at a time, and on a little-endian machine such as Intel x86 the program prints 1 1.
int i = 258;
int *iPtr = &i;
printf("%d %d", *((char*)iPtr), *((char*)iPtr+1) );
The integer value 258 can be represented in binary as 00000001 00000010. Remember that Intel machines are ‘little-endian’ machines. Little-endian means that the lower-order bytes are stored at the lower memory addresses and the higher-order bytes are stored at the higher addresses. So the integer value 258 is stored in memory as 00000010 00000001, and the program prints 2 1.
int i = 300;
char *ptr = (char *)&i;
*++ptr = 2;
printf("%d", i);
The integer value 300 in binary notation is 00000001 00101100. It is stored in memory (little-endian) as 00101100 00000001. The expression *++ptr = 2 first advances ptr to the second byte and then stores 2 there, so the memory representation becomes 00101100 00000010. The integer corresponding to it is therefore 00000010 00101100 => 556. In other words, the byte at the lowest memory location is the lowest-order (rightmost) byte of the number, and the highest-order byte is stored at the highest memory location.
What is a byte?
- A byte is a sequence of 8 bits
- The "leftmost" bit in a byte is the biggest. So, the binary sequence 00001001 is the decimal number 9: 00001001 = (2^3 + 2^0 = 8 + 1 = 9).
- Bits are numbered from right-to-left. Bit 0 is the rightmost and the smallest; bit 7 is leftmost and largest.
So what's the problem -- computers agree on single bytes, right?
Well, this is fine for single-byte data, like ASCII text. However, a lot of data needs to be stored using multiple bytes, like integers or floating-point numbers. And there is no agreement on how these sequences should be stored.
Byte Example
Consider a sequence of 4 bytes, named W, X, Y and Z. I avoided naming them A, B, C, D because those are hex digits, which would be confusing. Each byte has a value and is made up of 8 bits.
Byte Name:    W     X     Y     Z
Location:     0     1     2     3
Value (hex):  0x12  0x34  0x56  0x78
For example, W is an entire byte, 0x12 in hex or 00010010 in binary. If W were to be interpreted as a number, it would be "18" in decimal (by the way, there's nothing saying we have to interpret it as a number; it could be an ASCII character or something else entirely).
With me so far? We have 4 bytes, W X Y and Z, each with a different value.
Understanding Pointers
Pointers are a key part of programming, especially in the C programming language. A pointer is a number that references a memory location. It is up to us (the programmer) to interpret the data at that location.
In C, when you cast (convert) a pointer to a certain type (such as char * or int *), it tells the computer how to interpret the data at that location. For example, let's declare
void *p = 0;  // p is a pointer to an unknown data type
              // p is a NULL pointer -- do not dereference
char *c;      // c is a pointer to a single byte
Note that we can't get the data from p because we don't know its type. p could be pointing at a single number, a letter, the start of a string, your horoscope, an image -- we just don't know how many bytes to read, or how to interpret what's there.
Now, suppose we write
c = (char *)p;
Ah -- now this statement tells the computer to point to the same place as p, and interpret the data as a single character (1 byte). In this case, c would point to memory location 0, or byte W. If we printed *c, we'd get the value in W, which is hex 0x12 (remember that W is a whole byte).
This example does not depend on the type of computer we have -- again, all computers agree on what a single byte is (in the past this was not the case).
The example is helpful, even though it is the same on all computers -- if we have a pointer to a single byte (char *, a single byte), we can walk through memory, reading off a byte at a time. We can examine any memory location and the endian-ness of a computer won't matter -- every computer will give back the same information.
So, what's the problem?
Problems happen when computers try to read multiple bytes. Some data types contain multiple bytes, like long integers or floating-point numbers. A single byte has only 256 possible values, so it can store 0 - 255.
Now problems start - when you read multi-byte data, where does the biggest byte appear?
- Big endian machine: Stores data big-end first. When looking at multiple bytes, the first byte (lowest address) is the biggest. This matches the way we normally write numbers, with the most significant digit first.
- Little endian machine: Stores data little-end first. When looking at multiple bytes, the first byte is smallest.
Again, endian-ness does not matter if you have a single byte. If you have one byte, it's the only data you read so there's only one way to interpret it (again, because computers agree on what a byte is).
Now suppose we have our 4 bytes (W X Y Z) stored the same way on a big- and little-endian machine. That is, memory location 0 is W on both machines, memory location 1 is X, etc.
We can create this arrangement by remembering that bytes are machine-independent. We can walk memory, one byte at a time, and set the values we need. This will work on any machine:
c = 0;          // point to location 0 (won't work on a real machine!)
*c = 0x12;      // Set W's value
c = (char *)1;  // point to location 1
*c = 0x34;      // Set X's value
...             // repeat for Y and Z; details left to reader
Apart from the made-up addresses, this code works the same on any machine, and we have both machines set up with bytes W, X, Y and Z in locations 0, 1, 2 and 3.
Interpreting Data
Now let's do an example with multi-byte data (finally!). Quick review: a "short int" is typically a 2-byte (16-bit) number, which can range from 0 - 65535 (if unsigned). Let's use it in an example:
short *s; // pointer to a short int (2 bytes)
s = 0;    // point to location 0; *s is the value
So, s is a pointer to a short, and is now looking at byte location 0 (which has W). What happens when we read the value at s?
- Big endian machine: I think a short is two bytes, so I'll read them off: the byte at address 0 (W, or 0x12) and the byte at address 1 (X, or 0x34). Since the first byte is biggest (I'm big-endian!), the number must be 256 * byte 0 + byte 1, or 256*W + X, or 0x1234. I multiplied the first byte by 256 (2^8) because I needed to shift it over 8 bits.
- Little endian machine: I don't know what Mr. Big Endian is smoking. Yeah, I agree a short is 2 bytes, and I'll read them off just like him: the byte at address 0 is 0x12, and the byte at address 1 is 0x34. But in my world, the first byte is the littlest! The value of the short is byte 0 + 256 * byte 1, or 256*X + W, or 0x3412.
But do you see the problem? The big-endian machine thinks *s = 0x1234 and the little-endian machine thinks *s = 0x3412. The same exact data gives two different numbers. Probably not a good thing.
Yet another example
Let's do another example with a 4-byte integer for "fun":
int *i; // pointer to an int (4 bytes on a 32-bit machine)
i = 0;  // points to location zero, so *i is the value there
Again we ask: what is the value at i?
- Big endian machine: An int is 4 bytes, and the first is the largest. I read 4 bytes (W X Y Z) and W is the largest. The number is 0x12345678.
- Little endian machine: Sure, an int is 4 bytes, but the first is smallest. I also read W X Y Z, but W belongs way in the back -- it's the littlest. The number is 0x78563412.