Friday, May 30, 2014

Understanding IA64; the Beginning

[This article was written in 2007, when I was porting my 32-bit driver to 64-bit for the x64 and IA64 platforms. This is what I learned from the experience. I am posting it now for archival purposes.]



I've been expecting the 64-bit computing world to arrive for a while, but it has taken longer than I thought, even though Vista has already been released. Anyway, I am sure the 64-bit platform will be used everywhere sooner or later. Even now you can easily find a PC that supports x64, so I guess you have already had a chance to write your own 64-bit program, at least for x64.
However, what about IA64? It's a server platform rather than a PC platform, but I think everyone should be aware of it. Are you ready to write or debug code on IA64? Because its architecture differs from x86 and x64, there are several things you need to know about IA64. I describe the basics in this article.

Register set of IA64
The register set of IA64 is completely different from that of x86 and x64. The first step in understanding a CPU architecture is to understand its register model, so let me show the register set of IA64 from the Intel Itanium Architecture Software Developer's Manual.

Figure 1 register set of IA64



First of all, you can't find any of the familiar x86 register names such as EAX, EBX, ESP, or EBP. That can be bewildering, so take another careful look at Figure 1. If you read the register names carefully, you can work out what they mean. gr0 to gr127 are general registers, used for general purposes like EAX, EBX and ECX. There are many more general registers than on x86, so a compiler can use registers more efficiently. Note also that some general registers are reserved for special purposes; I will come back to them later in this article.

The floating-point registers, used for floating-point operations, are named fr0 to fr127, and the instruction pointer is named IP. These registers may feel familiar because x86 has similar ones, so instructions that use them are easy to understand. What about the others? You can consult the IA64 manual whenever you encounter a register you don't know.

Instruction of IA64
Have you ever seen disassembly code from IA64? Figure 2 shows disassembly code from IA64. There is one thing you need to know - r represents gr registers and b represents br registers in WinDbg.

Figure 2 - Disassembly code of IA64

e0000000`8402ecc0        adds   ret0=f, r0
e0000000`8402ecc4        movl   r2=e0000000`ffa00000 ;;

e0000000`8402ecd0        nop.m  0
e0000000`8402ecd4        mov    b6=r2, +0
e0000000`8402ecd8        br.cond.sptk.few b6 ;;

The disassembly in Figure 2 shows instructions such as adds, movl, nop.m, mov, and br.cond.sptk.few. They are hard to read at first, but you can guess that they correspond to add, mov, nop, and jmp on the x86 platform. The operands look odd too, but their meaning becomes clear if you look carefully. The adds instruction means that the result of constant f + the r0 register value is placed into the ret0 register, where r0 is gr0. movl means that the constant 0xe0000000ffa00000 is moved into the r2 register, where r2 is gr2. br.cond.sptk.few looks complicated, but you can split the mnemonic at the dots and read each part. br is short for branch and is equivalent to jmp in x86. Its meaning changes with the word that follows: br.cond means conditional jump, br.call means procedure call, and br.ret means procedure return. sptk.few is a branch prediction hint given to the processor to improve branch prediction. You can ignore this part because it does not affect the functional behavior of the program. These instruction formats may be unfamiliar, but I'm convinced you can get a feel for them.

Digging into instruction format
To understand the instructions better, we need to dig further into the instruction format of IA64. Figure 3 shows it.

Figure 3 – Instruction format of IA64 - bundle



This bundle format comes from the IA64 processor architecture, called Explicitly Parallel Instruction Computing (EPIC), which is designed to improve performance. A bundle is 128 bits (16 bytes) in size and is composed of a 5-bit template and three instruction slots of 41 bits each. Most instructions fit into one slot, so most bundles hold 3 instructions. Some instructions, however, take two slots because they need more space to encode. Figure 2 shows both cases: the first two lines are a bundle with 2 instructions (movl occupies two slots), and the next three lines are a bundle with 3 instructions.
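To make the layout concrete, here is a minimal sketch in portable C that splits a bundle into its template and three slots. The names Ia64Bundle, ia64_template and ia64_slot are my own for illustration and do not come from any SDK. I assume the bundle is held as two 64-bit halves, low half first, the same way WinDbg's dq prints it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit bundle as two 64-bit halves. */
typedef struct {
    uint64_t lo;   /* bundle bits 0..63   */
    uint64_t hi;   /* bundle bits 64..127 */
} Ia64Bundle;

#define IA64_SLOT_MASK 0x1FFFFFFFFFFULL   /* 41 one-bits */

/* The template occupies bundle bits 0..4. */
unsigned ia64_template(const Ia64Bundle *b)
{
    return (unsigned)(b->lo & 0x1F);
}

/* Slot n (0..2) is the 41-bit field starting at bit 5 + 41 * n. */
uint64_t ia64_slot(const Ia64Bundle *b, unsigned n)
{
    assert(n < 3);
    switch (n) {
    case 0:   /* bits 5..45, entirely in the low half */
        return (b->lo >> 5) & IA64_SLOT_MASK;
    case 1:   /* bits 46..86: 18 bits from lo, 23 bits from hi */
        return ((b->lo >> 46) | (b->hi << 18)) & IA64_SLOT_MASK;
    default:  /* bits 87..127, entirely in the high half */
        return (b->hi >> 23) & IA64_SLOT_MASK;
    }
}
```

Filling the structure with the two quadwords of the first bundle in Figure 2 (lo = 00ffa100`003c4005, hi = 68001000`40600000, taken from the memory dump later in this article) gives a template of 5 and a slot 0 whose top four bits are 8, the adds opcode.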

This bundle format can lead you to misread instruction addresses in WinDbg. Strictly speaking, the address e0000000`8402ecc4 in Figure 2 does not point at movl   r2=e0000000`ffa00000, because the movl instruction starts at e0000000`8402ecc0 + 46 bits, not e0000000`8402ecc0 + 32 bits (4 bytes). WinDbg cannot express an address in bit units, so I guess its authors had no choice but to display an approximate byte address. Every bundle, however, begins exactly on a 16-byte boundary.
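The arithmetic can be sketched as follows. I am assuming, based only on the addresses in Figure 2, that the debugger shows the three slots of a bundle at offsets +0, +4 and +8 from the bundle base. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bundles start on 16-byte boundaries, so clear the low 4 bits. */
uint64_t ia64_bundle_base(uint64_t displayed)
{
    return displayed & ~(uint64_t)0xF;
}

/* Assumption from Figure 2: offsets +0, +4, +8 stand for slots 0, 1, 2. */
unsigned ia64_slot_index(uint64_t displayed)
{
    unsigned low = (unsigned)(displayed & 0xF);
    assert(low == 0 || low == 4 || low == 8);
    return low / 4;
}

/* Real position of a slot inside the bundle, in bits:
   5 template bits plus 41 bits for every earlier slot. */
unsigned ia64_slot_bit_offset(unsigned slot)
{
    assert(slot < 3);
    return 5 + 41 * slot;
}
```

For the displayed address e0000000`8402ecc4 this yields bundle base e0000000`8402ecc0, slot 1, and a bit offset of 46, which matches the figure quoted above.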

Another side effect of the instruction format is that programs built by an IA64 compiler are larger. When I compiled the same C source file for x86, x64 and IA64, the IA64 program was 2 or 3 times larger than the x86 or x64 one. I saw that the compiled program contains many more nop instructions. I guess they are generated because of the EPIC architecture, to fill bundle slots for parallel execution. The compiler may well be doing the right thing, but the code space spent on nop instructions still looks like a waste. Don't be surprised when you build your own IA64 program.

Format of adds instruction
Each instruction has its own instruction format. For example, Figure 4 shows the adds instruction. Since an instruction is limited to 41 bits, the opcode, operands and extensions must all fit within 41 bits, and their locations vary from instruction to instruction.

Figure 4 – instruction format of adds



The opcode of the adds instruction, which is 8, is located in the 4 most significant bits. The operands are written as r1 = imm14, r3, where r1 starts at bit 6, r3 starts at bit 20, and imm14 is split across several positions in the instruction. That is hard to follow in the abstract, so let's work through an example: the first instruction in Figure 2.

e0000000`8402ecc0        adds   ret0=f, r0

The operand form r1 = imm14, r3 in Figure 4 corresponds to ret0 = f, r0 above. Therefore r1 (bits 6 ~ 12) holds a value that identifies the ret0 register, r3 (bits 20 ~ 26) holds a value that identifies the r0 register, and imm14 (bits 13 ~ 19, 27 ~ 32, 36) holds the constant 0xF. Let's verify this by looking at the actual memory at the instruction address.

kd> dq e0000000`8402ecc0       
e0000000`8402ecc0  00ffa100`003c4005 68001000`40600000

This is one 16-byte bundle, which holds 2 or 3 instructions. I want to concentrate on the first instruction, adds, in instruction slot 0, so we need to pick out bits 5 ~ 45 of the 16-byte (128-bit) bundle as follows:

MSB  1000  0  100     000000  0000000  0001111  0001000  000000  LSB
     op    s  x2a,ve  imm6d   r3       imm7b    r1       qp

You can see that r1 (0001000) is 8, which means general register r8. Why do we get r8 and not ret0? You cannot find ret0 anywhere in Figure 1. In fact, ret0 is just an alias for r8. Some general registers are mapped to special names; for example, r8 ~ r11 are mapped to ret0 ~ ret3. Figure 5 shows several examples.

Figure 5 – Mapping name of general registers

Mapping name   Original name   Meaning
ret0 ~ ret3    r8 ~ r11        Integer return value
rp             b0              Return pointer
gp             r1              Global pointer
sp             r12             Stack pointer
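If you decode register numbers by hand, a small lookup helps you relate them to the names WinDbg prints. The sketch below covers only the general-register rows of Figure 5 (rp is an alias for branch register b0, so it is left out). The function name ia64_gr_alias is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the alias from Figure 5 for general register number gr,
   or NULL when Figure 5 lists no alias for it. */
const char *ia64_gr_alias(unsigned gr)
{
    static const char *const ret_names[] = { "ret0", "ret1", "ret2", "ret3" };

    assert(gr < 128);              /* IA64 has gr0..gr127 */
    if (gr == 1)
        return "gp";               /* global pointer */
    if (gr >= 8 && gr <= 11)
        return ret_names[gr - 8];  /* integer return values */
    if (gr == 12)
        return "sp";               /* stack pointer */
    return NULL;
}
```

Passing 8, the r1 field value decoded above, returns "ret0".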

Let's get back to the 41-bit stream. r3 (0000000) is 0, which means general register r0. To get imm14, you concatenate s, imm6d and imm7b (0, 000000, 0001111). The result is the 14-bit stream 00000000001111, which is 0xf in hexadecimal. So you can recover adds ret0 = f, r0 from the 41-bit instruction slot data. Now we can write some code that extracts imm14 from the instruction slot in memory.

ULONG GetImm14FromAdds(ULONG_PTR pFunction)
{
    ULONGLONG bundle;
    ULONGLONG slot0;
    ULONG imm14;

    // pFunction must be the 16-byte-aligned address of the bundle.
    // Read the lower 64 bits of the bundle => 00ffa100`003c4005
    bundle = *(ULONGLONG*)pFunction;

    // Drop the 5-bit template and keep the 41-bit instruction in slot 0
    slot0 = (bundle >> 5) & 0x1FFFFFFFFFF;

    // Assemble imm14 from the adds fields: imm7b (bits 13 ~ 19),
    // imm6d (bits 27 ~ 32) and s (bit 36). The value is returned
    // as raw bits, without sign extension.
    imm14 = (ULONG)(((slot0 >> 13) & 0x7F)         |
                    (((slot0 >> 27) & 0x3F) << 7)  |
                    (((slot0 >> 36) & 0x1)  << 13));

    return imm14;
}

Other instructions can be analyzed with the same steps and the IA64 manual.
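As a slightly fuller sketch, the function below decodes every operand field of an adds slot in one pass, using the bit positions given earlier. It is written in portable C (uint64_t in place of ULONGLONG) so that it can be tried on any machine. The names AddsFields and decode_adds are hypothetical. I also treat s as a sign bit and sign-extend imm14, which is how I read the manual; the function above returns the raw 14 bits instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for a decoded "adds r1 = imm14, r3". */
typedef struct {
    unsigned qp;      /* qualifying predicate, bits 0..5 */
    unsigned r1;      /* destination register, bits 6..12 */
    unsigned r3;      /* source register, bits 20..26 */
    int32_t  imm14;   /* immediate, sign-extended from 14 bits */
} AddsFields;

AddsFields decode_adds(uint64_t slot)
{
    AddsFields f;
    uint32_t raw;

    /* The article gives opcode 8 in the top 4 bits for adds. Any other
       fields that distinguish adds from its neighbours are not checked. */
    assert(((slot >> 37) & 0xF) == 8);

    f.qp = (unsigned)(slot & 0x3F);
    f.r1 = (unsigned)((slot >> 6) & 0x7F);
    f.r3 = (unsigned)((slot >> 20) & 0x7F);

    /* imm14 = s : imm6d : imm7b */
    raw = (uint32_t)(((slot >> 13) & 0x7F) |
                     (((slot >> 27) & 0x3F) << 7) |
                     (((slot >> 36) & 0x1) << 13));

    /* Assumption: s is the sign bit, so subtract 2^14 when it is set. */
    f.imm14 = (raw & 0x2000) ? (int32_t)raw - 0x4000 : (int32_t)raw;
    return f;
}
```

For slot 0 of the bundle examined above, the decoded fields are qp = 0, r1 = 8 (ret0), r3 = 0 and imm14 = 0xf, which is exactly adds ret0 = f, r0.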

Conclusion
We have just reviewed some of the registers and instructions of IA64. At first glance, IA64 may look totally different and entirely new to you. However, you can get past that and realize that it's not as difficult as you expected.
I was excited to learn all this during my IA64 project, so I hope you also get a chance to enjoy learning it. I also hope this article gives you confidence when you begin your own IA64-based project.