Windows Debugging with WinDbg: Understanding IA64; the Beginning

[This article was written in 2007, when I was porting my 32-bit driver to 64-bit for the x64 and IA64 platforms. It describes what I learned from that experience. I am posting it now for archival purposes.]

Understanding IA64; the Beginning

I have been expecting the 64-bit computing world to arrive for a while, but it has taken longer than I expected, even though Windows Vista has already been released. Still, I am sure 64-bit platforms will be used everywhere sooner or later. Even now it is easy to find a PC that supports x64, so I guess you have already had a chance to write your own 64-bit program, at least for x64.

But what about IA64? It is aimed at servers rather than PCs, but I think everyone should be aware of it. Are you ready to write or debug code on IA64? Because its architecture differs considerably from x86 and x64, there are several things you need to know. This article describes the basics.

Register set of IA64

The register set of IA64 is completely different from that of x86 and x64. The first step in understanding a CPU architecture is to understand its register model, so let me show the IA64 register set from the Intel Itanium Architecture Software Developer’s Manual.

Figure 1 – register set of IA64

First of all, you won’t find any of the familiar x86 register names such as EAX, EBX, ESP, or EBP. That can be disorienting, so take another look at Figure 1. If you read the register names carefully, you can work out what they mean. gr0 to gr127 are the general registers, used for general purposes like EAX, EBX, and ECX. There are far more of them (128, compared with 8 on x86 and 16 on x64), which lets a compiler keep more values in registers. Note that gr0 is special: it always reads as zero. Another thing you need to know about the general registers is that some of them are reserved for special purposes by convention; I will come back to this later in the article.

The floating-point registers are named fr0 to fr127, and the instruction pointer is named IP. These should look familiar because x86 has similar registers, so instructions that use them are easy to read. Two other groups come up often in disassembly: the branch registers br0 to br7, which hold branch targets and return addresses, and the predicate registers pr0 to pr63, one-bit flags that can make an instruction conditional. For any other register you don’t recognize, refer to the IA64 manual.

Instructions of IA64

Have you ever seen IA64 disassembly? Figure 2 shows some. One thing you need to know first: in WinDbg, r denotes the gr (general) registers and b denotes the br (branch) registers.

Figure 2 - Disassembly code of IA64

e0000000`8402ecc0 adds ret0=f, r0
e0000000`8402ecc4 movl r2=e0000000`ffa00000 ;;
e0000000`8402ecd0 nop.m 0
e0000000`8402ecd4 mov b6=r2, +0
e0000000`8402ecd8 br.cond.sptk.few b6 ;;

The disassembly in Figure 2 contains the instructions adds, movl, nop.m, mov, and br.cond.sptk.few. They are hard to read at first, but you can guess that they roughly correspond to add, mov, nop, and jmp on x86. The operands look odd, but they make sense if you look carefully. adds puts the sum of the constant 0xf and the value of r0 (that is, gr0) into the ret0 register; since r0 always reads as zero, this simply sets ret0 to 0xf. movl moves the 64-bit constant 0xe0000000`ffa00000 into r2 (gr2), and mov b6=r2 copies r2 into branch register b6. br.cond.sptk.few looks complicated, but you can take it apart piece by piece. br is short for branch and plays the role of jmp on x86. The word that follows determines the kind of branch: br.cond is a conditional branch, br.call is a procedure call, and br.ret is a procedure return. No qualifying predicate such as (p6) is shown in front of this branch, so it is predicated on p0, which is always true, and the last three instructions amount to an indirect jump to e0000000`ffa00000 through b6. sptk (static prediction, taken) and few (a prefetch hint) are hints that help the processor predict and prefetch the branch; you can ignore them because they do not affect the functional behavior of the program. The .m in nop.m names the execution-unit type of the slot the nop occupies, and ;; marks a stop, the end of a group of instructions that may execute in parallel. These formats may be unfamiliar, but I am sure you can get a feel for them.
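To make the meaning of Figure 2 concrete, here is a small sketch that replays its five instructions on a toy register model. The Ia64State type and the read_gr and replay_figure2 functions are hypothetical, written only for this illustration; they model just the general and branch registers, use portable <stdint.h> types instead of the Windows ones, and assume, as the listing suggests, that the branch is taken.

```c
#include <stdint.h>

/* Hypothetical toy model of the registers touched by Figure 2. */
typedef struct {
    uint64_t gr[128];   /* general registers r0..r127 */
    uint64_t br[8];     /* branch registers b0..b7    */
} Ia64State;

/* r0 is hardwired to zero, so a read of it returns 0 whatever gr[0] holds. */
static uint64_t read_gr(const Ia64State *s, int n)
{
    return n == 0 ? 0 : s->gr[n];
}

/* Replays the five instructions of Figure 2 and returns the branch target. */
static uint64_t replay_figure2(Ia64State *s)
{
    s->gr[8] = 0xf + read_gr(s, 0);     /* adds ret0=f, r0   (ret0 is r8)   */
    s->gr[2] = 0xe0000000ffa00000ULL;   /* movl r2=e0000000`ffa00000        */
                                        /* nop.m 0 does nothing             */
    s->br[6] = read_gr(s, 2);           /* mov b6=r2, +0                    */
    return s->br[6];                    /* br.cond.sptk.few b6: jump via b6 */
}
```

After the call, gr[8] (ret0) holds 0xf and the returned branch target is 0xe0000000`ffa00000, even if gr[0] was filled with garbage beforehand, because reads of r0 always return zero.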

Digging into instruction format

To make more sense of this, we need to dig further into the IA64 instruction format. Figure 3 shows it.

Figure 3 – Instruction format of IA64 - bundle

The bundle format comes from the architecture behind IA64, Explicitly Parallel Instruction Computing (EPIC), in which the compiler explicitly groups instructions that can run in parallel in order to improve performance. A bundle is 128 bits (16 bytes) in size and consists of one 5-bit template and three 41-bit instruction slots (5 + 3 × 41 = 128). The template occupies the lowest 5 bits and tells the processor which kind of execution unit each slot needs. Most instructions fit into one slot, so most bundles hold three instructions. Some instructions, however, take two slots because they need more bits; movl, which carries a full 64-bit constant, is one of them. Figure 2 shows both cases: the first two lines form a bundle with two instructions (adds in slot 0 and movl in slots 1 and 2), and the next three lines form a bundle with three instructions.
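The layout can be checked with a few shifts and masks. The sketch below is a minimal helper (Bundle, bundle_template, and bundle_slot are hypothetical names) that splits a bundle, held as two little-endian quadwords, into its template and three slots. Slot 1 straddles the two quadwords, which is why it needs bits from both halves.

```c
#include <stdint.h>

/* A 128-bit bundle as the two quadwords `dq` displays. */
typedef struct {
    uint64_t lo;   /* bytes 0..7:  bundle bits 0..63   */
    uint64_t hi;   /* bytes 8..15: bundle bits 64..127 */
} Bundle;

#define SLOT_MASK ((1ULL << 41) - 1)   /* one 41-bit instruction slot */

/* The template is the lowest 5 bits of the bundle. */
static unsigned bundle_template(Bundle b)
{
    return (unsigned)(b.lo & 0x1f);
}

/* Slot n (0, 1 or 2) occupies bundle bits 5 + 41n .. 45 + 41n. */
static uint64_t bundle_slot(Bundle b, int n)
{
    switch (n) {
    case 0:  return (b.lo >> 5) & SLOT_MASK;                     /* bits 5..45   */
    case 1:  return ((b.lo >> 46) | (b.hi << 18)) & SLOT_MASK;   /* bits 46..86  */
    default: return (b.hi >> 23) & SLOT_MASK;                    /* bits 87..127 */
    }
}
```

Fed the two quadwords that the dq command later in this article shows for e0000000`8402ecc0 (00ffa100`003c4005 and 68001000`40600000), it yields template 5 and slot 0 = 0x1080001e200, which is the adds instruction.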

The bundle format can mislead you when you analyze instruction addresses in WinDbg. Strictly speaking, the address e0000000`8402ecc4 in Figure 2 does not point at movl r2=e0000000`ffa00000, because the movl instruction begins at bit 46 of the bundle at e0000000`8402ecc0, not at byte offset 4 (bit 32). WinDbg cannot express an address in bits, so, as the addresses in Figure 2 show, it labels the slots of a bundle as the bundle address plus 0, 4, and 8. Only the bundle addresses themselves are real byte addresses: every bundle begins exactly on a 16-byte boundary.
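Going by the addresses in Figure 2, the mapping between WinDbg’s displayed addresses and real bundle positions can be sketched as follows. The helper names are hypothetical, and the code assumes the +0/+4/+8 slot labeling observed in the listing; an address whose low nibble is not 0, 4, or 8 is not a valid slot label.

```c
#include <stdint.h>

/* Bundles are 16-byte aligned, so clearing the low nibble gives the bundle. */
static uint64_t bundle_address(uint64_t windbg_addr)
{
    return windbg_addr & ~0xfULL;
}

/* Assumed WinDbg convention: slot n is displayed at bundle + 4 * n. */
static unsigned slot_index(uint64_t windbg_addr)
{
    return (unsigned)((windbg_addr & 0xf) >> 2);
}

/* First bit of a slot inside the 128-bit bundle: 5 template bits, 41 per slot. */
static unsigned slot_bit_offset(unsigned slot)
{
    return 5 + 41 * slot;
}
```

For e0000000`8402ecc4 this gives bundle e0000000`8402ecc0, slot 1, starting at bit 46, which matches the movl position described above.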

Another side effect of the instruction format is that IA64 programs are larger. When I compiled the same C source file for x86, x64, and IA64, the IA64 binary was 2 or 3 times the size of the x86 or x64 one, and the compiled code contained many more nop instructions. I guess these come from EPIC itself: each template dictates which execution-unit type each slot must hold, so when the compiler cannot find a useful instruction of the right type for a slot, it fills the slot with a nop, like the nop.m in Figure 2. The compiler is presumably doing the right thing, but the code space spent on nops still looks wasteful. Don’t be surprised when you build your own IA64 program.
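The template value determines which unit type each slot must hold, which is where the nop padding comes from. Below is a sketch of the template encoding table from the Intel manual as a lookup array (kTemplateUnits, template_units, and template_stops_at_end are my own names). It records only the unit types of the three slots (M = memory, I = integer, F = floating-point, B = branch, L+X = one two-slot instruction); the stops inside templates 0x02, 0x03, 0x0A, and 0x0B are not represented, and reserved values map to NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Unit types of slots 0..2 for each 5-bit template value; NULL = reserved. */
static const char *const kTemplateUnits[32] = {
    "MII", "MII", "MII", "MII", "MLX", "MLX", NULL,  NULL,    /* 0x00-0x07 */
    "MMI", "MMI", "MMI", "MMI", "MFI", "MFI", "MMF", "MMF",   /* 0x08-0x0F */
    "MIB", "MIB", "MBB", "MBB", NULL,  NULL,  "BBB", "BBB",   /* 0x10-0x17 */
    "MMB", "MMB", NULL,  NULL,  "MFB", "MFB", NULL,  NULL,    /* 0x18-0x1F */
};

static const char *template_units(unsigned tmpl)
{
    return kTemplateUnits[tmpl & 0x1f];
}

/* For the defined templates, odd values put a stop (";;") after slot 2. */
static bool template_stops_at_end(unsigned tmpl)
{
    return (tmpl & 1) != 0;
}
```

For the first bundle of Figure 2, whose template is 5, the table gives MLX with a stop at the end, which matches the ;; after movl. The second bundle (nop.m, mov, br) fits the MIB layout, where the nop.m fills the M slot because the compiler had nothing useful to put there.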

Format of adds instruction

Each instruction has its own format; Figure 4 shows the format of adds. Because each instruction is limited to 41 bits, the opcode, operands, and extension fields must all fit within those 41 bits, and where each field sits varies from instruction to instruction.

Figure 4 – instruction format of adds

The adds opcode, 8, is located in the 4 most significant bits (bits 37 to 40). The operands are written as “r1 = imm14, r3”, where r1 occupies bits 6 to 12, r3 occupies bits 20 to 26, and imm14 is split across several fields of the instruction. The remaining fields are x2a (bits 34 to 35) and ve (bit 33), which together with the opcode identify the instruction as adds, and qp (bits 0 to 5), which names the qualifying predicate register. This is complicated, so let’s work through an example: the first instruction in Figure 2.

e0000000`8402ecc0 adds ret0=f, r0

The operands r1 = imm14, r3 in Figure 4 correspond to ret0 = f, r0 above. Therefore, r1 (bits 6 to 12) should hold a value indicating the ret0 register, r3 (bits 20 to 26) a value indicating the r0 register, and imm14 (bits 13 to 19, 27 to 32, and 36, counted within the 41-bit slot) the constant 0xF. Let’s confirm this by looking at the raw memory at the instruction address.

kd> dq e0000000`8402ecc0
e0000000`8402ecc0  00ffa100`003c4005 68001000`40600000

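These two quadwords make up the 16-byte bundle, which holds two or three instructions. Because Windows on IA64 runs little-endian, the first quadword holds bits 0 to 63 and the second holds bits 64 to 127; the lowest 5 bits (00101) are the template. I want to concentrate on the first instruction, adds, in slot 0, so we pick bits 5 to 45 out of the 128-bit bundle: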

MSB  1000  0  10   0   000000  0000000  0001111  0001000  000000  LSB
     op    s  x2a  ve  imm6d   r3       imm7b    r1       qp
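The bit string above can be produced mechanically. This sketch (AddsFields and decode_adds are hypothetical names) extracts every field of the adds encoding from a 41-bit slot value. The two bits 10 and the single bit 0 between s and imm6d are the x2a and ve extension fields, which are 2 and 0 for adds.

```c
#include <stdint.h>

/* Fields of an adds instruction slot, from least to most significant. */
typedef struct {
    unsigned qp, r1, imm7b, r3, imm6d, ve, x2a, s, op;
} AddsFields;

static AddsFields decode_adds(uint64_t slot)
{
    AddsFields f;
    f.qp    = (unsigned)( slot        & 0x3f);   /* bits 0..5   */
    f.r1    = (unsigned)((slot >> 6)  & 0x7f);   /* bits 6..12  */
    f.imm7b = (unsigned)((slot >> 13) & 0x7f);   /* bits 13..19 */
    f.r3    = (unsigned)((slot >> 20) & 0x7f);   /* bits 20..26 */
    f.imm6d = (unsigned)((slot >> 27) & 0x3f);   /* bits 27..32 */
    f.ve    = (unsigned)((slot >> 33) & 0x1);    /* bit 33      */
    f.x2a   = (unsigned)((slot >> 34) & 0x3);    /* bits 34..35 */
    f.s     = (unsigned)((slot >> 36) & 0x1);    /* bit 36      */
    f.op    = (unsigned)((slot >> 37) & 0xf);    /* bits 37..40 */
    return f;
}
```

Applied to slot 0 of the bundle above (0x1080001e200), it returns op = 8, x2a = 2, ve = 0, r1 = 8, r3 = 0, imm7b = 15, and zero for the remaining fields.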

You can see that r1 (0001000) is 8, meaning general register r8. So why does WinDbg show ret0 rather than r8? You won’t find ret0 anywhere in Figure 1, because ret0 is simply a mapping name for r8. Some registers have such special names; for example, r8 to r11 are called ret0 to ret3. Figure 5 shows several examples.

Figure 5 – Mapping names of registers

Mapping name   Original name   Meaning
ret0 ~ ret3    r8 ~ r11        Integer return values
rp             b0              Return pointer
gp             r1              Global pointer
sp             r12             Stack pointer
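When you post-process WinDbg output, it can help to translate these mapping names back to the original register names. A minimal lookup with the table taken from Figure 5 (canonical_register is a hypothetical helper name, and any name not in the table is returned unchanged):

```c
#include <string.h>

/* Mapping names from Figure 5 and the registers they stand for. */
static const struct { const char *alias; const char *reg; } kAliases[] = {
    { "ret0", "r8"  }, { "ret1", "r9"  }, { "ret2", "r10" }, { "ret3", "r11" },
    { "rp",   "b0"  }, { "gp",   "r1"  }, { "sp",   "r12" },
};

/* Returns the original register name for a mapping name, else the name itself. */
static const char *canonical_register(const char *name)
{
    for (size_t i = 0; i < sizeof kAliases / sizeof kAliases[0]; i++)
        if (strcmp(name, kAliases[i].alias) == 0)
            return kAliases[i].reg;
    return name;
}
```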

Back to the 41-bit slot: r3 (0000000) is 0, meaning general register r0. To get imm14, concatenate s, imm6d, and imm7b (0, 000000, 0001111), from most to least significant. This gives the 14-bit value 00000000001111, which is 0xf in hexadecimal. So the 41-bit slot decodes to adds ret0 = f, r0. Now we can write some code that extracts imm14 from the instruction’s memory.

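ULONG GetImm14FromAdds(ULONG_PTR pFunction)
{
    ULONGLONG bundle;
    ULONGLONG slot0;
    ULONGLONG imm14;

    // pFunction must be the address of the code bundle itself. On IA64 a C
    // function pointer points to a function descriptor (entry address, gp),
    // so read the entry address from the descriptor first if that is what you have.

    // Lower 64 bits of the bundle => 00ffa100`003c4005
    bundle = *(ULONGLONG*)pFunction;

    // Drop the 5-bit template and keep the 41-bit instruction in slot 0
    slot0 = (bundle >> 5) & 0x1FFFFFFFFFFULL;

    // Assemble imm14 from s (1 bit), imm6d (6 bits) and imm7b (7 bits)
    imm14 = ((slot0 >> 13) & 0x7F) |            // imm7b -> bits 0..6
            (((slot0 >> 27) & 0x3F) << 7) |     // imm6d -> bits 7..12
            (((slot0 >> 36) & 0x1) << 13);      // s     -> bit 13

    // Raw 14-bit field; bit 13 is the sign bit of the immediate
    return (ULONG)imm14;
}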
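One caveat: GetImm14FromAdds returns the raw 14-bit field. In the adds encoding, s is the sign bit and the processor sign-extends imm14 before adding it, so a stack adjustment such as adds sp=-16, sp encodes a negative constant. If you need the value the instruction actually adds, a sign-extending variant along these lines should work. It takes the 41-bit slot value (for example, slot0 as computed above) rather than a memory address, and adds_imm14_signed is a hypothetical name.

```c
#include <stdint.h>

/* Returns the sign-extended imm14 operand of an adds slot. */
static int32_t adds_imm14_signed(uint64_t slot)
{
    uint32_t raw = (uint32_t)(((slot >> 13) & 0x7f) |           /* imm7b */
                              (((slot >> 27) & 0x3f) << 7) |    /* imm6d */
                              (((slot >> 36) & 0x1) << 13));    /* s     */
    /* Bit 13 is the sign bit: subtract 2^14 when it is set. */
    return (raw & 0x2000) ? (int32_t)raw - 0x4000 : (int32_t)raw;
}
```

For the adds in Figure 2 the result is 15, the same as the raw field, because s is 0; the two functions only differ for negative constants.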

Other instructions can be analyzed with the same steps, using the IA64 manual for their formats.
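For example, movl, the two-slot instruction from Figure 2, keeps 41 bits of its 64-bit constant in slot 1 (the L slot) and its opcode, destination register, and the remaining 23 constant bits in slot 2 (the X slot). The sketch below reassembles the constant from both quadwords of the bundle; movl_imm64 is a hypothetical name, and the field positions follow the movl format in the Intel manual.

```c
#include <stdint.h>

/* Decodes "movl r1 = imm64" from an MLX bundle given as lo (bits 0..63) and
   hi (bits 64..127). Stores the destination register number in *r1 if non-NULL. */
static uint64_t movl_imm64(uint64_t lo, uint64_t hi, unsigned *r1)
{
    const uint64_t mask41 = (1ULL << 41) - 1;
    uint64_t slotL = ((lo >> 46) | (hi << 18)) & mask41;   /* slot 1: imm41        */
    uint64_t slotX = (hi >> 23) & mask41;                  /* slot 2: opcode, etc. */

    uint64_t imm7b = (slotX >> 13) & 0x7f;    /* imm64 bits 0..6   */
    uint64_t imm9d = (slotX >> 27) & 0x1ff;   /* imm64 bits 7..15  */
    uint64_t imm5c = (slotX >> 22) & 0x1f;    /* imm64 bits 16..20 */
    uint64_t ic    = (slotX >> 21) & 0x1;     /* imm64 bit 21      */
    uint64_t i     = (slotX >> 36) & 0x1;     /* imm64 bit 63      */

    if (r1)
        *r1 = (unsigned)((slotX >> 6) & 0x7f);

    /* imm41 from the L slot supplies imm64 bits 22..62. */
    return (i << 63) | (slotL << 22) | (ic << 21) |
           (imm5c << 16) | (imm9d << 7) | imm7b;
}
```

With the dq data shown earlier (00ffa100`003c4005 and 68001000`40600000), it returns 0xe0000000ffa00000 and reports r1 = 2, which is exactly movl r2=e0000000`ffa00000.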

Conclusion

We have just reviewed some of the registers and instructions of IA64. At first glance, IA64 may look totally different and completely new. However, once you get past that, you will find it is not as difficult as you expected.

Learning this during my IA64 project was exciting, and I hope you also get a chance to enjoy it. I also hope this article gives you confidence when you start your own IA64-based project.

Posted on Friday, May 30, 2014.