Understanding IA64: The Beginning
I have been expecting the 64-bit computing world to arrive for a while, but it has taken longer than I expected, even though Vista has already been released. Anyway, I am sure the 64-bit platform will be used everywhere sooner or later. Even now you can easily find a PC that supports x64, so I guess you have already had a chance to write your own 64-bit program, at least for x64.
However, what about IA64? It is a server platform rather than a PC platform, but I think everyone should be aware of it too. Are you ready to write or debug code on IA64? Because its architecture differs significantly from x86 and x64, there are several things you need to know about IA64. I will describe the basics in this article.
Register set of IA64
The register set of IA64 is completely different from that of x86 and x64. The first step in understanding a CPU architecture is to understand its register model, so let me show the IA64 register set from the Intel Itanium Architecture Software Developer’s Manual.
Figure 1 – register set of IA64
First of all, you won’t find any of the familiar x86 register names, such as EAX, EBX, ESP, or EBP. This can be confusing, so take another look at Figure 1. If you look at the register names carefully, you can work out what they mean. gr0 to gr127 are general registers, used for general purposes just like EAX, EBX, and ECX. There are many more general registers than on x86, so a compiler can keep more values in registers and use them more efficiently. Another thing you need to know is that some general registers are reserved for special purposes; for example, gr0 always reads as zero. I will come back to this later in the article.
Floating-point registers, used for floating-point operations, are named fr0 to fr127, and the instruction pointer is named IP. These registers should feel familiar because x86 has similar ones, so instructions that use them are easy to understand. What about the others, such as the branch registers br0 to br7 and the predicate registers pr0 to pr63? You can consult the IA64 manual whenever you encounter a register you don’t know.
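To make the register model a bit more concrete, here is a minimal C sketch of the main architectural registers discussed above. The type and function names (Ia64Fr, Ia64Regs, read_gr, read_pr) are hypothetical, the NaT bits of the general registers and the application registers are omitted, and the two read helpers only model the fact that r0 always reads as zero and p0 always reads as one.

```c
#include <assert.h>
#include <stdint.h>

/* An 82-bit floating-point register: 1 sign bit, 17-bit exponent, 64-bit significand. */
typedef struct {
    uint64_t significand;
    uint32_t sign_exp;      /* bit 17 = sign, bits 0..16 = exponent */
} Ia64Fr;

/* Simplified register file (hypothetical layout, NaT bits and ARs omitted). */
typedef struct {
    uint64_t gr[128];       /* general registers r0..r127 */
    Ia64Fr   fr[128];       /* floating-point registers f0..f127 */
    uint64_t pr;            /* predicate registers p0..p63, one bit each */
    uint64_t br[8];         /* branch registers b0..b7 */
    uint64_t ip;            /* instruction pointer (bundle address) */
} Ia64Regs;

/* r0 is hard-wired to zero, whatever is stored in gr[0]. */
static uint64_t read_gr(const Ia64Regs *r, unsigned n)
{
    assert(n < 128);
    return n == 0 ? 0 : r->gr[n];
}

/* p0 is hard-wired to one; other predicates come from the bitmask. */
static int read_pr(const Ia64Regs *r, unsigned n)
{
    assert(n < 64);
    return n == 0 ? 1 : (int)((r->pr >> n) & 1);
}
```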
Instructions of IA64
Have you ever seen IA64 disassembly? Figure 2 shows some disassembly code from IA64. One thing you need to know first: in WinDbg, r denotes the gr registers and b denotes the br registers.
Figure 2 - Disassembly code of IA64
e0000000`8402ecc0 adds ret0=f, r0
e0000000`8402ecc4 movl r2=e0000000`ffa00000 ;;
e0000000`8402ecd0 nop.m 0
e0000000`8402ecd4 mov b6=r2, +0
e0000000`8402ecd8 br.cond.sptk.few b6 ;;
The disassembly in Figure 2 contains the instructions adds, movl, nop.m, mov, and br.cond.sptk.few. They are hard to understand at first glance, but you can guess that they correspond to add, mov, nop, and jmp on x86. The operands look odd too, but their meaning becomes clear if you look at them carefully. The adds instruction adds the constant 0xf to the value of r0 (gr0) and places the result in the ret0 register; since r0 always reads as zero, it simply loads 0xf into ret0. movl moves the 64-bit constant 0xe0000000`ffa00000 into r2 (gr2), and mov b6=r2 then copies that value into branch register b6. br.cond.sptk.few looks complicated, but you can split it into parts to find its meaning. br is short for branch and is the counterpart of jmp on x86. The word that follows br specifies the kind of branch: for example, br.cond means conditional branch, br.call means procedure call, and br.ret means procedure return. A conditional branch is taken when its qualifying predicate is 1; no predicate is shown here, so the instruction uses p0, which is always 1, and the branch is always taken. In other words, it is an indirect jump to the address held in b6. sptk.few is a hint provided to the processor to improve branch prediction (sptk asks for a static "taken" prediction, and .few is a prefetch hint). You can ignore this part because it does not affect the functional behavior of the program. These instruction formats might be unfamiliar to you, but I’m convinced you can get a feel for them.
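If it helps, the effect of the Figure 2 sequence can be written out in C. This is only a sketch of the visible register effects, assuming a hypothetical MiniState structure that holds just the registers these instructions touch; it ignores bundles, hints and exceptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified machine state: only what Figure 2 touches. */
typedef struct {
    uint64_t gr[128];   /* r0..r127; r0 must stay 0 */
    uint64_t br[8];     /* b0..b7 */
    uint64_t ip;        /* next instruction pointer */
} MiniState;

static void run_figure2(MiniState *s)
{
    assert(s->gr[0] == 0);              /* r0 always reads as zero */

    /* adds ret0 = f, r0   ->  r8 = 0xf + r0 */
    s->gr[8] = 0xf + s->gr[0];

    /* movl r2 = e0000000`ffa00000 */
    s->gr[2] = 0xe0000000ffa00000ULL;

    /* nop.m 0 does nothing */

    /* mov b6 = r2 */
    s->br[6] = s->gr[2];

    /* br.cond.sptk.few b6: predicate p0 is always 1, so jump to b6 */
    s->ip = s->br[6];
}
```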
Digging into instruction format
To get a better sense of this, we need to dig further into the IA64 instruction format. Figure 3 shows the instruction format of IA64.
Figure 3 – Instruction format of IA64 - bundle
This bundle format comes from the IA64 processor architecture, called Explicitly Parallel Instruction Computing (EPIC), which was designed to improve performance. A bundle is 128 bits (16 bytes) in size and is composed of one 5-bit template and three 41-bit instruction slots (5 + 3 × 41 = 128). The template tells the processor which kind of execution unit (memory, integer, floating-point, or branch) each slot is meant for, and where the stops (;; in the disassembly) are. Most instructions fit into one slot, so most bundles hold 3 instructions. Some instructions, however, take two slots because they need more space; movl, for example, carries a full 64-bit constant that cannot fit into 41 bits. You can see both cases in Figure 2: the first two lines are a bundle with 2 instructions (adds, plus movl occupying slots 1 and 2), and the next three lines are a bundle with 3 instructions.
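The bundle layout can be expressed as a small C helper that splits a 128-bit bundle into its template and three slots. It assumes the bundle is given as two 64-bit halves, lo (bits 0 ~ 63) and hi (bits 64 ~ 127), which is how WinDbg’s dq command shows it later in this article; the Bundle type and split_bundle name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_MASK ((1ULL << 41) - 1)   /* 41 one-bits */

typedef struct {
    unsigned tmpl;        /* 5-bit template, bits 0..4 */
    uint64_t slot[3];     /* three 41-bit instruction slots */
} Bundle;

/* lo = bits 0..63 of the bundle, hi = bits 64..127 */
static Bundle split_bundle(uint64_t lo, uint64_t hi)
{
    Bundle b;
    b.tmpl    = (unsigned)(lo & 0x1F);
    b.slot[0] = (lo >> 5) & SLOT_MASK;                  /* bits 5..45   */
    b.slot[1] = ((lo >> 46) | (hi << 18)) & SLOT_MASK;  /* bits 46..86  */
    b.slot[2] = (hi >> 23) & SLOT_MASK;                 /* bits 87..127 */
    assert(b.slot[2] == (hi >> 23));                    /* slot 2 uses all upper bits */
    return b;
}
```

Feeding it the first bundle of Figure 2 (00ffa100`003c4005 and 68001000`40600000) gives template 5, which the manual lists as an M-L-X template with a stop at the end; that fits adds followed by the two-slot movl and the ;; after it.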
This bundle format can mislead you when you read instruction addresses in WinDbg. Strictly speaking, the address e0000000`8402ecc4 in Figure 2 does not point to movl r2=e0000000`ffa00000, because the movl instruction starts at e0000000`8402ecc0 + 46 bits, not at e0000000`8402ecc0 + 32 bits (4 bytes). Since WinDbg cannot express an address in bit units, I guess they had no choice but to display an approximate byte address: as Figure 2 shows, the bundle address plus 4 for slot 1 and plus 8 for slot 2. However, every bundle begins exactly on a 16-byte boundary.
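Under the assumption described above (WinDbg shows bundle + 4 for slot 1 and bundle + 8 for slot 2, which is what Figure 2 suggests), converting a displayed address back to the real bundle address and the slot’s starting bit is simple arithmetic. SlotLocation and locate_slot are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t bundle;      /* 16-byte aligned bundle address */
    unsigned slot;        /* 0, 1 or 2 */
    unsigned bit_offset;  /* first bit of the slot inside the bundle */
} SlotLocation;

static SlotLocation locate_slot(uint64_t shown)
{
    SlotLocation loc;
    /* Assumption: shown = bundle + 4 * slot */
    assert((shown & 0x3) == 0 && (shown & 0xF) <= 8);
    loc.bundle = shown & ~0xFULL;               /* bundles start on 16-byte boundaries */
    loc.slot = (unsigned)((shown & 0xF) / 4);
    loc.bit_offset = 5 + 41 * loc.slot;         /* the 5-bit template comes first */
    return loc;
}
```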
Another side effect of this instruction format is that programs built by an IA64 compiler are larger. If the same C source file is compiled for x86, x64 and IA64, the IA64 program can be 2 or 3 times larger than the x86 or x64 one. I saw that the compiled IA64 program had many more nop instructions. Because each template only allows certain kinds of instructions in each slot, the compiler has to fill a slot with a nop whenever it cannot find a suitable instruction to place there, like the nop.m in the second bundle of Figure 2. The compiler may well be doing the right thing for parallel execution, but the code space spent on nop instructions still seems like a waste. Don’t be surprised when you build your own IA64 program.
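The slot restriction behind those nop instructions is easiest to see in the template encoding table of the Intel manual. The sketch below reproduces that table as I understand it (stop bits omitted), so please double-check it against the manual before relying on it; kTemplateUnits and template_units are hypothetical names. Each letter names the execution unit a slot is meant for: M (memory), I (integer), F (floating-point), B (branch), and L+X for a two-slot long-immediate instruction such as movl.

```c
#include <assert.h>
#include <stddef.h>

/* Unit type of slots 0, 1, 2 for each template value 0x00..0x1F.
   NULL marks a reserved template. Stop bits are not shown. */
static const char *const kTemplateUnits[32] = {
    "MII", "MII", "MII", "MII", "MLX", "MLX", NULL,  NULL,   /* 0x00-0x07 */
    "MMI", "MMI", "MMI", "MMI", "MFI", "MFI", "MMF", "MMF",  /* 0x08-0x0F */
    "MIB", "MIB", "MBB", "MBB", NULL,  NULL,  "BBB", "BBB",  /* 0x10-0x17 */
    "MMB", "MMB", NULL,  NULL,  "MFB", "MFB", NULL,  NULL,   /* 0x18-0x1F */
};

static const char *template_units(unsigned tmpl)
{
    assert(tmpl < 32);          /* the template field is 5 bits */
    return kTemplateUnits[tmpl];
}
```

The second bundle in Figure 2 (nop.m, mov b6=r2, br) fits an M-I-B template; with nothing useful to put in the M slot, the compiler placed a nop.m there.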
Format of adds instruction
Each instruction has its own format. For example, Figure 4 shows the format of the adds instruction. Since an instruction is limited to 41 bits, the opcode, operands and extensions must all fit within those 41 bits, and their positions vary from instruction to instruction.
Figure 4 – instruction format of adds
The opcode of the adds instruction, 8, is located in the 4 most significant bits (bits 37 ~ 40). The operands are written as “r1 = imm14, r3”, where r1 occupies bits 6 ~ 12, r3 occupies bits 20 ~ 26, and imm14 is spread over several positions in the instruction. This is complicated, so let’s work it out from an example: the first instruction in Figure 2.
e0000000`8402ecc0 adds ret0=f, r0
The operand r1 = imm14, r3 in Figure 4 corresponds to ret0 = f, r0 above. Therefore, r1 (bits 6 ~ 12) should hold a value that indicates the ret0 register, r3 (bits 20 ~ 26) a value that indicates the r0 register, and imm14 (bits 13 ~ 19, 27 ~ 32, and 36) the constant 0xF. Let’s confirm this by viewing the raw memory at the instruction address.
kd> dq e0000000`8402ecc0
e0000000`8402ecc0  00ffa100`003c4005 68001000`40600000
This is the 16-byte bundle that holds the first two instructions of Figure 2; dq displays its low 64 bits first. I want to concentrate on the first instruction, adds, in instruction slot 0, so we need to pick up bits 5 ~ 45 of the 16-byte (128-bit) bundle, as follows:
MSB  1000  0  10   0   000000  0000000  0001111  0001000  000000  LSB
     op    s  x2a  ve  imm6d   r3       imm7b    r1       qp
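The same field extraction can be written as a small C decoder for the A4 format that adds uses. The AddsFields struct and the bits and decode_adds helpers are hypothetical names; the field positions are the ones listed above, and the input is the 41-bit slot value, not the whole bundle.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the A4 (register + imm14 add) format used by adds */
typedef struct {
    unsigned qp, r1, imm7b, r3, imm6d, ve, x2a, s, opcode;
} AddsFields;

/* Extract `len` bits starting at bit `pos` of a slot value. */
static unsigned bits(uint64_t slot, unsigned pos, unsigned len)
{
    return (unsigned)((slot >> pos) & ((1ULL << len) - 1));
}

static AddsFields decode_adds(uint64_t slot)
{
    AddsFields f;
    assert(slot < (1ULL << 41));    /* a slot is 41 bits wide */
    f.qp     = bits(slot, 0, 6);    /* qualifying predicate */
    f.r1     = bits(slot, 6, 7);    /* destination register */
    f.imm7b  = bits(slot, 13, 7);
    f.r3     = bits(slot, 20, 7);   /* source register */
    f.imm6d  = bits(slot, 27, 6);
    f.ve     = bits(slot, 33, 1);   /* opcode extension */
    f.x2a    = bits(slot, 34, 2);   /* opcode extension */
    f.s      = bits(slot, 36, 1);   /* sign bit of imm14 */
    f.opcode = bits(slot, 37, 4);
    return f;
}
```

For slot 0 of the bundle above (0x1080001E200), this gives opcode 8, x2a 2, ve 0, r1 8, r3 0, and imm7b 0xF.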
You can see that r1 (0001000) is 8, which means general register r8. The 3 bits between s and imm6d (100) are the opcode extensions x2a = 2 and ve = 0, which, together with the opcode 8, identify the instruction as adds. Why is the register r8 and not ret0? You cannot find ret0 anywhere in Figure 1. In fact, ret0 is just an alias for r8. Some registers have special alias names; for example, r8 ~ r11 are aliased as ret0 ~ ret3. Figure 5 shows several examples.
Figure 5 – Mapping names of registers
Mapping name | Original name | Meaning
ret0 ~ ret3  | r8 ~ r11      | Integer return values
rp           | b0            | Return pointer (a branch register)
gp           | r1            | Global pointer
sp           | r12           | Stack pointer
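When reading WinDbg output, it can be handy to translate these alias names back to the architectural names. The lookup below covers only the aliases in Figure 5; RegAlias and resolve_alias are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *alias;   /* name shown by the debugger */
    const char *reg;     /* architectural register name */
} RegAlias;

static const RegAlias kAliases[] = {
    { "ret0", "r8" }, { "ret1", "r9" }, { "ret2", "r10" }, { "ret3", "r11" },
    { "gp", "r1" }, { "sp", "r12" }, { "rp", "b0" },
};

/* Return the architectural name for an alias, or the name itself if it is not an alias. */
static const char *resolve_alias(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof kAliases / sizeof kAliases[0]; i++)
        if (strcmp(kAliases[i].alias, name) == 0)
            return kAliases[i].reg;
    return name;
}
```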
Let’s get back to the 41-bit stream. r3 (0000000) is 0, which means general register r0. To get imm14, concatenate s, imm6d and imm7b (0, 000000, 0001111). The result is the 14-bit stream 00000000001111, which is 0xf in hexadecimal. Because s is the sign bit, imm14 is a signed value that the processor sign-extends to 64 bits; here s is 0, so the value is simply 0xf. Therefore, you can decode adds ret0 = f, r0 from the 41-bit instruction slot data. Now we can write some code that extracts imm14 by analyzing the instruction slot in memory.
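ULONG GetImm14FromAdds(ULONG_PTR pFunction)
{
    ULONGLONG bundle;
    ULONGLONG slot0;
    ULONG imm14;

    // lower 64 bits of the bundle => 00ffa100`003c4005
    // (pFunction must be the 16-byte aligned address of a bundle whose slot 0 is adds)
    bundle = *(ULONGLONG*)pFunction;

    // get rid of the 5-bit template and get the 41-bit first instruction (slot 0)
    slot0 = (bundle >> 5) & 0x1FFFFFFFFFFULL;

    // make imm14 from s (1 bit, bit 36), imm6d (6 bits, bits 27 ~ 32)
    // and imm7b (7 bits, bits 13 ~ 19) of the adds instruction
    imm14 = (ULONG)(((slot0 >> 13) & 0x7F) |
                    (((slot0 >> 27) & 0x3F) << 7) |
                    (((slot0 >> 36) & 0x1) << 13));

    // raw 14-bit field; bit 13 (s) is the sign bit, so sign-extend it if you need the signed value
    return imm14;
}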
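The function above returns the raw 14-bit field. Since imm14 is signed, the field has to be sign-extended before it is added to r3. Below is a portable sketch of that step plus the resulting adds computation; sign_extend and adds_result are hypothetical helper names, and the value of r3 is passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to a 64-bit signed integer. */
static int64_t sign_extend(uint64_t value, unsigned bits)
{
    assert(bits > 0 && bits < 64);
    uint64_t m = 1ULL << (bits - 1);      /* sign bit of the field */
    value &= (1ULL << bits) - 1;          /* keep only the field */
    return (int64_t)((value ^ m) - m);    /* xor/subtract sign-extension trick */
}

/* adds r1 = imm14, r3: build imm14 = s:imm6d:imm7b, then add r3 (mod 2^64). */
static uint64_t adds_result(uint64_t slot, uint64_t r3_value)
{
    uint64_t raw = ((slot >> 13) & 0x7F)
                 | (((slot >> 27) & 0x3F) << 7)
                 | (((slot >> 36) & 0x1) << 13);
    return (uint64_t)sign_extend(raw, 14) + r3_value;
}
```

With the slot 0 value of the first bundle (0x1080001E200) and r3 = r0 = 0, adds_result returns 0xf, the value that ends up in ret0.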
Other instructions can be analyzed using the same steps together with the IA64 manual.
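As an example of applying the same steps to another instruction, here is a sketch that rebuilds the 64-bit constant of movl from slots 1 and 2 of the first bundle. The bit positions follow my reading of the X2 format in the Intel manual (imm64 = i:imm41:ic:imm5c:imm9d:imm7b), and I have only checked them against the Figure 2 bytes; movl_imm64 and movl_target are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 64-bit immediate of movl from slot 1 (imm41) and slot 2 (X-unit slot). */
static uint64_t movl_imm64(uint64_t slot1, uint64_t slot2)
{
    assert(((slot2 >> 37) & 0xF) == 6);            /* movl opcode */
    uint64_t imm7b = (slot2 >> 13) & 0x7F;         /* bits 0..6 of imm64 */
    uint64_t imm9d = (slot2 >> 27) & 0x1FF;        /* bits 7..15 */
    uint64_t imm5c = (slot2 >> 22) & 0x1F;         /* bits 16..20 */
    uint64_t ic    = (slot2 >> 21) & 0x1;          /* bit 21 */
    uint64_t imm41 = slot1 & ((1ULL << 41) - 1);   /* bits 22..62 */
    uint64_t i     = (slot2 >> 36) & 0x1;          /* bit 63 */
    return (i << 63) | (imm41 << 22) | (ic << 21) |
           (imm5c << 16) | (imm9d << 7) | imm7b;
}

/* Destination register number (r1 field, bits 6..12 of slot 2). */
static unsigned movl_target(uint64_t slot2)
{
    return (unsigned)((slot2 >> 6) & 0x7F);
}
```

With the slot values split from 00ffa100`003c4005 and 68001000`40600000 (slot 1 = 0x180000003FE, slot 2 = 0xD000200080), it returns 0xe0000000ffa00000 with target register r2, matching movl r2=e0000000`ffa00000 in Figure 2.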
Conclusion
We have just reviewed some of the registers and instructions of IA64. At first glance, you may think IA64 is totally different and entirely new to you. However, you can get over that and realize that it’s not as difficult as you expect.
Actually, I was excited to learn about this during my IA64-related project, so I hope you also have a chance to enjoy learning it. I also hope this article gives you confidence when you begin your own IA64-based project.