Sunday, November 16, 2014

Debugging memory corruption (Advanced)

Sometimes software debugging is too tired task if there is no evidence to find a root cause of the problem, in that case, we need to concentrate on collecting any little clues in the crash dump. Therefore, I’d like to name the subtitle of this article as “Looking for a needle in a haystack”.

Today, we are going to look at a crash dump caused by memory corruption. We can guess that this is a memory corruption case and mydrv.sys might corrupt the memory when we open the crash dump at a glance but it’s hard to find the source code line which has the bug in mydrv.sys because this crash dump does not have any call stack related with mydrv.sys. Generally, after memory corruption has been happened and then the kernel would be crashed later, so we have to invent some proper way in order to find out the root cause of the problem.

Let’s go with "!analyze -v"

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003.  This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG.  This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG.  This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8092e20b, The address that the exception occurred at
Arg3: f3a4ea60, Trap Frame
Arg4: 00000000
Debugging Details:
------------------
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - 0x%08lx
FAULTING_IP: 
nt!ObpLookupDirectoryEntry+ee
8092e20b 394608          cmp     dword ptr [esi+8],eax
TRAP_FRAME:  f3a4ea60 -- (.trap 0xfffffffff3a4ea60)
ErrCode = 00000000
eax=0000e470 ebx=e1fdb600 ecx=e1753c2c edx=00000011 esi=00740065 edi=e1753be8
eip=8092e20b esp=f3a4ead4 ebp=f3a4eaec iopl=0         nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010206
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax ds:0023:0074006d=????????
Resetting default scope
DEFAULT_BUCKET_ID:  DRIVER_FAULT
BUGCHECK_STR:  0x8E
PROCESS_NAME:  iexplore.exe
CURRENT_IRQL:  0
LAST_CONTROL_TRANSFER:  from 8085bba7 to 8087c480
STACK_TEXT:  
f3a4e62c 8085bba7 0000008e c0000005 8092e20b nt!KeBugCheckEx+0x1b
f3a4e9f0 808346b4 f3a4ea0c 00000000 f3a4ea60 nt!KiDispatchException+0x3a2
f3a4ea58 80834668 f3a4eaec 8092e20b badb0d00 nt!CommonDispatchException+0x4a
f3a4ea78 80935a06 00000000 f3a4eb67 f3a4eb7c nt!Kei386EoiHelper+0x186
f3a4eaec 8092c2ab e1753b98 f3a4eb14 00000050 nt!SepAccessCheck+0x41d
f3a4eb58 80939f97 00000000 f3a4ec3c 00000000 nt!ObpLookupObjectName+0x45f
f3a4ec48 809305c8 00000000 00000000 00000000 nt!ObReferenceObjectByName+0x8e
f3a4ed10 80920300 0559cd88 0315b948 0315b954 nt!NtSecureConnectPort+0x1d6
f3a4ed3c 80833bdf 0559cd88 0315b948 0315b954 nt!NtConnectPort+0x24
f3a4ed3c 7c9685ec 0559cd88 0315b948 0315b954 nt!KiFastCallEntry+0xfc
...
STACK_COMMAND:  kb
FOLLOWUP_IP: 
nt!ObpLookupDirectoryEntry+ee
8092e20b 394608          cmp     dword ptr [esi+8],eax
SYMBOL_STACK_INDEX:  0
SYMBOL_NAME:  nt!ObpLookupDirectoryEntry+ee
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: nt
IMAGE_NAME:  ntkrnlmp.exe
DEBUG_FLR_IMAGE_TIMESTAMP:  45ec14ca
FAILURE_BUCKET_ID:  0x8E_nt!ObpLookupDirectoryEntry+ee
BUCKET_ID:  0x8E_nt!ObpLookupDirectoryEntry+ee
Followup: MachineOwner
---------


According to the messages, this is a very common bugcheck but there is not useful information to debug.
There is no my module name on the call stack except NT kernel, so there are no hints to analyze this problem.

At the end of the messages, it points a function that the exception occurred at is nt!ObpLookupDirectoryEntry+ee, but it is not useful either because it’s really rare case that it is from NT kernel bug, most cases are from third party driver’s bug or mine. Therefore, we need to make every possible effort to find out any hint of the crash.

Let’s begin with .trap command.

0: kd> .trap 0xfffffffff3a4ea60

ErrCode = 00000000

eax=0000e470 ebx=e1fdb600 ecx=e1753c2c edx=00000011 esi=00740065 edi=e1753be8
eip=8092e20b esp=f3a4ead4 ebp=f3a4eaec iopl=0         nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010206
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax ds:0023:0074006d=????????


It was crashed when the instruction at 0x8092e20b which dereferences 0x0074006d was executed.  
0x0074006d is invalid address so the exception could not be handled.

Where is the value 0x0074006d from?
We can trace back the last executed instructions.

0: kd> ub eip

nt!ObpLookupDirectoryEntry+0xd1:

8092e1eb f00fb111        lock cmpxchg dword ptr [ecx],edx
8092e1ef 85c0            test    eax,eax
8092e1f1 0f8554ccfdff    jne     nt!ObpLookupDirectoryEntry+0xd9 (8090ae4b)
8092e1f7 c746103412dddd  mov     dword ptr [esi+10h],0DDDD1234h
8092e1fe 8b33            mov     esi,dword ptr [ebx]    <=== esi = 00740065 from [ebx]
8092e200 85f6            test    esi,esi
8092e202 0f848a470000    je      nt!ObpLookupDirectoryEntry+0x122 (80932992)
8092e208 8b45f8          mov     eax,dword ptr [ebp-8]
0: kd> u eip
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax    <=== faulting address
8092e20e 0f851a690000    jne     nt!ObpLookupDirectoryEntry+0x112 (80934b2e)
8092e214 8b4604          mov     eax,dword ptr [esi+4]
8092e217 0fb648f4        movzx   ecx,byte ptr [eax-0Ch]

0: kd> r


Last set context:

eax=0000e470 ebx=e1fdb600 ecx=e1753c2c edx=00000011 esi=00740065 edi=e1753be8
eip=8092e20b esp=f3a4ead4 ebp=f3a4eaec iopl=0         nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010206
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax ds:0023:0074006d=????????


0x0074006d was from esi+8 where esi was 0x00740065.
0x00740065 in esi was from above address 0x8092e1fe, it was from [ebx].

The current value in ebx is 0xe1fdb600 and the data in the address should be a valid memory pointer, but the data 0x00740065 was not valid memory pointer.
We can check the memory information of 0xe1fdb600 using !pool command.

0: kd> !pool e1fdb600

Pool page e1fdb600 region is Paged pool

 e1fdb000 size:  290 previous size:    0  (Allocated)  Gla:
 e1fdb290 size:   10 previous size:  290  (Free)       9.0.
 ...
 e1fdb490 size:  168 previous size:   a0  (Allocated)  MyMm
e1fdb5f8 is not a valid large pool allocation, checking large session pool...
e1fdb5f8 is freed (or corrupt) pool
Bad previous allocation size @e1fdb5f8, last size was 2d
***
*** An error (or corruption) in the pool was detected;
...
None found


There are several memory informations but it can't display memory informations from 0xe1fdb5f8, the message indicates that the memory area could be freed or corrupted. So we can guess the original data in 0xe1fdb600 might be valid memory pointer, but it was corrupted or overwritten by someone else.

Generally, a driver using the valid memory 0xe1fdb490 nearby corrupted area is suspect. The memory tag of 0xe1fdb490 is MyMm and this tag belongs to mydrv.sys. Therefore, mydrv.sys might corrupt the data in 0xe1fdb600 while it handles it's memory which has MyMm tag. 

However, we can’t fix this problem yet because we don’t know where mydrv.sys uses 0xe1fdb490, MyMm is general purpose memory tag in this driver so it’s hard to find specific code using this memory tag. How can we find the source code position of mydrv.sys we have to fix?

We have only one clue that mydrv.sys had used the memory 0xe1fdb490, we have to find the code using it. To simplify this problem, I settle an assumption that 0xe1fd490 is saved in the memory area of mydrv.sys as global variable.

We can check memory range of mydrv.sys like this:

0: kd> lmvm mydrv
start    end        module name
f3fa3000 f400ba00   mydrv   (deferred)             
    Image path: \??\C:\WINDOWS\system32\drivers\mydrv.sys
    Image name: mydrv.sys
    ...


mydrv.sys is located from 0xf3fa3000 to 0xf400ba00. I will search for a global variable containing 0xe1db490. I have to search for 0xe1fdb498 (0xe1fdb490 + 8) because 0xe1fdb490 displayed in !pool command output is the address of memory header, the real address allocated to mydrv.sys is 0xe1fdb498 where 0xe1fdb490 + memory header size.

We can search the memory as following:

0: kd> s f3fa3000 f400ba00 98 b4 fd e1


There is no result. It means that there is no global variable containing this value in mydrv.sys.
We need to change our strategy, next step would be to search whole memory area in order to find the memory allocated to mydrv.sys, it is like looking for a needle in a haystack. :)

I'll check the paged pool memory range because mydrv.sys uses only Paged Pool.
We can check Paged Pool range using !pool command without parameter.

0: kd> !pool

Paged Pool: e1000000 .. eb7fffff
e1000000: 1000 - busy
e1001000: 1000 - busy
e1002000: 1000 - busy
...
...
...
e3e74000: 22000 - busy
e3e9d000: 15000 - free
e3eb2000: 41000 - free
e3ef3000: 22000 - busy
e3f15000: a0c000 - free
e4921000: 6edf000 - free


The Paged Pool range is from 0xe1000000 to 0xeb7fffff but those are all freed from 0xe3f15000 so the valid address range for searching would be from 0xe1000000 to 0xe3f15000. Let’s look for a needle(0xe1fdb498) in the pool range.

0: kd> s e1000000 e3f15000 98 b4 fd e1
e115aec8  98 b4 fd e1 00 00 00 00-05 06 26 0c 42 63 4d 63  ..........&.MyMm


Wow! I’ve got a needle at 0xe115aec8. As you can see, we can find a meaningful information with any little clue.

To check the memory information, use !pool command.

0: kd> !pool e115aec8
Pool page e115aec8 region is Paged pool
 e115a000 size:  9d0 previous size:    0  (Allocated)  MyMm
 e115a9d0 size:   18 previous size:  9d0  (Allocated)  Ntf0 ...
 e115ae20 size:   48 previous size:   68  (Allocated)  Ntfc
 e115ae68 size:   40 previous size:   48  (Allocated)  MyMm
*e115aea8 size:   28 previous size:   40  (Allocated) *MyMm
  Owning component : Unknown (update pooltag.txt)


0xe115aec8 is within 0xe115aea8 allocation and memory tag MyMm means that it belongs to mydrv.sys
MyMm is general purpose memory tag, we can’t know yet what the memory is for.

I will search the pool for 0xe115ea8 in order to verify which code uses the memory.
We need to use 0xe115aeb0, added memory header size 8.

0: kd> s e1000000 e3f15000 b0 ae 15 e1
e18407e4  b0 ae 15 e1 a0 b4 fd e1-32 00 00 00 c4 ea a4 f3  ........2.......


0xe18407e4 contains 0xe115aeb0, check it with !pool.

0: kd> !pool e18407e4
Pool page e18407e4 region is Paged pool
 e1840000 size:   b8 previous size:    0  (Allocated)  Port (Protected)
 e18400b8 size:   10 previous size:   b8  (Allocated)  Glnk
  ...
 e1840720 size:   30 previous size:   18  (Allocated)  CMVa
 e1840750 size:   18 previous size:   30  (Allocated)  Uspi Process: 85b31c88
*e1840768 size:   98 previous size:   18  (Allocated) *MyMm  Owning component : Unknown (update pooltag.txt)
 e1840800 size:   50 previous size:   98  (Allocated)  LStb ...


Hmm… it is same pattern. Search for 0xe1840770 one more time.

0: kd> s e1000000 e3f15000 70 07 84 e1
e2f3f874  70 07 84 e1 48 21 13 e2-01 00 00 00 00 00 00 00  p...H!..........


0xe2f3f874 contains 0xe1840770 , check it with !pool.

0: kd> !pool e2f3f874
Pool page e2f3f874 region is Paged pool
 e2f3f000 size:  100 previous size:    0  (Allocated)  IoNm
 e2f3f100 size:   98 previous size:  100  (Allocated)  NtFs ...
 e2f3f760 size:  108 previous size:   38  (Allocated)  CM39
*e2f3f868 size:   58 previous size:  108  (Allocated) *MySt
  Owning component : Unknown (update pooltag.txt)
 e2f3f8c0 size:   80 previous size:   58  (Allocated)  IoNm ...


0xe2f3f874 is within 0xe2f3f868 allocation and memory tag MySt means that it belongs to mydrv.sys as well, but MySt tag is used for specific structure in the driver so we can retrieve the structure from source code and match with the memory data. We are lucky!

typedef struct _MY_STRUCT {
 ULONG            ulType;
 DRV_STRUCT  *pDrvStruct1;
 DRV_STRUCT  *pDrvStruct2;
 ...
} MY_STRUCT;


This is MY_STRUCT structure and TYPE_ID of MY_STRUCT is defined as 0x10. 
We have to make sure the first field ulType in the memory 0xe2f3f870 (0xe2f3f868 + 8).

0: kd> db e2f3f870
e2f3f870  10 00 00 00 70 07 84 e1-48 21 13 e2 01 00 00 00  ....p...H!......
e2f3f880  00 00 00 00 00 00 00 00-65 73 00 00 00 00 00 00  ........es......
e2f3f890  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................


It has correct value 0x10 so we are sure the memory is for MY_STRUCT. From now, we can trace back the addresses we've found above with structure defined in the source code of mydrv.sys. 

Use dt command with structure type and address.

0: kd> dt mydrv!MY_STRUCT e2f3f870
   +0x000 ulType          : 16
   +0x004 pDrvStruct1    : 0xe1840770 
   +0x008 pDrvStruct2    : 0xe2132148 
   ...


We can see the familiar address 0xe1840770 and the type of pDrvStruct1 is DRV_STRUCT.

0: kd> dt mydrv!DRV_STRUCT e1840770
   +0x000 id               : 64 
    ...
   +0x074 pPathStruct   : 0xe115aeb0 
   +0x078 pOsPath   : 0xe1fdb4a0  -> 0x5c
   ...


We can see the familiar address 0xe115aeb0 as well and the type of pPathStruct is PATH_STRUCT.

0: kd> dt mydrv!PATH_STRUCT e115aeb0
   +0x000 id               : 6d
   ...
   +0x018 pPathData     : 0xe1fdb498


Eventaully, we can see the address 0xe1fdb498 and the type of pPathData is PATH_DATA.

Let’s check the contents of the structure.

0: kd> dt mydrv!PATH_DATA e1fdb498
   +0x000 pPrev            : (null) 
   +0x004 pLimit           : 0xe1fdb5f2  "u"
   +0x008 Data             : []  "\"

0: kd> db e1fdb498
e1fdb498  00 00 00 00 f2 b5 fd e1-5c 00 44 00 65 00 76 00  ........\.D.e.v.
e1fdb4a8  69 00 63 00 65 00 5c 00-48 00 61 00 72 00 64 00  i.c.e.\.H.a.r.d.
e1fdb4b8  64 00 69 00 73 00 6b 00-56 00 6f 00 6c 00 75 00  d.i.s.k.V.o.l.u.
e1fdb4c8  6d 00 65 00 32 00 5c 00-57 00 49 00 4e 00 44 00  m.e.2.\.W.I.N.D.

e1fdb4d8  4f 00 57 00 53 00 5c 00-73 00 79 00 73 00 74 00  O.W.S.\.s.y.s.t.
e1fdb4e8  65 00 6d 00 33 00 32 00-5c 00 6f 00 6c 00 65 00  e.m.3.2.\.o.l.e.
e1fdb4f8  33 00 32 00 2e 00 64 00-6c 00 6c 00 00 00 6c 00  3.2...d.l.l...l.
e1fdb508  6c 00 00 00 6c 00 6c 00-00 00 74 00 6f 00 72 00  l...l.l...t.o.r.
e1fdb518  5c 00 4c 00 6f 00 63 00-61 00 6c 00 20 00 53 00  \.L.o.c.a.l. .S.
e1fdb528  65 00 74 00 74 00 69 00-6e 00 67 00 73 00 5c 00  e.t.t.i.n.g.s.\.
e1fdb538  54 00 65 00 6d 00 70 00-6f 00 72 00 61 00 72 00  T.e.m.p.o.r.a.r.
e1fdb548  79 00 20 00 49 00 6e 00-74 00 65 00 72 00 6e 00  y. .I.n.t.e.r.n.
e1fdb558  65 00 74 00 20 00 46 00-69 00 6c 00 65 00 73 00  e.t. .F.i.l.e.s.
e1fdb568  5c 00 43 00 6f 00 6e 00-74 00 65 00 6e 00 74 00  \.C.o.n.t.e.n.t.
e1fdb578  2e 00 49 00 45 00 35 00-5c 00 39 00 46 00 4c 00  ..I.E.5.\.9.F.L.
e1fdb588  4b 00 4f 00 4d 00 4a 00-31 00 5c 00 6d 00 61 00  K.O.M.J.1.\.m.a.
e1fdb598  70 00 5b 00 31 00 5d 00-2e 00 67 00 69 00 66 00  p.[.1.]...g.i.f.
e1fdb5a8  00 00 5d 00 2e 00 67 00-69 00 66 00 00 00 69 00  ..]...g.i.f...i.
e1fdb5b8  66 00 00 00 00 00 69 00-66 00 00 00 69 00 63 00  f.....i.f...i.c.
e1fdb5c8  65 00 5c 00 48 00 61 00-72 00 64 00 64 00 69 00  e.\.H.a.r.d.d.i.
e1fdb5d8  73 00 6b 00 56 00 6f 00-6c 00 75 00 6d 00 65 00  s.k.V.o.l.u.m.e.
e1fdb5e8  32 00 5c 00 44 00 6f 00-63 00 75 00 6d 00 65 00  2.\.D.o.c.u.m.e.
e1fdb5f8  6e 00 74 00 73 00 53 00-65 00 74 00 74 00 69 00  n.t.s.S.e.t.t.i.
e1fdb608  6e 00 67 00 73 00 5c 00-41 00 64 00 6d 00 69 00  n.g.s.\.A.d.m.i.
e1fdb618  6e 00 69 00 73 00 74 00-72 00 61 00 74 00 6f 00  n.i.s.t.r.a.t.o.
e1fdb628  72 00 5c 00 4c 00 6f 00-63 00 61 00 6c 00 20 00  r.\.L.o.c.a.l. .
e1fdb638  53 00 65 00 74 00 74 00-69 00 6e 00 67 00 73 00  S.e.t.t.i.n.g.s.
e1fdb648  5c 00 54 00 65 00 6d 00-70 00 6f 00 72 00 61 00  \.T.e.m.p.o.r.a.
e1fdb658  72 00 79 00 20 00 49 00-6e 00 74 00 65 00 72 00  r.y. .I.n.t.e.r.
e1fdb668  6e 00 65 00 74 00 20 00-46 00 69 00 6c 00 65 00  n.e.t. .F.i.l.e.
e1fdb678  73 00 5c 00 43 00 6f 00-6e 00 74 00 65 00 6e 00  s.\.C.o.n.t.e.n.
e1fdb688  74 00 2e 00 49 00 45 00-35 00 5c 00 39 00 46 00  t...I.E.5.\.9.F.
e1fdb698  4c 00 4b 00 4f 00 4d 00-4a 00 31 00 5c 00 62 00  L.K.O.M.J.1.\.b.
e1fdb6a8  30 00 32 00 5f 00 6f 00-6e 00 5f 00 6f 00 66 00  0.2._.o.n._.o.f.
e1fdb6b8  66 00 2d 00 4c 00 5b 00-31 00 5d 00 2e 00 67 00  f.-.L.[.1.]...g.
e1fdb6c8  69 00 66 00 00 00 01 00-88 20 c6 85 88 a8 cd 85  i.f...... ......


It is filled with string data from 0xe1fdb498 to 0xe1fdb6c8 but the range of memory allocated to 0xe1fdb498 is from 0xe1fdb490 to 0xe1fdb5f7.

0: kd> !pool e1fdb498
Pool page e1fdb498 region is Paged pool
 e1fdb000 size:  290 previous size:    0  (Allocated)  Gla:
 ...
 e1fdb3f0 size:   a0 previous size:    8  (Allocated)  Gla8
*e1fdb490 size:  168 previous size:   a0  (Allocated) *MyMm
  Owning component : Unknown (update pooltag.txt)
e1fdb5f8 is not a valid large pool allocation, checking large session pool...
e1fdb5f8 is freed (or corrupt) pool


Maybe, 0xe1fdb5f8 was allocated to NT kernel so there are NT kernel functions on the top of the call stack in this crash dump. NT kernel lost its data and was crashed because mydrv.sys overwrote the string data over its allocation range.

Finally, We've found a bug of the function handling PATH_DATA in mydrv.sys corrupts a memory in the paged pool. 

It was my fault. :)

Monday, June 2, 2014

Suspended Process (KiDeliverApc)

There are a lot of software problems we can experience when we are using our computer. Today, I am going to talk about a story of analysis for a problem from suspended process.

It was a system hang case. There were a lot of processes suspended in the system. I am going to find out who suspends the processes.

Here is a call stack of taskmgr.exe in the system.


THREAD 8956e660  Cid 0c18.0bcc  Teb: 7ffdd000 Win32Thread: e2975360 WAIT: (Suspended) KernelMode Non-Alertable SuspendCount 1
...
    Owning Process            899cd5a8       Image:      taskmgr.exe
    Attached Process          N/A            Image:         N/A
...
    ChildEBP RetAddr  
    aad52c18 80504d50 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
    aad52c24 804fcf40 nt!KiSwapThread+0x8a (FPO: [0,0,0])
    aad52c4c 8050448c nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
    aad52c64 80500dfa nt!KiSuspendThread+0x18 (FPO: [3,0,0])
    aad52cac 80504d6e nt!KiDeliverApc+0x124 (FPO: [3,10,0])
    aad52cc4 804fcf40 nt!KiSwapThread+0xa8 (FPO: [0,0,0])
    aad52cec 805c108c nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
    aad52d50 8054289c nt!NtWaitForSingleObject+0x9a (FPO: [Non-Fpo])
    aad52d50 7c93e514 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ aad52d64)
    00dbff84 7c802532 ntdll!KiFastSystemCallRet
    00dbff98 0100d90d kernel32!WaitForSingleObject+0x12 (FPO: [2,0,0])
    00dbffb4 7c80b699 taskmgr!WorkerThread+0x3a (FPO: [1,0,4])
    00dbffec 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])


We can see KiSwapContext call on top of the call stack, it usually means context switching but there is an odd thing in this call stack that there are two KiSwapThread calls, this is not usual case. The KiSwapThread in the middle of the call stack is for context switching, what is the second KiSwapTherad on top of the call stack?

You can see a call flow that KiDeliverApc, KiSuspendThread, KeWaitForSingleObject, KiSwapThread, so the second KiSwapThread is from KiSuspendThread. I can figure out that some process called SuspendThread API to taskmgr.exe and it didn’t call ResumeThread.

Now we need to find out a process to call SuspendThread to all other processes in the system. In order to find the caller of SuspendThread, first of all, we have to understand KiDeliverApc function to call KiSuspendThread and APC(Asynchronous Procedure Call) mechanism in Windows.

Windows kernel has a concept of IRQL (Interrupt Request Level) and thread scheduler of Windows kernel do thread context switching at DISPATCH_LEVEL (It is IRQL 2). After scheduler chooses thread to run, IRQL becomes lower to APC_LEVEL and the APCs queued in the thread are executed and then IRQL becomes lower to PASSIVE_LEVEL that the thread codes can run.

Therefore, KiDeliverApc means that it delivers APCs queued in the thread and KiSuspendThread called by KiDeliverApc means that KiSuspendThread was inserted into APC queue of the thread by APC. The prototype of KiDeliverApc is as following.


VOID
NTAPI
KiDeliverApc(IN KPROCESSOR_MODE DeliveryMode,
             IN PKEXCEPTION_FRAME ExceptionFrame,
             IN PKTRAP_FRAME TrapFrame)

It doesn’t take any APC structures through it’s parameters. I wondered how KiDeliverApc deliver APC of the thread, so I analyzed inside of KiDeliverApc function. Following pseudo codes represent the functionality of it.

KiDeliverApc()
{
    PKTHREAD Thread = KeGetCurrentThread();

    //
    // If APCs exist in ApcState.ApcListHead of current thread,
    // KiDeliverApc gets all APCs from the list and call KernelRoutine, NormalRoutine.
    //    
    while (!IsListEmpty(&Thread->ApcState.ApcListHead[KernelMode]))
    {
        ApcListEntry = Thread->ApcState.ApcListHead[KernelMode].Flink;
        Apc = CONTAINING_RECORD(ApcListEntry, KAPC, ApcListEntry);

        KernelRoutine = Apc->KernelRoutine;
        NormalRoutine = Apc->NormalRoutine;
        KernelRoutine(Apc, ...);
        NormalRoutine(...);
    }
}

The point is that someone inserts APC containing KiSuspendThread as NormalRoutine to ApcState.ApcListHead of current thread.

To find out who is someone, I tried to set memory break point at the address of Thread->ApcState.ApcListHead in order to get debugger break when any APC is inserted to this thread where ApcState field offset is 0x34 in KTHREAD structure.


kd> dt _KTHREAD ApcState.
nt!_KTHREAD
   +0x034 ApcState  : 
      +0x000 ApcListHead : [2] _LIST_ENTRY
      +0x010 Process   : Ptr32 _KPROCESS
      +0x014 KernelApcInProgress : UChar
      +0x015 KernelApcPending : UChar
      +0x016 UserApcPending : UChar
   +0x138 ApcStatePointer : [2] 
   +0x165 ApcStateIndex : UChar

kd> ba w4 (Address of Thread+0x34)


Following call stack is when debugger has got the break point. This is an operation during thread creation time.

kd> k
ChildEBP RetAddr  
f9b9bb60 804fdaa3 nt!KiInsertQueueApc+0x79
f9b9bb80 805c7436 nt!KeSuspendThread+0x67
f9b9bcc4 805c7f02 nt!PspCreateThread+0x570
f9b9bd3c 8053ea48 nt!NtCreateThread+0xfc
f9b9bd3c 7c93e514 nt!KiFastCallEntry+0xf8

kd> ub nt!KiInsertQueueApc+0x79
nt!KiInsertQueueApc+0x64:
804ff374 eb3f            jmp     nt!KiInsertQueueApc+0xa5 (804ff3b5)
804ff376 0fbeda          movsx   ebx,dl
804ff379 8d3cdf          lea     edi,[edi+ebx*8]
804ff37c 8b5f04          mov     ebx,dword ptr [edi+4]
804ff37f 8d700c          lea     esi,[eax+0Ch]       ; ApcListEntry
804ff382 893e            mov     dword ptr [esi],edi
804ff384 895e04          mov     dword ptr [esi+4],ebx  
804ff387 8933            mov     dword ptr [ebx],esi ; Insertion to list


A list is inserted at instruction pointer 804ff387, it means that ebx is the address of Thread->ApcState.ApcListHead and esi is an address of ListEntry. esi was from instruction pointer 804ff37f, it is eax+0Ch. We can be aware of esi is ApcListEntry and eax is an address of KAPC by following structure.

kd> dt _KAPC @eax
nt!_KAPC
   +0x000 Type             : 18
   +0x002 Size             : 48
   +0x004 Spare0           : 0
   +0x008 Thread           : 0x812f1020 _KTHREAD
   +0x00c ApcListEntry     : _LIST_ENTRY [ 0x812f1054 - 0x812f1054 ]
   +0x014 KernelRoutine    : 0x80501ed4     void  nt!KiSuspendNop+0
   +0x018 RundownRoutine   : 0x805277ba     void  nt!PopAttribNop+0
   +0x01c NormalRoutine    : 0x8050230a     void  nt!KiSuspendThread+0
   +0x020 NormalContext    : (null) 
   +0x024 SystemArgument1  : (null) 
   +0x028 SystemArgument2  : (null) 
   +0x02c ApcStateIndex    : 0 ''
   +0x02d ApcMode          : 0 ''
   +0x02e Inserted         : 0 ''


We can see nt!KiSuspendThread+0 in NormalRoutine at offset 0x01c. We've found an instruction pointer where KAPC is inserted and field offset of KAPC where KiSuspendThread pointer saved, now I am going to set a break point at 804ff389 (nt!KiInsertQueueApc+0x79) with a condition only if the NormalRoutine is KiSuspendThread in order to get a break point when someone inserts an APC having KiSuspendThread.

kd> u nt!KiSuspendThread L1
nt!KiSuspendThread:
8050230a 64a124010000    mov     eax,dword ptr fs:[00000124h]

kd> bp nt!KiInsertQueueApc+79 ".if (poi(@eax+1c)==8050230a) {} .else {gc}”

After several break hits from thread creation, eventually, I've got a call stack as following.

kd> k
ChildEBP RetAddr  
f766ecc8 804fdaa3 nt!KiInsertQueueApc+0x79
f766ece8 805cb671 nt!KeSuspendThread+0x67
f766ed28 805cb706 nt!PsSuspendThread+0x6f
f766ed44 805cb8fe nt!PsSuspendProcess+0x28
f766ed58 8053ea48 nt!NtSuspendProcess+0x40
f766ed58 7c93e514 nt!KiFastCallEntry+0xf8
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 00000000 ntdll!KiFastSystemCallRet

This call stack means that this process called Win32 API SuspendProcess in user mode and it has a responsibility to call ResumeProcess but it didn’t, so the process is the root cause of system hang.

Friday, May 30, 2014

Understanding IA64; the Beginning

[This article was written in 2007. That time I was porting my 32bit driver to 64bit driver for x64 and IA64 platforms. This is what I learned from the experience. I just post it now for archiving purpose.)


Understanding IA64; the Beginning

Ive expected the 64-bit computing world to come soon for a while, but it has taken more time than I expected even though VISTA has already been released. Anyway I am sure the 64-bit platform will be used everywhere sooner or later. Even now you can find a PC which supports x64 easily so I guess you have already had a chance to write your own 64-bit program at least x64.
However, what about IA64? Its not for the PC platform but for a server platform, but I think everyone should also be aware of IA64. Are you ready to write a code or debug your code on IA64? Because it has different architectural issues compared with x86 or x64, there are several things you need to know about IA64. I am about to describe the basics in this article.

Register set of IA64
The register set of IA64 is completely different from x86 and x64. The first step of understanding CPU architecture is to understand the register model of the CPU, so let me show the register set of IA64 from the Intel Itanium Architecture Software Developers Manual.

Figure 1 register set of IA64



First of all, you cant find any of the familiar register names, EAX, EBX, ESP, or EBP used in x86. It can be an embarrassing situation to you so you need to concentrate on Figure 1 again. If you look at the names of the registers carefully, you will figure out the meaning of the registers. gr0 to gr127 are general registers which are used for general purposes like EAX, EBX and ECX. You can see the number of general registers was increased so a compiler might use the registers more efficiently. Another thing you need to know about general registers is that some registers are reserved for special purposes, I will mention it in the middle of this article.

Floating-point registers which are used for floating-point operations are named fr0 to fr127 and the instruction pointer is named IP. These kinds of registers might be familiar to you because x86 has a similar register set so you can understand the instruction include these registers easily. What about the others? You can reference IA64 manual when you encounter a register you dont know.

Instruction of IA64
Have you ever seen disassembly code from IA64? Figure 2 shows disassembly code from IA64. There is one thing you need to know - r represents gr registers and b represents br registers in WinDbg.

Figure 2 - Disassembly code of IA64

e0000000`8402ecc0        adds   ret0=f, r0
e0000000`8402ecc4        movl   r2=e0000000`ffa00000 ;;

e0000000`8402ecd0        nop.m  0
e0000000`8402ecd4        mov    b6=r2, +0
e0000000`8402ecd8        br.cond.sptk.few b6 ;;

Disassembly code of Figure 2 shows some instructions such as adds, movl, nop.m, mov, and br.cond.sptk.few. Its also hard to understand directly. However you can guess the meanings of the instructions as being the same as add, mov, nop, and jmp in the x86 platform. The operands of the instructions look odd but you can guess the meanings if you look at it carefully. In the case of the adds instruction, it means that the result of constant f + r0 register value should be placed into the ret0 register where r0 is gr0. movl means that the constant 0xe00000000ffa00000 should be moved into the r2 register where r2 is gr2. br.cond.sptk.few looks complicated but you can separate the string of instruction and find the meaning. br is abbreviated form of branch and it is equivalent to jmp in x86 instruction. br can have various meanings based on the following word. For example, br.cond means conditional jump, br.call means procedure call and br.ret means procedure return. sptk.few is a branch prediction hint which can be provided to the processor to improve branch prediction. You can ignore this part because it does not affect the functional behavior of the program. These instruction formats might be unfamiliar to you but Im convinced that you could get feel for these things.

Digging into instruction format
In order to get more sense, we need to dig into instruction format of IA64 further. Figure 3 shows the instruction format of IA64.

Figure 3 – Instruction format of IA64 - bundle



This bundle format comes from the IA64 processor architecture called Explicitly Parallel Instruction Computing (EPIC) in order to improve performance. A bundle which is 128 bits(16 bytes) in size is composed of three instruction slots and one template where each instruction size is 41 bits and the template size is 5 bits. Most instructions fit into one slot so most of the bundles have 3 instructions. On the other hand, some instructions take two slots because those need more space to describe the function of the instruction. I can now show the example to you in Figure 2. The first two lines show a bundle that has 2 instructions and the next three lines show a bundle that has 3 instructions.

This bundle format can make you analyze instruction addresses in WinDbg incorrectly. In the strict sense, the address e0000000`8402ecc4 in Figure 2 does not indicate movl   r2=e0000000`ffa00000 because the movl instruction exists at e0000000`8402ecc0 + 46 bits not e0000000`8402ecc0 + 32 bits(4 bytes). WinDbg can not express an address in bit units, I guess that they had no choice so they decided to display an estimated byte address. However, all bundles begin at every 16 byte boundary exactly.

Another side effect from the instruction format is that programs built by an IA64 compiler would be larger. Even though the same C source file is compiled for x86, x64 and IA64, the program size of IA64 is 2 or 3 times greater than that of x86 or x64. I saw that the compiled program has many more nop instructions. I guess it was generated by EPIC architecture to do Parallel Instruction Computing. Of course the compiler might do the right thing but its also true the code space for nop instructions seems to be a waste. Dont be surprised after building your own IA64 program.

Format of adds instruction
Each instruction has its own instruction format. For example, Figure 4 shows adds instruction. Since instruction size is limited by 41 bits, opcode, operands and extensions should be placed within the 41 bit boundary and the locations vary by every instruction.

Figure 4 – instruction format of adds



The opcode of adds instruction which is 8 is located in the 4 most significant bits. The operands are expressed as r1 = imm14, r3 where r1 is located in bit 6, r3 is located in bit 20, and the location of imm14 is spread throughout several positions in the instruction. Its complicated to understand so lets try to figure it out from an example the first instruction in Figure 2.

e0000000`8402ecc0        adds   ret0=f, r0

The operand r1 = imm14, r3 in Figure 4 is equivalent to ret0 = f, r0 above. Therefore, r1 (bit 6 ~ 12) has a value which indicates ret0 register and r3 (bit 20 ~ 26) has a value which indicates r0 register and Imm14 ( bit 13 ~ 19, 27 ~ 32, 36 ) has constant 0xF. We need to make sure of this by viewing the real memory data of the instruction address.

kd> dq e0000000`8402ecc0       
e0000000`8402ecc0  00ffa100`003c4005 68001000`40600000

We can now say that this is a 16 byte bundle of data which has 2 or 3 instructions. I want to concentrate on the first instruction adds in instruction slot 0 so we need to pick up bits 5 ~ 45 from the 16 byte(128 bit) bundle as follows:

MSB    1000 0 100 000000 0000000 0001111 0001000 000000    LSB
             op   s         Imm6d        r3         imm7b        r1          qp

You can see that r1 ( 0001000 ) is 8 and it means general register r8. Why is the register r8 not ret0? You cannot find ret0 at all in Figure 1. In fact, ret0 is just the mapping name of r8. Some general registers are mapped to special names, for example, r8 ~ r11 are mapped to ret0 ~ ret3. Figure 5 shows several examples.

Figure 5 – Mapping name of general registers

Mapping name
Original name
Meaning
ret0 ~ ret3
r8 ~ r11
Integer return value
rp
b0
Return pointer
gp
r1
Global pointer
sp
r12
Stack pointer

Lets get back to the 41 bit stream, r3 (0000000) is 0 and it means general register r0. In order to get imm14, you should take s, imm6d and imm7b (0, 000000, 0001111). It can be expressed as the 14 bits stream 00000000001111 and is 0xf in hex decimal. Therefore, you can figure out adds ret0 = f, r0 from the 41 bit instruction slot data. Now, we can write some code to get imm14 by analyzing instruction slot memory data.

ULONG GetImm14FromAdds(ULONG_PTR pFunction)
{
            ULONGLONG bundle;
            ULONGLONG slot0;
            ULONG_PTR imm14;

            // lower 64 bit of bundle => 00ffa100`003c4005
            bundle = *(ULONGLONG*)pFunction;                   

            // get rid of 5 bit template and get 41 bit first instruction
            slot0 = (bundle >> 5) & 0x1FFFFFFFFFF; 
           
            // make imm14 with s 1 bit, imm6d 6 bits, imm7b 7 bits in adds instruction
            imm14 =  ((slot0 >> 13) & 0x7F)                |
                        (((slot0 >> 27) & 0x3F) << 7)         |
                        (((slot0 >> 36) & 0x1) << 13);

            return (ULONG)imm14;
}

Other instructions can be analyzed by the above steps and by referencing IA64 manual.

Conclusion
We have just reviewed about some of the registers and instructions of IA64. At first glance, you would think IA64 has something totally different and it is all new to you. However, you can get over it and realize that its not as difficult as you expect.
Actually, I have felt excited to learn about this during my project concerned with IA64 so I hope you also have a chance to enjoy learning this. I also hope this article gives you confidence when you begin your IA64-based project.