Sunday, November 16, 2014

Debugging memory corruption (Advanced)

Sometimes software debugging is too tired task if there is no evidence to find a root cause of the problem, in that case, we need to concentrate on collecting any little clues in the crash dump. Therefore, I’d like to name the subtitle of this article as “Looking for a needle in a haystack”.

Today, we are going to look at a crash dump caused by memory corruption. We can guess that this is a memory corruption case and mydrv.sys might corrupt the memory when we open the crash dump at a glance but it’s hard to find the source code line which has the bug in mydrv.sys because this crash dump does not have any call stack related with mydrv.sys. Generally, after memory corruption has been happened and then the kernel would be crashed later, so we have to invent some proper way in order to find out the root cause of the problem.

Let’s go with "!analyze -v"

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003.  This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG.  This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG.  This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8092e20b, The address that the exception occurred at
Arg3: f3a4ea60, Trap Frame
Arg4: 00000000
Debugging Details:
------------------
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - 0x%08lx
FAULTING_IP: 
nt!ObpLookupDirectoryEntry+ee
8092e20b 394608          cmp     dword ptr [esi+8],eax
TRAP_FRAME:  f3a4ea60 -- (.trap 0xfffffffff3a4ea60)
ErrCode = 00000000
eax=0000e470 ebx=e1fdb600 ecx=e1753c2c edx=00000011 esi=00740065 edi=e1753be8
eip=8092e20b esp=f3a4ead4 ebp=f3a4eaec iopl=0         nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010206
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax ds:0023:0074006d=????????
Resetting default scope
DEFAULT_BUCKET_ID:  DRIVER_FAULT
BUGCHECK_STR:  0x8E
PROCESS_NAME:  iexplore.exe
CURRENT_IRQL:  0
LAST_CONTROL_TRANSFER:  from 8085bba7 to 8087c480
STACK_TEXT:  
f3a4e62c 8085bba7 0000008e c0000005 8092e20b nt!KeBugCheckEx+0x1b
f3a4e9f0 808346b4 f3a4ea0c 00000000 f3a4ea60 nt!KiDispatchException+0x3a2
f3a4ea58 80834668 f3a4eaec 8092e20b badb0d00 nt!CommonDispatchException+0x4a
f3a4ea78 80935a06 00000000 f3a4eb67 f3a4eb7c nt!Kei386EoiHelper+0x186
f3a4eaec 8092c2ab e1753b98 f3a4eb14 00000050 nt!SepAccessCheck+0x41d
f3a4eb58 80939f97 00000000 f3a4ec3c 00000000 nt!ObpLookupObjectName+0x45f
f3a4ec48 809305c8 00000000 00000000 00000000 nt!ObReferenceObjectByName+0x8e
f3a4ed10 80920300 0559cd88 0315b948 0315b954 nt!NtSecureConnectPort+0x1d6
f3a4ed3c 80833bdf 0559cd88 0315b948 0315b954 nt!NtConnectPort+0x24
f3a4ed3c 7c9685ec 0559cd88 0315b948 0315b954 nt!KiFastCallEntry+0xfc
...
STACK_COMMAND:  kb
FOLLOWUP_IP: 
nt!ObpLookupDirectoryEntry+ee
8092e20b 394608          cmp     dword ptr [esi+8],eax
SYMBOL_STACK_INDEX:  0
SYMBOL_NAME:  nt!ObpLookupDirectoryEntry+ee
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: nt
IMAGE_NAME:  ntkrnlmp.exe
DEBUG_FLR_IMAGE_TIMESTAMP:  45ec14ca
FAILURE_BUCKET_ID:  0x8E_nt!ObpLookupDirectoryEntry+ee
BUCKET_ID:  0x8E_nt!ObpLookupDirectoryEntry+ee
Followup: MachineOwner
---------


According to the messages, this is a very common bugcheck but there is not useful information to debug.
There is no my module name on the call stack except NT kernel, so there are no hints to analyze this problem.

At the end of the messages, it points a function that the exception occurred at is nt!ObpLookupDirectoryEntry+ee, but it is not useful either because it’s really rare case that it is from NT kernel bug, most cases are from third party driver’s bug or mine. Therefore, we need to make every possible effort to find out any hint of the crash.

Let’s begin with .trap command.

0: kd> .trap 0xfffffffff3a4ea60

ErrCode = 00000000

eax=0000e470 ebx=e1fdb600 ecx=e1753c2c edx=00000011 esi=00740065 edi=e1753be8
eip=8092e20b esp=f3a4ead4 ebp=f3a4eaec iopl=0         nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010206
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax ds:0023:0074006d=????????


It was crashed when the instruction at 0x8092e20b which dereferences 0x0074006d was executed.  
0x0074006d is invalid address so the exception could not be handled.

Where is the value 0x0074006d from?
We can trace back the last executed instructions.

0: kd> ub eip

nt!ObpLookupDirectoryEntry+0xd1:

8092e1eb f00fb111        lock cmpxchg dword ptr [ecx],edx
8092e1ef 85c0            test    eax,eax
8092e1f1 0f8554ccfdff    jne     nt!ObpLookupDirectoryEntry+0xd9 (8090ae4b)
8092e1f7 c746103412dddd  mov     dword ptr [esi+10h],0DDDD1234h
8092e1fe 8b33            mov     esi,dword ptr [ebx]    <=== esi = 00740065 from [ebx]
8092e200 85f6            test    esi,esi
8092e202 0f848a470000    je      nt!ObpLookupDirectoryEntry+0x122 (80932992)
8092e208 8b45f8          mov     eax,dword ptr [ebp-8]
0: kd> u eip
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax    <=== faulting address
8092e20e 0f851a690000    jne     nt!ObpLookupDirectoryEntry+0x112 (80934b2e)
8092e214 8b4604          mov     eax,dword ptr [esi+4]
8092e217 0fb648f4        movzx   ecx,byte ptr [eax-0Ch]

0: kd> r


Last set context:

eax=0000e470 ebx=e1fdb600 ecx=e1753c2c edx=00000011 esi=00740065 edi=e1753be8
eip=8092e20b esp=f3a4ead4 ebp=f3a4eaec iopl=0         nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010206
nt!ObpLookupDirectoryEntry+0xee:
8092e20b 394608          cmp     dword ptr [esi+8],eax ds:0023:0074006d=????????


0x0074006d was from esi+8 where esi was 0x00740065.
0x00740065 in esi was from above address 0x8092e1fe, it was from [ebx].

The current value in ebx is 0xe1fdb600 and the data in the address should be a valid memory pointer, but the data 0x00740065 was not valid memory pointer.
We can check the memory information of 0xe1fdb600 using !pool command.

0: kd> !pool e1fdb600

Pool page e1fdb600 region is Paged pool

 e1fdb000 size:  290 previous size:    0  (Allocated)  Gla:
 e1fdb290 size:   10 previous size:  290  (Free)       9.0.
 ...
 e1fdb490 size:  168 previous size:   a0  (Allocated)  MyMm
e1fdb5f8 is not a valid large pool allocation, checking large session pool...
e1fdb5f8 is freed (or corrupt) pool
Bad previous allocation size @e1fdb5f8, last size was 2d
***
*** An error (or corruption) in the pool was detected;
...
None found


There are several memory informations but it can't display memory informations from 0xe1fdb5f8, the message indicates that the memory area could be freed or corrupted. So we can guess the original data in 0xe1fdb600 might be valid memory pointer, but it was corrupted or overwritten by someone else.

Generally, a driver using the valid memory 0xe1fdb490 nearby corrupted area is suspect. The memory tag of 0xe1fdb490 is MyMm and this tag belongs to mydrv.sys. Therefore, mydrv.sys might corrupt the data in 0xe1fdb600 while it handles it's memory which has MyMm tag. 

However, we can’t fix this problem yet because we don’t know where mydrv.sys uses 0xe1fdb490, MyMm is general purpose memory tag in this driver so it’s hard to find specific code using this memory tag. How can we find the source code position of mydrv.sys we have to fix?

We have only one clue that mydrv.sys had used the memory 0xe1fdb490, we have to find the code using it. To simplify this problem, I settle an assumption that 0xe1fd490 is saved in the memory area of mydrv.sys as global variable.

We can check memory range of mydrv.sys like this:

0: kd> lmvm mydrv
start    end        module name
f3fa3000 f400ba00   mydrv   (deferred)             
    Image path: \??\C:\WINDOWS\system32\drivers\mydrv.sys
    Image name: mydrv.sys
    ...


mydrv.sys is located from 0xf3fa3000 to 0xf400ba00. I will search for a global variable containing 0xe1db490. I have to search for 0xe1fdb498 (0xe1fdb490 + 8) because 0xe1fdb490 displayed in !pool command output is the address of memory header, the real address allocated to mydrv.sys is 0xe1fdb498 where 0xe1fdb490 + memory header size.

We can search the memory as following:

0: kd> s f3fa3000 f400ba00 98 b4 fd e1


There is no result. It means that there is no global variable containing this value in mydrv.sys.
We need to change our strategy, next step would be to search whole memory area in order to find the memory allocated to mydrv.sys, it is like looking for a needle in a haystack. :)

I'll check the paged pool memory range because mydrv.sys uses only Paged Pool.
We can check Paged Pool range using !pool command without parameter.

0: kd> !pool

Paged Pool: e1000000 .. eb7fffff
e1000000: 1000 - busy
e1001000: 1000 - busy
e1002000: 1000 - busy
...
...
...
e3e74000: 22000 - busy
e3e9d000: 15000 - free
e3eb2000: 41000 - free
e3ef3000: 22000 - busy
e3f15000: a0c000 - free
e4921000: 6edf000 - free


The Paged Pool range is from 0xe1000000 to 0xeb7fffff but those are all freed from 0xe3f15000 so the valid address range for searching would be from 0xe1000000 to 0xe3f15000. Let’s look for a needle(0xe1fdb498) in the pool range.

0: kd> s e1000000 e3f15000 98 b4 fd e1
e115aec8  98 b4 fd e1 00 00 00 00-05 06 26 0c 42 63 4d 63  ..........&.MyMm


Wow! I’ve got a needle at 0xe115aec8. As you can see, we can find a meaningful information with any little clue.

To check the memory information, use !pool command.

0: kd> !pool e115aec8
Pool page e115aec8 region is Paged pool
 e115a000 size:  9d0 previous size:    0  (Allocated)  MyMm
 e115a9d0 size:   18 previous size:  9d0  (Allocated)  Ntf0 ...
 e115ae20 size:   48 previous size:   68  (Allocated)  Ntfc
 e115ae68 size:   40 previous size:   48  (Allocated)  MyMm
*e115aea8 size:   28 previous size:   40  (Allocated) *MyMm
  Owning component : Unknown (update pooltag.txt)


0xe115aec8 is within 0xe115aea8 allocation and memory tag MyMm means that it belongs to mydrv.sys
MyMm is general purpose memory tag, we can’t know yet what the memory is for.

I will search the pool for 0xe115ea8 in order to verify which code uses the memory.
We need to use 0xe115aeb0, added memory header size 8.

0: kd> s e1000000 e3f15000 b0 ae 15 e1
e18407e4  b0 ae 15 e1 a0 b4 fd e1-32 00 00 00 c4 ea a4 f3  ........2.......


0xe18407e4 contains 0xe115aeb0, check it with !pool.

0: kd> !pool e18407e4
Pool page e18407e4 region is Paged pool
 e1840000 size:   b8 previous size:    0  (Allocated)  Port (Protected)
 e18400b8 size:   10 previous size:   b8  (Allocated)  Glnk
  ...
 e1840720 size:   30 previous size:   18  (Allocated)  CMVa
 e1840750 size:   18 previous size:   30  (Allocated)  Uspi Process: 85b31c88
*e1840768 size:   98 previous size:   18  (Allocated) *MyMm  Owning component : Unknown (update pooltag.txt)
 e1840800 size:   50 previous size:   98  (Allocated)  LStb ...


Hmm… it is same pattern. Search for 0xe1840770 one more time.

0: kd> s e1000000 e3f15000 70 07 84 e1
e2f3f874  70 07 84 e1 48 21 13 e2-01 00 00 00 00 00 00 00  p...H!..........


0xe2f3f874 contains 0xe1840770 , check it with !pool.

0: kd> !pool e2f3f874
Pool page e2f3f874 region is Paged pool
 e2f3f000 size:  100 previous size:    0  (Allocated)  IoNm
 e2f3f100 size:   98 previous size:  100  (Allocated)  NtFs ...
 e2f3f760 size:  108 previous size:   38  (Allocated)  CM39
*e2f3f868 size:   58 previous size:  108  (Allocated) *MySt
  Owning component : Unknown (update pooltag.txt)
 e2f3f8c0 size:   80 previous size:   58  (Allocated)  IoNm ...


0xe2f3f874 is within 0xe2f3f868 allocation and memory tag MySt means that it belongs to mydrv.sys as well, but MySt tag is used for specific structure in the driver so we can retrieve the structure from source code and match with the memory data. We are lucky!

typedef struct _MY_STRUCT {
 ULONG            ulType;
 DRV_STRUCT  *pDrvStruct1;
 DRV_STRUCT  *pDrvStruct2;
 ...
} MY_STRUCT;


This is MY_STRUCT structure and TYPE_ID of MY_STRUCT is defined as 0x10. 
We have to make sure the first field ulType in the memory 0xe2f3f870 (0xe2f3f868 + 8).

0: kd> db e2f3f870
e2f3f870  10 00 00 00 70 07 84 e1-48 21 13 e2 01 00 00 00  ....p...H!......
e2f3f880  00 00 00 00 00 00 00 00-65 73 00 00 00 00 00 00  ........es......
e2f3f890  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................


It has correct value 0x10 so we are sure the memory is for MY_STRUCT. From now, we can trace back the addresses we've found above with structure defined in the source code of mydrv.sys. 

Use dt command with structure type and address.

0: kd> dt mydrv!MY_STRUCT e2f3f870
   +0x000 ulType          : 16
   +0x004 pDrvStruct1    : 0xe1840770 
   +0x008 pDrvStruct2    : 0xe2132148 
   ...


We can see the familiar address 0xe1840770 and the type of pDrvStruct1 is DRV_STRUCT.

0: kd> dt mydrv!DRV_STRUCT e1840770
   +0x000 id               : 64 
    ...
   +0x074 pPathStruct   : 0xe115aeb0 
   +0x078 pOsPath   : 0xe1fdb4a0  -> 0x5c
   ...


We can see the familiar address 0xe115aeb0 as well and the type of pPathStruct is PATH_STRUCT.

0: kd> dt mydrv!PATH_STRUCT e115aeb0
   +0x000 id               : 6d
   ...
   +0x018 pPathData     : 0xe1fdb498


Eventaully, we can see the address 0xe1fdb498 and the type of pPathData is PATH_DATA.

Let’s check the contents of the structure.

0: kd> dt mydrv!PATH_DATA e1fdb498
   +0x000 pPrev            : (null) 
   +0x004 pLimit           : 0xe1fdb5f2  "u"
   +0x008 Data             : []  "\"

0: kd> db e1fdb498
e1fdb498  00 00 00 00 f2 b5 fd e1-5c 00 44 00 65 00 76 00  ........\.D.e.v.
e1fdb4a8  69 00 63 00 65 00 5c 00-48 00 61 00 72 00 64 00  i.c.e.\.H.a.r.d.
e1fdb4b8  64 00 69 00 73 00 6b 00-56 00 6f 00 6c 00 75 00  d.i.s.k.V.o.l.u.
e1fdb4c8  6d 00 65 00 32 00 5c 00-57 00 49 00 4e 00 44 00  m.e.2.\.W.I.N.D.

e1fdb4d8  4f 00 57 00 53 00 5c 00-73 00 79 00 73 00 74 00  O.W.S.\.s.y.s.t.
e1fdb4e8  65 00 6d 00 33 00 32 00-5c 00 6f 00 6c 00 65 00  e.m.3.2.\.o.l.e.
e1fdb4f8  33 00 32 00 2e 00 64 00-6c 00 6c 00 00 00 6c 00  3.2...d.l.l...l.
e1fdb508  6c 00 00 00 6c 00 6c 00-00 00 74 00 6f 00 72 00  l...l.l...t.o.r.
e1fdb518  5c 00 4c 00 6f 00 63 00-61 00 6c 00 20 00 53 00  \.L.o.c.a.l. .S.
e1fdb528  65 00 74 00 74 00 69 00-6e 00 67 00 73 00 5c 00  e.t.t.i.n.g.s.\.
e1fdb538  54 00 65 00 6d 00 70 00-6f 00 72 00 61 00 72 00  T.e.m.p.o.r.a.r.
e1fdb548  79 00 20 00 49 00 6e 00-74 00 65 00 72 00 6e 00  y. .I.n.t.e.r.n.
e1fdb558  65 00 74 00 20 00 46 00-69 00 6c 00 65 00 73 00  e.t. .F.i.l.e.s.
e1fdb568  5c 00 43 00 6f 00 6e 00-74 00 65 00 6e 00 74 00  \.C.o.n.t.e.n.t.
e1fdb578  2e 00 49 00 45 00 35 00-5c 00 39 00 46 00 4c 00  ..I.E.5.\.9.F.L.
e1fdb588  4b 00 4f 00 4d 00 4a 00-31 00 5c 00 6d 00 61 00  K.O.M.J.1.\.m.a.
e1fdb598  70 00 5b 00 31 00 5d 00-2e 00 67 00 69 00 66 00  p.[.1.]...g.i.f.
e1fdb5a8  00 00 5d 00 2e 00 67 00-69 00 66 00 00 00 69 00  ..]...g.i.f...i.
e1fdb5b8  66 00 00 00 00 00 69 00-66 00 00 00 69 00 63 00  f.....i.f...i.c.
e1fdb5c8  65 00 5c 00 48 00 61 00-72 00 64 00 64 00 69 00  e.\.H.a.r.d.d.i.
e1fdb5d8  73 00 6b 00 56 00 6f 00-6c 00 75 00 6d 00 65 00  s.k.V.o.l.u.m.e.
e1fdb5e8  32 00 5c 00 44 00 6f 00-63 00 75 00 6d 00 65 00  2.\.D.o.c.u.m.e.
e1fdb5f8  6e 00 74 00 73 00 53 00-65 00 74 00 74 00 69 00  n.t.s.S.e.t.t.i.
e1fdb608  6e 00 67 00 73 00 5c 00-41 00 64 00 6d 00 69 00  n.g.s.\.A.d.m.i.
e1fdb618  6e 00 69 00 73 00 74 00-72 00 61 00 74 00 6f 00  n.i.s.t.r.a.t.o.
e1fdb628  72 00 5c 00 4c 00 6f 00-63 00 61 00 6c 00 20 00  r.\.L.o.c.a.l. .
e1fdb638  53 00 65 00 74 00 74 00-69 00 6e 00 67 00 73 00  S.e.t.t.i.n.g.s.
e1fdb648  5c 00 54 00 65 00 6d 00-70 00 6f 00 72 00 61 00  \.T.e.m.p.o.r.a.
e1fdb658  72 00 79 00 20 00 49 00-6e 00 74 00 65 00 72 00  r.y. .I.n.t.e.r.
e1fdb668  6e 00 65 00 74 00 20 00-46 00 69 00 6c 00 65 00  n.e.t. .F.i.l.e.
e1fdb678  73 00 5c 00 43 00 6f 00-6e 00 74 00 65 00 6e 00  s.\.C.o.n.t.e.n.
e1fdb688  74 00 2e 00 49 00 45 00-35 00 5c 00 39 00 46 00  t...I.E.5.\.9.F.
e1fdb698  4c 00 4b 00 4f 00 4d 00-4a 00 31 00 5c 00 62 00  L.K.O.M.J.1.\.b.
e1fdb6a8  30 00 32 00 5f 00 6f 00-6e 00 5f 00 6f 00 66 00  0.2._.o.n._.o.f.
e1fdb6b8  66 00 2d 00 4c 00 5b 00-31 00 5d 00 2e 00 67 00  f.-.L.[.1.]...g.
e1fdb6c8  69 00 66 00 00 00 01 00-88 20 c6 85 88 a8 cd 85  i.f...... ......


It is filled with string data from 0xe1fdb498 to 0xe1fdb6c8 but the range of memory allocated to 0xe1fdb498 is from 0xe1fdb490 to 0xe1fdb5f7.

0: kd> !pool e1fdb498
Pool page e1fdb498 region is Paged pool
 e1fdb000 size:  290 previous size:    0  (Allocated)  Gla:
 ...
 e1fdb3f0 size:   a0 previous size:    8  (Allocated)  Gla8
*e1fdb490 size:  168 previous size:   a0  (Allocated) *MyMm
  Owning component : Unknown (update pooltag.txt)
e1fdb5f8 is not a valid large pool allocation, checking large session pool...
e1fdb5f8 is freed (or corrupt) pool


Maybe, 0xe1fdb5f8 was allocated to NT kernel so there are NT kernel functions on the top of the call stack in this crash dump. NT kernel lost its data and was crashed because mydrv.sys overwrote the string data over its allocation range.

Finally, We've found a bug of the function handling PATH_DATA in mydrv.sys corrupts a memory in the paged pool. 

It was my fault. :)

2 comments: