
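I've recently received quite a few blue screen crash dumps. Two of them were outlandish enough that I decided to write up the analysis. Without further ado, let's get started.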
Page Fault
First, reach for the trusty !analyze -v command and let WinDbg walk through the crash scene for us:
- 0: kd> !analyze -v
- *******************************************************************************
- * *
- * Bugcheck Analysis *
- * *
- *******************************************************************************
- PAGE_FAULT_IN_NONPAGED_AREA (50)
- Invalid system memory was referenced. This cannot be protected by try-except.
- Typically the address is just plain bad or it is pointing at freed memory.
- Arguments:
- Arg1: ffffba84023d5000, memory referenced.
- Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
- Arg3: fffff8077ec91174, If non-zero, the instruction address which referenced the bad memory
- address.
- Arg4: 0000000000000002, (reserved)
- Debugging Details:
- ------------------
- Unable to load image \??\C:\Windows\system32\drivers\TargetSys64.sys, Win32 error 0n2
- "C:\Windows\System32\KERNELBASE.dll" was not found in the image list.
- Debugger will attempt to load "C:\Windows\System32\KERNELBASE.dll" at given base 00000000`00000000.
- Please provide the full image name, including the extension (i.e. kernel32.dll)
- for more reliable results.Base address and size overrides can be given as
- .reload <image.ext>=<base>,<size>.
- Unable to add module at 00000000`00000000
- KEY_VALUES_STRING: 1
- Key : AV.Type
- Value: Read
- Key : Analysis.CPU.mSec
- Value: 3640
- Key : Analysis.DebugAnalysisManager
- Value: Create
- Key : Analysis.Elapsed.mSec
- Value: 127157
- Key : Analysis.Init.CPU.mSec
- Value: 2968
- Key : Analysis.Init.Elapsed.mSec
- Value: 87184
- Key : Analysis.Memory.CommitPeak.Mb
- Value: 119
- Key : WER.OS.Branch
- Value: vb_release
- Key : WER.OS.Timestamp
- Value: 2019-12-06T14:06:00Z
- Key : WER.OS.Version
- Value: 10.0.19041.1
- FILE_IN_CAB: App.dmp
- VIRTUAL_MACHINE: VMware
- BUGCHECK_CODE: 50
- BUGCHECK_P1: ffffba84023d5000
- BUGCHECK_P2: 0
- BUGCHECK_P3: fffff8077ec91174
- BUGCHECK_P4: 2
- READ_ADDRESS: ffffba84023d5000 Nonpaged pool
- MM_INTERNAL_CODE: 2
- IMAGE_NAME: TargetSys64.sys
- MODULE_NAME: TargetSys64
- FAULTING_MODULE: fffff8077ec70000 TargetSys64
- PROCESS_NAME: App.exe
- TRAP_FRAME: fffffe8727a9ab70 -- (.trap 0xfffffe8727a9ab70)
- NOTE: The trap frame does not contain all registers.
- Some register values may be zeroed or incorrect.
- rax=ffff994ca6532701 rbx=0000000000000000 rcx=8a000000a5282863
- rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
- rip=fffff8077ec91174 rsp=fffffe8727a9ad00 rbp=0000000000000208
- r8=fffffe8727a9c000 r9=ffff995d42011ea0 r10=ffffba84023d4ff9
- r11=ffff994ca6532000 r12=0000000000000000 r13=0000000000000000
- r14=0000000000000000 r15=0000000000000000
- iopl=0 nv up ei pl nz na pe nc
- TargetSys64+0x21174:
- fffff807`7ec91174 488b07 mov rax,qword ptr [rdi] ds:00000000`00000000=????????????????
- Resetting default scope
- STACK_TEXT:
- fffffe87`27a9a118 fffff805`23b2af82 : fffffe87`27a9a280 fffff805`23990f60 fffff807`7ec70000 00000000`00000000 : nt!DbgBreakPointWithStatus
- fffffe87`27a9a120 fffff805`23b2a566 : fffff807`00000003 fffffe87`27a9a280 fffff805`23a27a90 fffffe87`27a9a7d0 : nt!KiBugCheckDebugBreak+0x12
- fffffe87`27a9a180 fffff805`23a10747 : 00000000`00000000 00000000`00000000 ffffba84`023d5000 ffffba84`023d5000 : nt!KeBugCheck2+0x946
- fffffe87`27a9a890 fffff805`23a4bcbf : 00000000`00000050 ffffba84`023d5000 00000000`00000000 fffffe87`27a9ab70 : nt!KeBugCheckEx+0x107
- fffffe87`27a9a8d0 fffff805`23843730 : fffff805`23613000 00000000`00000000 fffffe87`27a9abf0 00000000`00000000 : nt!MiSystemFault+0x1de34f
- fffffe87`27a9a9d0 fffff805`23a201d8 : 00000000`00000000 00000000`70503454 00000000`00000080 00000000`00000657 : nt!MmAccessFault+0x400
- fffffe87`27a9ab70 fffff807`7ec91174 : ffffba84`0ccf0600 00000000`000007d4 ffffba84`0ccf0600 00000000`00000000 : nt!KiPageFault+0x358
- fffffe87`27a9ad00 fffff807`7fe11fca : ffffba83`f8c7e890 ffffba84`00000103 ffffba83`f5eb1814 ffffba84`0cef6790 : TargetSys64+0x21174
- fffffe87`27a9ad90 fffff807`7fe12c8e : 00000000`00040246 00000000`00000008 fffff807`7ec74b98 00000000`00000000 : TargetSys64+0x11a1fca
- fffffe87`27a9b660 fffff805`23984777 : ffffba84`00646e50 fffff805`23fe81be 00000000`00000000 ffffba83`fb7cb7d0 : TargetSys64+0x11a2c8e
- fffffe87`27a9b700 fffff805`23fdbf2a : ffffba84`00646e50 ffffba83`fb7cb7d0 00000000`20206f49 00000000`00000000 : nt!IopfCallDriver+0x53
- fffffe87`27a9b740 fffff805`23a35c6f : 00000000`00000002 ffffba84`07446d80 00000000`00000028 ffffba84`01884af0 : nt!IovCallDriver+0x266
- fffffe87`27a9b780 fffff805`23c1442c : 00000000`00000001 ffffba84`00646f20 ffffba84`07446d80 fffffe87`27a9bb00 : nt!IofCallDriver+0x21265f
- fffffe87`27a9b7c0 fffff805`23c14081 : ffffba84`00646f20 fffffe87`27a9bb00 00000000`00010000 ffffba84`00646f20 : nt!IopSynchronousServiceTail+0x34c
- fffffe87`27a9b860 fffff805`23c133f6 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0xc71
- fffffe87`27a9b9a0 fffff805`23a23ef5 : 00000000`00000670 00000000`09d0e678 00000000`09d0e688 00000000`00000008 : nt!NtDeviceIoControlFile+0x56
- fffffe87`27a9ba10 00000000`77231cfc : 00000000`77231933 00000023`772b2e1c 00000000`00000023 00000000`00000004 : nt!KiSystemServiceCopyEnd+0x25
- 00000000`09d0ef88 00000000`77231933 : 00000023`772b2e1c 00000000`00000023 00000000`00000004 00000000`09e0edc4 : wow64cpu!CpupSyscallStub+0xc
- 00000000`09d0ef90 00000000`772311b9 : 00000000`09e0f788 00007fff`18fa3a74 00000000`00000000 00007fff`18fa3b6f : wow64cpu!DeviceIoctlFileFault+0x31
- 00000000`09d0f040 00007fff`18fa3989 : 00000000`077b8020 00000000`00000000 00000000`00000000 00000000`09d0f480 : wow64cpu!BTCpuSimulate+0x9
- 00000000`09d0f080 00007fff`18fa337d : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : wow64!RunCpuSimulation+0xd
- 00000000`09d0f0b0 00007fff`1a405059 : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`00000000 : wow64!Wow64LdrpInitialize+0x12d
- 00000000`09d0f360 00007fff`1a404c43 : 00000000`00000000 00007fff`1a390000 00000000`00000000 00000000`051e0000 : ntdll!LdrpInitialize+0x3fd
- 00000000`09d0f400 00007fff`1a404bee : 00000000`09d0f480 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!LdrpInitialize+0x3b
- 00000000`09d0f430 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!LdrInitializeThunk+0xe
- SYMBOL_NAME: TargetSys64+21174
- STACK_COMMAND: .cxr; .ecxr ; kb
- BUCKET_ID_FUNC_OFFSET: 21174
- FAILURE_BUCKET_ID: AV_VRF_R_(null)_TargetSys64!unknown_function
- OS_VERSION: 10.0.19041.1
- BUILDLAB_STR: vb_release
- OSPLATFORM_TYPE: x64
- OSNAME: Windows 10
- FAILURE_ID_HASH: {202d262d-2ca9-029e-2e1d-9bf74e9fa6df}
- Followup: MachineOwner
- ---------
At first glance this is just a run-of-the-mill PAGE_FAULT_IN_NONPAGED_AREA bugcheck. Let's look at the instruction that triggered it:
- rax=ffff994ca6532701 rbx=0000000000000000 rcx=8a000000a5282863
- rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
- rip=fffff8077ec91174 rsp=fffffe8727a9ad00 rbp=0000000000000208
- r8=fffffe8727a9c000 r9=ffff995d42011ea0 r10=ffffba84023d4ff9
- r11=ffff994ca6532000 r12=0000000000000000 r13=0000000000000000
- r14=0000000000000000 r15=0000000000000000
- iopl=0 nv up ei pl nz na pe nc
- TargetSys64+0x21174:
- fffff807`7ec91174 488b07 mov rax,qword ptr [rdi] ds:00000000`00000000=????????????????
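The faulting instruction dereferences rdi, and rdi (the address being dereferenced) appears to be 0. Common sense says that reading from address 0 in kernel mode ends in a BSOD.
So far everything looks normal. But when I tried to work out why the pointer was 0, things turned out to be less simple than expected. Let's first look at the code around the faulting instruction: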
- 0: kd> ub fffff807`7ec91174
- fffff807`7ec91160 33f6 xor esi,esi
- fffff807`7ec91162 4885ff test rdi,rdi
- fffff807`7ec91165 7430 je TargetSys64+0x21197 (fffff807`7ec91197)
- fffff807`7ec91167 488bcf mov rcx,rdi
- fffff807`7ec9116a 52 push rdx
- fffff807`7ec9116b e8e6e45600 call TargetSys64+0x58f656 (fffff807`7f1ff656)
- fffff807`7ec91170 84c0 test al,al
- fffff807`7ec91172 7423 je TargetSys64+0x21197 (fffff807`7ec91197)
- 0: kd> u fffff807`7ec91174
- fffff807`7ec91174 488b07 mov rax,qword ptr [rdi]
- fffff807`7ec91177 4839442420 cmp qword ptr [rsp+20h],rax
- fffff807`7ec9117c 750a jne TargetSys64+0x21188 (fffff807`7ec91188)
- fffff807`7ec9117e 4989b424a0000000 mov qword ptr [r12+0A0h],rsi
- fffff807`7ec91186 eb0f jmp TargetSys64+0x21197 (fffff807`7ec91197)
- fffff807`7ec91188 4839442428 cmp qword ptr [rsp+28h],rax
- fffff807`7ec9118d 7508 jne TargetSys64+0x21197 (fffff807`7ec91197)
- fffff807`7ec9118f 4989b42498000000 mov qword ptr [r12+98h],rsi
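The test rdi,rdi at fffff807`7ec91162 already checks whether rdi is zero; if it is, execution jumps to fffff807`7ec91197 (the tail of the loop body, as the fuller disassembly later shows). I also went through the entire disassembly and found no jump that bypasses this check.
Here is the contradiction: if the pointer is checked for null, how did rdi end up as 0 at the dereference that caused the blue screen?
After a long stretch of blind trial and error, I finally noticed these lines in the WinDbg output: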
- TRAP_FRAME: fffffe8727a9ab70 -- (.trap 0xfffffe8727a9ab70)
- NOTE: The trap frame does not contain all registers.
- Some register values may be zeroed or incorrect.
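So the register values recorded in the trap frame can actually be wrong? They can: on x64 the trap frame generally captures only the volatile registers, and rdi is nonvolatile, so the zero shown for it is not its real value. Note also that address 0 does not match Arg1 of the bugcheck (ffffba84023d5000).
According to Zhang Yinkui's book 《软件调试》 (Software Debugging), when a page fault occurs the CR2 register holds the memory address that caused it. Let's see what it contains here: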
- 0: kd> r @cr2
- cr2=ffffba84023d5000
- 0: kd> db @cr2
- ffffba84`023d5000 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
- ffffba84`023d5010 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
The next question is what rdi should actually have been, which means analyzing the code around the fault:
- 0: kd> uf fffff8077ec91174
- ……
- fffff807`7ec9113b 65488b3c2588010000 mov rdi,qword ptr gs:[188h]
- ……
- fffff807`7ec91154 4c8daf00100000 lea r13,[rdi+1000h]
- fffff807`7ec9115b 493bfd cmp rdi,r13
- fffff807`7ec9115e 7342 jae TargetSys64+0x211a2 (fffff807`7ec911a2) Branch
- ……
- fffff807`7ec91162 4885ff test rdi,rdi
- fffff807`7ec91165 7430 je TargetSys64+0x21197 (fffff807`7ec91197) Branch
- ……
- fffff807`7ec91174 488b07 mov rax,qword ptr [rdi]
- ……
- fffff807`7ec91197 48ffc7 inc rdi
- ……
- fffff807`7ec9119d 493bfd cmp rdi,r13
- fffff807`7ec911a0 72c0 jb TargetSys64+0x21162 (fffff807`7ec91162) Branch
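On x86_64, gs:[188h] in kernel mode holds the pointer to the current thread (the KTHREAD at the start of its ETHREAD).
The disassembly above is roughly equivalent to the following C code: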
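- PETHREAD CurrentThread = PsGetCurrentThread();
- /* The disassembly compares with jae/jb, so the upper bound is exclusive. */
- for (PUCHAR CurrentPointer = (PUCHAR)CurrentThread; CurrentPointer < (PUCHAR)CurrentThread + 0x1000; ++CurrentPointer)
- {
- ……
- ULONG64 Value = *(ULONG64 *)CurrentPointer; /* 8-byte read at a byte-granular address */
- ……
- }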
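In plain terms, it scans one page's worth of memory (0x1000 bytes, i.e. 4 KB) starting at the current thread structure.
So which thread is current? WinDbg tells us: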
- 0: kd> .thread
- Implicit thread is now ffffba84`023d4080
- 0: kd> dp gs:[0x188]
- 002b:00000000`00000188 ffffba84`023d4080 00000000`00000000
- 002b:00000000`00000198 fffff805`2433aa00 00000000`01010100
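Both the .thread command and reading the value at the gs-relative offset give the same result: the current thread is ffffba84`023d4080. The code starts at this address and walks through one page's worth of data.
The address ffffba84`023d4080 falls within the following page range: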
- >> rax2 =16 '0xffffba84023d4080/0x1000*0x1000'
- 0xffffba84023d4000
- >> rax2 =16 '0xffffba84023d4080/0x1000*0x1000+0x1000-1'
- 0xffffba84023d4fff
- 0: kd> db ffffba84`023d4fff
- ffffba84`023d4fff ff ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? .???????????????
- ffffba84`023d500f ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
- ffffba84`023d501f ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
So starting at ffffba84`023d4fff+1, i.e. ffffba84`023d5000, the memory is no longer accessible, and that is exactly the address CR2 reported above as the cause of the page fault.
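The page arithmetic done with rax2 above can also be written as a small C helper. This is a minimal sketch assuming 4 KB pages; the names page_first and page_last are made up for illustration and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* First byte of the page that contains addr.
 * page_size must be a power of two (0x1000 for the 4 KB pages here). */
uint64_t page_first(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Last byte of the page that contains addr. */
uint64_t page_last(uint64_t addr, uint64_t page_size)
{
    return page_first(addr, page_size) + (page_size - 1);
}
```

For the thread address in this dump, page_first(0xffffba84023d4080, 0x1000) gives 0xffffba84023d4000 and page_last gives 0xffffba84023d4fff, matching the rax2 results.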
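At this point the cause of the dump is clear:
- The code scans 0x1000 bytes of addresses starting at the current thread's ETHREAD
- The address advances by 1 on each iteration, and each iteration reads one QWORD
- When the address reaches ffffba84`023d4ff9, the QWORD read extends to ffffba84`023d5000
- ffffba84`023d5000 is the first byte of the next page, which is invalid, so a page fault occurs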
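The steps above can be checked with a little arithmetic. The sketch below computes the first scan pointer whose read spills into the following page; it assumes 4 KB pages, does not handle wraparound at the very top of the address space, and first_crossing_pointer is a hypothetical name. Whether such a read actually faults depends on the next page being invalid, as it was in this dump.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 0x1000ULL /* assumed 4 KB page size */

/* Returns the first pointer p in [start, start + span) whose width-byte
 * read touches the page after the one containing start, or 0 if every
 * read in the range stays inside that page. */
uint64_t first_crossing_pointer(uint64_t start, uint64_t span, uint64_t width)
{
    assert(width >= 1 && width <= PAGE_BYTES);

    /* First byte of the following page. */
    uint64_t next_page = (start & ~(PAGE_BYTES - 1)) + PAGE_BYTES;

    /* Smallest p with p + width > next_page. */
    uint64_t candidate = next_page - width + 1;
    if (candidate < start)
        candidate = start;

    return (candidate < start + span) ? candidate : 0;
}
```

With start = 0xffffba84023d4080, span = 0x1000 and width = 8, the result is 0xffffba84023d4ff9, the address named in the list above.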
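The fix is therefore one of the following; either one is enough:
- Limit the scan so that every QWORD read stays within the page that contains the start address
- Detect reads that would cross a page boundary and handle them explicitly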
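Both options can be sketched in portable C. This is a user-mode illustration, not the driver's code: bytes_left_in_page and find_qword are hypothetical names, 4 KB pages are assumed, and memcpy stands in for the unaligned 8-byte read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 0x1000ULL /* assumed 4 KB page size */

/* Number of bytes from addr to the end of its page; usable as an upper
 * bound for the scan length so the scan never leaves the page. */
uint64_t bytes_left_in_page(uint64_t addr)
{
    return PAGE_BYTES - (addr & (PAGE_BYTES - 1));
}

/* Scan buf byte by byte for an 8-byte value, but only start a read when
 * all 8 bytes lie inside [buf, buf + len).
 * Returns the offset of the first match, or -1 if there is none. */
ptrdiff_t find_qword(const unsigned char *buf, size_t len, uint64_t needle)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i + sizeof(uint64_t) <= len; ++i) {
        uint64_t value;
        memcpy(&value, buf + i, sizeof value); /* safe unaligned read */
        if (value == needle)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

A caller would pass the smaller of the intended scan size and bytes_left_in_page(start) as len. For the thread address in this dump that is 0xf80 bytes, so the last read starts at ffffba84`023d4ff8 and ends at ffffba84`023d4fff.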
Double Fault
The exception triggered by ZwQueryVirtualMemory
As usual, here is the !analyze -v log from the scene:
- *** Fatal System Error: 0x0000007f
- (0x00000008,0x80F4F000,0x00000000,0x00000000)
- Break instruction exception - code 80000003 (first chance)
- A fatal system error has occurred.
- Debugger entered on first try; Bugcheck callbacks have not been invoked.
- A fatal system error has occurred.
- For analysis of this file, run !analyze -v
- WARNING: Process directory table base D77EB060 doesn't match CR3 001A8000
- WARNING: Process directory table base D77EB060 doesn't match CR3 001A8000
- nt!RtlpBreakWithStatusInstruction:
- 81945064 cc int 3
- kd> !analyze -v
- Connected to Windows 10 14393 x86 compatible target at (Wed Dec 27 12:19:59.494 2023 (UTC + 8:00)), ptr64 FALSE
- Loading Kernel Symbols
- ...............................................................
- ................................................................
- ...........................
- Loading User Symbols
- ...............
- Loading unloaded module list
- .............ReadVirtual: 12f40000 not properly sign extended
- *******************************************************************************
- * *
- * Bugcheck Analysis *
- * *
- *******************************************************************************
- UNEXPECTED_KERNEL_MODE_TRAP (7f)
- This means a trap occurred in kernel mode, and it's a trap of a kind
- that the kernel isn't allowed to have/catch (bound trap) or that
- is always instant death (double fault). The first number in the
- BugCheck params is the number of the trap (8 = double fault, etc)
- Consult an Intel x86 family manual to learn more about what these
- traps are. Here is a *portion* of those codes:
- If kv shows a taskGate
- use .tss on the part before the colon, then kv.
- Else if kv shows a trapframe
- use .trap on that value
- Else
- .trap on the appropriate frame will show where the trap was taken
- (on x86, this will be the ebp that goes with the procedure KiTrap)
- Endif
- kb will then show the corrected stack.
- Arguments:
- Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
- Arg2: 80f4f000
- Arg3: 00000000
- Arg4: 00000000
- Debugging Details:
- ------------------
- ReadVirtual: 14100000 not properly sign extended
- *************************************************************************
- *** ***
- *** ***
- *** Either you specified an unqualified symbol, or your debugger ***
- *** doesn't have full symbol information. Unqualified symbol ***
- *** resolution is turned off by default. Please either specify a ***
- *** fully qualified symbol module!symbolname, or enable resolution ***
- *** of unqualified symbols by typing ".symopt- 100". Note that ***
- *** enabling unqualified symbol resolution with network symbol ***
- *** server shares in the symbol path may cause the debugger to ***
- *** appear to hang for long periods of time when an incorrect ***
- *** symbol name is typed or the network symbol server is down. ***
- *** ***
- *** For some commands to work properly, your symbol path ***
- *** must point to .pdb files that have full type information. ***
- *** ***
- *** Certain .pdb files (such as the public OS symbols) do not ***
- *** contain the required information. Contact the group that ***
- *** provided you with these symbols if you need this command to ***
- *** work. ***
- *** ***
- *** Type referenced: kernelbase!gpServerNlsUserInfo ***
- *** ***
- *************************************************************************
- KEY_VALUES_STRING: 1
- Key : Analysis.CPU.mSec
- Value: 5405
- Key : Analysis.DebugAnalysisManager
- Value: Create
- Key : Analysis.Elapsed.mSec
- Value: 16078
- Key : Analysis.Init.CPU.mSec
- Value: 21530
- Key : Analysis.Init.Elapsed.mSec
- Value: 81099
- Key : Analysis.Memory.CommitPeak.Mb
- Value: 78
- Key : WER.OS.Branch
- Value: rs1_release
- Key : WER.OS.Timestamp
- Value: 2016-07-15T16:16:00Z
- Key : WER.OS.Version
- Value: 10.0.14393.0
- BUGCHECK_CODE: 7f
- BUGCHECK_P1: 8
- BUGCHECK_P2: ffffffff80f4f000
- BUGCHECK_P3: 0
- BUGCHECK_P4: 0
- TSS: 00000028 -- (.tss 0x28)
- eax=0006b090 ebx=81a41200 ecx=81a41200 edx=00000010 esi=88a8e1c0 edi=00000010
- eip=818c4c59 esp=a5dd9000 ebp=a5dd9008 iopl=0 nv up ei pl nz na po cy
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010203
- nt!MiDecreaseAvailablePages+0x9:
- 818c4c59 57 push edi
- Resetting default scope
- PROCESS_NAME: csrss.exe
- DEVICE_OBJECT: 899fda60
- DRIVER_OBJECT: 00000000
- TRAP_FRAME: a5dd9478 -- (.trap 0xffffffffa5dd9478)
- ErrCode = 00000000
- eax=00000000 ebx=8a8b0000 ecx=bfffff00 edx=92a22100 esi=c0019a08 edi=00000000
- eip=8187663b esp=a5dd94ec ebp=a5dd9524 iopl=0 nv up ei pl zr na pe nc
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
- nt!MiQueryAddressState+0x1ab:
- 8187663b 8b0b mov ecx,dword ptr [ebx] ds:0023:8a8b0000=????????
- Resetting default scope
- STACK_TEXT:
- a5dd9008 818c4ba9 ffffffff 00000000 87c2e680 nt!MiDecreaseAvailablePages+0x9
- a5dd9030 818c4a1b 00000001 00000000 81a41200 nt!MiUnlinkNodeLargePageHelper+0xb9
- a5dd9060 8187d822 02000000 00000000 87c2e680 nt!MiUnlinkNodeLargePage+0x11b
- a5dd90e4 8187d2a5 00000002 00000000 00000001 nt!MiGetFreeOrZeroPage+0x452
- a5dd9124 8187cfd0 00000002 00454580 a5dd9244 nt!MiGetPage+0x55
- a5dd91d4 8187b3b3 00000000 00000004 00000042 nt!MiGetPageChain+0x130
- a5dd9230 8187a1bf 8a8b0000 a5dd9478 a5dd92f8 nt!MiResolvePrivateZeroFault+0x93
- a5dd9294 818bb043 00000000 81a40090 a5dd9478 nt!MiResolveDemandZeroFault+0x15f
- a5dd92f8 8187794a a5dd9478 8a8b0000 a5dd9360 nt!MiSystemFault+0x8d3
- a5dd93d8 819558cc 00000000 8a8b0000 00000000 nt!MmAccessFault+0x78a
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a5dd93d8 8187663b (T) 00000000 8a8b0000 00000000 nt!KiTrap0E+0xec
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a5dd9524 818763ec (T) 92a22100 c0019a00 00002000 nt!MiQueryAddressState+0x1ab
- a5dd9570 81ad8f03 03340000 9424b4b8 a5dd967c nt!MiQueryAddressSpan+0x9c
- a5dd9640 81ad8a96 00000000 a5dd9790 0000001c nt!MmQueryVirtualMemory+0x463
- a5dd965c 819522c7 80000d24 03140000 00000000 nt!NtQueryVirtualMemory+0x1e
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a5dd965c 81942665 (T) 80000d24 03140000 00000000 nt!KiSystemServicePostCall
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a5dd96ec bf444f6e (T) 80000d24 03140000 00000000 nt!ZwQueryVirtualMemory+0x11
- WARNING: Stack unwind information not available. Following frames may be wrong.
- a5dd97f8 bf444a0a b6337440 80000d24 92a22100 TargetSys32+0x34f6e
- a5dd9908 bf444a0a b6337440 80000d24 9a54b008 TargetSys32+0x34a0a
- a5dd9a18 bf444a0a b6337440 80000d24 90ad62a0 TargetSys32+0x34a0a
- a5dd9b28 bf444b9b b6337440 80000d24 92a295c0 TargetSys32+0x34a0a
- a5dd9c38 bf444b9b b6337440 80000d24 912de488 TargetSys32+0x34b9b
- a5dd9d48 bf444a0a b6337440 80000d24 a4661a58 TargetSys32+0x34b9b
- a5dd9e58 bf445bdd b6337440 80000d24 8ef9ce4c TargetSys32+0x34a0a
- a5dda12c bf445fd2 b6337440 8ef9cbc0 a5ddba4c TargetSys32+0x35bdd
- a5dda13c bf414fe9 b6337440 8ef9cbc0 0022010e TargetSys32+0x35fd2
- a5ddba4c bf4163d9 abd30040 00000004 00000000 TargetSys32+0x4fe9
- a5ddba80 818864a3 8fefc830 a47bbd80 00220108 TargetSys32+0x63d9
- a5ddba9c 81ac7d83 a47bbe14 a47bbd80 0022e208 nt!IofCallDriver+0x43
- a5ddbaf0 81ac77f0 899fda60 00000000 81b34f01 nt!IopSynchronousServiceTail+0x133
- a5ddbbb8 81ac73fa 00000000 00000000 049bf570 nt!IopXxxControlFile+0x3e0
- a5ddbbe4 819522c7 0000040c 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a5ddbbe4 770d38b0 (T) 0000040c 00000000 00000000 nt!KiSystemServicePostCall
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 049bf590 00000000 (T) 00000000 00000000 00000000 ntdll!KiFastSystemCallRet
- STACK_COMMAND: .tss 0x28 ; kb
- SYMBOL_NAME: TargetSys32+34f6e
- MODULE_NAME: TargetSys32
- IMAGE_NAME: TargetSys32.sys
- BUCKET_ID_FUNC_OFFSET: 34f6e
- FAILURE_BUCKET_ID: 0x7f_8_TargetSys32!unknown_function
- OS_VERSION: 10.0.14393.0
- BUILDLAB_STR: rs1_release
- OSPLATFORM_TYPE: x86
- OSNAME: Windows 10
- FAILURE_ID_HASH: {00fcc8ef-033e-c4ad-af92-bd9b0e8be063}
- Followup: MachineOwner
- ---------
As you can see, the verdict is UNEXPECTED_KERNEL_MODE_TRAP (the more specific reason is in Arg1: EXCEPTION_DOUBLE_FAULT), and the instruction blamed for the dump is this one:
- nt!MiQueryAddressState+0x1ab:
- 8187663b 8b0b mov ecx,dword ptr [ebx] ds:0023:8a8b0000=????????
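The question is: shouldn't that instruction simply raise a page fault? Why is the system reporting UNEXPECTED_KERNEL_MODE_TRAP, or rather DOUBLE_FAULT? I didn't understand it at first either. Since the offending instruction was an invalid memory access, my first idea was to comment out the ZwQueryVirtualMemory call, print which memory was being passed to it, and then trace where the bad memory came from.
So I changed the relevant code to look like this: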
- StartAddress *= PAGE_SIZE;
- EndingAddress *= PAGE_SIZE;
- DbgPrint("Start: %X\r\n", StartAddress);
- /*
- Status = ZwQueryVirtualMemory(HandleProcess, (PVOID)BaseAddress, 0, \
- &MemoryBasicInformation, sizeof(MEMORY_BASIC_INFORMATION), &ReturnLength);
- */
The exception triggered by DbgPrint
And then I got another, even more outlandish dump (trimmed here to keep the point clear):
- UNEXPECTED_KERNEL_MODE_TRAP (7f)
- kb will then show the corrected stack.
- Arguments:
- Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
- Arg2: 80f4f000
- Arg3: 00000000
- Arg4: 00000000
- Debugging Details:
- TSS: 00000028 -- (.tss 0x28)
- eax=00000000 ebx=00000000 ecx=d2a71b40 edx=00000000 esi=a999b0d0 edi=a999a000
- eip=818771ce esp=a999affc ebp=a999b0b8 iopl=0 nv up ei ng nz ac pe nc
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010296
- nt!MmAccessFault+0xe:
- 818771ce 53 push ebx
- Resetting default scope
- PROCESS_NAME: App.exe
- DEVICE_OBJECT: b2360788
- TRAP_FRAME: a999b720 -- (.trap 0xffffffffa999b720)
- ErrCode = 00000000
- eax=00000001 ebx=00000065 ecx=a999b7d0 edx=0000000f esi=81a26c70 edi=00000003
- eip=81957d06 esp=a999b794 ebp=a999b7b0 iopl=0 nv up ei ng nz ac po nc
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000292
- nt!DebugService+0x24:
- 81957d06 5b pop ebx
- Resetting default scope
- STACK_TEXT:
- a999b0b8 819558cc 00000000 a999a000 00000000 nt!MmAccessFault+0xe
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b0b8 81946543 (T) 00000000 a999a000 00000000 nt!KiTrap0E+0xec
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b1fc 818a2adf (T) a999b720 00000000 a999b210 nt!_alloca_probe+0x27
- a999b628 81953085 a999b644 00000000 a999b720 nt!KiDispatchException+0xd5
- a999b694 819539e0 0000000f 00000000 00000000 nt!KiDispatchTrapException+0x51
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b694 81957d06 (T) 0000000f 00000000 00000000 nt!KiTrap03+0xf4
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b7b0 818bd37c (T) 0000000f 00000065 00000003 nt!DebugService+0x24
- a999b7c4 818bd305 00000003 72617453 33203a74 nt!DebugPrint+0x18
- a999b894 818bd1ff 00000003 d5869bb0 a999b8b8 nt!vDbgPrintExWithPrefixInternal+0xd9
- a999b8ac d5834aba d5869bb0 00300000 01000000 nt!DbgPrint+0x1d
- WARNING: Stack unwind information not available. Following frames may be wrong.
- a999b950 d583465c d7004460 80000c84 9a4c0220 TargetSys32+0x34aba
- a999ba08 d583465c d7004460 80000c84 8ff9b540 TargetSys32+0x3465c
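This time the blue screen hit inside DbgPrint itself! At this point I was out of ideas, so I went to ask a senior colleague at the company. After one look at a screenshot of the dump (yes, without even asking for the dump file), he casually remarked: the stack blew up...
The stack blew up
Just a few words, but they landed like thunder in a silent room and immediately cleared things up for me. Let's start with a quick look at the DbgPrint blue screen above.
Analyzing the dump triggered by DbgPrint
First, the state of the current stack: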
- kd> !thread
- THREAD d2a71b40 Cid 135c.0c9c Teb: 002d6000 Win32Thread: 89954530 RUNNING on processor 0
- IRP List:
- ae659490: (0006,0094) Flags: 00060030 Mdl: 00000000
- Not impersonating
- DeviceMap 96fd8d58
- Owning Process d70ae680 Image: App.exe
- Attached Process N/A Image: N/A
- Wait Start TickCount 18379 Ticks: 1 (0:00:00:00.015)
- Context Switch Count 1021 IdealProcessor: 0
- UserTime 00:00:00.000
- KernelTime 00:00:00.015
- Win32 Start Address ntdll!TppWorkerThread (0x7708b050)
- Stack Init a999dca0 Current a999bcfc Base a999e000 Limit a999b000 Call 00000000
- Priority 11 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5
- ChildEBP RetAddr Args to Child
- 81a2a694 819c50ee 00000003 b05b8410 00000065 nt!RtlpBreakWithStatusInstruction (FPO: [1,0,0])
- 81a2a6e8 819c4b3b 8b51f340 81a2ab04 00000000 nt!KiBugCheckDebugBreak+0x1f (FPO: [Non-Fpo])
- 81a2aad8 81943eda 0000007f 00000008 80f4f000 nt!KeBugCheck2+0x73a (FPO: [6,247,4])
- 81a2aafc 819546c8 0000007f 00000008 80f4f000 nt!KiBugCheck2+0xc6
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 81a2aafc 818771ce (T) 0000007f 00000008 80f4f000 nt!KiTrap08+0x6e (FPO: TSS 28:0)
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b0b8 819558cc (T) 00000000 a999a000 00000000 nt!MmAccessFault+0xe (FPO: [4,47,4])
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b0b8 81946543 (T) 00000000 a999a000 00000000 nt!KiTrap0E+0xec (FPO: [0,0] TrapFrame @ a999b15c)
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b1fc 818a2adf (T) a999b720 00000000 a999b210 nt!_alloca_probe+0x27
- a999b628 81953085 a999b644 00000000 a999b720 nt!KiDispatchException+0xd5 (FPO: [Non-Fpo])
- a999b694 819539e0 0000000f 00000000 00000000 nt!KiDispatchTrapException+0x51
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b694 81957d06 (T) 0000000f 00000000 00000000 nt!KiTrap03+0xf4 (FPO: [0,0] TrapFrame @ a999b720)
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b7b0 818bd37c (T) 0000000f 00000065 00000003 nt!DebugService+0x24 (FPO: [Non-Fpo])
- a999b7c4 818bd305 00000003 72617453 33203a74 nt!DebugPrint+0x18 (FPO: [Non-Fpo])
- a999b894 818bd1ff 00000003 d5869bb0 a999b8b8 nt!vDbgPrintExWithPrefixInternal+0xd9 (FPO: [Non-Fpo])
- a999b8ac d5834aba d5869bb0 00300000 01000000 nt!DbgPrint+0x1d (FPO: [Non-Fpo])
- WARNING: Stack unwind information not available. Following frames may be wrong.
- a999b950 d583465c d7004460 80000c84 9a4c0220 TargetSys32+0x34aba
- a999ba08 d583465c d7004460 80000c84 8ff9b540 TargetSys32+0x3465c
- a999bac0 d58347af d7004460 80000c84 8ee6ce68 TargetSys32+0x3465c
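The current stack's Limit is a999b000, meaning the stack pointer (esp) must never drop below this value. From the stack trace, the most direct cause of the blue screen in the DbgPrint dump is the double fault raised through KiTrap08. If there is a double fault there must be a first one, so where is it? The answer is at nt!KiTrap0E+0xec. Let's see what the scene of the first exception looks like: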
- kd> .trap a999b15c
- ReadVirtual: a999b15c not properly sign extended
- ErrCode = 00000000
- eax=a999a000 ebx=a999b210 ecx=a999afbc edx=00000000 esi=a999b720 edi=00010037
- eip=81946543 esp=a999b1d0 ebp=a999b1fc iopl=0 nv up ei ng nz na pe nc
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010286
- nt!_alloca_probe+0x27:
- 81946543 8500 test dword ptr [eax],eax ds:0023:a999a000=????????
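How did the second exception come about? Follow the nt!MmAccessFault frame above. There are two ways to get there: use the TSS selector (TSS 28:0), or disassemble around nt!MmAccessFault+0xe directly. Both give the same result: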
- kd> .tss 28
- eax=00000000 ebx=00000000 ecx=d2a71b40 edx=00000000 esi=a999b0d0 edi=a999a000
- eip=818771ce esp=a999affc ebp=a999b0b8 iopl=0 nv up ei ng nz ac pe nc
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010296
- nt!MmAccessFault+0xe:
- 818771ce 53 push ebx
- kd> ub nt!MmAccessFault+0xe
- nt!MiObtainReferencedVad+0x4dd:
- 818771bd cc int 3
- 818771be cc int 3
- 818771bf cc int 3
- nt!MmAccessFault:
- 818771c0 8bff mov edi,edi
- 818771c2 55 push ebp
- 818771c3 8bec mov ebp,esp
- 818771c5 83e4f8 and esp,0FFFFFFF8h
- 818771c8 81ecbc000000 sub esp,0BCh
- kd> u nt!MmAccessFault+0xe
- nt!MmAccessFault+0xe:
- 818771ce 53 push ebx
- 818771cf 8b5d08 mov ebx,dword ptr [ebp+8]
- 818771d2 8bc3 mov eax,ebx
- 818771d4 83e009 and eax,9
- 818771d7 56 push esi
- 818771d8 57 push edi
- 818771d9 3c09 cmp al,9
The .tss 28 context and nt!MmAccessFault+0xe point at the same address, and the ub output shows that MmAccessFault reserves 0xbc bytes of stack for its frame:
- 818771c8 81ecbc000000 sub esp,0BCh
From the stack trace, EBP at this point is a999b0b8:
- 81a2aafc 819546c8 0000007f 00000008 80f4f000 nt!KiBugCheck2+0xc6
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 81a2aafc 818771ce (T) 0000007f 00000008 80f4f000 nt!KiTrap08+0x6e (FPO: TSS 28:0)
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b0b8 819558cc (T) 00000000 a999a000 00000000 nt!MmAccessFault+0xe (FPO: [4,47,4])
- <Intermediate frames may have been skipped due to lack of complete unwind>
- a999b0b8 81946543 (T) 00000000 a999a000 00000000 nt!KiTrap0E+0xec (FPO: [0,0] TrapFrame @ a999b15c)
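So a999b0b8 - 0xbc = 0xa999affc (the and esp,0FFFFFFF8h changes nothing, because a999b0b8 is already 8-byte aligned), and this value is already below the Limit mentioned above (0xa999b000). Executing the push ebx at 818771ce is then bound to fault again while the first page fault is still being handled, and that is where the DOUBLE FAULT really comes from.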
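The arithmetic above can be replayed in a few lines of C. This only models the prologue shown by ub (mov ebp,esp; and esp,0FFFFFFF8h; sub esp,0BCh) and the 4-byte push that follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* esp after "mov ebp,esp; and esp,~(align-1); sub esp,locals". */
uint32_t esp_after_prologue(uint32_t ebp, uint32_t align, uint32_t locals)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (ebp & ~(align - 1)) - locals;
}

/* A 32-bit push writes the 4 bytes just below esp; it leaves the stack
 * if that write would start below the stack limit. */
bool push_leaves_stack(uint32_t esp, uint32_t limit)
{
    return esp < limit + 4;
}
```

With ebp = 0xa999b0b8, align = 8 and locals = 0xbc, the first function returns 0xa999affc, and push_leaves_stack(0xa999affc, 0xa999b000) is true.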
Analyzing the dump triggered by ZwQueryVirtualMemory
Following the same recipe, first check the current thread's stack Limit:
- kd> !thread
- THREAD 9a573b40 Cid 097c.1088 Teb: 00b7c000 Win32Thread: 8f2d9e80 RUNNING on processor 0
- IRP List:
- 92a28e38: (0006,0094) Flags: 00060030 Mdl: 00000000
- Not impersonating
- DeviceMap 8ba03168
- Owning Process b22eb040 Image: App.exe
- Attached Process 8ef9cbc0 Image: csrss.exe
- Wait Start TickCount 18358 Ticks: 1 (0:00:00:00.015)
- Context Switch Count 1077 IdealProcessor: 0
- UserTime 00:00:00.000
- KernelTime 00:00:00.031
- Win32 Start Address ntdll!TppWorkerThread (0x7708b050)
- Stack Init 90e5eca0 Current 90e5ccfc Base 90e5f000 Limit 90e5c000 Call 00000000
- Priority 11 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5
- ChildEBP RetAddr Args to Child
- 81a2a694 819c50ee 00000003 b05b8410 00000065 nt!RtlpBreakWithStatusInstruction (FPO: [1,0,0])
- 81a2a6e8 819c4b3b 8b51f340 81a2ab04 00000000 nt!KiBugCheckDebugBreak+0x1f (FPO: [Non-Fpo])
- 81a2aad8 81943eda 0000007f 00000008 80f4f000 nt!KeBugCheck2+0x73a (FPO: [6,247,4])
- 81a2aafc 819546c8 0000007f 00000008 80f4f000 nt!KiBugCheck2+0xc6
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 81a2aafc 818c4c59 (T) 0000007f 00000008 80f4f000 nt!KiTrap08+0x6e (FPO: TSS 28:0)
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 90e5c008 818c4ba9 (T) ffffffff 00000000 87fc7e40 nt!MiDecreaseAvailablePages+0x9 (FPO: [Non-Fpo])
- 90e5c030 818c4a1b 00000001 00000000 81a41200 nt!MiUnlinkNodeLargePageHelper+0xb9 (FPO: [Non-Fpo])
- 90e5c060 8187d822 02000000 00000000 87fc7e40 nt!MiUnlinkNodeLargePage+0x11b (FPO: [Non-Fpo])
- 90e5c0e4 8187d2a5 00000002 00000000 0000000f nt!MiGetFreeOrZeroPage+0x452 (FPO: [1,24,0])
- 90e5c124 8187cfd0 00000002 00454580 90e5c244 nt!MiGetPage+0x55 (FPO: [Non-Fpo])
- 90e5c1d4 8187b3b3 00000000 00000004 00000042 nt!MiGetPageChain+0x130 (FPO: [Non-Fpo])
- 90e5c230 8187a1bf 8a8b0000 90e5c478 90e5c2f8 nt!MiResolvePrivateZeroFault+0x93 (FPO: [Non-Fpo])
- 90e5c294 818bb043 00000000 81a40090 90e5c478 nt!MiResolveDemandZeroFault+0x15f (FPO: [Non-Fpo])
- 90e5c2f8 8187794a 90e5c478 8a8b0000 90e5c360 nt!MiSystemFault+0x8d3 (FPO: [Non-Fpo])
- 90e5c3d8 819558cc 00000000 8a8b0000 00000000 nt!MmAccessFault+0x78a (FPO: [4,47,4])
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 90e5c3d8 8187663b (T) 00000000 8a8b0000 00000000 nt!KiTrap0E+0xec (FPO: [0,0] TrapFrame @ 90e5c478)
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 90e5c524 818763ec (T) 92a22100 c0019a00 00002000 nt!MiQueryAddressState+0x1ab (FPO: [Non-Fpo])
- 90e5c570 81ad8f03 03340000 a11ce4b8 90e5c67c nt!MiQueryAddressSpan+0x9c (FPO: [Non-Fpo])
- 90e5c640 81ad8a96 00000000 90e5c790 0000001c nt!MmQueryVirtualMemory+0x463 (FPO: [Non-Fpo])
- 90e5c65c 819522c7 80000c94 03140000 00000000 nt!NtQueryVirtualMemory+0x1e (FPO: [Non-Fpo])
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 90e5c65c 81942665 (T) 80000c94 03140000 00000000 nt!KiSystemServicePostCall (FPO: [0,3] TrapFrame @ 90e5c67c)
- <Intermediate frames may have been skipped due to lack of complete unwind>
- 90e5c6ec c5e34f6e (T) 80000c94 03140000 00000000 nt!ZwQueryVirtualMemory+0x11 (FPO: [6,0,0])
- WARNING: Stack unwind information not available. Following frames may be wrong.
- 90e5c7f8 c5e34a0a 91200308 80000c94 92a22100 TargetSys32+0x34f6e
- 90e5c908 c5e34a0a 91200308 80000c94 9a54b008 TargetSys32+0x34a0a
- 90e5ca18 c5e34a0a 91200308 80000c94 9a569b18 TargetSys32+0x34a0a
- 90e5cb28 c5e34b9b 91200308 80000c94 92a295c0 TargetSys32+0x34a0a
The Limit is 90e5c000; note it down, then look at the stack state. The first exception:
- kd> .trap 90e5c478
- ErrCode = 00000000
- eax=00000000 ebx=8a8b0000 ecx=bfffff00 edx=92a22100 esi=c0019a08 edi=00000000
- eip=8187663b esp=90e5c4ec ebp=90e5c524 iopl=0 nv up ei pl zr na pe nc
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
- nt!MiQueryAddressState+0x1ab:
- 8187663b 8b0b mov ecx,dword ptr [ebx] ds:0023:8a8b0000=????????
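This is indeed an access to invalid memory. The rest of the stack shows the system going through a series of memory-management calls until it reaches MiDecreaseAvailablePages+0x9. Here is the register context at that point: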
- kd> .frame /r 0
- 00 90e5c008 818c4ba9 nt!MiDecreaseAvailablePages+0x9
- eax=0005d6e0 ebx=81a41200 ecx=81a41200 edx=00000010 esi=88a8e1c0 edi=00000010
- eip=818c4c59 esp=90e5c000 ebp=90e5c008 iopl=0 nv up ei pl nz na po cy
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010203
- nt!MiDecreaseAvailablePages+0x9:
- 818c4c59 57 push edi
- kd> .tss 28
- eax=0005d6e0 ebx=81a41200 ecx=81a41200 edx=00000010 esi=88a8e1c0 edi=00000010
- eip=818c4c59 esp=90e5c000 ebp=90e5c008 iopl=0 nv up ei pl nz na po cy
- cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010203
- nt!MiDecreaseAvailablePages+0x9:
- 818c4c59 57 push edi
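The .tss 28 context and the top frame are the same scene. Note that esp is 90e5c000, exactly at the Limit. Another push moves esp down by 4, below the Limit, and that triggers the second exception.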
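The same numbers give a rough feel for how little room the thread had. The sketch below uses values read from the !thread output (Base 90e5f000, Limit 90e5c000) and the spacing between the repeated TargetSys32 frames (90e5c908 - 90e5c7f8 = 0x110). Treating that spacing as a fixed per-call cost is an assumption, since WinDbg warns that those frames may be wrong, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes still available between esp and the stack limit. */
uint32_t stack_headroom(uint32_t esp, uint32_t limit)
{
    return (esp > limit) ? esp - limit : 0;
}

/* Upper bound on how many frames of frame_bytes fit in a stack that
 * spans [limit, base). Ignores everything else that shares the stack. */
uint32_t frames_that_fit(uint32_t base, uint32_t limit, uint32_t frame_bytes)
{
    assert(base >= limit && frame_bytes != 0);
    return (base - limit) / frame_bytes;
}
```

With esp at 90e5c000 the headroom is 0, so even a 4-byte push no longer fits. At the first page fault (esp=90e5c4ec) only 0x4ec bytes remained for the whole fault-handling path, and the 0x3000-byte stack holds at most 45 frames of 0x110 bytes before counting anything the kernel itself needs.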
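Summary
This post analyzed two unusual dumps. What makes them special is that the first scene does not directly reveal the cause; the real cause only surfaces after digging deeper. Along the way you need to bring together knowledge of how the operating system works to get to the root of the problem. The bugs behind these dumps were pretty silly; I'm sharing them for everyone's amusement.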