
Multithreading is a technique we use often in day-to-day development, usually to speed up an operation and give users a better experience. Recently the author ran into a multithreading problem: the goal was to speed up some features, but the result was no faster than a single thread. After some investigation the author tracked down the cause and fixed it, and this article is the write-up.
Background
The author’s department is currently developing a program, one part of which needs to exchange data with the kernel. The program’s user interface is shown in the figure below:

For a better user experience, the program is designed as follows: when the user clicks a sub-tab under the “Kernel” tab, that sub-tab starts a thread that calls the DeviceIoControl function to talk to the kernel module. This way each sub-tab loads its data independently of the others, which should speed up data acquisition to some extent.
Based on this design, the code can be abstracted as follows (in pseudocode):
- typedef struct _TAB_PARAMETER {
- HANDLE DeviceHandle; // device handle opened once in main() and shared by all threads
- DWORD IoControlCode; // IOCTL that identifies the sub-tab
- } TAB_PARAMETER, *PTAB_PARAMETER;
- VOID subtab_worker_thread(PTAB_PARAMETER Param)
- {
- UCHAR Buffer[256] = {0};
- DWORD Length = 0;
- DeviceIoControl(Param->DeviceHandle, Param->IoControlCode,
- NULL, 0, Buffer, sizeof(Buffer), &Length, NULL);
- // The business logic that follows depends on the sub-tab; it is not important here, so it is omitted.
- switch (Param->IoControlCode)
- {
- case func1:
- ....
- break;
- ....
- }
- }
- void main()
- {
- HANDLE DeviceHandle = NULL;
- MSG Message = {0};
- LPCSTR deviceStr = "\\\\.\\KernelModule";
- DeviceHandle = CreateFile(deviceStr, GENERIC_READ | GENERIC_WRITE,
- FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
- FILE_ATTRIBUTE_NORMAL, NULL);
- while (GetMessage(&Message, NULL, 0, 0) > 0) {
- // Pseudocode: real code must keep the parameter alive until the thread has read it.
- TAB_PARAMETER TabParameters = {0};
- TabParameters.DeviceHandle = DeviceHandle;
- TabParameters.IoControlCode = <Fill different IOCTLs according to different messages>;
- begin_thread(subtab_worker_thread, 0, &TabParameters);
- }
- CloseHandle(DeviceHandle);
- }
If you have seen this far and have already figured out where the problem is, then you don’t need to read on, because all the necessary information is already included in the pseudocode above.
Problem phenomenon
OK, with the above background laid out, let’s describe the problem phenomenon.
The current kernel module processes the functions of each sub-tab at different speeds. Let’s assume that the “Object” tab is very slow and takes 60 seconds to complete, while the “Unload Module” tab is very fast and can be completed within 5 seconds.
Suppose we first click the “Object” tab, which starts its subtab_worker_thread to request data, and then immediately click the “Unload Module” tab, which starts its own worker thread to fetch data from the kernel.
Let’s stop here for a moment. When will we see the data of “Unload Module”? The answer we expect is 5 seconds, but the correct answer is about 65 seconds.
That is also how the program actually behaves. After clicking a time-consuming sub-tab and then a quick one, the quick sub-tab also has to wait a long time before it can display its data.
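The arithmetic behind the 65-second figure can be sketched with a small model. It assumes that requests are handled strictly one after another in click order and that both clicks happen at roughly t = 0; the function name and the use of whole seconds are illustrative only and do not come from the program.

```c
#include <stddef.h>

/* Hypothetical model, not the program's real code: every request is
 * processed strictly one after another, in click order.
 * durations[i] is the time request i needs inside the kernel module
 * (in seconds); finish[i] receives the moment its data becomes visible,
 * assuming all clicks happen at about t = 0. */
void serialized_finish_times(const unsigned *durations,
                             unsigned *finish, size_t count)
{
    unsigned clock = 0;
    for (size_t i = 0; i < count; ++i) {
        clock += durations[i];   /* request i starts only after i-1 is done */
        finish[i] = clock;
    }
}
```

With durations {60, 5} for “Object” and “Unload Module”, the model yields finish times of 60 and 65 seconds, whereas truly independent requests would finish at 60 and 5.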
Problem analysis
Judging by the results, although the program is designed and coded so that multiple threads work at the same time, in practice it behaves like a single thread. What causes this? Let’s analyze it with the debugger.
The author uses a two-machine debugging setup: the debugger runs as a kernel debugger on one machine, and the target machine runs the program under analysis, QDoctor.exe.
First, list the processes on the target machine to get QDoctor’s EPROCESS:
- 0: kd> !process 0 0
- ......
- PROCESS ffffc58fbbbb5200
- SessionId: 1 Cid: 1478 Peb: 007fb000 ParentCid: 0ba4
- DirBase: a80cd000 ObjectTable: ffffdc810a140d00 HandleCount: <Data Not Accessible>
- Image: QDoctor.exe
- ......
Let’s take a look at the status of each thread in QDoctor:
- 0: kd> !process ffffc58fbbbb5200 2
- PROCESS ffffc58fbbbb5200
- SessionId: 1 Cid: 1478 Peb: 007fb000 ParentCid: 0ba4
- DirBase: a80cd000 ObjectTable: ffffdc810a140d00 HandleCount: <Data Not Accessible>
- Image: QDoctor.exe
- THREAD ffffc58fbbbfe080 Cid 1478.0d00 Teb: 00000000007fd000 Win32Thread: ffffc58fbe424b20 WAIT: (WrResource) KernelMode Non-Alertable
- ffffc58fbea8e8c0 SynchronizationEvent
- THREAD ffffc58fbbf5c080 Cid 1478.05cc Teb: 0000000000600000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable
- ffffc58fbbbca700 QueueObject
- THREAD ffffc58fbbf51080 Cid 1478.095c Teb: 0000000000603000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable
- ffffc58fbbbca700 QueueObject
- THREAD ffffc58fbba9a080 Cid 1478.03d4 Teb: 0000000000606000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable
- ffffc58fbbbca700 QueueObject
- THREAD ffffc58fbbefe7c0 Cid 1478.03ec Teb: 0000000000609000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
- THREAD ffffc58fbbefd080 Cid 1478.0460 Teb: 000000000060c000 Win32Thread: 0000000000000000 RUNNING on processor 2
- THREAD ffffc58fbbefb080 Cid 1478.0e1c Teb: 000000000060f000 Win32Thread: ffffc58fbfd4d7f0 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
- THREAD ffffc58fbbefa080 Cid 1478.118c Teb: 0000000000612000 Win32Thread: ffffc58fbb2a2600 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
- THREAD ffffc58fbc062080 Cid 1478.1340 Teb: 0000000000615000 Win32Thread: ffffc58fbec49830 WAIT: (WrUserRequest) UserMode Non-Alertable
- ffffc58fbc048bf0 SynchronizationEvent
- THREAD ffffc58fc029b040 Cid 1478.0ee8 Teb: 0000000000618000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable
- ffffc58fbbbd6a40 QueueObject
- THREAD ffffc58fc029a080 Cid 1478.0dd4 Teb: 000000000061b000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable
- ffffc58fbbbd6a40 QueueObject
- THREAD ffffc58fc0299080 Cid 1478.0bb4 Teb: 000000000061e000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
- ffffc58fbb954740 SynchronizationTimer
- THREAD ffffc58fc1693080 Cid 1478.19d8 Teb: 0000000000624000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
- THREAD ffffc58fbb7c5080 Cid 1478.19dc Teb: 0000000000627000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
- THREAD ffffc58fbf792640 Cid 1478.19e0 Teb: 000000000062a000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
- THREAD ffffc58fc173e080 Cid 1478.19e8 Teb: 000000000062d000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
As you can see, only the 0460 thread is in the RUNNING state, and this thread is located on core 2. Let’s take a look at the stack status of this thread:
- 0: kd> !thread ffffc58fbbefd080
- THREAD ffffc58fbbefd080 Cid 1478.0460 Teb: 000000000060c000 Win32Thread: 0000000000000000 RUNNING on processor 2
- IRP List:
- ffffc58fbf969ae0: (0006,0118) Flags: 00060070 Mdl: 00000000
- Not impersonating
- DeviceMap ffffdc8101436b60
- Owning Process ffffc58fbbbb5200 Image: QDoctor.exe
- Attached Process N/A Image: N/A
- Wait Start TickCount 14901 Ticks: 1 (0:00:00:00.015)
- Context Switch Count 1566 IdealProcessor: 1
- UserTime 00:00:00.000
- KernelTime 00:00:01.640
- Win32 Start Address 0x0000000077456020
- Stack Init ffffae8152e30c90 Current ffffae8152e2f640
- Base ffffae8152e31000 Limit ffffae8152e2b000 Call 0000000000000000
- Priority 10 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5
- Child-SP RetAddr : Args to Child : Call Site
- ffffae81`52e2f950 00000000`00000002 : 00000000`00000001 ffffae81`52e2fd60 ffffc58f`bc07cef0 00000000`00000000 : constantine64+0xc231de
- ffffae81`52e2fa48 00000000`00000001 : ffffae81`52e2fd60 ffffc58f`bc07cef0 00000000`00000000 fffff800`0cb0ca80 : 0x2
- ffffae81`52e2fa50 ffffae81`52e2fd60 : ffffc58f`bc07cef0 00000000`00000000 fffff800`0cb0ca80 fffff805`e3d30182 : 0x1
- ffffae81`52e2fa58 ffffc58f`bc07cef0 : 00000000`00000000 fffff800`0cb0ca80 fffff805`e3d30182 ffffae81`52e2fb70 : 0xffffae81`52e2fd60
- ffffae81`52e2fa60 00000000`00000000 : fffff800`0cb0ca80 fffff805`e3d30182 ffffae81`52e2fb70 00000000`00000002 : 0xffffc58f`bc07cef0
From the 18th line above, it can be seen that this thread is currently executing the instruction at constantine64+0xc231de, and the constantine64 module is our kernel module. In other words, only the 0460 thread is really working, while the other “working” threads are slacking off, such as the following thread:
- THREAD ffffc58fbbefe7c0 Cid 1478.03ec Teb: 0000000000609000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
Let’s see what the above thread is doing:
- 0: kd> !thread ffffc58fbbefe7c0
- THREAD ffffc58fbbefe7c0 Cid 1478.03ec Teb: 0000000000609000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable
- ffffc58fbc07cf70 SynchronizationEvent
- Not impersonating
- DeviceMap ffffdc8101436b60
- Owning Process ffffc58fbbbb5200 Image: QDoctor.exe
- Attached Process N/A Image: N/A
- Wait Start TickCount 13091 Ticks: 1811 (0:00:00:28.296)
- Context Switch Count 551 IdealProcessor: 0
- UserTime 00:00:00.000
- KernelTime 00:00:00.281
- Win32 Start Address 0x0000000077456020
- Stack Init ffffae8152e29c90 Current ffffae8152e294e0
- Base ffffae8152e2a000 Limit ffffae8152e24000 Call 0000000000000000
- Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
- Child-SP RetAddr : Args to Child : Call Site
- ffffae81`52e29520 fffff800`0c841cdc : ffffc400`0003b338 80000000`00000000 ffffc400`0003b338 fffff800`0c87c7c6 : nt!KiSwapContext+0x76
- ffffae81`52e29660 fffff800`0c84177f : ffffae81`52e297a0 fffff800`0c95b72a ffffc58f`bbbec500 00000000`00000000 : nt!KiSwapThread+0x17c
- ffffae81`52e29710 fffff800`0c843547 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x14f
- ffffae81`52e297b0 fffff800`0c8bcd03 : ffffc58f`bc07cf70 00000000`00000000 00007fff`ffff0000 00000000`00000000 : nt!KeWaitForSingleObject+0x377
- ffffae81`52e29860 fffff800`0cc7a07d : ffffc58f`bc07cef0 ffffae81`52e2994b ffffc58f`746c6644 ffffae81`52e29940 : nt!IopWaitForLockAlertable+0x43
- ffffae81`52e298a0 fffff800`0cc07e38 : ffffc58f`bc07cef0 ffffae81`52e29b80 00000000`0022e180 fffff800`0c87f4f2 : nt!IopAcquireFileObjectLock+0x59
- ffffae81`52e298e0 fffff800`0cc07286 : ffffc58f`bbefe7c0 00000000`00000000 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0xba8
- ffffae81`52e29a20 fffff800`0c95cc93 : ffffc462`000001d8 ffffc462`31000000 ffffc462`31188000 ffff9b7f`65f8e7e6 : nt!NtDeviceIoControlFile+0x56
- ffffae81`52e29a90 00000000`5947222c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffae81`52e29b00)
- 00000000`0329f138 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x5947222c
IdealProcessor shows that this thread’s ideal processor is 0, so in principle it should not compete with thread ffffc58fbbefd080, which is running on processor 2. However, after this thread called nt!NtDeviceIoControlFile it ended up in nt!KeWaitForSingleObject, and that wait caused the kernel to switch the thread out (nt!KiSwapThread).
The stack trace shows why thread ffffc58fbbefe7c0 entered the wait state: nt!IopAcquireFileObjectLock tried to acquire the lock of a file object, and the thread could not get it.
Let’s stop here again. If you have thought of the cause of the problem at this point, it means that your basics are quite solid~ ;)
So, what is the lock that nt!IopAcquireFileObjectLock is trying to acquire? Reverse engineering nt!IopXxxControlFile shows that the lock belongs to the first parameter of nt!IopAcquireFileObjectLock, which is a pointer to a _FILE_OBJECT:
- __int64 __fastcall IopXxxControlFile(......)
- {
- ......
- v19 = (_FILE_OBJECT *)Object;
- v51 = IopAcquireFileObjectLock(Object, v15, v50, v61);
- ......
- }
- __int64 __fastcall IopAcquireFileObjectLock(_FILE_OBJECT *Object, __int64 a2, __int64 a3, _BYTE *a4)
- {
- ....
- do
- {
- ....
- v7 = IopWaitForLockAlertable(&Object->Lock);
- ....
- }
- ....
- }
- NTSTATUS __fastcall IopWaitForLockAlertable(PVOID Object, KPROCESSOR_MODE a2, char a3)
- {
- ....
- do
- {
- ....
- result = KeWaitForSingleObject(Object, Executive, v7, v6, 0i64);
- }
- while (....);
- ....
- }
So, let’s take a look at this _FILE_OBJECT. According to the call stack, the first parameter is ffffc58f`bc07cef0. Let’s inspect it with WinDbg:
- 0: kd> !object ffffc58f`bc07cef0
- Object: ffffc58fbc07cef0 Type: (ffffc58fbb0ccf20) File
- ObjectHeader: ffffc58fbc07cec0 (new version)
- HandleCount: 1 PointerCount: 32760
- 0: kd> dt _FILE_OBJECT ffffc58f`bc07cef0
- ntdll!_FILE_OBJECT
- +0x000 Type : 0n5
- +0x002 Size : 0n216
- +0x008 DeviceObject : 0xffffc58f`bc4c9060 _DEVICE_OBJECT
- +0x010 Vpb : (null)
- +0x018 FsContext : (null)
- +0x020 FsContext2 : (null)
- +0x028 SectionObjectPointer : (null)
- +0x030 PrivateCacheMap : (null)
- +0x038 FinalStatus : 0n0
- +0x040 RelatedFileObject : (null)
- +0x048 LockOperation : 0 ''
- +0x049 DeletePending : 0 ''
- +0x04a ReadAccess : 0 ''
- +0x04b WriteAccess : 0 ''
- +0x04c DeleteAccess : 0 ''
- +0x04d SharedRead : 0 ''
- +0x04e SharedWrite : 0 ''
- +0x04f SharedDelete : 0 ''
- +0x050 Flags : 0x40002
- +0x058 FileName : _UNICODE_STRING ""
- +0x068 CurrentByteOffset : _LARGE_INTEGER 0x0
- +0x070 Waiters : 7
- +0x074 Busy : 1
- +0x078 LastLock : (null)
- +0x080 Lock : _KEVENT
- +0x098 Event : _KEVENT
- +0x0b0 CompletionContext : (null)
- +0x0b8 IrpListLock : 0
- +0x0c0 IrpList : _LIST_ENTRY [ 0xffffc58f`bc07cfb0 - 0xffffc58f`bc07cfb0 ]
- +0x0d0 FileObjectExtension : (null)
- 0: kd> dx -id 0,0,ffffc58fbeb24780 -r1 ((ntdll!_DEVICE_OBJECT *)0xffffc58fbc4c9060)
- ((ntdll!_DEVICE_OBJECT *)0xffffc58fbc4c9060) : 0xffffc58fbc4c9060 : Device for "\FileSystem\QAXANTIROOTKIT" [Type: _DEVICE_OBJECT *]
- [<Raw View>] [Type: _DEVICE_OBJECT]
- Flags : 0x40
- UpperDevices : None
- LowerDevices : None
- Driver : 0xffffc58fbb346950 : Driver "\FileSystem\QAXANTIROOTKIT" [Type: _DRIVER_OBJECT *]
The file object points to the device object ffffc58f`bc4c9060, which belongs to the driver \FileSystem\QAXANTIROOTKIT. This is the device created by the constantine64 driver mentioned earlier.
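Two details in the dump above can be checked mechanically. The sketch below uses the FO_SYNCHRONOUS_IO and FO_HANDLE_CREATED values as defined in the WDK header wdm.h, and the +0x080 offset of Lock printed by dt for this particular build; the helper names are made up for illustration.

```c
#include <stdint.h>

/* File object flag values as defined in the WDK header wdm.h. */
#define FO_SYNCHRONOUS_IO  0x00000002u
#define FO_HANDLE_CREATED  0x00040000u

/* Offset of _FILE_OBJECT.Lock taken from the dt output above; it is
 * specific to this Windows build, not a documented constant. */
#define FILE_OBJECT_LOCK_OFFSET 0x80u

/* Hypothetical helper: nonzero if the file object was opened for
 * synchronous I/O. */
int is_synchronous_io(uint32_t file_object_flags)
{
    return (file_object_flags & FO_SYNCHRONOUS_IO) != 0;
}

/* Hypothetical helper: address of the Lock event inside a file object. */
uint64_t file_object_lock_address(uint64_t file_object)
{
    return file_object + FILE_OBJECT_LOCK_OFFSET;
}
```

For the values in the dump, Flags 0x40002 decodes to FO_HANDLE_CREATED | FO_SYNCHRONOUS_IO, and the Lock of file object ffffc58fbc07cef0 sits at ffffc58fbc07cf70. That is the same SynchronizationEvent that seven threads in the earlier thread list are waiting on, which agrees with Waiters : 7 and Busy : 1.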
Analysis conclusion
Combining this with the pseudocode given at the beginning of the article, the current workflow is:
1. The entry function opens the device created by constantine64 and obtains a handle to it.
2. It starts the sub-tab worker thread and passes in the handle opened in step 1.
3. The sub-tab worker thread calls DeviceIoControl to request the corresponding function of constantine64.
4. The kernel routine behind DeviceIoControl calls nt!IopAcquireFileObjectLock to try to take the lock of the file object that the handle from step 1 refers to.
5. Because the handle was created by the entry function and is shared, the RUNNING thread already holds this file object’s lock, so the lock cannot be acquired.
6. nt!KeWaitForSingleObject is called to wait for the lock to be released.
7. The thread enters an alertable WAIT state.
8. The system switches to another thread.
Therefore, only after the RUNNING thread in step 5 releases the lock can the other worker threads continue. Going back to the scenario we assumed at the beginning: only after the “Object” worker thread completes its 60 seconds of work can the “Unload Module” worker thread get the lock and then finish its own work in 5 seconds. This is where the correct answer of 65 seconds comes from.
Although the program is designed to be multithreaded, improper use of a critical resource makes it run sequentially, as if it were single-threaded. In fact, Microsoft’s documentation already warns about this. According to the CreateFile API documentation (Reference 1), the description of the FILE_FLAG_OVERLAPPED value of the dwFlagsAndAttributes parameter says that when this flag is not specified, I/O operations on the handle are serialized:

Remember how we created the device handle at the beginning?
- DeviceHandle = CreateFile( deviceStr, GENERIC_READ | GENERIC_WRITE, \
- FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, \
- FILE_ATTRIBUTE_NORMAL, NULL);
We did not specify FILE_FLAG_OVERLAPPED, so all requests on this handle are synchronous IO, which is why the kernel needs to acquire a lock.
Problem solution
After the analysis above, the natural fix is to open the handle with FILE_FLAG_OVERLAPPED and use asynchronous IO. But this has a cost: both the user-mode (R3) program and the kernel-mode (R0) driver need to be modified, and the R0 driver also needs to start additional kernel threads to handle asynchronous requests, which doubles the number of background threads behind each function.
So, since the R3 program already starts a worker thread for each sub-tab request, can the R0 driver reuse that thread to do the kernel part of the work? The answer is yes, and the change is very simple. The rewritten code (again in pseudocode) is as follows:
- VOID subtab_worker_thread(DWORD *IoControlCode)
- {
- UCHAR Buffer[256] = {0};
- DWORD Length = 0;
- HANDLE DeviceHandle = NULL;
- LPCSTR deviceStr = "\\\\.\\KernelModule";
- // Each worker thread now opens its own handle.
- DeviceHandle = CreateFile(deviceStr, GENERIC_READ | GENERIC_WRITE,
- FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
- FILE_ATTRIBUTE_NORMAL, NULL);
- DeviceIoControl(DeviceHandle, *IoControlCode,
- NULL, 0, Buffer, sizeof(Buffer), &Length, NULL);
- // The business logic that follows depends on the sub-tab; it is not important here, so it is omitted.
- switch (*IoControlCode)
- {
- case func1:
- ....
- break;
- ....
- }
- CloseHandle(DeviceHandle);
- }
- void main()
- {
- MSG Message = {0};
- while (GetMessage(&Message, NULL, 0, 0) > 0) {
- begin_thread(subtab_worker_thread, 0, <Fill different IOCTLs according to different messages>);
- }
- }
As you can see, we only need to move the code that opens the device handle into the worker thread, simply and brutally, so that the handle becomes a local variable.
Here we need to stop again. Perhaps you have the same question the author had at this point. The keyword of the question is: reference count.
Since all our worker threads open the same \\.\KernelModule name, shouldn’t the kernel simply keep increasing the reference count of one \\.\KernelModule file object? If so, since the code still does not use FILE_FLAG_OVERLAPPED and every handle would refer to the same file object, it should still be pseudo-multithreading.
With this question in mind, we need to dig deeper into how the kernel works.
Brief analysis of the CreateFile function
The CreateFile function is probably one of the functions we use most often in everyday programming, so it is worth understanding in depth. Studying the WRK (Windows Research Kernel) source gives the following kernel call path:

As you can see, the ObpLookupObjectName function is called in the end. This function is quite complex, but its core is extremely simple: it finds the object with the given name in the Object Directory and then calls that object type’s ParseProcedure to produce the file object being created. Since this article deals with an object of type Device, we focus on the ParseProcedure of the Device object type. The Device object type is created in the IopCreateObjectTypes function, where we can find the system’s default ParseProcedure value.

From the above figure, we know that the default ParseProcedure of the Device object type is IopParseDevice. In the WRK source of IopParseDevice, the part that matters here is the call to ObCreateObject.

The simplified logic of IopParseDevice is as follows:
- NTSTATUS IopParseDevice(......, IN OUT PVOID Context OPTIONAL, ......)
- {
- ....
- POPEN_PACKET op;
- op = Context;
- realFileObjectRequired = !(op->QueryOnly || op->DeleteOnly);
- if (realFileObjectRequired) {
- ......
- status = ObCreateObject( KernelMode,
- IoFileObjectType,
- &objectAttributes,
- AccessMode,
- (PVOID) NULL,
- fileObjectSize,
- 0,
- 0,
- (PVOID *) &fileObject );
- ......
- }
- ......
- }
As you can see, when both op->QueryOnly and op->DeleteOnly are FALSE, realFileObjectRequired is TRUE, which means a corresponding file object needs to be created. According to the definition of OPEN_PACKET:

In the rewritten program, the open operation is clearly neither a query nor a delete, so it always reaches ObCreateObject. In other words, when the target of CreateFile is a device (or a symbolic link to a device), a new file object is always created (queries and deletions excepted).
Since every file object is newly created, the assumption raised before this section, namely that all the handles refer to the same file object, is incorrect. In fact, each worker thread gets its own newly created synchronous IO file object when it calls CreateFile. Because the object is new and only the current worker thread uses it, the thread never has to wait for the file object lock. This is the basis of this article’s fix for the “pseudo-multithreading” problem.
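The difference between one shared file object and one file object per thread can be illustrated with a toy model. The sketch below uses POSIX threads rather than Windows code: a mutex stands in for the file object lock taken by synchronous IO, and all names are hypothetical. It only demonstrates the locking argument, not the real I/O manager.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy model: "inside the driver" is the region where the lock is held. */
typedef struct {
    pthread_mutex_t   *lock;      /* lock of the file object behind the handle */
    pthread_barrier_t *barrier;   /* NULL when all threads share one lock      */
    atomic_int        *in_driver; /* threads currently "inside the driver"     */
    int                seen;      /* how many threads this one saw inside      */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *a = p;
    pthread_mutex_lock(a->lock);
    atomic_fetch_add(a->in_driver, 1);
    /* With private locks, wait until every thread is inside at once. */
    if (a->barrier) pthread_barrier_wait(a->barrier);
    a->seen = atomic_load(a->in_driver);
    /* Nobody leaves before everybody has taken its reading. */
    if (a->barrier) pthread_barrier_wait(a->barrier);
    atomic_fetch_sub(a->in_driver, 1);
    pthread_mutex_unlock(a->lock);
    return NULL;
}

/* Returns the largest number of threads observed inside the driver at the
 * same time, or -1 on allocation failure. nthreads must be at least 1. */
int peak_in_driver(int nthreads, int share_one_file_object)
{
    pthread_t *tid = malloc(sizeof *tid * (size_t)nthreads);
    worker_arg *arg = malloc(sizeof *arg * (size_t)nthreads);
    pthread_mutex_t *lock = malloc(sizeof *lock * (size_t)nthreads);
    pthread_barrier_t barrier;
    atomic_int in_driver = 0;
    int peak = 0;

    if (!tid || !arg || !lock) { free(tid); free(arg); free(lock); return -1; }
    if (!share_one_file_object)
        pthread_barrier_init(&barrier, NULL, (unsigned)nthreads);
    for (int i = 0; i < nthreads; ++i) {
        pthread_mutex_init(&lock[i], NULL);
        /* Shared handle: everyone uses lock[0]; otherwise one lock each. */
        arg[i].lock = share_one_file_object ? &lock[0] : &lock[i];
        arg[i].barrier = share_one_file_object ? NULL : &barrier;
        arg[i].in_driver = &in_driver;
        arg[i].seen = 0;
    }
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&tid[i], NULL, worker, &arg[i]);
    for (int i = 0; i < nthreads; ++i) {
        pthread_join(tid[i], NULL);
        if (arg[i].seen > peak) peak = arg[i].seen;
    }
    for (int i = 0; i < nthreads; ++i) pthread_mutex_destroy(&lock[i]);
    if (!share_one_file_object) pthread_barrier_destroy(&barrier);
    free(tid); free(arg); free(lock);
    return peak;
}
```

In this model, peak_in_driver(4, 1) is 1 because the shared lock admits one thread at a time, while peak_in_driver(4, 0) is 4 because each thread holds only its own lock. The barrier is used so that the second result does not depend on timing.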
DuplicateHandle
In actual programming, there is another handle-related API, namely DuplicateHandle. Tracing its code shows that ObCreateObject is never called from beginning to end. Instead, the ObReference and ObDereference families of functions are used extensively, together with the process handle table, to increase and decrease object reference counts:

Therefore, we can conclude that DuplicateHandle does not create a new object. Its only effect is to add a new entry to the target process’s handle table and make that entry point to an object that already exists in the kernel.
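The contrast between opening a device and duplicating a handle can be reduced to a toy model. This is only a sketch of the idea under the conclusions drawn above; the real object manager is far more involved, and every type and function name here is hypothetical.

```c
#include <stdlib.h>

/* Toy stand-in for a kernel file object. */
typedef struct {
    int reference_count;   /* how many handles point at this object */
} file_object;

/* Models CreateFile on a device: every open produces a new file object. */
file_object *model_open(void)
{
    file_object *obj = malloc(sizeof *obj);
    if (obj) obj->reference_count = 1;
    return obj;
}

/* Models DuplicateHandle: no new object, only one more reference. */
file_object *model_duplicate(file_object *obj)
{
    obj->reference_count += 1;
    return obj;
}

/* Models CloseHandle: drop one reference, free the object with the last one. */
void model_close(file_object *obj)
{
    if (--obj->reference_count == 0) free(obj);
}
```

In this model two calls to model_open return two distinct objects, while model_duplicate returns the object it was given. This suggests that a duplicated handle would share the original file object’s lock, so synchronous requests through it would still be serialized.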
Summary
Looking back at the whole debugging and analysis process, the problem in this article comes down to improper use of a critical resource. The critical resource involved here just happens to be a subtle one. Although Microsoft’s official documentation describes it, it is still easy to overlook in practice and to end up with code that does not behave as expected.
In fact, this article can be regarded as an analysis of the principles behind synchronous IO and asynchronous IO (Reference 2). Hopefully it helps.