
This is an old article that was published on various platforms by JD Security while I was working at JD. While recently organizing my hard drive, I found the original Word file lying in a corner, so I cleaned it up into this blog post.
Background
Data leakage incidents have been frequent recently, and ransomware news regularly makes the headlines of the IT sections of major news sites. Network security issues are becoming increasingly prominent, and dealing with them has become the most urgent task for security practitioners. Data leakage, ransomware, hacker intrusion: the earlier such problems are discovered, the smaller the losses they cause. You have probably run into similar problems in your daily work, and in handling them you inevitably deal with all kinds of logs. Producing good logs lets security operations staff get twice the result with half the effort.
There are many log collection tools on the market; the following table compares several tools we researched:
| Name | Open source | Free | Cross-platform | Collection capability | Deployment difficulty |
|---|---|---|---|---|---|
| osquery | Yes | Yes (Apache-2.0) | Yes | Medium-high | Medium-high |
| audit | Yes | Yes (GPL & LGPL) | No | Medium-high | Medium |
| Windows event log | No | Yes (no separate charge) | No | Low | Easy |
| sysmon | No | Yes (own license) | No | Medium | Easy |
| QRadar | No | No | Yes | High | Unknown |
| zabbix | Yes | Yes (GPLv2) | Yes | High | Medium |
| nagios | Yes | Depends on edition | Yes | Medium | Medium-low |
| ObserveIT | No | No | Yes | High | Unknown |
After weighing deployment difficulty against collection capability, we used sysmon.exe, the monitoring tool from Sysinternals (hereinafter sysmon), in a certain project. After a period of use, however, we received feedback that sysmon sometimes shows high CPU and memory usage: typically sysmon saturates one logical core of the CPU, so on a dual-core, 4-thread CPU its usage sits at 25%, and its memory usage keeps climbing. The problem does not occur on every machine, but appears with a certain "randomness".
The following picture is a screenshot of the CPU and memory usage observed in Process Explorer when sysmon exhibits the problem (the screenshots in this article were taken in the debugging environment while writing it up, so the sysmon PID is not consistent across pictures).

Figure 1. High cpu and memory usage of sysmon
Problem localization and analysis
Analysis ideas
For the high CPU usage: the CPU executes program instructions, and a process can be broken down further into threads. High CPU usage ultimately shows up as one or more threads continuously consuming CPU time, so we should first look at the threads' performance indicators.
The high memory usage is slightly more complicated to locate. Since a program stores all kinds of data in memory while running, it is hard to tell what the extra memory is being used for. But that does not mean there is nowhere to start: a program still exposes many key performance indicators that indirectly reflect its memory use, and the debugger provides rich memory statistics commands that let us understand the program's memory usage at runtime.
Preliminary problem localization
Following the ideas above, observe the CPU usage of each sysmon thread in Process Explorer, as shown in the following figure:

Figure 2. Red is thread destruction, green is thread creation
It turned out that no individual thread's CPU usage was high; instead, a large number of new threads were being created while a large number of old threads were being destroyed. Evidently the CPU time slices were mostly being spent on thread creation and destruction.
For the memory problem, looking at sysmon's performance page in Process Explorer shows that the handle count is abnormally high, reaching the tens of thousands in a very short time, as shown in the red box in the following figure:

Figure 3. Abnormal number of handles
So the rest of the analysis will revolve around the massive creation and destruction of threads and the abnormal creation of handles.
Locating the problem in the debugger
To corroborate what we observed in Process Explorer, we need to confirm it in the debugger. The basic idea: start from the thread-creation action and walk back up the call stack until we find the function where the threads are created. The next two subsections collect some preliminary information along these lines.
Locating the massive thread creation and destruction
Use the following command to print the stack trace of all sysmon threads:
- ~*kb
The obtained stack trace is as follows:

Figure 4. Locate the problematic function
From Figure 2 above, we know the created threads all start at the address sysmon.exe+0x494a8. Following this clue, the stack return address sysmon+0x495aa in Figure 4 falls within the function that begins at sysmon.exe+0x494a8. Tracing back along the stack from sysmon+0x495aa finally identifies sysmon+0x1820e as the problematic function.
We will pause the analysis here and verify later whether it is correct. Next, let's follow up on the second problem: high memory usage.
Locating the high memory usage
As Figure 3 shows, sysmon created a large number of handles within a short period. So I investigated what these handles were in the debugger, as shown in the figure:

Figure 5. Global statistics of handles
Figure 5 shows that in just a few minutes, the handle count grew from the 45,000-odd in Figure 3 to 72,942, of which Event handles account for 72,878, as much as 99%, and all of these events require manual reset. This implies something: until the program manually resets the state of these events, they are not released. When events are created on this scale in a short time and never released, memory usage naturally climbs over time.
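For readers who have not used Windows manual-reset events: such an event stays signaled until the program explicitly resets it. Python's threading.Event has the same manual-reset semantics, so a small illustrative sketch (not sysmon's code) can show the behavior:

```python
import threading

# threading.Event is manual-reset: once set, it stays signaled for every
# waiter until someone explicitly calls clear() - much like a Windows
# manual-reset event, which stays signaled until ResetEvent is called.
ev = threading.Event()
results = []

def waiter(name):
    ev.wait()             # blocks until the event is signaled
    results.append(name)  # runs after set(); the event is NOT auto-reset

threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

ev.set()                  # one set() releases ALL waiters
for t in threads:
    t.join()

print(len(results))       # 3: every waiter ran
print(ev.is_set())        # True: still signaled, nobody reset it
```

The point of the analogy: a manual-reset event keeps its state until someone acts on it, which is why tens of thousands of them sitting untouched translate directly into retained memory.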
Confirm the problem in the debugger
Confirming the high memory usage
The analysis in the previous section gave us an initial fix on the cause of the high CPU and memory usage. Next, let's determine whether that really is the cause. Since you have just finished reading about the high memory usage, let's start with that problem.
Anyone familiar with Windows programming knows that creating an event usually means calling the CreateEvent function. So we set the following breakpoint on it:
- bu kernel32!CreateEventW "k;g"
To explain this breakpoint: it breaks at the start of every call to the CreateEventW function in kernel32.dll, prints the call stack, and then resumes execution of the program.
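As an aside, this "break, log the stack, continue" pattern can be mimicked in other environments too. Below is an illustrative Python sketch using sys.settrace, where create_event is a hypothetical stand-in for CreateEventW (it is my name, not a real API):

```python
import sys
import traceback

calls = []

def create_event():  # hypothetical stand-in for CreateEventW
    return object()

def tracer(frame, event, arg):
    # Analogue of `bu kernel32!CreateEventW "k;g"`: on every call to the
    # target function, record a stack snapshot, then let execution continue.
    if event == "call" and frame.f_code.co_name == "create_event":
        calls.append(traceback.format_stack(frame.f_back))
    return None  # no per-line tracing needed

sys.settrace(tracer)
for _ in range(5):
    create_event()
sys.settrace(None)

print(len(calls))  # 5: one stack snapshot per call
```

The collected snapshots play the same role as the `k` output in WinDbg: they tell you who is calling the function of interest, without stopping the program.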
OK, the following picture is a part of the execution result:

Figure 6. Get stack trace call when creating Event
Figure 6 covers roughly 5 seconds. Judging from the right scroll bar, there are probably thousands of entries, that is, thousands of Events created in 5 seconds, which is consistent with the handle surge we observed above. Scrolling up through each stack trace, it is the code near sysmon+0x181b7 that triggers the CreateEventW call. Let me also plant a clue here: pay attention to the OpenTraceW function.
There is another thing to note: the last return address within the sysmon module in Figure 6, sysmon+0x181b7, is close to the address sysmon+0x1820e from Figure 4 that keeps creating new threads. As for the causal relationship between these two addresses, see the analysis below.
Confirming the massive thread creation and destruction
We just noted that sysmon+0x181b7 and sysmon+0x1820e are very close; the following figure gives a visual sense of just how close these two addresses are:

Figure 7. sysmon+181b7 and sysmon+1820e
So, according to the evidence we currently have, the story is likely to be like this:
- Create a large number of threads
- A large number of Events were created in the newly created threads
- The created threads exited, but because the Events require Manual Reset, they were not released
There is no free lunch: all those Events have to be stored in memory. Thus, creating threads en masse drives the CPU usage up, and creating masses of unreleased Events drives the memory usage up.
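The story above can be simulated in miniature. The sketch below is illustrative only (threading.Event stands in for the Event handles created via CreateEventW): short-lived worker threads each create an object that is kept alive in a global table, so the objects pile up long after the threads themselves are gone.

```python
import threading

# Illustrative simulation of the pattern described above (not sysmon's
# actual code): each short-lived worker creates an Event that is kept
# alive globally and never released.
unreleased_events = []
lock = threading.Lock()

def worker():
    ev = threading.Event()  # stand-in for a CreateEventW handle
    with lock:
        unreleased_events.append(ev)  # never "closed": outlives the thread

for _ in range(1000):
    t = threading.Thread(target=worker)
    t.start()
    t.join()  # the thread is created and destroyed each iteration

print(len(unreleased_events))  # 1000: every event outlived its thread
```

The thread churn burns CPU (creation and destruction each iteration), while the surviving events account for the steadily rising memory, mirroring the two symptoms observed in Process Explorer.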
At this point you might say: all of the above is guesswork, where is the evidence? Next we verify the guess in the debugger. Before starting, let me explain the idea. The conclusion so far is that the newly created threads cause the new Events to be created. So we set two breakpoints: the first at sysmon+0x1820e, the second on the system's CreateEventW function. If the conclusion is right, the two breakpoints should be hit alternately. Since one breakpoint sits in a thread function of the sysmon module and the other in the system's kernel32.dll, there is in theory no necessary connection between them; so if they really are hit alternately, that verifies our guess.
Now for the actual operation. First, the two breakpoints:

Figure 8. Two breakpoints
Some explanation: the value at the very start of each line is the breakpoint number, 0 for the 0th breakpoint and 1 for the 1st; 'e' means both breakpoints are enabled; next come kernel32!CreateEventW and sysmon+0x1820e, the breakpoint addresses (see the analysis above); finally, the .echo inside the double quotes means that when the breakpoint is hit, the current breakpoint number and the call stack are printed, and then execution resumes. Below is the hit pattern:

Figure 9. Breakpoints hit alternately
As you can see, the behavior matches our conjecture, confirming that the guess was correct. Still, more skeptical readers may say this is only indirect evidence. So next we face the problem head-on and explain it with code.
The fundamental evidence for the occurrence of this problem
The two addresses above (sysmon+0x181b7 and sysmon+0x1820e) both lie inside the function sysmon+0x180E0. Here is the pseudocode of that function:

Figure 10. The entry function of the thread causing high CPU usage
Let's talk about the key logic here. Note the call at line 44 of Figure 10 to the function sub_140018360; the enclosing function is sub_1400180e0. The meaning of this code snippet is that, under the control of a certain global flag, it may call sub_140018360, passing it an argument of 1.
Next, let’s see what the function sub_140018360 does. For convenience of explanation, the pseudocode has been partially folded:

Figure 11. The place that causes recursive calls
Pay attention to the thread-creation call at line 69: when the parameter of sub_140018360 is 1 and no error occurs, execution reaches the thread creation at line 69, and the entry function of the new thread is sub_1400180e0. Look familiar? If not, see Figure 10. In other words, sub_1400180e0 ends up calling sub_140018360, sub_140018360 internally creates a thread, and the created thread calls back into sub_1400180e0. As the analysis of Figure 6 showed, it is precisely the OpenTraceW call inside sub_1400180e0 that creates the huge number of Events and makes memory soar. This is effectively a recursive loop; left to itself, it never ends.
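To make the control flow concrete, here is a bounded Python sketch of that mutual recursion, with thread_entry and spawn_retry as hypothetical stand-ins for sub_1400180e0 and sub_140018360 (the names and the event stand-in are mine). The crucial difference is the explicit depth limit; sysmon's retry loop has none:

```python
import threading

events_created = []
MAX_DEPTH = 5  # sysmon effectively has no such limit - that is the bug

def spawn_retry(depth):
    # role of sub_140018360: creates a new thread whose entry point
    # is thread_entry, i.e. the function that called us
    t = threading.Thread(target=thread_entry, args=(depth,))
    t.start()
    t.join()

def thread_entry(depth):
    # role of sub_1400180e0: does work that creates an Event (here a
    # stand-in for the events behind OpenTraceW), then on "failure"
    # calls spawn_retry, which creates another thread running this code
    events_created.append(threading.Event())
    if depth < MAX_DEPTH:  # sysmon: retries forever
        spawn_retry(depth + 1)

thread_entry(0)
print(len(events_created))  # 6: one event per level, depths 0..5
```

Remove the depth check and the loop runs until resources are exhausted: each cycle burns CPU on thread creation and destruction and leaves another batch of events behind, exactly the two symptoms analyzed above.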
At this point the picture is very clear. In order to obtain some Trace result (the clue planted above: the OpenTraceW function), sysmon uses recursion-like logic to keep retrying. It is precisely because:
- Recursive call to create a thread, causing the CPU usage to soar
- The thread created above creates a large number of Events through OpenTraceW, causing the memory to soar
So far, the direct cause of sysmon’s high CPU and memory usage has been found.
What caused this problem to occur
During the debugging process, the following phenomena were found:
- sysmon restarts once, and the restarted sysmon shows high CPU and memory usage
- Before sysmon restarts, TsService.exe spawns several processes, which exit very quickly
At this point you may wonder: what is TsService.exe? The answer is that it is the background service process of QQ Browser. The evidence is below (the executable file carries a valid Tencent signature):

Figure 12. TsService.exe is the service process of QQ Browser
Then I looked into what the processes started by TsService.exe were:

Figure 13. The subprocesses started by TsService.exe
Of these, we need to pay attention to the several xperf subprocesses started by TsService.exe.
A digression: while researching this problem, I came across an article, "The Road to Performance Improvement of QQ Browser: the Windows Performance Analysis Tool", which explains the xperf command from Figure 13 in detail:

Figure 14. Tencent's explanation of the xperf parameters
And the pitfalls they ran into:

Figure 15. The startup failure mentioned in the text
The article mentions that Process Monitor and Process Explorer cause xperf to fail to start, and must be closed and relaunched first. In my tests, sysmon also causes xperf.exe to throw the same error as in the figure above.
Another digression: xperf is a performance information collection tool provided by Microsoft. With it, you can diagnose many program performance problems on Windows.
Remember the two phenomena mentioned at the start of this section: sysmon shows high usage after restarting. Could that have something to do with the xperf launched by QQ Browser?
Reproducing the problem on an arbitrary machine
From the above we already have a preliminary idea of the cause. I then manually simulated the xperf command sequence started by TsService.exe along with the exit-and-restart of the sysmon process, and successfully reproduced sysmon's high CPU and memory usage on an arbitrary machine. The specific steps are:
- Ideally, start from a freshly installed, clean system (optional)
- Install the sysmon service: run sysmon -i from a cmd prompt with administrator privileges
- Stop the sysmon service (otherwise the xperf command reports an error)
- Use xperf.exe to execute the following commands in turn:
- xperf.exe -stop
- xperf.exe -start -on disk_io+disk_io_init+filename+proc_thread
- Restart the sysmon service
After the above steps, the newly started sysmon service will fully occupy one logical core, and memory usage will slowly climb.
If, after sysmon's CPU usage has gone up, you execute
- xperf.exe -stop
then sysmon's usage returns to normal.
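For convenience, the reproduction steps can be collected into a Windows batch sketch. Two assumptions here are mine, not from the steps above: the service installs under the default name "Sysmon" (so net stop/net start can address it), and the script runs in an elevated prompt.

```shell
:: Sketch of the reproduction sequence (assumptions: elevated prompt,
:: default service name "Sysmon").
sysmon -i

:: Stop the service first, otherwise xperf reports an error
net stop Sysmon

:: Occupy the NT Kernel Logger session with the flags QQ Browser uses
xperf.exe -stop
xperf.exe -start -on disk_io+disk_io_init+filename+proc_thread

:: Restart sysmon: it now spins trying to reacquire the kernel logger
net start Sysmon

:: To recover afterwards, release the kernel logger again:
:: xperf.exe -stop
```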
Who shut down sysmon?
From the above analysis we know the root cause of the problem: after sysmon restarts, the resource it needs (the NT Kernel Logger) is already occupied, so the restarted sysmon service cannot obtain it and keeps retrying. But what caused sysmon to exit in the first place?
Combining the analysis above, I focused on the TsService.exe service. By setting a breakpoint on the OpenServiceW function in the TsService.exe process, I finally located the problem in the PerfTool.dll module:

Figure 16. Close sysmon service
Solutions to this problem (partial)
Scheme one
Note: based on the analysis above, a few solutions are proposed here; they are not the final or only fixes.
Lines 43 and 44 of Figure 10 show that before the function causing the "recursion" is called, there is an if check. The check takes the return value of sub_14008da0 (which I have renamed fnReadRegistryValue) and ANDs it with 1; the "recursive" logic runs only when the result is nonzero. Next, let's see what fnReadRegistryValue does:

Figure 17. Read the configuration item in the registry
As can be seen, the function's return value is Data, and the data in Data comes from the following registry value:
- HKLM\System\CurrentControlSet\Services\SysmonDrv\Parameters\Options
If ANDing this value with 1 yields 0, the "recursive" logic ends. In other words, if the value stored there is even, the "recursive" calls between these two functions can never drive CPU and memory usage up.
So what is this value for? Sysmon's help documentation shows that sysmon is a command-line tool with startup parameters; combined with the name Options, it follows that different sysmon monitoring parameters produce different values here. Later experiments confirmed that this value does indeed vary with sysmon's startup parameters.
Therefore, one feasible workaround: when CPU usage is pegged and memory is slowly rising, temporarily set this value to 0 so the program breaks out of the "recursive" logic, then change it back to the original value so the monitoring requirements are not affected.
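The condition from Figure 10 boils down to a parity test on the Options value. A tiny Python sketch (the function name is mine, not sysmon's) captures exactly what scheme one exploits:

```python
# Illustrative sketch of the check at lines 43-44 of Figure 10: the
# "recursive" retry path is taken only when the Options registry value
# ANDed with 1 is nonzero, i.e. when the value is odd.
def retry_path_taken(options_value: int) -> bool:
    return (options_value & 1) != 0

print(retry_path_taken(5))  # True: odd value, the "recursion" continues
print(retry_path_taken(0))  # False: even value, the "recursion" stops
print(retry_path_taken(4))  # False: any even value works
```

Setting Options to 0 (or any even value) makes the check false, which is why the temporary registry edit breaks sysmon out of the loop.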
Scheme two
As Figure 16 shows, TsService.exe closes sysmon using a hard-coded service name, while sysmon supports installing its service under a different name. So another feasible workaround is to install the sysmon service under a different name, preventing TsService.exe from closing it.
Summary
Monitoring requirements arise for all sorts of reasons, and doing monitoring well is not easy, as the sysmon performance problem analyzed here shows. I had assumed that sysmon, an "official" tool from Microsoft's Sysinternals, would be rock solid. But as software running environments grow ever more complex, a tool's operating assumptions can be violated, and when that happens it may disrupt the program's normal operation or, in the worst case, bring the system down.
For monitoring projects, a corresponding operations backend is generally needed to watch the system's basic indicators. Once an anomaly in those indicators is detected, an alarm should be raised so operations staff can step in promptly and contain the loss. Better still, automatic recovery mechanisms can be added beforehand to nip problems in the bud.
As far as I know, sysmon is still a widely used monitoring tool. I hope the performance problems described in this article, and the ideas and methods used to solve them, are helpful to you.