String Hashing: Reverse Engineering an Anti-Analysis Control

String hashing is a method employed by malware authors to disguise strings that are critical to their malware’s (stealthy) execution, such as library, function, and/or process names. Being able to determine what these hashes represent can aid malware researchers in developing more robust anti-anti-analysis techniques, technologies, and detections.

To be clear, hashes are one-way, meaning that the original string cannot be computed directly from the hash alone. However, if we have the algorithm used to create the hash and some additional contextual knowledge, we can, in essence, guess or brute force the original value.

In this post, I will delve into one of GootKit’s anti-analysis methods and use what is learned to identify the true executable names that are associated with the precalculated hashes defined by the malware author. Knowing the executable names that GootKit tries to detect can help us build better anti-anti-analysis controls within our sandbox environments.

Note: “process” and “executable” are used interchangeably in this blog.

Gootkit Sample


MD5:  e561ae3cedb6f9fc0ecff559c62788b0
SHA256:  38933984f5ff8b71c054d1c1155e308ac02377b89315ef17cea859178a30dbab
VT:   link


MD5:  1a2e1964da566143ad274ee3720924b8
SHA256:  88d14d717468c984db02a032ff1b809d7998638fc4c731e17be7083d47b012e6
VT:   link

GootKit’s Anti-Analysis Capability

GootKit has to be one of the most thorough malware families in terms of its anti-analysis capabilities, which makes it a good use case for learning how to reverse engineer such capabilities. The anti-analysis capability that we will focus on is the one that detects whether any blacklisted analysis tools are currently running.

How it works

Figure 1: Execution of areMalwareAnalysisToolsRunning

First, a function – which I have labeled “areMalwareAnalysisToolsRunning” – is called at 0x40EED0 (Figure 1).

Figure 2: Inside areMalwareAnalysisToolsRunning

Once inside the areMalwareAnalysisToolsRunning function (Figure 2), we see a total of 45 different precalculated hashes that represent executable names of tools that could potentially be used for malware analysis. These are the hashes whose corresponding executable names we will attempt to find.

At this stage, this is a truth that I know because I have already performed analysis on this function. If you are doing this on your own, it might take you two, three, four+ times running through this kind of function before you fully start to understand what is happening.

The areMalwareAnalysisToolsRunning function:

  1. Places the first precalculated hash (0x278CDF58) into the EAX register at 0x40268C.
  2. Moves the remaining 44 precalculated hashes onto the stack.
  3. Enters a loop that iterates through each hash, passing it to the function labeled checkRunningProcesses4BlacklistedHash (via EAX register rather than pushing it to the stack).

So, what happens when the precalculated hash is passed to checkRunningProcesses4BlacklistedHash?

Figure 3: checkRunningProcesses4BlacklistedHash key loop

Well, this function obtains a list of processes currently running on the system via a call to ntdll.NtQuerySystemInformation (not shown). It then iterates through all of the process names (Figure 3), hashes the process name using a function labeled getHashOfUnicodeString (at 0x40559E), and then compares the precalculated hash to the hash of the current process name via CALL to a function labeled compareHashValues at 0x4055AF.

If the hashes match, GootKit concludes that a blacklisted application is running and, as a result, removes itself from the system and terminates its execution.
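The control flow described above can be sketched in a few lines of Python 3. This is only an illustration of the two nested checks, not the malware's code: the snake_case function names are mine, the made-up tool name EXAMPLETOOL.EXE is hypothetical, and zlib.crc32 is an assumed stand-in for getHashOfUnicodeString so that the sketch runs on its own.

```python
import zlib


def hash_name(name):
    # Stand-in for getHashOfUnicodeString: uppercase the name, then hash it.
    # ASSUMPTION: zlib.crc32 is used only to keep this sketch runnable.
    return zlib.crc32(name.upper().encode("ascii", "replace")) & 0xFFFFFFFF


def check_running_processes_for_hash(process_names, target_hash):
    # Mirrors checkRunningProcesses4BlacklistedHash: hash every process
    # name and compare it against ONE precalculated hash.
    return any(hash_name(name) == target_hash for name in process_names)


def are_analysis_tools_running(process_names, blacklist_hashes):
    # Mirrors areMalwareAnalysisToolsRunning: one pass over the process
    # list per precalculated hash.
    return any(
        check_running_processes_for_hash(process_names, h)
        for h in blacklist_hashes
    )


# Hypothetical data: a blacklist built from a made-up tool name.
blacklist = [hash_name("EXAMPLETOOL.EXE")]
print(are_analysis_tools_running(["explorer.exe", "exampletool.exe"], blacklist))
print(are_analysis_tools_running(["explorer.exe"], blacklist))
```

Note that the malware never needs the tool names themselves, only their hashes, which is exactly why the names have to be recovered by guessing.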

Putting Names to Hashes

Now that we understand what the areMalwareAnalysisToolsRunning function is doing, we need to take steps to identify the process names that are associated with all of the hashes specified by GootKit. I found this easiest to do in Python, but feel free to recreate it in whatever language you are most comfortable with.

Step 1: Recreate Hashing Function

The first step is to analyze the hashing algorithm used by GootKit to hash strings. This is the function labeled getHashOfUnicodeString being called in Figure 3 at 0x40559E.

Inside this function, the process name is converted from Unicode to ASCII and then its characters are converted to uppercase (not shown). This is an important bit of information because it eliminates lowercase characters as potential options for filenames.

Figure 4: Hashing Function

The formatted process name is then run through the algorithm shown above. We can easily recreate this within Python.

Figure 5: Assembly to Python – Outer Loop

On the left is the assembly version of the hashing function (Figure 5). On the right is the same function that I’ve recreated in Python. This function contains nested loops. I’ve placed the outer loop within the red box to show the relation between assembly and Python. This outer loop is simply responsible for iterating through each character of the process name (or any string that is passed to it).

Figure 6: Assembly to Python – Inner Loop

Figure 6 highlights the inner loop, which takes the current character provided to it by the outer loop, and passes it through the hashing algorithm 8 times.

We know it is 8 times because the value 8 is pushed to the stack at 0x4041EC and then popped into the EBX register at 0x4041EF. The inner loop then decrements EBX by 1 and checks to see if EBX is 0... meaning “Did we loop 8 times?”. If so, move on to the next character (outer loop). If not, run back through the hashing algorithm again (inner loop).

Figure 7: Assembly to Python – Register to Variable Parallels

For the purpose of this exercise, CPU registers are essentially variables. They hold values assigned to them. So, when recreating assembly within Python, I find it easiest to name my variables after the corresponding registers found within the assembly. Figure 7 helps you to see the parallels between the assembly instructions and their Python equivalents. For example:

  • “XOR EAX, ESI” in assembly equals “EAX ^ ESI” in Python
  • “SHR ESI, 1” in assembly equals “ESI >> 1” in Python

Aldeid has a decent reference for assembly instructions that manipulate data (i.e., ADD, SUB, SHR, AND, OR, XOR, etc.) and their Python equivalents.
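Since the actual script only appears in screenshots, here is a minimal Python 3 sketch of what such a nested-loop hash can look like. The constant 0xEDB88320, the outer per-character loop, the inner loop of 8 rounds, the XOR, and the right shift by 1 all come from the walkthrough above. The initial value, the final XOR, and the exact order of operations are assumptions on my part (I used the standard CRC-32 ones), and the names POLY, init, xor_out, and get_string_hash are mine, so check them against the disassembly before relying on the output.

```python
POLY = 0xEDB88320  # constant seen in the XOR instruction of the hashing loop


def get_string_hash(name, init=0xFFFFFFFF, xor_out=0xFFFFFFFF):
    # ASSUMPTION: init and xor_out are the standard CRC-32 values; the post
    # does not state what the sample actually uses.
    esi = init
    # The process name is converted to ASCII and uppercased before hashing.
    for byte in name.upper().encode("ascii", "replace"):  # outer loop
        esi ^= byte
        for _ in range(8):  # inner loop: 8 rounds per character
            if esi & 1:
                esi = (esi >> 1) ^ POLY
            else:
                esi >>= 1
    return esi ^ xor_out


print(hex(get_string_hash("procmon.exe")))
```

With these assumed defaults the function is a plain bitwise CRC-32, so it can be sanity-checked against any CRC-32 implementation. If the sample's hashes do not come out right, the init and xor_out parameters are the first things to vary.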

Step 2: Modify Script to Check Currently Running Processes

Now that we have recreated the hashing function within Python, we need to put it to use.

Figure 8: Running Processes – Python Implementation

Figure 8 depicts the first iteration of our script. Below is a quick description of each section:

  1. Line 3 imports the Python library needed to obtain the list of processes currently running on the host.
  2. Lines 5 thru 16 are the hashing function that we have already discussed (Figures 5, 6, and 7).
  3. Lines 18 thru 26 are an array consisting of the 45 precalculated hashes found within GootKit (Figure 2).
  4. Line 28 obtains the list of processes currently running on the host.
  5. Lines 30 thru 34 iterate through those process names, hash each one, and then check whether the resulting hash matches any of the precalculated hashes in the hash_arr array. If so, the script prints the process name along with the hash.
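The matching part of the steps listed above can be sketched roughly as follows in Python 3. This is not the script from Figure 8: find_matches and report are hypothetical helper names, zlib.crc32 is an assumed stand-in for the recreated hashing function, and the process names are passed in as a plain list because the post does not say which library the original script imports to enumerate them.

```python
import zlib


def get_string_hash(name):
    # ASSUMPTION: stand-in hash so the sketch runs on its own; swap in the
    # function recreated from the disassembly.
    return zlib.crc32(name.upper().encode("ascii", "replace")) & 0xFFFFFFFF


def find_matches(names, hash_arr):
    # Hash each name and keep the ones whose hash is in the target array.
    wanted = set(hash_arr)
    matches = []
    for name in names:
        string_hash = get_string_hash(name)
        if string_hash in wanted:
            matches.append((name.upper(), string_hash))
    return matches


def report(matches):
    # Same "Hash matches" layout as the script output, zero-padded hex.
    return ["Hash matches: %s --> %08X" % (name, h) for name, h in matches]


# Hypothetical data: one made-up target hash and a made-up process list.
hash_arr = [get_string_hash("TOOLA.EXE")]
for line in report(find_matches(["toola.exe", "notepad.exe"], hash_arr)):
    print(line)
```

Because find_matches accepts any iterable of names, the same function works for the list of running processes and for a hand-built list of candidate executable names.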

With this script, we have essentially mirrored the functionality of GootKit’s areMalwareAnalysisToolsRunning function, except that it prints out the value when a match is made instead of terminating execution. If we execute this on our analysis host, we should see the following (Figure 9 – output will vary based on what you have running on your host):

Figure 9: Running Processes – Python Implementation Output

Of the 45 precalculated hashes, we’ve now identified the true process name for 7 of the unique hashes; all of which are – indeed – tools that are/can be used for performing malware analysis… hence the label of “areMalwareAnalysisToolsRunning” that I applied to the main function depicted in Figures 1 and 2.

Figure 10: Precalculated Hashes Annotated w/ Corresponding Executable Names. First Pass

Figure 10 shows the hashes within the code (in IDA) updated with comments to reflect the corresponding process names… Only 38 to go.

Step 3: Modify Script to Add Array of Potential Executable Names

Now that we have confirmed that 1) the script works and 2) the hashes correspond to names of malware analysis tools/executables, the next step is to create an array of executable names commonly associated with malware analysis and pass said executable names through this same process.

With a bit of googling, I was able to compile the following array of executable names:


After adding this array to the existing script, I also add in a second loop that will iterate through each of these values:

for program in program_arr:
	string_hash = getStringHash(program)
	if string_hash in hash_arr:
		print "Hash matches: %s --> %s" % (program.upper(), hex(string_hash).upper().rstrip('L'))

Unfortunately, the code is now too large to fit it all into one screenshot. However, Figure 11 is a snippet showing the new additions (highlighted in red boxes):

Figure 11: Updated Python Code w/ Checks for Malware Analysis Tool Executable List

When executed, we get the following output (Figure 12):

Figure 12: Running Processes + Defined Tool List – Python Implementation Output

BOOM! 32 of the 45 executable names have now been identified. Figure 13 shows the updated progress within IDA:

Figure 13: Precalculated Hashes Annotated w/ Corresponding Executable Names. Second Pass

13 unidentified executable names remaining…

Step 4: We Brute Force

Brute forcing anything sucks. Before doing this, you might want to evaluate the list of executable names and make an educated guess as to other executables that you should add to your array. For example, we see PYTHON.EXE and PYTHONW.EXE on the list…. maybe PERL.EXE is a potential value that we’d want to add to the program_arr array and rerun the script?

If you do this, and you still have unresolved hashes, it’s time to bite the bullet and just brute force.

Figure 14: Python Brute Force Script – 3 Characters

Because brute forcing is much different from what we’ve done to this point, it’s probably best to create a new set of scripts to avoid confusion and unnecessary processing. Figure 14 shows this new brute force script, which has the following updates:

  1. Removed code related to the checking of currently running processes and the Malware Analysis Tool Executable array.
  2. Added an array of valid filename characters (Lines 3 thru 6).*
  3. Removed resolved hashes from the hash array (Lines 8 and 9).
  4. Added nested for loops that will generate the filenames (Lines 24 thru 30).**

* Yes, there are valid characters such as tilde (~) that I could have included, but I’m going to roll the dice and not include them. Each additional character in this array enlarges the search space considerably (an alphabet of k characters yields k^n candidate names of length n), so it is best to keep the array small. Also note that lowercase characters are not included. Thankfully, GootKit converts the executable names to uppercase, which eliminates the need to include lowercase letters. This significantly reduces the number of calculations that will need to be performed.

** These loops correspond to the length of the filename being brute forced (sans .EXE extension). So, in Figure 14 there are three for-loops, which will brute force filenames that are three characters long plus extension (ex. AAA.EXE, AAB.EXE, AAC.EXE, …).
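As an alternative to writing one script per filename length, the nested for-loops can be generalized with itertools.product. The Python 3 sketch below is my own variation, not the script from Figure 14: CHARSET is an assumed alphabet (the real one is only visible in the screenshot), brute_force and candidate_count are hypothetical names, and zlib.crc32 again stands in for the recreated hashing function.

```python
import itertools
import string
import zlib

# ASSUMPTION: uppercase letters, digits and two symbols (38 characters).
CHARSET = string.ascii_uppercase + string.digits + "_-"


def get_string_hash(name):
    # ASSUMPTION: stand-in hash so the sketch runs on its own.
    return zlib.crc32(name.upper().encode("ascii", "replace")) & 0xFFFFFFFF


def candidate_count(length, charset=CHARSET):
    # Number of names of the given length: len(charset) ** length.
    return len(charset) ** length


def brute_force(length, unresolved, charset=CHARSET, extension=".EXE"):
    # Equivalent to `length` nested for-loops over the character array.
    unresolved = set(unresolved)
    for combo in itertools.product(charset, repeat=length):
        candidate = "".join(combo) + extension
        string_hash = get_string_hash(candidate)
        if string_hash in unresolved:
            yield candidate, string_hash


# Hypothetical target: pretend the hash of PHP.EXE is still unresolved.
target = get_string_hash("PHP.EXE")
print(candidate_count(3))
for name, h in brute_force(3, [target]):
    print("Hash matches: %s --> %08X" % (name, h))
```

Taking the length as a parameter keeps a single script for every filename length, and separate processes can still be started with different lengths to run them side by side.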

Rather than brute force filename lengths in succession, I’ve created four additional scripts, similar to the one shown in Figure 14 but with an increasing number of for-loops implemented, that I will run in parallel to expedite the process.

Figure 15: Brute Force Script Output – 3 Characters

Figure 15 shows the output of the script shown in Figure 14 that brute forced filenames with 3 characters excluding the .EXE extension. When run, this script returns almost instantly and provides us with PHP.EXE, which is a valid executable that we had not previously identified. So, we add it to the list of resolved executable names that we have been building within IDA.

Figure 16: Brute Force Script Output – 4 Characters

The next script that brute forces 4 characters brings us more goodies (Figure 16)! After about one minute, the script returns EMUL.EXE, IMUL.EXE, PEID.EXE, and PERL.EXE, which are also all valid executables that could potentially be used in analyzing malware.

Figure 17: Brute Force Script Output – 5 Characters

The script that brute forces 5 characters took about an hour to run and returned two filenames that matched a missing hash (Figure 17). Unfortunately, these do not appear to be executable names that are associated with any tools that I am aware of. So, I am going to make the assumption that this is the result of hash collisions. Since I do not believe these to be valid executable names that GootKit is actually looking for, I will not note them within the resolved executable name list.
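A quick back-of-the-envelope estimate shows why collisions should be expected as the filenames get longer. The Python 3 sketch below treats the 32-bit hash as if it were uniformly random, which is only an approximation, and uses an assumed alphabet of 38 characters; expected_false_matches is a hypothetical helper name.

```python
def expected_false_matches(alphabet_size, length, unresolved_hashes, hash_bits=32):
    # Candidates tried, multiplied by the chance that a random hash value
    # lands on one of the unresolved target hashes.
    candidates = alphabet_size ** length
    return candidates * unresolved_hashes / float(2 ** hash_bits)


# ASSUMPTION: 38-character alphabet, one unresolved hash.
for length in range(3, 8):
    print(length, expected_false_matches(38, length, 1))
```

Under these assumptions the estimate is still far below one false match per unresolved hash at five characters, reaches about 0.7 at six, and grows by a factor of 38 with every character after that. So unknown-looking names among the results of the longer runs are not surprising.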

Figure 18: Brute Force Script Output – 6 Characters

As we increase the number of characters that we attempt to brute force, we exponentially increase the time it takes to iterate through all potential character combinations. The script in Figure 18 took about 1 day to complete and provided us with a bunch of results. Unfortunately, most of them appear to be hash collisions, with the exception of the three highlighted within the red box.

APISPY.EXE is 100% a valid tool.

ANGAR2.EXE and DH2LLV.EXE look like potentially valid executable names, but I couldn’t associate them with any known tools. For now, I will add them to the list as valid executable names and will remove them if they are later determined to be invalid. If anyone can help me confirm this, let me know!

Figure 19: Brute Force Script Output – 7 Characters

The script in Figure 19 (7 characters) has been running for 3 days now and it has only reached the CXXXXXX.EXE executable names. Rather than waiting a week+ for the script to finish, I’ve decided to just publish this blog and update it later if anything earth shattering comes of it. Much like the previous script, this script has produced a good number of hash collisions resulting in invalid executable names. The only valid executable name returned thus far is AUTOIT3.EXE… To the list it goes!


Using the methods that I’ve described in this blog, I was able to associate all but one (maybe three) of the precalculated hashes with their corresponding executable names (Figure 20).

Figure 20: Precalculated Hashes Annotated w/ Corresponding Executable Names. Final Pass

The one that is still missing is 0x7EC953AB. If you happen to also be doing similar research on GootKit and know what executable name this hash resolves to, please let me know. Also, if you are able to provide confirmation that the following are correct/incorrect, please do!

  • Hash matches: DH2LLV.EXE --> 278CDF58
  • Hash matches: ANGAR2.EXE --> 62B621C4

Surprisingly, one of the most difficult parts of this exercise was finding a comprehensive list of executable file names. This anti-analysis control that GootKit employs is one that is commonly used by malware, so you would think that such a list would have already been produced by researchers and would have been made readily available.

Well, it wasn’t. So, I have pushed the list I compiled to my GitHub (Link) so that others can take advantage… and if you have file names that should be added, let me know and we can add them. Also, since all of the scripts I’ve talked about in this post were displayed using screenshots, I’ve posted the actual scripts on my GitHub (Link) as well.

Finally, (malware) developers are typically lazy… and chances are they’ve stolen any type of ‘complex’ code from somewhere else. So, if you see something unique within an algorithm, such as the constant 0xEDB88320 used in the XOR instruction at 0x4941FA (shown in Figures 5, 6, and 7), searching for it might give you valuable information about what kind of algorithm was used. In this case, 0xEDB88320 is the reflected polynomial of the standard CRC-32 (Thanks, @herrcore!).
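One way to follow up on a constant like this is to test a library implementation against pairs you already have. The constant 0xEDB88320 is the reflected polynomial of the standard CRC-32, so the Python 3 sketch below checks whether zlib.crc32 reproduces the two name/hash pairs reported in this post. I am deliberately not stating the result here: both names are unconfirmed, and matches_standard_crc32 is a hypothetical helper name.

```python
import zlib

# Pairs reported in this post; both names are unconfirmed guesses.
REPORTED = {"DH2LLV.EXE": 0x278CDF58, "ANGAR2.EXE": 0x62B621C4}


def matches_standard_crc32(name, expected_hash):
    # True if the standard CRC-32 of the uppercased ASCII name equals
    # the expected hash value.
    actual = zlib.crc32(name.upper().encode("ascii")) & 0xFFFFFFFF
    return actual == expected_hash


for name, value in sorted(REPORTED.items()):
    print(name, matches_standard_crc32(name, value))
```

A True result would mean the hand-written hashing loop can be replaced by a much faster library call. A False result would not rule out a CRC variant with a different initial value or final XOR.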


