Well, it’s 2017 and I’m writing about DLL injection. It could be worse. DLL injection is a technique used by legitimate software to add or extend functionality in other programs, and for debugging or reverse engineering. It is also commonly used by malware in a multitude of ways. This means that from a security perspective, it’s imperative to know how DLL injection works.
I wrote most of the code of this small project, called ‘injectAllTheThings’, a while ago when I started developing custom tools for Red Team engagements (in order to emulate different types of threat actors). If you want to see some examples of threat actors using DLL injection, have a look here. You may also find this project useful if you want to learn about DLL injection. The internet is full of crap when you look for this kind of information/code, and my code might not be better. I’m not a programmer, I just hack code when I need to. Anyway, I’ve put together multiple DLL injection techniques (7 different ones, actually) in a single Visual Studio project. They work for both 32 and 64 bits, and the code is meant to be very easy to read and understand. Some friends showed interest in the code, so it might interest you too. Every technique has its own source file to keep things simple.
Below is the output of the tool, showing all the options and techniques implemented.
According to @SubTee, DLL injection is lame. I tend to agree; however, DLL injection goes way beyond simply loading a DLL.
You can indeed load DLLs with signed Microsoft binaries, but you won’t attach to a certain process to mess with its memory. The reason why most penetration testers don’t actually know what DLL injection is, or how it works, is that Metasploit has spoiled them too much. They use it all the time, blindly. The best place to learn about this ‘weird’ memory manipulation stuff is actually game hacking forums, I believe. If you are into Red Teaming you might have to get ‘dirty’ and play with this stuff too. Unless you are happy to just run some random tools other people have written.
Most of the time we start a Red Team exercise using highly sophisticated techniques, and if we stay undetected we start lowering the level of sophistication. That’s basically when we start dropping binaries on disk and playing with DLL injection.
This post attempts to give an overview of DLL injection in a very simple and high level way, and at the same time serves as “documentation” support for the project hosted at GitHub.
DLL injection is basically the process of inserting/injecting code into a running process. The code we inject is in the form of a dynamic-link library (DLL). Why? DLLs are meant to be loaded as needed at run time (like shared libs in UNIX). In this project I’ll be using DLLs only; however, we can actually ‘inject’ code in many other forms (any PE file, shellcode/assembly, etc., as commonly seen in malware).
Also, keep in mind that you need to have an appropriate level of privileges to start playing with other processes’ memory. However, I won’t be talking about protected processes and Windows privilege levels (introduced with Vista). That’s a completely different subject.
Again, as I said above, DLL injection can be used for legitimate purposes. For example, antivirus and endpoint security solutions use these techniques to place their own code/hooks into “all” running processes on the system. This enables them to monitor each process while it’s running, and better protect us. There are also malicious purposes. A common technique was injecting into the ‘lsass’ process to obtain password hashes. We all have done that. Period. Obviously, malware also uses code injection techniques extensively, whether to run shellcode, run PE files, or load DLLs into the memory of another process to hide itself, among other things.
We’ll be using the MS Windows API for every technique, since it offers a considerable number of functions that allow us to attach to and manipulate other processes. DLLs have been the cornerstone of MS Windows since the first version of the operating system. In fact, all the functions in the MS Windows API are contained in DLLs. Some of the most important are ‘Kernel32.dll’ (which contains functions for managing memory, processes, and threads), ‘User32.dll’ (mostly user-interface functions), and ‘GDI32.dll’ (functions for drawing graphics and text display).
You might be wondering why such APIs exist. Why would Microsoft give us such a nice set of functions to play and mess with other processes’ memory? The main reason is to extend the features of an application. For example, a company creates an application and wants to allow other companies to extend or enhance it. So yes, it has a legitimate purpose. Besides, DLLs are useful for project management, conserving memory, sharing resources, and so on.
The diagram below tries to illustrate the process flow of almost every DLL injection technique.
As you can see above, I would say DLL injection happens in four steps:
- Attach to the target/remote process
- Allocate memory within the target/remote process
- Copy the DLL Path, or the DLL, into the target/remote process memory
- Instruct the process to execute the DLL
All these steps are accomplished by calling a certain set of API functions. Each technique requires its own setup and options, and I would say that each technique has its positives and negatives.
We have multiple options to instruct a process to execute our DLL. The most common ones are probably ‘CreateRemoteThread()’ and ‘NtCreateThreadEx()’. However, it’s not possible to just pass a DLL as a parameter to these functions. We have to provide a memory address that holds the execution starting point. For that, we need to perform memory allocation, load our DLL with ‘LoadLibrary()’, copy memory, and so on.
The project I called ‘injectAllTheThings’ (because I just hate the name ‘injector’, plus there are already too many crappy ‘injectors’ on GitHub, and I couldn’t think of anything else), includes 7 different techniques. I’m not the original author of any of the techniques. I just compiled, and cleaned, these seven techniques (yes, there are more). Some are well documented (like ‘CreateRemoteThread()’), others use undocumented APIs (like ‘NtCreateThreadEx()’). Here’s a complete list of the techniques implemented, all working for both 32 and 64 bits.
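- CreateRemoteThread()
- NtCreateThreadEx()
- QueueUserAPC()
- SetWindowsHookEx()
- RtlCreateUserThread()
- Code cave via SetThreadContext()
- Reflective DLL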
You might know some of these techniques by other names. This isn’t a complete list of every DLL injection technique around. As I said, there are more, and I might add them later if I have to play with them for a certain project. For now, this is the list of techniques I have used in some projects. Some are stable, some aren’t. Maybe the unstable ones are unstable because of my own code. You have been warned.
As stated on MSDN, the ‘LoadLibrary()’ function “loads the specified module into the address space of the calling process. The specified module may cause other modules to be loaded”.
lpFileName [in] The name of the module. This can be either a library module (a .dll file) or an executable module (an .exe file). (...) If the string specifies a full path, the function searches only that path for the module. If the string specifies a relative path or a module name without a path, the function uses a standard search strategy to find the module (...) If the function cannot find the module, the function fails. When specifying a path, be sure to use backslashes (\), not forward slashes (/). (...) If the string specifies a module name without a path and the file name extension is omitted, the function appends the default library extension .dll to the module name. (...)
In other words, it takes a filename as its only parameter and everything works. That is, we only need to allocate some memory for the path of our DLL and set our execution starting point to the address of ‘LoadLibrary()’ function, passing the memory address of the path as a parameter.
As you may or may not know, the big issue here is that ‘LoadLibrary()’ registers the loaded DLL with the process, meaning it can be easily detected. You might be surprised that many endpoint security solutions still fail at this. Anyway, as I said before, DLL injection has legitimate use cases too, so… Also, note that if a DLL has already been loaded with ‘LoadLibrary()’, loading it again will not run its entry point again; the loader only bumps the module’s reference count. You might work around this, but I didn’t do it for any of the techniques.
With Reflective DLL injection you don’t have this problem, of course, because the DLL is not registered. Instead of using ‘LoadLibrary()’, the Reflective DLL injection technique copies the entire DLL into memory and then determines the offset of the DLL’s entry point in order to load it. Call it more stealthy if you want. Forensics guys will still be able to find your DLL in memory, but it won’t be that easy. Metasploit uses this massively, yet most endpoint solutions are still happy with all this. If you feel like hunting for this kind of stuff, or you are on the ‘blue’ side of the game, have a look here and here.
As a side note, if you are really struggling with your endpoint security software being fine with all this… you might want to try to use some gaming anti-cheat engine instead (note, I’m only trying to be funny in case you didn’t get it). The anti-rootkit capabilities of some anti-cheat engines are way more advanced than those of some AVs. There’s a really cool interview with Nick Cano, author of the “Game Hacking” book, on reddit that you must read. Just check what he has been doing and you’ll understand what I’m talking about.
Attach to the target/remote process
For a start, we need a handle to the process we want to interact with. For this we use the ‘OpenProcess()’ API call.
If you read the documentation on MSDN you’ll see that we need to request a certain set of access rights. A complete list of access rights can be found here.
These might vary across MS Windows versions. The following is used across almost every technique.
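As a rough illustration of such a request: MSDN’s ‘CreateRemoteThread()’ page lists five access rights that the process handle must carry, and the sketch below ORs them into a single mask. The helper names `injectionAccessMask` and `openTarget` are made up for this example, the numeric constants are only spelled out by hand when the Windows headers are missing (so the mask logic compiles anywhere), and this is not necessarily the exact mask used in the project.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
// Values as defined in winnt.h, repeated here only so the sketch
// compiles without the Windows SDK.
constexpr std::uint32_t PROCESS_CREATE_THREAD     = 0x0002;
constexpr std::uint32_t PROCESS_VM_OPERATION      = 0x0008;
constexpr std::uint32_t PROCESS_VM_READ           = 0x0010;
constexpr std::uint32_t PROCESS_VM_WRITE          = 0x0020;
constexpr std::uint32_t PROCESS_QUERY_INFORMATION = 0x0400;
#endif

// Hypothetical helper: the rights MSDN lists for a handle that will be
// passed to CreateRemoteThread().
std::uint32_t injectionAccessMask() {
    return PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
           PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ;
}

#ifdef _WIN32
// Hypothetical helper: returns NULL on failure; call GetLastError() for details.
HANDLE openTarget(DWORD pid) {
    assert(pid != 0);
    return OpenProcess(injectionAccessMask(), FALSE, pid);
}
#endif
```

Requesting only the rights you need, rather than ‘PROCESS_ALL_ACCESS’, tends to fail less often when the caller is not highly privileged.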
Allocate memory within the target/remote process
In order to allocate memory for the DLL path we use ‘VirtualAllocEx()’. As stated in MSDN, ‘VirtualAllocEx()’ “reserves, commits, or changes the state of a region of memory within the virtual address space of a specified process. The function initializes the memory it allocates to zero.”
Basically, we’ll do something like this:
Or you could be a bit smarter and use the ‘GetFullPathName()’ API call. However, I don’t use this API call anywhere in the project. Just a matter of preference, or of not being smart.
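Whatever way the path is obtained, the allocation has to be sized in bytes and must include the string terminator. Here is a minimal sketch of that calculation, assuming the wide-character (‘LoadLibraryW()’) flavour of the path; `pathBufferBytes` is a hypothetical helper name, not something from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Bytes needed to hold a wide-character path, including the terminating L'\0'.
// This is the kind of value one would hand to VirtualAllocEx() as dwSize and
// to WriteProcessMemory() as nSize.
std::size_t pathBufferBytes(const wchar_t* dllPath) {
    assert(dllPath != nullptr);
    return (std::wcslen(dllPath) + 1) * sizeof(wchar_t);
}
```

Note that `sizeof(wchar_t)` is 2 on Windows, so `L"C:\\a.dll"` (8 characters) needs 18 bytes there. Forgetting the `+ 1` leaves the remote string unterminated, which usually works by accident because the freshly allocated memory is zeroed.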
If you want to allocate space for the full DLL, you’ll have to do something like:
Copy the DLL Path, or the DLL, into the target/remote process’ memory
Now it’s just a matter of copying our DLL Path, or the full DLL, into the target/remote process by using the ‘WriteProcessMemory()’ API call.
That is something like…
If we want to copy the full DLL, like in the Reflective DLL injection technique, there’s a bit more code, as we need to read it into memory before we copy it into the target/remote process.
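The “read it into memory” part is plain file I/O. A minimal, portable sketch is shown here; `readWholeFile` is a hypothetical name, and the project itself may well use the Win32 file APIs instead.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Read a file (for example the DLL to be injected) into a byte buffer.
// The buffer's size() is also the number of bytes that would have to be
// allocated in, and copied to, the target process.
std::vector<std::uint8_t> readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}
```

Opening the stream in binary mode matters on Windows; in text mode the runtime would translate line endings and corrupt the image.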
As I mentioned before, by using the Reflective DLL injection technique, and copying the DLL into memory, the DLL won’t be registered with the process.
It gets a bit complex because we need to obtain the entry point to the DLL when it is loaded in memory. The ‘LoadRemoteLibraryR()’ function, which is part of the Reflective DLL project, does it for us. Have a look at the source if you want.
One thing to notice is that the DLL we’ll be injecting needs to be compiled with the appropriate includes and options so it aligns itself with the ReflectiveDLLInjection method. The ‘injectAllTheThings’ project includes a DLL called ‘rdll_32.dll/rdll_64.dll’ that you can use to play with.
Instruct the process to execute the DLL
We can say that ‘CreateRemoteThread()’ is the classic and most popular DLL Injection technique around. Also, the most well documented one.
It consists of the steps below:
- Open the target process with OpenProcess()
- Find the address of LoadLibrary() by using GetProcAddress()
- Reserve memory for the DLL path in the target/remote process address space by using VirtualAllocEx()
- Write the DLL path into the previously reserved memory space with WriteProcessMemory()
- Use CreateRemoteThread() to create a new thread, which will call the LoadLibrary() function with the DLL path name as parameter
If you look at ‘CreateRemoteThread()’ documentation on MSDN, we can see that we need a “pointer to the application-defined function of type LPTHREAD_START_ROUTINE to be executed by the thread and represents the starting address of the thread in the remote process.”
Which means that to execute our DLL we only need to instruct our process to do it. Simple.
See below all the basic steps listed above.
For the complete source code see ‘t_CreateRemoteThread.cpp’.
Another option is to use ‘NtCreateThreadEx()’. This is an undocumented ‘ntdll.dll’ function and it might disappear or change in the future. This technique is a bit more complex to implement as we need a structure (see below) to pass to it and another to receive data from it.
There’s a good explanation about this call here. The setup is very close to what we do for ‘CreateRemoteThread()’. However, instead of calling ‘CreateRemoteThread()’ we do something along these lines.
For the complete source code see ‘t_NtCreateThreadEx.cpp’.
An alternative to the previous techniques, that doesn’t create a new thread in the target/remote process, is the ‘QueueUserAPC()’ call.
As documented on MSDN, this call “adds a user-mode asynchronous procedure call (APC) object to the APC queue of the specified thread.”
Here’s the definition.
pfnAPC [in] A pointer to the application-supplied APC function to be called when the specified thread performs an alertable wait operation. (...)
hThread [in] A handle to the thread. The handle must have the THREAD_SET_CONTEXT access right. (...)
dwData [in] A single value that is passed to the APC function pointed to by the pfnAPC parameter.
So, if we don’t want to create our own thread, we can use ‘QueueUserAPC()’ to “hijack” an existing thread in the target/remote process. That is, calling this function will queue an asynchronous procedure call on the specified thread.
Instead of a real APC callback function, we can pass the address of ‘LoadLibrary()’ as the APC function. The parameter can then be a pointer to the filename of the DLL we want to inject.
There’s a little gotcha that you might notice if you try this technique, which is related to the way MS Windows executes APCs. There’s no scheduler looking at the APC queue, meaning the queue is only examined when the thread becomes alertable.
Because of this we basically queue the APC on every single thread of the target process, see below.
We basically do this expecting one thread to become alertable.
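To make the “alertable” gotcha concrete, here is a toy model in portable C++. It is not how Windows implements APCs; `ToyThread` and its members are invented for illustration. The point is only that queued work sits untouched until the thread itself enters an alertable wait (on Windows that means calls such as ‘SleepEx()’ or ‘WaitForSingleObjectEx()’ with the alertable flag set).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy stand-in for a thread with a user-mode APC queue.
class ToyThread {
public:
    // Analogue of QueueUserAPC(): just remembers the callback.
    void queueApc(std::function<void()> apc) { pending_.push(std::move(apc)); }

    // Ordinary execution never looks at the queue.
    void runNormalCode() {}

    // Analogue of an alertable wait: drains the queue, returns how many ran.
    std::size_t alertableWait() {
        std::size_t ran = 0;
        while (!pending_.empty()) {
            std::function<void()> apc = std::move(pending_.front());
            pending_.pop();
            apc();
            ++ran;
        }
        return ran;
    }

    std::size_t pendingApcs() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};
```

In this model, a thread that only ever calls `runNormalCode()` never runs the queued callback, which is why queuing on every thread of the target raises the odds that at least one of them will.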
As a side note, it was nice to see this technique being used by DOUBLEPULSAR.
For the complete source code see ‘t_QueueUserAPC.cpp’.
In order to use this technique the first thing we need to understand is how MS Windows hooks work. Basically, hooks are a way to intercept events and act on them.
As you may guess, there are many different types of hooks. The most common ones might be WH_KEYBOARD and WH_MOUSE. You guessed right, these can be used to monitor keyboard and mouse input.
The ‘SetWindowsHookEx()’ “installs an application-defined hook procedure into a hook chain.”
idHook [in] Type: int The type of hook procedure to be installed. (...)
lpfn [in] Type: HOOKPROC A pointer to the hook procedure. (...)
hMod [in] Type: HINSTANCE A handle to the DLL containing the hook procedure pointed to by the lpfn parameter. (...)
dwThreadId [in] Type: DWORD The identifier of the thread with which the hook procedure is to be associated. (...)
An interesting remark on MSDN states that:
“SetWindowsHookEx can be used to inject a DLL into another process. A 32-bit DLL cannot be injected into a 64-bit process, and a 64-bit DLL cannot be injected into a 32-bit process. If an application requires the use of hooks in other processes, it is required that a 32-bit application call SetWindowsHookEx to inject a 32-bit DLL into 32-bit processes, and a 64-bit application call SetWindowsHookEx to inject a 64-bit DLL into 64-bit processes. The 32-bit and 64-bit DLLs must have different names.”
Keep this in mind.
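One cheap way to respect that rule is to look at the DLL on disk before doing anything else. The sketch below reads the `Machine` field of the PE file header from a byte buffer; `peBitness` and `PeBitness` are hypothetical names, and only the two machine types relevant here (i386 and AMD64) are recognised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PeBitness { NotPe, Bits32, Bits64, Other };

// Classify a PE image held in memory by its IMAGE_FILE_HEADER.Machine field.
PeBitness peBitness(const std::vector<std::uint8_t>& image) {
    // DOS header: "MZ" at offset 0, e_lfanew (offset of "PE\0\0") at 0x3C.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') {
        return PeBitness::NotPe;
    }
    std::size_t nt = 0;
    for (int i = 3; i >= 0; --i) {
        nt = (nt << 8) | image[0x3C + i];   // little-endian 32-bit read
    }
    // Need the 4-byte signature plus the 2-byte Machine field.
    if (nt > image.size() || image.size() - nt < 6) {
        return PeBitness::NotPe;
    }
    if (image[nt] != 'P' || image[nt + 1] != 'E' ||
        image[nt + 2] != 0 || image[nt + 3] != 0) {
        return PeBitness::NotPe;
    }
    const unsigned machine = image[nt + 4] | (image[nt + 5] << 8);
    if (machine == 0x014c) return PeBitness::Bits32;   // IMAGE_FILE_MACHINE_I386
    if (machine == 0x8664) return PeBitness::Bits64;   // IMAGE_FILE_MACHINE_AMD64
    return PeBitness::Other;
}
```

Comparing this result with the bitness of the target process before injecting turns a confusing failure into an explicit error message.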
Here’s a simple extract of the implementation.
We need to understand that every event that occurs will go through a hook chain, which is a series of procedures that will run on the event. The setup of ‘SetWindowsHookEx()’ is basically how we put our own hook procedure into the hook chain.
The code above takes the type of hook to be installed (WH_KEYBOARD), the pointer to the procedure, the handle to the DLL with the procedure, and the thread id to associate the hook to.
In order to get the pointer to the procedure we need to first load the DLL using the ‘LoadLibrary()’ call. Then we call ‘SetWindowsHookEx()’ and wait for the event that we want (in our case pressing a key). Once that event happens our DLL is executed.
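If the hook chain idea feels abstract, this toy model may help. It is plain C++ rather than the Windows implementation, and `HookChain`, `install` and `dispatch` are invented names. It mirrors two documented behaviours: a newly installed procedure goes to the front of the chain, and each procedure decides whether to pass the event along (the role ‘CallNextHookEx()’ plays on Windows).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// A hook procedure receives the event and a way to call the next hook.
using NextHook = std::function<int(int)>;
using HookProc = std::function<int(int event, const NextHook& callNext)>;

class HookChain {
public:
    // New procedures are placed at the beginning of the chain.
    void install(HookProc proc) { hooks_.insert(hooks_.begin(), std::move(proc)); }

    // Deliver an event to the first procedure in the chain.
    int dispatch(int event) const { return run(0, event); }

private:
    int run(std::size_t index, int event) const {
        if (index == hooks_.size()) {
            return 0;   // end of chain: nothing left to call
        }
        return hooks_[index](event, [this, index](int e) { return run(index + 1, e); });
    }

    std::vector<HookProc> hooks_;
};
```

A procedure that never calls `callNext` swallows the event for everything installed before it, which is why well-behaved hooks always forward.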
Note that even the CIA guys are, potentially, having some fun with ‘SetWindowsHookEx()’ as we can see on Wikileaks.
For the complete source code see ‘t_SetWindowsHookEx.cpp’.
‘RtlCreateUserThread()’ is an undocumented API call. Its setup is almost the same as for ‘CreateRemoteThread()’, and consequently as for ‘NtCreateThreadEx()’.
Actually, ‘RtlCreateUserThread()’ calls ‘NtCreateThreadEx()’, which means ‘RtlCreateUserThread()’ is a small wrapper for ‘NtCreateThreadEx()’. So, nothing new here. However, we might want to just use ‘RtlCreateUserThread()’ instead of ‘NtCreateThreadEx()’. Even if the latter changes, our ‘RtlCreateUserThread()’ should still work.
So, if mimikatz and Metasploit are using ‘RtlCreateUserThread()’… and yes, those guys know their stuff… follow their “advice”, use ‘RtlCreateUserThread()’. Especially if you are planning to do something more serious than a simple ‘injectAllTheThings’ program.
For the complete source code see ‘t_RtlCreateUserThread.cpp’.
This is actually a very cool method. Specially crafted code is injected into the target/remote process by allocating a chunk of memory in the target/remote process. This code is responsible for loading the DLL.
Here’s the code for 32 bits.
For 64 bits I couldn’t actually find any working assembly code, so I kinda wrote my own. See below.
Before we inject this code into the target process some placeholders need to be filled/patched with:
- Return address (address where the thread should resume once the code stub has finished execution)
- The DLL path name
- Address of LoadLibrary()
And that’s when the game of hijacking, suspending, injecting, and resuming a thread comes into play.
First we need to attach to the target/remote process, of course, and allocate memory in it. Note that we need to allocate memory with read and write permissions to hold the DLL path name and our assembly code that will load the DLL (the region holding the code must also be executable, otherwise DEP will stop the thread as soon as it jumps there).
Next, we need to get the context of one of the threads running on the target/remote process (the one that is going to be injected with our assembly code).
To find the thread, we use the function ‘getThreadID()’, which you can find in the file ‘auxiliary.cpp’.
Once we have our thread id, we need to set the thread context.
Next, we need to suspend the thread to capture its context. The context of a thread is the state of its registers. We are particularly interested in EIP/RIP (call it IP - instruction pointer, if you want).
Since the thread is suspended, we can change the EIP/RIP value and force it to continue its execution in a different path (our code cave).
So, we suspend the thread, we capture the context, and from there we extract the EIP/RIP. This is saved to resume the execution when our injected code finishes. The new EIP/RIP is set as our injected code location.
We then patch all the placeholders with the return address, the DLL path name address, and the ‘LoadLibrary()’ address.
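Patching a placeholder boils down to overwriting four (or, on 64 bits, eight) bytes inside the stub with a little-endian address. The sketch below uses a deliberately incomplete three-instruction fragment with 0xCCCCCCCC placeholders, purely to show the mechanics. The layout, the offsets and the names `makeToyStub` and `patchImm32` are assumptions for this example; it is not the stub from the project and would not load anything if executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets of the 32-bit immediates inside the toy fragment below.
constexpr std::size_t kReturnAddrOffset  = 1;   // after 0x68 (push imm32)
constexpr std::size_t kDllPathOffset     = 6;   // after 0x68 (push imm32)
constexpr std::size_t kLoadLibraryOffset = 11;  // after 0xB8 (mov eax, imm32)

// Layout-only fragment: three x86 instructions carrying placeholders.
std::vector<std::uint8_t> makeToyStub() {
    return {
        0x68, 0xCC, 0xCC, 0xCC, 0xCC,   // push <return address>
        0x68, 0xCC, 0xCC, 0xCC, 0xCC,   // push <address of DLL path>
        0xB8, 0xCC, 0xCC, 0xCC, 0xCC    // mov eax, <address of LoadLibrary>
    };
}

// Overwrite a 32-bit immediate at `offset` with `value`, little-endian,
// independent of the byte order of the machine running this code.
void patchImm32(std::vector<std::uint8_t>& stub, std::size_t offset, std::uint32_t value) {
    assert(offset + 4 <= stub.size());
    for (std::size_t i = 0; i < 4; ++i) {
        stub[offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}
```

Patching must happen on the local copy before it is written into the target, since the addresses (saved EIP, remote path buffer, ‘LoadLibrary()’) are only known at run time.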
Once the thread starts executing, our DLL will be loaded and once it finishes it will return back to the point it was suspended at and resume its execution there.
If you feel like debugging this technique as a learning exercise, here’s how to do it. Launch the application you want to inject into, let’s say ‘notepad.exe’. Run ‘injectAllTheThings_64.exe’ with ‘x64dbg’ as shown below.
That is, using the following command line (adapt to your environment):
Set a breakpoint on the call to ‘WriteProcessMemory()’ as shown below.
Let it run and, when the breakpoint is hit, take note of the memory address in the RDX register. If you are asking yourself why RDX, it’s time to read about the calling convention used on x64. Have fun and come back once you finish.
Step over (F8) the call to ‘WriteProcessMemory()’, launch another instance of x64dbg and attach to ‘notepad.exe’. Go to the address copied before (the one at RDX) by pressing ‘Ctrl + g’ and you will see our code cave assembly as shown below.
Cool, huh!? Now set a breakpoint at the beginning of this shellcode. Go to the ‘injectAllTheThings’ debugged process and let it run. As you can see below our breakpoint is hit and we can now step over the code for fun and enjoy this piece of code working.
Once we call the ‘LoadLibrary()’ function, we get our DLL loaded.
This is so beautiful…
Our shellcode will return to the previously saved RIP and ‘notepad.exe’ will resume execution.
For the complete source code see ‘t_suspendInjectResume.cpp’.
Reflective DLL injection
I also incorporated Stephen Fewer’s (pioneer of this technique) code into this ‘injectAllTheThings’ project, and I also built a reflective DLL to be used with this technique. Note that the DLL we’re injecting must be compiled with the appropriate includes and options, so it aligns itself with the Reflective DLL injection method.
Reflective DLL injection works by copying the entire DLL into memory, so it avoids registering the DLL with the process. All the heavy lifting is already done for us. To obtain the entry point to our DLL when it’s loaded in memory we only have to use Stephen Fewer’s code. The ‘LoadRemoteLibraryR()’ function included within his project does it for us. We use ‘GetReflectiveLoaderOffset()’ to determine the offset within our own process’s memory, then we use that offset plus the base address of the memory in the target/remote process (where we wrote the DLL) as the execution starting point.
Too complex? Yes, it might be. Here are the four main steps to achieve this.
- Write the DLL headers into memory
- Write each section into memory (by parsing the section table)
- Check imports and load any other imported DLLs
- Call the DllMain entry-point
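The “parsing the section table” part can be sketched without any Windows headers. The code below walks the section table of a PE image held in a byte buffer and reports where each section lives in the file and where it wants to be mapped. It covers only that parsing step and is not a loader; `SectionInfo` and `listSections` are hypothetical names, and the offsets come from the PE/COFF layout (20-byte file header, 40-byte section headers).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct SectionInfo {
    std::string name;              // up to 8 characters, e.g. ".text"
    std::uint32_t virtualAddress;  // RVA the section is mapped at
    std::uint32_t rawSize;         // SizeOfRawData
    std::uint32_t rawOffset;       // PointerToRawData (offset in the file)
};

static std::uint32_t rd16(const std::vector<std::uint8_t>& b, std::size_t o) {
    return static_cast<std::uint32_t>(b[o]) | (static_cast<std::uint32_t>(b[o + 1]) << 8);
}

static std::uint32_t rd32(const std::vector<std::uint8_t>& b, std::size_t o) {
    return rd16(b, o) | (rd16(b, o + 2) << 16);
}

// Returns an empty list if the buffer does not look like a PE image.
std::vector<SectionInfo> listSections(const std::vector<std::uint8_t>& img) {
    std::vector<SectionInfo> out;
    if (img.size() < 0x40 || img[0] != 'M' || img[1] != 'Z') return out;
    const std::size_t nt = rd32(img, 0x3C);               // e_lfanew
    if (nt > img.size() || img.size() - nt < 24) return out;
    if (rd32(img, nt) != 0x00004550u) return out;         // "PE\0\0"
    const std::uint32_t count   = rd16(img, nt + 6);      // NumberOfSections
    const std::uint32_t optSize = rd16(img, nt + 20);     // SizeOfOptionalHeader
    std::size_t off = nt + 24 + optSize;                  // first section header
    for (std::uint32_t i = 0; i < count; ++i, off += 40) {
        if (off > img.size() || img.size() - off < 40) break;   // truncated table
        SectionInfo s;
        for (std::size_t k = 0; k < 8 && img[off + k] != 0; ++k) {
            s.name.push_back(static_cast<char>(img[off + k]));
        }
        s.virtualAddress = rd32(img, off + 12);
        s.rawSize        = rd32(img, off + 16);
        s.rawOffset      = rd32(img, off + 20);
        out.push_back(s);
    }
    return out;
}
```

A manual loader would copy `rawSize` bytes from `rawOffset` in the file to `virtualAddress` relative to the chosen image base for each entry, which is exactly why the file layout and the in-memory layout of a DLL differ.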
This technique offers a great level of stealth in comparison to the other methods, and is massively used in Metasploit.
Also, have a look at Loading a DLL from memory from Joachim Bauch, author of MemoryModule and this nice post about Loading Win32/64 DLLs “manually” without LoadLibrary().
There are some more obscure and complex injection methods around, so I’ll eventually update the ‘injectAllTheThings’ project. Some of the most interesting ones I’ve seen lately are:
- The one used by DOUBLEPULSAR
- The one written by @zerosum0x0, Reflective DLL injection using SetThreadContext() and NtContinue() described here and code available here.
All of the techniques I described above are implemented in one single project I made available at GitHub. It also includes the required DLLs for each of the techniques. The table below makes it easy to understand what’s actually implemented and how to use it.
| Method | 32 bits | 64 bits | DLL to use |
|---|---|---|---|
| CreateRemoteThread() | + | + | dllmain_32.dll / dllmain_64.dll |
| NtCreateThreadEx() | + | + | dllmain_32.dll / dllmain_64.dll |
| QueueUserAPC() | + | + | dllmain_32.dll / dllmain_64.dll |
| SetWindowsHookEx() | + | + | dllpoc_32.dll / dllpoc_64.dll |
| RtlCreateUserThread() | + | + | dllmain_32.dll / dllmain_64.dll |
| SetThreadContext() | + | + | dllmain_32.dll / dllmain_64.dll |
| Reflective DLL | + | + | rdll_32.dll / rdll_64.dll |
Needless to say, to be on the safe side, always use injectAllTheThings_32.exe to inject into 32-bit processes and injectAllTheThings_64.exe to inject into 64-bit processes. That said, you can also use injectAllTheThings_64.exe to inject into 32-bit processes. I didn’t implement it, but you can also go from WoW64 to 64 bits, which is basically what Metasploit’s ‘smart_migrate’ does; I might have to give it a try later. Have a look here.
The code for the whole project, including DLLs is available at GitHub. Compile for 32 and 64 bits, with or without debugging and have fun.
- Windows via C/C++ 5th Edition