Writing a Mutation Engine and breaking Aimware
PERSES' source code can be found here.
I wrote a X86 code obfuscation engine which was used to protect CS:GO and break Aimware
, a well known cheat software provider. In this post I will discuss the road to building an obfuscator, usage of commercial protectors, and why inhouse solutions are ideal for protecting intellectual property. The project that will be discussed is by no means intended to compete with these commercial protectors, rather propose questions and elucidate the flaws by choosing such solutions.
PERSES is a mutation engine that operates on at the X86 instruction level for Portable Executables files. The objective is to obfuscate the underlying semantics of an instruction by replacing it with a larger, more sophisticated collection of instructions. PERSES only mutates a very small subset of x86 instructions, those being; PUSH
, MOV
, CALL
, JMP
, ADD
, and XOR
. In addition, PERSES also allows mutation of jump table accesses. Each instruction is mutated differently, but the transformations are conceptually similar. Every instruction listed takes various sets of operands, so it is important to ensure that the instruction under analysis fits our specific guidelines and is mutated accordingly. Immediates, relatives, and memory accesses are all encrypted at compile time via the PERSES mutation schema, then instructions are implemented into the stream in order to decrypt the operand at run time and continue the original program’s flow. Furthermore, the semantics of the original instruction are replicated by a group of alternative instructions which perform the same operation as the original. ADD
and XOR
instructions are converted into a larger, more sophisticated expressions; this is known as mixed boolean arithmetic.
As mentioned before, we encrypt operands via various arithmetical instructions, add the instructions needed to decrypt the operand, then perform the original instruction (not always equivalently, just semantically identical). When the operand of an immediate or memory reference is found, we gauge if it is a offset or pointer in the file by checking if there is a relocation bound to the corresponding instruction segment. PERSES makes use of a great utility in Zydis named ZydisGetInstructionSegments
to discover if the operand in question is relocatable.
CALL
: When a CALL
is hit, we can completely cripple a decompiler’s control flow analyzer by performing the semantics of the CALL
instruction in steps,
while also keeping our encrypted operand tactic in play. As we already know, a CALL
instruction pushes the return address on the stack before entering
the target destination, meaning we can manually push the proper return address and place a JMP
to the destination (or PUSH/RET
, etc). This will impair decompilers because when a RET
(or JMP
in some cases) is hit, the analyzer may assume that it has reached the end of the current routine, depending on the basic block being observed. This type of mutation can turn a single function into dozens or hundreds of stubs, which will impede reverse engineering.
PUSH
: PUSH
instructions are only mutated on the x86 (32bit) architecture, and the operand must be an immediate or memory reference with no base registers. The original value becomes obscured through the means of encryption. The EFLAGS
register is preserved.
MOV
: MOV
is a very commonly used instruction. There are 4 variants of the MOV
instruction which become transformed by PERSES. PERSES will also increase complexity of transformations depending on the variant of the instruction and operand size. For example, on 32bit+ immediates and memory operands, a two stage load is integrated where the upper and lower half portions of the operand are encrypted separately. Once both portions are decrypted via the newly appended x86 instructions, they are finally OR
’d together to produce the original value. The semantics of the original instruction are then replicated once the value is ready to be read.
XOR
: XOR
instructions are handled under certain conditions. The first operand must be a register, and the second must be either a register or immediate. x ^ y
can be expressed as (x & ~y) | (~x & y)
, which is the expression I decided to incorporate when obfuscating XOR instructions. The expression is handled by creating scratch space on the stack for one side of the expression, while the destination register is used to handle the opposite side. Finally, the two sides are OR’d together to complete the operation.
ADD
: ADD
is handled under the same condition that XOR is. x + y
can be viewed as (x - ~y) - 1)
, or ~(~x - y)
. PERSES uses both variants depending on the second operand’s type.
Mixed Boolean Arithmetic (MBA) in relation to code obfuscation is converting a (usually) simple expression into a larger more sophisticated expression which is semantically identical. MBA is common in the code obfuscation realm due to ramifications when attempting to analyze or reverse engineer the transformed output. The expressions which PERSES uses are listed above, but I will explain why they work below.
-
XOR
:XOR
, otherwise known asexclusive or
is similar to alogical or
. The key difference between the two is that only one of the operands in the operation can be present, which is whyXOR
is commonly trivialized to a phrase such as “one or the other, but not both.” Theexclusive or
expression used ((x & ~y) | (~x & y)
) shows this operation in action by stripping out one of the operand’s value from the other thenOR
‘ing the two new operands. -
ADD
: The expression used inADD
works by using the inverse operation ofADD
,SUB
, conjointly with negation. This equates to an addition since it inverts the operation as well as the operand used.
For further information about MBA, I recommend this great paper which delves into topics such as truth tables, generation, and simplification.
Before an immediate or memory operand is encrypted, a check as stated before is performed to observe whether the operand will be relocated during loader mapping. If determined to be a relocatable, the operand is converted into a RVA before encryption. Once decryption has ensued, the result is translated into a virtual address by adding the new base address. In executable files, PERSES will sometimes use the PEB to determine current base address of the running program, otherwise, PERSES appends an instruction such as mov eax, 0x400000
where 0x400000
is the original base address and marks the immediate as a relocatable.
PERSES will not mutate an instruction under these certain circumstances.
- Memory operands must be purely RIP-relative on x64, or absolute on x86.
- The operand’s size does not fit the required size(s) PERSES operates on.
There are a handful of instructions that can make the pseudo-code output repulsive. PERSES makes use of some of these instructions, which include XCHG
, ROR
, ROL
, BSWAP
, and more. When some decompilation engines such as IDA Pro or Ghidra encounter these instructions, they may fail to apply some of the optimizations that would otherwise provide a normal, clean output. Moreover, if allowed, PERSES can emplace a JMP
to a RET
instruction rather than placing a RET
directly into the stream, which may skew code identification. This is done by caling perses::buildKnownRetGadgets<>()
.
Mutating instructions on the x86 instruction level comes with a few hardships. Semantical accuracy is a requirement, registers and flags must be preserved to ensure stability of the code, relatives and conditionals must be fixed, relocations must be fixed - the list keeps going. I’ll be going over the dilemmas I faced while developing PERSES.
When mutating an instruction and adding additional code, sometimes temporary registers must be used. PERSES ensures to keep the original register saved by pushing it on the stack, then retrieving it once it’s usage is no longer required. This is imperative to certifying that the original assembly will continue operating correctly.
Due to the nature of PERSES’ mutation, many instructions which taint the EFLAGS register are added into the instruction stream. The MOV
instruction for example does not have any effect on the EFLAGS register, however, when decryption is done the EFLAGS register will be polluted with values which may destroy the original program’s logic. PERSES uses the PUSHFD
and PUSHFQ
function to preserve the EFLAGS register, and restores the register once mutations on the instruction have been applied.
JCCs/JMP/CALLs
that lie in the routine’s range may be invalidated once additional instructions are added to the routine. This requries fix-ups that need to be handled by the mutation engine. Rather than building basic blocks, PERSES makes use of AsmJit’s Label
system and fixes conditionals/relatives in a hackier technique by keeping track of the original instruction offset and binding a label once a calculated offset for a conditional has been hit.
In PE files, relocations must be applied during mapping if the base address for the PE file on disk differentiates from the newly allocated base address. Due to increases in code size when adding instructions in any arbitrary location in the routine, relocations are removed (or rather ignored) from the original location, and either appended again automatically when an unhandled instruction is met by checking X86BinaryApplication::isRelocationPresent
, or added when handling the mutated instruction. Unfortunately, AsmJit did not seem to provide an easy way to generate relocations, my solution was to create a named Label
which was then parsed after finalization to retrieve the address of the new relocation. When the final image is compiled in X86BinaryApplication::compile
, the relocations previously appended are finally attached to the PE file using pepp::BlockStream
. Further information on the compiler steps and linkage will be elaborated further in this paper.
The X64 architecture varies much differently than its 32bit counterpart. Operands that reference memory accesses are relative to the instruction pointer, where as in X86, absolute addressing is used. Instructions that may use relative addressing such as LEA
need to be automatically parsed and repaired since this instruction, as with many other, won’t be included into the mutation schema. This is done with the help of ZydisInstructionSegments
, which allows us to break a single instruction’s encoding to multiple segments which describes types, e.g register, immediate. All relatives will be automatically repaired during image compilation.
As an optimization, compilers may generate a jump table to represent a switch case. Jump tables are simple but present an issue when mutating code, similar to conditionals which were outlined above. A jump table consists of a finite number of branches which are used to determine a code path based on a condition. Recovering a jump table is automatically done by PERSES. Steps to recovering a jump table requires understanding of the compiled assembly. The default branch is taken via a conditional if the index is not supported by a switch case, which exposes us to the number of jump table entries.
The jump table in the above picture has 4 entries. The comparison jumps to the default branch if the index exceeds the jump table’s boundaries, so we can take the immediate in the second operand of the comparison and add one, which gives us the number of entries in the jump table. PERSES collects information on all identified jump tables and stores them into a structure named JumpTableEntry
which describes a single entry. The new value of the entry is calculated identically to conditionals, which allows us to later modify the jump table to support our mutated code. A vector
of these entries is finally passed into X86BinaryApplication::linkCode
, where each entry’s value is adjusted to the newly mutated routine.
All necessary info such as relocations, jump table fix-ups, and RIP-relative fix-ups is provided to the application class in X86BinaryApplication::linkCode
from the deriving mutation schema. After all desired routines are transformed into their mutated counterpart, X86BinaryApplication::compile
is called where all relocations are appended to the .reloc
section, the .perses
section is created for the new code, then detours are written where the original code resided. Detours use a similar technique to PUSH ADDRESS; RET
to hit a destination, however rather than adding a PUSH
instruction, a CALL 0x5
instruction is placed (calling the next consecutive instruction). Since we can deduce the return address pushed on the stack, we add the delta (NewCodeRVA - RetAddrRVA
) to the return address and place the RET
. This, although trivial, forces static analysts to manually calculate the mutated routine’s address rather than easily being able to follow the flow.
Function Discovery and Identification
PERSES incorporates multiple methods of adding a routine into the mutation queue. Routines can be added by an address, symbol, markers, or lastly function lists.
- Add by Address
PERSES will attempt to calculate a function’s size by traversing the disassembly and identifying basic blocks. PERSES makes use of two lists, all the basic blocks observed and all offsets from the beginning of the function to the current instruction being parsed. The last element in the code offset list corresponds to the last element in the basic block list. Disassembly continues until a terminating instruction is hit, which may be a RET
, JMP
, or a JCC. On JCCs and JMPs, the branch destination is computed and translated into an offset, then pushed onto the list along with a new block. When a RET
instruction is hit, an offset is popped off the list and will return to the next instruction to continue analysis. This process continues until the list of offsets becomes empty, which signify that all basic blocks have been observed. Finally, all basic blocks in our list will be traversed in order to determine the block that ends at the highest address. As long as edge cases, such as functions that end with CALLs, or jump tables, are handled, this method seemed to be extremely accurate in determining function size.
- Add by Symbol
A map file parser which includes .map
files generated by IDA Pro and MSVC is included with PERSES. One can then call addRoutineBySymbol
which will take a name or pointer to a MapSymbol
. Control carries over to addRoutineByAddress
(above) to calculate the function’s size.
- Add by Marker
A file named PersesSDK.h
can be included into a project to emit a scannable pattern into code. PERSES makes use of compiler intrinsics to generate unique patterns. Beginning and end macros named PERSES_MUTATION_START()
and PERSES_MUTATION_END()
are provided.
- Function Lists
If many functions need to be mutated, a function list can be parsed via parseFunctionList
. PERSES will allow parsing of a list as long as it fits the proper formatting, which is the start and end address delimited by a colon (e.g 0x1200000:0x1200200
). Each entry must envelop one line.
Companies interested in protecting their intellectual property should invest in inhouse solutions for various reasons.
- Complete control.
- When developing an inhouse solution, multiple decisions on the logistics of the code protection can be made. Obfuscation can be done at the compiler level (see llvm-obfuscator), or like PERSES, at the binary level.
- Obscurity.
- With the rising quantity of public information regarding public protectors such as VMProtect (see VMProtect 2 - Detailed Analysis of the Virtual Machine Architecture and NoVMP), software which utilizes such protections may fall victim to public research or solutions that aim to expose critical components and flaws of the protector, or tools in regards to deobfuscating it entirely. A well developed inhouse solution will hinder many reverse engineers due to anonymity, and will require a great deal of time to reverse engineer and circumvent. Recently, Easy Anti-Cheat (EAC) has created their own solution in means of replacing VMProtect which thus far has very minimal public information regarding to it’s inner workings and architecture.
- Implementation.
- When employing commercial or public code protection engines, the user is limited to a predefined method of generating the obfuscated output with usually minimal or no customizable aspects at all. The user may have no say on complexity, size output, and output generation. In cases where these features are slightly customizable, protection engines commonly expose a predefined set of levels which are still immutable to the user, which imposes them to a selection of the best fit.
- Compatibility.
- Although commercial code protection engines can be very powerful, they commonly work on a binary level where there may be no guarantee that the original program’s logic is semantically identical in the obfuscated counterpart. This is especially significant when code protection engines translate native instructions to their custom virtual-machine based architecture. Moreover, a code protection engine may not identify the intended semantics of a group of code such as a jump table. Oreans, the developers behind Themida/Code Virtualizer, state incompatibiltiy with switch cases (which may produce jump tables).
To further demonstrate my reasoning, a fairly recent version of BattlEye’s kernel driver was released entirely with VMProtect’s virtualization completely stripped. This has allowed many to gain insight into code which may have never publicized or reverse engineered before. This release has completely vulnerated and exposed critical components of BattlEye’s protection, which may have been preventable if a customly developed protection was embedded.
Companies may find themselves in the predicament of paying hundreds of thousands or more for a contract to a thirdparty software protector, when instead they could hire a small team with said funding to develop an inhouse solution that can be used for any product they desire.
Protecting Counter Strike: Global Offensive
After a rough completion of my project, I decided it would be interesting to mutate entire (or a majority percentage) of a binary. Initially I chose CSGO’s executable, csgo.exe
, but decided it was not enough since the game runs on Source Engine, where much of the game’s code and logic are stored in libraries such as client.dll
and engine.dll
rather than the executable. I settled with mutating both the executable and engine.dll
. The engine.dll
library in Source Engine is essentially the main event loop for the entire game. It is responsible for critical elements such as running frames, networking, updating the client, running prediction, and a myriad of other components.
Bypassing Steam’s Integrity Checks
Steam by default ensures that all game files match their corresponding signature. This check happens in a function named CCrypto::RSAVerifySignature
in steamclient.dll
. To bypass these checks in the simplest manner, one can force the function to return true
by patching the mov al, bl
instruction at the function’s epilogue to mov al, 1
.
Mutations were done via function lists generated from an IDA script which provide the start and end address for a routine. Calling X86BinaryApplication::parseFunctionList
allows PERSES to build routines from every start and end address supplied. The result was over 17,000 routines in the queue to be mutated. Functions less than 24 bytes in size were skipped, which is the size of the detour code that PERSES emplaces to jump into the new function. Due to the parsing of the function list, PERSES manages to mutate all 17,057 functions in roughly two minutes. The size of the binary grew from 6.2MB to 25.8MB.
On average, the mutated version ranges from 40-200fps, while the original version is pretty stagnant at 180-300fps.
Effects on IDA Pro or Other Decompilers
When popping the mutated version into IDA Pro and waiting for auto-analysis to complete, there are very noticeable detrimental effects. Originally, IDA Pro recognizes 22,117 functions. With PERSES, IDA Pro manages to detect 8,388 functions (many of them being skipped functions which were mentioned earlier), with 6,397 being in the original .text
section.
Adverse Results for Pay2Cheat Software
For further testing, a cheating software which goes by the name of Aimware
was used to test the effects of the mutation engine against it’s initialization phase. After the loader seemed to be finalizing, the game closes instantly. Although not debugged, one can infer that the inability to signature scan the new engine.dll
for expected bytes resulted in a fatal crash or exit. The ability to read or write into the game does not matter in this context, any cheat which expects a sequence of bytes in the target binary will be completely immobilized, the original logic is skewed and without deobfuscation tools, reversing the original program’s logic becomes a slight challenge.
When considering obfuscation as a means of protecting intellectual property or critical code, commercial protection engines may not be the best answer. Inhouse solutions are ideal as they can be modified to the desired complexity, reconciled to work on any arbitrary architecture, and may be a much cheaper alternative to contracts with commercial software protections.
PERSES is definitely still a work in progress, but allowed us to view the crippling effects on cheating software which relies on a known sequence of bytes to initialize. PERSES is not perfect nor attempts to compete with or replace any established code obfuscation engine. There are many aspects which require improvements, such as a rework of the Routine
structure to work off basic blocks, the linkage and compiler, and a multitude of other components.
If you are hiring you can contact me here: notmike1k@gmail.com