Download link: Reverse Injector


Introduction - Virtual Memory & Page Tables


Each process on the Windows operating system has its own virtual address space. Virtual address spaces are created and described by page tables. The topmost layer of translation is the Page Map Level 4, or PML4 for short. Each entry in a PML4 (a PML4E) points to the next layer of translation, a Page Directory Pointer Table, or PDPT for short. PDPTEs point to Page Directories, or PDs, and PDEs point to Page Tables, or PTs. There are four levels of address translation in IA-32e (long) mode. A single entry at translation layer x maps f(x) = (512^x) * 8 bytes, where PT = 1, PD = 2, PDPT = 3, and PML4 = 4. A PTE therefore maps 4 KB, a PDE 2 MB, a PDPTE 1 GB, and a PML4E 512 GB.
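The formula above is easy to check by hand. The following is a minimal sketch, not code from Reverse Injector: the names `bytes_per_entry` and `table_index` are hypothetical helpers, and the layer numbering follows the text (PT = 1 through PML4 = 4). It computes how much memory one entry at each layer maps, and pulls the 9-bit table index for a given layer out of a virtual address.

```cpp
#include <cstdint>

// Translation layers as numbered in the text (hypothetical enum).
enum level : unsigned { pt = 1, pd = 2, pdpt = 3, pml4 = 4 };

// Bytes mapped by ONE entry at layer x: f(x) = 512^x * 8.
constexpr std::uint64_t bytes_per_entry(unsigned x)
{
    std::uint64_t result = 8;
    for (unsigned i = 0; i < x; ++i)
        result *= 512;
    return result;
}

// 9-bit index into the table at layer x for a virtual address.
// Bits 0-11 are the page offset; each layer above consumes 9 more bits.
constexpr std::uint64_t table_index(std::uint64_t va, unsigned x)
{
    return (va >> (12 + 9 * (x - 1))) & 0x1FF;
}

static_assert(bytes_per_entry(pt)   == 0x1000ull,       "PTE maps 4 KB");
static_assert(bytes_per_entry(pd)   == 0x200000ull,     "PDE maps 2 MB");
static_assert(bytes_per_entry(pdpt) == 0x40000000ull,   "PDPTE maps 1 GB");
static_assert(bytes_per_entry(pml4) == 0x8000000000ull, "PML4E maps 512 GB");
```

Because 512 * 8 is 4096, the same formula can be read as 512^(x-1) pages of 4 KB each.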

Four Level Page Table Graph

Now consider the following: what would happen if you changed the properties of a PDPTE while PML4s belonging to other address spaces contained entries pointing to the same PDPT? Would the effects be seen in those other address spaces? If you said yes, then you are correct.
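A toy model can make this sharing concrete. It is only an illustration: real paging entries hold physical frame numbers and flag bits rather than C++ pointers, and every name here (`pdpt_model`, `pml4_model`, `share_pdpt`) is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>

// Toy stand-in for a PDPT: 512 raw 64-bit entries.
struct pdpt_model
{
    std::array<std::uint64_t, 512> entries{};
};

// Toy stand-in for a PML4: each slot either is empty or refers to a PDPT.
struct pml4_model
{
    std::array<std::shared_ptr<pdpt_model>, 512> entries{};
};

// Make dst[dst_idx] refer to the very same PDPT that src[src_idx] uses.
inline void share_pdpt(pml4_model& dst, std::size_t dst_idx,
                       const pml4_model& src, std::size_t src_idx)
{
    dst.entries[dst_idx] = src.entries[src_idx];
}
```

Once two PML4 models share a PDPT this way, a write to a PDPTE through one of them is visible through the other, since there is only one table underneath.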

Note: If you would like to learn more about virtual memory and page tables I have written a post about page tables and a library to manage them from usermode.

TLB - Translation Lookaside Buffer

The Translation Lookaside Buffer, or TLB for short, caches linear address translations together with the page protections of the pages they map. The TLB is split into two sections: the instruction translation lookaside buffer, or iTLB, and the data translation lookaside buffer, or dTLB.

When page table entries change, the programmer is responsible for invalidating the stale TLB entries. This is done with INVLPG, which invalidates all entries pertaining to the given virtual address, or with INVPCID. Entries from the upper-level page tables can be cached as well, in the paging-structure caches.

Note: If you would like to read more about the translation lookaside buffer and other page table caching mechanisms, I have linked to a few extremely well-written resources here.

Reverse Injector - Merging Address Spaces


As described in the introduction, page table entries describe virtual memory. At the highest level of translation is the PML4. The PML4, like all other page tables, contains up to 512 entries. The first 256 PML4 entries are reserved for usermode and the top 256 are reserved for the kernel. Since the kernel is globally mapped into all address spaces, there is no need to merge the top 256 entries. The bottom 256 PML4Es are what map all usermode code, data, and stacks. The 256th PML4E maps ntdll.dll and other loaded modules. As you can foresee, some PML4E indexes used by the remote process are already in use locally. To handle overlapping PML4Es, Reverse Injector simply finds empty PML4Es in the local PML4 and inserts the remote PML4Es into them. A std::map is constructed to map remote PML4E indexes to the local PML4E indexes they were inserted at.
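The merge step can be sketched on plain arrays. This is an assumption-laden model rather than Reverse Injector's actual code: `merge_user_pml4es` is a hypothetical name, the PML4s are ordinary arrays instead of live page tables, and free slots are searched from index zero upward, which may not match the order the real tool uses.

```cpp
#include <array>
#include <cstdint>
#include <map>

using pml4_t = std::array<std::uint64_t, 512>;

constexpr std::uint64_t present_bit  = 1;   // bit 0 of a paging entry
constexpr unsigned      user_entries = 256; // the lower half belongs to usermode

// Copy every present usermode PML4E of `remote` into a free usermode slot
// of `local`. Returns the mapping remote index -> local index.
inline std::map<unsigned, unsigned> merge_user_pml4es(pml4_t& local,
                                                      const pml4_t& remote)
{
    std::map<unsigned, unsigned> remap;
    unsigned free_idx = 0;

    for (unsigned remote_idx = 0; remote_idx < user_entries; ++remote_idx)
    {
        if (!(remote[remote_idx] & present_bit))
            continue; // nothing mapped under this remote PML4E

        // Advance to the next local slot that is not in use.
        while (free_idx < user_entries && (local[free_idx] & present_bit))
            ++free_idx;

        if (free_idx == user_entries)
            break; // the local usermode half is full

        local[free_idx] = remote[remote_idx];
        remap[remote_idx] = free_idx;
        ++free_idx;
    }

    return remap;
}
```

Kernel entries (indexes 256 and up) are deliberately left untouched, matching the reasoning above that they are already shared.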

Remapping PML4 Entries

     Local PML4                                Remote PML4
┌────────────────────┐                     ┌────────────────────┐
│ 256 PML4E          │                     │  512 PML4E         │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│ ...................│                     │                    │
├────────────────────┤                     ├────────────────────┤
│ 180 PML4E          │◄──────────┐         │                    │
├────────────────────┤           │         ├────────────────────┤
│                    │           │         │                    │
├────────────────────┤           │         ├────────────────────┤
│                    │           │         │ ...................│
├────────────────────┤           │         ├────────────────────┤
│                    │           └─────────┤ 256 PML4E          │
├────────────────────┤                     ├────────────────────┤
│                    │                     │ ...................│
├────────────────────┤                     ├────────────────────┤
│                    │        ┌────────────┤ 106 PML4E          │
├────────────────────┤        │            ├────────────────────┤
│                    │        │            │                    │
├────────────────────┤        │            ├────────────────────┤
│ 160 PML4E          │◄───────┘            │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
└────────────────────┘                     └────────────────────┘
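Given such a remapping, translating a remote virtual address comes down to swapping its PML4 index, since everything below the PML4E is shared unchanged. The sketch below assumes that behaviour; `translate_model` is a hypothetical stand-in and is not the source of `injector_ctx::translate`.

```cpp
#include <cstdint>
#include <map>

constexpr unsigned      pml4_shift = 39;
constexpr std::uint64_t pml4_mask  = 0x1FFull << pml4_shift;

// Replace the PML4 index (bits 39-47) of a remote usermode address with the
// local index it was remapped to. Returns 0 if that PML4E was never remapped.
inline std::uint64_t translate_model(const std::map<unsigned, unsigned>& remap,
                                     std::uint64_t remote_va)
{
    const auto remote_idx =
        static_cast<unsigned>((remote_va >> pml4_shift) & 0x1FF);

    const auto it = remap.find(remote_idx);
    if (it == remap.end())
        return 0;

    return (remote_va & ~pml4_mask) |
           (static_cast<std::uint64_t>(it->second) << pml4_shift);
}
```

With purely hypothetical values, a remote address of 0x00007FFED8CF0000 (PML4 index 255) remapped to local index 102 becomes 0x0000337ED8CF0000. The lower 39 bits are carried over as they are.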

32bit Processes - Support, Limitations, Information

32bit processes have code segments which are marked as containing 32bit protected mode code. The processor executes the code in these segments in compatibility mode. All stack, heap, and code are mapped within a 32bit address space, or 4 GB. As described at the beginning of this post, a page table entry maps f(x) = 512^x * 8 bytes, where x is the translation level. Thus a PDPTE can map up to 1 GB of memory, and four PDPTEs, indexes zero through three, are enough to map the entire 32bit address space.

Although the processor executes code in 32bit processes in compatibility mode, a four-layer page table configuration is still used. The PML4E at index zero always points to a PDPT whose PDPTEs map all code, stack, and data for a 32bit process. The 64bit kernel is mapped into the top 256 PML4Es just like in every other process, and additional code is mapped into 32bit processes to handle syscalls into the 64bit kernel.

There is no way to merge a 32bit process with another 32bit process using Reverse Injector. Doing so would require extra segments and changing segment selectors. However, Reverse Injector should be able to merge a 32bit process's address space into a 64bit process's address space. Note that the mapped 32bit code cannot be executed, as stack alignment and segmentation are handled differently in 64bit processes. Reading and writing the stack, heap, and code, however, is possible.

Remapping 32bit Processes

     Local PML4                                Remote PML4
┌────────────────────┐                     ┌────────────────────┐
│ 256 PML4E          │                     │  512 PML4E         │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │ ...................│
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│                    │                     │                    │
├────────────────────┤                     ├────────────────────┤
│....................│                     │                    │
├────────────────────┤ (code, stack, heap) ├────────────────────┤
│ 65 PML4E           │◄────────┐           │                    │
├────────────────────┤         │           ├────────────────────┤
│                    │         │           │                    │
├────────────────────┤         │           ├────────────────────┤
│                    │         │           │                    │
├────────────────────┤         │           ├────────────────────┤
│                    │         └───────────┤ 0 PML4E.           │
└────────────────────┘                     └────────────────────┘
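The index arithmetic behind this layout can be verified at compile time. This is a small sketch with hypothetical helper names; the local index 65 is taken from the diagram above purely as an example value.

```cpp
#include <cstdint>

constexpr std::uint64_t pml4_index(std::uint64_t va) { return (va >> 39) & 0x1FF; }
constexpr std::uint64_t pdpt_index(std::uint64_t va) { return (va >> 30) & 0x1FF; }

// Local address of a 32bit remote address once remote PML4E 0 has been
// inserted into the local PML4 at `local_idx`.
constexpr std::uint64_t remap_32bit(std::uint32_t remote_va, unsigned local_idx)
{
    return (static_cast<std::uint64_t>(local_idx) << 39) | remote_va;
}

// Every 32bit address lives under PML4E 0 and PDPTEs 0 through 3.
static_assert(pml4_index(0x00000000ull) == 0, "lowest address uses PML4E 0");
static_assert(pml4_index(0xFFFFFFFFull) == 0, "highest address uses PML4E 0");
static_assert(pdpt_index(0x00000000ull) == 0, "first PDPTE is index 0");
static_assert(pdpt_index(0xFFFFFFFFull) == 3, "last PDPTE is index 3");

// Example: remote 0x00400000 seen through local PML4E 65.
static_assert(remap_32bit(0x00400000u, 65) == 0x0000208000400000ull,
              "65 << 39 plus the original 32bit address");
```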

VirtualProtect - Page Protections and WinAPI’s

VirtualProtect is a well-known WinAPI used to change page protections. This high-level WinAPI is a wrapper for NtProtectVirtualMemory, which then calls other non-exported routines in ntoskrnl. The call stack shown below indicates that VirtualProtect consults the current process's VAD (Virtual Address Descriptor) tree. Reverse Injector does not create VAD entries for the mapped memory of the remote process, so changing page protections on that memory using VirtualProtect will always fail. Direct PTE manipulation is required to change page protections. This is possible with a self-referencing page table entry that is accessible from usermode.

nt!MiObtainReferencedVadEx <--- no VAD entries for reverse injected memory
nt!MmProtectVirtualMemory+0x175
nt!NtProtectVirtualMemory+0x1bf
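With a self-referencing PML4E in place, the PTE that maps any virtual address itself has a predictable virtual address. The sketch below shows only that address arithmetic and the relevant protection bits. The helper names are hypothetical, the self-referencing index is a parameter because the source does not state which slot is used, and installing the entry is outside the scope of this example.

```cpp
#include <cstdint>

// Sign-extend bit 47 so the result is a canonical 48-bit address.
constexpr std::uint64_t canonical(std::uint64_t va)
{
    return (va & (1ull << 47)) ? (va | 0xFFFF000000000000ull)
                               : (va & 0x0000FFFFFFFFFFFFull);
}

// Virtual address of the PTE mapping `va`, given the index of the
// self-referencing PML4E. The walk goes through the self-reference once,
// so the remaining indexes shift down one layer (va >> 9), and the result
// is aligned to the 8-byte entry size.
constexpr std::uint64_t pte_address(std::uint64_t va, unsigned self_ref_idx)
{
    return canonical((static_cast<std::uint64_t>(self_ref_idx) << 39) |
                     ((va >> 9) & 0x7FFFFFFFF8ull));
}

// Protection-related bits of a 64-bit PTE.
constexpr std::uint64_t pte_writable   = 1ull << 1;  // R/W
constexpr std::uint64_t pte_no_execute = 1ull << 63; // NX / XD

// Value a PTE would need to describe a writable and executable page.
constexpr std::uint64_t make_rwx(std::uint64_t pte)
{
    return (pte | pte_writable) & ~pte_no_execute;
}
```

After a PTE is rewritten this way, the stale translation still has to be invalidated as described in the TLB section.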

Example - Translating Remote Module Base Addresses

This example translates the address of the remote process's PEB to a local address, then walks the in-memory-order module list to find the desired module's base.

auto get_module_base(vdm::vdm_ctx* v_ctx, nasa::injector_ctx* rinjector,
    std::uint32_t pid, const wchar_t* module_name) -> std::uintptr_t
{
    const auto ppeb =
        reinterpret_cast<PPEB>(
            rinjector->translate(
                reinterpret_cast<std::uintptr_t>(v_ctx->get_peb(pid))));

    const auto ldr_data =
        reinterpret_cast<PPEB_LDR_DATA>(
            rinjector->translate(reinterpret_cast<std::uintptr_t>(ppeb->Ldr)));

    auto current_entry =
        reinterpret_cast<LIST_ENTRY*>(
            rinjector->translate(reinterpret_cast<std::uintptr_t>(
                ldr_data->InMemoryOrderModuleList.Flink)));
    // ....

After obtaining the local address of the PEB via injector_ctx::translate, the virtual address can be directly dereferenced. The same is done for the PEB_LDR_DATA structure pointer and for the first list entry. The while loop below ends once the current entry wraps back around to the list head.

    //....
    while (current_entry != &ldr_data->InMemoryOrderModuleList)
    {
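        // InMemoryOrderLinks directly follows InLoadOrderLinks, so stepping back
        // one LIST_ENTRY yields the start of the LDR_DATA_TABLE_ENTRY.
        const auto current_entry_data =
            reinterpret_cast<PLDR_DATA_TABLE_ENTRY>(
                reinterpret_cast<std::uintptr_t>(current_entry) - sizeof(LIST_ENTRY));

        // The UNICODE_STRING right after FullDllName is BaseDllName, which the
        // public winternl.h definition does not name. Its Buffer is a remote
        // pointer and must be translated before use.
        const auto entry_module_name =
            reinterpret_cast<const wchar_t*>(
                rinjector->translate(
                    reinterpret_cast<std::uintptr_t>(
                        reinterpret_cast<PUNICODE_STRING>(
                            reinterpret_cast<std::uintptr_t>(
                                &current_entry_data->FullDllName) + sizeof(UNICODE_STRING))->Buffer)));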

        if (!_wcsicmp(entry_module_name, module_name))
            return rinjector->translate(
                reinterpret_cast<std::uintptr_t>(
                    current_entry_data->DllBase));

        current_entry = reinterpret_cast<LIST_ENTRY*>(
            rinjector->translate(reinterpret_cast<std::uintptr_t>(current_entry->Flink)));
    }

    return {};
}

An example usage of this routine is displayed below.

ptm::ptm_ctx target_proc(&vdm, std::atoi(argv[2]));
nasa::injector_ctx injector(&my_proc, &target_proc);

if (!injector.init())
{
    std::printf("[!] failed to init injector_ctx...\n");
    return -1;
}

const auto ntdll_base =
    get_module_base(&vdm, &injector,
        std::atoi(argv[2]), L"ntdll.dll");

std::printf("[+] ntdll reverse injected base -> 0x%p\n", ntdll_base);
std::printf("[+] ntdll reverse injected MZ -> 0x%p\n", *(short*)ntdll_base);
std::printf("[+] press any key to close...\n");

The result of the above code is displayed below.

[+] ntdll reverse injected base -> 0x0000337ED8CF0000
[+] ntdll reverse injected MZ -> 0x0000000000005A4D
[+] press any key to close...

Please note that all memory, not just loaded modules, is mapped into the local address space.

Credit, Resources, External Links