Virtual Memory - Intro to Paging Tables
Download link: PTM
Introduction
Virtual memory is probably one of the most interesting topics in modern computer science. Although virtual memory was originally designed, back when physical memory was not an abundant resource, to allow disk space to be used as RAM, it has stuck with us, offering security, modularity, and flexibility. Unlike the rest of the content on my site, which is bound to an operating system, virtual memory is really a CPU-level concept. Although virtual memory is well documented in the Intel manual, applying that knowledge to Windows can be tricky because there isn’t a single documented way to interface with paging tables on Windows. In this series of write-ups I will be discussing methods to manipulate paging tables and to detect such manipulations. Although there are many different types of paging table systems, I will only be discussing the standard four-layer paging table system used in all modern AMD and Intel 64-bit CPUs.
Terms
- CR3 = Control Register 3
- PFN = Page Frame Number
- PML4 = Page Map Level 4
- PDPT = Page Directory Pointer Table
- PD = Page Directory
- PT = Page Table
- Context/Memory Context = In this write-up I’m referring to a process’s memory range and the paging tables associated with that process.
Paging
To begin, what are paging tables? Paging tables are a set of tables used to translate linear virtual addresses to linear physical addresses. You can imagine every virtual address you have ever interfaced with as a key, which is used to unlock the path to the backing physical memory. The term “paging table” implies paging, a concept which breaks memory down to a predetermined size called a “page”. Typically pages are 0x1000 bytes (4096). This number is not arbitrary, in fact quite the opposite. If you break down what a virtual address is, you will begin to see that there is an extremely complex system underneath every virtual address. Let’s take a look at what a virtual address really is.
As shown in the figure above, a virtual address is indeed a key, just like a normal key that has a series of ridges that align with its pins. Note that the last part of a virtual address is its page offset. This page offset is 12 bits wide (2 ^ 12 = 0x1000), which is exactly the size of a page. It may seem strange that the top sixteen bits are never used for translation. They simply have to be copies of bit 47 (a “canonical” address), so in reality we have a 48-bit address range per process. A structure that represents a linear virtual address can be seen below.
typedef union _virt_addr_t
{
void* value;
struct
{
std::uint64_t offset : 12;
std::uint64_t pt_index : 9;
std::uint64_t pd_index : 9;
std::uint64_t pdpt_index : 9;
std::uint64_t pml4_index : 9;
std::uint64_t reserved : 16;
};
} virt_addr_t, *pvirt_addr_t;
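The same split can be done with plain shifts and masks, which is handy when an anonymous struct inside a union is not available. This is a minimal sketch; `va_parts`, `split_va`, and `is_canonical` are hypothetical names chosen for illustration, and the bit positions simply mirror the virt_addr_t layout above.

```cpp
#include <cstdint>

// Hypothetical helper type: one field per part of a 48-bit virtual address.
struct va_parts
{
    std::uint64_t pml4_index; // bits 39..47
    std::uint64_t pdpt_index; // bits 30..38
    std::uint64_t pd_index;   // bits 21..29
    std::uint64_t pt_index;   // bits 12..20
    std::uint64_t offset;     // bits 0..11
};

// Each index is 9 bits wide (512 entries per table), the offset is 12 bits.
constexpr va_parts split_va(std::uint64_t va)
{
    return va_parts{
        (va >> 39) & 0x1FF,
        (va >> 30) & 0x1FF,
        (va >> 21) & 0x1FF,
        (va >> 12) & 0x1FF,
        va & 0xFFF
    };
}

// An address is canonical when bits 48..63 are all copies of bit 47.
constexpr bool is_canonical(std::uint64_t va)
{
    const std::uint64_t top = va >> 47; // bit 47 plus the upper sixteen bits
    return top == 0 || top == 0x1FFFF;
}
```

For example, `split_va(0xFFFF800000000000)` yields a PML4 index of 0x100 and zero for everything else, since only bit 47 and its sign-extended copies are set.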
Now that you understand why the standard granularity of a page is 4096 bytes, what happens if we need a large page? What even is a “large page”? Well simply put, instead of using a range of bits for indexing into a page table, that range of the virtual address is used to index into the page itself. Take a look at this example below.
This idea of using a range of bits in a virtual address as part of the page offset instead of as a table index can be applied at the PD level (2MB pages) and at the PDPT level (1GB pages). Now that you understand the very basics of how pages work and the sizes of pages, let’s talk about the paging tables themselves.
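To make the large page idea concrete, the sketch below treats each page size as nothing more than a wider offset field. The enum and function names are hypothetical; the widths (12, 21, and 30 bits) follow from folding the 9-bit PT index, and then the 9-bit PD index, into the 12-bit offset.

```cpp
#include <cstdint>

// Hypothetical enum: the value of each enumerator is the offset width in bits.
enum class page_size : unsigned
{
    size_4kb = 12, // page offset only
    size_2mb = 21, // page offset + PT index bits
    size_1gb = 30  // page offset + PT index + PD index bits
};

// Size of a page in bytes for the given offset width.
constexpr std::uint64_t page_bytes(page_size size)
{
    return 1ull << static_cast<unsigned>(size);
}

// Offset of a virtual address inside a page of the given size.
constexpr std::uint64_t page_offset(std::uint64_t va, page_size size)
{
    return va & (page_bytes(size) - 1);
}
```

With this, `page_bytes(page_size::size_2mb)` is 0x200000 and `page_bytes(page_size::size_1gb)` is 0x40000000.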
Accessing Paging Tables
Before we begin talking about what the paging tables contain and where they are in memory, let’s talk about how we access the topmost layer of paging tables. This topmost layer of paging is called the PML4 (page map level 4). Due to the Spectre/Meltdown mitigations there can be two of these tables per process (only one is active at a time). One of these tables contains all of the usermode and kernel mode mappings, the second contains the usermode mappings plus only the minimal kernel mappings needed to enter the kernel. In Windows there is a structure called EPROCESS whose first member is a KPROCESS. If you open WinDbg and type “dt nt!_KPROCESS” you will see that the third field in this structure (DirectoryTableBase) locates that process’s PML4, in the same layout as CR3: a PFN stored in bits 12 and up. So uh? What’s a PFN you are probably asking. A PFN (Page Frame Number) is simply a physical address that has the last 12 bits lopped off. In other words, the physical address is rounded down to the granularity of one page (0x1000) and thus the last 12 bits do not need to be stored. In order to restore a PFN back to a linear physical address one simply has to shift the value left by twelve, or by whatever the size of the page is in bits (PFN << 12 for a standard 4096 byte page).
This is one way of obtaining the physical address of a process’s PML4; another is reading CR3. By using MSVC intrinsics you can read the contents of CR3 and cast it to a bit field. Have a look:
typedef union _cr3
{
std::uint64_t flags;
struct
{
std::uint64_t reserved1 : 3;
std::uint64_t page_level_write_through : 1;
std::uint64_t page_level_cache_disable : 1;
std::uint64_t reserved2 : 7;
std::uint64_t dirbase : 36;
std::uint64_t reserved3 : 16;
};
} cr3;
// getting the physical address of the current context's pml4...
// (__readcr3 is declared in <intrin.h> and only works at CPL 0)
const cr3 current_cr3{ __readcr3() };
const auto pml4 = current_cr3.dirbase << 12;
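Since __readcr3 can only be executed in kernel mode, it can help to test the decoding logic separately on a raw value. The sketch below is an assumption-laden illustration rather than part of the original code: the function names are hypothetical, and the mask simply covers bits 12..47, the same range as the dirbase field above.

```cpp
#include <cstdint>

// Bits 12..47 of CR3 hold the PFN of the PML4 (the "dirbase" field).
constexpr std::uint64_t cr3_dirbase_mask = 0x0000FFFFFFFFF000ull;

// PFN -> physical address and back, for standard 4096 byte pages.
constexpr std::uint64_t pfn_to_phys(std::uint64_t pfn) { return pfn << 12; }
constexpr std::uint64_t phys_to_pfn(std::uint64_t phys) { return phys >> 12; }

// Physical address of the PML4, with the low flag bits stripped.
constexpr std::uint64_t pml4_phys_from_cr3(std::uint64_t raw_cr3)
{
    return raw_cr3 & cr3_dirbase_mask;
}

// The same information expressed as a PFN.
constexpr std::uint64_t pml4_pfn_from_cr3(std::uint64_t raw_cr3)
{
    return phys_to_pfn(pml4_phys_from_cr3(raw_cr3));
}
```

For a made-up CR3 value of 0x1AD000018 (write-through and cache-disable bits set), `pml4_phys_from_cr3` returns 0x1AD000000 and `pml4_pfn_from_cr3` returns 0x1AD000.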
Now that you know a few ways to get the linear physical address of a process’s PML4, how does one interface with this physical memory? On newer versions of Windows, none of MmMapIoSpace, ZwMapViewOfSection, or even MmCopyMemory can be used to interface with the paging tables. Although you cannot map these physical pages into virtual memory yourself, the paging tables are actually already mapped into the kernel. This mapping range is a section of what is called “hyperspace” and contains all of the paging tables for the current context. In short, you can take the linear physical address of your current context’s PML4 and call MmGetVirtualForPhysical with it. So cool, we have the virtual address of our current context’s PML4, now what? Well, let’s take a look at what a PML4 actually contains and what information inside of these tables we can use to get to the next level of the paging table system.
typedef union _pml4e
{
std::uint64_t value;
struct
{
std::uint64_t present : 1; // Must be 1, region invalid if 0.
std::uint64_t rw: 1; // If 0, writes not allowed.
std::uint64_t user_supervisor : 1; // If 0, user-mode accesses not allowed.
std::uint64_t page_write_through: 1; // Determines the memory type used to access PDPT.
std::uint64_t page_cache : 1; // Determines the memory type used to access PDPT.
std::uint64_t accessed : 1; // If 0, this entry has not been used for translation.
std::uint64_t Ignored1 : 1;
std::uint64_t page_size : 1; // Must be 0 for PML4E.
std::uint64_t Ignored2 : 4;
std::uint64_t pfn : 36; // The page frame number of the PDPT of this PML4E.
std::uint64_t reserved: 4;
std::uint64_t Ignored3 : 11;
std::uint64_t nx : 1; // If 1, instruction fetches not allowed.
};
} pml4e, *ppml4e;
As described above, a PML4E (page map level 4 entry) contains a PFN for the next layer of address translation (PDPT). Paging table entries for the PML4, PDPT, and PD are very similar in that their bit fields share the same positions; PTEs contain a few new bit fields.
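Because the common fields sit at the same bit positions in every upper-level entry, a raw 64-bit entry can also be inspected with masks alone. The following is a small sketch with hypothetical constant and function names; the bit positions are taken from the pml4e layout above.

```cpp
#include <cstdint>

// Bit positions shared by PML4E, PDPTE, and PDE (names are illustrative).
constexpr std::uint64_t entry_present  = 1ull << 0;
constexpr std::uint64_t entry_rw       = 1ull << 1;
constexpr std::uint64_t entry_user     = 1ull << 2;
constexpr std::uint64_t entry_nx       = 1ull << 63;
constexpr std::uint64_t entry_pfn_mask = 0x0000FFFFFFFFF000ull; // bits 12..47

constexpr bool is_present(std::uint64_t entry)    { return (entry & entry_present) != 0; }
constexpr bool is_writable(std::uint64_t entry)   { return (entry & entry_rw) != 0; }
constexpr bool is_user(std::uint64_t entry)       { return (entry & entry_user) != 0; }
constexpr bool is_no_execute(std::uint64_t entry) { return (entry & entry_nx) != 0; }

// Physical address of the next paging table (the PFN shifted left by 12).
constexpr std::uint64_t next_table_phys(std::uint64_t entry)
{
    return entry & entry_pfn_mask;
}
```

Masking bits 12..47 in place is equivalent to reading the pfn bit field and shifting it left by twelve.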
So now you understand how to traverse paging tables using a combination of MmGetVirtualForPhysical and bit shifting PFNs. Let’s take a step back and look at linear address translation from a bird’s-eye view. This should help solidify your understanding of how linear virtual addresses are translated to physical memory.
The next layer in address translation is the PDPT (Page Directory Pointer Table). Like every other paging table, it is 0x1000 bytes (4096) in size. Its entries (PDPTEs) are eight bytes wide just like every other paging table entry, which gives 512 entries per table. Every entry contains a PFN which is used to get to the next layer of address translation. The structure for a PDPTE is the following:
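The whole walk can also be sketched in user mode against a fake physical memory, which makes the four steps easy to follow. Everything here is an assumption made for illustration: `phys_mem_t` stands in for real physical memory, `translate` is a hypothetical name, and large pages are deliberately ignored so only 4KB translations are handled.

```cpp
#include <array>
#include <cstdint>
#include <unordered_map>

using table_t = std::array<std::uint64_t, 512>;
// Fake physical memory: physical address of a table -> its 512 entries.
using phys_mem_t = std::unordered_map<std::uint64_t, table_t>;

constexpr std::uint64_t present_bit = 1ull << 0;
constexpr std::uint64_t pfn_mask    = 0x0000FFFFFFFFF000ull; // bits 12..47

// Walks PML4 -> PDPT -> PD -> PT for a 4KB page.
// Returns false if a table is missing or an entry is not present.
bool translate(const phys_mem_t& mem, std::uint64_t pml4_phys,
               std::uint64_t va, std::uint64_t& phys_out)
{
    std::uint64_t table_phys = pml4_phys;
    // Index shifts: 39 (PML4), 30 (PDPT), 21 (PD), 12 (PT).
    for (int shift = 39; shift >= 12; shift -= 9)
    {
        const auto it = mem.find(table_phys);
        if (it == mem.end())
            return false;
        const std::uint64_t entry = it->second[(va >> shift) & 0x1FF];
        if (!(entry & present_bit))
            return false;
        // Next table, or the backing page once the PT has been read.
        table_phys = entry & pfn_mask;
    }
    phys_out = table_phys | (va & 0xFFF); // add the page offset
    return true;
}
```

In a real driver each `mem.find` step would instead be a lookup of the table's virtual address, but the index and mask arithmetic stays the same.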
typedef union _pdpte
{
std::uint64_t value;
struct
{
std::uint64_t present : 1; // Must be 1, region invalid if 0.
std::uint64_t rw : 1; // If 0, writes not allowed.
std::uint64_t user_supervisor : 1; // If 0, user-mode accesses not allowed.
std::uint64_t page_write: 1; // Determines the memory type used to access PD.
std::uint64_t page_cache : 1; // Determines the memory type used to access PD.
std::uint64_t accessed : 1; // If 0, this entry has not been used for translation.
std::uint64_t Ignored1 : 1;
std::uint64_t page_size : 1; // If 1, this entry maps a 1GB page.
std::uint64_t Ignored2 : 4;
std::uint64_t pfn : 36; // The page frame number of the PD of this PDPTE.
std::uint64_t reserved: 4;
std::uint64_t Ignored3 : 11;
std::uint64_t nx : 1; // If 1, instruction fetches not allowed.
};
} pdpte, * ppdpte;
Note that the “page_size” bit field can be used to create a 1GB page. As described in the first section of this write-up, every part of the virtual address below the PDPT index (the PD index, the PT index, and the page offset) is then used to index into that page. The next layer in linear address translation is the PD (Page Directory); the structure for this paging table entry is the same as the PML4E and PDPTE.
typedef union _pde
{
std::uint64_t value;
struct
{
std::uint64_t present : 1; // Must be 1, region invalid if 0.
std::uint64_t rw : 1; // If 0, writes not allowed.
std::uint64_t user_supervisor : 1; // If 0, user-mode accesses not allowed.
std::uint64_t page_write: 1; // Determines the memory type used to access PT.
std::uint64_t page_cache : 1; // Determines the memory type used to access PT.
std::uint64_t accessed : 1; // If 0, this entry has not been used for translation.
std::uint64_t Ignored1 : 1;
std::uint64_t page_size : 1; // If 1, this entry maps a 2MB page.
std::uint64_t Ignored2 : 4;
std::uint64_t pfn : 36; // The page frame number of the PT of this PDE.
std::uint64_t reserved: 4;
std::uint64_t Ignored3 : 11;
std::uint64_t nx : 1; // If 1, instruction fetches not allowed.
};
} pde, * ppde;
This paging table entry points to a PT (Page Table); it can also be used to map a 2MB page by setting “page_size”. PDEs share the same structure as PML4Es and PDPTEs. The structure for a PTE can be viewed below.
typedef union _pte
{
std::uint64_t value;
struct
{
std::uint64_t present : 1; // Must be 1, region invalid if 0.
std::uint64_t rw : 1; // If 0, writes not allowed.
std::uint64_t user_supervisor : 1; // If 0, user-mode accesses not allowed.
std::uint64_t page_write: 1; // Determines the memory type used to access the memory.
std::uint64_t page_cache : 1; // Determines the memory type used to access the memory.
std::uint64_t accessed : 1; // If 0, this entry has not been used for translation.
std::uint64_t dirty: 1; // If 0, the memory backing this page has not been written to.
std::uint64_t page_access_type: 1; // Determines the memory type used to access the memory.
std::uint64_t global: 1; // If 1 and the PGE bit of CR4 is set, translations are global.
std::uint64_t ignored2 : 3;
std::uint64_t pfn : 36; // The page frame number of the backing physical page.
std::uint64_t reserved : 4;
std::uint64_t ignored3 : 7;
std::uint64_t protect_key: 4; // If the PKE bit of CR4 is set, determines the protection key.
std::uint64_t nx : 1; // If 1, instruction fetches not allowed.
};
} pte, * ppte;
The PTE (Page Table Entry) is the only paging table entry that differs from the rest. It has a few bit fields that are not present in the other paging table entries, notably “protect_key”, “global”, and “dirty”. If you are interested in the use of these fields, please refer to the Intel manual.
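For the large page cases mentioned earlier (the “page_size” bit set in a PDPTE or a PDE), translation stops early and the remaining virtual address bits become the offset. The sketch below shows that arithmetic under the assumption that the entry really maps a large page; the function names are hypothetical, and the base masks cover bits 21..47 for 2MB pages and bits 30..47 for 1GB pages.

```cpp
#include <cstdint>

constexpr std::uint64_t large_page_bit = 1ull << 7; // the page_size field

// True if a PDPTE or PDE maps a large page instead of pointing to a table.
constexpr bool maps_large_page(std::uint64_t entry)
{
    return (entry & large_page_bit) != 0;
}

// PDE mapping a 2MB page: base is bits 21..47, offset is the low 21 VA bits.
constexpr std::uint64_t phys_from_2mb_pde(std::uint64_t pde, std::uint64_t va)
{
    return (pde & 0x0000FFFFFFE00000ull) | (va & 0x1FFFFFull);
}

// PDPTE mapping a 1GB page: base is bits 30..47, offset is the low 30 VA bits.
constexpr std::uint64_t phys_from_1gb_pdpte(std::uint64_t pdpte, std::uint64_t va)
{
    return (pdpte & 0x0000FFFFC0000000ull) | (va & 0x3FFFFFFFull);
}
```

A walker would check `maps_large_page` at the PDPT and PD levels and call the matching helper instead of descending further.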
Self Referencing & PTE Backdoor
Let’s suppose a paging table entry points to the table that contains it. What would happen? Well, in short, the same page would be used twice for address translation. This can be very useful for quickly accessing the paging tables themselves just by accessing a specific address. If a paging table points to itself, you can set the index bits of an address to that entry’s index on each layer you need, which lets you read and write another layer of paging tables. An example is shown below.
Let’s say that the address above was instead composed of 0xFF as the PML4 index and 0xFF as the PDPT index. The same page would then be used as the PD for the next layer of address translation. This is a pretty advanced topic of paging tables but it’s not too difficult to understand. Let’s take this one step further now. What would happen if a PTE pointed to its own PT? Well, the virtual address would point to its own PT. This is very interesting because a virtual address could change where it itself points. Furthermore, a more applicable scenario would be to point a virtual address at another page’s PT, thus allowing paging table manipulation just by accessing that page. This is the basis of an experimental library I have created that allows paging table manipulation from usermode. As described below, by changing a page’s PTE to point to another page’s PT we can traverse all the physical memory we want simply by accessing the manipulated page. I refer to the page that manipulates the other page’s PT as the “cursor” and the page used to access the physical memory as the “accessor”. Using these two pages one can traverse and manipulate any paging tables for any context.
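The self-referencing idea from this section can be expressed as address arithmetic. This is a sketch under the assumption that one PML4 slot (called `self_index` here, a hypothetical name) points back at the PML4 itself; it is not the code of the library mentioned above.

```cpp
#include <cstdint>

// Sign-extend bit 47 so the result is a canonical 64-bit address.
constexpr std::uint64_t canonical(std::uint64_t va)
{
    return (va & (1ull << 47)) ? (va | 0xFFFF000000000000ull)
                               : (va & 0x0000FFFFFFFFFFFFull);
}

// Virtual address of the PTE that maps `va`. Using the self-referencing slot
// as the PML4 index shifts every other index down one level, so the PT that
// maps `va` is reached as if it were an ordinary data page.
constexpr std::uint64_t pte_va(std::uint64_t va, std::uint64_t self_index)
{
    return canonical((self_index << 39) | ((va >> 9) & 0x0000007FFFFFFFF8ull));
}

// Virtual address of the PML4 itself: the self index on all four levels.
constexpr std::uint64_t pml4_va(std::uint64_t self_index)
{
    return canonical((self_index << 39) | (self_index << 30) |
                     (self_index << 21) | (self_index << 12));
}
```

With 0x1ED used purely as an example index, `pte_va(0, 0x1ED)` is 0xFFFFF68000000000 and `pml4_va(0x1ED)` is 0xFFFFF6FB7DBED000.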
Experimental TLB bypass
Although changing where a page points to in physical memory is possible, the CPU caches translations from the paging tables in what is called the TLB (translation look-aside buffer). This cache is what the CPU consults to translate linear virtual addresses to linear physical addresses. The TLB is usually split between pages that contain instructions and pages that contain read/write memory. Whenever a paging table entry is changed, this cache has to be flushed before the change becomes visible. Flushing it requires CPL 0 privilege, so getting around this cache can be tricky. Some of the ways a TLB can be flushed are the following: exceptions, context changes, and purposely flushing the buffers. Although these are the only documented ways of flushing the TLB, I have thought of another possible way of “bypassing” this cache. Preventing such caching can be difficult and will not work on most AMD systems, since they have very aggressive caching techniques. My idea was to “outrun” the TLB. The TLB on some systems will only cache linear address translations that have actually been performed, which means that if one were to access a new page that has never been accessed before, that linear address translation should not be cached yet. Hence my idea of using self-referencing paging tables to generate a new, unique address every time one wants to see the effect of a new or edited paging table entry.
As seen above, using a combination of PTE backdooring and self-referencing paging tables, we can generate a unique arrangement of paging table entries every time we want to access a physical page. There are 508 ^ 4 possible combinations if you use a self-referencing PML4, which is enough to “outrun” the TLB. In other words, once we have accessed 508 ^ 4 addresses the TLB will have already flushed out the first arrangements. Although this idea of outrunning the TLB works on Intel chipsets, it will not work on most AMD chipsets. AMD caches entire paging tables, so four separate tables would be required instead of just one self-referencing paging table.
Conclusion
Paging tables are a very complex CPU-level concept that can be difficult to interface with on Windows. I have omitted a lot of detailed information about possible bugs and problems; those can be addressed on the git repo. If you are interested in learning more about paging tables, please find your way to the Intel manual. Thank you for reading.