Articles Cross-Platform Code Hooking by Erik van Bilsen

emailx45 · 2 May 2020

Cross-Platform Code Hooking
July 26, 2017 Erik van Bilsen

[SHOWTOGROUPS=4,20]
Code hooks can be useful to add instrumentation and debug code, or to change the behavior of functions you don’t have the source code for. We take a look at how you can implement these in a way that works on all platforms that are currently supported by Delphi.
Hooking into existing code at run-time is frowned upon by many people since it can be used to create malicious code. But there are some legitimate uses of hooking as well. For example, in the not-too-distant future, we at Grijjy will present our cross-platform remote logging library that can be used to send log messages from any platform and view them in a log viewer on your PC. This viewer not only shows log messages, but also provides a view of all live objects in your application as a list of class names and the current number of live instances of those classes. This information can be very useful to track down memory leaks and other memory-related problems. For example, if the instance count of a certain kind of object keeps growing, but you expect it to shrink, then you may have forgotten to free an object somewhere, or you may have a reference cycle when running on an ARC platform or using object interfaces.
Instance Tracking
To implement this feature, we must somehow get notified whenever an object is created or destroyed. We do this by hooking into the TObject.NewInstance and TObject.FreeInstance methods. These are the methods where memory for an object is actually (de)allocated. Inside those hooked methods, we duplicate the original implementations of these methods, and in addition update a global list of active instances. In this article, we show how we did this. It is accompanied by sample code in our repository on GitHub. You will find two sample applications that show a list of running instances. One is a FireMonkey application that runs on all platforms except Linux. The other one is a console application that works on all desktop platforms, including Linux.
[Screenshots of the two sample applications]

Hooking Methods
There are various ways you can hook into existing code at run-time. Unfortunately, I have not found a single method that works on all platforms. So our logging library, and the sample code for this article, uses one of two methods, depending on platform.
The first method I simply call Function Hooking. This works by overwriting the existing implementation of a function with a jump to a custom implementation. Unfortunately, this method does not work on iOS and Android since those platforms don’t allow you to overwrite executable code.
The second method is called Virtual Method Table (VMT) patching. It is more limited than the first one, but it also works on iOS and Android. Interestingly enough, it does not work on macOS.
Function Hooking
Function hooking works by overwriting the first few bytes of a function with a JMP instruction to a new function. In Delphi pseudo-code, this would look something like this:
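A rough sketch of the idea, assuming the hooked routine is TObject.NewInstance. It is deliberately not valid Delphi (a goto cannot leave a routine), and the rest of the body is only indicated:

```pascal
class function TObject.NewInstance: TObject;
begin
  goto HookedObjectNewInstance; { pseudo-code: control never comes back here }
  { ...original implementation, now unreachable... }
end;
```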

Here, the first line of code is replaced with a goto instruction to our hooked version. In reality, this means overwriting the first 5 bytes of the function with an assembly JMP instruction.

Since this method of code hooking only works on Intel CPUs, we don’t have to take an ARM version into account.

You might wonder if you are even allowed to modify existing code this way. Normally, you cannot, because memory pages with executable code are read-only by default, and trying to modify them will result in an Access Violation. However, with the help of the VirtualProtect API on Windows, and the mprotect API on other (Posix) platforms, you can change the access level of those memory pages. Actually, you are only allowed to do that on Windows, macOS, iOS Simulator and Linux. For iOS and Android, we use a different approach (as presented later).
Our goal is to create a function that we can call like this:

HookCode(@TObject.NewInstance, @HookedObjectNewInstance);

This redirects the implementation of TObject.NewInstance to our own HookedObjectNewInstance function. This function looks like this:

function HookedObjectNewInstance(const Self: TClass): TObject;
var
  Instance: Pointer;
begin
  { Copy of the original TObject.NewInstance, with Self made explicit. }
  GetMem(Instance, Self.InstanceSize);
  Result := Self.InitInstance(Instance);
  {$IFDEF AUTOREFCOUNT}
  TObjectOpener(Result).FRefCount := 1;
  {$ENDIF}

  { Our addition: register the new instance. }
  TrackInstance(Result);
end;

There are a few things to note here:

TObject.NewInstance is a (non-static) class method. Like regular methods, these methods have an implicit Self parameter. But in the case of class methods, this Self parameter refers to the class, and not to the instance. This is a Delphi language feature that you don’t see much in other object-oriented programming languages, and allows for powerful features like virtual class methods (which NewInstance is). In our hooked functions, we need to make any implicit Self parameters explicit, as we did in the example above.
The majority of the implementation is just a copy of the original TObject.NewInstance method. We just need to use the Self parameter explicitly here to access its methods. Also, TObjectOpener is the common “hack” used to access protected fields and methods of a class.
The last line is where we added our custom code. In this case, it calls a TrackInstance routine which adds the object’s class to a hash map of running instances. I will not show the implementation of this routine here, since that is outside the scope of this article. You can look it up in the sample code on GitHub. One thing to note though is that multiple threads may be creating objects at the same time, so access to the list of instances must be protected with a lock.
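To make the first point concrete, here is a small stand-alone sketch. TAnimal, TDog and DescribeExplicit are made-up names used only for illustration. It shows that Self in a class method is the class itself, and that a plain function with an explicit Self parameter of type TClass can do the same work:

```pascal
program ClassSelfDemo;

{$APPTYPE CONSOLE}

type
  { Hypothetical classes, used only for this illustration. }
  TAnimal = class
  public
    class function Describe: string; virtual;
  end;

  TDog = class(TAnimal)
  public
    class function Describe: string; override;
  end;

  TAnimalClass = class of TAnimal;

class function TAnimal.Describe: string;
begin
  { In a class method, the implicit Self is the class (a TClass). }
  Result := 'class ' + Self.ClassName;
end;

class function TDog.Describe: string;
begin
  Result := inherited Describe + ' (a dog)';
end;

{ Stand-alone routine with the Self parameter made explicit, in the same
  style as the hooked functions. }
function DescribeExplicit(const Self: TClass): string;
begin
  Result := 'class ' + Self.ClassName;
end;

var
  AnimalClass: TAnimalClass;
begin
  AnimalClass := TDog;
  WriteLn(AnimalClass.Describe);    { class TDog (a dog) }
  WriteLn(DescribeExplicit(TDog));  { class TDog }
end.
```

The virtual class method is dispatched on the class reference, which is the same mechanism that makes NewInstance overridable per class.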

The hooked TObject.FreeInstance method works similarly:

procedure HookedObjectFreeInstance(const Self: TObject);
begin
  UntrackInstance(Self);

  Self.CleanupInstance;
  FreeMem(Pointer(Self));
end;

First it calls UntrackInstance to remove the instance from the hash map. After that follows the original implementation of TObject.FreeInstance. Note that this is a “regular” method, so the implicit Self parameter is a TObject, not a TClass.
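For readers who want a feel for what the two tracking routines could look like, here is a minimal sketch. It is not the implementation from the sample code: the globals GLock and GCounts and the helper LiveCount are assumptions, and the routines are called by hand here instead of from the hooks. It keeps one counter per class in a dictionary and guards every access with a critical section:

```pascal
program InstanceCountSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.SyncObjs,
  System.Generics.Collections;

var
  { Hypothetical globals: one lock guarding one class -> count map. }
  GLock: TCriticalSection;
  GCounts: TDictionary<TClass, Integer>;

procedure TrackInstance(const AInstance: TObject);
var
  Count: Integer;
begin
  GLock.Enter;
  try
    if not GCounts.TryGetValue(AInstance.ClassType, Count) then
      Count := 0;
    GCounts.AddOrSetValue(AInstance.ClassType, Count + 1);
  finally
    GLock.Leave;
  end;
end;

procedure UntrackInstance(const AInstance: TObject);
var
  Count: Integer;
begin
  GLock.Enter;
  try
    if GCounts.TryGetValue(AInstance.ClassType, Count) and (Count > 0) then
      GCounts.AddOrSetValue(AInstance.ClassType, Count - 1);
  finally
    GLock.Leave;
  end;
end;

function LiveCount(const AClass: TClass): Integer;
begin
  GLock.Enter;
  try
    if not GCounts.TryGetValue(AClass, Result) then
      Result := 0;
  finally
    GLock.Leave;
  end;
end;

var
  A, B: TObject;
begin
  GLock := TCriticalSection.Create;
  GCounts := TDictionary<TClass, Integer>.Create;
  try
    A := TObject.Create;
    B := TObject.Create;
    TrackInstance(A);
    TrackInstance(B);
    WriteLn(LiveCount(TObject));  { 2 }
    UntrackInstance(A);
    WriteLn(LiveCount(TObject));  { 1 }
    A.Free;
    B.Free;
  finally
    GCounts.Free;
    GLock.Free;
  end;
end.
```

In a real hook, the lock and the map would have to exist before the hooks are installed, since the hooks fire for every object created afterwards.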

You may wonder if there are better ways to execute the original code, other than to copy its implementation as we did here. There are. One of the ways is by using a library like Microsoft’s Detours. Way back in 2004, I wrote an article in The Delphi Magazine that presented a Delphi version of this library. It would copy part of the original implementation to a so-called “Trampoline” function. Then you would just call that trampoline function to execute the original code. However, using a library like Detours here is overkill, since the hooked methods are small and have not changed in many years. More recent Delphi ports of this approach are available on GitHub, and other well-known Delphi hooking libraries provide similar functionality. These libraries are Windows-only though…

[/SHOWTOGROUPS]

[SHOWTOGROUPS=4,20]
Hooking on Windows
As mentioned, on Windows you use the VirtualProtect API to change the access level of an executable piece of memory. We need to modify enough bytes to insert a JMP instruction. On both x86 and x64 platforms, a near relative jump takes 5 bytes: one for the opcode ($E9) and four for a signed 32-bit displacement. The HookCode function starts by changing the access level of these 5 bytes:

const
  SIZE_OF_JUMP = 5;    { 1 opcode byte + 4 displacement bytes }
  JMP_RELATIVE = $E9;  { opcode of the near relative JMP }

function HookCode(const ACodeAddress, AHookAddress: Pointer): Boolean;
var
  OldProtect: DWORD;
  P: PByte;
  Displacement: Integer;
begin
  { Make the first 5 bytes of the original code writable. }
  Result := VirtualProtect(ACodeAddress, SIZE_OF_JUMP,
    PAGE_EXECUTE_READWRITE, OldProtect);

  if (Result) then
  begin
    { Write the JMP opcode... }
    P := ACodeAddress;
    P^ := JMP_RELATIVE;
    Inc(P);

    { ...followed by the distance from the end of the JMP to the hook. }
    Displacement := UIntPtr(AHookAddress) -
      (UIntPtr(ACodeAddress) + SIZE_OF_JUMP);
    PInteger(P)^ := Displacement;

    { Restore the original protection level. }
    VirtualProtect(ACodeAddress, SIZE_OF_JUMP, OldProtect, OldProtect);
  end;
end;

The original protection level will be stored in the OldProtect variable, which will be used at the end of the routine to restore to the original level.
If the VirtualProtect API succeeds (which it always should in this case), then we patch the first byte of the original code with the opcode for the JMP instruction. Next, we calculate the number of bytes to jump from the location after the JMP instruction to our hooked function. We calculate this displacement by taking the difference between the address of our hooked function and the original code address (adjusted for the size of the jump itself).
Finally, we write this displacement value as operand to the JMP instruction and restore the original protection level. Not too complicated actually.
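The displacement arithmetic can be checked in isolation. The sketch below uses made-up addresses and a hypothetical helper, TryGetJumpDisplacement, that is not part of the article's code. It also reports whether the distance fits in the signed 32-bit operand of the jump, which is something to keep in mind on 64-bit targets where two addresses can in principle be more than 2 GB apart:

```pascal
program JumpDisplacementSketch;

{$APPTYPE CONSOLE}

const
  SIZE_OF_JUMP = 5;

{ Hypothetical helper: computes the operand of a near relative JMP placed at
  ACodeAddress that should land on AHookAddress. Returns False when the
  distance does not fit in a signed 32-bit value. }
function TryGetJumpDisplacement(const ACodeAddress, AHookAddress: UIntPtr;
  out ADisplacement: Integer): Boolean;
var
  Delta: Int64;
begin
  { The CPU adds the displacement to the address of the NEXT instruction,
    which is why the size of the jump itself is subtracted. }
  Delta := Int64(AHookAddress) - Int64(ACodeAddress) - SIZE_OF_JUMP;
  Result := (Delta >= Low(Integer)) and (Delta <= High(Integer));
  if Result then
    ADisplacement := Integer(Delta)
  else
    ADisplacement := 0;
end;

var
  D: Integer;
begin
  { Forward jump: $00402000 - ($00401000 + 5) }
  if TryGetJumpDisplacement($00401000, $00402000, D) then
    WriteLn(D);  { 4091 }

  { Backward jump: $00401000 - ($00402000 + 5) }
  if TryGetJumpDisplacement($00402000, $00401000, D) then
    WriteLn(D);  { -4101 }
end.
```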
Hooking on macOS and Linux
The version for Posix-based operating systems (like macOS, Linux and the iOS Simulator) is similar, but it uses the mprotect API instead of VirtualProtect:

function HookCode(const ACodeAddress, AHookAddress: Pointer): Boolean;
var
  AlignedCodeAddress: UIntPtr;
  P: PByte;
  Displacement: Integer;
begin
  { mprotect needs an address that is aligned to the start of a page. }
  AlignedCodeAddress := UIntPtr(ACodeAddress) and (not (GPageSize - 1));

  { Make the page writable, and keep PROT_EXEC so the patched code on this
    page can still be executed afterwards. }
  Result := (mprotect(Pointer(AlignedCodeAddress), GPageSize,
    PROT_READ or PROT_WRITE or PROT_EXEC) = 0);

  if (Result) then
  begin
    P := ACodeAddress;
    P^ := JMP_RELATIVE;
    Inc(P);

    Displacement := UIntPtr(AHookAddress) -
      (UIntPtr(ACodeAddress) + SIZE_OF_JUMP);
    PInteger(P)^ := Displacement;
  end;
end;

There are only a few differences:

Unlike VirtualProtect, mprotect works on entire memory pages. So you need to align the code address to the size of a memory page. We store the size of a memory page in the global GPageSize variable. This variable is initialized at startup with the result of a sysconf(_SC_PAGESIZE) API call. Since page sizes are always a power of two, we can simply align the memory address by and’ing it with (not (GPageSize - 1)).
There is no (easy) way to query the original protection level of a memory page, so we cannot restore to that level afterwards.
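The alignment trick is easy to verify with plain numbers. The sketch below uses a hypothetical helper, GetProtectRange, and an assumed page size of $1000 (4096) bytes. Besides rounding the start address down, it also rounds the end of the patched range up, which covers the corner case where the 5 patched bytes straddle a page boundary:

```pascal
program PageAlignSketch;

{$APPTYPE CONSOLE}

{ Hypothetical helper: returns the page-aligned start address and the length
  (a multiple of APageSize) that cover the bytes AAddress..AAddress+ASize-1.
  APageSize must be a power of two. }
procedure GetProtectRange(const AAddress, ASize, APageSize: UIntPtr;
  out AStart, ALength: UIntPtr);
var
  EndAddress: UIntPtr;
begin
  { Round the start down to a page boundary. }
  AStart := AAddress and (not (APageSize - 1));
  { Round the end up to a page boundary. }
  EndAddress := (AAddress + ASize + APageSize - 1) and (not (APageSize - 1));
  ALength := EndAddress - AStart;
end;

var
  Start, Len: UIntPtr;
begin
  { Patch lies within a single page. }
  GetProtectRange($00401234, 5, $1000, Start, Len);
  WriteLn(Start = $00401000, ' ', Len);  { TRUE 4096 }

  { Patch starts 2 bytes before a page boundary, so it spans two pages. }
  GetProtectRange($00401FFE, 5, $1000, Start, Len);
  WriteLn(Start = $00401000, ' ', Len);  { TRUE 8192 }
end.
```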

VMT Patching
Function hooking does not work on iOS and Android, since we are not allowed to change the protection level of executable memory pages on those devices. However, we are allowed to change the protection level of read-only pages containing other data, such as Virtual Method Tables. But before we do that, let’s first recap what VMTs are and how they are implemented.
About Virtual Method Tables
A virtual method table is simply a list of addresses of the virtual methods in a class. It is used at run-time to look up the method implementation to execute when a virtual method is called. Every class has its own VMT, and all instances of the same class share the same VMT. The VMT of a class has a complete copy of the VMT of its parent class, and optional additional entries in case the class introduces new virtual methods. The following diagram may clarify this:

This diagram shows part of the VMT for three classes. Each entry in the VMT just contains the address of the implementation (the diagram shows some made-up addresses). As you can see, each class has entries for the NewInstance and FreeInstance methods. These methods are first introduced at the TObject level, but since all classes derive from TObject, their VMTs all have entries for these methods. The TStream class introduces new virtual methods like Read and some others.
The entries for the FreeInstance method all have the same value (the made-up $00100200 address in the diagram). This means they all share the same implementation. The TStream class does not override the NewInstance method, so it has the same address ($00100100) as for TObject. However, TInterfacedObject does override the NewInstance method, so it has a different address ($00100400).
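The relationships described above can be inspected at run-time. The following sketch reads the VMT entries directly, using the vmtNewInstance and vmtFreeInstance offsets from the System unit; VmtEntry is a hypothetical helper name. If the classes behave as described, every line prints TRUE:

```pascal
program VmtPeekSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes;

{ Hypothetical helper: a class reference is a pointer into its VMT, so an
  entry is found by adding a (possibly negative) byte offset to it. }
function VmtEntry(const AClass: TClass; const AOffset: Integer): Pointer;
begin
  Result := PPointer(PByte(AClass) + AOffset)^;
end;

var
  ObjectNewInstance, ObjectFreeInstance: Pointer;
begin
  ObjectNewInstance := @TObject.NewInstance;
  ObjectFreeInstance := @TObject.FreeInstance;

  { TStream inherits both methods from TObject. }
  WriteLn(VmtEntry(TStream, vmtNewInstance) = ObjectNewInstance);
  WriteLn(VmtEntry(TStream, vmtFreeInstance) = ObjectFreeInstance);

  { TInterfacedObject overrides NewInstance but not FreeInstance. }
  WriteLn(VmtEntry(TInterfacedObject, vmtNewInstance) <> ObjectNewInstance);
  WriteLn(VmtEntry(TInterfacedObject, vmtFreeInstance) = ObjectFreeInstance);
end.
```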
Limitations of VMT Patching
Our goal is to patch the VMTs with the addresses of our hooked NewInstance and FreeInstance methods. This poses several challenges:

Since each class has its own VMT, we need to patch the VMTs of all classes we care about. It does not suffice to just patch the VMT of the TObject class!
Some classes may have overridden the NewInstance and/or FreeInstance methods. In that case, we would need different hook versions of these methods since their implementations will be different.
And obviously, VMT patching only works for virtual methods. You cannot use it to hook global functions or non-virtual methods.

The second problem can be addressed by simply ignoring all classes that have overridden versions of NewInstance and/or FreeInstance. Fortunately, there are only very few classes that do this, so this should not have a big impact. The exception is TInterfacedObject. As you can see from the diagram, this class has an overridden version of NewInstance. Since TInterfacedObject is widely used as a base class, we want to include this class in our metrics, and so we create a separate hook function for its NewInstance method:

function HookedInterfacedObjectNewInstance(const Self: TClass): TObject;
var
  Instance: Pointer;
begin
  GetMem(Instance, Self.InstanceSize);
  Result := Self.InitInstance(Instance);
  TInterfacedObjectOpener(Result).FRefCount := 1;

  TrackInstance(Result);
end;

The only difference from the hooked TObject.NewInstance is that the FRefCount field is always initialized, since TInterfacedObject has this field on every platform (at the TObject level, this field only exists on ARC platforms).
The first issue can be addressed with some RTTI.
Listing All Classes
So we need a list of all classes so we can patch their VMTs. We can use Delphi’s Run Time Type Information (RTTI) tools to help with this. The TRttiContext.GetTypes method returns an array of all linked types that have RTTI. For each type, we can check if it is a class type, and if so, patch its VMT.

Unfortunately, not all classes will have RTTI. In particular, “private” classes that are declared in the implementation section of a unit will not have RTTI. Fortunately, most classes we care about do have RTTI, so this shouldn’t be too much of a problem.
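As a quick way to see which classes the RTTI system knows about, the following sketch lists them and counts them. CountClassTypes is a hypothetical helper name, and the exact list depends on which units are linked into the application:

```pascal
program ListClassTypes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.TypInfo,
  System.Rtti;

{ Hypothetical helper: prints the name of every class type that has RTTI
  and returns how many were found. }
function CountClassTypes: Integer;
var
  Context: TRttiContext;
  RttiType: TRttiType;
begin
  Result := 0;
  Context := TRttiContext.Create;
  try
    for RttiType in Context.GetTypes do
      if (RttiType.TypeKind = tkClass) then
      begin
        WriteLn(RttiType.Name);
        Inc(Result);
      end;
  finally
    Context.Free;
  end;
end;

begin
  WriteLn('Class types with RTTI: ', CountClassTypes);
end.
```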

In our sample app, enumerating all classes and patching their VMTs is performed in the InitializeVMTHooks procedure:

procedure InitializeVMTHooks;
var
  Rtti: TRttiContext;
  RttiType: TRttiType;
  InstanceType: TRttiInstanceType;
  VMTEntryNewInstance, VMTEntryFreeInstance: PPointer;
  ObjectNewInstance, ObjectFreeInstance,
  InterfacedObjectNewInstance: Pointer;
begin
  ObjectNewInstance := @TObject.NewInstance;
  ObjectFreeInstance := @TObject.FreeInstance;
  InterfacedObjectNewInstance := @TInterfacedObject.NewInstance;

  { Get a list of all Delphi types in the application with RTTI support. }
  Rtti := TRttiContext.Create;
  for RttiType in Rtti.GetTypes do
  begin
    { Check if the type is a class type. }
    if (RttiType.TypeKind = tkClass) then
    begin
      { We can now safely typecast to TRttiInstanceType }
      InstanceType := TRttiInstanceType(RttiType);

      { Retrieve the entry in the VMT of the FreeInstance method for
        this class. }
      VMTEntryFreeInstance := PPointer(
        PByte(InstanceType.MetaclassType) + vmtFreeInstance);

      { Only track classes that didn't override TObject.FreeInstance. }
      if (VMTEntryFreeInstance^ = ObjectFreeInstance) then
      begin
        { Retrieve the entry in the VMT of the NewInstance method for
          this class. }
        VMTEntryNewInstance := PPointer(
          PByte(InstanceType.MetaclassType) + vmtNewInstance);

        { Only track classes that didn't override TObject.NewInstance or
          TInterfacedObject.NewInstance. }
        if (VMTEntryNewInstance^ = ObjectNewInstance) then
        begin
          { This class uses NewInstance and FreeInstance from TObject.
            Hook those VMT entries. }
          HookVMT(VMTEntryNewInstance, @HookedObjectNewInstance);
          HookVMT(VMTEntryFreeInstance, @HookedObjectFreeInstance);
        end
        else if (VMTEntryNewInstance^ = InterfacedObjectNewInstance) then
        begin
          { This class is (ultimately) derived from TInterfacedObject, so
            we need to hook to a separate version of NewInstance. }
          HookVMT(VMTEntryNewInstance, @HookedInterfacedObjectNewInstance);
          HookVMT(VMTEntryFreeInstance, @HookedObjectFreeInstance);
        end;
      end;
    end;
  end;
end;

First, we retrieve the code addresses of the original NewInstance and FreeInstance methods. We use these later to check if these methods are overridden by a certain class.
Then we enumerate all types and look for class types. We can safely typecast those types to TRttiInstanceType. Its MetaclassType property is used to get the TClass for the class type. In reality, a TClass is just a pointer to its VMT. This means we can find the entries for the NewInstance and FreeInstance methods by adjusting this pointer with a vmtNewInstance or vmtFreeInstance offset. These are constants declared in the System unit. If you want to hook virtual methods for which you don’t have a vmt* constant, then you can use the TRttiMethod.VirtualIndex property to calculate the offset.
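A minimal sketch of that last suggestion follows. It assumes that the VMT slot of a method sits at VirtualIndex * SizeOf(Pointer) bytes from the class pointer; TShape and VmtEntryFor are made-up names. The program looks up the slot of a user-defined virtual method and checks that it holds the address of the method:

```pascal
program VirtualIndexSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Rtti;

type
  { Hypothetical class. Public methods get RTTI by default. }
  TShape = class
  public
    function Area: Integer; virtual;
  end;

function TShape.Area: Integer;
begin
  Result := 0;
end;

{ Hypothetical helper: returns the address of the VMT slot of the named
  virtual method, or nil if the method has no RTTI or is not dispatched
  through the VMT. }
function VmtEntryFor(const AClass: TClass; const AMethodName: string): PPointer;
var
  Context: TRttiContext;
  Method: TRttiMethod;
begin
  Result := nil;
  Context := TRttiContext.Create;
  try
    Method := Context.GetType(AClass).GetMethod(AMethodName);
    if (Method <> nil) and (Method.DispatchKind = dkVtable) then
      Result := PPointer(
        PByte(AClass) + Method.VirtualIndex * SizeOf(Pointer));
  finally
    Context.Free;
  end;
end;

var
  Entry: PPointer;
begin
  Entry := VmtEntryFor(TShape, 'Area');
  if (Entry <> nil) then
    WriteLn('Slot holds TShape.Area: ', Entry^ = @TShape.Area)
  else
    WriteLn('No VMT slot found');
end.
```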
Next, we check if the contents of these VMT entries match the addresses of TObject.NewInstance (or TInterfacedObject.NewInstance) and TObject.FreeInstance. If not, then the class has overridden one or both of these methods and we skip it. If so, then we patch its VMT by calling the HookVMT function. Since this function has to change the protection level of the memory page containing the VMT, we need different implementations for Windows and non-Windows systems.
VMT Patching on Windows
On Windows, we need to use the VirtualProtect API again. After that, patching the VMT is just a matter of changing the value of a VMT entry:

function HookVMT(const AVMTEntry, AHookAddress: Pointer): Boolean;
var
  OldProtect: DWORD;
begin
  Result := VirtualProtect(AVMTEntry, SizeOf(Pointer),
    PAGE_READWRITE, OldProtect);

  if (Result) then
  begin
    PPointer(AVMTEntry)^ := AHookAddress;

    VirtualProtect(AVMTEntry, SizeOf(Pointer), OldProtect, OldProtect);
  end;
end;

Not much to it.
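To see VMT patching at work on something smaller than TObject, here is a Windows-only sketch that redirects a virtual method of a made-up class, TCounter. Two assumptions are worth calling out: the first virtual method a class introduces directly under TObject is taken to live at offset 0 of its VMT, and this variant of HookVMT requests PAGE_EXECUTE_READWRITE as a cautious choice in case the VMT shares a page with executable code. The method returns an Integer on purpose, so that the method and the stand-alone hook receive their arguments the same way:

```pascal
program VmtPatchDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

type
  { Hypothetical class with a single virtual method. }
  TCounter = class
  public
    function Value: Integer; virtual;
  end;

function TCounter.Value: Integer;
begin
  Result := 1;
end;

{ Replacement with the implicit Self parameter made explicit. }
function HookedValue(const Self: TCounter): Integer;
begin
  Result := 2;
end;

function HookVMT(const AVMTEntry, AHookAddress: Pointer): Boolean;
var
  OldProtect: DWORD;
begin
  Result := VirtualProtect(AVMTEntry, SizeOf(Pointer),
    PAGE_EXECUTE_READWRITE, OldProtect);

  if (Result) then
  begin
    PPointer(AVMTEntry)^ := AHookAddress;
    VirtualProtect(AVMTEntry, SizeOf(Pointer), OldProtect, OldProtect);
  end;
end;

var
  CounterClass: TClass;
  Counter: TCounter;
begin
  CounterClass := TCounter;
  Counter := TCounter.Create;
  try
    WriteLn(Counter.Value);  { 1 }

    { Assumption: TCounter.Value occupies the first slot (offset 0). }
    if HookVMT(PPointer(CounterClass), @HookedValue) then
      WriteLn(Counter.Value);  { 2 }
  finally
    Counter.Free;
  end;
end.
```

Note that the already created instance picks up the hook as well, since all instances of a class share one VMT.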
VMT Patching on Posix
On all other systems (macOS, iOS, Android and Linux), we need to use the mprotect API instead and make sure we align to full memory pages again:

function HookVMT(const AVMTEntry, AHookAddress: Pointer): Boolean;
var
  AlignedCodeAddress: UIntPtr;
begin
  AlignedCodeAddress := UIntPtr(AVMTEntry) and (not (GPageSize - 1));

  Result := (mprotect(Pointer(AlignedCodeAddress), GPageSize,
    PROT_READ or PROT_WRITE) = 0);

  if (Result) then
    PPointer(AVMTEntry)^ := AHookAddress;
end;

Wrapping Up
I hope this article showed a legitimate use case for code hooking. I suggest using code hooking only for debugging and instrumentation purposes. In our upcoming remote logging library, code hooking will only be enabled on demand, and only in DEBUG builds, so it will have no effect on release builds at all.

[/SHOWTOGROUPS]
