Addressing pointers by Rudy Velthuis
[SHOWTOGROUPS=4,20]
Pointers are probably among the most misunderstood and most feared data types. That is why many programmers love to avoid them.
"Pointers are like jumps, leading wildly from one part of the data structure to another. Their introduction into high-level languages has been a step backwards from which we may never recover." (Anthony Hoare)
But pointers are important. Even in languages that do not support pointers explicitly, or which make it hard to use pointers, pointers are important factors behind the scenes. I think it is very important to understand them. There are different approaches to understanding pointers.
This article was written for everyone with problems understanding or using pointers. It discusses my working view on pointers in Delphi for Win32, which may not be entirely accurate in all aspects (for instance, memory for one program is not one big block, but for most practical purposes, it helps to pretend it is). This way, pointers are easiest to understand, in my opinion.
Memory
You probably already know what I write in this paragraph, but it is probably good to read it anyway, since it shows my view on things, which may differ a bit from your own.
Pointers are variables which are used to point to other variables. To explain them, it is necessary to understand the concept of a memory address and the concept of a variable. To do this, I’ll first have to roughly explain computer memory.
In short, computer memory can be seen as one very long row of bytes. A byte is a small storage unit that can contain 256 separate values (0 up to 255). In current 32-bit Delphi, memory can (with a few exceptions) be seen as an array of at most 2 gigabytes in size (2^31 bytes).
What these bytes contain depends on how the contents are interpreted, i.e. how they are used. The value 97 can mean a byte of value 97, as well as the character 'a'. If you combine more than one byte, you can store much larger values. In 2 bytes you can store 256*256 = 65536 different values, etc.
The bytes in memory can be addressed by numbering them, starting at 0 and going up to 2147483647 (assuming you have 2 gigabytes; even if you don’t, Windows will try to make it look as if you do). The index of a byte in this huge array is called its address.
One could also say: a byte is the smallest addressable piece of memory.
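As a minimal sketch of the point that the meaning of a byte depends on how it is interpreted, the following console program prints the same byte once as a number and once as a character, and then stores the largest value that fits in 2 bytes. The program and variable names are only chosen for illustration.

```pascal
program ByteInterpretation;

{$APPTYPE CONSOLE}

var
  B: Byte;
  W: Word;
begin
  B := 97;
  Writeln(B);           // interpreted as a number: 97
  Writeln(AnsiChar(B)); // the same byte interpreted as a character: a

  // Two bytes together can hold 256 * 256 = 65536 different values (0..65535).
  W := 255 * 256 + 255;
  Writeln(W);           // 65535, the largest value that fits in 2 bytes
end.
```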
In reality, memory is a lot more complex. There are, for instance, computers with bytes that are not 8 bits wide, which means they can contain fewer or more than 256 values, but these are not the computers on which Delphi for Win32 runs.
Memory is managed in hardware and software, and not all memory physically exists (memory managers take care that your program doesn’t notice, though, by swapping parts of memory out to or in from the hard disk). For this article, however, it helps to see memory as one huge block of single bytes, divided up to be used by several programs.
Variables
A variable is a location made up of one or more bytes in this huge “array”, from which you can read or to which you can write. It is identified by its name, but also by its type, its value and its address.
If you declare a variable, the compiler reserves a piece of memory of the appropriate size. Where this variable is stored is decided by the compiler and the runtime code. You should never make assumptions about where exactly a variable will be located.
The type of the variable defines how the memory location is used. It defines its size, i.e. how many bytes it occupies, but also its structure. For instance, the following shows a diagram of a piece of memory. It shows 4 bytes starting at address $00012344. The bytes contain the values $4D, $65, $6D and $00, respectively.
Note that although I use addresses like $00012344 in most of the diagrams, these are completely made up, and only used to distinguish different memory locations. They do not reflect the real memory addresses, since these depend on many things, and can not be predicted.
The type decides how these bytes are used. They can, for instance, be an Integer with value 7169357 ($006D654D), or an array[0..3] of AnsiChar forming the C-style string 'Mem', or something else, like a set variable, a number of single bytes, a small record, a Single, part of a Double, etc. In other words, the meaning of a piece of memory is not known before you know the type or types of the variable or variables stored there.
The address of a variable is the address of its first byte. In the diagram above, assuming this shows a variable of type Integer, its address is $00012344.
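The two readings of those 4 bytes can be tried out with a variant record, in which all fields share the same memory. This is only a sketch: the type TFourBytes and its field names are made up for this example, and the Integer result assumes the little-endian byte order of the x86 processors Delphi for Win32 runs on.

```pascal
program FourBytes;

{$APPTYPE CONSOLE}

type
  // Hypothetical variant record: all three fields occupy the same 4 bytes.
  TFourBytes = packed record
    case Integer of
      0: (AsBytes: array[0..3] of Byte);
      1: (AsInteger: Integer);
      2: (AsChars: array[0..3] of AnsiChar);
  end;

var
  V: TFourBytes;
begin
  // Fill the 4 bytes with the values from the text.
  V.AsBytes[0] := $4D;
  V.AsBytes[1] := $65;
  V.AsBytes[2] := $6D;
  V.AsBytes[3] := $00;

  Writeln(V.AsInteger);               // read as an Integer: 7169357
  Writeln(PAnsiChar(@V.AsChars[0]));  // read as a C-style string: Mem
end.
```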
Uninitialized variables
The memory for variables can be reused. The memory set aside for variables is usually only reserved as long as the program can access them. E.g., local variables of a function or procedure (I like to call both routines) are only valid as long as the routine is running. Fields of an object (which are also variables) are also only valid as long as the object "exists".
If you declare a variable, the compiler reserves the required number of bytes for that variable. But the contents may well be what was already put in these bytes before, when they were used in another function or procedure. In other words, the value of an uninitialized variable is undefined (but not necessarily undetermined). An example is given in the form of this simple console program:
Code:
program uninitializedVar;

{$APPTYPE CONSOLE}

procedure Test;
var
  A: Integer;
begin
  Writeln(A); // uninitialized yet
  A := 12345;
  Writeln(A); // initialized: 12345
end;

begin
  Test;
  Readln;
end.
The first value displayed (the value of the uninitialized variable A) depends on the already existing content of the memory location reserved for A. In my case, it displays the value 2147319808 ($7FFD8000) each time, but this can be totally different on your computer. The value is undefined, because it was not initialized. In a more complex program, especially (but not only) when pointers are concerned, this is a frequent cause of program crashes or unexpected results. The assignment initializes A with the value 12345 ($00003039), so that is the second value displayed.
Pointers
Pointers are also variables. But they do not contain numbers or characters, they contain the address of a memory location instead. If you see memory as an array, a pointer can be seen as an entry in the array which contains the index of another entry in the array.
Say I have the following declaration and initialisation:
Code:
var
  I: Integer;
  J: Integer;
  C: AnsiChar;
begin
  I := 4222;
  J := 1357;
  C := 'A';
Let’s assume it results in the following memory layout:
Now, after this code, assuming P is a pointer,
Code:
  P := @I;
I have the following situation:
In the previous diagrams, I always showed each byte. This is generally not necessary, so the above could just as well be shown as:
This does not reflect the actual size anymore (C looks just as big as I or J), but it is good enough to understand what is going on with pointers.
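The fragments above can be put together into a small console program. It is a sketch that does not print the actual addresses, since these differ from run to run and machine to machine; it only checks that P holds the address of I and not that of J.

```pascal
program PointerToI;

{$APPTYPE CONSOLE}

var
  I: Integer;
  J: Integer;
  C: AnsiChar;
  P: Pointer;
begin
  I := 4222;
  J := 1357;
  C := 'A';

  P := @I;          // P now contains the address of I

  Writeln(P = @I);  // TRUE: P points to I
  Writeln(P = @J);  // FALSE: J is stored at a different address
end.
```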
Nil
Nil is a special pointer value. It can be assigned to any kind of pointer. It stands for the empty pointer (nil is short for the Latin nihil, which means nothing or zero; others say NIL means Not In List). It means that the pointer has a defined state, but not that you should attempt to access the value (in C, nil is called NULL; see the quote below).
"Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end." (Henry Spencer)
Nil never points to valid memory, but since it is one well-defined value, many routines can test for it (e.g. using the Assigned() function). One cannot test whether any other value is valid. Stale or uninitialized pointers look no different from valid pointers (see below). There is no way to distinguish them. Program logic must always ensure that a pointer is either valid or nil.
In Delphi, nil has the value 0, i.e. it points to the very first byte in memory. This is apparently a byte that will never be accessed by Delphi code. But you should generally not rely on nil being 0, unless you are fully aware of what is going on behind the scenes. The value of nil could change in a later version, for one reason or other.
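The test for nil mentioned in this section could look like the following sketch. It uses Assigned() and a direct comparison with nil; neither can tell you whether a non-nil pointer is actually valid, only that it is not nil.

```pascal
program NilCheck;

{$APPTYPE CONSOLE}

var
  I: Integer;
  P: Pointer;
begin
  P := nil;             // defined state: points to nothing
  Writeln(Assigned(P)); // FALSE
  Writeln(P = nil);     // TRUE

  I := 4222;
  P := @I;              // now points to a valid variable
  Writeln(Assigned(P)); // TRUE
end.
```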
Typed pointers
In the simple example above, P is of type Pointer. This means that P contains an address, but you don’t know what the variable at that address is supposed to contain. That is why pointers are usually typed, i.e. the pointer is interpreted to be pointing to a memory location that is supposed to contain a certain type.
Let’s assume we have another pointer, Q:
Code:
var
  Q: ^Integer;
Q is of type ^Integer, which should be read as "pointer to Integer" (I was told that ^Integer stands for ↑Integer). This means that it is not an Integer, but points to a memory location, which is to be used as one, instead. If you assign the address of J to Q, using the @ address operator or the functionally equivalent Addr pseudo-function,
Code:
  Q := @J; // Q := Addr(J);
then Q points to the location at address $00012348 (it references the memory location identified by J). But since Q is a typed pointer, the compiler will treat the memory location to which Q points as an Integer. Integer is the base type of Q.
Although you will hardly ever see the Addr pseudo-function being used, it is equivalent to @. @ has the disadvantage that, when applied to complicated expressions, it is not always obvious to which part the operator applies. Addr, using the syntax of a function, is much less ambiguous, since the target is enclosed in parentheses ():
Code:
  P := @PMyRec^.Integers^[6];
  Q := Addr(PMyRec^.Integers^[6]);
Assignment using a pointer is a bit different than direct assignment to a variable. Generally, you only have the pointer to go by. If you assign to a normal variable, you write something like:
Code:
  J := 98765;
That stores the integer 98765 (hex $000181CD) in the memory location. But to access the memory location using Q, you must work indirectly, using the ^ operator:
Code:
  Q^ := 98765;
This is called dereferencing. You must follow the imaginary "arrow" to the location to which Q points (in other words, the Integer at address $00012348) and store it there.
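Put together as a small console program, dereferencing works in both directions: writing through Q changes J, and reading through Q shows whatever J currently contains. This is a minimal sketch that reuses the names J and Q from the text.

```pascal
program Dereference;

{$APPTYPE CONSOLE}

var
  J: Integer;
  Q: ^Integer;
begin
  J := 1357;
  Q := @J;       // Q points to J

  Q^ := 98765;   // store through the pointer, i.e. into J
  Writeln(J);    // 98765

  J := 4222;     // change J directly
  Writeln(Q^);   // read through the pointer: 4222
end.
```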
For records, the syntax allows you to omit the ^ operator, if the code is unambiguous without it. For clarity reasons, I personally always write it, though.
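A short sketch of both spellings for a pointer to a record follows. The types TSample and PSample are made up for this example; they are not part of the Delphi runtime library.

```pascal
program RecordPointer;

{$APPTYPE CONSOLE}

type
  PSample = ^TSample;   // hypothetical pointer type
  TSample = record      // hypothetical record type
    X, Y: Integer;
  end;

var
  R: TSample;
  P: PSample;
begin
  P := @R;

  P^.X := 10;  // explicit dereference
  P.Y := 20;   // ^ omitted; the compiler dereferences implicitly

  Writeln(R.X, ' ', R.Y); // 10 20
end.
```

Both assignments end up in R, which is why the explicit form is purely a matter of clarity.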
It is generally useful to define types for the pointers one is using. For instance, ^Integer is not a valid parameter type declaration, so you’ll have to predefine a type:
Code:
type
  PInteger = ^Integer;

procedure Abracadabra(I: PInteger);
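The declaration above has no body. As an illustration only, the following sketch gives Abracadabra a made-up body that doubles the Integer it is given, and shows how the address of a variable is passed to it. PInteger is declared locally here just to keep the program self-contained.

```pascal
program PointerParam;

{$APPTYPE CONSOLE}

type
  PInteger = ^Integer;

// Hypothetical body: doubles the Integer that I points to.
procedure Abracadabra(I: PInteger);
begin
  if Assigned(I) then   // never dereference nil
    I^ := I^ * 2;
end;

var
  J: Integer;
begin
  J := 21;
  Abracadabra(@J);      // pass the address of J
  Writeln(J);           // 42

  Abracadabra(nil);     // safely ignored by the nil check
end.
```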
In fact, the PInteger type and some other common pointer types are already defined in the Delphi runtime library (e.g. units System and SysUtils). It is customary to start the names of pointer types with the capital letter P, followed by the name of the type to which they point. If the base type is prefixed with a capital T, the T is usually omitted. Examples:
Code:
type
  PByte = ^Byte;
  PDouble = ^Double;
  PRect = ^TRect;
  PPoint = ^TPoint;
[/SHOWTOGROUPS]