A Tale Of Two Assemblers
Rudy Velthuis
[SHOWTOGROUPS=4,20]
This is a preliminary version, a work in progress. I will probably (have to) add, enhance or rewrite some parts.
Introduction
Recently, there was a lively discussion about the announcement that the first incarnation of a Win64 Delphi compiler would very likely not have a built-in assembler (BASM). The advice was to use an external assembler instead, and it was said that Embarcadero would probably choose NASM, the open source Netwide Assembler.
UPDATE: It is now clear that Delphi XE2 and above have a built-in Win64 assembler. You will not have to use NASM, at least not at the moment.
There were complaints that the lack of a built-in assembler would mean that the compiler would be rather useless, and that converting code to an external assembler like NASM would be too much work. To check this, I decided to rewrite my Decimals.pas unit, which uses BASM throughout, entirely or at least partly in NASM, to see how feasible this is and how much work it would be.
In this process, I learned a lot about the similarities and differences between BASM and NASM, and managed to write a small package of macros that could make the conversion a little easier. This article describes what I experienced during this process.
NASM
NASM is a versatile, but rather simple assembler. It does accept the usual Intel assembler syntax like MOV EAX,EDX or LEA EAX,[EBP+SomeOffset], but it does not accept large parts of the syntax of Microsoft’s MASM (ml.exe or ml64.exe), nor does it accept most of the syntax of Borland’s TASM’s Ideal Mode. Of course it also doesn’t know Delphi’s syntax for comments and other Delphi features you can use in Delphi BASM, like VMTOffset, etc.
For what it’s worth: for this article I used version 2.09.03 from 27 Oct. 2010.
NASM can produce a range of output formats. It can produce the most common 16 bit, 32 bit and 64 bit object file formats, as well as simple .bin or .com files. It took me a while to find out how to create Delphi-compatible 32 bit OMF object files (see my article about using object files with Delphi).
There are two things that must be done: the chosen format must be obj, and each and every segment declared in the source file must be declared as USE32.
The first segment declaration should be close to the top, otherwise you’ll get an empty 16 bit segment, which makes your object file unusable.
NASM is a command line assembler. It comes with a console setup. I changed that setup to use a window 160 characters wide and 75 lines high, with Lucida Console 14pt as font. I wrote a little batch file (asm.bat) to assemble the decimals.asm file:
Code:
@echo off
nasm -fobj -Ox -l%1.out %1.asm
It is used as:
Code:
asm decimals
The options I use are:
- -fobj: Sets the output format to OMF. Such a file can contain both 16 bit and 32 bit segments!
- -Ox: Full optimization. Where possible, NASM chooses the smallest opcode/literal combination.
- -l%1.out: -l gives the name of the listing file; %1 is the first argument of the batch file, e.g. decimals.
- %1.asm: The file to be assembled, e.g. decimals.asm.
Writing assembler for NASM
As I already said, every segment must explicitly be declared as USE32, otherwise it will be generated as 16 bit. The NASM documentation says you can also use [BITS 32] (probably near the top of the file), but I did not have any success with that.
The decimals.asm file I have has three segments (or sections) declared. I don’t know if all three are necessary, but it works:
Code:
section data public use32
        ; data declarations, e.g. records or external data
        ; (external to this file, e.g. in a Delphi unit)

section const public use32
        ; constant declarations

section code public use32
        ; code
As you can see, all three are declared as use32. Note that NASM is not case sensitive, except for its declared labels and “variables”, which can be case sensitive or case insensitive, depending on how you declared them. You can also see that comments start with a semicolon, instead of //.
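To make the requirements above concrete, here is a minimal sketch of a complete source file with one exported routine. The file name and the routine AddInts are hypothetical and not part of decimals.asm. The sketch assumes the routine is declared in Delphi with the default register calling convention, so the first two parameters arrive in EAX and EDX and the result is returned in EAX.

```nasm
; minimal.asm (hypothetical file name)
; Assemble with: nasm -fobj minimal.asm

section code public use32       ; first segment declaration, close to the top

global  AddInts                 ; hypothetical name; GLOBAL must precede the label

; Assumed Delphi-side declaration:
;   function AddInts(A, B: Integer): Integer; external;
; Register calling convention: A in EAX, B in EDX, result in EAX.
AddInts:
        add     eax,edx         ; EAX := A + B
        ret
```

On the Delphi side, such an object file would be pulled in with a {$L} directive, as for any other OMF object file.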
Data
I declared all data in the data section of the .asm file.
Records
In decimals.pas, I declare a few types. To be able to use them, similar structures must be declared in NASM. In NASM, there is a standard macro, called struc, which allows you to do that. But I had a few problems with the built-in __SECT__ macro used there, so I wrote my own record macro which doesn’t use __SECT__. It can be found, together with a few other macros and definitions I prepared, in the delphi.mac file that accompanies this article.
One example is the Decimal type itself. If you omit the many methods and operator overloads, this is what remains, in Delphi:
Code:
type
  Decimal = packed record
  private
    Lo: Longword;               // Hi:Mid:Lo form 96 bit unsigned mantissa
    Mid: Longword;
    Hi: Longword;
    case Byte of
      0: (Reserved: Word;       // always 0
          Scale: Shortint;      // 0..28
          Sign: Byte);          // $80 = negative, $00 = positive
      1: (Flags: Longword);
  end;
Here follows the translation to NASM. Note that NASM does not “memorize” any declared sizes (it only uses the size to reserve space, but does not automatically generate opcodes for a certain operand size), so I cheated a bit and declared Flags as Word, which allowed me to declare Scale and Sign too:
Code:
record Decimal
  .Lo           resd    1
  .Mid          resd    1
  .Hi           resd    1
  .Flags        resw    1
  .Scale        resb    1
  .Sign         resb    1
end;
The “fields” of such a record can then be accessed like:
Code:
        MOV     EAX,[ESI+Decimal.Hi]
        MOV     [.MyVar+Decimal.Hi],EAX
The record macro actually declares an absolute segment (like the CGA screen segment 0B800H in the old DOS days) at 0, and the “fields” are in fact local labels (that is why they start with a dot). Fortunately, this is similar to how you can address fields of a Decimal in BASM too (BASM knows a few more ways, but if you want to stay compatible with both syntaxes, you can use this syntax, which both assemblers can understand).
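For readers who do not have delphi.mac at hand, the same idea can be sketched with plain NASM directives. This is not the actual record macro, only an approximation of what such a declaration boils down to: an absolute section at 0 with local labels. The routine NegateDecimal and the Decimal_size constant are hypothetical. Because NASM does not remember field sizes, the operand size has to be written explicitly when it cannot be derived from a register.

```nasm
section data public use32       ; a real segment first, close to the top

absolute 0                      ; offsets count from 0, no bytes are emitted
Decimal:
  .Lo           resd    1       ; offset 0
  .Mid          resd    1       ; offset 4
  .Hi           resd    1       ; offset 8
  .Flags        resw    1       ; offset 12
  .Scale        resb    1       ; offset 14
  .Sign         resb    1       ; offset 15
Decimal_size    equ     $ - Decimal     ; 16 bytes in total

section code public use32

global  NegateDecimal           ; hypothetical routine, not from decimals.asm

; Assumes EAX points to a Decimal. Flips the sign in place.
NegateDecimal:
        xor     byte [eax+Decimal.Sign],80h     ; "byte" is required here
        ret
```

Leaving out the byte keyword should make NASM reject the instruction, since neither operand tells it the operand size.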
The end at the end of the record declaration is a macro too. I found it nicer than a specific endrecord. It can end a record declaration as well as a function or procedure declaration, and will generate code appropriate for where it is placed. The semicolon is unnecessary (it just starts a comment), but makes it look a little nicer, IMO.
Similarly, I declared the TAccumulator type. In Delphi, this is a variant record. I have not found a way to declare such records in NASM yet, so I declared an alternative TAccumulator2 which maps to the original record the same way, but uses the alternative layout.
Alignment
You should be aware that such record declarations are always packed. There is no alignment whatsoever. If you need aligned records, you can take care of the alignment by yourself, by using the appropriate resb (byte), resw (word), resd (dword), etc. directives. They are explained in the NASM documentation.
If you have a record like:
Code:
{$A8}
type
  TTest = record
    B: Boolean;
    L: Longint;
  end;
then it is not necessary to reserve B (or actually .B) as a byte, since the assembler will not use the declared size for its opcode generation anyway. You can just as well reserve it with resd, since that will take care that the Longint is properly aligned on a 4 byte offset:
Code:
record TTest
  .B            resd    1
  .L            resd    1
end
This means that you must know the details about record alignment and know how to pad with bytes to achieve a certain alignment. This is a lot easier in Delphi and BASM, of course.
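If you prefer to keep the Boolean at its real size, the padding can also be written out explicitly. The following is a sketch using plain NASM directives rather than the record macro from delphi.mac. The routine GetL and the TTest_size constant are hypothetical names.

```nasm
section data public use32       ; a real segment first, close to the top

absolute 0                      ; only offsets are defined here
TTest:
  .B            resb    1       ; offset 0, the Boolean itself
                resb    3       ; offsets 1..3, explicit padding
  .L            resd    1       ; offset 4, on a 4 byte boundary
TTest_size      equ     $ - TTest       ; 8 bytes in total

section code public use32

global  GetL                    ; hypothetical routine name

; Assumes EAX points to a TTest. Returns the L field in EAX.
GetL:
        mov     eax,[eax+TTest.L]
        ret
```

Both layouts put .L at offset 4, so code that only uses the field offsets works with either declaration.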
Another possibility is to use the built-in align and alignb macros. One example of the use of alignb can be seen in the TFormatSettings record:
Code:
record TFormatSettings
  .CurrencyString               resd    1
  alignb 1
  .CurrencyFormat               resb    1
  alignb 1
  .CurrencyDecimals             resb    1
  alignb 2
  .DateSeparator                resw    1
  alignb 2
  .TimeSeparator                resw    1
  alignb 2
  .ListSeparator                resw    1
  alignb 4
  .ShortDateFormat              resd    1
  alignb 4
  .LongDateFormat               resd    1
  alignb 4
  .TimeAMString                 resd    1
  alignb 4
  .TimePMString                 resd    1
  alignb 4
  .ShortTimeFormat              resd    1
  alignb 4
  .LongTimeFormat               resd    1
  alignb 4
  .ShortMonthNames              resd    12
  alignb 4
  .LongMonthNames               resd    12
  alignb 4
  .ShortDayNames                resd    7
  alignb 4
  .LongDayNames                 resd    7
  alignb 2
  .ThousandSeparator            resw    1
  alignb 2
  .DecimalSeparator             resw    1
  alignb 2
  .TwoDigitYearCenturyWindow    resw    1
  alignb 1
  .NegCurrFormat                resd    1
end;
The NASM documentation says that align should be used for code and data segments, while alignb should be used for bss segments. (bss segments contain uninitialized data, reserved with resb etc., while data segments contain initialized data, declared with db etc.) The record declarations only contain resb, resw, etc., so the proper directive here is alignb: it uses resb by default to reserve space, while align fills the gaps with NOP or other assembler code, which makes no sense in a declaration that only reserves space.
That still means you’ll have to know how alignment works (Delphi uses so-called natural alignment: it automatically pads variables so they begin on addresses that are a multiple of their size), but it allows you to align records properly.
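As a small worked example of natural alignment with alignb, here is a sketch for a made-up record. TSample and its fields are hypothetical and do not appear in decimals.pas. It uses plain NASM directives instead of the record macro. The offsets in the comments follow from rounding the current offset up to the next multiple of the alignb argument.

```nasm
section data public use32       ; a real segment first, close to the top

absolute 0                      ; only offsets are defined here
TSample:                        ; hypothetical record
  .Flag         resb    1       ; offset 0, 1 byte
  alignb 2                      ; 1 is rounded up to 2
  .Count        resw    1       ; offset 2, 2 bytes
  alignb 4                      ; already at 4, nothing is reserved
  .Total        resd    1       ; offset 4, 4 bytes
  .Tag          resb    1       ; offset 8, 1 byte
  alignb 4                      ; 9 is rounded up to 12, tail padding
TSample_size    equ     $ - TSample     ; 12 bytes in total

section code public use32       ; back to a real segment for code
```

The final alignb pads the record to a multiple of its largest field size, which is what a non-packed Delphi record with a Byte, a Word, a Longint and another Byte would occupy as well.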
[/SHOWTOGROUPS]