NASM

A general NASM guide for TASM coders and other ASM people
                         By Gij
                          V0.3
---------------------------------------------------------

Generalities
------------

The basic function of any assembler is to turn asm into the equivalent
binary code file, and that's true of both TASM and NASM.
The differences arise in the special features each assembler offers you.
For example, the MODEL directive exists in TASM, making it easier for the
coder to reference data variables in other segments.
NASM does not have an equivalent directive, so you have to keep tabs on
segment registers yourself, and put segment overrides where needed.
This does not mean that NASM lacks good SEGMENT or GROUP support;
it has both.

It's a different way of coding, and it may seem to require more work,
but after you get used to it it's easier, because you know exactly what's
going on in your code.

TASM is chock-full of directives. Looking at a small reference for TASM 4.0,
there are at least a few dozen directives TASM uses, and you have to know
quite a few of them by heart.
NASM, on the other hand, has very few directives. Actually, you can write
an asm file that will assemble just fine without using a single directive,
although I doubt it will be useful in most cases.

NASM is also less lenient about syntax, which leaves less room for
software bugs, but makes it more strict when assembling.
I actually think NASM is easier to learn than TASM since it's much more
straightforward.

Your NASM Bible is of course the accompanying docs. You can get them in
a separate package from the same place you got the binaries for NASM.
All in all I think you will find NASM to be just as capable as TASM, if not
more so. Although it's missing some features TASM has, you can always mail
the author and ask for a feature, and you just might get lucky when the
new version comes out.

ASM code is usually the same in any assembler (AT&T syntax is an exception),
but there are a few subtleties that TASM coders should look out for.
The accompanying NASM docs have a nice list of them. I'll mention a few:

DATA offset vs DATA contents
----------------------------

TASM uses this syntax to move the offset of a variable into a register:

    mov esi, offset MyVar
   OR
    lea esi, MyVar

LEA is used to load complex offsets like "[esi*4+ebx]" into a register. TASM
also accepts LEA with a bare variable name like "MyVar".

NASM, on the other hand, has no "offset" keyword, and a LEA operand must always
be written in square brackets, so "lea esi, MyVar" is rejected. The usual way
of loading a simple offset into a register is a plain MOV:

    mov esi, MyVar

This ALWAYS means move the offset of MyVar into esi.
On the other hand, this:

    mov eax, [MyVar]

will always mean move the contents of MyVar into eax.

However, using LEA to load a complex offset is valid in both TASM and NASM:

    lea edi,[esi*4+EBX] ; valid in both assemblers

NASM also supports a SEG keyword:

    mov ax,SEG MyVar

This moves the segment of the variable into ax.

Note: the LEA instruction is still valid for complex offsets, as shown above.
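
To see these forms side by side, here is a minimal sketch of a 16-bit .COM
program. The value 1234h is made up for illustration, and the code is only my
example of the rules in this section, not something taken from the NASM docs:

    org 100h

    mov si, MyVar       ; SI = offset of MyVar
    mov ax, [MyVar]     ; AX = contents of MyVar (1234h)
    mov bx, [si]        ; BX = the same contents, read through the offset in SI
    lea di, [si+bx]     ; DI = SI + BX, a complex offset, so LEA is fine
    ret                 ; return to DOS

MyVar:  dw 1234h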

Segment Overrides
-----------------

TASM is more lax in its syntax, so both of these are valid code:

    mov ax,ds:[si]
AND
    mov ax,[ds:si]


NASM doesn't allow this. If you specify a memory operand in square brackets,
all of its specifiers must go inside the square brackets.
So this is the only valid option:

    mov ax,[ds:si]
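
The same rule holds for any segment register and any addressing form. A short
sketch (MyTable is a hypothetical label, used only for illustration):

    mov ax, [es:di]             ; read a word from ES:DI
    mov al, [cs:MyTable+bx]     ; read a byte from a table in the code segment
    mov [es:di], ax             ; an override works on the destination too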

Specifying operand size
-----------------------

TASM coders usually have lexical difficulties with NASM because
it lacks the "ptr" keyword used extensively in TASM.

TASM uses this:

    mov al,  byte ptr [ds:si]
or
    mov ax,  word ptr [ds:si]
or
    mov eax, dword ptr [ds:si]

For NASM this simply translates into:

    mov al,  byte [ds:si]
or
    mov ax,  word [ds:si]
or
    mov eax, dword [ds:si]

NASM allows these size keywords in many places, and thus gives you a lot
of control over the generated opcodes in a uniform way. For example, these
are all valid:

    push dword 123
    jmp  [ds: word 1234]   ; these both specify the size of the offset
    jmp  [ds: dword 1234]  ; for tricky code when interfacing 32bit and
                               ; 16bit segments

It can get pretty hairy, but the important thing to remember is you can have
all the control you need, when you want it.
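
The size keyword becomes mandatory when no register operand tells NASM how big
the memory operand is. A short sketch of the common cases (MyVar stands for
any data label, as in the earlier examples):

    mov byte [si], 0        ; store one byte, "mov [si], 0" alone is an error
    inc word [bx]           ; increment a 16-bit value in memory
    cmp dword [MyVar], 1    ; compare a 32-bit value in memory
    mov ax, [si]            ; no keyword needed, AX already implies a word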

Functions
---------

TASM has special directives for declaring a procedure and ending it. Why?
A procedure is just another code label you CALL instead of JMP. NASM got it
right.

TASM uses:

ProcName PROC
    xor ax,ax
    ret
ProcName ENDP

while NASM just uses:

ProcName:
    xor ax,ax
    ret
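
Calling the routine looks the same in both assemblers. A tiny sketch:

    call ProcName       ; pushes the return address and jumps to the label
    mov bx, ax          ; execution continues here after the "ret"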

Local Labels
------------

Those of you who know C know that a member of a struct can be referenced
as StructInstance.MemberName. This is rather similar to the way NASM allows
you to use local labels. A local label is denoted by prefixing the label name
with a dot, and it belongs to the most recent non-local label above it.

Label1:
    nop
.local:
    nop
Label2:
    nop
.local:
    nop

This won't give you an error about multiple definitions of a label, and you can
still jump to a specific one from anywhere like this:

    jmp Label2.local

So it's local, and in a way it's also a global label.
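
The usual payoff is reusing common names such as ".loop" in every routine. A
minimal sketch (ClearBuf and CopyBuf are made-up routine names, and both
assume CX holds a non-zero byte count):

ClearBuf:
    xor al, al
.loop:                  ; this is really ClearBuf.loop
    mov [di], al
    inc di
    dec cx
    jnz .loop           ; jumps to ClearBuf.loop
    ret

CopyBuf:
.loop:                  ; this is really CopyBuf.loop, no clash with the above
    mov al, [si]
    mov [di], al
    inc si
    inc di
    dec cx
    jnz .loop           ; jumps to CopyBuf.loop
    ret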

ORG directive
--------------

NASM supports the ORG directive, so if you're coding a .COM file you can start
with:

    org 0x100
OR
    org 100h

NASM allows both the asm and C methods of specifying hex, so both of the
above are valid.
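
Put together, a complete .COM file can be as small as this sketch. The DOS
calls (int 21h, functions 09h and 4Ch) and the file name are my choices for the
example, not something NASM requires:

    org 100h

    mov ah, 09h         ; DOS function 09h: print the '$'-terminated string
    mov dx, msg         ; DX = offset of msg
    int 21h
    mov ax, 4C00h       ; DOS function 4Ch: exit with return code 0
    int 21h

msg:    db "Hello from NASM", 13, 10, "$"

Assemble it with "nasm -f bin hello.asm -o hello.com".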

reserving space
---------------

Again, NASM uses a different syntax than TASM.

In TASM you would declare 100 bytes of uninitialized space like this:

    Array1: db 100 dup (?)

NASM uses its own keywords to do this. These are RESB, RESW and RESD,
for byte, word and dword respectively.
You would use them like this:

    Array1: RESB 100
OR
    Array1: RESW 100/2
OR
    Array1: RESD 100/4

Declaring initialized space is much like TASM, but arrays are different.

In TASM:

    Array1: db 100 dup (1)

In NASM:

    Array1: TIMES 100 db 1

TIMES is a handy little directive. It instructs NASM to perform an action
a specified number of times. In the example above I perform "db 1" 100
times.

It can be used for virtually anything:

    TIMES 69 nop

will put 69 nops at the current point in the file.

* The $ symbol (the current assembly position) is supported by NASM, and can
  be used to compute the count operand to TIMES, so this is valid:

  label1:
    mov ax,1
    xor ax,ax
  label2:
    TIMES $-label1 nop

  This will put as many one-byte nops after label2 as the byte count between
  label1 and label2.
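
  A related trick pads a file to a fixed size, which is how a 512-byte boot
  sector is usually filled out. This sketch assumes a flat "bin" file and uses
  $$, which the NASM docs define as the start of the current section (it is
  not covered elsewhere in this guide):

  start:
    jmp start                   ; some code, here just an endless loop
    TIMES 510-($-$$) db 0       ; fill with zeros up to byte 510
    dw 0AA55h                   ; 2 more bytes, 512 in total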

Making Structs
--------------

I fought long and hard to get structs going. The docs were a bit vague, and
it took a while to get it, so here it is.

Using a struct is divided into 2 parts: declaring the prototype, and making an
instance.

struc st
stLong resd 1
stWord resw 1
endstruc

This declares a prototype struct named st with 2 members: stLong, which is a
DWORD, and stWord, which is a WORD.
It uses the reserve directives because it's a prototype, not a real struct.
You can use it to make a real instance you can reference as data in your code:

mystruc:
istruc st
at stLong, dd 1
at stWord, dw 1
iend

*Note: it's important to put the label on a different line.

This creates a struct named mystruc of type st. The "at" keyword is used to
assign initial values to members of the struc.

The notation for referencing members is not like in C. This is because of the
way struct support is implemented: each member is assigned an offset relative
to the beginning of the struct:

mystruc:
istruc st
at stLong, dd 1  ; offset 0
at stWord, dw 1  ; offset 4
iend


The notation for referencing a member is therefore:

    mov eax, [mystruc+stLong]

This is because mystruc is a constant base, and the member is a relative offset
to it. It's similar to referencing a data array in a way.

One thing I should mention: if you declare struct prototypes as above, the
member names/labels will be global, so you will get collisions if you use the
same member name in your code or in another struct prototype.
To avoid this, precede the member names with a dot '.', and then reference them
in relation to the prototype's name in the instance declaration. Example:

struc st
.stLong resd 1
.stWord resw 1
endstruc

mystruc:
istruc st
at st.stLong, dd 1
at st.stWord, dw 1
iend


And this is how you reference the members in code:

    mov ax,[mystruc+st.stWord]

This may seem confusing. You should understand that "mystruc" is the base of a
particular instance, and "st.stWord" is an offset relative to the start of the
struct, so in pseudo-code it translates into:

    mov ax,[offset mystruc + (offset stWord - offset start_of_proto)]
or
    mov ax,[offset mystruc + 4]

which gives you the correct offset for the stWord member of the "mystruc"
struct instance.
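
The same offsets work with a register as the base, which is how you would walk
an array of structs. In this sketch st_size is the size constant that NASM
defines automatically for a "struc st" (see the STRUC section of the docs),
starray is a made-up label, and code and data are shown together only for
brevity:

    mov ebx, starray                ; EBX = address of the first element
    mov ax, [ebx+st.stWord]         ; read stWord of element 0
    add ebx, st_size                ; step to element 1 (st_size is 6 here)
    mov dword [ebx+st.stLong], 0    ; write stLong of element 1

starray:
    TIMES 10*st_size db 0           ; room for 10 zeroed structs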

Using Macros
------------

This is a large part of the NASM docs, and a bit too much to get into in depth
here. I'll try and cover the major issues.

There are 2 types of macros, single-line and multi-line. All macro keywords are
preceded by a '%' character.

Example of a single-line macro:

%define mul(a,b) (a*b)

    mov eax,mul(2,3)

This expands to "mov eax,(2*3)", which NASM assembles as:

    mov eax,6

You can invoke other macros from within a macro:

%define fancymul(a,b) ( a * triple_mul(4) )
%define triple_mul(a) (a*3)

    mov eax,fancymul(2,3)

This becomes:

    mov eax, ( 2 * (4*3) )

These are not very useful examples, but I'm sure you can see the potential.


Multi-Line macros are much the same as single-line macros, but the syntax
is a bit different:

%macro name number_of_args
    <body of macro>
%endmacro

So for example, if you wanted to make a small asm effort-saver you could write
the following macro:

%macro prologue 1
    push ebp
    mov ebp,esp
    sub esp,%1
%endmacro

and then you can use it in your code like this:

DemoFunc:

    prologue 4*2

    <body of function>

This would set up a stack frame, and reserve room for 2 DWORD local variables.
You'll notice that args supplied to the macro can be referenced as %1....%n .

This is just a taste. There's more to be learned about NASM macros, and the docs
are your friends.
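
A matching macro for leaving the function could look like the sketch below.
The names "epilogue", "clamp_eax" and "DemoFunc2" are my own choices. The
second macro shows macro-local labels, which start with %% so that each use of
the macro gets its own copy of the label:

%macro epilogue 0
    mov esp,ebp
    pop ebp
    ret
%endmacro

%macro clamp_eax 1          ; limit eax to at most %1 (unsigned)
    cmp eax,%1
    jbe %%done
    mov eax,%1
%%done:
%endmacro

DemoFunc2:
    prologue 4              ; the prologue macro defined above
    mov eax,500
    clamp_eax 255           ; eax is now 255
    epilogue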

Including files is easy. If you want to include .inc's into your asm file
you can use:

    %include "win32.inc"

If you wish to include binary files, you must use a different keyword:

    INCBIN   "data.bin"

NASM also has support for conditional assembly:

%define INCLUDE_WIN32_INCS

%ifdef  INCLUDE_WIN32_INCS
    %include "win32.inc"
    %include "toolhelp.inc"
    %include "messages.inc"
%endif

This way you can control the inclusion of files by defining the macro on the
command line:

    "nasmw -dINCLUDE_WIN32_INCS"

or by commenting out the %define line. The body of the %ifdef will be processed
only if a macro/define named INCLUDE_WIN32_INCS is defined.
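
The same mechanism works for code, not just includes. A minimal sketch with a
made-up DEBUG switch, including the %else branch:

%ifdef DEBUG
    int 3               ; break into the debugger in debug builds
%else
    nop                 ; keep the code size the same in release builds
%endif

Assembling with "nasmw -dDEBUG" selects the first branch.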

Externs, Globals and Commons
----------------------------

When coding a project with multiple source files, writing a DLL, or calling API
functions, you need to declare various symbols (data or functions) as a certain
type to make them available to the assembler and to you.

There are 3 types of symbols in NASM:

EXTERN, GLOBAL and COMMON

They are all invoked in much the same way:

EXTERN symbol_name      ; use this to declare API calls for use
GLOBAL symbol_name
COMMON symbol_name 4    ; COMMON also takes the size of the variable in bytes

They all must appear before the actual symbol is defined/referenced.
If you have experience in asm/C their use should be clear.

NASM 0.97 also has IMPORT/EXPORT extensions to the .obj format, for
writing DLL's, read the docs for more info.
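
A minimal sketch of EXTERN and GLOBAL across two source files. The file names
and add_two are made up for the example, 32-bit code is assumed, and the
caller cleans the stack, C style:

; ---- file1.asm ----
GLOBAL add_two              ; make the label visible to other modules

add_two:
    mov eax,[esp+4]         ; first argument
    add eax,[esp+8]         ; plus second argument
    ret

; ---- file2.asm ----
EXTERN add_two              ; defined in file1.asm, resolved by the linker

    push dword 3
    push dword 2
    call add_two            ; eax = 5
    add esp,8               ; remove the 2 arguments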

specifying segment type
-----------------------

You can declare segments much the same as you would in TASM:

    segment .data use32 CLASS=data
or
    segment .text use32 CLASS=code
or
    segment Gij use16 CLASS=code

This is a good way to set segments straight for linking.
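
As a sketch, a small 16-bit DOS program for the obj format could be laid out
like this. The segment names are my own, and "..start" is the special entry
point symbol the obj format provides:

segment code use16 CLASS=code
..start:
    mov ax, data            ; AX = segment address of "data"
    mov ds, ax
    mov ax, 4C00h           ; DOS function 4Ch: exit with return code 0
    int 21h

segment data use16 CLASS=data
value:  dw 0

segment stack stack         ; the second "stack" marks it as the stack segment
    resb 64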

output formats
--------------

NASM supports a plethora of output formats. Depending on what you're trying
to accomplish, you should read the docs for special extensions to each type.
These are chosen using "nasm -f type", where type can be bin, obj, win32 and
others.

Each linker likes different formats. tlink likes obj, for example, while
LCC-WIN32 likes the win32 format. Investigate on your own.

*tip: when assembling into the "obj" type, make sure to use the special
      "..start:" symbol to specify the entry point for the file.