krot

NASM

May 26th, 2018
A general NASM guide for TASM coders and other ASM people
                         By Gij
                          V0.3
---------------------------------------------------------

Generalities
------------

The basic function of any assembler is to turn asm into the equivalent
binary code file, and that's true of both TASM and NASM.
The differences arise in the special features each assembler offers you.
For example, the MODEL directive exists in TASM, making it easier for the
coder to reference data variables in other segments.
NASM does not have an equivalent directive, so you have to keep track of
segment registers yourself, and put segment overrides where needed.
This does not mean that NASM lacks good SEGMENT or GROUP support,
it has both.

It's a different way of coding, and it may seem to require more work,
but after you get used to it it's easier, because you know exactly what's
going on in your code.

TASM is chock-full of directives. Looking at a small reference for TASM 4.0,
there are at least a few dozen directives TASM uses, and you have to know
quite a few of them by heart.
NASM on the other hand has very few directives. Actually, you can write
an asm file that will assemble just fine without using a single directive,
although I doubt it will be useful in most cases.

NASM is also less ambiguous about syntax, which leaves less room for
software bugs, but makes it stricter when assembling.
I actually think NASM is easier to learn than TASM since it's much more
straightforward.

Your NASM Bible is of course the accompanying docs. You can get them in
a separate package from the same place you got the binaries for NASM.
All in all I think you will find NASM to be just as capable as TASM if not
more so. Although it's missing some features TASM has, you can always mail
the author and ask for a feature, and you just might get lucky when the
new version comes out.

ASM code is usually the same in any assembler (AT&T syntax is an exception)
but there are a few subtleties that TASM coders should look out for.
The accompanying NASM docs have a nice list of them, I'll mention a few:

DATA offset vs DATA contents
----------------------------

TASM uses this syntax to move the offset of a variable into a register:

    mov esi, offset MyVar
   OR
    lea esi, MyVar

LEA is used to load complex offsets like "[esi*4+ebx]" into a register. TASM
supports LEA even when used with a simple offset like "MyVar".

NASM on the other hand does not accept the bracket-less LEA form. A memory
operand always needs square brackets, so the usual way to load a simple
offset into a register is:

    mov esi, MyVar

This ALWAYS means move the offset of MyVar into esi.
On the other hand, this:

    mov eax, [MyVar]

will always mean move the contents of MyVar into eax.

Using LEA to load a complex offset is valid in both TASM and NASM:

    lea edi,[esi*4+ebx] ; valid in both assemblers

NASM also supports a SEG keyword:

    mov ax,SEG MyVar

This moves the segment of the variable into ax.

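To put the offset/contents rule side by side, here is a small sketch. It is
not taken from the NASM docs; the label MyVar and the value stored in it are
placeholders, and the fragment is only meant to be assembled (for example
with "nasm -f bin"), not run as a program:

```nasm
        bits 32

        mov     esi, MyVar          ; esi = offset (address) of MyVar
        mov     eax, [MyVar]        ; eax = the dword stored at MyVar
        mov     ebx, [esi]          ; the same dword, fetched through esi
        lea     edi, [esi+eax*4]    ; edi = esi + eax*4, no memory is read

MyVar:  dd      12345678h           ; placeholder data
```

The rule of thumb: no brackets means "the address", brackets mean "what is
stored at the address".
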
Segment Overrides
-----------------

TASM is more lax in its syntax, so both of these are valid code:

    mov ax,ds:[si]
AND
    mov ax,[ds:si]

NASM doesn't allow this. If you reference memory inside square brackets,
all of the specifiers, the segment override included, must be inside the
square brackets.
So this is the only valid option:

    mov ax,[ds:si]

Specifying operand size
-----------------------

TASM coders usually stumble over NASM syntax at first because
it lacks the "ptr" keyword used extensively in TASM.

TASM uses this:

    mov al,  byte ptr [ds:si]
or
    mov ax,  word ptr [ds:si]
or
    mov eax, dword ptr [ds:si]

For NASM this simply translates into:

    mov al,  byte [ds:si]
or
    mov ax,  word [ds:si]
or
    mov eax, dword [ds:si]

NASM allows these size keywords in many places, and thus gives you a lot
of control over the generated opcodes in a uniform way. For example, these
are all valid:

    push dword 123
    jmp  [ds: word 1234]   ; these both specify the size of the offset
    jmp  [ds: dword 1234]  ; for tricky code when interfacing 32bit and
                           ; 16bit segments

It can get pretty hairy, but the important thing to remember is you can have
all the control you need, when you want it.

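The size keyword really matters when no register is there to tell NASM how
big the operand is. A minimal sketch of that situation (my own example, with
the registers picked arbitrarily):

```nasm
        bits 16

        mov     byte [di], 0        ; store one byte
        mov     word [di], 0        ; store two bytes
        inc     dword [es:bx]       ; 32-bit increment, override inside brackets
        mov     al, [si]            ; no keyword needed, al implies a byte
```

Leave the keyword off the first three lines and NASM should refuse to
assemble them, since it does not guess the operand size for you.
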
Functions
---------

TASM has special directives for declaring a procedure and ending it. Why?
A procedure is just another code label you CALL instead of JMP. NASM got it
right.

TASM uses:

ProcName PROC
    xor ax,ax
    ret
ProcName ENDP

while NASM just uses:

ProcName:
    xor ax,ax
    ret

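Calling such a label looks the same as in TASM. A tiny sketch, where ClearAX
and Main are names I made up for the example:

```nasm
        bits 16

ClearAX:                            ; a "procedure" is nothing but a label
        xor     ax, ax
        ret                         ; pops the return address pushed by CALL

Main:
        call    ClearAX             ; pushes the return address, jumps to the label
        ret
```
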
Local Labels
------------

Those of you who know C know that a member of a struct can be referenced
as StructInstance.MemberName. This is rather similar to the way NASM allows
you to use local labels. A local label is denoted by putting a dot in front
of the label name, and it belongs to the most recent non-local label above it.

Label1:
    nop
.local:
    nop
Label2:
    nop
.local:
    nop

This won't give you an error about multiple definitions of a label, but you can
still jmp to a specific one like this:

    jmp Label2.local

So it's local, and in a way it's also a global label.

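Where this pays off is in reusing boring names like ".loop" in every routine.
A sketch with two made-up routines (ClearBuf and FillBuf are my names; both
assume es:di points at a buffer and cx holds a non-zero byte count):

```nasm
        bits 16

ClearBuf:                           ; zero cx bytes at es:di
        xor     al, al
.loop:                              ; full name: ClearBuf.loop
        stosb
        loop    .loop               ; resolves to ClearBuf.loop
        ret

FillBuf:                            ; store 0FFh in cx bytes at es:di
        mov     al, 0FFh
.loop:                              ; full name: FillBuf.loop, no clash
        stosb
        loop    .loop               ; resolves to FillBuf.loop
        ret
```
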
ORG directive
-------------

NASM supports the org directive, so if you're coding a .com file you can start
with:

    org 0x100
OR
    org 100h

NASM allows both the asm and C methods of specifying hex, so both of the
above are valid. Just don't mix the two in one number.

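For reference, a complete little .com program built around that org line
could look like the sketch below. The message text and label names are mine;
the DOS calls are the usual int 21h functions 09h (print string) and 4Ch
(exit). Assemble it with something like "nasm -f bin hello.asm -o hello.com":

```nasm
        org     100h                ; COM files are loaded at offset 100h

start:
        mov     dx, msg             ; ds:dx -> '$'-terminated string
        mov     ah, 9               ; DOS function 09h: print string
        int     21h
        mov     ax, 4C00h           ; DOS function 4Ch: exit with code 0
        int     21h

msg:    db      'Hello from NASM', 13, 10, '$'
```
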
Reserving space
---------------

Again, NASM uses a different syntax than TASM.

In TASM you would declare 100 bytes of uninitialized space like this:

    Array1: db 100 dup (?)

NASM uses its own keywords to do this. These are RESB, RESW and RESD,
for byte, word and dword respectively.
You would use them like this:

    Array1: RESB 100
OR
    Array1: RESW 100/2
OR
    Array1: RESD 100/4

Declaring initialized space is much like TASM, but arrays are different.

In TASM:

    Array1: db 100 dup (1)

In NASM:

    Array1: TIMES 100 db 1

TIMES is a handy little directive. It instructs NASM to perform an action
a specified number of times. In the example above I perform "db 1" 100
times.

It can be used for virtually anything:

    TIMES 69 nop

will put 69 nops at the current point in the file.

* The $ symbol (the current assembly position) is supported by NASM, and can
  be used to specify the count operand to TIMES, so this is valid:

  label1:
    mov ax,1
    xor ax,ax
  label2:
    TIMES $-label1 nop

  This will put as many one-byte nops after label2 as the byte count between
  label1 and label2.

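A common use of TIMES together with $ is padding a field to a fixed size.
The sketch below is my own example (the labels and sizes are arbitrary) and
also shows RESB sitting in a .bss section, where uninitialized space belongs:

```nasm
        bits 16

section .data
name:   db      'NASM'
        times 16-($-name) db ' '    ; pad the name field with spaces to 16 bytes

table:  times 8 dw 0FFFFh           ; eight initialized words

section .bss
buffer: resb    64                  ; 64 uninitialized bytes
```
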
Making Structs
--------------

I fought long and hard to get structs going. The docs were a bit vague and
it took a while to get it, so here it is.

Using a struct is divided into 2 parts: declaring the prototype, and making an
instance.

struc st
stLong resd 1
stWord resw 1
endstruc

This declares a prototype struct named st, with 2 members: stLong, which is a
DWORD, and stWord, which is a WORD.
It uses the reserve directives because it's a prototype, not a real struct.
You can use it to make a real instance you can reference as data in your code:

mystruc:
istruc st
at stLong, dd 1
at stWord, dw 1
iend

*Note: it's important to put the label on a different line.

This creates a struct named mystruc of type st. The "at" keyword
is used to assign initial values to members of the struc.

The notation for referencing members is not like in C. This is because of the
way struct support is implemented: each member is assigned an offset relative
to the beginning of the struct:

mystruc:
istruc st
at stLong, dd 1  ; offset 0
at stWord, dw 1  ; offset 4
iend

The notation for referencing a member is therefore:

    mov eax, [mystruc+stLong]

This is because mystruc is a constant base, and the member is a relative offset
to it. It's similar to referencing a data array in a way.

One thing I should mention: if you declare struct prototypes as above, the
member names/labels will be global, so you will get collisions if you use the
same member name in your code or in another struct prototype.
To avoid this, precede the member names with a dot '.', and then reference them
in relation to the prototype's name in the instance declaration. Example:

struc st
.stLong resd 1
.stWord resw 1
endstruc

mystruc:
istruc st
at st.stLong, dd 1
at st.stWord, dw 1
iend

And this is how you reference the members in code:

    mov ax,[mystruc+st.stWord]

This may seem confusing. You should understand that "mystruc" is the base of a
particular instance, and "st.stWord" is an offset relative to the start of the
struct, so in pseudo-code it translates into:

    mov ax,[offset mystruc + (offset stWord - offset start_of_proto)]
or
    mov ax,[offset mystruc + 4]

which gives you the correct offset for the stWord member of the "mystruc"
struct instance.

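Since the members are plain offsets, they work just as well with a pointer in
a register as with a fixed label. Here is a sketch using a made-up prototype
called point (all names are mine); it also uses the point_size constant that
NASM defines for you at endstruc:

```nasm
        bits 32

struc point                         ; hypothetical prototype
.x:     resd    1                   ; point.x = 0
.y:     resd    1                   ; point.y = 4
endstruc                            ; point_size = 8 is defined automatically

section .data
origin:
istruc point
at point.x, dd 10
at point.y, dd 20
iend

section .text
        mov     eax, [origin+point.y]   ; direct access through the label
        mov     ebx, origin             ; ebx -> a point instance
        mov     ecx, [ebx+point.x]      ; the same offsets work through a pointer
        add     ebx, point_size         ; step to the next point in an array
```
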
Using Macros
------------

This is a large part of the NASM docs, and a bit too much to get into in depth
here. I'll try and cover the major issues.

There are 2 types of macros, single-line and multi-line. All macro keywords are
preceded by a '%' character.

Example of a single-line macro:

%define mul(a,b) (a*b)

    mov eax,mul(2,3)

This expands to "mov eax,(2*3)", so it assembles as:

    mov eax,6

You can invoke other macros from within a macro:

%define fancymul(a,b) ( a * triple_mul(4) )
%define triple_mul(a) (a*3)

    mov eax,fancymul(2,3)

This becomes:

    mov eax, ( 2 * (4*3) )

These are not very useful examples, but I'm sure you can see the potential.

Multi-line macros are much the same as single-line macros, but the syntax
is a bit different:

%macro name number_of_args
    <body of macro>
%endmacro

So for example, if you wanted to make a small asm effort-saver you could write
the following macro:

%macro prologue 1
    push ebp
    mov ebp,esp
    sub esp,%1
%endmacro

and then you can use it in your code like this:

DemoFunc:

    prologue 4*2

    <body of function>

This would set up a stack frame, and reserve room for 2 DWORD local variables.
You'll notice that args supplied to the macro can be referenced as %1....%n .

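To round the prologue example off, here is a sketch of a whole function using
it. The epilogue macro is my own hypothetical counterpart, not something from
the NASM docs, and the function body is just filler that adds two locals:

```nasm
        bits 32

%macro prologue 1
        push    ebp
        mov     ebp, esp
        sub     esp, %1             ; %1 = bytes of local storage
%endmacro

%macro epilogue 0                   ; hypothetical counterpart to prologue
        mov     esp, ebp            ; drop the locals
        pop     ebp
        ret
%endmacro

DemoFunc:
        prologue 4*2                ; room for 2 DWORD locals
        mov     dword [ebp-4], 1    ; first local
        mov     dword [ebp-8], 2    ; second local
        mov     eax, [ebp-4]
        add     eax, [ebp-8]        ; eax = 3
        epilogue
```

A macro that takes no arguments is declared with a count of 0, as epilogue
shows.
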
This is just a taste. There's more to be learned about NASM macros, and the docs
are your friends.

Including files is easy. If you want to include .inc files in your asm file
you can use:

    %include "win32.inc"

If you wish to include binary files, you must use a different keyword:

    INCBIN   "data.bin"

NASM also has support for conditional assembly:

%define INCLUDE_WIN32_INCS

%ifdef  INCLUDE_WIN32_INCS
    %include "win32.inc"
    %include "toolhelp.inc"
    %include "messages.inc"
%endif

This way you can control the inclusion of files by defining the macro on the
command line:

    "nasmw -dINCLUDE_WIN32_INCS"

or by commenting out the %define line. The body of the %ifdef will be processed
only if a macro/define named INCLUDE_WIN32_INCS is defined.

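Conditional assembly is not limited to includes. A small sketch with an
%else branch, where DEBUG and BUFSIZE are names I picked for the example:

```nasm
%ifdef DEBUG
    %define BUFSIZE 16              ; small buffer, easier to inspect
%else
    %define BUFSIZE 4096
%endif

buffer: times BUFSIZE db 0          ; size depends on whether DEBUG is defined
```

Assembling with "-dDEBUG" on the command line should give you the 16-byte
buffer, and leaving it off gives the 4096-byte one.
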
Externs, Globals and Commons
----------------------------

When coding a project with multiple source files, writing a DLL, or calling API
functions, you need to declare various symbols/data/functions as a certain type
to make them available to the assembler and to you.

There are 3 types of symbols in NASM:

EXTERN, GLOBAL and COMMON

Their invocation is much the same:

EXTERN symbol_name      ; use this to declare API calls for use
GLOBAL symbol_name
COMMON symbol_name size ; COMMON also takes the size in bytes

They all must appear before the actual symbol is defined/referenced.
If you have experience in asm/C their use should be clear.

NASM 0.97 also has IMPORT/EXPORT extensions to the .obj format, for
writing DLLs. Read the docs for more info.

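Put together, a module that exports one routine and leans on another module
might look like the sketch below. All three symbol names are hypothetical,
and I'm assuming an object format such as win32 ("nasm -f win32"), so the
result still has to go through a linker along with whatever defines _helper:

```nasm
        bits 32

        global  _add_one            ; visible to other modules
        extern  _helper             ; hypothetical, defined in another object file
        common  _counter 4          ; 4-byte variable shared between modules

section .text
_add_one:
        call    _helper
        inc     dword [_counter]
        ret
```
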
Specifying segment type
-----------------------

You can declare segments much the same as you would in TASM:

    segment .data use32 CLASS=data
or
    segment .text use32 CLASS=code
or
    segment Gij use16 CLASS=code

This is a good way to set segments straight for linking.

Output formats
--------------

NASM supports a plethora of output formats. Depending on what you're trying
to accomplish, you should read the docs for special extensions to each type.
The format is chosen using "nasm -f type", where type can be bin, obj, win32 and
others.

Each linker likes different formats. tlink likes obj for example, while
LCC-WIN32 likes the win32 format. Investigate on your own.

*Tip: when assembling into the "obj" type, make sure to use the special
     "..start:" symbol to specify the entry point for the file.
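
As an illustration of the segment declarations and the "..start:" tip, here is
a skeleton for a 16-bit DOS .exe in the obj format, along the lines of the
example in the NASM docs. The segment names and the message are placeholders.
Assemble with "nasm -f obj" and then link the .obj with your linker of choice:

```nasm
segment code

..start:                            ; entry point recorded in the .obj file
        mov     ax, data
        mov     ds, ax              ; ds -> data segment
        mov     ax, stack
        mov     ss, ax
        mov     sp, stacktop        ; ss:sp -> top of our stack

        mov     dx, hello           ; ds:dx -> '$'-terminated string
        mov     ah, 9               ; DOS function 09h: print string
        int     21h
        mov     ax, 4C00h           ; DOS function 4Ch: exit with code 0
        int     21h

segment data
hello:  db      'hello, world', 13, 10, '$'

segment stack stack
        resb    64
stacktop:
```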