The basics of programming in assembly, the design of the processor, registers, memory, instructions, and use of assembly language within C++ and Delphi.
1. Introduction to assembly
Assembly language, a low-level programming language which allows you to use all the features of a computer processor, is nowadays somewhat forgotten by “modern” developers.
The main reason for this is that writing in assembly is not the simplest of tasks, and is very time-consuming (testing code, finding bugs etc.).
However, in some situations assembly may be an ideal solution. An example is any kind of algorithm where speed is essential, such as in cryptographic (i.e. encryption) algorithms.
Despite incredible advancements in compilers in recent years, algorithms such as Blowfish, Rijndael and IDEA written in assembly and “manually” optimised show significant speed advantages over their counterparts written e.g. in C++ and compiled at the maximum optimisation level.
In addition to cryptography, assembly is also often used by game developers. The best example may be the game QUAKE 2. After the publication of its source code, it turned out that all the algorithms that require speed were written in assembly.
So let's get started. To be clear, I should add that in this article I will focus on assembly for x86 processors, and its use in a Windows environment.
2. Fundamentals of assembly
If you have never written in assembly, before you can even create the simplest program, you must first learn several fundamentals like the CPU registers, instructions, and the stack.
From the programmer's perspective, a standard processor (I will use the Intel Pentium MMX as an example, as it is all I've got :-) has a large range of instructions ranging from 8 to 16 to 32-bit x86 instructions, as well as floating point and MMX instructions.
2.1. CPU registers
The processor has eight 32-bit general purpose registers and a flags register, as well as eight 80-bit coprocessor registers (st0 - st7) and an equal number of 64-bit MMX registers (mm0 - mm7). The processor also has several control registers that we generally don't use.
What is a register? A register is like a memory cell which can temporarily store data; we can exchange data between registers, and perform logical and arithmetic operations on them. The Pentium processor is 32-bit, which means that each of the general purpose registers is 32 bits wide (corresponding to unsigned int in C). The lower 16 bits of every 32-bit register can be accessed as a 16-bit half (a remnant from the 286 processor), while the 16-bit halves of registers EAX, EBX, ECX and EDX are each divided into two 8-bit halves:
Register Name | 16-bit half | 8-bit halves | Description |
---|---|---|---|
EAX | AX | AH and AL | Accumulator |
EBX | BX | BH and BL | Base |
ECX | CX | CH and CL | Counter for string operations and loops |
EDX | DX | DH and DL | Data |
ESI | SI | n/a | Source register for string instructions |
EDI | DI | n/a | Destination register for string instructions |
EBP | BP | n/a | Pointer to data within the stack, used by functions to locate parameters saved on the stack |
ESP | SP | n/a | Stack pointer |
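To make the relationship between a 32-bit register and its parts concrete, here is a minimal sketch in portable C++ (no inline assembly). The `Reg32` type and its member names are hypothetical and exist only for illustration; the sketch assumes the usual x86 layout in which AL is bits 0-7, AH is bits 8-15 and AX is bits 0-15 of EAX.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one general purpose register (not a real API).
struct Reg32 {
    std::uint32_t value = 0;   // the full 32-bit register, e.g. EAX

    std::uint16_t low16() const { return static_cast<std::uint16_t>(value & 0xFFFFu); }      // AX
    std::uint8_t  low8()  const { return static_cast<std::uint8_t>(value & 0xFFu); }         // AL
    std::uint8_t  high8() const { return static_cast<std::uint8_t>((value >> 8) & 0xFFu); }  // AH

    // Writing one part leaves all other bits of the register untouched.
    void set_low16(std::uint16_t v) { value = (value & 0xFFFF0000u) | v; }
    void set_low8(std::uint8_t v)   { value = (value & 0xFFFFFF00u) | v; }
    void set_high8(std::uint8_t v)  { value = (value & 0xFFFF00FFu) | (static_cast<std::uint32_t>(v) << 8); }
};

// Models: mov eax,12345678h / mov ax,0FFFFh / mov al,11 / mov ah,11h
inline std::uint32_t demo_subregisters() {
    Reg32 eax;
    eax.value = 0x12345678u;
    eax.set_low16(0xFFFFu);   // EAX = 0x1234FFFF
    eax.set_low8(11);         // EAX = 0x1234FF0B
    eax.set_high8(0x11);      // EAX = 0x1234110B
    return eax.value;
}
```

Note that writing AX, AL or AH never changes the upper 16 bits of the register, which is why `demo_subregisters()` still returns a value starting with 0x1234.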
2.2. General purpose registers
When writing a program, or inline assembly code under Windows, you can use all the general purpose registers, but using the special registers ESP and EBP can interfere with the operation of the program. For example, if you reset the ESP register to zero within a function, the program will most likely crash later (e.g. if the program tries to return from the function).
2.3. The stack
The stack is an area of memory reserved for the needs of the program. These include passing parameters to functions (as 32-bit values), temporary data storage, and all local variables. When the program starts, the ESP register (stack pointer) points to the end of the stack. When data is stored on the stack, the ESP register is decremented, and the data is then stored in the memory location which ESP points to. To store data on the stack, the push instruction is used, for instance:
__asm {
push 5 // store the number 5 (32 bit) on the stack
push eax // save the contents of register EAX on the stack
push dword ptr[edx] // save the contents of memory referenced by
// the EDX register
sub esp,4 // equivalent to 'push 5'
mov dword ptr[esp],5
sub esp,4 // equivalent to 'push eax'
mov dword ptr[esp],eax
}
To retrieve and remove a value from the stack, the pop instruction is used, which works in the opposite way to push. First the value is read from the address indicated by the ESP register, then the ESP register is incremented:
__asm {
push 5 // store 4 32-bit values on the stack
push eax
push dword ptr[edx]
push 13B0C032h
pop eax // remove the most recent value from the stack,
// which in this case is the number 13B0C032h
pop dword ptr[edx] // this operation does not change anything, since
// the value stored on the stack came from the
// location referenced by EDX and is simply being
// returned there
pop edx // put the value originally held by EAX into EDX
pop ecx // put the value 5 into register ECX
push 5 // store the value 5 on the stack
// the following instructions simulate 'pop eax'
mov eax,dword ptr[esp]
add esp,4
}
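The push and pop mechanics described above can be modelled in portable C++. This is only a sketch: `ToyStack`, its 64-byte size and its member names are made up for illustration, and unlike a real stack it does no overflow or underflow checking.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical toy machine: 64 bytes of stack memory and a stack pointer.
struct ToyStack {
    std::uint8_t  memory[64] = {};
    std::uint32_t esp = sizeof(memory);     // starts at the end of the stack area

    void push(std::uint32_t v) {
        esp -= 4;                           // sub esp,4
        std::memcpy(&memory[esp], &v, 4);   // mov dword ptr[esp],v
    }

    std::uint32_t pop() {
        std::uint32_t v;
        std::memcpy(&v, &memory[esp], 4);   // mov v,dword ptr[esp]
        esp += 4;                           // add esp,4
        return v;
    }
};
```

Pushing 5 and then 13B0C032h and popping twice yields 13B0C032h first and 5 second, after which `esp` is back at its starting value. This is the "last in, first out" behaviour the assembly listings rely on.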
2.4. Limitations in Windows
If you have written assembly programs under MS-DOS, where there were no limitations, you will need to be aware that there are some differences under Windows. As I said earlier, in assembly we can use all the instructions that the CPU supports, however some instructions are not permitted by the operating system, in our case Windows. For instance, if we use I/O port instructions, the compiler will not give an error, but the program will most likely crash if these instructions are executed under Windows.
Instructions which can cause the program to be terminated include the above-mentioned I/O port instructions, as well as instructions that refer to interrupts, segment registers and control registers.
Regarding the segment registers, Windows uses the flat memory model, which means that all code and data exists in the same memory space ranging from 0 up to 0xFFFFFFFF. So, when accessing memory there is no need to bother with segment registers. Unlike in MS-DOS, there is no need to use segment prefixes like DS:.
3. Using assembly language
To take advantage of the benefits of assembly, you must first check whether your development tools allow its use. Products such as Borland Delphi, Builder, Watcom C++ or Microsoft Visual C++ allow you to use (compile) assembly code; Visual Basic is the only popular RAD package which does not allow writing code in assembly. These products support the use of assembly code in two ways. The first is called inline assembly, where the assembly code is inserted into the regular code written in e.g. C++. The second method is linking modules (i.e. separate files) written in assembly with modules written e.g. in Delphi or C++.
3.1. Inline assembly
Before you start writing assembly code, you must check which syntax your tools expect, because there are two types of syntax for assembly code. The first type is called “Intel syntax”, and is used in products such as Delphi, Builder and MSVC, and in assemblers such as Borland TASM and Microsoft MASM. This syntax is now the standard and is used in 90% of sources. The second type is called “AT&T syntax”, and is used e.g. by C compilers such as GCC (Linux platform), DJGPP and LCC.
Inline assembly is the easiest way to write asm code. When writing assembly code in Delphi or Builder, it must be enclosed between the asm keyword marking the beginning of the assembly code, and the end; keyword after the code. For example:
// our first 'hello world' in assembly, Delphi version
asm // start of assembly code
mov eax,1 // move the value 0x00000001 into register EAX
// the C++ equivalent of this instruction is the
// assignment operator '=', e.g.
// x = 1;
// the Delphi equivalent is the assignment
// operator ':=', e.g.
// y := 1;
mov ecx,eax // move the contents of register EAX into
// register ECX, that is, the value 0x00000001
// will end up in ECX
shl ecx,2 // this 'Shift Left' instruction will shift the
// contents of register ECX to the left by 2 bits
// As you may know, left shifting serves to
// multiply values by successive powers of 2
// Shifting 0x00000001 to the left by two bits
// will result in the value 0x00000001 * 4 = 0x00000004
// saved to ECX
// in C++, bit shifts are achieved with the '<<'
// operator, e.g.
// x = y << 2;
// in Delphi, bit shifts use the same keywords as
// assembly code, namely 'shl' or 'shr', e.g.
// x := y shl 2;
shr eax,1 // this 'Shift Right' instruction will shift the
// EAX register to the right by 1 bit
and eax,0 // 'And' is a logical multiplication of bits
// according to the following table:
// 0 * 0 = 0
// 1 * 0 = 0
// 0 * 1 = 0
// 1 * 1 = 1
// Any value multiplied by 0 will give 0; in this
// case, the EAX register will be zeroed out
// The C++ equivalent of this instruction is
// the '&' operator, e.g.
// x = y & 0;
// in Delphi:
// x := y and 0;
or eax,0FFFFFFFFh // 'Or' is a logical sum of bits according
// to the following table:
// 0 + 0 = 0
// 1 + 0 = 1
// 0 + 1 = 1
// 1 + 1 = 1
// in this case EAX will be ORed with the value
// 0xFFFFFFFF, which will result in the value
// 0xFFFFFFFF no matter what EAX contains
// The C++ equivalent of this operation is the
// '|' operator, e.g.
// x = y | 0xFFFFFFFF;
// in Delphi:
// x := y or $FFFFFFFF;
sub edx,edx // 'Subtract' subtracts the value of one register
// from another. In this case, EDX will become zero
// The C++ equivalent is '-', e.g.
// x = x - x;
xor eax,eax // 'eXclusive Or' follows this table:
// 0 ^ 0 = 0
// 1 ^ 0 = 1
// 0 ^ 1 = 1
// 1 ^ 1 = 0
// This function yields 1 when its two inputs are
// different; if they are the same it will give 0
// Hence the instruction 'xor eax,eax' will zero
// out the EAX register
// The C++ equivalent is the '^' operator, e.g.
// x = x ^ y
// in Delphi:
// x := x xor y;
end; // end of assembly code
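As a cross-check of the comments in the listing above, the same sequence can be written in plain C++, using the operators the comments mention. This is a sketch, not compiler output: the `Regs` struct and the `first_example` function are hypothetical names, and only the three registers the listing touches are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the three registers used by the listing.
struct Regs {
    std::uint32_t eax, ecx, edx;
};

// Each statement mirrors one instruction of the Delphi example.
inline Regs first_example(std::uint32_t edx_in) {
    Regs r{0, 0, edx_in};
    r.eax = 1;              // mov eax,1
    r.ecx = r.eax;          // mov ecx,eax
    r.ecx <<= 2;            // shl ecx,2         -> 1 * 4 = 4
    r.eax >>= 1;            // shr eax,1         -> 0
    r.eax &= 0;             // and eax,0         -> 0
    r.eax |= 0xFFFFFFFFu;   // or eax,0FFFFFFFFh -> 0xFFFFFFFF
    r.edx -= r.edx;         // sub edx,edx       -> 0
    r.eax ^= r.eax;         // xor eax,eax       -> 0
    return r;
}
```

Whatever value EDX held on entry, the sequence ends with EAX = 0, ECX = 4 and EDX = 0.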
Writing inline assembly in MSVC only really differs in how the assembly code is introduced to the compiler:
// our second 'hello world' in assembly
__asm { // start of assembly code
push 5 // save the value 0x00000005 on the stack
pop eax // remove 0x00000005 from the stack and write
// it to register EAX
push eax // save the contents of register EAX on the stack
// (in this case the value 5)
pop edx // remove the value 5 from the stack and write it
// to register EDX
mov ax,0FFFFh // write the value 0FFFFh to the 16-bit lower
// half of register EAX
mov dx,ax // write the value from register AX to the 16-bit
// lower half of register EDX
mov al,11 // write the value 11 (decimal) to the 8-bit
// lower half of register AX
mov ah,11h // write the value 11 (hex) to the 8-bit upper
// half of register AX, which is 17 in decimal
} // end of assembly code
3.2. Using variables in assembly
Writing in assembly, you have access to all global variables, and if the code is in a procedure, it also has access to the local variables and parameters of the procedure/function, so its capabilities are practically the same as normal code. An example of the use of global and local variables:
// global variables
var
ByteVar: Byte; // byte - 8 bits
WordVar: Word; // word - 16 bits
IntVar: Integer; // double-word - 32 bits
...
procedure noop;
// local variables of function 'noop'
var
LocalByte: Byte;
LocalWord: Word;
LocalInt: Integer;
begin
// initialise global variables
ByteVar := $FF; // 8-bit value
WordVar := $FFFF; // 16-bit value
IntVar := Integer($FFFFFFFF); // 32-bit value (typecast, as $FFFFFFFF is out of range for Integer)
asm
mov al,ByteVar // write an 8-bit value to an 8-bit register
mov LocalByte,al // write an 8-bit value to a local variable
mov ax,WordVar // 16-bit value to 16-bit register
mov LocalWord,ax
mov eax,IntVar // 32-bit value to 32-bit register
mov LocalInt,eax
end;
end;
The example for MSVC is not much different from that of Delphi:
// global variables
char ByteVar;
short WordVar;
int IntVar;
...
void noop()
{
// local variables
char LocalByte;
short LocalWord;
int LocalInt;
// initialise global variables
ByteVar = 0xFF; // 8-bit value
WordVar = 0xFFFF; // 16-bit value
IntVar = 0xFFFFFFFF; // 32-bit value
__asm {
mov al,ByteVar // write an 8-bit value to an 8-bit register
mov LocalByte,al // write an 8-bit value to a local variable
mov ax,WordVar // 16-bit value to 16-bit register
mov LocalWord,ax
mov eax,IntVar // 32-bit value to 32-bit register
mov LocalInt,eax
}
}
You can write entire functions in assembly language. When doing this, there are a few things to keep in mind. If the function returns a value, we must ensure that the returned value is stored in the EAX register before leaving the function. A simple example:
// Delphi version
function add(x, y:integer):integer;
asm
// with Delphi's default 'register' convention x arrives in EAX and
// y in EDX, so y must be read before EDX is overwritten
mov ecx,y // copy the function's second parameter to ECX
mov edx,x // copy the function's first parameter to EDX
add edx,ecx // add x and y together
mov eax,edx // write the result to register EAX
// this becomes the function's return value
end;
// C++ version
int mult(int x,int y)
{
__asm {
mov edx,x // copy the function's first parameter to EDX
mov ecx,y // copy the function's second parameter to ECX
imul edx,ecx // multiply x by y
mov eax,edx // write the result to register EAX
// this becomes the function's return value
}
}
We already know that functions written in assembly must place the return value in the EAX register, but what about the other registers?
In short, registers EAX, EDX, and ECX may contain any value when the function exits, but registers EDI, ESI, EBX, and EBP generally must not change (their value must be the same as it was before the call). You may wonder why this is the case. Well, the code produced by the compilers of the HLL (high-level language) uses this second group of registers throughout the program to hold e.g. addresses of functions, constants, etc., and if they are changed by a function, code that runs later may use invalid values, which can cause anything from data corruption to a crash. It is easy to prevent such errors:
// Delphi version
function count(w,x,y,z:integer):integer;
asm
push edi // save the contents of registers EDI, ESI and EBX
push esi // on the stack
push ebx
mov edi,w // copy each function parameter to a register
mov esi,x
mov edx,y
mov ebx,z
add edi,esi // w + x
add edx,ebx // y + z
imul edi,edx // (w+x) * (y+z)
xchg eax,edi // 'eXCHanGe' swaps the contents of two registers
// in this case EAX and EDI, in other words,
// the old value of EAX is now in EDI, and the
// old value of EDI is now in EAX, which becomes
// the function's return value
pop ebx // Remove the saved values of the registers from
pop esi // the stack, and put them back in the registers
pop edi // We must remove the values in reverse order -
// looking at the code we can see that it is
// 'symmetrical'. If the values were saved in the
// order EDI, ESI, EBX, then they must be removed
// in the order EBX, ESI, EDI
end;
In addition to the registers EDI, ESI, EBX, and EBP, the status flag DF (Direction Flag) is expected to be zero (cleared) before and after any call. Just use the CLD instruction if its status is changed within the function.
When writing code in assembly that uses the stack, special attention should be paid to ensuring that the stack pointer ESP is always restored. E.g. if the procedure or function stores something on the stack, then this item must be removed before exiting the function. This time we'll look at an example in MSVC:
// example of an encryption function
void crypt(unsigned char *string)
{
__asm {
push edx // save the contents of register EDX on the stack
mov edx,string // grab the parameter from the stack; in this case
// a pointer to the string we must encrypt
cmp edx,0 // check whether the parameter is valid
je _exit_encrypt // if invalid, exit the function
_encrypt_loop:
mov al,byte ptr[edx] // load the next byte of the string
cmp al,0 // check for the end of the string
// strings are represented as ASCII; byte 00h
// means end-of-string
je _exit_encrypt // once we reach the end of the string, exit
xor al,7 // encrypt the byte with a simple xor
mov byte ptr[edx],al // store the encrypted byte in the string
inc edx // set the string pointer to point to the next byte
jmp _encrypt_loop // go to the start of the loop so that the
// process repeats
_exit_encrypt:
pop edx // IMPORTANT: correct the stack, and restore the
// register EDX to its original value
}
}
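For comparison, the loop in crypt() can be written in portable C++. This is a sketch rather than a replacement: `crypt_portable` is a hypothetical name, and like the assembly version it assumes a zero-terminated string.

```cpp
#include <cassert>
#include <cstring>

// Portable counterpart of crypt(): XOR every byte with 7 until the terminator.
inline void crypt_portable(unsigned char *text) {
    if (text == nullptr) return;    // cmp edx,0 / je _exit_encrypt
    for (; *text != 0; ++text) {    // cmp al,0 / je _exit_encrypt
        *text ^= 7;                 // xor al,7 / mov byte ptr[edx],al
    }
}
```

Because XOR is its own inverse, calling the function a second time on the same buffer restores the original text. One caveat applies to both versions: a byte with the value 7 is encrypted to 0, which then looks like the end of the string, so the scheme only round-trips text that does not contain that byte.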
3.3. Calling functions from assembly
Sometimes in assembly code you will need to call a function written in another language. How is this done? Very simply, a function is called with the instruction call func_name. It is worth noting that there are several ways to call and “clean up” after a function:
Name | in C code | Parameters | Return values | Modified registers | Info |
---|---|---|---|---|---|
cdecl | __cdecl | passed on the stack; the parameters are not removed by the function | eax, 8 bytes: edx:eax | eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7 | This is the method of calling C library functions, introduced by Microsoft. All system functions on the Linux platform also use this convention |
fastcall | __fastcall | ecx, edx, any remaining parameters are passed on the stack | eax, 8 bytes: edx:eax | eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7 | Microsoft introduced this standard, but later switched to the cdecl convention in its products |
watcom | __declspec (wcall) | eax, ebx, ecx, edx | eax, 8 bytes: edx:eax | eax | This function calling convention was introduced by Watcom in their C++ compiler |
stdcall | __stdcall | passed on the stack; parameters are removed by the function | eax, 8 bytes: edx:eax | eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7 | The default calling convention for Windows API functions in DLLs |
register | n/a | eax, edx, ecx, any remaining parameters are passed on the stack | eax | eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7 | This is the calling convention used in Borland's Delphi |
The correct calling convention for functions in our own programs (as opposed to WinApi) often depends on the options with which the program was compiled. In Delphi the default convention is “register”, while for most programs written in C, the default is “cdecl”.
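The difference between parameters that are "removed by the function" and those that are not can be pictured with a toy model that tracks only the value of ESP. This is a sketch in portable C++: the function `esp_after_call` and its parameters are hypothetical, and the return address pushed by call and popped by ret is left out because the two cancel each other.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: only the value of ESP is tracked, not the stack contents.
inline std::uint32_t esp_after_call(std::uint32_t esp, unsigned argc,
                                    bool callee_cleans, bool caller_cleans) {
    esp -= 4 * argc;                      // one 32-bit 'push' per parameter
    if (callee_cleans) esp += 4 * argc;   // stdcall: the function ends with 'ret 4*argc'
    if (caller_cleans) esp += 4 * argc;   // cdecl: the caller runs 'add esp,4*argc'
    return esp;
}
```

With four parameters, a cdecl call followed by caller cleanup and a stdcall call without it both leave ESP where it started. Forgetting the cleanup after a cdecl call leaves ESP 16 bytes too low, and adding it after a stdcall call leaves ESP 16 bytes too high. Either mistake corrupts the stack for the code that follows.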
WinApi functions (Windows system functions) use the stdcall mechanism, where function parameters are first stored on the stack, and then the function is called. After the function returns, there is no need to adjust the stack (remove the previously saved parameters), since the called function does it for us. Interestingly, a few WinApi functions do not use the stdcall convention, but instead use cdecl, that is, the parameters are stored on the stack, then the function is called, but afterwards the stack must be cleaned up manually. An example of such a function is the wsprintfA function from the Windows system library user32.dll (whose counterpart in the C standard library is sprintf). The cdecl convention was probably chosen because these functions do not have a fixed number of parameters:
// global string
unsigned char title[] = "The values of x and y";
...
// this function changes the values x and y into ASCII form, after which
// a message box is displayed showing x and y in their string form
unsigned int int2str(unsigned char *buffer, unsigned int x, unsigned int y)
{
// local string, accessible only by the function int2str
unsigned char format[] = "x = %lu\ny = 0x%X\n";
__asm {
// Note the way in which the parameters of the function are passed.
// In C++, the function call would look like this:
// wsprintf(buffer, "x = %lu\ny = 0x%X\n", x, y);
// In assembly the parameters are pushed onto the stack in reverse
// order, after which the function is called.
push y // save y on the stack
push x // save x on the stack
lea eax,format // load the address of the local string into EAX
push eax // save the address of this string on the stack
push buffer // save the pointer to the output buffer, where
// the formatted text will end up
call wsprintfA // call this WinApi function
add esp,4*4 // clean up the stack - 4*4 = 16 bytes. This is
// how much space was taken by the parameters
// saved on the stack before the function was called
// When writing code e.g. in C++, the compiler
// takes care of this for you, but in assembly you
// must do this yourself
push MB_ICONINFORMATION // specifies the icon that will appear
// next to the text in the message box
push offset title // the window title (a global variable); we use
// the keyword 'offset' because we want to write
// the address of the string to the stack
push buffer // the text which will appear in the message box
push 0 // handle of the parent window
call MessageBoxA // show the message box
}
}
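To see what text ends up in the buffer without running the code under Windows, the formatting step can be sketched with the C standard library. The helper `format_xy` is a hypothetical name, and the sketch assumes that `std::snprintf` treats the `%lu` and `%X` specifiers the same way wsprintfA does for these values.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Portable stand-in for the wsprintfA call in int2str (illustration only).
inline std::string format_xy(unsigned int x, unsigned int y) {
    char buffer[64];
    // %lu expects an unsigned long, so x is converted explicitly
    std::snprintf(buffer, sizeof(buffer), "x = %lu\ny = 0x%X\n",
                  static_cast<unsigned long>(x), y);
    return buffer;
}
```

For example, `format_xy(10, 255)` produces the two lines "x = 10" and "y = 0xFF". In the assembly version the same four arguments occupy 4 * 4 = 16 bytes of stack, which is exactly the amount added back to ESP after the call.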
4. MMX instructions
MMX is the name of an extension to the Pentium series of processors, introduced by Intel. The name is said to be an abbreviation of “MultiMedia eXtensions”, but Intel denies this, and has never explained the issue. The MMX extension to the Pentium line of processors includes a set of new instructions (57, to be exact), and 8 additional 64-bit registers.
MMX registers are shared with the FPU registers. This means that you cannot mix FPU (Floating Point Unit) instructions with MMX instructions, otherwise the contents of the registers will be corrupted. MMX instructions can operate on data in SIMD fashion (Single Instruction, Multiple Data). This means that one operation can be performed simultaneously on many data items, which is not possible using standard x86 instructions.
MMX instructions are ideal for processing multimedia data, e.g. video, graphics, sound. For example, programs such as DivX or Winamp make intensive use of MMX code. Currently, most processors produced by Intel, AMD and Cyrix possess MMX support.
Although MMX has for quite a few years been practically standard, HLL compilers generally do not generate MMX code (except specialised compilers like VectorC). It seems that the natural solution is to program MMX in assembly.
Writing procedures using MMX can sometimes get a 100% speed increase compared to the original code. This is possible because of the aforementioned SIMD mode. Imagine a situation where we have two tables of 8 bytes, and we want to add corresponding bytes from both tables to each other. In C++ we would do it this way:
unsigned char table1[] = { 0x0A,0x1A,0x2A,0x3A,0x4A,0x5A,0x6A,0x7A };
unsigned char table2[] = { 0xA7,0xA6,0xA5,0xA4,0xA3,0xA2,0xA1,0xA0 };
...
for (int i = 0; i < 8; i++)
{
table1[i] += table2[i];
}
There's no problem with this, but the operation of adding bytes will be repeated 8 times. Let's look at how this can be done much more efficiently by using MMX:
__asm {
movq mm0,qword ptr[table1] // load 8 bytes from the first table
// into register MM0
movq mm1,qword ptr[table2] // 8 bytes from the second table into MM1
paddb mm0,mm1 // add the bytes from MM1 to MM0
movq qword ptr[table1],mm0 // write the result back to table1
}
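To check what paddb computes for these two tables, its behaviour can be modelled in portable C++. This is a sketch: `Bytes8` and `paddb_model` are hypothetical names, and the model relies on the documented behaviour of paddb, which adds each byte lane separately and wraps around on overflow.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes8 = std::array<std::uint8_t, 8>;   // stands in for one 64-bit MMX register

// Model of 'paddb dst,src': eight independent byte additions, each modulo 256.
inline Bytes8 paddb_model(const Bytes8 &dst, const Bytes8 &src) {
    Bytes8 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // the carry out of one byte never spills into its neighbour
        out[i] = static_cast<std::uint8_t>(dst[i] + src[i]);
    }
    return out;
}
```

For the tables above, the first six sums fit in a byte (0x0A + 0xA7 = 0xB1, and so on), while the last two overflow and wrap: 0x6A + 0xA1 gives 0x0B and 0x7A + 0xA0 gives 0x1A. The model loops only to show the per-byte results; the real paddb handles all eight lanes at once.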
In total, just one instruction is executed instead of 8 additions. Neat, isn't it? And more importantly, efficient. Here are a few examples of graphical functions:
#define IMG_WIDTH 640
#define IMG_HEIGHT 320
...
//
// this function initialises the MMX unit
// it should be called:
// - before using the MMX unit for the first time
// - after using MMX when we intend to make use of the FPU
// - after using the FPU when we intend to make use of MMX
//
void InitMMX()
{
__asm emms; // Empty MultiMedia State;
} // initialises the MMX unit
//
// a fadeout effect of the screen (fullscreen)
//
void fadeout(DWORD *lpScreen,DWORD iRounds)
{
__asm {
mov edx,iRounds // load the total number of repetitions
mov eax,03030303h // mask for each component of a pixel;
// reducing the value of each RGB
// component gives the impression of a
// fading image
movd mm0,eax // transfer the mask to the lower half
// of register MM0
punpckldq mm0,mm0 // copy the mask to the upper half of MM0
// such that its full value becomes
// 0x0303030303030303
// (recall that MM0 is a 64-bit register)
pxor mm1,mm1 // zero out register MM1
_fadeout_max:
paddb mm1,mm0 // multiply the mask, which will be
// subtracted from the components of
dec edx // pixels by the number of rounds
jne _fadeout_max //
mov eax,lpScreen // load the pointer to the image buffer
// into register EAX
// the number of pixels divided by 2
// we divide by 2 because by using MMX we
// can process 2 pixels simultaneously
// (MM1 is an 8-byte register, but each
// pixel is only 4 bytes)
mov ecx,(IMG_WIDTH*IMG_HEIGHT) / 2
_clear_screen_2_mmx:
// load 2 pixels from the image buffer
// into MM0
movq mm0,qword ptr[eax]
psubusb mm0,mm1 // subtract our mask from all components
// (bytes) of those 2 pixels
// Both the mask and the pixels are
// treated as tables of 8 separate bytes
// SIMD-style
// write the 2 modified pixels back to
// the image buffer
movq qword ptr[eax],mm0
add eax,8 // update the pointer to the image buffer,
// ready for the next 2 pixels
dec ecx // reduce the loop counter (the loop will
// repeat for the number of pixels / 2)
jne _clear_screen_2_mmx
}
}
//
// image negative effect
//
void negative(DWORD *lpScreen)
{
__asm {
mov eax,lpScreen // load the pointer to the image buffer
// into EAX
// write the pixel count / 4 into ECX,
// since we will process 4 pixels at once
mov ecx,(IMG_WIDTH*IMG_HEIGHT) / 4
pcmpeqb mm7,mm7 // set register MM7 to 0xFFFFFFFFFFFFFFFF
_neg_mmx:
// load 2 pixels from the image to MM0
movq mm0,qword ptr[eax]
pxor mm0,mm7 // XOR-ing with all 1s works like the
// logical 'NOT' function
movq qword ptr[eax],mm0
// repeat with the next 2 pixels
movq mm0,qword ptr[eax+8]
pxor mm0,mm7
movq qword ptr[eax+8],mm0
add eax,16 // update the pointer to the image
dec ecx // and the loop counter
jne _neg_mmx
}
}
//
// image blur effect
//
void blur(DWORD *lpScreen)
{
__asm {
push esi // save registers ESI and EDI
push edi
mov esi,lpScreen // load the pointer to the image buffer
// into ESI
mov ecx,( (IMG_WIDTH*IMG_HEIGHT) - (IMG_WIDTH*8) + 4 )
mov eax,IMG_WIDTH*4 // the width of a line in the image
mov edx,IMG_WIDTH*8 // the width of two lines
lea esi,[esi+eax+4] // set the pointer to the first pixel
// of the second line of the image
pxor mm7,mm7 // zero out MM7
movd mm0,[esi-4] // read pixel to the left into MM0
_blur_more:
movd mm1,[esi+4] // read pixel to the right into MM1
mov edx,esi
sub edx,eax
movd mm2,[edx] // read pixel above into MM2
movd mm3,[esi+eax] // read pixel below into MM3
punpcklbw mm0,mm7 // unpack the components of 4 successive
punpcklbw mm1,mm7 // pixels into WORDs
punpcklbw mm2,mm7
punpcklbw mm3,mm7
paddusw mm0,mm1 // add the components of the 4 pixels
paddusw mm0,mm2
paddusw mm0,mm3
psrlw mm0,2 // divide this sum by 4, in this way
// we find the 'average' of the 4 pixels
packuswb mm0,mm7 // pack the components (each of which is
// a WORD) back into a single DWORD
movd [esi],mm0 // write the pixel to the image buffer
add esi,4
dec ecx
jne _blur_more
pop edi
pop esi
}
}
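The per-byte arithmetic behind the three effects can be checked with a scalar model in portable C++. This is a sketch only: all function names are hypothetical, and each function handles a single pixel or colour component, whereas the MMX code processes several at once.

```cpp
#include <cassert>
#include <cstdint>

// psubusb on one byte: unsigned subtraction that clamps at 0 instead of wrapping.
inline std::uint8_t sub_saturate(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : 0;
}

// fadeout for one 32-bit pixel: every byte of the pixel loses 'amount'.
inline std::uint32_t fade_pixel(std::uint32_t pixel, std::uint8_t amount) {
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint8_t c = static_cast<std::uint8_t>(pixel >> shift);
        out |= static_cast<std::uint32_t>(sub_saturate(c, amount)) << shift;
    }
    return out;
}

// negative for one pixel: XOR with all ones is a bitwise NOT (pxor mm0,mm7).
inline std::uint32_t negate_pixel(std::uint32_t pixel) {
    return pixel ^ 0xFFFFFFFFu;
}

// blur for one colour component: sum of four neighbours, then psrlw by 2.
inline std::uint8_t blur_component(std::uint8_t left, std::uint8_t right,
                                   std::uint8_t up, std::uint8_t down) {
    return static_cast<std::uint8_t>((left + right + up + down) >> 2);
}
```

The saturation is what makes fadeout settle on black: a component smaller than the mask becomes 0 rather than wrapping around to a bright value. In blur, unpacking each byte to a 16-bit word before adding serves a similar purpose, since four bytes can sum to at most 1020, which does not fit in 8 bits.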
5. When to use assembly
As I mentioned at the beginning of the article, assembly is used mainly where speed is important. When writing an algorithm, we should sometimes stop and ask ourselves whether our program could be enhanced, if at some critical points (for instance in loops, etc.), we were to employ, say, MMX.
Imagine that you just wrote an mp3 encoder, and a competitor did the same, but you used hand-written MMX code which is three times faster than the competition. Which product will users choose, when they can complete a task in 10 minutes instead of 30? The answer is obvious.
Besides being ideal for writing algorithms that require speed, assembly is also used to write particular programs such as EXE-compressors. I'll bet that most people will think of programs like UPX or Aspack, which are used to compress executables. Put simply, if you write a program which occupies let's say 700 kB, when compressed by UPX its size will decrease to approx. 300 kB, but the program will still be in the form of an EXE file, and will be just as functional as before compression. This is achieved by using assembly to write a loader for the code. This is a fragment of code that is stored in the EXE file (almost like a virus), and when you start such a program, the loader decompresses the remainder of the EXE file and allows it to run. Writing a loader in an HLL, whether it be C++, Delphi or even Power Basic, is virtually impossible.
It can be said that assembly programming is only useful for speed and unusual applications, but this is not entirely true. Writing in assembly language can be more than just inline routines and a few procedures here and there. Entire programs can be written in assembly language! Sometimes I hear people say that it is impossible; that you can't write large applications in assembly from scratch. Often these are people who have only dabbled in assembly for a few hours. If you are a competent programmer, there is nothing stopping you from building professional applications in assembly language. Writing programs in assembly gives us full control over them. Everything is up to us, the program is executed according to our will, and we are not at the mercy of the compiler.
These days, writing in assembly is reasonably simple and convenient. A lot of people around the world are beginning to see the magic of this language. People are creating many projects; you can find a whole bunch of sample tutorials and source code, thanks to which many challenges have ceased to be problems. Writing entire applications in assembly also has the advantage that a project with 5 MB of source code will be compiled to an executable of approximately 90 kB. Compare an application written in Delphi 6, containing 1 window, which takes approx. 300 kB compiled, to a program written in assembly language which does exactly the same thing, and works on every Windows release from 95 to XP, with just a 4 kB executable. Why the big difference? It's simple: the compiler adds a lot of unnecessary things, “just in case”. Why isn't this made more efficient? We should ask the companies who make compilers.
Despite the fact that assembly can be used for many useful things, it is also used to write malicious programs, such as viruses, ransomware, or exploits, but in the words of Winnie the Pooh, that is a story for another day...
6. Summary
These examples represent only a small range of what is possible with assembly. There is a lot to discover, just as much for me as there is for you, because contrary to what they say, assembly is not dead, it is constantly changing, evolving, giving us possibilities which do not exist in any high-level language. The terms we hear in the press: SSE, SSE2, 3DNow, are not fiction. Everything is out there. We just have to reach for it.
For my part, writing assembly language gives me a feeling of freedom, which I never found when writing in any other language. I hope that your journey into assembly doesn't end with this article!