              SYSTEM64 HARDWARE MAINTENANCE SERVICE v1.210

				   by

		   Desmond Germans (Simm / Analogue)



INTRODUCTION
------------
SYSTEM64 is Analogue's proprietary protected mode system especially tailor-made
for demoprogramming. It supports raw, XMS and VCPI (EMM) protected mode as well
as simple datafile loading, IRQ handling, heap management and things like that.
Together with SYSTEM64 comes DEBUG64 which interactively lets you view your
code in action. Currently SYSTEM64 and DEBUG64 are under development and
Analogue internal use, so please don't spread it around too much. I don't ask
that because my code is supposed to be a secret or something, but because I
don't want to put in all the effort to really release this. Also, use this at
your own risk. If your computer hangs up or if Windows 95 crashes, DON'T call
me, DON'T ask me for a bugfix and DON'T blame it on my code. At least that is
what you agree to when you use this product. The only reason I released this
far-from-perfect prerelease is because of many people asking for it.
I will maybe make a decent public domain version with the musicplayer and an
example demo (maybe even the source of something we released), but right now I
don't feel like that.
This documentation is not available in German, Dutch, Finnish or Swedish.


SYSTEM REQUIREMENTS
-------------------
To be able to use SYSTEM64, you need at least a 386 compatible PC with let's say
4MB of memory. For a demo which is coded with SYSTEM64 you'd most certainly
need a Gravis Ultrasound and a fast VGA card, and probably something much
better than a 386 PC :)


PROGRAMMING STYLE WITH SYSTEM64
-------------------------------
Programming in protected mode is easy... Very easy. For instance, you don't
need to worry about segment registers, you got 32-bit register power at your
fingertips and most of all, you can access as much memory as you wish at once.
However, switching from real mode to protected mode properly is a menace, so
SYSTEM64 does that for you. What is important for the programmer is that after
initialization of SYSTEM64, the processor is running in CPL0 (Current
Privilege Level 0 - the system is yours) protected mode.
The architecture used can be called a 'semi-flat' memory model. There is one big
code segment called 'pcode' and one big data segment called 'data'. All
pointers are near pointers relative to one of these. After initialization,
SYSTEM64 will jump to an external label called 'main' which is the starting
point of your code. To exit to DOS again, jump to SYSTEM64's label called
'exit'. In a simple overview, this all means:

after initialization, the registers are set to:

	EAX = 00000000				CS = pcode
	EBX = 00000000				DS = data
	ECX = 00000000				ES = data
	EDX = 00000000				FS = data
	ESI = 00000000				GS = pcode (for SMC)
	EDI = 00000000				SS = pstack
	ESP = points to empty stacktop
	EBP = 00000000
	EIP = main

WARNING: Because SYSTEM64 uses these segment register settings internally when
processing IRQ's and exceptions, it is very hazardous to change any of them !!


A MODEL SOURCE FILE
-------------------
Here is a source file which can serve as a model for SYSTEM64 programming
(MODEL.ASM):

	.386
	locals

	include s64.ash

	public main,datafile


	pcode		segment public use32
			assume cs:pcode,ds:data

	main		proc near

			; your code comes here
			jmp exit

	main		endp

	pcode		ends


	data		segment public use32

	datafile	db 0

	data		ends


			end


USING DATAFILES WITH SYSTEM64
-----------------------------
When you want to use a datafile with your project, change the variable called
'datafile' in 'data' to an ASCIIZ-string containing the name of the datafile.
This datafile must be present in the current directory together with the
executable. So if you have a file called "dope2.dat" in the current directory,
you change the datafile declaration into:

        datafile        db "dope2.dat",0
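
A minimal sketch of reading from the loaded datafile, assuming 'data_base' is a
dword variable that holds the pointer (relative to 'data') described in the
services section below:

        mov esi,[data_base]             ; esi = start of the loaded datafile
        mov eax,[esi]                   ; eax = first dword of "dope2.dat"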


ASSEMBLING AND LINKING WITH SYSTEM64
------------------------------------
When you use TASM and TLINK (as I do), you can supply them with several options.
Currently I use the following TASM options:

	/s	- source code segment ordering
	/ml	- case sensitivity on all symbols
	/m2	- 2-pass mode
	/p	- check code for CS overrides in protected mode
	/q	- suppress OBJ records not needed for linkage
	/zn	- no symbolic debugging info

And the following TLINK options:

	/x	- no map file
	/c	- case sensitivity on symbols
	/3	- enable 32-bit records

You can put these options in TASM.CFG and TLINK.CFG response files so you don't
have to type them all again each time if you are going to use them.
If you have a one-file project like the two examples above, you can TLINK the
objects like this:

	TLINK S64 <file>,RUN

and then run the project by starting RUN.EXE. Of course, multi-file projects
can be linked the same way:

	TLINK S64 <file1> <file2> ... <filen>,RUN

It is also quite easy to use a makefile for this or some other utility. I don't
believe it's possible to link S64 with compiled C or Pascal (certainly not 16-
bit C or Pascal) but who knows what wonders are still awaiting the world.


SYSTEM64 SERVICES
-----------------
Apart from simple initialization, SYSTEM64 also supports some features of a
demo-oriented system. Following is a reference list of all service functions
supported in SYSTEM64 v1.000:

        stub            This is a routine that does absolutely nothing!

        exit            Exit to DOS. This just switches back to real mode and
                        returns back to the crippled world of DOS.

        panic           Panic to DOS with a message. If something goes wrong in
                        the demo you can call 'panic' with the pointer to a
                        '$'-terminated error message in eax. Note that this
                        error message must reside within 64KB from the start of
                        'data' because the error is displayed using a real mode
                        routine.

        set_exit        Sets up a custom exit routine which is called just
                        before returning to DOS, whether because of an
                        exception, a jump/call to 'panic' or a jump/call to
                        'exit'. This is useful to turn off annoying soundcard
                        tones or loops when an exception or panic situation
                        occurs. The exit routine address needs to be in eax.

        heap_avail      Returns the available amount of bytes left on the heap
                        in eax.

        heap_alloc      Allocates eax bytes on the heap and returns a pointer
                        to the allocated region in eax. If there is not enough
                        space on the heap, 'panic' is called with an
                        appropriate error message.

        heap_mark       Marks the current state of the heap in eax. This can be
                        used together with heap_release to allocate and
                        deallocate things on the heap easily.

        heap_release    Restores the heap state saved in eax. Everything
                        allocated AFTER 'heap_mark' is deallocated. This is the
                        only way of deallocating something on the heap because
                        the heap is not as sophisticated as heaps in higher
                        programming languages.

	heap_reset	Deallocates EVERYTHING on the heap.

        set_irq         Modifies the IDT to have the appropriate vector point
                        to a new IRQ handler. This IRQ handler must acknowledge
                        the interrupt, must not modify any registers and must
                        exit using an IRETD instruction. The IRQ number is in
                        bl and the address of the routine in eax.

        get_irq         Returns a pointer to the current IRQ handler for a
                        certain IRQ. The IRQ number is in bl and the returned
                        pointer is in eax.

        int_rm          Performs a real mode interrupt call. This can be useful
                        in very rare occasions. Note that a true protected mode
                        to real mode switch occurs, so interrupts are set back
                        to their original real mode handlers. This means that
                        when playing music or something like that, the music
                        will stop during the real mode interrupt. The register
                        values for the interrupt are in 'rm_ax'..'rm_es' and
                        the interrupt number is in al.
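
A small sketch of 'set_exit' and 'panic' used together. The names 'silence' and
'err_msg' are made up for this example, and it is an assumption (not stated
above) that the exit routine returns with a near RET:

        silence         proc near
                        ; turn off the soundcard here
                        ret                     ; assumed way of returning
        silence         endp

                        ; somewhere in your initialization:
                        mov eax,offset silence
                        call set_exit           ; 'silence' now runs on any exit

                        ; when something goes wrong:
                        mov eax,offset err_msg  ; within the first 64KB of 'data'
                        jmp panic

                        ; in 'data':
        err_msg         db "Something went wrong",13,10,"$"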

Functions supported in SYSTEM64 v1.001 are:

        set_color       change a VGA DAC palette entry. This function is
                        supported by SYSTEM64 to make the debugger aware of
                        color changes so it can properly show the screen when
                        pressing F5. The color needs to be in EAX like this:

                                EAX = 0BBGGRRXXh

                        where BB is the blue value (0..3F), GG is the green
                        value (0..3F) and RR is the red value (0..3F). XX is
                        the color index number (0..FF).

        set_palette     change the entire VGA DAC palette. ESI points to a
                        768-byte array of R,G,B-records for each color.

These functions were supported to ensure that DEBUG64 sets the colors to the
right values each time. Not too elegant, but you wanted a prerelease :)
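
A short sketch of both calls, following the EAX layout given above. The array
'my_palette' is a made-up name for this example and is assumed to live in
'data':

                        mov eax,000003F00h      ; BB = 00, GG = 00, RR = 3F, XX = 00
                        call set_color          ; color 0 becomes bright red

                        mov esi,offset my_palette
                        call set_palette        ; load all 256 colors at once

                        ; in 'data':
        my_palette      db 768 dup (0)          ; 256 R,G,B records (0..3F each)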

Functions supported in SYSTEM64 v1.203 are:

        enable_irq      hardware enable the IRQ signal for the IRQ in bl.

        disable_irq     hardware disable the IRQ signal for the IRQ in bl.

Functions supported in SYSTEM64 v1.205 are:

        unchain_vga     initialize mode X (and instruct debugger if needed)

        chain_vga       deinitialize mode X (and instruct debugger if needed)

Functions supported in SYSTEM64 v1.209 are:

        heap_alloc4     allocate memory block on the heap, dword aligned.

Note: chain_vga and unchain_vga were removed in v1.210 because mode X is not for
      me and it's also not faster than mode 13 if you do what I do with it.
      (of course nobody says that you can't put them back in :))

Global variables in SYSTEM64 v1.000 are:

        sys_type        a dword indicating the type of protected mode system.
                        0 = raw, 1 = XMS and 2 = VCPI (EMM).

        data_base       When using datafiles, this points to a memory region
                        (relative to 'data') where the datafile is loaded.

        psp             This points to the PSP (relative to 'data') if you need
                        information from there.

        rm_al           These are the real mode register copies for the
	rm_ah		'int_rm' function call.
	rm_ax
	rm_bl
	rm_bh
	rm_bx
	rm_cl
	rm_ch
	rm_cx
	rm_dl
	rm_dh
	rm_dx
	rm_bp
	rm_si
	rm_di
	rm_ds
	rm_es

	rm_data 	Segment value corresponding to 'data'.

Global variables in SYSTEM64 v1.001 are:

        page_base       base of a 64K area which can be used as a VGA double
                        buffer for animation purposes. With animation, write
                        everything you want to page_base and then when you are
                        finished, copy page_base to vid_base to update the
                        screen.

        vid_base        base of the 64K VRAM window at A0000 in system memory.

        rm_flags        this should fix a nasty bug in the int_rm routine, but
                        I'm not sure yet :)

Global variables in SYSTEM64 v1.205 are:

        page_base       base of a 256K (!!!) area which can be used as a VGA
                        double buffer for animation purposes.


THE HEAP UNDER SYSTEM64
-----------------------
SYSTEM64 supports a very simple heap-like memory management. The SYSTEM64-
compliant memory map looks like this:


	+--------+ <- heap top
	|	 |
	:	 :
	   FREE
	   HEAP
	:	 :
	|	 |
	+--------+ <- heap pointer
	|  USED  |
	|  HEAP  |
	+--------+ <- heap base

	+--------+
	|	 |
	|  DATA  |
	|  FILE  |
	|	 |
        +--------+ <- 'data_base'

        +--------+
        |  PAGE  |
        | BUFFER |
        +--------+ <- 'page_base'                          above 1MB
   ....................................................................
	+--------+					   below 1MB
	|	 |
	| STACK  |
	|	 |
	+--------+ <- 'pstack' (ss)
	|	 |
	|  DATA  |
	|	 |
	+--------+ <- 'data' (ds, es, fs)
	|	 |
	|  CODE  |
	|	 |
	+--------+ <- 'pcode' (cs, gs)


Where all these regions are physically is unknown because of the different
protected mode systems. SYSTEM64 calculates all pointers relative to 'data', so
no segment register reloading is necessary.

Allocating something on the heap with 'heap_alloc' is very simple. Let's say
you want to allocate 1MB of data. You put 00100000h in eax and call
'heap_alloc':

	heap before			heap after
	+--------+ <- heap top		+--------+ <- heap top
	|	 |			|	 |
	:	 :			:	 :
					   FREE
	   FREE 			   HEAP
	   HEAP 			:	 :
					|	 |
	:	 :			+--------+ <- heap pointer
	|	 |			|	 |
	+--------+ <- heap pointer	+  USED  + <- returned eax
	|  USED  |			|  HEAP  |
	|  HEAP  |			|	 |
	+--------+ <- heap base 	+--------+ <- heap base

Allocating can go on until the heap is full, so a primitive way of saving and
restoring the heap state is also supported. Let's say you want to allocate 1MB
on the heap and then deallocate it again. Now, you mark the state of the heap
in a variable with 'heap_mark' before you allocate anything. This goes as
follows:

        'heap_mark' returns the current heap pointer in eax which you can store
        in a variable.

        'heap_alloc' allocates the wanted 1MB and returns a pointer in eax.

	You use the allocated region.

        'heap_release' restores the current heap pointer back to its saved
        state, in effect deallocating the 1MB region.
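
The steps above could be sketched like this. The variables 'heap_state' and
'buffer' are made-up names for this example, and nothing is assumed about
which registers the heap functions preserve besides the documented eax:

                        call heap_mark
                        mov [heap_state],eax    ; remember the heap state

                        mov eax,00100000h       ; 1MB
                        call heap_alloc         ; panics if the heap is too small
                        mov [buffer],eax        ; pointer relative to 'data'

                        ; ... use the region at [buffer] ...

                        mov eax,[heap_state]
                        call heap_release       ; the 1MB is free again

                        ; in 'data':
        heap_state      dd 0
        buffer          dd 0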

For demos, it's maybe best to do the following:

        Allocate some general stuff on the heap, things you are always going to
        use during the whole demo.

	Mark the state of the heap.

	For each event:

		Allocate event-specific stuff on the heap.

		Use it.

                Release the state of the heap with the earlier marked value.

It is not necessary to deliver an empty heap before exiting to DOS because DOS
doesn't understand what you're doing with a protected mode heap, and all memory
is 'physically' deallocated before exiting to DOS anyway.


CUSTOM IRQ HANDLERS UNDER SYSTEM64
----------------------------------
By default SYSTEM64 has simple empty IRQ handlers for IRQ0,IRQ2..IRQF which
only acknowledge the interrupt. IRQ1 triggers a 'keyboard break fault'
exception which returns to DOS. This can be considered as the protected mode
ctrl-break in SYSTEM64.
To use custom IRQ handlers, you can use 'set_irq' and 'get_irq'. 'set_irq' sets
up a new IRQ handler for a specific IRQ number and 'get_irq' returns a pointer
to the current IRQ handler. A new IRQ handler must do the following things:

	1. NOT modify ANY of the registers (use PUSHA/POPA or something)
	2. Acknowledge the interrupt by outputting 020h to I/O-port 020h (for
	   IRQ8..IRQF, also output 020h to I/O-port 0A0h first)
	3. return to normal processing with an IRETD instruction

These rules are not SYSTEM64 specific, they're just the way Intel and IBM want
their IRQ's to be handled, and IRQ's in SYSTEM64 are mapped DIRECTLY into the
IDT (so no safe and slow routine shells around it) :)
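
A minimal sketch of a handler that follows the three rules above, installed on
IRQ0. The names 'timer_irq', 'tick_count' and 'old_timer' are made up for this
example, and the handler assumes DS still holds 'data' (see the WARNING about
segment registers):

        timer_irq       proc near
                        push eax                ; rule 1: save what we use
                        inc dword ptr [tick_count]
                        mov al,020h
                        out 020h,al             ; rule 2: acknowledge
                        pop eax
                        iretd                   ; rule 3 (also restores the flags)
        timer_irq       endp

                        ; somewhere in your initialization:
                        mov bl,0
                        call get_irq
                        mov [old_timer],eax     ; keep the old handler
                        mov bl,0                ; reload in case get_irq changes bl
                        mov eax,offset timer_irq
                        call set_irq

                        ; in 'data':
        tick_count      dd 0
        old_timer       dd 0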


REAL MODE INTERRUPTS UNDER SYSTEM64
-----------------------------------
Sometimes, you want to call a real mode interrupt. I advise against this, but
there are some situations where a real mode interrupt is easier than something
else (setting a video mode, waiting for a key and you're too lazy to write your
own IRQ1 handler, etc.). SYSTEM64 supports this as well. A group of real mode
register copies are available in 'data' to be used for this purpose. A simple
example which sets the VGA to 320x200 256 color mode:

	mov [rm_ax],00013h			; real mode ax = 00013h
	mov al,010h				; interrupt number 010h
	call int_rm				; do it

Actually setting the VGA to 320x200x256 is quite useless because S64 does
that already.
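
Waiting for a key could be sketched the same way with BIOS interrupt 016h,
function 000h. This assumes 'rm_ax' is a word variable that also receives the
result, as the quick reference suggests:

	mov [rm_ax],00000h			; real mode ah = 000h (read key)
	mov al,016h				; interrupt number 016h
	call int_rm				; returns after a keypress
	mov ax,[rm_ax]				; ah = scan code, al = ASCII code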


SYSTEM64 FUNCTIONS - QUICK REFERENCE
------------------------------------

	stub		do nothing.
			in: -
			out: -

	exit		return to DOS.
			in: -
			out: -

	panic		Panic to DOS with message.
			in: eax = pointer to message
			out: -

	set_exit	Setup custom exit routine.
			in: eax = pointer to custom exit routine
			out: -

	heap_avail	Return free heap size.
			in: -
			out: eax = free heap size

	heap_alloc	Allocate heap block.
			in: eax = size in bytes
			out: eax = pointer to allocated heap block

        heap_alloc4     Allocate dword-aligned heap block.
                        in: eax = size in bytes
                        out: eax = pointer to allocated heap block

	heap_mark	Mark heap state.
			in: -
			out: eax = current heap state

	heap_release	Release heap state.
			in: eax = heap state
			out: -

	heap_reset	Reset heap.
			in: -
			out: -

	set_irq 	Setup custom IRQ handler.
			in: eax = pointer to custom IRQ handler
			    bl = IRQ number
			out: -

	get_irq 	Get address of current IRQ handler.
			in: bl = IRQ number
			out: eax = pointer to current IRQ handler

	int_rm		Real mode interrupt.
			in: al = interrupt number
			    rm_?? = real mode register values
			out: rm_?? = real mode register result values

        set_color       Set VGA DAC palette color.
                        in: eax = color data (bbggrrxx)
                        out: -

        set_palette     Set entire VGA DAC palette.
                        in: esi = pointer to palette array
                        out: -

        enable_irq      Hardware enable IRQ signal.
                        in: bl = IRQ number
                        out: -

        disable_irq     Hardware disable IRQ signal.
                        in: bl = IRQ number
                        out: -


USING DEBUG64
-------------
Using DEBUG64 is pretty simple and straightforward. When you want to use
DEBUG64, simply link S64D.OBJ instead of S64.OBJ with the project and watch the
debugger unfold after running the executable. To be able to do this,
DEBUG64.EXE must be in the same directory as the executable. A quick overview
of the screen:

	+-----------+----+
	|	    |regs|
	|	    +----+
	|   code    |flgs|
	|	    +----+
	|	    |heap|
	+-----------+----+
	|   data    |stck|
	+-----------+----+

The keys that can be used here are:

	F1 = switch data window to BYTE or ASCII BYTE mode
	F2 = switch data window to WORD mode
	F3 = switch data window to DWORD mode
	F4 = enter address to have data window point to
        F5 = show contents of page_base on full screen
	F6 = force step over
	F7 = trace into
	F8 = step over
	F9 = run
	Q = quit to DOS
	A = about
	cursor keys = scroll data window

From some version number on, you can capture hicolor images by pressing C
during the user-screen mode. This only works on S3 chips and actually I never
tested or finished it anywhere else but home. Also, it could be that you have
a special version with some Pentium messages flying by. Ignore these, they are
tests of mine.

Note: The instruction disassembly is far from complete, so don't use the F8 key
      too bluntly because chances are you'll hit an "unsupported" instruction
      and desync the debugger with the processor. The program will then run on
      until it is supposed to end... ...without any debugger information.


PRACTICAL USE IN DEMOS
----------------------
I'm currently working on a small demo shell which uses SYSTEM64 as a base
(there is actually nothing very difficult to do). This demo shell should do the
following:

        initialize soundcards (GUS preferred)
        initialize music player
        initialize user shit
        start playing music
        do all the events
        stop playing music
        deinitialize user shit
        deinitialize music player
        deinitialize soundcards

For that I have made XMGUS.ASM and TIMER.ASM which respectively control the GUS
with the playing of an XM soundtrack and IRQ0 to process the playing of this
music. XMGUS and TIMER will be released with the public domain version, so
wait for that or write your own stuff.
For reference, watch the TEST.ASM program. Because all the demodata is loaded
before the demo is started, it is not necessary to use disk access. I thought of
the following scheme for doing the events. For each event, do:

        initialize the timing routine (see below)
        wait until event has to start by checking music_state in XMGUS
        do the event loop:
                update values from the timing routine
                process the data and build the screen
                see if the event has to stop by checking music_state
        deinitialize the timing routine

Each event can have its own timing routine which updates special volatile
event-specific variables every 1/70th of a second. The event loop must copy
these in order to use them. To process the data and build the screen, I usually
do this:

        clear page_base (or not if you want some nice effects with it :) )
        draw your stuff on page_base
        wait for the vertical retrace
        copy page_base to vid_base to update the VRAM
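
One frame of the steps above could be sketched like this for the 320x200
(64000 byte) screen. It assumes 'page_base' and 'vid_base' are dword variables
holding pointers relative to 'data', and that DS and ES still hold 'data'. The
string instructions are used only to keep the sketch short:

                        cld
                        mov edi,[page_base]     ; clear the page buffer
                        xor eax,eax
                        mov ecx,16000           ; 64000 bytes / 4
                        rep stosd

                        ; ... draw your stuff relative to [page_base] ...

                        mov edx,003DAh          ; VGA input status register 1
        @@in_retrace:   in al,dx
                        test al,008h
                        jnz @@in_retrace        ; let a running retrace finish
        @@no_retrace:   in al,dx
                        test al,008h
                        jz @@no_retrace         ; wait for the next one to start

                        mov esi,[page_base]     ; copy the page buffer to VRAM
                        mov edi,[vid_base]
                        mov ecx,16000
                        rep movsd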

Again, this is just what I do, so if you have better ideas, use them instead.


PROTECTED MODE PROGRAMMING TIPS
-------------------------------
Protected mode optimization is a little different from real mode or Motorola
32-bit optimization. Below is a set of small tips to improve the performance of
your low-level code. All these tips help a little, but you have to understand
that when coding an algorithm, slowness is caused more by stupid design of the
algorithm itself than by slow instruction sequences.
Also, when developing some algorithm that you have trouble understanding,
first try it out in some higher level language. This might seem like a big
lump of useless time, but you'll see that it works much faster than when you
directly code it in assembly and sit 4 weeks at your workstation trying to
find out why it doesn't work.

* Use either 32-bit or 8-bit registers where possible. Native mode Intel
  Protected mode simply swaps all word-opcodes with dword-opcodes. To use a
  word (which sounds faster because it's 16 bits), you need to tell the
  processor to switch to Compatible mode (8088, real mode, V86 mode or
  whatever). This is done by a prefix-byte called OPSIZE: which switches the
  way the processor interprets the operands. Using this prefix will COST you
  a cycle rather than GIVING you one. Example:

        mov ax,[eax]    codes as:       66 8B 00        2 cycles
        mov eax,[eax]   codes as:       8B 00           1 cycle

* Use Native mode memory operands wherever possible. The same thing as above
  applies to memory locations. If you use [di] or [bx + si] or some other
  familiar 8088 address, the processor needs to briefly switch to Compatible
  mode to interpret the address operand. This also costs a cycle. Example:

        mov eax,[bx]    codes as:       67 8B 07        2 cycles
        mov eax,[ebx]   codes as:       8B 03           1 cycle
        mov ax,[bx]     codes as:       66 67 8B 07     3 cycles !!!

* Align your data to a 4-byte boundary on speed-critical routines.

* Use DEC/JNZ rather than LOOP.

* Use MOV/INC rather than MOVSx/STOSx/LODSx. Actually, use only MOV/MOV instead
  of MOVSx.

* Avoid AGI's (Address Generation Interlocks) by putting instructions that
  have nothing to do with each other after each other. Uhm, example:

        shld ebx,ecx,8
        mov bh,ch               ; depends on last instruction
        add ebx,[boing]         ; depends on last instruction
        mov al,[ebx]            ; depends on last instruction
        mov [edi],al            ; depends on last instruction
        inc edi                 ; depends on last instruction
        add ecx,edx
        dec ecx
        jnz @@ome_willem        ; depends on last instruction

  better is:

        shld ebx,ecx,8
        mov bh,ch               ; depends on last instruction
        add ecx,edx
        add ebx,[boing]
        mov al,[ebx]            ; depends on last instruction
        mov [edi],al            ; depends on last instruction
        inc edi                 ; depends on last instruction
        dec ecx
        jnz @@ome_willem        ; depends on last instruction

* Use Self Modifying Code to save registers because registers on the Intel
  processors are not as general-purpose as everyone thinks. So instead of:

         :
        add ebx,ecx
        add edx,ebp
        inc edi
        dec esi
        jnz @@bolke_de_beer

  you can self-modify the adders for ebx and edx, making ecx and ebp
  available again:

        add ebx,????????
        add edx,????????
        inc edi
        dec esi
        jnz @@bolke_de_beer

  You'll need an i86 data sheet with the instruction format for these things.

* Unroll loops when possible. It is often faster to make a jump-table and
  an unrolled loop instead of a real loop. This often saves registers as well
  as a lot of cycles (any type of jump is a menace, even for the P5).
  So instead of:

        @@next:         mov [edi],al
                        inc edi
                        dec ecx
                        jnz @@next

  Do:

                        jmp [jump_table + ecx * 4]
                         :
                         :
                        mov [edi + ????],al
                        mov [edi + ????],al
                        mov [edi + ????],al
                         :
                         :
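
  A filled-in sketch of the same idea for storing 0..4 bytes. The labels
  'fill_0'..'fill_4' are made up for this example, 'jump_table' is assumed to
  live in 'data', and ecx must already be in the range 0..4:

                        jmp [jump_table + ecx * 4]
        fill_4:         mov [edi + 3],al
        fill_3:         mov [edi + 2],al
        fill_2:         mov [edi + 1],al
        fill_1:         mov [edi],al
        fill_0:                                 ; ecx = 0 stores nothing

                        ; in 'data':
        jump_table      dd offset fill_0
                        dd offset fill_1
                        dd offset fill_2
                        dd offset fill_3
                        dd offset fill_4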

PROBLEMS
--------
If there are any problems with SYSTEM64, any undocumented features (bugs) or
any north-Finnish girls who desperately want to get to know a lonely coder
cowboy, don't hesitate to contact me ASAP using one of the following means:

	snail:	Desmond Germans
		Jaagweg 15
		1452 PB  Ilpendam
		The Netherlands

        phone:  (+31)-(0)20-4363364 (between 18:00 and 22:00 CET)

        email:  germans@cs.vu.nl, npgerman@nat.vu.nl, nigerman@nat.vu.nl

        irc:    'Simm' on #coders, #nlcoders, #3dcoders, #finland, #lapland,
                #metsa and #analogue.
