
     This second release is in response to those who felt that the original
text was too cryptic to understand.  I assume that I was too brief in some
areas, so this time, I'll make an effort to cover everything in more detail
and explain more of the examples.  In addition, several new topics and
tricks are covered here, including fading in/out and how the checkerboard
trick was achieved.  Oh, and the source to the AA loader is also packaged
in the ZIP file.  Enjoy...
------------------------------------------------------------------------------


                  Writing (Graphic) VGA Intros/Loaders
                            (Second Release)
                                   By
                              Fred Nietzche
                                11/08/91
 
 
     All that this text will cover is the standard VGA video mode 13h
(320x200 x 256 colors), which is the easiest to understand and also what
is generally used to write the loaders.  Again, if you would like more
information concerning the higher resolution modes (and also the non-
standard video modes), I encourage you to check out Power Graphics
Programming by Michael Abrash, and also The Advanced Programmer's Guide to
the EGA/VGA by George Sutty and Steve Blair.


VGA Memory Organization for Standard Mode 13h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     Before getting on to the fun stuff, you need a little knowledge of how
something is displayed on the screen.  Bear in mind, though, that this
interpretation is simplified for you to understand the basics.  Later on,
more components will be added to this.
     Many times a second (roughly 70 in mode 13h), your VGA controller SCANS
through its display memory, and whatever it picks up there is placed onto
your screen.  When writing loaders, our goal, then, is to modify that
display memory so that when the VGA controller scans through it, it will
show what we want on the screen.  The organization of this display memory,
as it corresponds to how the controller interprets it, turns out to be
fairly simple.  The first reason is that each pixel on the screen
corresponds to exactly 1 byte in the display memory.  Thus, each offset of
the address corresponds to 1 pixel.  Take this for granted for now if
you're not following.  The second reason is that the memory map is
linear.  What does this mean?  Starting out at offset 0 (coordinate (0,0)),
the controller scans until offset 319 (319,0) for the first line of the
screen.  For the second line on the screen, it just continues scanning from
offset 320 to 639, and for the third line, offset 640 to 959.  Examine the
following example closely:


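    |<------------- Display, Horizontal, 0..319 ------------->|

    +----------+----------+----------+     +----------+----------+
    |Offset 0  |Offset 1  |Offset 2  |---->|Offset 318|Offset 319|
    |(0,0)     |(1,0)     |(2,0)     |     |(318,0)   |(319,0)   |
 V  +----------+----------+----------+     +----------+----------+
 e  |Offset 320|Offset 321|Offset 322|---->|Offset 638|Offset 639|
 r  |(0,1)     |(1,1)     |(2,1)     |     |(318,1)   |(319,1)   |
 t  +----------+----------+----------+     +----------+----------+
 ,  |Offset 640|Offset 641|Offset 642|---->|Offset 958|Offset 959|
    |(0,2)     |(1,2)     |(2,2)     |     |(318,2)   |(319,2)   |
 0  +----------+----------+----------+     +----------+----------+
 .
 .                |                                    |
 1                |                                    |
 9               \|/                                  \|/
 9
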

     Looking at this, the equation for calculating the Offset is:

          Offset = (320 * YCoord) + XCoord

     Now, all that is missing from the memory address is the segment.  The
VGA and EGA modes, unlike CGA, all use 0A000h as their display segment. 
The completed display address then is given by:

          [ 0A000h : Offset ]

     You can now put a pixel on the screen, by filling in the display
memory address with a BYTE value, ranging from 0 to 255 (total of 256),
each one of these values representing a different color (to be re-defined
in the next part).
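
     The offset equation can be tried out without touching any hardware.
The following is a minimal sketch in Python, not DOS code: the bytearray
named display_memory is only a stand-in for the 64,000 bytes at segment
0A000h, and the names pixel_offset and put_pixel are made up for this
example.

```python
SCREEN_WIDTH = 320    # pixels per line in mode 13h
SCREEN_HEIGHT = 200   # lines on the screen

def pixel_offset(x, y):
    """Offset into the display segment for coordinate (x, y)."""
    if not (0 <= x < SCREEN_WIDTH and 0 <= y < SCREEN_HEIGHT):
        raise ValueError("coordinate is off the screen")
    return SCREEN_WIDTH * y + x

# Stand-in for the display memory at 0A000h:0000 (an assumption of this
# model; real code would write to the segment itself).
display_memory = bytearray(SCREEN_WIDTH * SCREEN_HEIGHT)

def put_pixel(x, y, value):
    """Store one color value (0..255) at coordinate (x, y)."""
    display_memory[pixel_offset(x, y)] = value

put_pixel(10, 20, 1)
print(pixel_offset(10, 20))   # 320 * 20 + 10 = 6410
```

     Since the last pixel (319,199) lands on offset 63999, the whole screen
fits comfortably inside one 64K segment.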


Definition of 256 Colors, and the VGA Video DACs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     When a video mode allows for 256 colors, this means that a total of
256 colors may be displayed on the screen at once.  But the definitions of
these colors can vary, according to how YOU would like them.  For example,
it would be feasible to change all of the 256 colors available to just
BLUE.  This would mean that everything on the screen would appear blue, no
matter what the display memory addresses say.  Why?  Because each and every
one of those 256 values has been DEFINED as blue.  The controller keeps a
table of what each of those 256 values has been designated as, and looks
every byte of display memory up in it to decide which color to place onto
the screen.  The goal here is to change the definitions stored in that
table.  First off, though, you need an understanding of how to define a
color.
     Any color the card can show is defined by just three components and
their respective intensities: red, green, and blue (RGB).  Using this
information, and the fact that each component's intensity is limited to
64 levels (ranging from 0 to 63), you can calculate the total number of
colors that the card will allow.

     Red     Green    Blue
     64    x 64     x 64     =  262,144 colors available.

     Changing a color is then just a matter of determining which value from
the table you would like to change, and the three components that make up
your color.  There are four VGA Video DAC ports, but only three are
needed to look up and change the values in the table.  Those three are
as follows:

     03C7h  -  Read Index Register.  When reading a value from the
               lookup table, placing a value here will set the read
               index of the table at that value.  Reading the Data
               Register port three times after this will result in the
               three components of that color stored on the table.
               After this step, the read index of the table will
               automatically be incremented by 1, so taking this a
               little further, reading the Data Register three times
               again will result in the three components of the color of
               the NEXT value stored on the table.

     03C8h  -  Write Index Register.  This operation is similar to the
               Read Index Register.  When modifying a value's three
               components on the lookup table, placing a value here will
               set the write index of the table at that value.  The next
               three byte values you place into the Data Register will
               be the three components of the color for the value. 
               Again, after this is completed, the write index for the
               table is incremented by 1, and the next three bytes that
               are put into the Data Register will be for the NEXT value
               of the table.

     03C9h  -  DAC Data Register.  This is where the data is read and
               written to the table (also called the LUT, or Look-Up
               Table).

     Note:  I am uncertain whether the write index and the read index of
       the LUT are the same.  I've encountered that when you set the
       write index and read index both at once, and read/write from/to
       the Data Register several times, the indices will mess up
       (meaning that they both are not incremented by one each time upon
       three successive read/writes).  There's a chance that they are
       one and the same, in which case, you'll have to either read them
       all at once and then write, or set the read index, read three
       values, and set the write index, and write the three values.  The
       fading in and out examples that follow will treat them as the
       same.  If you know otherwise, please inform me.
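
     To make the auto-increment behavior concrete, here is a toy model of
the three ports in Python.  It is a sketch under loud assumptions, not a
description of the real chip: the class name DacModel and its methods are
invented for this example, the model keeps the read and write positions
completely independent (exactly the point the note above is unsure about),
and it simply masks each component to 6 bits.

```python
class DacModel:
    """Toy model of the VGA DAC ports 03C7h, 03C8h and 03C9h."""

    def __init__(self):
        self.lut = [[0, 0, 0] for _ in range(256)]   # [red, green, blue]
        # Positions count single components: index * 3 + component.
        # ASSUMPTION: read and write positions do not affect each other.
        self.read_pos = 0
        self.write_pos = 0

    def out(self, port, value):
        if port == 0x3C7:                  # Read Index Register
            self.read_pos = (value & 0xFF) * 3
        elif port == 0x3C8:                # Write Index Register
            self.write_pos = (value & 0xFF) * 3
        elif port == 0x3C9:                # DAC Data Register
            index, component = divmod(self.write_pos, 3)
            self.lut[index][component] = value & 0x3F   # keep 0..63
            self.write_pos = (self.write_pos + 1) % (256 * 3)
        else:
            raise ValueError("port not modeled")

    def inp(self, port):
        if port != 0x3C9:
            raise ValueError("port not modeled")
        index, component = divmod(self.read_pos, 3)
        self.read_pos = (self.read_pos + 1) % (256 * 3)
        return self.lut[index][component]

dac = DacModel()
dac.out(0x3C8, 1)                    # start writing at value 1
for component in (0, 0, 63, 63, 0, 0):
    dac.out(0x3C9, component)        # value 1 becomes blue, value 2 red

dac.out(0x3C7, 1)                    # start reading at value 1
read_back = [dac.inp(0x3C9) for _ in range(6)]
```

     Note how six writes in a row filled in two consecutive values without
the write index being set a second time.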

     Now, combining this information with our knowledge of the display
memory organization, here is an example to place a blue dot at coordinate
(10,20) on the screen.

     ; Change color of "1" to blue.

          MOV  DX,03C8h         ; Write Index Register
          MOV  AL,01            ; We want to modify value 1 on the LUT
          OUT  DX,AL            ; Set write index to value 1
          INC  DX               ; Data Register
          MOV  AL,00            ; Red component is zero
          OUT  DX,AL            ; Set value's red on LUT to 0
          MOV  AL,00            ; Green component is zero
          OUT  DX,AL            ; Set value's green on LUT to 0
          MOV  AL,63            ; Blue component is 63 (max intensity)
          OUT  DX,AL            ; Set value's blue to 63

     ; Change display memory at coord (10,20) to value of 1.

          MOV  AX,0A000h        ; Segment registers can't take an immediate,
          MOV  ES,AX            ;   so set ES to display segment through AX
          MOV  DI,20*320+10     ; Calculate offset of (10,20) into DI
          MOV  Byte Ptr ES:[DI],1 ; Move "1" into that memory location

     Note, however, that this coding is very inflexible and inefficient,
and is done only for the purpose of example.


The Vertical Retrace
~~~~~~~~~~~~~~~~~~~~
     The last basic component to writing VGA intros is knowing how to
produce smooth animation.  Referring back to my explanation in the first
section, the controller scans the display memory, and in doing so, moves
the electron gun with it, left to right.  Upon reaching the end of the
line though, the electron gun must RETRACE back to the beginning of the next
line, and then continues reading the display memory.  This retracing is
called the Horizontal Retrace.  What we are interested in is the Vertical
Retrace.  After the electron gun finishes placing the display memory data
onto the screen, it must RETRACE back up from the lower right hand corner of
the screen to the upper left hand corner, before beginning again to scan the
display memory and thus update the screen.  This retracing is called the
Vertical Retrace, and it is during this time that we should make our
modifications to the display memory and LUT so that it will appear to "pop"
into place.  This is the key to smooth animation.
     To find out whether the electron gun is in its vertical retrace, read
port 03DAh and test bit #3, which is on for the duration of the retrace.
The following example illustrates its use.

                  MOV   DX,03DAh
          InRetr: IN    AL,DX           ;Get value from port
                  TEST  AL,08h          ;Is bit #3 on?
                  JNZ   InRetr          ;Yes, let the current retrace finish
          NoRetr: IN    AL,DX
                  TEST  AL,08h          ;Is bit #3 on?
                  JZ    NoRetr          ;No, wait until a new retrace begins

     If you didn't understand the above section, then take it for granted,
and any time you want to modify the display memory and/or LUT, put this
code in front before doing so.
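
     The polling logic can be checked away from the hardware.  The sketch
below is Python, with the port replaced by any function you hand in, so
wait_vertical_retrace and fake_port are names made up for this example.
It waits for a FRESH retrace: first it lets any retrace already in
progress run out, then it returns as soon as bit #3 comes on again, which
leaves the whole retrace period for your updates.

```python
VRETRACE_BIT = 0x08   # bit #3 of port 03DAh

def wait_vertical_retrace(read_status):
    """Return as soon as a new vertical retrace has begun.

    read_status is any callable standing in for IN AL,DX on port 03DAh.
    """
    # Let a retrace that is already in progress run out...
    while read_status() & VRETRACE_BIT:
        pass
    # ...then wait for the next one to start.
    while not (read_status() & VRETRACE_BIT):
        pass

# Simulated port: two reads inside a retrace, three reads during the
# visible display, then a new retrace begins.
fake_port = iter([0x08, 0x09, 0x00, 0x01, 0x00, 0x08, 0x08])
wait_vertical_retrace(lambda: next(fake_port))
remaining = list(fake_port)   # reads the routine did not need
```

     The routine stops on the first read that shows the new retrace, so
only the final simulated value is left unread.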


Algorithm/Overhead/Misc
~~~~~~~~~~~~~~~~~~~~~~~
     And before working out your algorithm, take into consideration that your
graphics execution cycle is the most important part of the program, and that
your goal is to optimize that with little regard for overhead time and memory.
In order for your intro to deliver its full effect, smooth animation is the
key, and achieving this requires that your processor does not have to spend
excess time unnecessarily on portions which could have been done in the
overhead.  By doing as many pre-calculations and memory writes as possible
in the overhead, you will speed up the graphics cycle, and maybe free the
processor for other animations you have in mind.  This can be seen in the
AA.EXE intro, which takes approximately 5-7 seconds (on my 20MHz machine)
of overhead before the program continues on to the animation.
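
     As a small illustration of trading overhead for speed, here is a
sketch in Python of two lookup tables built once before the animation
starts.  The table names, the 256-step sine table, and its amplitude of
40 are all choices made for this example, not values taken from AA.EXE.

```python
import math

# Overhead: done once, before the animation starts.
ROW_OFFSET = [320 * y for y in range(200)]      # 320 * YCoord, per line
SINE_TABLE = [round(40 * math.sin(2 * math.pi * step / 256))
              for step in range(256)]           # hypothetical amplitude 40

def pixel_offset(x, y):
    """Graphics cycle: one table lookup and one add, no multiply."""
    return ROW_OFFSET[y] + x

def wave_y(step, center=100):
    """Graphics cycle: a sine position without calling sin() at all."""
    return center + SINE_TABLE[step % 256]
```

     In assembly the same idea applies: a table of words indexed by the
line number replaces the multiply by 320 inside the loop.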


Some Tricks of the Trade
~~~~~~~~~~~~~~~~~~~~~~~~
     Now that you (hopefully) have the basics of the VGA down pat, I'm
going to move on to some tricks and effects that can be accomplished using
this knowledge.  This section includes how to fade in/out, moving color
bars around using palette cycling, producing the checkerboard using palette
cycling, and accessing the internal CGA character set in order to get some
message across (purpose of the intro in the first place!).

     Fading In and Out
     ~~~~~~~~~~~~~~~~~
          Fading a screen in and out is simply a matter of continuously
     incrementing/decrementing all the components of the values on the
     LUT until they either reach their final values, or they go down to
     zero, whichever task you want to do.  The difficult part, though,
     is working out a fast algorithm to do this.  The following is an
     example showing a general fade-out procedure.  A fade-in procedure
     will not be included, but you can figure one out from this example.

          All_RGB    DB  256*3 DUP (?)       ;Define temporary work area

               MOV   CX,64                   ;Lower intensities 64 times
          OneCycle:

               CALL  WaitVerticalRetrace     ;Using already written proc

               MOV   DX,03C7h                ;Read Index
               XOR   AL,AL                   ;Set read index to 0
               OUT   DX,AL
               INC   DX
               INC   DX                      ;Data Register
               XOR   BX,BX                   ;Init work counter
          ReadLoop:
               IN    AL,DX                   ;Read component
               MOV   Byte Ptr All_RGB[BX],AL ;Store component
               INC   BX                      ;Increment work counter
               CMP   BX,256*3                ;All 256*3 done yet?
               JL    ReadLoop                ;No, continue reading

               XOR   BX,BX                   ;Init work counter (dec)
          DecLoop:  
               CMP   Byte Ptr All_RGB[BX],0  ;Is it zero already?
               JZ    Continue                ;Yes, skip decrement
               DEC   All_RGB[BX]             ;No, decrement

          Continue:
               INC   BX                      ;Increment work counter
               CMP   BX,256*3                ;256*3 times already?
               JL    DecLoop                 ;No, continue decrementing

               CALL  WaitVerticalRetrace     ;Using already written proc

               MOV   DX,03C8h                ;Write Index
               MOV   AL,0                    ;Start writing at zero
               OUT   DX,AL
               INC   DX                      ;Data Register
               MOV   SI,OFFSET All_RGB       ;DS:SI point to work area
               PUSH  CX                      ;Store cycle number
               MOV   CX,256*3                ;Will repeat 256*3 times
               CLD                           ;Set SI forward
               REP   OUTSB                   ;Set ALL components to work
               POP   CX                      ;Restore cycle number

               LOOP  OneCycle                ;Continue 64 times...

          I believe this is as fast as you can perform the operation
     when decrementing each component by 1, the reason being that all
     the fade-out routines need to wait for the vertical retrace (for
     smoothness again), which takes up most of the time.  If anyone
     comes out with anything simpler and faster, would appreciate it if
     you'd let me know..
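
          The arithmetic of the fade is easy to check on its own.  This
     is a sketch in Python that works on a plain list of 768 components
     instead of the real LUT; fade_out_step mirrors the decrement loop
     of the listing, and fade_in_step is one possible way to do the
     fade-in that the text leaves as an exercise (my guess at it, not
     the routine from any actual loader).  The test palette is invented.

```python
def fade_out_step(palette):
    """Lower every nonzero component by 1 (one pass of the DecLoop)."""
    return [max(component - 1, 0) for component in palette]

def fade_in_step(palette, target):
    """Raise every component by 1 until it reaches its final value."""
    return [min(component + 1, final)
            for component, final in zip(palette, target)]

# Hypothetical 768-byte palette: 256 values x (red, green, blue),
# every component in the legal range 0..63.
target = [(i * 7) % 64 for i in range(256 * 3)]

palette = list(target)
for _ in range(63):                  # no component can exceed 63
    palette = fade_out_step(palette)
faded_out = palette                  # everything is black now

for _ in range(63):
    palette = fade_in_step(palette, target)
faded_in = palette                   # back at the original colors
```

          Since a component never exceeds 63, 63 passes are enough to
     reach black; a 64th pass changes nothing.  For the fade-in you need
     a saved copy of the final palette to compare against.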


     Color Bars
     ~~~~~~~~~~
          I don't believe I ever defined what palette cycling meant in
     the last release, which may have caused a lot of confusion.
     Basically, it means keeping the screen static (very little or no
     modification to the display memory) and just changing the
     components of the LUT around for a nice effect.  Color bars are one
     example.
          You've all seen the color bars on the Amigas and STs.  It's
     also feasible on the PC, although it takes a little bit more
     work.  To get horizontal color bars moving up and down, you simply
     assign a different color value in the display memory FOR EACH
     DISPLAY LINE that the bars will be moving across.  Examine the
     following closely:

     +---+---+---+---+---+---+
     | 1 | 1 | 1 | 1 | 1 | 1 | --->
     +---+---+---+---+---+---+
     | 2 | 2 | 2 | 2 | 2 | 2 | --->
     +---+---+---+---+---+---+
     | 3 | 3 | 3 | 3 | 3 | 3 | --->
     +---+---+---+---+---+---+
     | 4 | 4 | 4 | 4 | 4 | 4 | --->
     +---+---+---+---+---+---+
              |           |
              |           |
             \|/         \|/

          *The numbers in the boxes are color values*

          Looking at this arrangement, we can now selectively choose
     which color value we want on, and which ones we want off (off =
     black, or whatever the background color happens to be).  For
     example, say I define a bar as having a width of 4.  First I
     would set ALL of the colors involved in this palette cycling (one
     for each line you filled in and want the bar to travel across)
     to the background color (if black, then the RGB components would be
     0,0,0).  Then, if I wanted the bar at location 20, I would just
     fill in the RGB components of the bar at values 20, 21, 22, and 23.
     To move the bar down one, I would first black out values 20, 21,
     22, and 23, and then fill in values 21, 22, 23, and 24.  It's
     fairly straightforward.  You're now probably wondering, wouldn't it
     be easier to just move the bars themselves in display memory?  No,
     because you would have to rewrite complete lines, which is very slow.
     This method allows for quick line color changes, saving you more
     time for other animations.  I won't include an example of the
     actual routine in this text, but one of the intros I've packaged
     with this ZIP demonstrates this.  Notice there that the bars
     are moving in a sine wave motion.
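
          Here is a sketch of the bar logic in Python, leaving out the
     port writes.  Everything named here is an assumption for the sake
     of the example: the bar color, the 100 lines of travel, and the
     64-frame sine swing are invented, and entry N of the list is taken
     to be the color of the value painted on line N.

```python
import math

BACKGROUND = (0, 0, 0)
BAR_COLOR = (0, 0, 63)     # hypothetical bar color: full blue
BAR_WIDTH = 4
LINE_COUNT = 100           # lines (and color values) the bar travels in

def bar_palette(top):
    """RGB entries for every line's value, with the bar starting at top."""
    entries = [BACKGROUND] * LINE_COUNT
    for line in range(top, top + BAR_WIDTH):
        entries[line] = BAR_COLOR
    return entries

def bar_top(frame):
    """Sine-wave motion: top line of the bar for a given frame number."""
    travel = LINE_COUNT - BAR_WIDTH               # highest legal top line
    swing = math.sin(2 * math.pi * frame / 64)    # 64 frames per swing
    return round(travel / 2 * (1 + swing))

# One frame of the cycle: work out the entries, then (on real hardware)
# wait for the vertical retrace and send them to the LUT.
frame_entries = bar_palette(bar_top(0))
```

          The display memory is never touched after the initial fill;
     only the entries sent to the LUT change from frame to frame.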


     Accessing the CGA Character Set
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          The whole point of an intro is to get some message across to the
     viewer (be it to brag about some group (most cases) or to advertise
     some board).  In order to do so, though, you must find a character set
     to print this message in, and one that will fit into our tight
     requirements.  These requirements are the size of the set, and how
     easily the character set can be obtained.
          For the reasons stated above, although not exactly pretty, I have
     chosen to use the 8x8 CGA character set located within BIOS.  This
     set, being internal, takes up no additional space in the intro and is
     fairly easy to access.  Starting at the memory address
     [ 0F000 : 0FA6E ], the characters are stored in memory line by line
     (one byte per line, since each character is 8 x 8 bits).  Thus, every
     8 bytes, you will encounter the next character in the set.  Example:

          Suppose that the character set starts with the letter "A" (which
          it does not, since "A" is ASCII 65 and so sits 65 characters into
          the set).  "A" and "B" would then be stored like this:

          0F000 : 0FA6E  00110000       0F000 : 0FA76  11111100
          0F000 : 0FA6F  01111000       0F000 : 0FA77  01100110
          0F000 : 0FA70  11001100       0F000 : 0FA78  01100110
          0F000 : 0FA71  11001100       0F000 : 0FA79  01111100
          0F000 : 0FA72  11111100       0F000 : 0FA7A  01100110
          0F000 : 0FA73  11001100       0F000 : 0FA7B  01100110
          0F000 : 0FA74  11001100       0F000 : 0FA7C  11111100
          0F000 : 0FA75  00000000       0F000 : 0FA7D  00000000

          And so on, with each character following one another according
          to the ASCII table, so that character N starts 8 * N bytes past
          the beginning of the set.
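
          The address arithmetic and the bit-to-pixel step can be sketched
     in Python as follows.  The function names are made up for this
     example, the eight bytes in LETTER_A are the letter "A" as I know it
     from the standard IBM 8x8 set (treat that as an assumption and check
     your own BIOS), and the limit of 128 characters reflects that this
     table only holds the lower half of the ASCII set.

```python
FONT_SEGMENT = 0xF000
FONT_OFFSET = 0xFA6E       # start of the 8x8 set, as given in the text
BYTES_PER_CHAR = 8

def glyph_offset(char_code):
    """Offset of a character's first byte (ASCII codes 0..127 only)."""
    if not 0 <= char_code < 128:
        raise ValueError("only ASCII 0..127 are in this table")
    return FONT_OFFSET + BYTES_PER_CHAR * char_code

def expand_glyph(rows, color=1):
    """Turn 8 bit-packed rows into 8 rows of 8 color values.

    The leftmost pixel is the most significant bit of each byte.
    """
    return [[color if row & (0x80 >> bit) else 0 for bit in range(8)]
            for row in rows]

# ASSUMPTION: the eight bytes of "A" in the standard IBM 8x8 set.
LETTER_A = [0x30, 0x78, 0xCC, 0xCC, 0xFC, 0xCC, 0xCC, 0x00]
pixels = expand_glyph(LETTER_A)

for row in pixels:
    print("".join("#" if value else "." for value in row))
```

          With the real starting address, "A" (ASCII 65) begins at offset
     0FA6Eh + 8 * 65 = 0FC76h.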

          Looking at the latest intros though, it seems that more of the
     major groups are using the nice multi-colored lettering.  That would
     be feasible, but would (a) take up extraneous space (even though it
     compresses well), and (b) take up too much of my time.  How would you
     do it, you ask?  There are two ways:  Either display the
     characters on the screen and capture each letter of the alphabet there
     (VERY tedious), or, figure out how the character set is stored in the
     file and access it through there.  Good luck on any of those.  If
     anyone has already gotten any multi-colored set into a known format,
     please let me know about it...


     Swirling Letters (With the CGA Character Set)
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          Producing the swirling letters across the screen (first shown by
     the earlier TDT-TRSI intros) is simply a matter of a combination of
     pre-calculated coordinates and fast display memory writes.  First, you
     must calculate all the coordinates of the path that the letters will
     travel in (the swirling effect uses the sine wave).  Since the height
     of the characters (of the CGA character set) is eight, you need eight
     of these paths.  Shifting the starting phase of the angle by one
     increment (whose size is dependent upon your program) for each of
     these paths will give a 3-dimensional effect.
          The next step you should do is to think of some way of rapidly
     placing the pixels of the characters on the display memory according
     to the path.  One consideration is the fact that calculating the
     offset for every path pixel during the graphics execution cycle is
     very cumbersome, and will slow down your animation.  As a result, I
     not only pre-calculated the coordinates of the pixels, but their
     display memory offsets as well.  The other item you should consider is
     the format in which the characters of the message are stored.  By
     converting each bit of the character set into a byte beforehand, I
     don't waste any time during the graphics cycle picking the bits out of
     each byte to get the character printed on the screen.
          After this has been completed, your graphics cycle should look
     similar (in process) to this pseudocode:

          { Wait for vertical retrace }
          { Delete (set to zero) section of screen where letters pass
            through }
          { Re-plot ALL the points of the characters, according to the path
            defined }
          { Repeat }

          Examine CENTERPT.PAS and try tracing through the whole thing if
     you don't understand this portion of the text.
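
          For readers who would rather see the pre-calculation than trace
     the Pascal, here is a rough Python sketch of the idea.  It is not a
     translation of CENTERPT.PAS; the number of path steps, the phase
     shift, the amplitude, and the horizontal spacing are all invented for
     this example.

```python
import math

GLYPH_ROWS = 8          # height of a CGA character
PATH_STEPS = 64         # hypothetical number of positions along each path
PHASE_STEP = 0.15       # hypothetical phase shift between rows (radians)

def build_paths():
    """One list of display offsets per glyph row (done in the overhead)."""
    paths = []
    for row in range(GLYPH_ROWS):
        offsets = []
        for step in range(PATH_STEPS):
            angle = 2 * math.pi * step / PATH_STEPS + row * PHASE_STEP
            x = 20 + step * 4                          # sweep left to right
            y = 96 + row + round(40 * math.sin(angle)) # sine-wave height
            offsets.append(320 * y + x)                # offset, not (x, y)
        paths.append(offsets)
    return paths

PATHS = build_paths()

def plot_column(display, column, step, color=1):
    """Graphics cycle: plot one 8-pixel column of a letter at a path step.

    column holds 8 values (top to bottom), one per glyph row, already
    expanded from bits to bytes.
    """
    for row in range(GLYPH_ROWS):
        if column[row]:
            display[PATHS[row][step]] = color

display = bytearray(320 * 200)          # stand-in for display memory
plot_column(display, [1, 1, 0, 0, 1, 1, 0, 0], step=0)
```

          Inside the graphics cycle there is no sine and no multiply left,
     only table lookups and byte writes.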


     Checkerboard
     ~~~~~~~~~~~~
          I got the idea for the checkerboard effect after looking at the
     TLS.EXE graphics demo.  Trying to figure out a way to compress this
     into something small was difficult though, but I finally found that
     the easiest way to do this is through palette cycling.
          So how do you go about palette cycling this mess?  First of all,
     you need to break everything down into small pieces.  Consider
     a standard checkerboard, instead of one going off into the horizon
     point.  How would you go about placing down the values on the display
     memory so that the result would be that you could have a solid square
     moving about in all directions with palette cycling?  Here's the
     layout of the values on the display memory:

          (I'm considering a 4x4 pixel square now.)

          01  02  03  04      05  06  07  08         (Figure 1)
          09  0A  0B  0C      0D  0E  0F  10
          11  12  13  14      15  16  17  18
          19  1A  1B  1C      1D  1E  1F  20

          To produce a checkerboard in this setup, you would just set the
     values 01 to 04, 09 to 0C, 11 to 14, and 19 to 1C to the solid color
     (say blue), and the values 05 to 08, 0D to 10, 15 to 18, and 1D to 20
     to the other color (say white).  The result would be:

          B   B   B   B       W   W   W   W
          B   B   B   B       W   W   W   W
          B   B   B   B       W   W   W   W
          B   B   B   B       W   W   W   W

          Where B = Blue, and W = White.

          To produce this checkerboard through the entire screen, you would
     just place squares of these values in the following order (relative
     to the display screen):
                                                     (Figure 2)
          01  02  03  04      05  06  07  08
          09  0A  0B  0C      0D  0E  0F  10   --->  And so on...
          11  12  13  14      15  16  17  18
          19  1A  1B  1C      1D  1E  1F  20

          05  06  07  08      01  02  03  04
          0D  0E  0F  10      09  0A  0B  0C   --->
          15  16  17  18      11  12  13  14
          1D  1E  1F  20      19  1A  1B  1C

          01  02  03  04      05  06  07  08
          09  0A  0B  0C      0D  0E  0F  10   --->
          11  12  13  14      15  16  17  18
          19  1A  1B  1C      1D  1E  1F  20

                 |
                \|/

          Which produces the following:

          B   B   B   B       W   W   W   W
          B   B   B   B       W   W   W   W   --->  And so on...
          B   B   B   B       W   W   W   W
          B   B   B   B       W   W   W   W

          W   W   W   W       B   B   B   B
          W   W   W   W       B   B   B   B
          W   W   W   W       B   B   B   B
          W   W   W   W       B   B   B   B

                 |
                \|/

          Now, in order to "move" the squares left one, looking at
     Figure 1 and using the value layout on the display memory, I would just
     set the values to the following colors:

          B   B   B   W       W   W   W   B
          B   B   B   W       W   W   W   B
          B   B   B   W       W   W   W   B
          B   B   B   W       W   W   W   B

          Use Figure 2, and see what happens to the rest of the display when
     the values are set to the colors above.

          And to move the squares down one, the colors corresponding to the
     values in Figure 1 are as follows:

          W   W   W   W       B   B   B   B 
          B   B   B   B       W   W   W   W
          B   B   B   B       W   W   W   W
          B   B   B   B       W   W   W   W

          Put these colors in Figure 2, and you can see that the rest of the
     display will also shift down one.
          Now determining which colors correspond to which values can take
     time (in fact, a lot!).  So, what I did in the AA.EXE loader was pre-
     calculate the palette colors for every possible position and store
     each set in its own array (in this case, this would be 8 x 8
     arrays, or 64 arrays total).  Then I assigned 4 pointers to each
     array, pointing to the arrays which would move it up, down, left, and
     right.  So moving all the squares right one pixel just requires me
     to follow the "right" pointer and load the palette array it points to.
     Simple.
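
          The flat checkerboard can be modeled in a few lines of Python to
     convince yourself that one palette per position really does move the
     whole screen.  This is only a sketch of the idea, not the AA.EXE
     code: the function names are invented, colors are just the letters
     "B" and "W", and the arrays are stored in a dictionary keyed by
     (left, down) instead of being linked by pointers.

```python
SQUARE = 4                     # square size in pixels
PERIOD = 2 * SQUARE            # the pattern repeats every 8 pixels across

def value_at(x, y):
    """Color value stored at screen position (x, y), as in Figure 2."""
    row = y % SQUARE
    col = x % PERIOD
    if (y // SQUARE) % 2:      # every other band swaps the two halves
        col = (col + SQUARE) % PERIOD
    return 1 + row * PERIOD + col          # values 01h..20h

def board_color(x, y):
    """Plain checkerboard with a blue square in the top-left corner."""
    return "B" if ((x // SQUARE) + (y // SQUARE)) % 2 == 0 else "W"

def palette_for(left, down):
    """Colors for values 01h..20h, board moved `left` and `down` pixels."""
    return {value_at(col, row): board_color(col + left, row - down)
            for row in range(SQUARE) for col in range(PERIOD)}

# Overhead: pre-calculate every position once.
PALETTES = {(left, down): palette_for(left, down)
            for left in range(PERIOD) for down in range(PERIOD)}

def shown(x, y, palette):
    """What the viewer sees at (x, y) once the palette is loaded."""
    return palette[value_at(x, y)]

moved_left = PALETTES[(1, 0)]
first_row = "".join(moved_left[value] for value in range(1, 9))
```

          Here first_row comes out as BBBWWWWB, the same first row as the
     "move left one" table above.  Moving right from position (left, down)
     is just a lookup of ((left - 1) % 8, down), which is what the four
     pointers per array amount to.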
          Now that you've got the standard checkerboard worked out, how do
     you relate this to a checkerboard with the perspective of going into
     the horizon point?  I'm NOT going to go into the details of shifting
     the perspective, but for their relation, basically what you need to
     know is when to place down a value on the display memory, and how many
     to put down.  This can be done by either rounding (I found this did
     not work well), or by truncating (better).  Go through the AA.EXE code
     if you want to know how I did the perspectives and how I determined
     the placement of the values.


     Line/Vector Letters
     ~~~~~~~~~~~~~~~~~~~
          Just saw a new TDT-TRSI intro (these guys seem to be at the
     forefront of coming up with new ideas) incorporating line/vector drawn
     letters.  Using line/vector lettering has the advantage of having
     the letters follow certain shapes, increase/decrease sizes, and overall,
     producing a neat effect.  I'm not going to go into the details,
     major reason being I haven't tried this yet, but will just run through
     a couple of important points.
          The lettering used is defined by points and lines connecting the
     points.  You'll probably have to make up your own lettering as well as
     the storage format since there are few pre-defined vector/line
     characters out there simple enough to be used in animation, and small
     enough to be used in an intro.  This part will take some time (one
     reason I haven't tried it).
          A pre-calculated path is also needed for the characters to travel
     in.  This time, though, you only need a top and bottom path, and maybe a
     middle one if you want lines going through the middle.  Because these
     characters are line drawings, you only need paths for the points
     themselves; the lines between them are drawn in.
          And finally, you'll also need a fast line drawing procedure,
     probably implemented in assembly for greatest speed.  Basically, what
     you'll be doing is drawing the letters with lines continually with
     shifting coordinates (with respect to the defined paths), so you'll need
     to be able to draw all lines fast enough for smooth animation.
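
          The text does not say which line routine to use.  One common
     choice is Bresenham's algorithm, which needs only integer adds and
     compares; the sketch below shows it in Python so the logic can be
     checked before it is rewritten in assembly.  The function name and
     the bytearray standing in for display memory are assumptions of this
     example, and no clipping is done.

```python
def draw_line(display, x0, y0, x1, y1, color):
    """Plot a line from (x0, y0) to (x1, y1) using integer steps only."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    error = dx + dy                       # running error term
    while True:
        display[320 * y0 + x0] = color    # mode 13h offset of the pixel
        if x0 == x1 and y0 == y1:
            break
        doubled = 2 * error
        if doubled >= dy:                 # time to step horizontally
            error += dy
            x0 += step_x
        if doubled <= dx:                 # time to step vertically
            error += dx
            y0 += step_y

display = bytearray(320 * 200)            # stand-in for display memory
draw_line(display, 10, 10, 14, 12, 1)
```

          A vector letter is then just a list of point pairs, and each
     frame calls the routine once per pair with the points moved along
     their paths.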
          Be forewarned that this type of intro will take a lot of time
     (compared to the others).  Make sure that you've got enough of it before
     starting...


Final Word
~~~~~~~~~~
     Believe me, there is MUCH more information about the VGA than I've
covered in this text.  A lot of those functions, though, belong to the
higher resolution modes, which take more time to learn about.  Functions
such as split screen mode, modifying the dimensions of display memory,
tricks with the non-standard modes, etc. are all available.  When you get
the hang of mode 13h (or get bored with it, whichever comes first), I
encourage you to delve into those higher modes I've been referring to...

------------------------------------------------------------------------------
     Thanks for bearing through all 10 pages of this text.  Whew!  Don't
think I've ever written anything so huge before, and given the time it took,
I can almost guarantee that you won't see anything this big from me again.

Anyways, if you want to get in touch with me, you can reach me on my board,
CenterPoint! BBS (301)309-0144, 9600+ only.  I welcome all your ideas/
executables/and sources (if you're willing to part with them).  Impress me.

I also just got an Internet account (since I'm at UMCP now).  If you want
faster replies, send mail to nietzche@wam.umd.edu, and I'll usually reply
within a day or two.

Later on..
