THE AGI SOUND FORMAT 
Written by Lance Ewing <be@ihug.co.nz>
(18 Aug 97)

NOTE: The original version of this document did not cover every aspect of the
sound format. It made no mention that the volume control and noise voice were
also part of AGI's sound format. It turns out that the data contained in a
sound resource is so much like the data sent to the PCjr's TI chip that I have
included a lot of Peter Norton's TI sound chip section from the "Programmer's
Guide to the IBM PC".


INTRODUCTION 

 Most people who think of AGI games remember that they played their music
and sounds over the PC speaker. What they may not know is that all sounds are
composed of four parts, one of which is the melody, two of which are
accompaniment, and the final one being noise. The IBM PC can only play one
note at a time, so all AGI games for the PC play the melody by itself. The
other three parts are still included in the data, though, because some PC
compatibles, including the IBM PCjr, have more than one sound generator.


HISTORY

 According to Donald B. Trivette, author of 'The Official Book of King's
Quest', a year before the IBM PCjr was announced IBM asked Sierra to create a
game that would show off the new computer's color graphics capabilities. IBM
supplied the company with a prototype Junior, and Roberta set to work
designing a new type of adventure game. The game produced was called King's
Quest.
 This is important because the IBM PCjr had a different method of sound
generation than the IBM compatibles of today. The sound data was stored in a
form that made it easy to send to the Junior's sound generators. This format
appears to have remained right through the AGI games up until 1989-90, when
SCI took over, even though the PCjr had long since been surpassed by the 286
and 386.


SOUND AND THE IBM PCjr 

 The best known source of sound in the Junior is the TI SN76496A sound
generator chip. This source has four separate sound voices. Three of these are
tone generators and the fourth is a noise source. All four voices have an
independent volume control, providing an evenly graduated set of 15 volume
levels, plus a zero volume (off). Each of the three pure voices has an
independently selected frequency. The noise voice has three preselected
frequencies and a fourth option, which borrows the frequency of the third pure
voice. The data stored in the AGI games is designed to be sent to these four
voices.


THE TONE GENERATORS

 A tone is produced on a voice by passing the sound chip a 3-bit register
address and then a 10-bit frequency divisor. The register address specifies
which voice the tone will be produced on. This is done through port 192 on the
IBM PCjr by sending it 2 bytes in the following format:


 First Byte

 7  6  5  4  3  2  1  0

 1  .  .  .  .  .  .  .      Identifies first byte (command byte)
 .  R0 R1 R2 .  .  .  .      Register number in TI chip (0, 2, 4).
 .  .  .  .  F6 F7 F8 F9     4 of 10-bits in frequency count.
	

 Second Byte

 7  6  5  4  3  2  1  0

 0  .  .  .  .  .  .  .      Identifies second byte (completing byte)
 .  X  .  .  .  .  .  .      Unused, ignored.
 .  .  F0 F1 F2 F3 F4 F5     6 of 10-bits in frequency count.


 Register Addresses:

   R0 R1 R2

    0  0  0        Holds voice 1 frequency number.
    0  1  0        Holds voice 2 frequency number.
    1  0  0        Holds voice 3 frequency number.


 The actual frequency produced is 1/32 of the system clock frequency
(3.579 MHz), which turns out to be 111,860 Hz, divided by the 10-bit frequency
divisor given by F0 to F9. F0 is the most significant bit of the divisor and
F9 the least significant, so the second byte supplies the upper six bits and
the first byte the lower four. Keeping all this in mind, the following is the
formula for calculating the frequency:

         F = 111860 / (((Byte-2 AND 0x3F) * 16) + (Byte-1 MOD 16));

 Note: The order of the bytes is reversed for AGI sound data.
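
 As a minimal sketch of the formula above (Python is used here purely for
illustration; the function names are hypothetical, and returning None for a
zero divisor is an assumption rather than documented chip behaviour), the two
bytes can be combined like this:

```python
PCJR_TONE_CLOCK_HZ = 111860  # 1/32 of the 3.579 MHz system clock, as given above


def tone_divisor(first_byte, second_byte):
    """Combine the two chip bytes into the 10-bit frequency divisor."""
    high_six = second_byte & 0x3F   # F0..F5, the upper six bits
    low_four = first_byte & 0x0F    # F6..F9, the lower four bits
    return (high_six << 4) | low_four


def tone_frequency(first_byte, second_byte):
    """Return the tone frequency in Hz, or None for a zero divisor."""
    divisor = tone_divisor(first_byte, second_byte)
    if divisor == 0:
        return None  # assumption: treat a zero divisor as "no tone"
    return PCJR_TONE_CLOCK_HZ / divisor
```

 Here first_byte is the command byte (high bit set) and second_byte is the
completing byte, in the order they would be sent to the chip.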


ATTENUATION 

 Each voice in the TI sound chip has an independent sound-level control, which
is calculated in terms of decibels of attenuation, or softening. There are
four bits used to control the volume. These bits, labeled A0 through A3, can
be set independently or added together to produce sixteen volume levels as
shown below.

 A0 A1 A2 A3        Value        Attenuation (decibels)

  .  .  .  1          1                    2
  .  .  1  .          2                    4
  .  1  .  .          4                    8
  1  .  .  .          8                   16
  1  1  1  1                           Volume off

 When a bit is set on, the sound is attenuated (reduced) by a specific
amount: either 2, 4, 8, or 16 decibels. When all four bits are set on, the
sound is turned completely off. When all four bits are off, the sound is at
its fullest volume.

 The attenuation is set by sending a byte of the following format to the TI
sound chip:

 7  6  5  4  3  2  1  0

 1  .  .  .  .  .  .  .      Identifies first byte (command byte)
 .  R0 R1 R2 .  .  .  .      Register number in TI chip (1, 3, 5, or 7).
 .  .  .  .  A0 A1 A2 A3     4 attenuation bits


 Register Addresses:

   R0 R1 R2

    0  0  1        Holds voice 1 attenuation.
    0  1  1        Holds voice 2 attenuation.
    1  0  1        Holds voice 3 attenuation.
    1  1  1        Holds noise voice attenuation.
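
 The attenuation byte can be pulled apart along the same lines. The following
is only a sketch; the function name is hypothetical, and reporting "volume off"
as None is a choice made for this example:

```python
def decode_attenuation(byte):
    """Split an attenuation byte into (register, decibels or None if off)."""
    register = (byte >> 4) & 0x07   # R0 R1 R2: 1, 3, 5 or 7
    value = byte & 0x0F             # A0..A3
    if value == 0x0F:
        return register, None       # all four bits set: volume off
    return register, value * 2      # each unit of the value is 2 dB
```

 For example, a byte of 0x90 addresses register 1 (voice 1 attenuation) with
no attenuation at all, which is full volume.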


THE NOISE GENERATOR 

 There are two modes for the noise operation, besides the four frequency
selections. One, called periodic noise, produces a steady sound; the other,
called white noise, produces a hissing sound. These two modes are controlled
by a bit known as the FB bit. When FB is 0, the periodic noise is generated;
when FB is 1, the white noise is produced.

 Two bits, known as NF0 and NF1, control the frequency at which the noise
generator works. Three of the four possible combinations of NF0 and NF1 set
an independent noise frequency based on the timer. The fourth combination
borrows the frequency from the third of the three pure voices made by the tone
generators.

 NF0  NF1       Noise Frequency

  0    0         1,193,180 / 512 = 2330
  0    1         1,193,180 / 1024 = 1165
  1    0         1,193,180 / 2048 = 583
  1    1         Borrowed from voice 3

 The noise frequency is set by sending a byte of the following format to the
TI sound chip:

 7  6  5  4  3  2  1  0

 1  .  .  .  .  .  .  .      Identifies first byte (command byte)
 .  1  1  0  .  .  .  .      Register number in TI chip (6)
 .  .  .  .  X  .  .  .      Unused, ignored; can be set to 0 or 1
 .  .  .  .  .  FB .  .      1 for white noise, 0 for periodic
 .  .  .  .  .  . NF0 NF1    2 noise frequency control bits
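
 A small sketch of decoding the noise control byte is given below. The names
are hypothetical, and the dividers are simply the ones listed in the table
above; None stands for the fourth combination, where the frequency is borrowed
from voice 3:

```python
NOISE_DIVIDERS = {0: 512, 1: 1024, 2: 2048}  # NF0 NF1 value -> divider


def decode_noise_control(byte):
    """Return (is_white_noise, divider); divider is None for 'borrow voice 3'."""
    is_white = bool(byte & 0x04)    # FB bit: 1 = white noise, 0 = periodic
    selector = byte & 0x03          # NF0 NF1
    return is_white, NOISE_DIVIDERS.get(selector)
```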


AGI SOUND FILES

 We now know enough about the PCjr's TI sound chip to discuss the AGI sound
format. The sound is stored as four separate units of data, one for each
voice. Each sound file stored in the VOL files has an 8-byte header which
contains offsets into the file. The format is as follows:

   Byte     Meaning

   0-1      Offset of first voice data.
   2-3      Offset of second voice data.
   4-5      Offset of third voice data.
   6-7      Offset of noise voice data.
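
 Reading the header might look like the sketch below. It assumes the whole
sound resource is already in memory as a bytes object and that each offset is
stored low byte first; the function name is hypothetical:

```python
import struct


def read_voice_offsets(data):
    """Return the four voice offsets stored in the header."""
    if len(data) < 8:
        raise ValueError("sound resource is shorter than its header")
    # "<4H": four unsigned 16-bit little-endian words (low byte first).
    return struct.unpack_from("<4H", data, 0)
```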

 The data starting at each voice offset is stored as 5-byte notes which give
the frequency and duration of a note played on that voice. The 5 bytes have
the following meanings:

   Byte     Meaning

   0-1      Duration (16-bit word).
   2-3      Frequency divisor in the format described in the PCjr section
            above, except that the two bytes are in the opposite order.
    4       Attenuation of the note in the format described above in the PCjr
            section.

 Note that the last three bytes were in the opposite order in version 1 of the
AGI interpreter. The above order is the reverse of the order in which the
bytes would be output to the TI sound chip.

 Each voice's data section in the SOUND resource file is usually terminated
by two consecutive 0xFF codes. Another way of checking for the end is to see
if it has reached the start of the next voice section, or in the case of the
noise voice, the end of the SOUND data.
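
 Both end-of-voice checks can be combined when reading the notes. This is a
rough sketch with hypothetical names: start is the voice's offset from the
header, and end is the offset of the next voice (or the length of the SOUND
data for the noise voice):

```python
def read_voice_notes(data, start, end):
    """Return a list of (duration, byte_2, byte_3, attenuation) tuples."""
    notes = []
    pos = start
    while pos + 5 <= end:                   # stop at the next voice section
        if data[pos] == 0xFF and data[pos + 1] == 0xFF:
            break                           # two consecutive 0xFF codes
        duration = data[pos] | (data[pos + 1] << 8)   # low byte first
        notes.append((duration, data[pos + 2], data[pos + 3], data[pos + 4]))
        pos += 5
    return notes
```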


PLAYING THE SOUNDS ON A SOUND CARD

 Writing a program to play the tunes will require four pointers which keep
track of where in each voice segment the program currently is, since all four
voices are played simultaneously. The first voice is the melody and is the
voice that is played on the PC speaker in today's PC compatibles, the other
voices being ignored. I'd imagine that other platforms such as the Amiga and
Macintosh would probably play all three tone voices.

 A program would start by reading each of the four offsets in the header. It
would then go through a loop which begins by reading the first note of each
voice section. The durations are then monitored and when each note finishes,
another note is read. Note that the notes for each voice will usually finish
at different times. The program finishes when all of the voice sections
have been entirely played. This will usually occur for each voice at the same
time, though I don't think it necessarily has to.
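
 The loop just described can be sketched without any sound hardware by working
out when each note should start. Everything here is illustrative: the function
name is hypothetical, each voice is assumed to be a list of tuples whose first
element is the duration, and times are counted in whatever unit the duration
word uses:

```python
def schedule_voices(voices):
    """Return a list of (start_tick, voice_index, note) in playback order."""
    position = [0] * len(voices)    # one "pointer" per voice
    next_tick = [0] * len(voices)   # tick at which each voice needs a new note
    events = []
    while True:
        active = [v for v in range(len(voices)) if position[v] < len(voices[v])]
        if not active:
            break                   # every voice section has been played
        # Serve whichever voice finishes its current note first.
        voice = min(active, key=lambda v: (next_tick[v], v))
        note = voices[voice][position[voice]]
        events.append((next_tick[voice], voice, note))
        next_tick[voice] += note[0]  # note[0] is the duration
        position[voice] += 1
    return events
```

 A real player would wait until each start_tick arrives and then send the note
to the sound device, rather than collecting the events in a list.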

 Then of course you could always convert the AGI SOUND to a MIDI file and
play that, which will sound a hundred times better :)
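
 For a MIDI conversion, each frequency has to be mapped to a MIDI note number.
The sketch below uses the usual equal-temperament relation in which note 69 is
A at 440 Hz; the function name is hypothetical, and rounding to the nearest
note is an assumption, since the AGI frequencies will rarely land exactly on a
MIDI pitch:

```python
import math


def frequency_to_midi_note(frequency_hz):
    """Return the nearest MIDI note number for a frequency in Hz."""
    return int(round(69 + 12 * math.log2(frequency_hz / 440.0)))
```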


CALCULATING FREQUENCIES WHEN PLAYING NOTES ON A SOUND CARD 

 My program reads in the duration as a 16-bit word. It then loads the two
following bytes (bytes 2 and 3 of the note, counting from 0) and calculates
the frequency as follows:

   Freq. = 111860 / (((Byte-2 AND 0x3F) * 16) + (Byte-3 MOD 16));

 The 111860 comes from the PCjr discussion above. Note that the bytes are in
the opposite order from that mentioned in the PCjr information.

 Remember also that the SOUND format includes volume information for each
voice. The exact conversion from the decibel values to the volume control on
today's sound cards is uncertain at this stage.
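
 One plausible starting point, offered only as an assumption and not as the
way any interpreter actually does it, is to turn the attenuation into a linear
gain with the standard amplitude formula gain = 10^(-dB / 20). The function
name below is hypothetical:

```python
def attenuation_to_gain(attenuation_bits):
    """Map the 4-bit attenuation value to a linear gain from 0.0 to 1.0."""
    if attenuation_bits == 0x0F:
        return 0.0                      # all four bits set: volume off
    decibels = 2 * attenuation_bits     # 2 dB per unit of the 4-bit value
    return 10 ** (-decibels / 20)       # standard amplitude decibel formula
```

 The resulting gain could then be scaled to whatever volume range the sound
card expects.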


-----------------------------------------------------------------------------
APPENDIX 1: SOUND FORMAT SUMMARY 

 The header consists of four two-byte offsets, one for each voice. The low
 byte is first, followed by the high byte. Each offset points to the note
 data for the relevant voice. The note data for a voice consists entirely of
 five-byte note entries of the following format:


 FIRST BYTE   \
               > Note duration (low byte and then high byte).
 SECOND BYTE  /    


 THIRD BYTE

  ---> In the case of a tone voice,

   7  6  5  4  3  2  1  0

   0  .  .  .  .  .  .  .      Always 0.
   .  X  .  .  .  .  .  .      Unused, ignored.
   .  .  F0 F1 F2 F3 F4 F5     6 of 10-bits in frequency count.


  --->  In the case of the noise voice, this byte is equal to zero.


 FOURTH BYTE

  ---> In the case of a tone voice,

   7  6  5  4  3  2  1  0

   1  .  .  .  .  .  .  .      Always 1.
   .  R0 R1 R2 .  .  .  .      Register number in TI chip (0, 2, 4).
   .  .  .  .  F6 F7 F8 F9     4 of 10-bits in frequency count.

   F = frequency = 111860 / (((Byte-3 AND 0x3F) * 16) + (Byte-4 MOD 16))
   R = register address


   ---> In the case of the noise voice,

   7  6  5  4  3  2  1  0

   1  .  .  .  .  .  .  .      Always 1.
   .  1  1  0  .  .  .  .      Register number in TI chip (6)
   .  .  .  .  X  .  .  .      Unused, ignored; can be set to 0 or 1
   .  .  .  .  .  FB .  .      1 for white noise, 0 for periodic
   .  .  .  .  .  . NF0 NF1    2 noise frequency control bits

   NF0  NF1       Noise Frequency

    0    0         1,193,180 / 512 = 2330
    0    1         1,193,180 / 1024 = 1165
    1    0         1,193,180 / 2048 = 583
    1    1         Borrowed from voice 3


 FIFTH BYTE

   7  6  5  4  3  2  1  0

   1  .  .  .  .  .  .  .      Identifies first byte (command byte)
   .  R0 R1 R2 .  .  .  .      Register number in TI chip (1, 3, 5, or 7).
   .  .  .  .  A0 A1 A2 A3     4 attenuation bits


   A0 A1 A2 A3        Value        Attenuation (decibels)

    .  .  .  1          1                    2
    .  .  1  .          2                    4
    .  1  .  .          4                    8
    1  .  .  .          8                   16
    1  1  1  1                           Volume off


 Register Addresses:

   R0 R1 R2        Parameter

    0  0  0        Voice 1 frequency control number (10 bits)
    0  0  1        Voice 1 attenuation (4 bits)
    0  1  0        Voice 2 frequency control number (10 bits)
    0  1  1        Voice 2 attenuation (4 bits)
    1  0  0        Voice 3 frequency control number (10 bits)
    1  0  1        Voice 3 attenuation (4 bits)
    1  1  0        Noise voice control (4 bits; 3 used)
    1  1  1        Noise voice attenuation (4 bits)


 The note data for one voice is terminated by two consecutive 0xFF values.


-----------------------------------------------------------------------------
APPENDIX 2: AGI v1.12 SOUND FORMAT 

 The sound format used in version 1.12 of the AGI interpreter was quite
different from the format described above for AGIv2 and AGIv3. It still uses
the PCjr format for the note data but it does not store the duration as a
separate field. The best way to describe it is by an example:

 90 80 16 B0 A0 15 D0 C0 0E FF E4 00 80 17 A0 16 C0 11 00 80 16 B1 A0 14
 C0 12 00 80 16 B2 A0 16 C0 13 00 ...

 The first thing to point out is that the PCjr note data is in the opposite
order to AGIv2. Secondly, all four parts are included together rather than in
separate sections. Taking the above example, let's look at the first note and
show the equivalent AGIv2 notation.

 90 80 16  -->     03 00 16 80 90

 Now, the duration isn't immediately obvious, but we will come to that in a
short while. The following three groups of three bytes give the first note
for the second part, the third part, and the noise part (at least as far as
this example is concerned).

 B0 A0 15  -->     03 00 15 A0 B0
 D0 C0 0E  -->     03 00 0E C0 D0
 FF E4 00  -->     33 00 00 E4 FF
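
 The conversions shown so far amount to reversing the three bytes and putting
a 16-bit duration in front. A minimal sketch of just that step follows; the
function name is hypothetical, and the duration has to be supplied by the
caller because version 1.12 does not store it with the note:

```python
def v1_triple_to_v2_note(triple, duration=3):
    """Reverse a three-byte v1.12 note and prefix a 16-bit duration."""
    attenuation, tone_first, tone_second = triple
    return bytes([
        duration & 0xFF,        # duration, low byte
        (duration >> 8) & 0xFF, # duration, high byte
        tone_second,            # byte holding the upper six divisor bits
        tone_first,             # byte holding the lower four divisor bits
        attenuation,
    ])
```

 For instance, v1_triple_to_v2_note(bytes.fromhex("908016")) gives the bytes
03 00 16 80 90 from the first conversion above.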

 The data that follows these initial four starting notes is basically any
changes in the note values at each duration step of 3. For example,

 80 17     -->     03 00 17 80 90

 Note that 0x90 doesn't need to be stored because that byte has retained its
value. Every 0x00 byte that is encountered is the end of one set of note
changes. Each set of note changes is the equivalent of a duration of 3 in the
AGIv2 format. Continuing with our example,

 A0 16     -->     03 00 16 A0 B0
 C0 11     -->     03 00 11 C0 D0 

 The example now encounters a 0x00 byte which means that the noise voice
isn't changed at this point. In fact, from the AGIv2 equivalent note above,
you will see that the noise note has a duration of 0x33 (51 decimal), so it
will not change until 17 sets of note changes, at a duration of 3 each, have
been processed.

 80 16     -->     03 00 16 80 90
 B1 A0 14  -->     03 00 14 A0 B1
 C0 12     -->     03 00 12 C0 D0

 How exactly the AGIv1.12 interpreter knows which voice is having its notes
changed, and which bytes of the note are being changed, is not yet certain.
On some occasions a set of changes will contain only one byte, which
corresponds to one of the bytes making up one of the voices' note values,
but how it knows which one is a mystery to me.

 On other occasions, there could be a whole chain of 0x00 bytes, which means
that during that whole time none of the voices are changing their note
values.

