

	BOOTSTRAPPING T3X USING S86


	1) INTRODUCTION

	Since Release 5.5, it is possible to bootstrap the T3X compiler
	without using any third-party software like TurboC/TASM/TLINK,
	or MASM/LINK.

	Currently, this option works only under the PCDOS/MSDOS
	operating systems, since the other supported systems (like *BSD
	and Plan9) already include C compilers and assemblers which can
	be used for bootstrapping.

	The only additional tool which is highly recommended to
	bootstrap the T3X compiler is GYMAKE, a free MAKE(1) utility
	for DOS. It can be retrieved from the DOS/C section of any
	SIMTEL mirror. However, any other *working* MAKE utility
	should suffice, as well. (Unfortunately, TCC 2.01's MAKE
	does NOT work -- it does not support I/O redirection.)

	Provided that a working MAKE program is present, the compiler
	can be bootstrapped automatically just by typing

		make -f Makefile.S86 boot

	Notice, however, that the major part of this process employs
	*interpreted* stages. Therefore, this process is going to need
	some time. On a 100 MHz 486-based PC (the fastest box I have),
	it may take as long as 20 minutes. My 4.77 MHz Palmtop ran
	out of batteries while building the compiler, but the whole
	process would have taken approximately 10 hours.

	If you just want to have a working compiler, there are
	simpler and faster options. You can either get the binary
	distribution kit from my home page and just install the
	precompiled binaries, or you can use the precompiled binaries
	to REBUILD the compiler using S86. No additional software
	(except for MAKE) is required for this step, either.

	If you want to see how bootstrapping a compiler works, or
	you are just interested in the gory details, the remainder
	of this document is the proper reading...


	2) HOW T3X AND S86 WORK

	The T3X compiler itself as well as the S86 development kit
	consist of several parts. The individual stages of compilation
	are normally invoked by the batch file TX.BAT. The following
	steps are necessary to create a DOS executable from a T3X
	source program:

	(1) [Optional] Preprocess the T3X source program:

		prog.t ---[TXPP]---> prog.tmp

	(2) Translate the T3X source program into a Tcode program:

		prog.tmp ---[TXTRN]---> prog.tco

	(3) [Optional] Optimize the Tcode program:

		prog.tco ---[TXOPT]---> prog.tco

	(4) Generate native code (assembly language):

		prog.tco ---[TXCG86A]---> prog.s

	(5) Assemble the code generator output:

		prog.s ---[S86B]---> prog.o

	(6) Load the program:

		prog.o + libtx86.o ---[SLD]---> prog.x

	(7) Convert the SLD load module into DOS EXE format:

		prog.x ---[CVEXE]---> prog.exe

	Since the optimizing step (3) is really optional, we can
	safely ignore it during the bootstrapping process.

	The actual TX.BAT program performs some more operations like
	evaluating compiler options, and checking for compilation
	errors, but basically, it can be reduced to the following
	set of commands:

	TXPP <%1.t >%1.tmp
	TXTRN <%1.tmp >%1.tco
	TXCG86A <%1.tco >%1.s
	S86B <%1.s >%1.o
	ECHO -o %1.x >__ldctrl
	ECHO %1.o >>__ldctrl
	ECHO libtx86.o >>__ldctrl
	SLD <__ldctrl
	CVEXE <%1.x >%1.exe
	DEL %1.tmp
	DEL %1.tco
	DEL %1.s
	DEL %1.o
	DEL %1.x
	DEL __ldctrl

	Consequently, the following programs are required for compiling
	a T3X program using DOS and S86:

	TXPP, TXTRN, TXCG86A, S86B, SLD, CVEXE
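
	The reduced command set shown above can also be viewed as a
	table of stages. The following Python sketch is meant as an
	illustration only: it is not part of the T3X kit, and the
	names STAGES and tx_commands are made up here. It prints the
	commands for a given program name, leaving out the cleanup
	(DEL) commands.

```python
# Hypothetical model of the reduced TX.BAT shown above; not part of T3X.
# Each stage reads its input from stdin and writes its output to stdout.
STAGES = [
    ("TXPP", "t", "tmp"),      # preprocess
    ("TXTRN", "tmp", "tco"),   # translate to Tcode
    ("TXCG86A", "tco", "s"),   # generate assembly language
    ("S86B", "s", "o"),        # assemble
]

def tx_commands(name):
    """Return the DOS command lines that turn NAME.t into NAME.exe."""
    cmds = [f"{tool} <{name}.{src} >{name}.{dst}" for tool, src, dst in STAGES]
    # SLD reads its arguments from the control file __ldctrl via stdin.
    cmds += [
        f"ECHO -o {name}.x >__ldctrl",
        f"ECHO {name}.o >>__ldctrl",
        "ECHO libtx86.o >>__ldctrl",
        "SLD <__ldctrl",
        f"CVEXE <{name}.x >{name}.exe",
    ]
    return cmds

if __name__ == "__main__":
    print("\n".join(tx_commands("prog")))
```

	Run with the name "prog", the sketch should print the same
	lines as the batch above with %1 replaced by prog.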


	3) REQUIREMENTS

	Since all stages of the T3X compiler as well as the entire S86
	development kit are written in T3X, some precompiled binaries
	are required for bootstrapping T3X and S86.
	The T3X/S86 bootstrapping kit contains two precompiled files:

	TXX.EXE		A Tcode interpreter compiled with TCC 2.01
	TXTRN.TCO	A Tcode image of the T3X->Tcode translator

	Using TXX and the TXTRN image, any T3X program can be compiled
	to Tcode, and the resulting Tcode image can in turn be
	interpreted by TXX. Basically, this method is used to bootstrap
	the rest of the compiler.

	At this point, a good question would be:

		Where does the TXTRN image come from?

	Of course, it has been compiled using TXX and TXTRN.TCO.
	But the *first* TXTRN image must have been generated in a
	different way, because there was no TXTRN to compile it.

	The answer is that an existing language is *always* required
	to build a new language. In the case of T3X, the first T1
	compiler was written in C in 1995. It generated DOS COM
	files and could compile itself. Since then, each T{1,2,3...}
	compiler was used to generate its successor.
	Since the introduction of T3X, the compilers generate symbolic
	code (Tcode) which can be interpreted by a virtual Tcode
	machine (TXX). TXX is the only part of the system which has
	not been coded in T3X. Therefore, a DOS executable is supplied
	with this archive.


	4) BUILDING A STAGE-0 COMPILER

	The stage-0 compiler is usually built using a different
	language -- or at least a different compiler -- than the one
	being built.

	In the case of T3X and S86, the heart of this compiler has
	already been precompiled. It consists of the programs TXX.EXE and
	the Tcode image of the T3X translator TXTRN. They are both
	located in the BUILD/ directory.

	In this phase, the rest of the stage-0 compiler is built.

	First, the preprocessor TXPP is compiled, because it is required
	to build the code generator:

	(S-1)	compiler/txpp.t ---[TXX TXTRN]---> build/txpp.tco

	Then, the code generator generator TXCGG is preprocessed,
	compiled, and applied to the S86 code generator definition
	file CG86A.DEF. This step results in the S86 code generator
	TXCG86A.

	(S-2a)	native/txcgg.t ---[TXX TXPP]---> build/txcgg.t
	(S-2b)	build/txcgg.t ---[TXX TXTRN]---> build/txcgg.tco
	(S-2c)	native/cg86a.def ---[TXX TXCGG]---> build/txcg86a.t
	.	+ native/txcg_asm_frame
	(S-2d)	build/txcg86a.t ---[TXX TXPP]---> build/txcg86a.tmp
	(S-2e)	build/txcg86a.tmp ---[TXX TXTRN]---> build/txcg86a.tco

	In the next steps, the S86kit tools S86B, SLD, and CVEXE are
	built. S86B is used instead of the faster S86, because some
	programs are too big to be assembled using S86.

	(S-3)	s86kit/s86b.t ---[TXX TXTRN]---> build/s86b.tco
	(S-4)	s86kit/sld.t ---[TXX TXTRN]---> build/sld.tco
	(S-5)	s86kit/cvexe.t ---[TXX TXTRN]---> build/cvexe.tco

	Finally, the Tcode image of S86B is used to create the
	runtime support module LIBTX86 which contains the interface
	between native T3X programs and the DOS operating system.

	(S-6)	native/libtx86.s ---[TXX S86B]---> build/libtx86.o

	At this point, the stage-0 compiler is complete and can be used
	to compile T3X programs to DOS EXE style binaries. The following
	batch would now be sufficient to compile a T3X program:

	TXX TXPP.TCO <%1.t >%1.tmp
	TXX TXTRN.TCO <%1.tmp >%1.tco
	TXX TXCG86A.TCO <%1.tco >%1.s
	TXX S86B.TCO <%1.s >%1.o
	ECHO -o %1.x >__ldctrl
	ECHO %1.o >>__ldctrl
	ECHO libtx86.o >>__ldctrl
	TXX SLD.TCO <__ldctrl
	TXX CVEXE.TCO <%1.x >%1.exe
	DEL %1.tmp
	DEL %1.tco
	DEL %1.s
	DEL %1.o
	DEL %1.x
	DEL __ldctrl

	Basically, this is what the batch file BUILD/TC.BAT does (plus
	some argument evaluation and error checking). TC.BAT is used to
	build the next stage.
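
	The only difference between this batch and the one shown in
	section 2 is that each tool is run as a Tcode image under TXX.
	The following Python sketch illustrates that rewriting step.
	It is not part of the T3X kit; the names TOOLS and interpreted
	are hypothetical.

```python
# Hypothetical helper; illustrates how the stage-0 batch relates to the
# native one: each tool is replaced by TXX running that tool's Tcode image.
TOOLS = {"TXPP", "TXTRN", "TXCG86A", "S86B", "SLD", "CVEXE"}

def interpreted(command):
    """Rewrite 'TOOL args' as 'TXX TOOL.TCO args'; leave other lines alone."""
    tool, _, rest = command.partition(" ")
    if tool.upper() not in TOOLS:
        return command          # ECHO and DEL are DOS internal commands
    return f"TXX {tool}.TCO {rest}".rstrip()

if __name__ == "__main__":
    for line in ["TXTRN <prog.tmp >prog.tco", "SLD <__ldctrl", "DEL prog.tmp"]:
        print(interpreted(line))
```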


	5) BUILDING A STAGE-1 COMPILER

	In this step, the (interpreted) stage-0 compiler is used to
	compile the programs which are used by the compiler to DOS
	native code (EXE files). The resulting stage-1 compiler
	then consists of native executables. For performance reasons,
	this version of the compiler is used to translate the
	remaining programs.

	First, the compiler frontend consisting of TXPP, TXTRN, and
	TXOPT is compiled. TXOPT is also compiled so that optimized
	executables can be generated in the following step.

	The program [TC] represents the process described at the end
	of the previous section (TC.BAT).

	(S-7)	compiler/txpp.t ---[TC]---> compiler/txpp.exe
	(S-8)	compiler/txtrn.t ---[TC]---> compiler/txtrn.exe
	(S-9)	compiler/txopt.t ---[TC]---> compiler/txopt.exe

	In the next step, executables of the meta code generator TXCGG
	and the bootstrapping code generator TXCGBOOT are created in the
	NATIVE/ directory:

	(S-10)	native/txcgg.t ---[TC]---> native/txcgg.exe
	(S-11a)	native/cg86a.def ---[TXCGG]---> native/txcg86a.tmp
	.	+ native/txcg_asm_frame
	(S-11b)	native/txcg86a.tmp ---[TC]---> native/txcgboot.exe

	In the last step at this stage, the S86 kit programs required
	for building EXE files are compiled to native code:

	(S-12)	s86kit/s86b.t ---[TC]---> s86kit/s86b.exe
	(S-13)	s86kit/sld.t ---[TC]---> s86kit/sld.exe
	(S-14)	s86kit/cvexe.t ---[TC]---> s86kit/cvexe.exe

	At this point, all programs which are required to create
	native DOS executable files from T3X source programs have been
	themselves compiled to native code.
	Notice that this was possible because interpreted versions of
	the compiler phases were used. Therefore, the only required
	executable program was TXX, the Tcode interpreter.
	Using the created executables, T3X programs can now be compiled
	much more efficiently, because
		(1) the compiler runs faster now
		(2) an optimizer is available.

	The batch program TXBOOT.BAT in the NATIVE/ directory will be
	used to compile the rest of the T3X package and to rebuild the
	(non-optimized) executables built by the stage-0 compiler.
	Basically, TXBOOT performs the following steps:

	TXPP <%1.t >%1.tmp
	TXTRN <%1.tmp >%1.tco
	TXOPT <%1.tco >%1.tmp
	TXCG86A <%1.tmp >%1.s
	S86B <%1.s >%1.o
	ECHO -o %1.x >__ldctrl
	ECHO %1.o >>__ldctrl
	ECHO libtx86.o >>__ldctrl
	SLD <__ldctrl
	CVEXE <%1.x >%1.exe
	DEL %1.tmp
	DEL %1.tco
	DEL %1.s
	DEL %1.o
	DEL %1.x
	DEL __ldctrl

	This is basically the same procedure as shown in section 2,
	but it contains an additional optimization step.


	6) BUILDING A STAGE-2 COMPILER


	6.1 THE FRONT END AND CODE GENERATORS

	Almost the entire stage-2 compiler (excluding only the assembler
	and loader) is built in the NATIVE/ directory. These files will
	be built (or rebuilt):

	TXCG386, TXCG86T, TXCG86A, TXCGC, TXTRN, TXOPT, TXOPTB, TXPP,
	UX, TXINFO

	The code generators TXCG386 (GAS-386), TXCG86T (TASM/MASM), and
	TXCGC (C) as well as the additional tools UX (Tcode disassembler)
	and TXINFO (Tcode examiner) are not really required, but they
	are also compiled during this phase.

	The batch program TXBOOT (which has been described in the
	previous section) is used to compile T3X programs at this stage.

	Additionally, the batch program MAKEGEN.BAT is used to build
	native code generators from TXCGG definition files. Basically,

	MAKEGEN -TYPE MACHINE

	performs the following tasks:

	(1) Select the appropriate code generator frame file (where
		%TYPE% must be either ASM or HLL).

		COPY TXCG_%TYPE%_FRAME TXCG_FRAME

	(2) Run TXCGG on the requested machine description
		(a corresponding CG*.DEF file must exist).

		TXCGG <CG%MACHINE%.DEF >TXCG%MACHINE%.TMP

	(3) Preprocess the resulting code generator.

		TXPP <TXCG%MACHINE%.TMP >TXCG%MACHINE%.T

	(4) Compile the code generator.

		TXBOOT TXCG%MACHINE%.T

	(5) Do some cleanup.

		DEL TXCG_FRAME
		DEL TXCG%MACHINE%.TMP
		DEL TXCG%MACHINE%.T
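
	The tasks listed above may be summarized in a short Python
	sketch. It is an illustration only: the function name
	makegen_commands is hypothetical, and the real MAKEGEN.BAT
	may differ in detail (argument parsing, error checks).

```python
# Hypothetical model of MAKEGEN.BAT as described above; not part of T3X.
def makegen_commands(frame_type, machine):
    """Return the commands MAKEGEN runs, e.g. for ('ASM', '86A')."""
    if frame_type not in ("ASM", "HLL"):
        raise ValueError("frame type must be ASM or HLL")
    gen = f"TXCG{machine}"
    return [
        f"COPY TXCG_{frame_type}_FRAME TXCG_FRAME",   # (1) select the frame
        f"TXCGG <CG{machine}.DEF >{gen}.TMP",         # (2) run TXCGG
        f"TXPP <{gen}.TMP >{gen}.T",                  # (3) preprocess
        f"TXBOOT {gen}.T",                            # (4) compile, as listed
        "DEL TXCG_FRAME",                             # (5) clean up
        f"DEL {gen}.TMP",
        f"DEL {gen}.T",
    ]

if __name__ == "__main__":
    print("\n".join(makegen_commands("ASM", "86A")))
```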

	In detail, the first phase of the stage-2 build process
	includes these steps:

	(S-15)	native/cg386.def ---[MAKEGEN]---> native/txcg386.exe
	.	+ txcg_asm_frame
	(S-16)	native/cg86t.def ---[MAKEGEN]---> native/txcg86t.exe
	.	+ txcg_asm_frame
	(S-17)	native/cg86a.def ---[MAKEGEN]---> native/txcg86a.exe
	.	+ txcg_asm_frame
	(S-18)	native/cgc.def ---[MAKEGEN]---> native/txcgc.exe
	.	+ txcg_hll_frame

	(S-19)	compiler/txtrn.t ---[TXBOOT]---> native/txtrn.exe
	(S-20)	compiler/txopt.t ---[TXBOOT]---> native/txopt.exe
	(S-21)	compiler/txpp.t ---[TXBOOT]---> native/txpp.exe

	(S-22)	compiler/ux.t ---[TXBOOT]---> native/ux.exe
	(S-23)	compiler/txinfo.t ---[TXBOOT]---> native/txinfo.exe

	The code generator generator TXCGG and the initial code
	generator TXCGBOOT are not rebuilt, because they are no longer
	required after building the compiler.


	6.2 THE S86 KIT

	In the final step, stage-2 versions of the assembler S86B, the
	loader SLD, and the EXE format converter CVEXE are created.
	Additionally, the rest of the S86 development kit is also
	built at this point. TXBOOT is used to compile each program.

	This phase begins with the creation of the not yet compiled
	programs contained in the S86 kit.

	(S-24)	s86kit/cvimg.t ---[TXBOOT]---> s86kit/cvimg.exe
	(S-25)	s86kit/hd.t ---[TXBOOT]---> s86kit/hd.exe
	(S-26)	s86kit/rmsym.t ---[TXBOOT]---> s86kit/rmsym.exe
	(S-27)	s86kit/s86.t ---[TXBOOT]---> s86kit/s86.exe
	(S-28)	s86kit/snm.t ---[TXBOOT]---> s86kit/snm.exe
	(S-29)	s86kit/sz.t ---[TXBOOT]---> s86kit/sz.exe
	(S-30)	s86kit/xo.t ---[TXBOOT]---> s86kit/xo.exe

	Then, the extended version of the T3X runtime support module
	is built. In fact, there are two modules: LIBTX86 which holds
	the basic RT support routines, and LIBVIO which contains the
	terminal I/O services.

	(S-31)	s86kit/libtx86.s ---[S86]---> s86kit/libtx86.o
	(S-32)	s86kit/libvio.s ---[S86]---> s86kit/libvio.o

	The last step rebuilds the two-pass assembler S86B, the loader
	SLD, and the EXE converter CVEXE. Compiling the first two
	programs works as usual:

	(S-33)	s86kit/s86b.t ---[TXBOOT]---> s86kit/s86b.exe
	(S-34)	s86kit/sld.t ---[TXBOOT]---> s86kit/sld.exe

	Recompiling CVEXE cannot be done using TXBOOT, because the
	last command in TXBOOT would be

	CVEXE <CVEXE.X >CVEXE.EXE

	in this case, and CVEXE.EXE would get overwritten by the output
	redirection before it could be invoked. Therefore, the source
	file must be copied to a temporary name before compiling it:

	(S-35a)	copy cvexe.t _.t
	(S-35b)	txboot _
	(S-35c)	del _.t
	(S-35d)	del cvexe.exe
	(S-35e)	ren _.exe cvexe.exe

	This is a situation which occurs frequently when bootstrapping
	compilers, and therefore, it is shown in detail here.
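
	The rule behind the steps (S-35a) to (S-35e) can be sketched
	in Python as follows. This is an illustration only; the
	function rebuild_commands and its parameter exe_writer are
	hypothetical and not part of the T3X kit.

```python
# Hypothetical sketch of the rename trick in (S-35a)-(S-35e); not part of T3X.
def rebuild_commands(name, exe_writer="cvexe"):
    """Return commands that rebuild NAME with TXBOOT.

    EXE_WRITER is the tool whose output is redirected into NAME.exe in
    the last TXBOOT step. Rebuilding that tool under its own name would
    overwrite its executable before it could be invoked.
    """
    if name.lower() != exe_writer:
        return [f"txboot {name}"]
    return [
        f"copy {name}.t _.t",      # compile under a temporary name
        "txboot _",
        "del _.t",
        f"del {name}.exe",         # the old converter is no longer needed
        f"ren _.exe {name}.exe",
    ]

if __name__ == "__main__":
    for tool in ("s86b", "sld", "cvexe"):
        print("\n".join(rebuild_commands(tool)))
```

	S86B and SLD need no such treatment, because TXBOOT never
	redirects output into S86B.EXE or SLD.EXE while those programs
	are running; only the final CVEXE step writes the EXE file.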

	After building the stage-2 version of the S86 kit, the entire
	compiler has been translated into EXE-style native code.

	The bootstrapping process is completed.


	7) TESTING THE COMPILER

	A very basic test for the newly built compiler is the so called
	TRIPLE TEST which works as follows:

	(1)	A stage-0 compiler is built. (Any language may be
	.	used to build this compiler.)
	.	------------------------------------------------
	(2)	The stage-0 compiler is used to create a stage-1
	.	compiler. The stage-1 compiler is written in the
	.	source language of the compiler being tested.
	.	------------------------------------------------
	(3)	The resulting stage-1 compiler is used to create a
	.	stage-2 compiler from the same source code. The
	.	compiler 'compiles itself'.
	.	------------------------------------------------
	(4)	Since the stage-1 and stage-2 compiler have been
	.	built from the same source code, their output
	.	must be identical.

	If step (4) is passed, all further steps in the style of (3)
	would yield the same output, because each resulting compiler
	would be equal to the one generated at the previous stage.

	The test is called 'triple test', because the compiler must
	be compiled three times when the first compiler is a stage-0
	compiler. This is because the code generated by the stage-0
	compiler might differ from the code generated by the
	compiler at later stages -- because it is coded in a different
	language. Frequently, the stage-0 compiler implements only a
	subset of the source language, because its only purpose is
	to create the stage-1 version.
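
	The idea can be illustrated with a toy model in Python. All
	names below are made up, and the model is a deliberate
	simplification: a "binary" is just a pair consisting of the
	style of code it is made of and the source text it was built
	from, and the style a compiler emits is determined by its own
	source text alone.

```python
# Toy model of the triple test. All names are made up for illustration.
# A binary is a pair (style, source): STYLE is the kind of code the
# compiler that built it emits, SOURCE is the text it was built from.
# What a binary emits when run as a compiler depends on its source only.

T3X_SOURCE = "emits=t3x-style"          # the compiler being tested
STAGE0 = ("n/a", "emits=c-style")       # coded in a different language

def build(compiler, source):
    """Run COMPILER on SOURCE and return the resulting binary."""
    _built_with, compiler_source = compiler
    style = compiler_source.split("=", 1)[1]
    return (style, source)

def triple_test(stage0, source):
    """Return True if stage-1 and stage-2 produce identical output."""
    stage1 = build(stage0, source)      # step (2)
    stage2 = build(stage1, source)      # step (3)
    return build(stage1, source) == build(stage2, source)   # step (4)

if __name__ == "__main__":
    stage1 = build(STAGE0, T3X_SOURCE)
    stage2 = build(stage1, T3X_SOURCE)
    print(stage1 == stage2)             # the binaries themselves differ
    print(triple_test(STAGE0, T3X_SOURCE))
```

	In this model, the stage-1 binary differs from the stage-2
	binary, because it was emitted by the stage-0 compiler, but
	both translate the same source into the same output.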

	The Makefile.S86 in the NATIVE/ directory contains a DIFF
	target which performs the steps (2-4) of the triple test.
	Notice, though, that this test compares only the translator
	TXTRN.

	NOTICE: The triple test for simple compilers like T3X is a
	negative test: When it is passed, this is no indicator of the
	quality of the compiler. When it fails, however, something
	went horribly wrong.

	To run a more complete test, follow these steps:

	(1) after bootstrapping T3X, install the binaries.

	(2) clean the distribution using 'make clobber' and rebuild
	    the kit using the installed binaries (make all).

	(3) install the kit again, but this time in a different place.

	(4) compare the entire directories.
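
	Step (4) can be done with any directory comparison tool.
	Where Python is available (an assumption; it is not part of
	the T3X kit), a byte-for-byte comparison of two installation
	directories might be sketched as follows. The function name
	trees_identical is hypothetical.

```python
import filecmp
import os

def trees_identical(a, b):
    """Return True if directories A and B hold the same files, byte for byte."""
    cmp = filecmp.dircmp(a, b)
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # Compare file contents, not just the os.stat() signatures.
    _, mismatch, errors = filecmp.cmpfiles(a, b, cmp.common_files,
                                           shallow=False)
    if mismatch or errors:
        return False
    return all(trees_identical(os.path.join(a, d), os.path.join(b, d))
               for d in cmp.common_dirs)
```

	The function is called with the paths of the two installed
	kits; the paths depend on where the kits were installed.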

