While the implementation of HyperCode will differ for each programming language, the basic notion of tightly interlinked code is universal among programming languages. The methods we used to gather the information necessary for creating these links in C code are applicable to many languages.
The fundamental benefit HyperCode gives the user is the explication of program structures. In order to generate HyperCode from raw program source code, the implicit relationships between function, variable, data type, and preprocessor macro definitions must be determined. For a given program, these relationships are implied by the collection of source files and compilation directives which produce an executable form of the program; only their conjunction provides the context necessary to generate HyperCode's links. Since it is generally only at compile-time that these various elements are brought together, we have chosen the compiler itself as our preferred point from which to generate HyperCode.
In general, certain information must be determined about the contents of each individual source file in order to generate HyperCode from the entire collection. We have modified the GNU C Compiler, gcc, to collect information about the nature and exact line and column positions of function, data type, variable, and macro identifiers within each file, and then to use this aggregate information to generate HyperCode. The gcc program itself is just a wrapper around a group of programs: to produce an executable program, it runs several distinct programs to compile each individual file. We collect different parts of the needed program structure information at different stages.
To begin with, each C source file is compiled to an object file. Before actual compilation, each C source file is processed by the C-preprocessor, cpp. Since cpp is responsible for handling macros, we have modified it to output macro definition and expansion information to an auxiliary file.
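One plausible shape for such an auxiliary record is sketched below. The record layout, the field names, and the function format_macro_record are illustrative assumptions, not the format that the modified cpp actually emits; the sketch only shows that a definition or expansion event, together with its file, line, and column, fits in one line of text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical event kinds; the real auxiliary file format is not specified. */
enum macro_event { MACRO_DEFINE, MACRO_EXPAND };

/* Hypothetical record: one macro definition or expansion site. */
struct macro_record {
    enum macro_event kind;
    const char *name;   /* macro identifier */
    const char *file;   /* source file containing the site */
    int line;           /* 1-based line of the identifier */
    int column;         /* 1-based column of the identifier */
};

/* Format one record as a tab-separated line in buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the record does not fit or formatting fails. */
int format_macro_record(const struct macro_record *r, char *buf, size_t size)
{
    assert(r != NULL && buf != NULL);
    int n = snprintf(buf, size, "%s\t%s\t%s\t%d\t%d\n",
                     r->kind == MACRO_DEFINE ? "define" : "expand",
                     r->name, r->file, r->line, r->column);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A preprocessor instrumented along these lines would call such a routine once per #define and once per expansion, appending each line to the auxiliary file for the source file being processed.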
After preprocessing, each C source file is compiled by cc1, the C language compiler portion of gcc. Because cc1 fully parses the program source code, we were able to modify it to output function, data type, and variable use and definition information to another auxiliary file.
Once each individual source file has been processed, the information gathered about each file must be collected and cross-referenced to discover the program structures that enable HyperCode's links. We have chosen to perform this collection and cross-referencing at the link stage of compilation, when the names of all source files are presented simultaneously to the linker to produce an executable program. Rather than modify the linker itself, however, we modified the gcc wrapper executable to run the actual HyperCode-generating program, hype, with the list of files given to the linker. By maintaining this parallelism with the linker's invocation, HyperCode is guaranteed to generate links which correspond to the actual linking of the executable program.
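The cross-referencing step can be illustrated with a small sketch. It assumes a hypothetical id_record structure holding one identifier definition or use, and a hypothetical function find_definition that resolves a use to its definition by kind and name. This is a deliberate simplification: it uses a linear search and ignores scoping and static linkage, both of which a real implementation would have to respect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifier kinds recorded during compilation. */
enum id_kind { ID_FUNCTION, ID_TYPE, ID_VARIABLE };

/* Hypothetical record: one definition or use of an identifier. */
struct id_record {
    enum id_kind kind;
    int is_definition;  /* nonzero for a definition, zero for a use */
    const char *name;
    const char *file;
    int line;
    int column;
};

/* Return the first definition in recs[0..n) whose kind and name match
 * the given use, or NULL if none exists. Scoping is ignored here. */
const struct id_record *find_definition(const struct id_record *recs,
                                        size_t n,
                                        const struct id_record *use)
{
    assert(use != NULL && (recs != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (recs[i].is_definition &&
            recs[i].kind == use->kind &&
            strcmp(recs[i].name, use->name) == 0)
            return &recs[i];
    }
    return NULL;
}
```

Because the records from every file named on the link line are searched together, a use in one file can resolve to a definition in another, which is the property the link-stage placement is meant to provide.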
hype collects the function, data type, and variable information across all program files. It then applies this information, in conjunction with the per-file macro information, to generate a fully linked HyperCode presentation of the source code; this presentation takes the form of three HTML files per program source file, plus an additional top-level file containing links to each source file and the location of the main function.
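To make the final step concrete, the sketch below formats one hyperlink from a use of an identifier to its definition. The function format_link, the "<file>.html" naming, and the "#L<line>" anchor scheme are assumptions made for illustration only; they are not the naming used by hype for its three HTML files per source file.

```c
#include <assert.h>
#include <stdio.h>

/* Write an HTML anchor that links an identifier to its definition.
 * The target is the hypothetical page "<def_file>.html" with the
 * hypothetical anchor "L<def_line>". C identifiers contain no HTML
 * special characters, so name needs no escaping; def_file is assumed
 * to be free of such characters as well.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the link does not fit or formatting fails. */
int format_link(char *buf, size_t size, const char *name,
                const char *def_file, int def_line)
{
    assert(buf != NULL && name != NULL && def_file != NULL);
    int n = snprintf(buf, size, "<a href=\"%s.html#L%d\">%s</a>",
                     def_file, def_line, name);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A generator built this way would copy the source text through unchanged and substitute such an anchor wherever the collected records report an identifier use at the current line and column.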
The HyperCode generation process is complicated by a number of C-specific issues. In particular, the #if and #ifdef preprocessor constructs can drastically change the semantics of a piece of code depending on whether their conditions hold at compile-time; accurate HyperCode representation of code involving these constructs is an open problem. Although we will not discuss this or other issues further here, the authors are available for discussion by email.
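A small constructed example shows why conditional compilation is awkward for link generation. All names here (USE_TABLE, SQUARE, squares, square, sum_of_squares) are hypothetical. Depending on whether USE_TABLE is defined, the single use of SQUARE should link to a macro that refers either to an array or to a function, and only one of those definitions exists in any given compilation.

```c
#include <assert.h>

#ifdef USE_TABLE
/* Variant A: SQUARE refers to a variable (a lookup table). */
static const int squares[4] = { 0, 1, 4, 9 };
#define SQUARE(i) (squares[(i)])
#else
/* Variant B: SQUARE refers to a function. */
static int square(int i) { return i * i; }
#define SQUARE(i) (square(i))
#endif

/* Sum the squares of 0..n-1. The argument is limited to the table's
 * range so that both variants compute the same result. */
int sum_of_squares(int n)
{
    assert(n >= 0 && n <= 4);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += SQUARE(i);   /* link target depends on USE_TABLE */
    return total;
}
```

A generator driven by a single compilation sees only one of the two branches, so the links it produces for SQUARE are accurate for that build configuration and say nothing about the other.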