Draft tutorial: Building programs

Introduction: compiled languages

Languages like Fortran, C, C++ and Java, to name but a few, share certain characteristics: you write code in your language of choice, but then you have to build an executable program from that source code. Other languages are interpreted - the source code is analysed by a special program and taken as direct instructions. Two very simple examples of that type of language are Windows batch files and Linux shell scripts.

In this tutorial we concentrate on the first type of language, with Fortran as the main example. One advantage of compiled languages is that the build process transforms the human-readable source code into an efficient program that can be run on the computer.

Let us have a look at a simple example:

    program hello
        write(*,*) 'Hello!'
    end program hello

This is just about the simplest program you can write in Fortran and it is certainly a variation on one of the most famous programs. Even though it is simple to express in source code, a lot of things actually happen when the executable that is built from this code runs:

* A process is started on the computer in such a way that it can write to the console - the window (DOS-box, xterm, ...) at which you type the program's name.
* It writes the text "Hello!" to the console. To do so it must properly interact with the console.
* When done, it finishes, cleaning up all the resources (memory, connection to the console etc.) it took.

Fortunately, as a programmer in a high-level language you do not need to consider all these details. In fact, this is the sort of thing that is taken care of by the build process: the compiler and the linker.

Compiling the source code

The first step in the build process is to compile the source code. The output from this step is generally known as the object code - a set of instructions for the computer generated from the human-readable source code.
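As a rough picture of the two stages, the build of the hello program can be sketched as a short shell session. This is a minimal sketch under assumptions that are not part of the text so far: the gfortran compiler is installed and on the PATH, and the file names "hello.f90" and "hello" are chosen purely for illustration.

```shell
# Assumption: gfortran is installed; all file names here are illustrative.

# Store the example program in a source file.
cat > hello.f90 <<'EOF'
program hello
    write(*,*) 'Hello!'
end program hello
EOF

# Stage 1 - compile only: translates the source code into the object file hello.o
gfortran -c hello.f90
ls -l hello.o

# Stage 2 - link: combines hello.o with the run-time libraries into an executable
gfortran -o hello hello.o

# Run the result
./hello
```

The object file on its own cannot be run; only the link stage turns it into a program. On Windows the executable gets the extension ".exe".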
Different compilers will produce different object codes from the same source code and the naming conventions are different. The consequences:

* If you use a particular compiler for one source file, you need to use the same compiler (or a compatible one) for all other pieces. After all, a program may be built from many different source files and the compiled pieces have to cooperate.
* Each source file will be compiled and the result is stored in a file with an extension like ".o" or ".obj". It is these object files that are the input for the next step: the link process.

Compilers are complex pieces of software: they have to understand the language in much more detail and depth than the average programmer. They also need to understand the inner workings of the computer. And then, over the years, they have been extended with numerous options to customise the compilation process and the final program that will be built.

But the basics are simple enough. Take the gfortran compiler, part of the GNU compiler collection. To compile a simple program like the one above, which consists of one source file, you run the following command:

    $ gfortran -c hello.f90

(assuming the source code is stored in the file "hello.f90"). This results in a file "hello.o" (as the gfortran compiler uses ".o" as the extension for the object files).

The option "-c" means: only compile the source files. If you were to leave it out, then the default action of the compiler is to compile the source file and start the linker to build the actual executable program. The command:

    $ gfortran hello.f90

results in an executable file, "a.out" (on Linux) or "a.exe" (on Windows).

Some remarks:

* The compiler may complain about the contents of the source file if it finds something wrong with it - a typo, for instance, or an unknown keyword. In that case the compilation process is broken off and you will not get an object file or an executable program.
  For instance, if the word "program" was inadvertently typed as "prgoram":

      $ gfortran hello3.f90
      hello.f90:1:0:

          1 | prgoram hello
            |
      Error: Unclassifiable statement at (1)
      hello3.f90:3:17:

          3 | end program hello
            |                 1
      Error: Syntax error in END PROGRAM statement at (1)
      f951: Error: Unexpected end of file in ‘hello.f90’

  Using this compilation report you can correct the source code and try again.
* The step without "-c" can only succeed if the source file contains a main program - characterised by the "program" statement in Fortran. Otherwise the link step will complain about a missing "symbol":

      $ gfortran hello2.f90
      /usr/lib/gcc/x86_64-pc-cygwin/9.3.0/../../../../x86_64-pc-cygwin/bin/ld: /usr/lib/gcc/x86_64-pc-cygwin/9.3.0/../../../../lib/libcygwin.a(libcmain.o): in function `main':
      /usr/src/debug/cygwin-3.1.4-1/winsup/cygwin/lib/libcmain.c:37: undefined reference to `WinMain'
      /usr/src/debug/cygwin-3.1.4-1/winsup/cygwin/lib/libcmain.c:37:(.text.startup+0x7f): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `WinMain'
      collect2: error: ld returned 1 exit status

  The file "hello2.f90" is almost the same as the file "hello.f90", except that the keyword "program" has been replaced by "subroutine".

The output shown in the examples above will differ per compiler and per platform on which it runs. These examples come from the gfortran compiler running in a Cygwin environment on Windows.

Compilers also differ in the options they support, but in general they offer:

* Options for optimising the code - resulting in faster programs or smaller memory footprints.
* Options for checking the source code - for instance, checks that a variable is not used before it has been given a value, or checks whether some extension to the language is used.
* Options for the location of include or module files, see below.
* Options for debugging.

Linking the pieces

Almost all programs, except for the simplest, are built up from different pieces.
We are going to examine such a situation in more detail. Here is a general program for tabulating a function (source code in "tabulate.f90"):

    program tabulate
        use function

        implicit none
        real    :: x, xbegin, xend
        integer :: i, steps

        write(*,*) 'Please enter the range (begin, end) and the number of steps:'
        read(*,*)  xbegin, xend, steps

        do i = 0,steps
            x = xbegin + i * (xend - xbegin) / steps
            write(*,'(2f10.4)') x, f(x)
        enddo
    end program tabulate

Note the use statement - it refers to the module in which we define the function f.

We want to make the program general, so we keep the specific source code - the implementation of the function f - separated from the general source code. There are several ways to achieve this, but one is to put it in a different source file. We can give the general program to a user and they provide the specific source code.

Assume for the sake of the example that the function is implemented in a source file "function.f90" as:

    module function
        implicit none
    contains

    real function f( x )
        real, intent(in) :: x

        f = x - x**2 + sin(x)
    end function f

    end module function

To build the program with this specific function, we need to compile two source files and combine them via the link step into one executable program. Because the program "tabulate" depends on the module "function", we need to compile the source file containing our module first. A sequence of commands to do this is:

    $ gfortran -c function.f90
    $ gfortran tabulate.f90 function.o

The first step compiles the module, resulting in an object file "function.o" and a module intermediate file, "function.mod". This module file contains all the information the compiler needs to determine that the function f is defined in this module and what its interface is. This information is crucial: it enables the compiler to check that you call the function in the right way. It might be that you made a mistake and called the function with two arguments instead of one.
If the compiler does not know anything about the function's interface, then it cannot check anything.

The second step invokes the compiler in such a way that:

* it compiles the file "tabulate.f90" (using the module file);
* it combines the object files tabulate.o and function.o into an executable program - with the default name "a.out" or "a.exe" (if you want a different name, use the option "-o").

What you do not see in general is that the linker also adds a number of extra files in this link step, the run-time libraries. These run-time libraries contain all the "standard" stuff - low-level routines that do the input and output to screen, the sine function and much more.

If you want to see the gory details, add the option "-v". This instructs the compiler to report in detail all the steps that it takes.

The end result, the executable program, contains the compiled source code and various auxiliary routines that make it work. It also contains references to so-called dynamic run-time libraries (in Windows: DLLs, in Linux: shared objects or shared libraries). Without these run-time libraries the program will not start.

Run-time libraries

To illustrate that even a simple program depends on external run-time libraries, here is the output from the "ldd" utility that reports such dependencies:

    $ ldd tabulate.exe
            ntdll.dll => /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll (0x7ff88f2b0000)
            KERNEL32.DLL => /cygdrive/c/WINDOWS/System32/KERNEL32.DLL (0x7ff88e450000)
            KERNELBASE.dll => /cygdrive/c/WINDOWS/System32/KERNELBASE.dll (0x7ff88b9e0000)
            cygwin1.dll => /usr/bin/cygwin1.dll (0x180040000)
            cyggfortran-5.dll => /usr/bin/cyggfortran-5.dll (0x3efd20000)
            cygquadmath-0.dll => /usr/bin/cygquadmath-0.dll (0x3ee0b0000)
            cyggcc_s-seh-1.dll => /usr/bin/cyggcc_s-seh-1.dll (0x3f7000000)

...

To continue ...

* Include files and modules
* Managing libraries (static and dynamic libraries)
* Build tools