Getting Started with GAUSS


What is GAUSS?

GAUSS is a programming language designed to operate with and on matrices. It is a general-purpose tool. As such, it is a long way from more specialised econometric packages. On a spectrum that runs from the computer language C at one end to, say, the menu-driven econometric program MicroFit at the other, GAUSS is very much at the programming end.

Using GAUSS thus calls for a very different approach to other packages. Although a number of econometric add-ons have been written (for example, ML-GAUSS, a suite of maximum likelihood applications), you will rarely be able to "turn up and go" with GAUSS. More often than not, getting useful results from GAUSS requires thought, a systematic approach, and usually a little time.

Having said that, the thought required is often no more than recognition of what precisely you are trying to achieve. The GAUSS operators and the standard library functions are designed to work with matrices. This means that if you can write down the operations you want to perform, the chances are they can be translated directly into a line in your program. The statement "b = (X'X)^-1 X'y" is acceptable to GAUSS with only minor changes.
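As a minimal sketch of that translation, the fragment below computes least-squares coefficients for a small made-up data set. The variable names x, y and b and the numbers are illustrative assumptions, not part of the course material; INV is the standard GAUSS matrix inverse, and GAUSS reads x'x as the transpose of x multiplied by x.

```gauss
/* Illustrative data: five observations generated from y = 1 + 2*x2 */
LET x[5,2] = 1 1
             1 2
             1 3
             1 4
             1 5;               /* first column is the constant term */
LET y = 3 5 7 9 11;             /* 5x1 column vector */

b = INV(x'x)*x'y;               /* b = (X'X)^-1 X'y */
PRINT b;                        /* coefficients close to 1 and 2 */
```

The only changes from the written formula are the explicit INV call and the multiplication sign.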

GAUSS is not case sensitive. However, throughout this page, capitals will be used for 'reserved words' and standard GAUSS functions. The names of all variables are lower case, with capital letters separating words. Procedures will be identified by an initial capital. All this makes no difference to GAUSS; it just makes life easier. Italics will be used to indicate a value to be substituted.

Where a constant is mentioned, this means an actual number or character set. Values are the results of some operation. A value may be a constant, but not every value is a constant: the result of an operation such as a*b cannot be used where a constant is required. Constant-list and value-list are lists of constants or values, separated by spaces or punctuation marks. The type of separator may affect the result of the operation.

Examples

    LET         GAUSS reserved word
    DELIF       GAUSS standard procedure
    Process     user-defined procedure
    FindFile    user-defined procedure
    mat1        variable
    fileName    variable

constants

    A
    "a"
    27
    "ok"
    -0.0062
    5.3E+2 (530 in scientific notation)

Invalid constants

    a*b
    c-27

constant-lists

    a b c d e
    a, b, c,
    "a", "b", "c"
    1,2,3,4.5,6.7,8
    1 2 3 4.5 6.7 "hello" 8

values

    a
    "a"
    a*b
    b+a
    "ok"
    5.3*10^2

value-lists

    a*b, b*c, c*a
    a*b 25 b*c "hello" c*a

Note that, when constants are expected, a string constant (a piece of text) may or may not be enclosed in quotation marks. It makes no difference to GAUSS, other than to make errors more likely. By contrast, when a value is expected, a string without quotation marks will be treated as a variable, the current value of which is to be used. To try to avoid this confusion, this course book will place string constants in quotation marks; strings with no quotation marks will be variables.

Layout and Syntax

GAUSS could be described as a free-form structured language: structured because GAUSS is designed to be broken down into easily-read chunks; free-form because there is no particular layout for programs. Although the syntax is closely defined, extra spaces between words (including line breaks) are ignored. Commands are separated by a semicolon, rather than having one command on each line as in FORTRAN or BASIC. A complete instruction is identified by the placing of semicolons, not by the placing of commands on different lines. Program layout is generally a matter of supreme indifference to GAUSS, and this gives the user freedom to lay out code in a style he finds acceptable.

For example, the conditional branching operation IF could be written

IF condition; action1; ELSE; action2; ENDIF;

Equally acceptable to GAUSS would be

IF condition;
    action1;
ELSE;
    action2;
ENDIF;

or

IF condition; action1;
ELSE; action2; ENDIF;

or

IF condition;
action1;
ELSE;
action2;
ENDIF;

There are some exceptions to the rule that layout does not matter. Obviously, there cannot be extraneous spaces within words or numbers: 'I F', 'var 1' and '27 000' are not the same as 'IF', 'var1' and '27000'. In more recent versions of GAUSS (3.2 and above) spaces within mathematical expressions are not allowed in certain places, although this does not seem to be consistently enforced.

The other place (in this course) where spacing is important is in comments:

/* This is a comment */

Anything within the /*...*/ markers is ignored by the program. However, there must not be a space between the slash and the asterisk, or the program will not recognize a comment marker and will erroneously try to analyze the contents of the comment block.

The Editor and the Command Line

GAUSS, in common with many other programs, will take instructions either from a file or from the command line. From the command line, as each instruction is typed in, it is executed. A semicolon is not necessary at the end of each line. Alternatively, giving GAUSS the command

RUN fileName

will execute all the instructions in the file fileName in sequence. The results are, in theory, identical, whether the commands are in a file or typed in one at a time. The choice of when to work at the command line and when to place instructions in a file depends on the problem at hand; however, for more than a couple of lines of code, working in a file is usually easier.

The command line actually uses the file editor when taking instructions from the user. The file editor is a full screen editor: the arrow keys are employed to move up, down, left and right. PageUp and PageDown move around the file one screen at a time. If Home is pressed once, the cursor moves to the start of the line; twice, it moves to the top of the screen; three times, the start of the file. End works just the same going forwards through the file. Delete and BackSpace work as normal. ALT-X (pressing the ALT and "x" keys at the same time) exits the editor, with the option to Write&quit or just Quit.

There are a couple of curious keys used by GAUSS. The grey "+" and "-" keys copy and cut, respectively, a line of text - so do not use the numeric keypad for entering calculations. The Insert key (sometimes labelled Ins) reverses this, inserting the last line cut or copied. ALT-L selects a block, so groups of lines can be cut or copied and then inserted. Only one block is kept in the delete buffer at one time, so deleting one line and then another means that the first is lost for good, whereas the second can be recovered repeatedly.

Four other useful key combinations are: ALT-I toggles between insertion and overwrite modes; ALT-R reads another file into the currently edited one; ALT-G means "go to line number...", prompting for a number; and ALT-H brings up the Help screen.

Variables

GAUSS variables are of two types: matrices and strings. Matrices obviously include vectors (row and column) and scalars as sub-types, but these are all treated the same by GAUSS. For example

a = b + c;

is valid whether a, b, and c are scalars, vectors, or matrices, assuming the variables are conformable. However, the results of the operation might be slightly different depending on the variable type.
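A small sketch of how the same statement behaves for different shapes. The names b, c1, c2, a1 and a2 and the numbers are made up for illustration.

```gauss
LET b[2,2] = 1 2 3 4;           /* 2x2 matrix: 1 2 / 3 4 */
c1 = 10;                        /* scalar */
LET c2[2,2] = 10 20 30 40;      /* 2x2 matrix */

a1 = b + c1;                    /* scalar is added to every element: 11 12 / 13 14 */
a2 = b + c2;                    /* element-by-element sum: 11 22 / 33 44 */

PRINT a1;
PRINT a2;
```

The statement has the same form in both cases; only the shapes of the operands decide what is computed.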

Matrices may contain numerical data or character data or both. Numerical data are stored in scientific notation to around 12 places of precision with a range of about 10^±35. Character data are sequences of up to eight characters that count as one element of the matrix. If you enter text of more than eight characters into the cells in a matrix, the text will be truncated.

Strings are pieces of text of unlimited length. These are used to give information to the user. If you try to assign a string value to an element of the matrix, all but the first eight characters will be lost.

Examples of data types

Numerical matrix 4x3

    1             2.2           -3
    6.29*10^-6    5             7
    9             99            100
    1000          -5.3*10^20    4

Character matrix 2x3

Will

Will

Harry

Steve

Harry

Dick

John

HarryIII

 

Mixed matrix 5x3

    Edinburg    40    EH
    Glasgow     25    G
    Heriot-W    43    EH
    Stirling    0     FK
    Strathcl    23    G

Strings

"Hello Mum!"

"Strings are pieces of text of unlimited length"

"2.2"

""

Note the truncation of text in the character and mixed matrices. The null string "" is a valid piece of text for both strings and matrices.

Because GAUSS treats all matrix data the same, GAUSS sometimes must be told that it is dealing with character data. The "$" sign identifies text and is used in a number of places. For example, to display the value of the variable "v1" requires

PRINT v1;

PRINT $v1;

PRINT v1; or PRINT $v1;

depending on whether v1 is a numerical matrix, a character matrix, or a string. Strings are identified by GAUSS and don’t need the $. You can put one in if you like but it makes no difference to printing.
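The three cases can be sketched as follows. The variable names and contents are illustrative only.

```gauss
LET numVec = 1.5 2.5 3.5;              /* numerical column vector */
PRINT numVec;                          /* no $ needed */

LET chrVec = "Will" "Harry" "Dick";    /* character column vector */
PRINT $chrVec;                         /* $ tells PRINT the elements are text */

str1 = "Hello Mum!";                   /* string */
PRINT str1;                            /* strings are recognised automatically */
```

Printing chrVec without the $ would display the text elements as if they were numbers, which is rarely what is wanted.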

All variables must be created and given an initial value before they are referenced; that is, a named memory location is reserved. Acceptable names for variables are up to eight characters long, can contain alphanumeric data and the underscore "_", and must not begin with a number. Reserved words may not be used; standard procedure names may be reassigned, but this is not generally a good idea.

Acceptable variable names:

eric

Eric

eric1

eric_1

_eric1

_e_r_i_c

Unacceptable variable names:

1eric

100

if (reserved word)

DELIF (legal, but foolish)

 

Creating matrices

New matrices can be defined at any point. The easiest way is to assign a value to one. There are two ways to do this - by assigning a constant value or by assigning the result of some operation.

LET creates matrices. The format for creating a matrix called varName is

LET varName = constant-list;

LET varName[r,c] = constant-list;

In the first case, the type of matrix depends on how the constants were specified. A list of constants separated by spaces will create a column vector. If, however, the list of constants is enclosed in braces {}, then a row vector will be produced. When braces are used, inserting commas in the list of constants instructs GAUSS to form a matrix, breaking the rows at the commas. If braces are not used, then adding commas has no effect. When the constants are enclosed in braces, the actual word 'LET' is optional.

If the second form is used, then an r by c matrix will be created; the constants will be allocated to the matrix on a row-by-row basis. If only one constant is entered, then the whole matrix will be filled with that number.

Note the square brackets. This is the standard way to tell GAUSS either the dimensions of a matrix or the coordinates of a block, depending on context. The first number refers to the row, the second the column. Braces generally are used within GAUSS to group variables together.
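The forms described above can be sketched as follows. The variable names are illustrative; ROWS and COLS are standard GAUSS functions that return the dimensions of a matrix.

```gauss
LET colVec = 1 2 3;                 /* no braces: 3x1 column vector */
LET rowVec = { 1 2 3 };             /* braces: 1x3 row vector */
LET mat1 = { 1 2 3, 4 5 6 };        /* braces and a comma: 2x3 matrix */
LET mat2[2,2] = 1 2 3 4;            /* filled row by row: 1 2 / 3 4 */
LET mat3[2,3] = 7;                  /* one constant: 2x3 matrix filled with 7 */

PRINT ROWS(colVec) COLS(colVec);    /* 3 and 1 */
PRINT ROWS(rowVec) COLS(rowVec);    /* 1 and 3 */
PRINT ROWS(mat1) COLS(mat1);        /* 2 and 3 */
```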

Referencing matrices

Referencing strings is easy. They are one unit, indivisible. Matrices, on the other hand, are composed of individual cells, and access to these might be required. GAUSS provides ways of accessing cells, columns, rows and blocks of the matrix as well as referring to the whole thing.

The general format is

mat[r1:r2,c1:c2]

where r1, r2, c1, and c2 may be constants, values, or other variables. This will reference a block from row r1 to row r2, and from column c1 to column c2. A value could be assigned to this block; or this block could be extracted for output or transfer to some other location.
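A brief sketch of block references, using a made-up 3x3 matrix. The dot used in the last reference is standard GAUSS shorthand for "all rows" (or "all columns"); it is not covered in the text above. ZEROS is the standard function that creates a matrix of zeros.

```gauss
LET m[3,3] = 1 2 3 4 5 6 7 8 9;     /* 1 2 3 / 4 5 6 / 7 8 9 */

block = m[1:2, 2:3];                /* rows 1-2, columns 2-3: 2 3 / 5 6 */
elem  = m[3, 1];                    /* a single cell: 7 */

r1 = 2;
c1 = 3;
elem2 = m[r1, c1];                  /* indexes held in variables: 6 */

col3 = m[., 3];                     /* the whole third column */

m[1:2, 1:2] = ZEROS(2, 2);          /* assign a value to a block */
PRINT m;
```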

Using procedures

The library functions in GAUSS work like library routines in other packages - a procedure is called with some parameters, something happens, and a result may be returned. The difference in GAUSS is that the parameters are variables, and the returns are variables - and there may be several of them. The general format is

{outVar1, outVar2, ... outVarM} = ProcName (inVar1, inVar2, ... inVarN);

The inVar parameters are giving information to the procedure; the outVar variables are collecting information from the procedure. The input parameters will be unaffected by the action of the procedure (unless, of course, they also feature in the output list). The outVar parameters will be affected, and so obviously constants cannot be used:

{outVar1, "eric"} = ThisProc (inVar1, inVar2);

is incorrect.

Note that we have curly brackets {} to group variables together for the purposes of collecting results; but that we have round brackets () to delineate the input parameters. Don't ask me why.

If there is one or no input parameter, or one or no returned result, then the form can be simplified:

    {outVar1, outVar2, ... outVarM} = ProcName (inVar);     one input parameter
    {outVar1, outVar2, ... outVarM} = ProcName;             no input parameter
    ProcName (inVar1, inVar2, ... inVarN);                  no returned result
    outVar = ProcName (inVar1, inVar2, ... inVarN);         one result returned

 

For example, the procedure DELIF requires two input parameters (a matrix and a column vector), and returns one output, a matrix:

outMat = DELIF (inMat, colVec);

It is the programmer's responsibility to ensure that the right sort of data is used; all GAUSS will check is that the correct number of parameters is being passed back and forth.
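As a hedged illustration, the sketch below defines a hypothetical user procedure, MinMax, that returns two results, and then calls the standard procedure DELIF. The PROC ... RETP ... ENDP definition syntax is standard GAUSS but is not described in the text above; MINC and MAXC are the standard column minimum and maximum functions. All names and data are made up.

```gauss
/* Hypothetical user-defined procedure: one input, two results */
PROC (2) = MinMax(x);
    RETP(MINC(x), MAXC(x));
ENDP;

LET v = 4 9 2 7;
{ lo, hi } = MinMax(v);             /* braces collect the two results */
PRINT lo hi;                        /* 2 and 9 */

/* DELIF removes the rows of inMat for which colVec contains 1 */
LET inMat[4,2] = 1 10 2 20 3 30 4 40;
colVec = inMat[., 1] .> 2;          /* 0 0 1 1: flags rows 3 and 4 */
outMat = DELIF(inMat, colVec);      /* rows 1 and 2 remain */
PRINT outMat;
```

Calling MinMax with one variable on the left-hand side, or DELIF with one argument, would be caught by GAUSS, since only the number of parameters is checked.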

INPUT AND OUTPUT

GAUSS reads input from, and writes output to, a number of types of file. This course is only concerned with three kinds:

    GAUSS File Types             File Extension

    GAUSS datasets               .dat, .dht (files come in pairs)
    GAUSS matrices               .fmt
    ASCII files (normal text)    anything

The first type is a data set much as you would give to any other econometric package, although it has to be converted to a GAUSS-readable form prior to use. The second is a matrix, pure and simple. The third type could contain anything - including a data set in ASCII format or program display output. We consider each of these in turn, starting with the simplest.

Remember that Unix file extensions are case sensitive.

Unix GAUSS uses a different data format from PC GAUSS, one that does away with the .dht files. A program called "transdat" converts between the formats.

GAUSS Matrices (.fmt files)

A .fmt file contains a GAUSS matrix; nothing more or less. A matrix has been saved onto disk and can be retrieved at any time. This is the default option - if no extension is given to file names, GAUSS will assume it is reading or writing a matrix file.

The commands for matrix files are

LOAD varName=fileName; or

LOADM varName=fileName;

SAVE fileName=varName;

LOAD and LOADM are synonyms. The reason for using the latter is that there are other similar commands (LOADP, LOADS, LOADF, LOADK) which load different types of object (see LOAD in the manual).

varName is the name of the variable in memory to be saved or loaded; fileName is the name of the matrix file with no .fmt extension. For example,

SAVE "file1" = mat1;

LOADM mat2 = "file1";

creates a file on disk called file1.fmt which contains the matrix mat1. This is then read into a new matrix, mat2.

If the disk file has the same name as the variable, then fileName can be omitted:

LOADM eric;

SAVE lucy;

will load the matrix eric from the file eric.fmt, and then save the matrix lucy to a file called lucy.fmt.

An alternative is to have the name of the file in a string variable. To tell GAUSS that the name is contained in the string, the caret (^) operator has to be used. GAUSS then looks at the current value of the variable to see which name to use, instead of taking the variable name as a constant value. For example,

fileName = "file1";

LOADM mat1 = ^fileName;

fileName = "file2";

SAVE ^fileName = mat1;

This piece of code reads a matrix from file1.fmt and then saves it to file2.fmt. If the caret were left out, then GAUSS would look for a file called "fileName". This indirect referencing is the more usual way of using file names: it allows the program to prompt for names, rather than having them explicitly coded into the program. This is useful when the program does not know in advance what files are to be used - for example, if a program is to be run on several sets of data.

GAUSS Data sets (.dat/.dht files)

GAUSS data sets are created by writing data from GAUSS or by taking an ASCII file and converting through a stand-alone program called ATOG.EXE (Ascii TO Gauss). As with the data sets for other econometric packages, they consist of rows of data split into fields. The actual dataset is held in the .dat (data) file, while the .dht (header) file contains the names of each of these fields, along with some other information about the data file. GAUSS will automatically add .dat (or .dht) to the filenames you give, and so there is no need to include the extension.

Unlike the GAUSS matrices, reading from or writing to a GAUSS data set is not a single, simple operation. For matrices, the whole object is being moved into memory or onto disk. By contrast, a GAUSS data set is used in a number of stages. Firstly, the file must be opened; then it may be read from or written to, which may involve the whole file or just a few lines; finally, when references to the file are finished, it should be closed.

All files used will be given a handle by GAUSS; this is a scalar which is GAUSS's internal reference for that file. It will be needed for all operations on that file, and so should not be altered. The handle is needed because several files can be 'open' at one time (for example, reading from one, writing to another); precisely how many depends on the computer's configuration (the CONFIG.SYS file instructions). Without the file handle, a data set cannot be accessed, and if the file handle is overwritten then the wrong file may be used. So be careful with your handles.

A file must exist before it can be opened. To start a new data set for writing, it must be created. This is done by

CREATE handle = fileName WITH colNames, columns, type;

handle is the handle GAUSS will return if it is successful in creating fileName. This fileName may be a constant like "file1", or it may be a string, referenced using the ^ operator (as for LOAD and SAVE). colNames is the list of names for the columns (usually a character vector); columns tells GAUSS how many columns of data there are (which is not necessarily the same as the number of names - it may be sensible to have some "spare" columns); and type is the storage precision of the data - integers, single precision, or double precision. For example,

fileName = "file1";

LET varNames = "Name" "age" "sex" "wage";

CREATE handle1 = ^fileName WITH ^varNames, 4, 4;

prepares a datafile called file1.dat for writing. A header file file1.dht will also be created, which records that the datafile should contain four columns, named "Name", "age", "sex" and "wage", and in single precision (type=4, the default).

CREATE is not needed very often - only when writing a brand new data set. More usually data sets are ATOG conversions from ASCII files. Alternatively, matrices may be converted into data sets using the command

success = SAVED (variable, fileName, colNames);

where variable is the matrix to be saved, fileName and colNames are as above, and success is a scalar variable set to 1 if the operation worked.
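A minimal sketch of SAVED, assuming a small made-up numerical matrix and the hypothetical file name "demo1":

```gauss
LET x[2,3] = 1 34 250
             2 51 310;                  /* two rows, three columns */
LET colNames = "id" "age" "wage";       /* one name per column */

success = SAVED(x, "demo1", colNames);  /* writes the data set demo1 */
IF success == 1;
    PRINT "data set demo1 written";
ELSE;
    PRINT "SAVED failed";
ENDIF;
```

Checking the returned value is optional, but it is the only indication the program gets that the data set was actually written.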

A data set must be opened for either reading or writing or "updating" (both). Once a data set has been opened for one "mode" it cannot be switched to another. The command is

OPEN handle = fileName FOR mode VARINDXI offset;

handle is a non-negative scalar, the file handle returned to you if the operation is successful (if the command did not work, the handle is set to -1). The file handle should always be set to zero before this command, to avoid the possibility of GAUSS trying to open a file already open. fileName is as above.

The mode is one of READ, APPEND, or UPDATE. If the mode is omitted, GAUSS defaults to READ. If READ is chosen, updating the file is not allowed. Choosing APPEND means that data can only be appended to the file; the existing contents cannot be read. UPDATE allows reading and writing.

When GAUSS opens the file, it reads the names of fields (columns) from the .dht file and prefixes them all with "i" (for index). These can then be used to reference the columns of the data set symbolically instead of using column numbers explicitly. This makes programs more readable, more easily adapted, and less likely to be upset by changes in the structure of the data set.

Using these index variables causes some problems for GAUSS when it is checking a program prior to running it. VARINDXI is an optional part of the OPEN command, but it is a way of getting round these problems and so should generally be included. The offset scalar option shifts all these indexes by a scalar and so is useful if the data is to be concatenated horizontally to another matrix or data set. However, usually it can be left out.

When a file is CREATEd, it is automatically opened in APPEND mode (obviously; there is nothing to be read as yet). However, creating new data sets is much rarer than accessing a pre-existing data set, and so OPEN is more common than CREATE.

As an example, to open the file created in the previous sub-section for reading, the command would be

OPEN handle1 = "file1" FOR READ VARINDXI;

which would give a file handle in handle1, and four scalar indexes: iname, iage, isex, and iwage, set to 1, 2, 3, and 4 respectively.
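The whole create, write, close, open, read, close cycle might be sketched as below. WRITER, READR and CLOSE are standard GAUSS functions that are not described in the text above; the file name "demo2", the column names and the data are made up. VARINDXI is left out of this sketch and the column is referenced by number, because the data set is created by the same program and so does not exist when the program is checked.

```gauss
fileName = "demo2";
LET colNames = "id" "age" "wage";
LET x[2,3] = 1 34 250
             2 51 310;

/* Create the data set in double precision (type 8), write two rows, close */
CREATE handle1 = ^fileName WITH ^colNames, 3, 8;
nWritten = WRITER(handle1, x);      /* number of rows actually written */
handle1 = CLOSE(handle1);

/* Reopen for reading, zeroing the handle first */
handle1 = 0;
OPEN handle1 = ^fileName FOR READ;
IF handle1 == -1;
    PRINT "could not open the data set";
ELSE;
    x2 = READR(handle1, 2);         /* read two rows */
    handle1 = CLOSE(handle1);
    PRINT x2[., 2];                 /* the "age" column, by number */
ENDIF;
```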

ASCII Input

Input can be taken from ASCII (i.e. normal alphanumeric text) files using the LOAD command. The LOAD command is augmented by the addition of square brackets that indicate the ASCII nature of the file

LOAD varName[] = fileName; or

LOAD varName[r, c] = fileName;

In the first case, GAUSS will load the contents of fileName into the column vector varName, which can then be checked for size and reshaped. This is the preferred option for loading ASCII files. Items can be numeric or text and should be separated by spaces or commas. Line breaks are treated as white space: GAUSS does not use them to distinguish rows. Text items longer than eight characters will be truncated.

The second form loads the file into an r by c matrix. If there are too many elements in the file for the matrix, then the extra ones will not be read; if the file does not contain enough data items, then the ones found will be repeated until the matrix is full.
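A sketch of the preferred approach: load everything into a column vector, check the size, then reshape. The file name prices.txt and the assumption of three items per row are hypothetical; RESHAPE is the standard GAUSS function for changing the dimensions of a matrix, and % is the modulo operator.

```gauss
/* prices.txt is a hypothetical ASCII file expected to hold 3 items per row */
LOAD v[] = prices.txt;              /* everything into one column vector */

k = 3;                              /* expected number of columns */
n = ROWS(v);                        /* number of items actually found */

IF n % k == 0;
    x = RESHAPE(v, n/k, k);         /* rebuild the rows, filling row by row */
    PRINT x;
ELSE;
    PRINT "unexpected number of items:" n;
ENDIF;
```

Checking the count before reshaping catches a short or over-long file, which the [r, c] form would silently truncate or pad by repetition.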

ASCII Output

Producing ASCII output files is no different from displaying on the screen. GAUSS allows for all output to be copied and redirected to a disk file. Thus anything which appears on the screen also appears in the disk file. To produce an ASCII file therefore requires that (i) an output file is opened; (ii) PRINT is used to display all the information to go into the output file; and (iii) the output file is closed when no more output is to be sent to it.

The relevant command to begin this process is OUTPUT:

OUTPUT FILE = fileName ON; or

OUTPUT FILE = fileName RESET;

Both will instruct GAUSS to send a copy of everything it displays, from that point onward, to the file fileName. If fileName does not already exist, then these two are identical; but if the file does exist, then the first form ensures that any output is appended to the existing contents of the file, while the second empties the file before GAUSS starts writing to it. If no file name is given, then GAUSS will use the default "output.out". There is no default extension for output files.

Once a file has been opened, it can be closed and opened any number of times by using

OUTPUT ON; or

OUTPUT OFF; or

OUTPUT RESET;

 

These commands will all work on the last recorded file name given. The FILE=fileName part could be included here as well if the user wishes to swap between different output files; generally, however, only one output file is used for a program, and so naming the file explicitly is superfluous.
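The three steps can be sketched as follows, with a made-up matrix and the hypothetical output file name results.out:

```gauss
LET x[2,3] = 1 2 3 4 5 6;

OUTPUT FILE = results.out RESET;    /* open the file, emptying any old contents */
PRINT "First block";
PRINT x;
OUTPUT OFF;                         /* stop copying to the file */

PRINT "This line appears on the screen only";

OUTPUT ON;                          /* resume; new output is appended */
PRINT "Second block";
PRINT x';
OUTPUT OFF;                         /* close the file */
```

After this runs, results.out should hold both blocks but not the screen-only line.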

 
