GASP Updated Standard Format Document (v1.1)

by
Scot Dyer (Lincoln, Nebraska), David Epstein (Warwick),
Derek Holt (Warwick) and Paul Sanders (Warwick).
Date: 12 August 1995.

PHILOSOPHY

We give the details of some file formats for finite state automata, for group presentations, for rewriting systems, and for some related objects. Why file formats? The main reason is that if one writes one's programs so that they always expect input and always give output in the agreed format, then the programs can cooperate with each other. Moreover, definite rules make it easy to write filters which convert one format into another. Software for which such rules are not fixed ahead of time is likely to be impossible to convert.

We have decided to use the syntax of the GAP package. This is a widely used package for computational group theory developed at Aachen. Our hope is that programs developed using our file formats will thereby become part of GAP, an established system. This will make our programs accessible to people who have not yet heard of GASP. Using the GAP format make it easier to apply other parts of the GAP package before and/or after applying our programs. Another advantage is that details of design of language, including aspects such as flow of control, are left to others, and we are freed to concentrate on aspects which are more central to our interests.

Fixing the format eases the use of Unix pipes, and information can also be exchanged using files.

We have tried to design the formats so that they are easily readable by humans. In particular, our format is in readable ASCII characters. Human readability makes it much easier to check on correctness of input and output. I/O usually takes a very small proportion of the time of a computational group theory package. If fast I/O is required for some purpose, this will presumably be in binary and subject to a very different set of criteria from the ones we have adopted in this document.

INTRODUCTION

This document is the development of some ideas which were discussed in the preliminary GASP (Groups, Automata and Semigroups Project) meeting in August 1994. This version was written at the WASGAS (Workshop on Algorithms and Software for Groups, Automata and Semigroups) meeting in August 1995, and contains a number of minor additions and amendments that have been introduced as a result of the experiences of programmers using the format during the past year.

A formal description of the grammar has been written by Paul Sanders.

For those completely unfamiliar with GAP, we have included a brief introduction to those parts of GAP syntax which are relevant to our file format in Appendix A, at the end of the document. Careful study of the examples below might also be sufficient. GAP is free and available from a number of sites via anonymous ftp. Further information is included in Appendix B.

A small penalty we have to pay for using GAP rather than our very own format is that the GAP format may be a little more difficult for humans to read than our own format would be. However this is a small price to pay compared with the advantages. Some of us may develop our own formats which are easier for humans to read, and filters to and from GAP. However the main line will be the GAP format.

Those who have already developed their own formats for finite state automata, are encouraged to write filters to and from GAP.

It must be stressed that a file format is not the same thing as a data structure in a particular language. It is perfectly possible for a program to read in a "dense" file format and then decide to use a sparse data structure, or vice versa.

This file format is intended to be extensible. If our file format is used by others, they are likely to be interested in different applications, and different fields will be necessary. Programmers using this file format should bear this possibility in mind, particularly when writing parsers. The program you wrote yesterday should not crash when the file format is extended tomorrow. Your program's response to an unknown field should be to issue a warning, and to continue to do its computation as well as it can.

Future changes to the file format should take into account the need for compatibility. As far as possible, data files which run under one format should continue to run under future formats. We should not make changes lightly.

Our philosophy is to keep our file format as simple as possible within these constraints. Our design is intended to deal with current applications and with applications we intend to develop in the near future. The value of an agreed file format is precisely that the number of possibilities is limited. Catering for all possible future extensions is counter-productive, in that time is wasted on ideas which will never be implemented; we also believe that looking forward too far will result in an unnecessarily complicated format. The shape of a file format needs to be guided by experience.

We start with two important generalities. As in GAP, if the symbol '#' occurs, not within a quoted string, then that symbol and and all symbols up to and including the next newline should be ignored by input routines. (This is the method used for inserting comments within files.)

If the backslash symbol '\' occurs followed immediately by a newline, then both are ignored. (This is the method used to break lines containing very long constructs.)

FORMAT FOR A FINITE STATE AUTOMATON (FSA)

Each of the following fields is required in each GAP record representing an FSA. Each field name is followed by a type and a body of motivation and description of its function. Examples are given later in the document under the heading FSA EXAMPLES. The fields must occur in the order listed below, except that the 'alphabet' and 'states' field can occur in either order, as can the 'initial' and 'accepting' fields. To try and ensure that existing code doesn't break when the grammar is extended, we allow fields of the form to occur anywhere in the record ( with a couple of exceptions ) provided that the result is still a legal GAP record and the identifier is not already used by us as a field identifier. Programs should be capable of reading over such fields and ignoring them, possibly printing a warning message.

The naming convention used here is to use mixed case -- the first mnemonic component in the name of a field name is in lower case, and each of the subsequent mnemonic components begins with a capital letter.

These lists represent respectively the initial states of the automaton and the accepting states of this automaton, encoded according to the correspondences given by the states field between actual states and natural numbers. Note that the GAP expression [1..n], which is an abbreviation for the list of integers i satisfying 1 <= i <= n in their natural order, may be used here.

(Note that it is meaningless, though grammatically correct, for the list of accept states to include an integer not corresponding to any state. Programs are entitled to assume that their input is not only grammatical, but also meaningful. Correspondingly greater care should be taken that output from programs is always meaningful. Programs which receive meaningless input are entitled to give meaningless output---it may take too long to check for meaningfulness. Programs designed to accept human input should check for more than grammatical correctness.) table : record

FINITE SET RECORDS

The purpose of a finite set record is to provide a one-to-one correspondence between an initial segment of the natural numbers and a finite set which will often have additional structure. All finite set records contain at least the following two fields, which must come in this order before other required fields. Other fields, not in the list of required fields may occur in any position after the 'type' field. Programs should be capable of reading over such fields and ignoring them, possibly printing a warning message.

Finite Set Record Format Types

FSA EXAMPLES:

fsa_1 := rec ( );

fsa_2 := rec (

);

#Note that the language of fsa_1 is identical to the language of fsa_2
#(i.e. all strings starting with the second symbol), and
#fsa_2 is a minimized form of fsa_1.
#The following two examples are nondeterministic automata accepting the
#same language. They are identical, but different formats are used for
#their transition tables.

fsa_3 := rec(

);

fsa_4 := rec(

);

fsa_5 := rec (
# This is the word-acceptor for the free group on two generators a,b
# A and B are the inverses of a and b respectively.
# The accepted language consists of all strings that do not have
# aA, Aa, bB or Bb as a substring.

);

fsa_6 := rec (
# This is the word-difference machine arising in the automatic structure
# of the free group on two generators.
# It is included here to illustrate the product set record type.

);

FORMAT FOR A REWRITING SYSTEM (RWS)

Some or all of the following fields are required in each GAP record representing an RWS. Each field name is followed by a type and a body of motivation and description of its function. Examples are given later in the document under the heading RWS EXAMPLES. The fields that are present must occur in the order listed below except that the ordering and generatorOrder fields can come in either order. Other fields not required by the format may occur in any position after the "isRWS" field ( which must be the first field of the record ). Programs should be capable of reading over such fields and ignoring them, possibly printing a warning message. In particular, there are many possible fields that might be used to control the running of a Knuth-Bendix program, such as limits on the total number of rewriting rules, on the maximal lengths of stored rules, or on the maximal lengths of overlaps to be processed. We shall not attempt to list all such possibilities in detail.

RWS EXAMPLES

1. #A knot group
rws_1 := rec( );

2. #monoid presentation of F(2,7) - should produce a monoid of length 30
#which is the same as the group, together with the empty word.
rws_2 := rec(

);

3. # Note that the generator _H, which stand for a subgroup, has no inverse in # this example.
rws_3 := rec(

);

APPENDIX A - AN INTRODUCTION TO GAP SYNTAX

GAP syntax is highly regular, with only a few basic building blocks and ways of structuring data.

GAP identifiers are one of the basic building blocks employed by our format. Their definition is almost exactly that of identifiers in any other language - i.e. a string of alphanumeric symbols or underscores starting with a letter or underscore.

Note that, in the format description above, we have used the concept "identifier" in a slightly wider sense, to include a GAP identifier having a suffix of a natural number preceded by a decimal point ('.') For example: gp11_b.12 ). Such expressions are used in GAP to denote generators of algebraic objects (and actually represent field values of a record).

An identifier may not be a reserved word (as usual).

GAP has integer values, which are represented exactly the way you would expect.

GAP has ASCII strings as a basic component, delimited by double-quotes.

GAP allows one to write expressions in a group using identifiers to denote group elements and denoting the group's multiplication by '*' and powers by '^'. The special identifier 'IdWord' stands for the identity element. Inverses may be represented by raising an element to the -1 power. Such group expressions are referred to as 'words' in this document.

The most basic aggregate data type in GAP is the list. Lists are delimited by square brackets ('[' and ']'), and within elements of the lists are comma separated. GAP lists may be heterogeneous, containing any valid GAP values as elements. GAP lists may be partial in the sense that they may omit values for some elements.

Finally, GAP allows data to be aggregated with the individual pieces named by GAP identifiers rather than merely being distinguished positionally, as in a list. The vehicle for this is the GAP record. Syntactically, this begins with the GAP reserved word 'rec' followed by a comma separated sequence of field assignments in round brackets ('(' and ')'). Each field assignment takes the form of a GAP identifier, the special symbol ':=' and the GAP value to be associated to that name, which may be any legal GAP value, including another record.

APPENDIX B

(This is an extract from the GAP "README" document.)

How to get GAP