We have decided to use the syntax of the GAP package. This is a widely used package for computational group theory developed at Aachen. Our hope is that programs developed using our file formats will thereby become part of GAP, an established system. This will make our programs accessible to people who have not yet heard of GASP. Using the GAP format make it easier to apply other parts of the GAP package before and/or after applying our programs. Another advantage is that details of design of language, including aspects such as flow of control, are left to others, and we are freed to concentrate on aspects which are more central to our interests.
Fixing the format eases the use of Unix pipes, and information can also be exchanged using files.
We have tried to design the formats so that they are easily readable by humans. In particular, our format is in readable ASCII characters. Human readability makes it much easier to check on correctness of input and output. I/O usually takes a very small proportion of the time of a computational group theory package. If fast I/O is required for some purpose, this will presumably be in binary and subject to a very different set of criteria from the ones we have adopted in this document.
A formal description of the grammar has been written by Paul Sanders.
For those completely unfamiliar with GAP, we have included a brief introduction to those parts of GAP syntax which are relevant to our file format in Appendix A, at the end of the document. Careful study of the examples below might also be sufficient. GAP is free and available from a number of sites via anonymous ftp. Further information is included in Appendix B.
A small penalty we have to pay for using GAP rather than our very own format is that the GAP format may be a little more difficult for humans to read than our own format would be. However this is a small price to pay compared with the advantages. Some of us may develop our own formats which are easier for humans to read, and filters to and from GAP. However the main line will be the GAP format.
Those who have already developed their own formats for finite state automata, are encouraged to write filters to and from GAP.
It must be stressed that a file format is not the same thing as a data structure in a particular language. It is perfectly possible for a program to read in a "dense" file format and then decide to use a sparse data structure, or vice versa.
This file format is intended to be extensible. If our file format is used by others, they are likely to be interested in different applications, and different fields will be necessary. Programmers using this file format should bear this possibility in mind, particularly when writing parsers. The program you wrote yesterday should not crash when the file format is extended tomorrow. Your program's response to an unknown field should be to issue a warning, and to continue to do its computation as well as it can.
Future changes to the file format should take into account the need for compatibility. As far as possible, data files which run under one format should continue to run under future formats. We should not make changes lightly.
Our philosophy is to keep our file format as simple as possible within these constraints. Our design is intended to deal with current applications and with applications we intend to develop in the near future. The value of an agreed file format is precisely that the number of possibilities is limited. Catering for all possible future extensions is counter-productive, in that time is wasted on ideas which will never be implemented; we also believe that looking forward too far will result in an unnecessarily complicated format. The shape of a file format needs to be guided by experience.
We start with two important generalities. As in GAP, if the symbol '#' occurs, not within a quoted string, then that symbol and and all symbols up to and including the next newline should be ignored by input routines. (This is the method used for inserting comments within files.)
If the backslash symbol '\' occurs followed immediately by a newline, then both are ignored. (This is the method used to break lines containing very long constructs.)
The naming convention used here is to use mixed case -- the first mnemonic component in the name of a field name is in lower case, and each of the subsequent mnemonic components begins with a capital letter.
The alphabet field gives the labels of the transitions between states and the state field gives the states of the automaton.
(Other flags which occur in files should probably be ignored by programs, with the printing of a warning message. This allows individual programmers to use their own local flags without having to introduce them officially into the format.)
"DFA"
(Note that it is meaningless, though grammatically correct, for the list of accept states to include an integer not corresponding to any state. Programs are entitled to assume that their input is not only grammatical, but also meaningful. Correspondingly greater care should be taken that output from programs is always meaningful. Programs which receive meaningless input are entitled to give meaningless output---it may take too long to check for meaningfulness. Programs designed to accept human input should check for more than grammatical correctness.) table : record
The transitions themselves are specified in the field named "transitions". The 'format' field must be the first field of the table record - other fields not in the list of required fields may occur anywhere else. In the case of a sparse format, an optional field named 'defaultTarget' may be specified ( provided that it occurs after the format field and before the 'transitions' field ). This is explained below. If present, it should be after the 'format' field and before the 'transitions' field.
Let n be the number of states and let m be the size of the alphabet. In all cases, the transition field is a list with n entries, the i-th entry representing all transitions from state with the number i. However, the interpretation of the contents of this i-th entry varies according to the format of the transition table, and is given in the following.
"sparse"
Note that a "sparse" table may describe a DFA or an NFA, depending on whether there is at most one transition labeled by a given x, 1 <= x <= m. (In addition, to be deterministic, there must be a unique initial state, and no epsilon-transitions.)
If the optional 'defaultTarget' field is defined, then its value should be an integer t satisfying 1 <= t <= n, corresponding to some element of the state set. The meaning is that for any letter number x of the alphabet and any state number i, if no transition is specified in the table with source i and label x, then there is such an transition with target t.
The 'type' string should carry an implicit algorithm which has as input a positive integer corresponding to an element of the set and as output a humanly readable string corresponding to that element. This algorithm should be recorded in the documentation for the particular finite set record format and not in the record itself.
In the "identifiers" type, each name must be either a GAP
identifier (alphanumeric string starting with a letter or underscore),
or a GAP identifier followed by a '.' and a positive
integer
In the "strings" type, each name must be a GAP string,
which is essentially any string of characters enclosed in
double-quotes.
Identifiers, words and strings all print as their GAP
input representation.
The 'format' field must be set to one of the two strings "dense"
and "sparse". In the dense format, the 'names' field is simply a list
of length equal to the size of the set, of which the x-th entry is the
name of the element of the set with the number x.
In the sparse format, the 'names' field is a list of entries of the
form [x,z], meaning that the name of the element of the set with the
number x is z. (Since all elements of the set have a unique name,
this list will again have length equal to the size of the set.)
Note that in the previous two paragraphs, the words 'dense'
and 'sparse' do not convey the true intentions of designers of
this file format. The motivation for having these two
possibilities is that the first is more compact and easy to
understand for a small alphabet, whereas when the alphabet is
huge, the second will help the
human to get right the correspondence between the ordinal number x
of the letter in the alphabet and the name z. The motivation
for the choice of word 'dense' and 'sparse' is to make
the usage consistent with other parts of this document, and
therefore easier to remember.
In the "words" type, each name must be a GAP word.
A GAP word is either equal to 'IdWord' (representing the empty word)
or to an expression of the form
The 'labels' field contains a finite set record representing the set of labels.
The 'format' field contains a GAP string, either "dense" or "sparse", and describes the format of the 'setToLabels' field.
The 'setToLabels' field describes the labeling in terms of a (partial) function. Let m be the cardinality of the finite set itself (i.e. the value of its 'size' field), and p be the cardinality of the 'labels' set in the following:
In the "dense" format, the 'setToLabels' field takes the form of a (partial) list with m entries, each entry z satisfying 0 <= z <= p. The i-th entry in the list specifies the number of the label assigned to the i-th element of the finite set. It may be blank or 0, indicating no label.
In the "sparse" format, the 'setToLabels' field takes the form of a list of two-element lists [i,z], with 1 <= i <= m and 1 <= z <= p. Each entry signifies that element number z of the 'labels' set labels element number i of the finite set. It is meaningless to label the same element of the base alphabet two different ways.
Elements of a labeled set print as X-Y where X is the number of the element of the set being printed, and Y is the printed representation of the label, or the empty string if that element of the set is unlabeled.
Example:
Three fields beyond 'size' and 'type' are required: 'arity', 'padding' and 'base'. 'arity' and 'padding' can occur in either order, but they must precede 'base'.
The positive integer n must be specified in the 'arity' field of the record.
The padding symbol must be specified in the 'padding' field of the record, and it should be an identifier or string - in the case of identifier, a single underscore `_' is recommended.
The base set must be specified in the 'base' field of the record, and it should itself be a finite set record. The base set should not include the padding symbol. The base set has its natural order, and the padding symbol is normally ordered after the elements of the base set.
The algorithm to convert from the natural number representing the state to its printing form is to number the n-tuples lexicographically, and then use their list representations.
Example :
fsa_2 := rec (
#Note that the language of fsa_1 is identical to the language of fsa_2
#(i.e. all strings starting with the second symbol), and
#fsa_2 is a minimized form of fsa_1.
#The following two examples are nondeterministic automata accepting the
#same language. They are identical, but different formats are used for
#their transition tables.
fsa_3 := rec(
fsa_4 := rec(
fsa_5 := rec (
# This is the word-acceptor for the free group on two generators a,b
# A and B are the inverses of a and b respectively.
# The accepted language consists of all strings that do not have
# aA, Aa, bB or Bb as a substring.
fsa_6 := rec (
# This is the word-difference machine arising in the automatic structure
# of the free group on two generators.
# It is included here to illustrate the product set record type.
Some of the orderings may require a further field to complete the definition of the orderings on strings, and this should come here. For example, the "wtshortlex" ordering requires a field 'weight', which is a list of integers, specifying the weights of the generators. The "wreathprod" ordering requires a 'level' field, which specifies the levels of the generators.
2.
#monoid presentation of F(2,7) - should produce a monoid of length 30
#which is the same as the group, together with the empty word.
rws_2 := rec(
3.
# Note that the generator _H, which stand for a subgroup, has no inverse in
GAP identifiers are one of the basic building blocks employed by our
format. Their definition is almost exactly that of identifiers in any
other language - i.e. a string of alphanumeric symbols or underscores
starting with a letter or underscore.
Note that, in the format description above, we have used the concept
"identifier" in a slightly wider sense, to include a GAP identifier
having a suffix of a natural number preceded by a decimal point ('.')
For example: gp11_b.12 ). Such expressions are used in GAP to denote
generators of algebraic objects (and actually represent field values of
a record).
An identifier may not be a reserved word (as usual).
GAP has integer values, which are represented exactly the way you would
expect.
GAP has ASCII strings as a basic component, delimited by double-quotes.
GAP allows one to write expressions in a group using identifiers to
denote group elements and denoting the group's multiplication by '*'
and powers by '^'. The special identifier 'IdWord' stands for the
identity element. Inverses may be represented by raising an element to
the -1 power. Such group expressions are referred to as 'words' in
this document.
The most basic aggregate data type in GAP is the list. Lists are
delimited by square brackets ('[' and ']'), and within elements of
the lists are comma separated. GAP lists may be heterogeneous,
containing any valid GAP values as elements. GAP lists may be partial
in the sense that they may omit values for some elements.
Finally, GAP allows data to be aggregated with the individual pieces
named by GAP identifiers rather than merely being distinguished
positionally, as in a list. The vehicle for this is the GAP record.
Syntactically, this begins with the GAP reserved word 'rec' followed by
a comma separated sequence of field assignments in round brackets ('('
and ')'). Each field assignment takes the form of a GAP identifier,
the special symbol ':=' and the GAP value to be associated to that
name, which may be any legal GAP value, including another record.
If you get GAP, we would appreciate it if you could notify us, e.g., by
sending a short e-mail message to 'gap@math.rwth-aachen.de', containing
your full name and address, so that we have a rough idea of the number of
users. We also hope that this number will be large enough to convince
various agencies that GAP is a project worthy of (financial) support. If
you publish some result that was partly obtained using GAP, we would
appreciate it if you would cite GAP, just as you would cite another paper
that you used. Again we would appreciate if you could inform us about
such a paper.
We distribute the *full source* for everything, the C code for the
kernel, the GAP code for the library, and the LaTeX code for the manual,
which has at present about 1100 pages. So it should be no problem to get
GAP, even if you have a rather uncommon system. Of course, ports to non
UNIX systems may require some work. We already have ports for IBM PC
compatibles with an Intel 80386 or 80486 under MS-DOS, Windows, or OS/2,
for the Apple Macintosh under MPW (we hope to provide a standalone port
soon), for the Atari ST under TOS, and for DEC VAX and AXP under VMS.
Note that about 4 MByte of main memory and a harddisk are required to run
GAP.
GAP 3.4 (currently at patchlevel 1) can be obtained by anonymous *ftp*
from the following servers.
'ftp.math.rwth-aachen.de':
The 'ftp' directory contains the following files. Please check first
which files you need, to avoid transferring those that you don't need.
'README': the file you are currently reading.
'gap3r4p0.zoo':
More files are in the following *subdirectories*:
'bin':
rws_3 := rec(
isRWS := true,
);
ordering := "wreathprod",
tidyint := 10, # a control parameter not explicitly defined in the format
generatorOrder := [a,b,A,B,_H,x,X],
level := [2,2,2,2,1,1,1],
inverses := [A,B,a,b,,X,x],
equations := [
[_H*a*b,x*_H], [b*a,a*b]
]
APPENDIX A - AN INTRODUCTION TO GAP SYNTAX
GAP syntax is highly regular, with only a few basic building blocks and
ways of structuring data.
APPENDIX B
(This is an extract from the GAP "README" document.)
How to get GAP
GAP is distributed *free of charge*. You can obtain it via 'ftp' and
give it away to your colleagues.
Lehrstuhl D fur Mathematik, RWTH Aachen, Germany (137.226.152.6);
directory '/pub/gap/'.
'dimacs.rutgers.edu':
DIMACS, Rutgers, New Brunswick, New Jersey (128.6.75.16);
directory '/pub/gap/'.
'math.ucla.edu':
Math. Dept., Univ. of California at Los Angeles (128.97.4.254);
directory '/pub/gap/'.
'dehn.mth.pdx.edu':
PSU Mathematics Department, Portland State Univ (131.252.40.89);
directory '/mirror/gap/'.
'pell.anu.edu.au':
School of Mathematical Sciences, Australian National Univ.
(150.203.33.4); directory '/pub/gap/'.
'ftp' to the server *closest* to you, login as user 'ftp' and give your
full e-mail address as password. GAP is in the directory 'pub/gap'.
Remember when you transmit the files to set the file transfer type to
*binary image*, otherwise you will only receive unusable garbage. Those
servers will always have the latest version of GAP available.
This file contains the *complete* distribution
of GAP version 3 release 4 current patchlevel 0.
It is a 'zoo' archive approximately 8 MByte big.
'unzoo.c':
A simple 'zoo' archive extractor, which should be
used to unpack the distribution. The 'utils'
subdirectory contains ready compiled executables
for common systems.
This directory contains *executables* for systems
that dont come with a C compiler or where another
C compiler produces a faster executable. The
'KERNELS' file tells you which executables are
here.
'split':
This directory contains the complete distribution
of GAP 3r4p0 in several 'zoo' archives. This
allows you to get only the parts that you are
really interested in. The 'SPLIT' file tells you
which archive contains what.
'utils':
This directory contains several utilities that
you may need to get or upgrade GAP, e.g., 'unzoo'
and 'patch'. The 'UTILS' file tells you which
files are here.