Prolog Terms

The generic name for all forms of Prolog data is "term". The data your program works with is all terms of one form or another. The program itself is made up of terms. Prolog execution is simply the repetitive matching of patterns in these terms. This section describes the various forms of terms. They are:

Internationalization
Atoms
Strings
Variables
Numbers
Structures
Lists
Character Lists
Comments
Operators

Internationalization

Amzi! provides full support for national languages, using both multibyte (superset of ASCII) and Unicode (wide) characters. This applies to all textual elements, including atoms, strings, characters and character lists, and applies to all I/O predicates.

Internally, all characters in Amzi! are represented as Unicode (wide 16-bit) characters. You can work directly with Amzi! using Unicode files and Unicode I/O, from both the IDE and the Logic Server.

You can also work with Amzi! using the multibyte character interface. All characters are translated between Unicode and their multibyte equivalent on input and output. The translation is based on the locale of the host computer.

The multibyte files, characters and I/O are supported in the IDE and through the Logic Server.

Atoms

Atoms are the fundamental building blocks of Prolog. Syntactically, they look like character strings, but internally they are represented by integer values. This is why Prolog can very quickly unify (compare) atom values.

An unquoted atom name can be composed of all character values or all graphic values (not both), as long as it doesn't begin with either an underscore ('_') or an uppercase ASCII letter ('A'-'Z'). (These beginning characters indicate a variable, not an atom. See below.) It may not include white space.

The syntax rules can be changed to allow uppercase initial letters for atoms, but still not underscores, by setting the Prolog flag upper_case_atoms to on.

Roughly speaking, for ASCII (the details, including the full range of national language symbols, are given below):

Character values are letters and numbers plus the symbols underscore ('_') and dollar sign ('$').
Graphic values are the punctuation and math symbols (+,;-/ etc.), but not underscore ('_') or dollar sign ('$') or quoting symbols (" ' `).
White space is space, tab, etc.

By using single quotes, an atom name can be composed of any symbols. Use two single quotes to indicate an embedded single quote in the atom name.

Examples of legal and illegal atom names are:

cats            % ok
dogs            % ok
cats and dogs   % wrong, contains white space
cats-and-dogs   % wrong, contains graphic symbol
cats_and_dogs   % ok, '_' is a character in Prolog
cats$dogs       % ok, '$' is a character in Prolog
Cats            % wrong, begins with upper case ASCII (a variable)
_dogs           % wrong, begins with underscore (a variable)
==>             % ok, all graphic characters
;;$=            % wrong, '$' is not a graphic.
;;+             % ok
'Anything ;; - At all'    % ok, in single quotes
'New York'      % ok, in single quotes
'can''t'        % ok, with '' indicating ' inside atom
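
The quoting rules above can be checked with a few type tests. This is a minimal sketch, assuming the ISO built-ins atom/1 and atom_length/2 are available and that the upper_case_atoms flag is at its default (off); same_atom/2 and quoting_demo/0 are hypothetical names used only for illustration.

```prolog
% Quotes are only a way of writing a name: cats and 'cats' are one atom.
same_atom(A, B) :- atom(A), atom(B), A == B.

quoting_demo :-
   same_atom(cats, 'cats'),     % quoting an already legal name changes nothing
   atom('New York'),            % white space is allowed inside quotes
   atom_length('can''t', 5),    % '' is one embedded quote: c a n ' t
   \+ atom(Cats),               % an unquoted capitalized name is a variable
   \+ atom(_dogs).              % so is a name that starts with an underscore
```

Calling quoting_demo should succeed under those assumptions. A Prolog system may warn that Cats and _dogs are singleton variables, which is exactly the point being illustrated.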

Symbol Set for Atoms

Character Symbols

ASCII values considered characters for atom names are:

a-z
A-Z
0-9
underscore ('_')

The dollar sign ('$') is also allowed as a character in Amzi! Prolog to maintain compatibility with old releases, but it is not ISO standard, and will not be supported in future releases.

Character values above code 127 include all those symbols that would ordinarily be considered a character, as opposed to either white space or a graphic (arrows, math symbols, etc.). They might be entered as either wide (Unicode) characters, or translated from a locale-specific multi-byte character encoding. In either case they are stored internally as Unicode characters.

Unicode character values (in hexadecimal) above the ASCII values are:

00C0 - 1FFF (many phonetic language symbols)
3040 - D7FF (Chinese, Japanese, Korean (CJK) ideographs)
E000 - FFEF (except FEFF) (Private use, additional CJK and Arabic symbols)

See the sample program babel for an example of the use of CJK symbols.

Graphic Symbols

ASCII values considered graphic characters for atom names are:

# & * + - . / : < = > ? @ \ ^ ~

The backslash ('\') is not ISO standard, but is part of the graphic set for Amzi! Prolog, and is used in bit manipulation operators. The dollar sign ('$') should be part of the graphic characters, but is not for backward compatibility reasons. This should change in a future release. When the preprocessing flag is turned on, a number sign ('#') in the first column is interpreted as a preprocessor symbol, and not a graphic character which might be part of an operator name.

Unicode graphic values (in hexadecimal) above the ASCII values are:

00A1 - 00BF (symbols for copyright, fractions, some currency, etc.)
2010 - 303F (symbols for punctuation, math, arrows, box drawing, currency, etc.)

See the sample program logic for an example of the use of Unicode math symbols.

White Space Symbols

ASCII white space is any value at or below 0020 (space, tab, new line, etc.).

Unicode white space (in hexadecimal) above the ASCII values is:

2000 - 200F (special spacing symbols)
FFF0 - FFFE and FEFF (reserved symbols)

Entering Wide Characters

Wide symbol values can be entered in atom names either with an editor that supports them, by cutting and pasting from a character map accessory, by using a locale-specific version of an operating system, or by directly entering them using escape sequences.

For example, to create an operator made out of two lightning bolt type arrows using escape sequences:

:- op(555, xfx, '\x21af\\x21af\').
main :-
   write(clouds '\x21af\\x21af\' earth).

The wideW version of the Windows IDE supports wide characters in the editor and listener. A better way to create the above program is to set the font to Lucida Sans Unicode, and then use the Windows character map accessory to select the lightning bolt (21AF) symbol and cut and paste it in the program, without the single quotes.

Escape Characters

Certain special characters may be embedded inside quoted atoms, character lists or strings by use of the escape character (backslash \) and a token. The handling of escape characters has been enhanced to be more in keeping with the emerging ISO standard, which is close to the C standard specification.

Enabling and Disabling Escape Processing

The use of a backslash (\) as an escape character can be irritating, especially in applications that use Windows path names, as it requires using double backslashes (\\) for directory separation.

There are two options for dealing with this problem. If the problem is only with Windows path names, then that can be avoided by using the forward slash (/) as a directory separator. Amzi! Prolog will accept either in a path name, so applications can run cross-platform.

The other option is to turn escape character processing off. A flag setting called string_esc allows you to enable or disable processing of escape characters. To turn it on:

?- set_prolog_flag(string_esc, on).

To turn it off:

?- set_prolog_flag(string_esc, off).

It can also be set from the .cfg file:

string_esc=[on|off]

The default setting is on. When string_esc is off, an escape sequence such as '\n' is left as the two characters backslash and n rather than being converted to a newline.
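
The two ways of writing a Windows path can be compared directly. The sketch below assumes string_esc is on (the default) and that atom_length/2 is available; the file name and the predicate names are hypothetical.

```prolog
% With escape processing on, \\ inside quotes denotes one backslash.
windows_path('c:\\samples\\babel.pro').

% The forward-slash form needs no escapes and is accepted in path names.
portable_path('c:/samples/babel.pro').

% Both atoms have the same number of characters, which shows that each
% doubled backslash was stored as a single character.
same_length_paths :-
   windows_path(W), atom_length(W, N),
   portable_path(P), atom_length(P, N).
```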

Escape Codes

When escape processing is enabled, the backslash (\) is the escape character and does not become part of the string (it 'escapes' normal processing). The character or characters following the backslash then have special meaning. This allows expressing character code symbols that are not available on the keyboard input device.

Escape processing dates back to teletype days, providing a way to generate control characters for line feeds, backspaces, tabs, etc. Today the newline (\n) is probably the most common sequence to use, although use of the Prolog nl/0 predicate removes much of the need for it.

The escape character is also used to take away the meaning of quotes. For example, in a quoted atom name, if you want to include a single quote, you can use backslash quote. So 'isn\'t' and 'isn''t' are both ways of encoding an embedded single quote in an atom name. The same applies to backquotes in strings and to double quotes in character lists.

A new use of escape sequences is encoding Unicode (wide 16-bit) characters in ASCII text, that is then loaded internally as the correct wide character. For example: \x21AF\ is a lightning bolt arrow character.

The special meaning characters that can follow a backslash are as follows:

a: alert (bell) character
b: backspace
f: formfeed
n: newline
r: carriage return
t: horizontal tab
v: vertical tab
ooo: up to three octal digits representing a character, requires closing \ as well.
xhhhh: up to four hex digits representing a character, requires closing \ as well.
\: a single backslash character

Any other character following a backslash is just the character.
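
A short sketch of several of these codes in use, assuming escape processing is on; show_escapes/0 is a hypothetical name.

```prolog
show_escapes :-
   write('name\tvalue\n'),     % tab between the words, newline at the end
   write('isn\'t'), nl,        % backslash-quote embeds a single quote
   write('c:\\temp'), nl,      % a doubled backslash is one backslash
   write('\x41\\x42\'), nl.    % hex codes 41 and 42, each with its closing backslash
```

The last atom is the two-character atom 'AB', since hex 41 and 42 are the codes for A and B.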

Escape sequence processing can be turned off by setting the Prolog flag string_esc to off.

Using Escape Codes

When are backslashes interpreted as escapes? Anytime the Prolog term reader is invoked. This includes responses to the read/1 predicate, query terms entered at the listener prompt, query terms built using the string functions in the API, and code in a file that is either interpreted or compiled. The escape causes the Prolog reader to convert the escape sequence into the correct character(s) in the input string, atom, or character list.

When are they not considered escapes? Once the string has been read it stays converted. Some I/O predicates, such as read_string/1, do not use the Prolog term reader and process backslashes as plain backslashes.

For example:

?- read(X).
'\x63\\x63\'.
X = cc

?- read(X).
`\x63\\x63\`.
X = `cc`

?- read_string(X).
\x63\\x63\
X = `\x63\\x63\` 

?- read_string(X), string_term(X,T).
'\x63\\x63\'
X = `'\x63\\x63\'`
T = cc

Strings

A string is an alternate way of representing text. Unlike atoms, which are stored in a table and are represented in terms as integers, strings are represented as the string itself. They are useful for textual information that is for display purposes only.

Strings unify on a character-by-character basis with each other, and with atoms.

A string is denoted by text enclosed in matching backquotes (`). Strings may also have embedded formatting characters exactly like atoms (as described in the section on Escape Characters). For example:

`This is a long string used for\ndisplay purposes`

To represent the backquote (`) within a string use two backquotes (``).

Strings are primarily used to represent text which is being used for I/O, and not unification. For example, a clause representing customers might have the customer name as an atom, for fast unification, but have the customer's address information as a string just for output purposes.

Internally all strings are stored as Unicode (wide) character strings. This means the full Unicode character set can be used when reading and displaying information, and in Prolog source code.

Strings do not occupy space in the atom table and the space they occupy is automatically collected and reused by the system once the string is no longer needed. Consequently, strings can be more memory-efficient for large quantities of textual information.

There are predicates for working with strings, converting to and from other types, finding substrings, etc.
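
The customer example described above might be sketched as follows. The facts and the predicate print_address/1 are hypothetical, and the backquote string syntax is specific to Amzi! Prolog, so this block is not portable to other Prolog systems.

```prolog
% customer(Name, Address): the name is an atom for fast unification,
% the address is a string that is only ever written out.
customer(leona, `12 Elm Street\nSpringfield`).
customer(ivan,  `7 Oak Avenue\nShelbyville`).

print_address(Name) :-
   customer(Name, Address),   % lookup unifies on the atom
   write(Address), nl.        % the string is used for output only
```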

Variables

Variables in Prolog are called 'logical' variables. They are not the same as conventional program variables, which typically refer to an element in memory of a specific type. Logical variables are wild cards for pattern matching (unification) and take on values as the result of unifying with other Prolog terms.

A variable is represented by a series of letters, numbers and the underscore character. It must begin either with an uppercase character or the underscore character (unless the flag upper_case_atoms is on, in which case variables must begin with an underscore). The following are valid variable names:

Var             Var_2
_var_3          X
Leona           Ivan

Two Prolog variables with the same name represent the same variable if they are in the same clause. Otherwise they are different variables (which just happen to have the same name). That is, the scope of a variable name is the clause in which it appears.

A special case is the anonymous variable. It is represented as a single underscore (_) and is used in situations where the value the variable might take is of no interest. Typically it is a placeholder in structure arguments, for arguments of no interest.

For example, a query looking for a customer's phone number, in a clause that has lots of other information, might look like this:

?- customer('Leona', _, _, _, PHONE, _, _).
PHONE = 333-3333
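
A small sketch of clause-local scope and the anonymous variable. The customer/7 facts are hypothetical sample data, and phone/2 and same_city/2 are hypothetical helper predicates.

```prolog
customer('Leona', '12 Elm Street', 'Springfield', 'MA', '333-3333', gold,  1993).
customer('Ivan',  '7 Oak Avenue',  'Springfield', 'MA', '444-4444', basic, 1995).

% Each _ is a separate anonymous variable; none of them share a value.
phone(Name, Phone) :-
   customer(Name, _, _, _, Phone, _, _).

% City appears twice in this one clause, so both uses must unify with
% the same value. It is unrelated to any City in another clause.
same_city(A, B) :-
   customer(A, _, City, _, _, _, _),
   customer(B, _, City, _, _, _, _),
   A \== B.
```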

Numbers

The ISO standard recognizes two types of numbers: integers and, for decimal numbers, floats.

Integers

Amzi! supports integers according to the ISO standard. Amzi! integers are 32 bits on 32-bit operating systems.

Integers can optionally be entered using hexadecimal, octal, binary or character code notation. To do this, precede the number with 0x, 0o, 0b or 0'. They are interpreted as 32-bit unsigned integers which then map into the 32-bit signed internal integer representation. (It is an error to try to express a number larger than 32 bits using 0x, 0o, or 0b formats.) For example:

?- X = 16.
X = 16 

?- X = 0o7777.
X = 4095 

?- X = 0b111111.
X = 63 

?- X = 0'a.
X = 97 

?- X = 0xff.
X = 255 

?- X = 0xabcdef.
X = 11259375 

?- X = 0b1001.
X = 9 

% the following show 32-bit integer limits

?- X = 0xffffffff.
X = -1 

?- X = 0x80000000.
X = -2147483648 

?- X = 0x7fffffff.
X = 2147483647
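
These notations are ordinary integers, so they can be mixed freely in arithmetic. A minimal sketch, with hypothetical predicate names:

```prolog
% Keep only the low eight bits of N, using a hexadecimal mask.
low_byte(N, B) :- B is N /\ 0xff.

% Distance of a lowercase letter code from the letter a.
letter_offset(Code, Offset) :- Offset is Code - 0'a.

% ?- low_byte(0xabcdef, B).     gives B = 239  (hex ef)
% ?- letter_offset(0'd, N).     gives N = 3
```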

Integers expressed in decimal notation, if larger than an internal integer, will be promoted to the appropriate decimal type. For example:

?- X = 200000000000000000000000000000.
X = 2.000000e+029 
yes

Negative Numbers

Negative numbers are entered with a preceding minus (-) sign. Care must be taken to ensure that there is no space between the minus and the number for single numbers. This is because the minus (-) is an operator, and (-) something can ambiguously be interpreted as a structure.

This ambiguity is not a problem when minus (-) is used as an operator with two arguments. For example:

?- display(-3).
-3

?- display(- 3).
-(3)

?- display(2 - 3).
-(2,3)

?- display(2 -3).
-(2,3)
yes
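
The difference matters for type tests but not for evaluation. A sketch, where negative_literal/1 is a hypothetical name:

```prolog
negative_literal(X) :- integer(X), X < 0.

% ?- negative_literal(-3).    succeeds: -3 is a number
% ?- negative_literal(- 3).   fails: - 3 is the structure -(3)
% ?- X is - 3.                gives X = -3, because is/2 evaluates the structure
```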

Decimal Numbers

For decimal numbers, such as 3.3, Amzi! supports three options, including ISO standard floats:

single precision floating point numbers
double precision floating point numbers
infinite precision real numbers

Real numbers allow for any number of digits on either side of the decimal point. They are ideal, and very efficient, for most common and business applications, as well as being perfect for excursions into the realms of large prime numbers, calculating the digits of pi, and other mathematical games.

Floating point numbers adhere to the ISO standard, and are best suited for scientific applications. Amzi! offers both single precision and double precision floating point numbers. Single precision numbers are more efficient for applications with lots of calculations, but double precision numbers carry more digits of precision.

An explicit exponent indicator, r for reals, e for floats, can be used to force a number as either real or float. For example:

?- X = 112233445566778899.998877665544332211e.
X = 1.122334e+017 

?- X = 112233445566778899.998877665544332211r.
X = 112233445566778899.998877665544332211r 
yes

Floating point numbers are stored as single or double precision based on the setting of the Prolog flag floats, which can be set as a configuration parameter as well. The choices are single or double.

The built-in atom inf can be used in mathematical expressions to represent a floating point number larger than the largest possible representation.

When the system encounters a decimal number without an explicit exponent indicator, it stores the number as real or float based on the setting of the Prolog flag decimals, which can be set as a configuration parameter as well. The choices are real or float.

The flags decimals and floats also determine how results are stored in mixed mode mathematical expressions. For details see the section on numeric types in math.

Structures

Structures are the fundamental data types of Prolog. A structure is determined by its name (sometimes called the principal functor) and its arguments. The functor is an atom and the arguments may be any Prolog terms, including other structures. A structure is written as follows:

name(arg1, arg2, ... , argn)

There must be no space between the name and the opening parenthesis "(". The number of arguments in a structure is called the arity.

An atom is really a degenerate structure of arity 0.

The maximum arity of a structure is 4095.
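
Name and arity can be inspected at run time. The sketch below assumes the ISO built-ins functor/3 and arg/3; term_shape/2 and second_arg/2 are hypothetical wrappers.

```prolog
% Describe a term as Name/Arity.
term_shape(Term, Name/Arity) :- functor(Term, Name, Arity).

% Pick out the second argument of a structure.
second_arg(Term, Arg) :- arg(2, Term, Arg).

% ?- term_shape(likes(ella, biscuits), S).   gives S = likes/2
% ?- term_shape(cats, S).                    gives S = cats/0, an atom has arity 0
% ?- second_arg(likes(ella, biscuits), X).   gives X = biscuits
```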

Structures are used to represent data. Following are some examples of a structure whose functor is 'likes' and whose arity is 2.

likes(ella, biscuits)
likes(zeke, biscuits)
likes(Everyone, pizza)

Here are some more complex nested structures.

file(foo, date(1993, 6, 15), size(43898))
tree(pam, left( tree( doyle, left(L2), right(R2) ) ), right(R1))
sentence( nounphrase( det( the ), noun( dog )), verbphrase( verb( sleeps )) )

Structures are also the heads of Prolog clauses, and the goals of the bodies of those clauses. For example:

friends(X, Y) :- likes(X, Something), likes(Y, Something).

All Prolog really does is match up structures with each other.
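
Putting the example structures together gives a small runnable program. The facts reuse the structures shown in this section; file_year/2 is a hypothetical helper added for illustration.

```prolog
likes(ella, biscuits).
likes(zeke, biscuits).
likes(_Everyone, pizza).      % a variable here means anyone likes pizza

friends(X, Y) :- likes(X, Something), likes(Y, Something).

file(foo, date(1993, 6, 15), size(43898)).

% Unification reaches inside nested structures to pull out one field.
file_year(Name, Year) :- file(Name, date(Year, _, _), _).

% ?- friends(ella, zeke).     succeeds, both like biscuits
% ?- file_year(foo, Y).       gives Y = 1993
```

Note that this simple definition also makes everyone a friend of themselves; a real program would probably add a test that X and Y differ.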

Lists

Lists are used to represent ordered collections of Prolog terms. Lists are indicated by square brackets '[' and ']'.

A list with a known number of elements can be written down, separated by commas within the brackets. The elements can be arbitrary Prolog terms, including other lists, e.g.:

[1, 2, 3]
[alpha, 4]
[ f(1), [2, a], X]

Here the first list has three elements, the numbers 1, 2 and 3. The second list has two elements. The third list also has three elements. The first is a structure of arity 1, the second a sub-list with two elements, and the third element is a variable called X.

Logically, a list can be considered to have two elements:

HEAD - the first element of the list
TAIL - a list of the remaining elements in the list

This way of viewing a list is very useful for recursive predicates that analyze lists. At each level of recursion, the HEAD can be used, and the TAIL passed down to the next level of recursion. It is important to remember that the TAIL is always another list.

This pattern is represented with a vertical bar: [HEAD | TAIL].

We can unify this pattern with a list to see how it works:

?- [HEAD|TAIL] = [a,b,c,d].
HEAD = a
TAIL = [b, c, d]

More than one element in the HEAD can be specified:

?- [FIRST, SECOND|TAIL] = [a,b,c,d].
FIRST = a
SECOND = b
TAIL = [c, d]

An empty list is represented by empty square brackets, []. It is important to note that [] is NOT a list, but an atom. However, it is what is used for the TAIL when there is no tail.

?- [W, X, Y, Z|TAIL] = [a,b,c,d].
W = a
X = b
Y = c
Z = d
TAIL = []

[] is then a useful element for recognizing that a recursive list predicate has reached the end of the list. For example, consider the following predicate that writes each element of a list on a new line:

write_list([]).   % empty list, end.
write_list([A|Z]) :-
   write(A),   % write the head
   nl,
   write_list(Z).  % recurse with the tail

Using it:

?- write_list([apple, pear, plum, cherry]).
apple
pear
plum
cherry

yes
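
The same head and tail pattern works when a result has to be built up on the way back from the recursion. A minimal sketch, with the hypothetical name list_sum/2:

```prolog
list_sum([], 0).              % the empty list sums to zero
list_sum([H|T], Sum) :-
   list_sum(T, Rest),         % sum the tail first
   Sum is H + Rest.           % then add the head

% ?- list_sum([1, 2, 3, 4], S).   gives S = 10
```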

While lists are stored more efficiently than structures, you can think of a list as a nested structure of arity two. The first argument is the head, the second argument is the same structure representing the rest of the list. The special atom, [], indicates the end of the nesting. This nature of lists can be seen using the display/1 predicate, where a dot (.) is used as the functor of the structure.

?- display( [a, b, c] ).
.(a, .(b, .(c, [])))

While the normal list notation is easier to read and write, sometimes it is useful to think of the structure notation of lists when trying to understand predicates that manipulate lists.
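
The nesting can also be written with the vertical bar alone, which avoids depending on how a particular system names the list functor. A sketch with hypothetical predicate names:

```prolog
% The bracket form is shorthand for a chain of head and tail pairs
% ending in [].
same_list :- [a, b, c] = [a | [b | [c | []]]].

% Adding an element to the front is a single unification, no recursion.
prepend(X, List, [X | List]).

% ?- prepend(a, [b, c], L).   gives L = [a, b, c]
```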

Character Lists

Lists whose elements are character codes are often used in Prolog, especially in parsing applications. Prolog recognizes a special syntax to make this use more convenient.

A string of characters enclosed in double quotes (") is converted into a list of character codes. Use two double quotes ("") to indicate an embedded double quote character. The following examples (using member/2 from the list library) illustrate its use, and also show predicates for converting between character lists and atoms and strings:

?- X = "abc".
X = [0w0061, 0w0062, 0w0063] 
yes

?- member(0'b, "abc").
yes

?- atom_codes(abc, X).
X = [0w0061, 0w0062, 0w0063] 
yes

?- atom_codes(A, "abc").
A = abc 
yes

?- string_list(S, "abc").
S = `abc` 
yes

?- string_list(`abc`, L).
L = [0w0061, 0w0062, 0w0063] 
yes

?- [0'a, 0'b, 0'c] = "abc".
yes

?- member(0'b, "abc").
yes

?- member(0x62, "abc").
yes

?- member(98, "abc").
yes
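
As an illustration of the parsing use mentioned above, the following sketch counts the digits in an atom by walking its character list. All predicate names are hypothetical; atom_codes/2 is used to obtain the list so that the code does not depend on how double quotes are read.

```prolog
% A code is a digit if it lies between the codes for 0 and 9.
digit_code(C) :- C >= 0'0, C =< 0'9.

count_digits([], 0).
count_digits([C|Cs], N) :-
   count_digits(Cs, N0),
   (  digit_code(C)
   -> N is N0 + 1
   ;  N = N0
   ).

atom_digit_count(Atom, N) :-
   atom_codes(Atom, Codes),
   count_digits(Codes, N).

% ?- atom_digit_count(a1b22, N).   gives N = 3
```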

Character Constants

Because Unicode characters are unsigned integers and are often referred to by their hexadecimal value, character constants are represented by a two-byte hexadecimal code, using 0w to indicate a wide character.

atom_codes/2, string_list/2 and the single character code notation (0'c) all use the character constants. For example:

?- atom_codes(duck, X).
X = [0w0064,0w0075,0w0063,0w006b] 
yes

?- atom_codes(X, [0w64, 0w75, 0w63, 0w6b]).
X = duck 
yes

To create an atom with Japanese characters:

?- atom_codes(X, [0wf900, 0wf901, 0wf902]).

Character constants can be used in arithmetic and will unify with integers.
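
For instance, arithmetic on character codes can convert lowercase ASCII letters to uppercase. This is a sketch under the assumption that only the letters a to z need converting; the predicate names are hypothetical.

```prolog
% Shift a lowercase ASCII code to its uppercase equivalent;
% leave every other code unchanged.
upper_code(C, U) :- C >= 0'a, C =< 0'z, !, U is C - 0'a + 0'A.
upper_code(C, C).

upper_codes([], []).
upper_codes([C|Cs], [U|Us]) :- upper_code(C, U), upper_codes(Cs, Us).

upper_atom(Atom, Upper) :-
   atom_codes(Atom, Codes),
   upper_codes(Codes, UpperCodes),
   atom_codes(Upper, UpperCodes).

% ?- upper_atom(duck, X).   gives the atom 'DUCK'
```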

Comments

Comments may appear anywhere in the source code. They are preceded by a % sign. All text following the percent up to the end of the line is considered part of the comment.

Also, although it is non-standard, Amzi! Prolog allows multi-line comments encased in C-style delimiters, /* and */.
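
Both comment styles in one short sketch; area/3 is a hypothetical predicate.

```prolog
% area(Width, Height, Area): a single-line comment.

/* A multi-line comment can span
   several lines of the source file. */

area(W, H, A) :-
   A is W * H.    % a comment may also follow code on the same line
```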

Operators

Operators provide a way of specifying an alternate input/output syntax for structures with one or two arguments. For example, they allow:

likes(leona, water)
swims(leona)

to be written and read as

leona likes water
leona swims

There are a number of predefined operators in Prolog, such as: +, /, *, -. This is what makes it possible to write 3 + 4 rather than +(3,4).

In order to do this we have to inform Prolog via an operator declaration that a certain name may optionally be used before, after, or in between its arguments; we speak of name as being an operator. Even if name is declared to be an operator, it can still be used in the usual structure notation.

We emphasize that declaring operators only serves to alter the way structures may look on input or output. Once inside the Prolog system, all structures are kept in the same internal form.

If an operator is declared to be used between its two arguments, we say it is an infix operator. If it is to be used before its single argument then it is a prefix operator; if it is to be used after its argument it is a postfix operator. Operators may be declared to be both infix and either prefix or postfix; in this case they are called mixed operators.

Just declaring the "fix" of an operator is not enough however since this can lead to ambiguities. For example suppose that + and - have been declared to be infix operators. Consider:

a + b - c

What is the second argument of +? It might be b, in which case the term is

'-'( '+'(a, b), c)

or it might be the whole term b - c, in which case the term is

'+'(a, '-'(b, c))

These are very different terms so which should Prolog choose?

One way to force an interpretation is to use parentheses. So if we wanted the first interpretation we would write:

(a + b) - c

If we wanted the second we should use:

a + (b - c)

exactly as in high school algebra. However we still wish to agree on a consistent interpretation in the absence of overriding parentheses.

Prolog solves this problem by requiring two extra pieces of information about each operator at the time of its declaration: precedence and associativity.

Precedence

The first piece of information required for each operator (whether pre, in or post -fix) is a number between 1 and 1200 called the precedence of the operator.

When combining different operators together, the principal functor of a term represented by a series of operators is the operator with highest precedence.

For example, suppose + is defined to have precedence 500 and * is defined to have precedence 400. Consider:

a + b * c

We start reading from the left. + has higher precedence, so it must be the principal functor of the constructed term. Therefore the term must be:

'+'(a, '*'(b, c))

This corresponds naturally to the high school algebra rule "do multiplications first".
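
The effect of precedence can be observed by unifying an operator expression with a pattern. A sketch using the predefined + and * operators; split_sum/3 is a hypothetical name.

```prolog
% Succeeds only if the principal functor of Term is +.
split_sum(Term, Left, Right) :- Term = Left + Right.

% ?- split_sum(a + b * c, L, R).    gives L = a, R = b * c
% ?- split_sum(a * b + c, L, R).    gives L = a * b, R = c
% ?- split_sum((a + b) * c, L, R).  fails, the principal functor is *
```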

Associativity

The other piece of information required is the operator's associativity. Not only does this specify the "fix" of the operator but it also handles the ambiguity remaining in operator usage, namely how to handle consecutive operators of the same precedence.

The associativity of an operator can be one of the following atoms:

xfx             yfx             fx              xf
xfy                             fy              yf

where x and y stand for the arguments and f stands for the operator. Thus:

?f?: is an infix operator
?f: is a postfix operator
f?: is a prefix operator

The meaning of x versus y is a little more subtle. x means that the precedence of the argument (i.e., the precedence of the principal functor of the argument) must be less than the precedence of f. y means that the precedence of the corresponding argument may be less than or equal to the precedence of f.

op(Precedence, Associativity, Operator)

op/3 is used to define an operator's Precedence and Associativity. Precedence must be bound to an integer between 0 and 1200. Associativity must be bound to one of the atoms fx, fy, xf, yf, xfx, xfy, yfx, and Operator to either the atom which is to be made an operator or a list of such atoms (in which case all the atoms are given the same specified associativity and precedence).

For example:

?- op(500, yfx, +).

so now:

a + b + c

must be the same as:

(a + b) + c
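
The grouping can be confirmed with ==, which compares terms without evaluating them. This sketch relies on the predefined declarations for - (yfx) and ^ (xfy) listed in this section; the predicate names are hypothetical.

```prolog
% yfx groups to the left: a - b - c is (a - b) - c.
left_assoc  :- (a - b - c) == ((a - b) - c).

% xfy groups to the right: a ^ b ^ c is a ^ (b ^ c).
right_assoc :- (a ^ b ^ c) == (a ^ (b ^ c)).

% The grouping decides the arithmetic result as well.
% ?- X is 10 - 4 - 3.    gives X = 3, not 9
```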

Operators can have at most one infix declaration and one declaration of either pre- or post-fix in force at any time. Subsequent operator declarations supersede earlier ones. For example:

?- op(500, xfy, +).     % + is an infix operator
?- op(1200, fx, +).     % + is now both infix and prefix.
?- op(1200, xf, +).     % .. but is now infix and postfix

The final argument in op may be either a single atom or a list of atoms. In the latter case all the atoms are given the same specified associativity and precedence.
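
As a sketch, the likes and swims notation from the start of this section could be set up as follows. The precedence of 700 is an assumption chosen for illustration, not a value taken from the Amzi! documentation.

```prolog
:- op(700, xfx, likes).    % infix: X likes Y
:- op(700, xf,  swims).    % postfix: X swims

% These facts are read as likes(leona, water) and swims(leona).
leona likes water.
leona swims.

% Operator and structure notation are interchangeable in queries:
% ?- leona likes What.         gives What = water
% ?- likes(leona, water).      succeeds
% ?- swims(Who).               gives Who = leona
```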

current_op(Precedence, Associativity, Operator)

current_op/3 unifies its arguments with the current operator definitions in the system. On backtracking it returns all the operator definitions that unify with the arguments.
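
A sketch of querying the operator table. The helper predicates are hypothetical, findall/3 is assumed to be available, and the results in the comments follow from the table of predefined operators in this section.

```prolog
infix_type(xfx).
infix_type(xfy).
infix_type(yfx).

% Collect every infix operator declared at a given precedence.
infix_ops_at(Precedence, Ops) :-
   findall(Op,
           ( current_op(Precedence, Type, Op), infix_type(Type) ),
           Ops).

% ?- current_op(P, T, mod).   gives P = 400, T = yfx
% ?- current_op(P, T, -).     backtracks through both definitions of -,
%                             infix (500, yfx) and prefix (200, fy)
```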

Predefined Prolog Operators

The following Prolog operators are declared at initialization time. They can be subsequently redefined by using the op predicate (but it's not a good idea because they are used by the Prolog system).

:- (op(1200, xfx, [:-, -->])).
:- (op(1200, fx, [?-, :-])).
:- (op(1100, fx, [import, export, dynamic, multifile, discontiguous, sorted,
                 indexed])).
:- (op(1100, xfy, ';')).
:- (op(1050, xfy, ->)).
:- (op(1000, xfy, ',')).
:- (op(900, fy, [\+, not, once])).
:- (op(900, fy, [?, bug])).
:- (op(700, xfx, [=, \=, is, =.., ==, \==, =:=, ~=, =\=, <, >, =<, >=, @<,
                 @>, @=<, @>=])).
:- (op(600, xfy, :)).
:- (op(500, yfx, [+, -, /\, \/, xor])).       % moved \ here from 900 fy
:- (op(400, yfx, [rem, mod, divs, mods, divu, modu])). % new ones and mod
:- (op(400, yfx, [/, //, *, >>, <<])).
:- (op(200, xfx, **)).
:- (op(200, xfy, ^)).
:- (op(200, fy, [+, -, \])).