Definite Clause Grammars (DCGs)

Definite Clause Grammars (DCGs) are convenient ways to represent grammatical relationships for various parsing applications. They can be used for natural language work, for creating formal command and programming languages.

For example, DCG is an excellent tool for parsing and creating tagged input and output streams, such as HTML or XML. The index and table of contents in this documentation are generated by a Prolog program that uses DCG to parse the HTML, looking for headings and index entries.

See Adventure in Prolog or Programming in Prolog by Clocksin and Mellish for tutorial information on DCGs and the difference lists used to implement them.

Parsing with Difference Lists

Parsing is analyzing an input stream. Difference lists are powerful tools for parsing applications, in which the input stream is represented by difference lists.

Difference lists are pairs of lists used to represent the list of elements (tokens, words, character codes, ...) being parsed. The actual list being represented is the 'difference' between the first list and the second list. For example, [the, cat] might be represented by the two lists [the, cat, chases, the, mouse] and [chases, the, mouse]. Or more generally, [the, cat | X] and X.

In parsing applications, the first list contains the elements to be parsed. Different parsing predicates find what they are looking for at the front of that first list, and unify the second list with what's left to parse.

In the example sentence above, a predicate looking for a subject ('the cat') at the beginning of a sentence, would find it and return the rest of the sentence ('chases the mouse'). That could then be fed into another predicate that looks for a verb, finds it and returns what's left. This continues until there is nothing left.

In other words, the difference lists are chained together in the parsing application.

The full sentence would be represented by the full list and an empty list. That is, the difference is the full first list. So the difference lists for our test sentence are [the, cat, chases, the, mouse] and [].

A natural language parsing application tests to see if a sentence is grammatically correct. It does this using difference lists to chain together various grammatical catagories. Here is a simple example:

Each of the subgoals, also links its difference lists together. For example:

And the 'terminal' case is when there are one or more specfic elements to be identified at the head of the first list. For example, 'terminals' for the simple natural language parser would be the words in the allowed vocabulary:

In each case, the word is picked off the front of the first list, and the tail unified with the second list.

Experimenting with the above predicates:

Generating with Difference Lists

In the previous section, we used the grammar rules written with difference lists to test if a sentence is a grammatically correct sentence.

Difference lists can also be used to generate grammatically correct sentences from the given vocabulary. In this case, the first list doesn't have the list to be parsed, but instead has a variable, which will be unified with the generated list.

This behavior can be seen by experimenting with the test grammar above.

For example:

DCG Syntax

Definite Clause Grammar (DCG) is a Prolog preprocessor that takes DCG grammar rules, and adds linked difference lists to the goals.

DCG provides a syntax for writing more readable grammar parsing rules, without including the linked difference lists. The DCG preprocessor takes the DCG rule and translates it into pure Prolog, adding the linked difference lists.

The syntax of DCG is:

Using DCG notation, the sentence parsing rules of the previous section can be written:

They can be consulted and used exactly as the difference list version. That is, each of the rules has the two extra list arguments added to it, and the terminal cases are appropriately translated as well. Note that, to call a DCG predicate from a non-DCG predicate, you must add the two hidden list arguments.

You can see the result of the DCG preprocessor using listing/1:

Notice the generated linked difference lists.

When debugging DCG applications, the debugger will be using the translated versions of the rules, with the difference lists.

The DCG preprocessor is a built-in predicate in Prolog, which is implemented in Prolog. It is called automatically when clauses are consulted or compiled. It can also be called directly.

expand_term(DCGclause, PROLOGclause)

The predicate that translates a DCG statement into conventional Prolog is expand_term/2. It is called automatically when a clause is consulted, reconsulted, or compiled. But it can be called directly as well. The source code is also available in the sample dcgxpand.pro.

The first argument is a DCG clause, the second is the generated Prolog clause.

For example:

Applications of DCG and Difference Lists

If you consider grammar rules as being rules defining structural relationships, then difference lists can be used in any application that is either analyzing (parsing) or generating based on rules for structural relationships.

Language Translation

We can have our grammar for sentences generate a parse tree, which can be thought of as representing universal concepts of language, such as all(?) languages have the concepts of subject, verb and object, although they are expressed in different orders.

Here's the English sentences, wrapped in a module, and generating a parse tree.

Trying it:

We can write a similar module for Spanish. Note that modifiers and nouns must agree in gender in Spanish, and an extra argument in the subject and object rules enforces that restriction.

We can now write a generic translate predicate, that uses the modules for different languages.

And try it:

Bill of Materials

In Programming in Prolog Clocksin and Mellish describe the use of difference lists for generating a list of the required basic parts for building something. This is the classic bill-of-materials application, where the rules for the relationships between components can be thought of as grammar rules. The application can be expressed using DCG.

Trying it we can find the parts we need to build a bicycle:

Command Language

DCG can be used to create a command language for an application. In this example, there is a simple airline flight reservation system in Prolog, that accepts commands to list planes, book a passenger on flight, or list passengers on a flight.

Trying it:

XML

XML uses tags in an input stream to define structure. It is an excellent way to pass information between applications and across networks. The XML library uses DCG to translate between XML and Prolog terms in general. It is easy to create application-specific XML translators as well.

An XML parser/generator works on a character level, rather than a word level. Here is some DCG that recognizes tags, in angle brackets, and makes atoms out of characters that form words. Note that double quotes denotes a list of character codes.

Note also that we need two DCG rules for word/3, one for parsing, one for generating, so that the correct bindings are used in atom_codes/2.

Trying it:

Consider the flight reservation application in the previous example. It might be an embedded component in a Logic Server application designed for network use. In that case, an XML front end would have make it easy to integrate with other network components. It would have only one entry point, which would take as input an XML string, and return as output an XML string.

Tags for input include command (com), operation (op), and arguments (arg). Tags for output are a list (list) and items (item).

Here is a different version of the flight reservation application, designed for XML input and output.

Given that its awkward to type XML, here is a predicate that tests the program, using the same inputs as in the previous non-XML example.

Trying it:

DCG Terminals

In the examples above we showed DCG terminals being generated as difference lists. This is actually done in two steps. The DCG preprocessor first translates the rule to call a built-in predicate, dcg$terminal/3. It then generates the difference lists.

For example:

It is possible to override this normal DCG behavior on terminal handling. dcg$terminal/3 first looks for an optional user written override of terminal handling, by calling dcg_terminal/3. If it exists, it is used instead.

By creating a custom dcg_terminal/3, you can generate the terminals in a different way, and the two parts of the difference list can be of any structure you might like. This is because the difference list arguments are simply passed down, and are only actually used when processing terminals.

One practical application of this is for character parsing of large files. It is not practical to read a large file into memory as a Prolog list of characters, but it is possible to read it in as a list of strings representing the lines in the file.

If the difference lists input to the DCG are the list of strings representing lines, then a customized version of dcg_terminal/3 can be used to convert just the line being worked on into a list of characters.

This is the approach used by the program that parses the HTML files of this documentation, looking for tags to create index entries. It uses a colon to separate the list of characters currently being used, and the list of lines. Whenever there are not enough characters to satisfy a terminal, more are added from the next line.

The main DCG predicate is called parse_html in that application, and it is called with difference lists representing the lines in the input file.

The actual DCG parsing rules are all written exactly as they normally would.

Parsing Files

The internal dcg_terminal recognizes the case of a file parsed as a list of strings, which is to be parsed as a list of characters. That is, if you want to parse a file as a character list you don't need to create a custom dcg_terminal.

To use this format, you first read the file in as a list of strings, and store each line in the structure: line(LineNumber, String). The following code can be used to do that.

To use the list of lines in a grammer, make the call to the top predicate of the grammar using the format file(FileName, [], Lines). Here is an example from a program that reads Prolog clauses from a file.

In your parsing predicates, you can use a trick to get the current line number. You write a predicate, get_current_line/3 that you expand yourself, rather than letting the DCG do it. It makes use of the fact that the second argument to the file/3 structure is a list of characters of the line currently being parsed.

Here is an example of it in use.

Copyright ©1987-2011 Amzi! inc. All Rights Reserved. Amzi! is a registered trademark and Logic Server is a trademark of Amzi! inc.