Copyright © 1999-2001 by the Xerox Corporation and Copyright © 2002-2007 by the Palo Alto Research Center. All rights reserved.
Packed rewriting is significant, since an ambiguous sentence gives rise to more than one structure and it is possible for the number of structures to be in the hundreds or even thousands. However, the set of structures for a given sentence is packed into a single representation making it possible for common parts to appear only once. Thanks to the fact that the structures are packed, common parts of alternative structures can often be operated on by the transfer system as a unit, the results being reflected in each of the alternatives.
The transfer system operates on a source f-structure, represented as a set of (transfer) facts, to transform it incrementally into a target structure. The operation is controlled by a transfer grammar, which consists of a list of rules whose order is important because each rule has the potential of changing the situation that the subsequent rules will encounter. In this respect, the rules are like phonological rules of the Chomsky/Halle variety. In particular, rules can prevent later rules from applying by removing material that they would otherwise have applied to (bleeding), or they can enable the application of later rules by introducing material that they need (feeding). This is different from other systems that have been proposed for transfer, which build new structures based on observation, but not modification, of existing ones. In this system, as the process continues, the initial source-language structure takes on more and more of the properties of a target-language structure. Source facts that no rule in the sequence applies to simply become part of the target structure. Notice that it follows from what we have said that the transfer process never fails. Even if not a single rule applied, the output would simply be identical to the input. (See note).
To introduce the notation for writing transfer rules, we will consider a fairly simple example. For an even simpler example, see the XLE transfer walkthrough. More advanced transfer rule constructs are discussed in a later section. (Note: The rule notation was changed in June 2004 to remove some of the irritations of the original prolog syntax. The system can still use rules in the old prolog syntax, which is documented here.)
Suppose that we wanted to translate the English sentence Mary sleeps into the French Marie dort. We would first parse the English sentence to get an English f-structure. Since transfer rules do not operate directly on f-structures, this must first be converted to a set of transfer facts, which are then provided as input to the transfer rules. The input facts are then converted to output transfer facts by the transfer system. The output transfer facts are then converted into a (French) f-structure, which can be fed into the French generator to produce a sentence.
The first thing to consider is the nature of transfer facts, and then how to convert f-structures to transfer facts. Transfer facts typically consist of a predicate, an opening parenthesis, a comma-separated list of arguments, and a closing parenthesis, e.g.
predicate(argument1, argument2, argument3)
It is also possible to have atomic facts with no arguments, e.g.
Predicates should be atomic symbols, containing no commas, periods or parentheses. For reasons that will become clear below, predicates should not begin with the characters +, -, @, *, or %. Arguments can be either atomic or non-atomic (i.e. embedded predicates), and may begin with +, -, @, or *. Atoms should not contain whitespace or any of the following characters:
. , ; | "
Should you need to include such characters in an atom, you can escape them by preceding them with a backquote (`). To include a backquote in an atom, escape it as well (i.e. ``).
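The escaping convention can be illustrated with a small Python helper. This is only a sketch: the function name escape_atom is hypothetical, and it assumes that whitespace is escaped in the same way as the listed characters, which the text implies but does not state outright.

```python
# Characters listed in the text as needing an escape, plus the backquote
# itself. Treating whitespace the same way is an assumption of this sketch.
SPECIAL = set('.,;|"`')

def escape_atom(text):
    """Prefix each special or whitespace character with a backquote."""
    out = []
    for ch in text:
        if ch in SPECIAL or ch.isspace():
            out.append("`")
        out.append(ch)
    return "".join(out)

print(escape_atom("U.S."))
print(escape_atom("back`quote"))
```

The helper only produces the escaped spelling of an atom; it makes no claim about how the transfer system itself reads escaped atoms back in.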
The following is the corresponding list of transfer facts:
PRED(var(19),sleep), The outermost structure
ANIM(var(2),+), <SUBJ>
MOOD(var(3),indicative), <TNS-ASP>
PROPER(var(4),name) <SUBJ, NTYPE>
To see how these facts correspond to the f-structure, it is first necessary to grasp the convention lying behind the use of var(n) arguments. These are to be interpreted as standing for f-structure nodes / indices. Thus the outermost node/index labeled 19 in the f-structure is represented by var(19), and the value of the SUBJ attribute labeled as 2 is represented by var(2). There are two other f-structure nodes that do not receive explicit labels in the graphical representation, namely the values of the TNS-ASP and NTYPE attributes. These are assigned the indices var(3) and var(4) in the transfer facts. (Note: Unfortunately, in practice the numeric labels assigned to the graphical f-structure representations rarely if ever correspond so directly to the numeric values assigned to the indices in the transfer facts. Thus label 19 might map onto a transfer index var(0) and label 2 might map onto an index var(1). This mismatch is the result of a similar labeling mismatch between graphical and prolog representations of f-structures. For ease of exposition, in this example we assume that 19 maps onto var(19), and so on.)
Looking at all the facts with var(19) as their first argument, we can see that most of them correspond directly to attribute value pairs in the f-structure. The fact that f-structure 2 is the value of the SUBJ attribute of 19 in the f-structure is represented as
SUBJ(var(19), var(2))
The fact that the value of the PASSIVE attribute of 19 is - is represented as
PASSIVE(var(19), -)
The fact that the value of the TNS-ASP attribute of 19 is a complex structure (indexed var(3)) is represented as
TNS-ASP(var(19), var(3))
The PRED, arg and lex_id facts do not each correspond to an attribute-value pair. Instead, they decompose the single PRED-SemanticForm attribute-value pair into individual components. A semantic form consists of four elements: the predicate (e.g. sleep); a unique semform identifier (not explicitly shown in the graphical f-structure representation, but present nonetheless); an ordered sequence of thematic arguments; and a (possibly empty) sequence of non-thematic arguments. The attribute-value pair for 19 is
PRED sleep'<[2:Mary]>'
In the prolog f-structure file that the parser writes and which provides input to the transfer system, this would be written as
eq(attr(var(19), 'PRED'), semform(sleep, 3, [var(2)], []))
where the four arguments to the semform are the predicate, the semform id, the list of thematic arguments, and the (empty) list of non-thematic arguments. This gets broken down into three transfer facts,
PRED(var(19), sleep)
says that the PRED FN of 19 is sleep.
lex_id(var(19), 3)
says that the semform id of 19's semform is 3.
arg(var(19), 1, var(2))
says that the first thematic argument of 19's semform is 2. Note that this fact has three arguments rather than the usual two: thematic arguments are numbered because a given semform may have more than one.
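The decomposition just described can be sketched in Python. The function name semform_to_facts and the string encoding of facts are assumptions made for illustration, and numbering nonarg facts in the same way as arg facts is likewise an assumption; this is not the converter that the system itself uses.

```python
def semform_to_facts(node, pred, semform_id, args, nonargs):
    """Return the transfer facts for one PRED semantic form, as strings."""
    facts = [f"PRED({node}, {pred})", f"lex_id({node}, {semform_id})"]
    # Thematic arguments are numbered from 1 in the order they appear.
    facts += [f"arg({node}, {i}, {a})" for i, a in enumerate(args, start=1)]
    # Assumption: non-thematic arguments are numbered in the same way.
    facts += [f"nonarg({node}, {i}, {a})" for i, a in enumerate(nonargs, start=1)]
    return facts

for fact in semform_to_facts("var(19)", "sleep", 3, ["var(2)"], []):
    print(fact)
```

Called on the semform for Mary sleeps, the sketch yields the same three facts as listed above; an empty argument list simply contributes no facts.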
Two more complicated examples of semantic forms and their corresponding transfer facts follow. The f-structure of the sentence "Place the scanner on the table" contains the semantic form
PRED 'place<[9:pro], [35:scanner], [58:on]>'
In the prolog f-structure file that the parser writes, this appears as (assuming no renumbering of indices):
cf(1,eq(attr(var(0),'PRED'), semform(place,1,[var(9),var(35),var(58)],[])))
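In the transfer system, this is represented as follows:
PRED(var(0), place)
lex_id(var(0), 1)
arg(var(0), 1, var(9))
arg(var(0), 2, var(35))
arg(var(0), 3, var(58))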
The three thematic arguments each give rise to a separate arg fact, with the numbers 1, 2 and 3 indicating which are the first, second and third argument.
The main clause of the sentence "To replace the print head, you need to perform maintenance tasks." has one argument and one non-argument, as follows:
PRED(var(0),need),
Here, nonarg is used to indicate the (first and only) non-thematic argument. Semantic forms without any thematic and/or non-thematic arguments are indicated by the absence of any arg and/or nonarg facts.
Notice that the ADJUNCT of the OBJ of the sentence is a structure enclosed in curly brackets, the standard representation for sets. In the transfer system, sets are represented by the in_set predicate. For the above example, it would be like this:
We have seen the transfer facts that are derived from the f-structure for the sentence Mary sleeps. For reference in the following example, once again these facts are
We now consider rules for rewriting these facts. In this section we only discuss the basic constructs of transfer rules. More advanced topics are discussed later.
Let us jump in, and look at the following (somewhat contrived) transfer grammar:
"PRS (1.0)"

"Rule file must begin with this line in order to pick up
 current definitions of the transfer rule syntax"

"Comments are enclosed between double quotes"

"Give the transfer rule set a name:"

ruleset = simple_example.

"----------------------------------------------------
 Rule 1: Rewrite the verbal pred sleep to dormir
 ----------------------------------------------------"

PRED(%X, sleep), +VTYPE(%X, main) ==> PRED(%X, dormir).

"----------------------------------------------------------
 Rule 2: Delete the progressive attribute if present tense
 ----------------------------------------------------------"

+TENSE(%X, pres), PROG(%X, %%) ==> 0.

"-------------------------------------------------------------
 Rule 3: Remove the indicative mood attribute for declarative
 statements, and replace declarative by decl
 -------------------------------------------------------------"

STMT-TYPE(%X,declarative), +TNS-ASP(%X,%TA), MOOD(%TA,indicative)
==> STMT-TYPE(%X,decl).

"-----------------------------------------------------------
 Rules 4 and 5: Rewrite Mary to either Marie or Maria
 -----------------------------------------------------------"

PRED(%X,Mary) ?=> PRED(%X, Marie).
PRED(%X,Mary) ==> PRED(%X, Maria).
The first thing that a grammar must do is declare which rule syntax it is assuming. This is done by making the first non-blank line of the rule file the comment "PRS (1.0)", which stands for Packed Rewrite Syntax Version 1.0. If the first non-blank line does not specify which version of the syntax to use, the system will assume that the old style prolog syntax is being used.
Once the version of the syntax has been specified, the grammar must be given a name. This allows the system to have multiple sets of rules, with different grammar names, loaded at one time. The rules are named at the top of the file by the statement
ruleset = name.
It is also possible to use the older form of the declaration
grammar = name.
(The names fs_triples, sem_triples and kr_triples are reserved to identify system-defined rules that map f-structures, semantic representations and KR representations, respectively, to a triples format used in running regression tests. You should not use these names unless you intend to alter the behavior of test suite comparisons). Note that this statement must be terminated by a period, as must the rules that follow it.
The first rule rewrites the English semantic predicate "sleep" as the French predicate "dormir". There are a number of things to notice about this rule. The first is that it contains variables. A variable is written as an atom that starts with a percent sign (%). Thus %X, %TA and %% from the rules provided above are all variables. Variables are used to match arguments, or parts of arguments, in transfer facts. Thus the rule pattern PRED(%X, sleep) matches the transfer fact PRED(var(19), sleep), setting the variable %X to the value var(19). All other occurrences of the variable %X in the same rule will get instantiated to the value var(19) as a result of the match. (The scope of a variable is limited to a single rule: occurrences of the same variable in different rules are not linked, and instantiating a variable in one rule will not affect any of the variables with the same name in other rules.)
The rule itself consists of a lefthand side, a rewrite arrow, and a righthand side. The lefthand side is a comma-separated sequence of patterns that are intended to be matched against individual transfer facts. The rewrite arrow can be either ==> (obligatory rewrite) or ?=> (optional rewrite). For an obligatory rule, if all the patterns on the lefthand side can be matched against facts, then the (instantiated) patterns on the righthand side must be added to the set of transfer facts. For an optional rule, there is a choice: either apply the rule or not. This choice has the effect of forking the transfer rewriting process along two separate and independent paths. We will only consider the effects of optional rules when we discuss rules 4 and 5 from the example above.
To see how rule 1 operates, let us match the first pattern on the left hand side with the transfer fact PRED(var(19), sleep). This has the effect of instantiating the rest of the rule as shown (with the already matched pattern shown italicized)
PRED(%X, sleep), +VTYPE(%X, main) ==> PRED(%X,dormir)
PRED(var(19), sleep)
to give
PRED(var(19), sleep), +VTYPE(var(19), main) ==> PRED(var(19), dormir)
Note how all occurrences of %X are instantiated to var(19).
The second pattern on the lefthand side of rule 1 is now an exact match with the input fact VTYPE(var(19), main). The significance of the + sign preceding the pattern is as follows. Normally, when a fact matches a pattern on the lefthand side of a rule, that fact is "consumed" in the sense that it is removed from the set of transfer facts. (Or rather, it is removed once all the other patterns on the lefthand side have been successfully matched and the rule is applied.) Thus by matching the fact PRED(var(19), sleep) against the pattern in the rule, this fact is taken out of the input set of transfer facts. The + sign preceding a pattern means match against a fact, but don't consume it. That is, retain it in the set of transfer facts. So, after matching the second pattern, we have a complete match of the lefthand side, and know that application of the rule will remove PRED(var(19), sleep) but keep VTYPE(var(19), main) in the set of facts.
The righthand side of the matched rule now serves as an instruction to add a new transfer fact to the set of facts, PRED(var(19), dormir). Thus, if we look at the input facts affected by this rule
PRED(var(19), sleep), VTYPE(var(19), main)
we can see that after applying the rule the relevant output facts are
PRED(var(19), dormir), VTYPE(var(19), main)
That is, we have removed PRED(var(19), sleep) and replaced it by PRED(var(19),dormir).
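The bookkeeping for Rule 1 can be mimicked in a few lines of Python. This is only an illustrative sketch, not how the transfer system is implemented: the tuple encoding of facts, the (keep, pattern) pairs standing in for the + prefix, and the function names match and apply_rule are all assumptions made for the example, and the sketch rewrites only the first match it finds.

```python
def match(pattern, fact, bindings):
    """Match one pattern against one fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if p == "%%":                     # anonymous variable: matches anything
            continue
        if p.startswith("%"):             # ordinary variable: bind or compare
            if new.setdefault(p, f) != f:
                return None
        elif p != f:                      # constants must be identical
            return None
    return new

def apply_rule(lhs, rhs, facts):
    """Apply an obligatory rule to the first match; return the new fact list.

    lhs is a list of (keep, pattern) pairs; keep=True plays the role of '+'.
    rhs is a list of patterns to instantiate; an empty list stands for 0.
    """
    def search(i, bindings, matched):
        if i == len(lhs):
            return bindings, matched
        keep, pattern = lhs[i]
        for fact in facts:
            extended = match(pattern, fact, bindings)
            if extended is not None:
                found = search(i + 1, extended, matched + [(keep, fact)])
                if found is not None:
                    return found
        return None

    found = search(0, {}, [])
    if found is None:
        return list(facts)                # no match: facts pass through unchanged
    bindings, matched = found
    consumed = {fact for keep, fact in matched if not keep}
    added = [tuple(bindings.get(t, t) for t in pattern) for pattern in rhs]
    return [f for f in facts if f not in consumed] + added

facts = [("PRED", "var(19)", "sleep"), ("VTYPE", "var(19)", "main")]
rule1_lhs = [(False, ("PRED", "%X", "sleep")), (True, ("VTYPE", "%X", "main"))]
rule1_rhs = [("PRED", "%X", "dormir")]
print(apply_rule(rule1_lhs, rule1_rhs, facts))
```

Running the sketch on the two input facts leaves VTYPE(var(19), main) in place and swaps the sleep fact for a dormir fact, mirroring the walkthrough above. A rule whose lefthand side does not match returns the facts untouched, which is the sense in which transfer never fails.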
The modified set of transfer facts obtained from running Rule 1 serves as input to Rule 2 (repeated below for convenience):
+TENSE(%X, pres), PROG(%X, %%) ==> 0.
The first pattern matches the fact TENSE(var(3), pres). The second pattern, now instantiated to PROG(var(3), %%), matches the fact PROG(var(3), -). The %% is an anonymous variable. This matches in the same way as an ordinary variable, but does not lead to any instantiation of the variable. Thus multiple occurrences of %% in a rule can match with different items. The effect of the pattern in this rule is to find the progressive attribute for var(3) without caring about the value of the attribute represented as %%.
The use of anonymous variables to represent "don't care" values is strongly encouraged. When the rule compiler finds an instance of a normal variable that has just a single occurrence in a rule, it issues a warning message naming the offending singleton variable. If you are consistent about the use of anonymous variables to represent "don't care" values, these singleton variable warning messages are a useful way of detecting possible typos in your rules. A very common mistake is mistyping variable names (e.g. %SUBJ and %Subj) when they are both intended to refer to the same item. Such typos usually result in at least one of the variables being a singleton, and the warning message will alert you to this. However, if you consistently use singleton non-anonymous variables for "don't care" values, messages that might alert you to the presence of typos will get drowned out in a flood of innocuous warnings.
A variable name beginning with a double percent (e.g. %%temp) is a non-anonymous singleton variable. The rule compiler will not complain about singleton occurrences of such non-anonymous singletons. But multiple occurrences of these variables within a rule are treated as linked.
The righthand side of the second rule, 0, is how we say that no new facts are to be added as the result of applying the rule. Thus, applying the rule means that we match and keep
TENSE(var(3),pres)
but match and discard
PROG(var(3), -)
In other words, the rule removes the single fact PROG(var(3), -) from the list of transfer facts, and passes the updated set of transfer facts on as input to the next rule.
Rule 3 illustrates the use of an intermediate variable, %TA, to link structures.
STMT-TYPE(%X, declarative), +TNS-ASP(%X, %TA), MOOD(%TA, indicative) ==> STMT-TYPE(%X, decl).
The rule is meant to apply to declarative statements with indicative mood, as determined by checking whether %X has a STMT-TYPE attribute with the value 'declarative' and a TNS-ASP attribute that in turn has a MOOD attribute with the value 'indicative'. The variable %TA is used to link the mood to the statement type via the intermediate TNS-ASP structure. The effect of the rule is two-fold: it removes the MOOD attribute if it is indicative from the TNS-ASP of any declarative statement (while keeping the rest of TNS-ASP in place thanks to the + preceding the pattern), and it reformats "declarative" as "decl".
This is one of those cases where misspelling %TA would be a problem. If we had accidentally broken the link between variables by writing
STMT-TYPE(%X, declarative), +TNS-ASP(%X, %TA), MOOD(%T_A, indicative) ==> STMT-TYPE(%X, decl)
the rule would depart from the original intention of affecting only declarative statements with indicative mood. Instead, it would remove all indicative mood attributes, regardless of whether they belong to declarative statements, so long as there is a declarative statement type somewhere in the sentence (which would be rewritten as 'decl'). However, this version of the rule would cause a warning about %TA and %T_A being singleton variables, which would alert us to the link between variables being broken by a misspelling.
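The kind of check that produces such a warning can be approximated in Python. This is a rough sketch under stated assumptions, not the rule compiler's actual logic: the function name singleton_variables is hypothetical, variables are found with a simple regular expression, and anything starting with a double percent is exempt, following the description of %% and %%temp given earlier.

```python
import re
from collections import Counter

def singleton_variables(rule):
    """Return the ordinary variables that occur exactly once in a rule."""
    # A variable is '%' or '%%' followed by optional name characters.
    names = re.findall(r"%%?\w*", rule)
    counts = Counter(names)
    return sorted(v for v, n in counts.items()
                  if n == 1 and not v.startswith("%%"))

broken = ("STMT-TYPE(%X, declarative), +TNS-ASP(%X, %TA), "
          "MOOD(%T_A, indicative) ==> STMT-TYPE(%X, decl)")
print(singleton_variables(broken))
```

On the misspelled rule the sketch reports both %TA and %T_A, while the correctly spelled rule and Rule 2 (whose only single-occurrence variable is the anonymous %%) produce no report.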
Rules 4 and 5 operate as a pair, stating that "Mary" can translate as either "Marie" or "Maria":
PRED(%X,Mary) ?=> PRED(%X, Marie).
PRED(%X,Mary) ==> PRED(%X, Maria).
The first rule is optional, and says that we have a choice of replacing "Mary" by "Marie" (when we apply the rule), or leaving "Mary" as it is (when we don't apply the rule). The second rule is obligatory, and says that we must replace "Mary" by "Maria". However, the second rule will only match when we opt not to apply the first, optional, rule. This is because applying the first rule will remove the fact PRED(var(2),Mary) from the list of facts passed on as input to the next rule. And with this fact removed, the second rule no longer matches anything. But if we do not apply the first rule, then the fact PRED(var(2),Mary) remains in the list of facts, and the next rule obligatorily rewrites "Mary" to "Maria".
If we wanted to have three options for translating "Mary": "Marie", "Maria" or leave it as "Mary", then we would use the following
PRED(%X, Mary) ?=> PRED(%X, Marie).
PRED(%X, Mary) ?=> PRED(%X, Maria).
The effect of making the second rule optional is to further split the possibility space. The first rule splits the possibilities in two: either leave "Mary" in place or replace it with "Marie". Under the first option, where "Mary" is left in place, the second rule provides a second split: either continue to leave "Mary" in place or replace it with "Maria". This gives three possibilities in total.
     | Rule 4    | Rule 5 (optional) |
Mary | --> Marie |                   |
Mary | --> Mary  | --> Maria         |
     |           | --> Mary          |
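The three-way split can be checked by brute force in Python. This is a toy sketch: the function rewrite_mary is hypothetical and hard-codes the two optional rules, with a pair of booleans standing for the decision to apply each rule when it matches.

```python
from itertools import product

def rewrite_mary(pred, choices):
    """Run the two optional rules in order; choices says whether each fires."""
    apply_first, apply_second = choices
    if pred == "Mary" and apply_first:     # Mary ?=> Marie
        pred = "Marie"
    if pred == "Mary" and apply_second:    # Mary ?=> Maria
        pred = "Maria"
    return pred

# Four combinations of choices, but only three distinct outcomes, because
# the second rule has nothing to match once the first has consumed "Mary".
outcomes = {rewrite_mary("Mary", c) for c in product([True, False], repeat=2)}
print(sorted(outcomes))
```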
(Note: You don't need to understand this section on contexted rewriting in order to write transfer rules. But it will probably help, and it will also assist you in interpreting the output of the transfer system, especially when it is being run in debug mode.)
Optional rules, as we have just seen, have the effect of splitting up the space of transfer alternatives. The transfer system is able to handle these splits efficiently (so that the alternatives do not multiply out) by making use of contexted rewriting. Up to now, it has deliberately not been mentioned that transfer facts are contexted. This is because in our simple example, all the input facts existed in a single, true context. However, with the application of optional rules, we now have some facts that only exist in the context where the optional rule is applied, some facts that only exist in the context where the rule is not applied, and other facts that exist in either context. At the start of applying rule 4, we have the fact PRED(var(2),Mary) existing in the single true context, alongside all the other input facts. We can represent this as follows (where 1 stands for the true context):
cf(1, PRED(var(2),Mary))
cf(1, GEND(var(2), fem))
....
After matching this fact against the lefthand side of rule 4, the single true context gets split into two disjoint sub-contexts (call them A1 and A2). In context A1 the rule is applied, and the fact PRED(var(2),Mary) is replaced by PRED(var(2),Marie). In context A2 the rule is not applied, so we keep PRED(var(2),Mary) but do not add PRED(var(2),Marie). However, all the other facts are unaffected, and continue on in the true context, which covers both A1 and A2. The transfer system represents this situation in two parts, as follows. First, it records that the optional rule has affected the space of possible choices, so that the single unitary choice has now been split into two disjoint parts. This corresponds to an equivalence
1 <-> xor(A1, A2)
which says that the true context is logically equivalent to the disjunction of the two pairwise disjoint subcontexts, A1 and A2. Secondly, the system records which facts hold under which contexts, thus
cf(A1, PRED(var(2),Marie))
cf(A2, PRED(var(2),Mary))
cf(1, GEND(var(2), fem))
These contexted facts provide input to the next rule. Let us assume that this is the optional version of the rule that rewrites "Mary" to "Maria". We can see that the lefthand side of the rule only matches with a fact that holds in the A2 context. This means that under context A2, and A2 only, we have the option of applying the rule or not. This splits context A2 into two disjoint parts (call them B1 and B2). In context B1 the second rule is applied, and "Mary" is replaced by "Maria". In context B2 the rule is not applied, and "Mary" remains in place. The resulting choice space and set of contexted facts is now
1 <-> xor(A1, A2)
A2 <-> xor(B1, B2)
cf(A1, PRED(var(2),Marie))
cf(B1, PRED(var(2),Maria))
cf(B2, PRED(var(2),Mary))
cf(1, GEND(var(2), fem))
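To see what this packed representation stands for, the following Python sketch unpacks the choice space into its separate alternatives. The dictionary encoding of the xor equivalences and the function names are assumptions made for illustration, and the sketch only handles the simple case where each context is split at most once; it does not reflect how the transfer system stores or processes contexts.

```python
# Choice space: each context on the left is split into the disjoint
# alternatives on the right ("1" is the true context).
splits = {"1": ["A1", "A2"], "A2": ["B1", "B2"]}

contexted_facts = [
    ("A1", "PRED(var(2),Marie)"),
    ("B1", "PRED(var(2),Maria)"),
    ("B2", "PRED(var(2),Mary)"),
    ("1", "GEND(var(2), fem)"),
]

def complete_choices(context="1"):
    """Yield each consistent set of contexts, following splits downwards."""
    if context not in splits:
        yield {context}
        return
    for alternative in splits[context]:
        for rest in complete_choices(alternative):
            yield {context} | rest

def unpack():
    """Return the uncontexted fact list for every complete choice."""
    return [[fact for ctx, fact in contexted_facts if ctx in chosen]
            for chosen in complete_choices()]

for facts in unpack():
    print(facts)
```

The sketch produces three fact lists, one each for Marie, Maria and Mary, and the GEND fact from the true context appears in all of them although it is stored only once in the packed form.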
Hopefully, this brief description of contexted rewriting will help in understanding the effects of optional rules. However, optional rules do not conceptually depend on having contexted rewriting. You can also think about them in uncontexted terms. Thus whenever you match an optional rule, the output facts are split into two separate and complete copies, one where the rule is applied and the other where it is not. Independent, non-interacting rewriting processes apply to each copy and continue in parallel, perhaps splitting further as other optional rules are encountered. Thus all transfer rules should be explicable in terms of what they do when applied to a single, uncontexted set of facts. Put another way, whether the inputs to your rules are going to be contexted or uncontexted should make no difference to the way that you write these rules.
This last point is important when thinking about how to write rules that can manipulate packed f-structure input. It says that you should not be thinking about this in the first place! Just write the rules so that they apply correctly to any single, unambiguous f-structure that can be unpacked from the f-structure chart. The transfer system implementation in terms of contexted rewriting will then take care that these rules do the right thing when applied to the packed chart. This also means that it is safe (and a lot easier!) to develop and debug rules by applying them only to single selected (i.e., unpacked) f-structures.
There are two devices for notational abbreviation that can make transfer rules clearer and easier to write: templates and macros. Templates are parameterized abbreviations for entire rules, or even sequences of rules. Macros are parameterized abbreviations for commonly occurring sequences of patterns.
To illustrate the use of templates, consider the following (simplified) rule for translating a pair of nominal preds:
PRED(%X, man), +NTYPE(%X, %%) ==> PRED(%X,homme).
The above transfer rule checks that the "man" PRED has an NTYPE attribute, so that it doesn't accidentally apply to instances of the verb "to man" (e.g., Man the barricades!). It then replaces "man" with "homme". This is a commonly occurring pattern; since many English nouns are homonymous with verbs, it is necessary to do an NTYPE check when translating them. However, it would be tedious and error prone to write different copies of the above rule over and over again for different noun-noun pairs. So instead, we can define a template as follows:
noun_noun(%English, %French) :: PRED(%X, %English), +NTYPE(%X, %%) ==> PRED(%X, %French).
We can then write a series of highly abbreviated rules by calling this template with different values for the %English and %French parameters:
@noun_noun(man, homme).
@noun_noun(woman, femme).
@noun_noun(girl, fille).
....
When the templates are called during rule compilation, these rules will expand out to
PRED(%X, man), +NTYPE(%X,%%) ==> PRED(%X, homme).
PRED(%X, woman), +NTYPE(%X,%%) ==> PRED(%X, femme).
PRED(%X, girl), +NTYPE(%X,%%) ==> PRED(%X, fille).
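Template expansion of this kind is essentially parameter substitution, which can be sketched in Python as plain text replacement. The function name expand_template is hypothetical, and substituting on the raw text of the rule is a simplification chosen for the example; the rule compiler presumably works on parsed rules rather than strings.

```python
def expand_template(body, params, args):
    """Substitute call arguments for template parameters in a rule body."""
    if len(params) != len(args):
        raise ValueError("wrong number of template arguments")
    # Replace longer parameter names first, so that a short name can
    # never clobber the beginning of a longer one.
    for param, arg in sorted(zip(params, args), key=lambda pa: -len(pa[0])):
        body = body.replace(param, arg)
    return body

body = "PRED(%X, %English), +NTYPE(%X, %%) ==> PRED(%X, %French)."
for english, french in [("man", "homme"), ("woman", "femme"), ("girl", "fille")]:
    print(expand_template(body, ["%English", "%French"], [english, french]))
```

Only the template parameters are replaced; %X and %% are left alone, so each expanded rule keeps its own rule-local variables.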
A significant advantage of using templates is that if you realize that you need to alter the way you handle noun-noun translation, it is only necessary to edit a single template definition, and not multiple instances of the rules it generates.
Templates can also define sequences of rules. We could, for example, define a template called 'intrans_refl' for verbs in English that translate into French as ordinary transitives when they occur with a direct object in English but as reflexives when they occur without an object.
intrans_refl(%English, %French) ::
PRED(%X, %English), +OBJ(%X, %%) ==> PRED(%X, %French);
PRED(%X, %English) ==> REFLEXIVE(%X, +), PRED(%X, %French).
An example template call would be
@intrans_refl(stop, arrêter).
which expands to the following two rules:
PRED(%X, stop), +OBJ(%X, %%) ==> PRED(%X, arrêter).
PRED(%X, stop) ==> REFLEXIVE(%X, +), PRED(%X, arrêter).
The first rule will consume any instances of "stop" where an object is present, and replace it by "arrêter". Since the fact PRED(%X, stop) is consumed by applying the first rule, the second rule cannot apply if the first rule matches. Thus only if the verb "stop" occurs without an object will the second rule replace it by the reflexive version of "arrêter". Note how the arguments to the template, %English and %French, are shared between the rule instances, unlike the variable %X, which is separately scoped within the individual rules.
The definition of a template must occur in the grammar somewhere preceding its first use, or call. The definition takes the following form (the final period being of crucial importance):
template :: Rules.
If the definition involves a sequence of more than one rule, the rules are separated by semicolons. Calls to templates can (optionally) be marked by preceding the template name with an at-sign (@).
Macros provide a shorthand for sets of patterns (or patterns and other macro calls) rather than sequences of rules. The following are examples of macro definitions:
pronoun(%X, %Person, %Number, %Case) :=
  PRED(%X, pro), PERS(%X, %Person), NUMBER(%X, %Number), CASE(%X, %Case).
verb_subj(%X, %Verb, %Subject) :=
  PRED(%X, %Verb), SUBJ(%X, %Subject).
verb_subj_obj(%X, %Verb, %Subject, %Object) :=
  verb_subj(%X, %Verb, %Subject), OBJ(%X, %Object).
Using these definitions, we might write the following (not quite correct) rules to enable us to translate intransitive "know", as in "I know", in English as a transitive verb with a third person singular dummy object in French, i.e., "Je le sais".
@verb_subj_obj(%X, know, %Subj, %Obj) ==>
@verb_subj_obj(%X, savoir, %Subj,%Obj).
@verb_subj(%X, know, %Subj) ==>
@verb_subj_obj(%X, savoir, %Subj, %Obj), @pronoun(%Obj, 3, sing, acc).
The first rule picks up any transitive uses of the verb "know" and translates it as "savoir", preserving both the subject and the object. Given the ordering of the rules, the second rule will only fire if there is a use of "know" without an object (since the first rule will have consumed all occurrences of "know" with an object). In this case, we create a transitive version of "savoir", with a third person singular pronoun as an object. The second rule expands out to the following
PRED(%X, know), SUBJ(%X, %Subj) ==>
PRED(%X, savoir), SUBJ(%X, %Subj), OBJ(%X, %Obj),
PRED(%Obj,pro), PERS(%Obj, 3), NUMBER(%Obj, sing), CASE(%Obj, acc).
As we will see directly below, the above rule for creating a transitive version of "savoir" is not quite complete. But to see why, we must look at how transfer facts get converted back to f-structures.
When converting f-structures to transfer facts, nearly all f-structure attribute-value pairs correspond to binary transfer facts, e.g.
19:[SUBJ 2:[...]]
SUBJ(var(19), var(2))
The same is true when converting transfer facts back into f-structures: most binary facts correspond directly to attribute-value pairs. The exception, in both directions, comes up in the case of semantic forms. As noted before, the semantic form values of the PRED attribute are decomposed into PRED, lex_id, arg and nonarg facts. Thus when converting back to f-structures, these facts must be re-assembled to construct a PRED-SemanticForm attribute-value pair. In most cases, this is just a simple inverse of the conversion from f-structures to transfer facts. But sometimes transfer rules alter the basic argument structure of semantic forms, and in these cases the conversion back can be more involved. We have just seen one example of how a transfer rule can alter argument structure, when adding a pronominal object in translating intransitive "know" into transitive "savoir". Arguments can also be removed, with passivization being a common case.
The most common mistake in rules adding a new argument to a semantic form is to neglect the arg and nonarg transfer facts. The rule we previously gave for converting the English intransitive verb know into the French transitive verb savoir commits this error (repeated below for convenience):
PRED(%X, know), SUBJ(%X, %Subj) ==>
PRED(%X, savoir), SUBJ(%X, %Subj), OBJ(%X, %Obj),
PRED(%Obj, pro), PERS(%Obj, 3), NUMBER(%Obj, sing), CASE(%Obj, acc).
The rule creates a new object for the verb, but it does not include the new object in the list of the verb's thematic arguments. Grammar writers may have an implicit obliqueness hierarchy in mind, so that e.g. objects always correspond to the second thematic argument in semantic forms. But no such hierarchy is hard-wired into the conversion from transfer facts back to f-structures; it must therefore be made explicit. Thus the correct rule should be
PRED(%X, know), SUBJ(%X, %Subj) ==>
PRED(%X, savoir), SUBJ(%X, %Subj), OBJ(%X, %Obj),
PRED(%Obj, pro), PERS(%Obj, 3), NUMBER(%Obj, sing), CASE(%Obj, acc),
arg(%X, 2, %Obj).
where the new object has been explicitly included as the second thematic argument by adding the last line arg(%X, 2, %Obj). It is not necessary to say anything about the subject being the first thematic argument. If nothing is done to consume it, the input fact arg(%X, 1, %Subj) will be passed through transfer to preserve the required output fact. The same is true of the input lex_id fact; provided nothing consumes or alters the fact, the French semantic form will have the same semform id as the English semantic form.
If the additional arg pattern is not included in the rule's righthand side, a semantic form will still be created for the French output. However, the result will be a semantic form like
PRED savoir'<[2:je]>'
where the presence of the object grammatical function is not reflected in the semantic form. Strictly speaking, there is nothing wrong with semantic forms like this except that the XLE will not be able to generate from it because the OBJ is ungoverned.
The know-savoir rule does not only introduce an additional thematic argument; it creates a whole new node and semantic form for the pronominal object. What is the name of this new f-structure node, what is the semform id of its semantic form, and what are its thematic and non-thematic argument lists? The rule does not appear to say anything about this.
First, what are the thematic and non-thematic arguments of the new f-structure node? Because the rule does not add any arg or nonarg facts for the object, these lists are taken to be empty when converting back to f-structure. To create non-empty argument lists we would have to explicitly add arg and nonarg facts.
Second, what is the name of the new f-structure node? Note how the rule contains a variable, %Obj, on the righthand side that does not occur on the lefthand side. Whenever a new variable is introduced on the righthand side of a rule, it will be instantiated to a brand new constant of the form var(n), where n is an integer that does not clash with any previously encountered f-structure node number. This instantiation, unlike the next, is performed by the transfer system when the rule is applied.
Third, what is the new semform's id? When composing PRED-SemanticForm attribute-value pairs, the transfer fact to f-structure conversion first looks for any facts of the form PRED(%X, P) and collects all the lex_id, arg and nonarg facts pertaining to %X. If there is no lex_id fact for %X, then the conversion process will create one, using a brand new numerical identifier that does not clash with any previously encountered semform ids.
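The two kinds of fresh identifier just described can be pictured with a small Python model. This is only an illustrative sketch, not the transfer system's own code: the helper names fresh_node and fresh_semform_id are hypothetical, and picking "one more than the largest number seen so far" is an assumption, since the text only requires that the new number not clash with an existing one.

```python
def fresh_node(used_node_numbers):
    """Return a new var(n) constant whose number is not yet in use.

    Assumption: the next number is max + 1; any unused integer would do.
    """
    n = max(used_node_numbers, default=-1) + 1
    used_node_numbers.add(n)
    return f"var({n})"


def fresh_semform_id(used_ids):
    """Return a new numerical semform id that clashes with no known id."""
    new_id = max(used_ids, default=0) + 1
    used_ids.add(new_id)
    return new_id


# A rule introduces %Obj on its righthand side: a new node is needed.
nodes = {0, 1, 2}
new_object = fresh_node(nodes)

# The new PRED fact has no lex_id, so conversion invents one.
semform_ids = {4, 7}
new_lex_id = fresh_semform_id(semform_ids)
```

In this model the node constant is created when the rule applies, while the semform id is only created later, during conversion back to f-structure, which mirrors the distinction drawn in the text.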
Deleting an argument generally requires removing all facts pertaining to the argument. Given the way that semantic forms are recomposed, it is not strictly necessary to remove the lex_id, arg and nonarg facts that constitute the removed argument's semantic form, provided that its PRED fact is removed. However, you should take care to remove arg and nonarg facts from any higher level semantic forms that included the deleted item as an argument. This removal can be problematic if the deleted item was not the last argument in the list, because the numbers of all the arguments following the deleted item should be reduced by 1 to take account of the deletion. This can be very cumbersome to express with the current transfer rule formalism.
One special case has been taken care of, however. Passive verb phrases without an agentive by-phrase usually give rise to a semantic form where NULL is the first thematic argument, and the subject is the second argument. A passivization transfer rule might thus delete all facts to do with the active subject, including the fact that it was the first thematic argument to the active verb. Rather than renumbering subsequent arg facts, or including an explicit arg(%X, 1, 'NULL') fact, it is sufficient just to delete the arg fact for the active subject.
When recomposing semantic forms, whenever there is a (non)arg(%X, N+1, Arg) fact, but no (non)arg(%X, N, Arg) fact, then a (non)arg(%X, N, 'NULL') fact will automatically be created. So, by just deleting an active subject, we will get the required NULL first argument in the semantic form. However, NULL values will also be given to all arguments missing from the middle of a list, which may not be what you want.
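The gap-filling behaviour can be illustrated with a short Python sketch. It is a model written for this explanation only: fill_missing_args is a hypothetical helper, and the dictionary from argument position to node stands in for the arg (or nonarg) facts of a single semantic form.

```python
def fill_missing_args(arg_facts):
    """Build the full argument list for one semantic form.

    arg_facts maps a 1-based argument position to its node. Every
    position below the highest one present that has no fact of its
    own is filled with 'NULL'.
    """
    if not arg_facts:
        return []
    highest = max(arg_facts)
    return [arg_facts.get(n, "NULL") for n in range(1, highest + 1)]


# Passive without by-phrase: the arg fact for the active subject was
# deleted, so only position 2 remains and position 1 becomes NULL.
passive = fill_missing_args({2: "var(1)"})

# A gap in the middle of the list is filled in exactly the same way,
# which may not be what the rule writer intended.
gapped = fill_missing_args({1: "var(1)", 3: "var(3)"})
```

Here passive evaluates to a list with NULL in first position, and gapped shows the less welcome case where a deleted middle argument leaves a NULL behind instead of the later arguments being renumbered.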
It is possible to write rules that produce transfer output that cannot be converted back to f-structures. Apart from arg and nonarg facts, only binary facts can be converted to attribute value pairs. For unary and n-ary facts (n>2), rather than have conversion fail, dummy attribute-value pairs of the form
eq(attr(null, '$unconvertible_attribute'),)
will be included in the f-structure, where
include(subfile1).
The path names of the included rule files are taken relative to the location of the file that is including them. The included files should not specify their own grammar names.
STMT-TYPE(%X,declarative), +TNS-ASP(%X,%TA), -MOOD(%TA,%%) ==> STMT-TYPE(%X,decl).
says that statement type "declarative" must be rewritten as statement type "decl", but only if its tense and aspect does not have a mood attribute. The minus sign prefixing a pattern makes it a negated pattern. As so often with negation, it all turns out to be much more complex than it seems at first sight, and caution is required.
1 <-> xor(A1, A2)
A1 <-> xor(B1,B2)
cf(A1, STMT-TYPE(var(19),declarative)),
cf(A2, STMT-TYPE(var(19),imperative)),
cf(1, TNS-ASP(var(19),var(3))),
cf(B1, MOOD(var(3), indicative)),
cf(1, STMT-TYPE(var(7),declarative))
cf(1, TNS-ASP(var(7), var(8))),
cf(1, MOOD(var(8), indicative))
These facts correspond to a structure having at least two clauses with declarative statement types (var(19) and var(7)). Moreover, var(19) is (a) ambiguous between being declarative or imperative, and (b) when declarative is further ambiguous between having a mood attribute or not.
STMT-TYPE(var(7),declarative), +TNS-ASP(var(7),var(8)), -MOOD(var(8),%%) ==> STMT-TYPE(var(7),decl).
Since the two matching facts are in context 1, this partial match takes place in context 1. The negative pattern now checks that there is no mood attribute for var(8). However, there is one in context 1, so the negative pattern only matches in the false context 0. Thus, with these instantiations the rule only matches in context and(1, 0): i.e. it fails.
STMT-TYPE(var(19),declarative), +TNS-ASP(var(19),var(3)), -MOOD(var(3),%%) ==> STMT-TYPE(var(19),decl).
This partial match holds in context and(A1, 1) = A1 (since the statement type in context A2 does not match the first pattern). The negative pattern now asks us to check that there is no mood attribute for var(3). However, in context B1 there is such an attribute. Thus the negative pattern only matches in context not(B1). This means that overall the lefthand side matches, with the instantiations shown, in context and(A1, not(B1)). We can simplify the description of the matching context somewhat. Context A1 is split into two parts, B1 and B2. The only way we can be in A1 but not in B1 is if we are in B2. Thus and(A1, not(B1)) = B2.
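The context arithmetic in the two matches above can be checked mechanically. The following Python sketch is an illustration only, assuming that a context can be modelled as the set of leaf choices (B1, B2, A2) in which it holds; the names and_ and not_ are hypothetical helpers, not functions of the transfer system.

```python
# Choice space: 1 <-> xor(A1, A2), A1 <-> xor(B1, B2).
# Each context is modelled as the set of leaf choices it covers.
TRUE = frozenset({"B1", "B2", "A2"})   # context 1
FALSE = frozenset()                    # context 0
A1 = frozenset({"B1", "B2"})
A2 = frozenset({"A2"})
B1 = frozenset({"B1"})
B2 = frozenset({"B2"})


def and_(x, y):
    """Conjunction of two contexts."""
    return x & y


def not_(x):
    """Negation of a context, relative to the true context."""
    return TRUE - x


# var(7): both positive patterns hold in context 1, but the MOOD fact
# also holds in context 1, so the negative pattern holds only in 0.
context_var7 = and_(and_(TRUE, TRUE), not_(TRUE))

# var(19): STMT-TYPE holds in A1, TNS-ASP in 1, MOOD in B1.
context_var19 = and_(and_(A1, TRUE), not_(B1))
```

Under this model context_var7 is the empty (false) context, so the rule fails for var(7), while context_var19 is exactly B2, matching the simplification given in the text.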
forall X. not( P(X) ) i.e. no X is a P
not( forall X. P(X) ) i.e. not every X is a P, or some X is a not-P
Matching a negative pattern early, before its variables are instantiated by other positive patterns, is like treating the variables as being bound by a quantifier within the scope of the negation, as in the second formula. Matching a negative pattern late, where all the variables that can be instantiated by positive patterns have been instantiated, puts the variables' quantifiers outside the scope of the negation. In looking at rules involving negation, we therefore have to be clear about the intended scope of quantified variables with respect to negation. And the intended scope is that
1 <-> xor(A1, A2)
A1 <-> xor(B1,B2)
cf(B1, STMT-TYPE(var(19),declarative)),
cf(B2, STMT-TYPE(var(19),decl)),
cf(A2, STMT-TYPE(var(19),imperative)),
cf(1, TNS-ASP(var(19),var(3))),
cf(B1, MOOD(var(3), indicative)),
cf(1, STMT-TYPE(var(7),declarative))
cf(1, TNS-ASP(var(7), var(8))),
cf(1, MOOD(var(8), indicative))
verb_intrans(%X, %Verb, %Subject) :=
PRED(%X, %Verb), SUBJ(%X, %Subject), -OBJ(%X,%%).
verb_trans(%X, %Verb, %Subject, %Object) :=
PRED(%X, %Verb), SUBJ(%X, %Subject), OBJ(%X,%Object).
The first macro says that an intransitive verb is one that has a subject but lacks an object, while the second says that a transitive verb is one that has both a subject and an object. Using these macros, it does not matter what order we write the know-savoir rules in:
@verb_intrans(%X, know, %Subj) ==>
@verb_trans(%X, savoir, %Subj, %Obj), pronoun(%Obj, 3, sing, acc).
@verb_trans(%X, know, %Subj, %Obj) ==>
@verb_trans(%X, savoir, %Subj,%Obj).
The first rule, for intransitive "know", will be blocked if the verb has an object, and does not rely on having consumed all transitive instances of the verb before the rule is matched.
verb_subj(%X, %Verb, %Subject) :=
PRED(%X, %Verb), SUBJ(%X, %Subject).
verb_subj_obj(%X, %Verb, %Subject, %Object) :=
@verb_subj(%X, %Verb, %Subject), OBJ(%X, %Object).
@verb_subj_obj(%X, know, %Subj, %Obj) ==>
@verb_subj_obj(%X, savoir, %Subj,%Obj).
@verb_subj(%X, know, %Subj) ==>
@verb_subj_obj(%X, savoir, %Subj, %Obj),
@pronoun(%Obj, 3, sing, acc).
Of course, this way of using rule ordering and consumption of facts only works for obligatory rules. If the transitive translation were optional, then the subsequent intransitive rule would match transitive verbs to which the first rule had not been applied.
The macro call @complex_term(A, A1 ... Ak) is equivalent to A(A1 ... Ak); however, it provides partial relief from the requirement that functors must always be written as literal atoms. A user can write a variable name as the first argument of complex_term provided it is known that the variable will have an atomic constant as its value at the time when the rule calling the macro comes to be compiled. Thus, for example, one may define the following macro
qp(%P, [%X, yes]) ==> qp(%P, [%X, +]).
Here, %P ranges over predicates (such as SUBJ, PROG, PRED). The arguments to the predicate are enclosed between square brackets. In this case, we are looking only at predicates that take two arguments.
qp(SUBJ, [%X, %Y])
SUBJ(%X, %Y)
both match exactly the same facts, the system will inspect every single input fact in the first case, but only the SUBJ facts in the second case.
lhs_condition, {get_new_facts(%ListOfFacts)} ==> rhs_new_fact(1), $splice(%ListOfFacts).
The special (rhs only) operation $splice([F1, ... Fn]) will splice its list of facts into the righthand side of the rule. In the example above, the procedural call must return a list of facts enclosed between square brackets. Assuming that %ListOfFacts gets instantiated to [f1, f2, f3], then the righthand side resulting from the rule application will be
rhs_new_fact(1), f1, f2, f3
It is also possible to use $and(f1,f2,f3) instead of $splice([f1,f2,f3]).
<macro> :=
<left-handed definition> * <right-handed definition>.
non_singleton(%Set) :=
+in_set(%M1, %Set), +in_set(%M2, %Set), {\+ %M1 = %M2}.
The procedural attachment is shown between braces, and calls on prolog to check that M1 is not equal to M2 (\+ is prolog's notation for negation, and it uses = for identity/unifiability). Without the negative equality check, we could pick the same member twice from a singleton set. It is important to know that procedural attachments are the last patterns to be matched in a rule, when all the variables have been instantiated. You should not rely on procedural attachments to instantiate variables.
in_set(%Atom, %%), {\+ %Atom = var(%%)} ==> 0.
This tests that the element Atom is not of the form var(...), which characterizes f-structure nodes, and then deletes the fact.
strip_trailing_underscore(%Constant_, %Constant).
concat_preds(%Constant1, %Constant2, %Constant1_Constant2).
new_constant(%Stem, %StemN).
Strip_trailing_underscore strips off the final underscore from an atomic expression (if present). Concat_preds concatenates two atoms together with an underscore. New_constant creates a brand new constant consisting of a specified atomic Stem, and a number N chosen to make the new constant unique. In addition to these procedures, you can also make use of standard prolog procedures, the most useful of which are probably:
%X = %Y.
member(%Item, %List).
append(%List1, %List2, %List3).
Negation in prolog is expressed as \+.
procedural_attachments = code_file.
where code_file is the name of the file containing the prolog code (as with including rule files, the path name of the code file is taken relative to the location of the rule file declaring it).
:- assert(instantiation_pattern(not_equal(+, +))).
where the + signals that the argument must be instantiated. If it does not matter whether an argument is instantiated or not, use - instead.
not_equal(X, Y) :- \+ X = Y.
Note that prolog variables begin with uppercase letters or underscores, and that % is the prolog comment character.
Prolog operators:
Prolog predicates:
Here is a more specific, if not entirely plausible, example. Suppose
that the grammar contains the following:
( PRED(%X, chair) ?=> PRED(%X, fauteuil);
PRED(%X, chair) ?=> PRED(%X, chaise);
PRED(%X, chair) ==> PRED(%X, siège) )
&& specify_number(%X).
Pattern1, (PatternA | PatternB | -(PatternC, PatternD)), Pattern2 ==> RHS.
matches when (i) Pattern1 and Pattern2 match, and (ii) either PatternA or PatternB matches, or PatternC and PatternD do not both match. Parentheses are used to delimit the scope of operators. While conjunction (comma) should bind more tightly than disjunction (bar), it is wise to explicitly bracket complex formulas, thus:
(a, b | x, y) == ((a,b) | (x,y))
Disjunction and negation are not permitted on the righthand sides of rules. To capture the effect of a disjunctive righthand side, e.g.
Pattern ==> RHS, (RhsA | RhsB | RhsC).
you should write a sequence of rules:
Pattern ?=> RHS, RhsA.
Pattern ?=> RHS, RhsB.
Pattern ==> RHS, RhsC.
Pattern1, (PatternA | PatternB | PatternC), Pattern2 ==> the following union of templates
disjunctive_lhs ::
PatternA ?=> 0;
PatternB ?=> 0;
PatternC ==> 0.
rewrite1 ::
Pattern1, Pattern2 ==> RHS.
@rewrite1 && @disjunctive_lhs.
The 0 on the righthand side of the disjunction rules ensures that these rules contribute nothing (other than possibly variable instantiations) to the unioned rules. (Note that the templates can be defined in either order, or unioned in either order: it is just the order of the Pattern{A|B|C} rewrites in the disjunctive_lhs template that is important.)
disjunctive_rhs ::
0 ?=> RhsA;
0 ?=> RhsB;
0 ==> RhsC.
rewrite2 ::
Pattern ==> RHS.
@rewrite2 && @disjunctive_rhs.
The 0 on the lefthand side means that the disjunction rules contribute nothing to the lefthand side of the rule unions.
|- locative(in).
|- locative(at).
|- locative(near).
where the |- notation signals the declaration of a non-resourced fact. These facts can then be invoked in the normal way by rules, e.g.
PRED(%X, %Prep), locative(%Prep), ... ==> ....
No plus sign is required in front of the non-resourced fact (and in fact one should not be included), since it will not in any case be consumed. It is important that at least one declaration of a non-resourced fact occurs before the first time any fact of that type is mentioned in a rule. Otherwise, the rule compiler will not recognize the fact as being non-resourced.
PRED(%X, %UnknownPrep), PTYPE(%X, +), ... ==> locative(%UnknownPrep).
|- locative(in).
|- locative(at).
|- locative(near).
Dynamic facts are introduced with the /- operator. It is rare to find circumstances where you would want to use dynamic non-resourced facts.
:- instantiation_pattern(locative(+)).
where instantiation patterns are defined in the same way as for procedural attachments.
f1, ![f2, (f3 | f4), {p1}]!, f5 ==> f6.
will result in the system deciding whether to match f1, f5 or the more complex expression first. But when it decides to match the complex expression, this will first match f2, then the disjunction (f3 | f4), and then the procedural attachment.
and(%P, %Q) *=> %P, %Q.
Suppose the input is and(a, and(b,c)). The first application of the rule will break this down into the facts a and and(b,c). With a non-recursive rule, this is how matters would remain. However, the *=> indicates that the rule is to be reapplied to its output, until eventually there is nothing left to which it can apply. Reapplication of the rule breaks the conjunction and(b,c) down into its component parts. At this point there is nothing left to which the rule can apply, and the recursion terminates. We are left with the facts a, b, c.
"First, set up an empty seed for the list of conjuncts. Although the empty seed is produced multiple times, they will all be merged by the system into a single fact."
+conjunct(%P) ==> and([]).
"Now the recursion/iteration: The iterator before the ** collects the list of conjunct(%P) facts available before any rule applications take place. The rule following the ** is applied to each of the conjuncts in turn. As you recurse / iterate down the list of the conjuncts, the output from the last rule application provides the input to the next.
Note that [%P|%Cs] is the prolog-style notation for consing %P onto the head of list %Cs"
conjunct(%P) **
With the input conjunct(a), conjunct(b), conjunct(c) these rules will produce an output of the form and([b,c,a]), where the order of the elements in the list is arbitrary. Assuming that the iterator gathers the inputs in order a, c, b (this order is arbitrary), then rule applications proceed as follows: The first application places a onto the head of the empty list, consuming the facts conjunct(a) and and([]), replacing them with and([a]). The second iteration consumes conjunct(c) and places c onto the head of the list in and([a]), consuming both to give and([c, a]). The final iteration places b onto the head of the conjunct list. Since there are no more input conjuncts left, the iteration terminates.
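Both behaviours described here, reapplying a recursive rule until nothing is left to match and iterating over a list of facts collected up front, can be mimicked in a few lines of Python. This is a rough model under stated assumptions rather than the transfer engine itself: facts are plain strings, a conjunction is the tuple ("and", P, Q), and the function names are hypothetical.

```python
def flatten_conjunctions(facts):
    """Model of and(%P, %Q) *=> %P, %Q.

    Keeps rewriting any ("and", P, Q) fact into its two parts until
    no such fact remains, then returns the remaining facts sorted.
    """
    pending = list(facts)
    result = []
    while pending:
        fact = pending.pop()
        if isinstance(fact, tuple) and len(fact) == 3 and fact[0] == "and":
            # The rule's output is fed back in as new input.
            pending.extend([fact[1], fact[2]])
        else:
            result.append(fact)
    return sorted(result)


def collect_conjuncts(conjuncts_in_iteration_order):
    """Model of the iterated rule: cons each conjunct onto the list.

    Starts from the empty seed and([]) and places each conjunct on
    the head of the list, in the order the iterator supplies them.
    """
    collected = []
    for conjunct in conjuncts_in_iteration_order:
        collected = [conjunct] + collected   # [%P|%Cs]
    return collected


flat = flatten_conjunctions([("and", "a", ("and", "b", "c"))])
gathered = collect_conjuncts(["a", "c", "b"])
```

With the nested conjunction from the text, flat contains the three facts a, b and c; with the iteration order a, c, b, gathered is the list b, c, a, matching the and([b,c,a]) output traced above.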
" PRS (filename) "where filename states the location of the dcg file relative to the rule file.
phi(CStrNode, FStrNode)
subtree(CStrNode, NodeLabel, LeftDtrNode, RightDtrNode)
terminal(CStrNode, TermLabel, SurfaceFormIds)
surfaceform(SurfaceFormId, SurfaceForm, LVertex, RVertex)
taken directly from the prolog c-structure representations and specification of the root category in the list of f-structure properties. These facts can be manipulated in the normal way by transfer rules.
:- set_transfer_option(include_cstr, 1).
The options currently available, with non-default values shown, are:
:- set_transfer_option(include_cstr, 1).
include c-structure facts
:- set_transfer_option(include_root_category, 1).
include specification of root category of c-structure, in form cf(1, rootcatgory(%C))
:- set_transfer_option(include_proj, 0).
include f-structure projections
:- set_transfer_option(include_eqs, 1).
include equalities from unnormalized f-structures as first order transfer facts (not recommended)
:- set_transfer_option(treat_subsumes_as_eq, 0).
when set, treats any subsumption relations in the f-structure where the nodes are mutually subsuming as equalities, and ignores any other subsumptions.
:- set_transfer_option(include_fstr_properties, 1).
include all the items from the input f-structure's properties as facts of the form cf(1, fstr_property(%P))
:- set_transfer_option(extra, [cf(1, Fact1), ... cf(1, Factn)]).
include specified additional facts in transfer
:- set_transfer_option(conflict_resolution, 0).
turn off the conflict resolution mechanism (see below)
:- set_transfer_option(conflict_resolution_limit, fail_after(100)).
when conflict resolution mechanism is on, controls limits and behavior for detecting when the conflict is liable to be too large to resolve. The default value is ignore_after(30), which means that if more than 30 rule applications are in conflict, the conflict will be ignored. The example value fail_after(100) means that if more than 100 rule applications are in conflict, then transfer of the structure will be terminated since it is liable to time out anyway. (See below)
:- set_transfer_option(normalize, 1).
ensure that equalities in input are normalized before running transfer
:- set_transfer_option(prune_final_choice_space, 0).
clean out any parts of the final choice space that have no facts sitting under them
:- set_transfer_option(include_rule_traces, 1).
include rule trace information as pseudo f-structure facts (see below)
:- set_transfer_option(xfr_history_limit, <Integer>).
length of queue used to store history of previous transfer representations (see below)
prolog "set_transfer_option(include_cstr,1)."
prolog "set_transfer_option(include_root_category, 0)."
Another possibility is to include atoms like include_cstr or include_root_category in the list of options passed to the transfer command.
ADJUNCT(%X, %Y), in_set(%Z, %Y) ==> ADJUNCT_REL(%X, %Z).
Application of this rule to the input
ADJUNCT(var(1), var(2))
in_set(var(3), var(2))
would lead to the output
ADJUNCT_REL(var(1), var(3))
But what happens if there are two or more members of the adjunct set? According to the rule above, each set member will try to consume the ADJUNCT fact. But this fact can only be consumed once, and then it is gone. We should not resolve this conflict over resources by placing an ordering over set members (in the way that we place an ordering over transfer rules), for any ordering over facts is going to be arbitrary. In any case, the intent of the rule is to replace all adjunct set members with ADJUNCT_RELs, not just the first one that gets to a match.
cf(1, ADJUNCT(var(1), var(2)))
cf(1, in_set(var(3), var(2)))
cf(1, in_set(var(4), var(2)))
cf(1, in_set(var(5), var(2)))
becoming
cf(A1, ADJUNCT_REL(var(1), var(3)))
cf(A2, ADJUNCT_REL(var(1), var(4)))
cf(A3, ADJUNCT_REL(var(1), var(5)))
While this is the appropriate way of resolving the resource conflict for the rule as written, it probably does not reflect the intention of the rule writer, which was to have all the ADJUNCT_RELs in the same, true, context.
+ADJUNCT(%X, %Y), in_set(%Z, %Y) ==> ADJUNCT_REL(%X, %Z).
ADJUNCT(%%, %%) ==> 0.
That is, consume all the in_set facts without consuming the ADJUNCT fact, and once this is done remove the ADJUNCT fact.
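The effect of this two-rule formulation can be pictured with a small Python model. It is a simplified sketch that ignores choice-space contexts entirely (every fact is assumed to sit in the true context); facts are tuples whose first element is the predicate name, and rewrite_adjuncts is a hypothetical helper.

```python
def rewrite_adjuncts(facts):
    """Model of the two rules applied in sequence.

    Rule 1 tests ADJUNCT(X, Y) without consuming it and consumes each
    in_set(Z, Y), adding ADJUNCT_REL(X, Z). Rule 2 then deletes the
    ADJUNCT facts.
    """
    adjuncts = [f for f in facts if f[0] == "ADJUNCT"]
    after_rule1 = []
    for fact in facts:
        if fact[0] == "in_set":
            owners = [a for a in adjuncts if a[2] == fact[2]]
            if owners:
                # in_set is consumed; ADJUNCT is only tested.
                after_rule1.extend(("ADJUNCT_REL", a[1], fact[1]) for a in owners)
                continue
        after_rule1.append(fact)
    # Rule 2: ADJUNCT(%%, %%) ==> 0.
    return [f for f in after_rule1 if f[0] != "ADJUNCT"]


source = [
    ("ADJUNCT", "var(1)", "var(2)"),
    ("in_set", "var(3)", "var(2)"),
    ("in_set", "var(4)", "var(2)"),
    ("in_set", "var(5)", "var(2)"),
]
target = rewrite_adjuncts(source)
```

Because the ADJUNCT fact is never consumed by the first rule, no set member competes with another for it, and all three ADJUNCT_REL facts end up side by side rather than in mutually exclusive contexts.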
+==>
These signal the usual obligatory, optional or recursive rules, but with conflict resolution turned off for the scope of the rule.
set_transfer_option(include_rule_traces, 1).
will ensure that traces are included as f-structure facts. These take the form (one for each application of each rule)
cf(C, in_set(rule_trace(RuleNum, ApplicationNum, 'LHS', 'RHS'),
attr(var(0), 'RULE-TRACE')))
where RuleNum is the number of the rule, ApplicationNum counts which instance of a rule application gave rise to the trace, LHS is the instantiated lefthand side of the rule, and RHS is the instantiated righthand side of the rule.
:- index_on(Predicate, Arity, ArgNum).
For example, suppose you have the following indexing declaration and rule:
:- index_on(PRON-FORM, 2, 2).
PRON-FORM(%X, he), CASE(%X,nom) ==> PRON-FORM(%X, il).
The declaration says that the second argument of the 2-place PRON-FORM fact is indexed. The rule will only be picked up for possible matching if the input contains a PRON-FORM fact whose second argument is "he".
:- index_on(PRED, 2, 2).
which means that rules containing PREDs will only be picked up and tried if there is an exactly matching PRED fact in the input. In a translation system, this means that rules for words that do not occur in the input sentence will not in general be processed.
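The filtering effect of such an index can be sketched in Python. This is an illustrative model only, not the system's implementation: rules are represented as hypothetical (name, key) pairs, where the key is the constant each rule requires in the indexed argument, and facts are tuples whose first element is the predicate name.

```python
def candidate_rules(rules, facts, predicate, arity, arg_num):
    """Return the rules worth trying, in their original order.

    A rule survives only if the constant it requires in the indexed
    argument actually occurs in some input fact with the given
    predicate and arity.
    """
    present = {
        fact[arg_num]
        for fact in facts
        if fact[0] == predicate and len(fact) - 1 == arity
    }
    return [name for name, key in rules if key in present]


# Hypothetical lexical rules, keyed on the PRED value they rewrite.
RULES = [
    ("know -> savoir", "know"),
    ("sleep -> dormir", "sleep"),
    ("see -> voir", "see"),
]

# Facts for an input along the lines of "Ed slept."
FACTS = [
    ("PRED", "var(0)", "sleep"),
    ("SUBJ", "var(0)", "var(1)"),
    ("PRED", "var(1)", "Ed"),
]

selected = candidate_rules(RULES, FACTS, "PRED", 2, 2)
```

Only the rule for sleep is selected here; the rules for know and see are never examined, which is where the saving comes from when a grammar holds many lexical rules. Rule order is preserved because ordering matters to transfer.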
Special purpose transfer facts can be used to specify xml output from the transfer system; when used in conjunction with the xml_file output mode when calling transfer, this will construct xml.
The following is a simple example of some transfer facts and the xml it specifies.
top_xml_element(id1)
xml_element(id1,tag1, [attr(attribute1, value1), attr(attribute2,value2)])
xml_sub_element(id1, id2)
xml_sub_element(id1, id3)
xml_element(id2, tag2, [],[xml_elem(tag4,[attr(a4,v4)])])
xml_element(id3, tag2, [attr(a3,v3)])
which produces the xml
<tag1 attribute1="value1" attribute2="value2">
<tag2 >
<tag4 a4="v4"/>
</tag2>
<tag2 a3="v3"/>
</tag1>
Going through this example in more detail:
Some additional notes, advice, and warnings:
What happens if you attempt to construct xml from a packed/ambiguous transfer structure? The rules for constructing xml_element and xml_sub_element facts will place these facts under different parts of the choice space, in the normal way. How does this get interpreted on writing out the xml?
The xml write-out deals with choices and ambiguity by adding extra amb attributes to elements. The default (unspecified) value for this attribute is 1. If an xml sub-element applies in a different part of the choice space from its parent (note that this will always be a sub-part of the parent choice space), then an amb attribute will be added to the sub-element. If the sub-element is in the same part of the choice space as the parent (even if that choice is not 1), then the amb attribute will be omitted, hence defaulting to 1.
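The rule for when a sub-element carries an amb attribute can be stated compactly in Python. The sketch below is an illustration under assumptions: choices are represented as plain strings, and the helper names sub_element_amb and open_tag are hypothetical rather than part of the transfer system.

```python
def sub_element_amb(parent_choice, child_choice):
    """Return the amb value to write on a sub-element, or None.

    The attribute is written only when the sub-element sits in a
    different part of the choice space from its parent.
    """
    if child_choice == parent_choice:
        return None
    return child_choice


def open_tag(tag, attributes, amb=None):
    """Render an opening tag, appending amb only when it is given."""
    parts = [tag] + [f'{name}="{value}"' for name, value in attributes]
    if amb is not None:
        parts.append(f'amb="{amb}"')
    return "<" + " ".join(parts) + ">"


# Parent in the true choice 1, child only in choice A2: amb is written.
marked = open_tag("tag2", [("a3", "v3")], sub_element_amb("1", "A2"))

# Parent and child both in A1: amb is omitted even though A1 is not 1.
unmarked = open_tag("tag2", [("a3", "v3")], sub_element_amb("A1", "A1"))
```

The second case is the one worth noting: an omitted amb attribute means "same choice as the parent" in this scheme, so a reader of the xml has to interpret it relative to the enclosing element.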
There are two modes in which the values of the choices can be written out in xml. By default, the values will be arbitrary integers (which are in fact the names of the pointers to choice spaces inside transfer, but which have no special meaning in the xml). However, if a prolog call to
setp(print_full_xml_choices(1)).
has been made, then the choices will be printed out in full, readable form as boolean combinations of choice variables. Be warned that this can be very verbose.
Parse probabilities are also added to the xml if a prolog call has been made to
enable_choice_probabilities(1).
This will add prob and prob_bucket attributes to every element.
The use of state facts is intended as a clearer, more flexible replacement for the transfer history mechanism described below.
The transfer_seq/Arity family of prolog calls to invoke transfer have corresponding state_transfer_seq/Arity+2 calls, whose two extra arguments thread a list of state facts into and out of the call to transfer. State facts take the form
state(Term)
When a list of state facts is threaded in, they are all added to the transfer facts in the true choice space. Transfer rules can then access and modify these facts, just like any other transfer facts. At the end of transfer, all state/1 transfer facts in the true choice space are collected together and provide the output state list. At present, these output state facts are also left within the transfer facts; this decision may need to be revisited.
Rules updating state facts might want to make use of the <1> operator to ensure that they are placed in the true choice.
To set up the xle to run transfer invoke xle in the standard way and
run the command
This will add extra commands to the menus in the f-structure and fs-chart windows. You also need to load a transfer grammar by means of the command load-transfer-rules (if you do not do so, the default f-structure to triples rules will be used). The following interaction is typical:
~ xle
XLE loaded from xle.
XLEPATH = /project/xle/current
Type 'help' for more information
% create-transfer
% create-parser /project/pargram/english/standard/english.lfg
% load-transfer-rules /tilde/crouch/transfer_rules/
Initializing prolog engine
Loading prolog image at /project/xle/current/bin/transfer.sav
% parse {The boy stood on the burning deck.}
The Commands menu for the f-structure window will contain
the following additional items
The Commands menu for the fschart window will also have
Transfer and Translate commands.
tdbg. | Turn on basic tracing of transfer rules. |
tdbg(Monitor). | Turn on tracing of specified Monitor. |
monitors. | List available debugging monitors. |
monitoring. | List currently active debugging monitors. |
no_tdbg. | Turn off all debugging monitors. |
no_tdbg(Monitor). | Turn off specified debugging monitor. |
transfer_timing(1). | Print timing information about transfer. transfer_timing(0) turns it off. |
transfer(In, Out, InMode, OutMode). | Transfer In to Out. InMode/OutMode can be fs_file, in which case In/Out is the name of a prolog f-structure file. Or it can be xfr_file, in which case In/Out is the name of a prolog file containing the transfer predicates. |
timed_transfer(I,O,IM,OM). | Time-limited version of transfer. |
set_transfer_timeout_limit(T1,T2,T3) | Set the time limits (in milliseconds) for timed_transfer. T1 is the maximum time allowed for transferring un-normalized f-structures, T2 the maximum time allowed for normalizing them, and T3 the maximum time allowed for transferring the normalized structure. |
reload_rules. | Reload the previously loaded transfer rule file. |
reload_rules(File). | (Re)load the rules in File. |
print_compiled_transfer_rules. | This causes the transfer rule grammar to be listed with all templates and macros expanded and in a form that the system could accept as input. This is useful for verifying that templates and macros have been expanded in the intended way. |
set_active_transfer_grammar(Id). | It is possible to have multiple transfer grammars loaded at the same time. To make one of these the active one, use this command and specify the grammar name (Id) of the rule set; i.e. the identifier in the grammar = Id. declaration, and not the name of the rule file. By default, the most recently loaded set of rules is active. |
restore_previous_transfer_grammar. | Re-activate the previous transfer grammar; repeated calls keep on popping the stack. |
The definitions of the "verb" rule set in the above example arise
because part of the grammar file looks somewhat like this:
The appearance of a single word—in this case "verb"—where a definition or a rule would otherwise be expected causes the next rule in the file to be added to the set with that name. If the name is preceded by a "+" sign, as in the second instance above, then not only the next rule is added to the set, but all rules starting with the last addition to the set and extending down to that point. In the above example, we suppose that the "add -> ajouter" rule is the 32nd in the file and we have seen that this is the first rule to be added to the set. Next, the rule "button -> bouton", number 34 is added, and finally, all rules between there and "can -> pouvoir", rule 38, are also added.
The utility of rule sets lies in the fact that sets with certain names are treated specially by the transfer system. In particular, the set named "active" contains the rules that will be invoked when the grammar is applied to a set of predicates. As the above example illustrates, all the rules in a grammar become members of this set when the grammar is loaded. After a grammar has been loaded, however, the active set can be redefined by typing the appropriate command to the xle shell. To remove rules 23 through 26, for example, it is necessary to redefine the active set so that it has its original contents minus these rules. This is accomplished by the following command to the xle shell:
active | This set contains the rules that will be used when the transfer component is applied to a set of predicates. When a grammar is read into the system, it is automatically set to cover all of the rules. It is sometimes convenient to remove certain rules from the set after a grammar has been loaded or to activate only a small set of rules for testing purposes. |
compile | Giving this a value other than the empty set causes detailed information on each rule to be displayed as the rule is loaded into the system. The listing shows in detail what templates and macros are involved in building each rule and how they are expanded. |
detail | Details on the attempt to match rules in this set against the current set of predicates are displayed. |
input | Before the attempt is made to apply any rule in this set, the complete current predicate set is displayed. |
match_rule | For the rules in this set, details beyond those given for the detail set are displayed as the matching process is carried out. |
rule | When the transfer system considers a rule in this set for application to the current predicate set, it displays the rule and, if it matches, shows what predicates it matches against. Note that there can be many rules that are not considered at all because they can be eliminated early for lack of a key predicate. |
success | A message will be displayed when any rule in this set succeeds. |
names(s) is either a single name, beginning with a lower-case letter, or entirely enclosed in single quotation marks, or is a set of names in square brackets and separated by commas. These are the names to which the monitoring command will give a new value. The new value associated with a particular name may be defined, partially or completely, in terms of the current values of any names, including those being defined.
member(s) is an expression over range specifiers. There are four kinds of elementary range specifier, namely:
It is also possible to translate directly, using a translate
command similar to XLE's parse command:
translate {Ed slept.}
If transfer rules and parser and generator grammars have not
been loaded, then default ones will be used, as specified in the file
$XLEPATH/bin/translate.tcl. To use non-default settings, either
edit the file or use the command create-translator with
arguments specifying the grammars, rules and gen-adds. For
example, to edit the file
The first three settings provide (full) file names giving the
locations of the parsing grammar, the generation grammar, and the
transfer grammar. The fourth argument is a string in double
quotation marks of the form "addonly word1, word2 ... wordn", where the
words are the names of attributes in the generation grammar whose
values may be left unspecified in the files that are output by the
transfer component and input by the generator.
To use non-default settings, the command create-translator can
be run with arguments specifying the grammars, rules and gen-adds.
% parse "The printer stops."
Then use the pull-down menus on the f-structure and/or fs-chart windows.
~ transfer server localhost 2458
This option is probably only useful if you are masochistic enough to try setting the XLE up with a transfer server.
Initializing prolog engine.
Loading prolog image at /project/xle/current/bin/transfer.sav.
Starting server on port 2458
transfer --inStem /project/nltt-2/TESTDATA/10-01-02/S --inMode fs_file --outStem /tmp/T --outMode xfr_file --from 1 --to 700 --rules /tilde/thking/
transfer --inMode fs_file --outMode xfr_file --rules /tilde/thking/ --inFiles /project/nltt-2/TESTDATA/10-01-02/S*.pl --outStem /tmp/T_ --select
% Transfer initialization file
% define an initialization procedure:
init :-
    set_transfer_timeout_limit(0, 100000, 1000000),
    set_transfer_option(include_cstr, 1),
    set_transfer_option(no_select, 0),
    set_transfer_option(include_proj, 0).
% Run the initialization procedure
:- init.
triples transfer --inMode fs_file --inStem /project/nltt-2/TESTDATA/10-01-02/G --outStem /tmp/T --from 1 --to 700 --rules /tilde/thking/
triples match --matchMode best --sourceMode fs_file --sourceStem /project/nltt-2/TESTDATA/10-01-02/G --targetMode dep_file --targetStem /project/nltt-2/new-depbank/gold1-700-files/G --from 1 --to 700 --rules /tilde/thking/triples_rules.pl
This matches the f-structure with the dependency files, in each case trying to find the source analysis that best matches the target dependency. Note that in matching dependency structures, we cannot assume that the numerical indices in source and target will be identical.
% create-listener
You should also make sure that you have a separate interactive triples process running, and run the triples command
~ triples interactive
As the waiting message indicates, the triples listener is now just waiting to receive input from the XLE, and is expecting to find it in the file indicated (located in the user's home directory --- this means that the XLE and triples processes do not have to be running on the same machine, provided that both machines have access to the user's home directory). If you parse a sentence from the XLE you can access the listener menu items. To load a set of transfer rules, you can click on the "Reload rules" button. Two things should happen. (1) If this is the first interaction between the XLE and the listener, the XLE should print out a message confirming that it has successfully established file-based communication, or a message indicating a failure to communicate through what it thinks is the correct file. By comparing the names of the files the listener is expecting to use and the files the XLE is expecting to use, one can try to diagnose any problems. (2) The listener window should display a prompt asking you to enter the file name of the rules you want to load. If you just hit return, the previously loaded file will be reloaded. Otherwise, you can specify a new rule file to load.
Initializing prolog engine.
Loading prolog image at /project/xle/current/bin/triples.sav.
This is a prolog reader. Rules of prolog syntax apply.
Type halt. to exit; help. for information
prolog> triples.
%Waiting for XLE communication on /tilde/crouch/.transpipe1 or /tilde/crouch/.transpipe1 ...
~ transfer interactive
The command xfr. is used to set the listener running. The transfer listener shows the output of transfer in transfer predicate notation, before it is converted back to f-structure. You can also control whether or not the input transfer predicates are displayed by means of the show_xfr_input or dont_show_xfr_input commands.
Initializing prolog engine.
Loading prolog image at /project/xle/current/bin/transfer.sav.
This is a prolog reader. Rules of prolog syntax apply.
Type halt. to exit; help. for information
prolog> xfr.
%Waiting for XLE communication on /tilde/crouch/.transpipe1 or /tilde/crouch/.transpipe1 ...
~ xle
This will load the grammars, and then create an iconified xterm window named Transfer_Server. If you open this window you should see something like the following
XLE loaded from xle.
XLEPATH = /project/xle/current.
Type 'help' for more information.
% create-translation-server-menu "/project/pargram/english/homecentre/english-hc.lfg" "/project/pargram/french/homecentre/french-hc.lfg" "/project/pargram/trans/eng_to_fre_rules" ""
Loading prolog image located at transfer.sav
Sometimes the Transfer_Server window only blinks into existence and then dies. In this case, it is worth parsing a sentence and clicking on one of the transfer server menu items. This will often succeed in starting the server up again.
Starting transfer server on localhost 2548
Processing "nl,write(Transfer system is ready ...),nl,nl"
Transfer system is ready ...
Processing "force_load_rules(/project/pargram/trans/eng_to_fre_rules)"
active: no monitors.
active: [r(0,0)].
setenv DYLD_LIBRARY_PATH $XLEPATH/lib
if the environment variable is undefined, or
setenv DYLD_LIBRARY_PATH $XLEPATH/lib:${DYLD_LIBRARY_PATH}
otherwise. (There is no need to set the corresponding LD_LIBRARY_PATH variables under linux or solaris).
tcsh> limit data
datasize 6144 kbytes
bash> ulimit -d
This indicates that the maximum size of the data segment is only 6 MB. To remove the limit, do
tcsh> limit datasize unlimited
datasize unlimited
bash> ulimit -d unlimited
bash> ulimit -d
Note: limit is a shell built-in in csh. It may have a different name in other shells.
This unfortunately means that you don't have access to the source code. But you can load the images into prolog and develop additional functionality based around them:
$XLEPATH/bin/transfer.sav & transfer
$XLEPATH/bin/triples.sav & extract
$XLEPATH/bin/extract.sav & triples
For this to work, you will need to be running the same version of sicstus as the image was saved under, which will usually be the latest release. You can tell which version of sicstus was used by looking at $XLEPATH/bin/sp-{Version} (e.g. $XLEPATH/bin/sp-3.11.0 means sicstus version 3.11.0). This directory contains the runtime prolog system, which (a) can be distributed without a sicstus license, and (b) is required in the same directory as the transfer, triples and extract shell commands (see sicstus release notes on distributing runtime systems).
| ?- restore('$XLEPATH/bin/transfer.sav').
transfer_seq/5, state_transfer_seq/7, transfer_seq/4, state_transfer_seq/6, transfer_seq/3, state_transfer_seq/5, transfer_seq_charlist/5, state_transfer_seq_charlist/7, transfer/4, transfer/5, transfer/2, timed_transfer/4, timed_transfer/5, transfer_facts/2, transfer_input/3, transfer_input/4, transfer_output/3, transfer_output/4, transfer_files/7, transfer_files/6, transfer_files/5, transfer_files/4, transfer_timing/1, set_transfer_timeout_limit/3, set_transfer_option/2, load_rules/1, reload_rules/0, reload_rules/1, force_load_rules/1, print_compiled_transfer_rules/1, monitor/1, monitor/2, monitor/3, monitoring/0, monitoring/1, monitoring/2, monitors/0, tdbg/0, tdbg/1, full_tdbg/0, vfull_tdbg/0, no_tdbg/0, xfr/0, xfr_help/0, main/0, show_xfr_input/0, dont_show_xfr_input/0, run_transfer_reload/0
Conversion between f- and transfer structures:
xfr2fs/2, write_xfr/1, write_xfr/2,
write_xfr_no_trace/1, write_xfr_no_trace/2
Transfer interface predicates:
listen/2, communication_pipes/3,
start_server/2, sever_loop/1,
my_prolog_loop/0, my_prolog_loop/2,
print_help/0, call_string/1,
xle_exec/1, xle_exec/2, silent_xle_exec/3,
xle_exit_all/0, check_xle_running/0
XLE library calls:
init_xle/2, create_parser/2, parse_sentence/4,
next_graph_solution/3, free_graph_solution/1,
reset_storage/1, create_graph/2, create_generator/2,
generate_from_graph/5, print_net_as_regexp/4,
read_prolog_graph_file/3, print_prolog_graph_file/2,
make_new_choice_disjunction/3, create_disjunction/4,
get_choice/4, conjoin_clauses/5, disjoin_clauses/4,
subtract_clause/4, negate_clause/4, not_clause/3,
assert_nogood/2, assert_nogood/7,
evaluate_clause/3, evaluate_choices/2, covers_clause/3,
get_edge_solutions/2, first_dnf_solution/3, next_dnf_solution/3,
set_solution_choice_values/2, xle_true_context/1, xle_false_context/1,
use_primary_choice_space/0, use_alternate_choice_space/1,
reset_choice_space/0, reset_choice_space/1,
create_fs_choice_space/3, xle_context/3,
xle_safe_context/3, select_choice/1,
set_choice_values/2, unpack_choice_space/2,
xle_unpack_fstr/3, collect_true_facts/3,
name_internal_choices/2, name_internal_equivs/2,
ext2int_contexts/3, ext2int_contexts/5,
int2ext_contexts/5, named2ext_contexts/6,
named2int_contexts/5, int2named_contexts/5,
ext2named_contexts/6, write_fs/1, write_fs/2,
write_cf_list/1, write_cf_list/2,
write_named_context/1, write_named_context/2,
fs2graph/2, graph2fs/2
Miscellaneous utilities:
strict_member/2, strict_memberchk/2,
vartail/2, vt_append/2, vt_member/2,
list_to_vtlist/2, null_vtlist/1,
unkey/2, concat_list/2, generated_symbol/2,
time_call/1, time_call/2,
time_msg/1, time_msg_if/2, reset_time_msg/0,
setsys/2, getsys/2, setp/1,
pp_debug/0, nopp_debug/0,
pp/1, pp_underscore/1,
format_if/3, format_if/4,
get_option/3, get_optional/3, get_option_list/4,
add_dir_slash/2, file_name_concat/3,strip_file_suffix/3,
file_suffix/2, dir_and_file/3,
atom_to_num/2, assert_number_of_digits_in_file_numbers/4,
transfer(+In,-Out,+InMode,+OutMode,+Options)
transfer(+In,-Out,+InMode,+OutMode)
transfer_seq(+In,-Out,+InMode,+OutMode,+RuleSequence)
Applies loaded transfer rules to In to produce Out. InMode and OutMode specify the format of the input and output. Options is a list specifying any further manipulations of In and/or Out to be carried out before/after transfer. transfer/4 calls transfer/5 with Options=[].
InMode/OutMode can be one of: fs_file | fs | xfr_file | xfr | xle_graph
where
fs_file means In/Out is the name of a prolog f-structure file
fs means In/Out is a prolog f-structure, i.e. fstructure(Sentence,Properties,Choices,Equivalences,FS,CS)
xfr_file means In/Out is the name of a prolog transfer-structure file
xfr means In/Out is a prolog transfer structure, i.e. xfr(Choices,Equivalences,Equalities,Facts,Doc)
xle_graph means In/Out is an integer serving as an aligned pointer to an xle-internal f-structure (see XLE library calls).
OutMode can additionally include: xml | xml_file
RuleSequence is an atom comprising a space-separated sequence of ruleset names, specifying the sequence of rulesets the input must be passed through to create the output.
Options are
no_select: Ignore any choice selections marked on the input, and apply transfer to the whole packed input.
include_cstr: Include c-structure facts along with f-structure facts.
include_root_category: Include the root category along with f-structure facts, in case you want to rewrite it.
include_proj: Include f-structure projections.
include_eqs: Include any un-normalized equalities from the input f-structure in with the transfer Facts. Normally, these equalities are only included with the transfer Equalities. By including them in with the facts, transfer rules can explicitly manipulate equalities. (This is not highly recommended, since the same equalities are still added to the Equalities, and the transfer system matches facts with reference to these equalities.)
extra([Fact|Facts]): Add the specified extra facts to the transfer input. The facts must take the form cf(Context, Predication), where Context is a boolean context, probably 1, and Predication is a basic transfer fact.
Typically, however, options are specified in the individual rulesets.
state_transfer_seq(+In,-Out,+InMode,+OutMode,+StateIn,-StateOut,+RuleSequence)
Just like transfer_seq/5, except that it passes in a list of state facts (of the form state(Fact)) and collects a list of state facts from the output of transfer.
transfer_state_facts(+XfrStructure, -StateFacts)
Given a transfer structure, will collect together any state facts in the transfer facts.
A note about transfer structures. The transfer system is intended to provide a general purpose contexted rewriting system, with a contexted f-structure to f-structure rewriting system as a special case. Therefore the input and output to transfer are transfer structures. F-structure input/output must be mapped to/from transfer structures. A transfer structure is a 5-tuple
Documentation is a list of arbitrary prolog terms, providing whatever
additional documentation is deemed necessary (cf Properties in prolog
fstructures). This includes a term, number_of_solutions(N),
which reports the number of solutions in the packed transfer
structure. Transfer output also includes a list of rule traces as part
of the documentation, where the rule traces record which rules were
applied how; this is intended to support such things as stochastic
selection of transfer output. The form of a rule trace is
ApplicationNum, LHS, RHS, MatchCtx, ApplyCtxs)
Load the specified transfer rule file. load_rules/1 will not load
anything if a transfer rule file has already been loaded.
reload_rules/1 will load, even if a rule file of the same or of a
different name has already been loaded, overwriting any previously
loaded rules. This function also catches any exceptions (e.g.
specified file does not exist) and prints out an error message before
failing. Because it catches exceptions, the C interface to transfer
uses this procedure to load rules. force_load_rules/1 is like
reload_rules, but does not catch exceptions.
Reloads the previously loaded rules file (catches exceptions).
Prints the compiled / expanded transfer rules to File
As for transfer/5 and transfer/4, except that time
out limits are imposed. There are three limits for
(i) un-normalized transfer, (ii) normalization of transfer
input, and (iii) normalized transfer. In the first
instance, transfer is run on un-normalized inputs.
If this times out, and the input was un-normalized
(i.e. contained equalities), then the input is normalized,
and transfer is run once again on the normalized input.
If the input was already normalized (i.e. no equalities),
then nothing more happens after the first timeout.
Setting a time limit of 0 for un-normalized transfer ensures
that input is automatically normalized prior to transfer
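The fallback behaviour described above can be pictured with a minimal Python sketch. It is not the Prolog implementation: run_transfer, normalize and TransferTimeout are hypothetical stand-ins supplied by the caller, returning None after a timeout on already normalized input is an assumption, and a limit of 0 is modelled simply as an immediate timeout of the first attempt.

```python
class TransferTimeout(Exception):
    """Hypothetical signal that a step exceeded its time limit."""


def timed_transfer_sketch(facts, equalities, run_transfer, normalize, limits):
    """Illustrate the two-stage control flow described in the text.

    limits is (unnormalized_ms, normalization_ms, normalized_ms).
    run_transfer(facts, equalities, limit_ms) and
    normalize(facts, equalities, limit_ms) are caller-supplied stand-ins.
    """
    unnorm_ms, norm_ms, normalized_ms = limits
    try:
        if unnorm_ms <= 0:
            # Assumption: a limit of 0 behaves like an immediate timeout.
            raise TransferTimeout("un-normalized limit is 0")
        return run_transfer(facts, equalities, unnorm_ms)
    except TransferTimeout:
        if not equalities:
            # Input was already normalized: nothing more happens.
            return None
    # Input contained equalities: normalize it, then run transfer again.
    normalized = normalize(facts, equalities, norm_ms)
    return run_transfer(normalized, [], normalized_ms)
```

With limits of (0, 100000, 1000000), as in the initialization file shown earlier, the sketch goes straight to normalization whenever the input contains equalities.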
set_transfer_timeout_limit(+UnNormalizedXfr, +Normalization, +NormalizedXfr)
Set the time out limits (in CPU ms) for timed_transfer.
Default is set_transfer_timeout_limit(10000,100000,100000)
set_transfer_option(+Option, +Value)
Sets the default values for the Options argument to transfer,
to be used whenever this argument is not explicitly provided, e.g.
transfer/4. Options and values are:
no_select 1 | 0
include_cstr 1 | 0
include_proj 1 | 0
include_eqs 1 | 0
extra [List of extra facts]
Level is an integer specifying the level of detail at which
timing messages about transfer should be printed. Default is 0
(no messages), 1 is a sensible alternative.
Calls timed_transfer(In,Out,fs_file,fs_file)
Applies timed_transfer to a succession of numbered files
InStem<From>.pl to InStem<To>.pl
writing the results to
OutStem<From>.pl to OutStem<To>.pl.
If files are missing from the numbered sequence, will
print a message saying the file is missing and continue.
Applies timed_transfer to each file in InFiles. Writes result to file named
by (a) stripping any directory off InFile to get the base file name, and then
(b) prefixing the base file name with OutStem. If a file in the list does not
exist, will print a message saying the file is missing and continue.
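The two file-naming schemes just described can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that file numbers are written without zero padding, which the text does not state.

```python
import posixpath


def numbered_file_pairs(in_stem, out_stem, first, last):
    """Pair InStem<N>.pl with OutStem<N>.pl for N from first to last inclusive."""
    return [(f"{in_stem}{n}.pl", f"{out_stem}{n}.pl")
            for n in range(first, last + 1)]


def output_name(in_file, out_stem):
    """Strip any directory from in_file, then prefix the base name with out_stem."""
    return out_stem + posixpath.basename(in_file)
```

For example, under these assumptions an input file /project/nltt-2/TESTDATA/10-01-02/S12.pl with an output stem of /tmp/T_ would be written to /tmp/T_S12.pl.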
Convert In to prolog transfer structure Xfr (with internal xle choice space).
InMode and Options are as for transfer/5
Convert prolog transfer structure Xfr to Out.
OutMode and Options are as for transfer/5
Run transfer on input prolog transfer structure to produce
output prolog transfer structure.
Turn on basic tracing of transfer rules. Equivalent to
Turn on debugging for specified transfer Monitors. Monitors can
either be a single monitor name or a list of monitor names. Available
monitors can be found using monitors/0. Equivalent to
List all documented transfer monitors. These are documented via
the (multifile) predicate user:monitor_doc(MonitorName,DocString).
List all the monitors for which debugging is currently turned on,
showing their range sets
List range sets for specified monitors
monitoring(+Monitors, +RangeSpec)
Change RangeSpec for specified monitors
monitor(+MonitorCondition, +Then)
monitor(+MonitorCondition, +Then, +Else)
Perform Then action if MonitorCondition holds, otherwise Else
Action. Then and Else are arbitrary prolog goals (typically print
statements), MonitorCondition is an Expr of the following form
Expr ::= (Expr, Expr) Conjunction
Expr ::= (Expr; Expr) Disjunction
Expr ::= MonitorName Named range set
Expr ::= r(L, H) Explicit range
Expr ::= Integer =r(Integer, Integer)
Expr ::= Prolog Arbitrary goal
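One way to read this grammar is sketched below in Python. The sketch rests on assumptions the text does not confirm: that a condition is evaluated against the number of the rule currently being considered, that ranges r(L, H) are inclusive at both ends, and that a bare integer N stands for r(N, N). The tuple encoding and the name holds are illustrative only.

```python
def holds(expr, rule_num, named_sets):
    """Evaluate a monitor condition against a rule number.

    Encoding (illustrative): ("and", E1, E2), ("or", E1, E2), ("r", L, H),
    an int for a single rule, a str for a named range set, or a
    zero-argument callable standing in for an arbitrary goal.
    named_sets maps each monitor name to a list of range expressions.
    """
    if isinstance(expr, int):
        return rule_num == expr
    if isinstance(expr, str):
        return any(holds(r, rule_num, named_sets)
                   for r in named_sets.get(expr, []))
    if callable(expr):
        return bool(expr())
    tag = expr[0]
    if tag == "r":
        return expr[1] <= rule_num <= expr[2]
    if tag == "and":
        return (holds(expr[1], rule_num, named_sets)
                and holds(expr[2], rule_num, named_sets))
    if tag == "or":
        return (holds(expr[1], rule_num, named_sets)
                or holds(expr[2], rule_num, named_sets))
    raise ValueError(f"unknown monitor expression: {expr!r}")
```

Under this reading, a condition such as (detail, r(25, 30)) holds for rule 25 only if 25 also falls inside the range set currently assigned to the detail monitor.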
Turn on debugging for [rule,match_rule,detail] (full) or
[rule,rule_input,rule_output,match_rule,detail,garbage] (vfull)
This starts up an xle listener loop (see listen/2,
communication_pipes/3), which is useful for debugging transfer
rules. This allows separate XLE and transfer
processes (running on the same machine) to communicate via files.
The tcl command install-xfr-listener issued to the XLE process
(either at the command line, or via an xlerc file) will set up
additional buttons on the fstructure and fs-chart Tk windows.
These are
fstructure window:
Transfer, Load Transfer Rules, Debug, Debug off, Break
fs-chart window:
Clicking on these buttons will cause two files to be written,
~/.transpipe0 and ~/.transpipe1 (where ~ is the user's home
directory). (You can change the name of these communication files
in translate.tcl). The prolog listener loop polls for the
existence of these two files. When they are written, the listener
reads them, deletes them, acts on their contents, and then returns
to polling.
~/.transpipe0 normally contains the f-structure or fs-chart from
the window in which the Transfer button was pressed, or is empty.
~/.transpipe1 is the command file, which contains a prolog term
specifying the action to be performed. Possible contents of
transpipe1 are:
break. Leave the listener loop
ready. Transfer contents of transpipe0 and display results
debug. Run tdbg/0.
no_debug. Run no_tdbg/0.
reload. Run run_transfer_reload/0.
You can also break out of the listener loop (in a prolog
development system) by hitting ^C. However, in a prolog runtime,
as brought up by running "transfer interactive", ^C aborts the
whole process
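The poll, read, delete, act cycle described above can be sketched in Python as follows. This is only an illustration of the file-based protocol, not the Prolog listener itself: the handlers dictionary, the polling interval, and the decision to ignore unknown commands are all assumptions.

```python
import os
import time


def listen(data_path, command_path, handlers, poll_seconds=0.5):
    """Poll for the data and command files, then act on the command.

    handlers maps a command string such as "ready." to a function that
    takes the contents of the data file. The command "break." leaves
    the loop. Unknown commands are silently ignored (an assumption).
    """
    while True:
        if not (os.path.exists(data_path) and os.path.exists(command_path)):
            time.sleep(poll_seconds)
            continue
        with open(data_path) as f:
            data = f.read()
        with open(command_path) as f:
            command = f.read().strip()
        # Delete both files before acting, so the next request is seen as new.
        os.remove(data_path)
        os.remove(command_path)
        if command == "break.":
            return
        handler = handlers.get(command)
        if handler is not None:
            handler(data)
```

Because the loop waits for both files, a sender that writes the data file first and the command file second avoids the listener reading a half-written request.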
This prompts the user to enter the name of a rule file to be
loaded. If the user just hits return, the previously loaded file
is reloaded.
These commands alter the behaviour of how the xfr loop displays
transfer input
Prints a help message
This is used to read in arguments from the command line when
transfer.sav is called on as part of the transfer shell
command. You probably don't ever want to use this...
Write a prolog transfer structure Xfr to a Stream or to
user_output, printing choices in external prolog format. Xfr can
have either an internal or external choice space. The
no_trace versions suppress the printing of any rule trace
information. Note that no terminating period is printed
Write a prolog f-structure FS to a Stream or to user output, printing
choices in external prolog format. FS can have either an internal
or external choice space. Note that no terminating period is printed
Converts a prolog transfer structure (internal or external
choice space) to a prolog f-structure (internal choice space).
Converts a prolog f-structure (internal or external
choice space) to a prolog transfer structure (internal choice space).
fsfacts2preds(+FS, +CS, -Preds, -Eqs, +ChoicePtr)
Given contexted f-structure and c-structure facts FS and CS, in
internal context notation under choice(ChoicePtr), will convert
them to a list of transfer facts, Preds, and extract out any
un-normalized equalities. (CS is currently ignored).
There are a number of different ways of getting the transfer component to interface with the xle:
a) Embed transfer directly within the XLE
b) Have XLE start up a separate transfer server and communicate with it via sockets (the original mode of interaction)
c) Have transfer start up a separate XLE server, and communicate with it via pipes
d) Have separate transfer and XLE processes communicate via files through a listener loop, like xfr
The socket- and pipe-based interfaces are still a little fragile, and the direct embedding of transfer within XLE makes it harder to include new transfer functionality, and is of course next to impossible for prolog programmers to debug. The listener loop is handy for developing and debugging new functionality, but is probably not ideal for final applications.
From within the xle you can also invoke the Tcl command
int load_prolog_image()
This loads a prolog image into the xle. If the
extern char *prolog_image is non null, then the saved image located
at the full path name given by prolog_image will be loaded. If it
is null, then it will load the first it finds of
$XLEPATH/bin/transfer.sav, $XLEPROLOGPATH/bin/transfer.sav, and
$PWD/transfer.sav (note: XLEPROLOGPATH will probably not be
From Tcl, you can set the prolog_image variable by, e.g.
set prolog_image /tilde/crouch/xle/transfer/transfer.sav
int prolog(char *command)
This passes a command string to prolog, which prolog executes. The
command string should obey prolog syntax, except that there should
be no terminating period. Values assigned to prolog variables in
the command string will not be available --- i.e. only the side
effects of the prolog command are relevant.
The corresponding Tcl command is, e.g.
prolog "X is 2+2, write(X), nl, nl"
Note that it is OK to include a final period in the Tcl command
string, though this is not obligatory.
Both commands are implemented by means of the prolog predicate
call_string/1, which evaluates its string argument as a prolog goal,
and catches any exceptions.
int load_transfer_rules(char *file)
Call reload_rules to load the transfer rule file
From Tcl, you are probably best off doing
prolog "reload_rules('/tilde/crouch/xle/transfer/rules')"
Graph *fs_transfer(Chart *chart, Graph *fsIn);
Passes fsIn through transfer and returns the result.
From Tcl
create-parser /project/pargram/english/standard/english.lfg
create-generator /project/pargram/english/standard/english.lfg
translate_sentence "Ed slept." $defaultparser $defaultgenerator
will not only transfer, but also generate.
This will place you inside a rudimentary prolog read loop (the same one as is used in the interactive versions of the transfer and triples shell commands). From here you have direct access to the prolog runtime system distributed alongside the XLE. Note, however, that runtime systems do not provide a prolog debugger, and can only consult new prolog code, not compile it. This may be different if you have your own licensed development system, and replace $XLEPATH/bin/sp-{Version} with a soft link to $SP_PATH/lib (see sicstus release notes for distributing run time systems).
prolog "my_prolog_loop"
From the XLE side, the file $XLEPATH/bin/translate.tcl contains a bunch of possibly unmaintained code that fires up a prolog process passing the Host and Port as command line arguments, and passing them through to start_server, e.g.
This will open a prolog socket on the specified Host and Port, and
enter a loop that reads and acts on commands written to the stream
associated with the socket.
where the contents of might be something like
sicstus -l -a localhost 2453
main :- prolog_flag(argc, [Host,Port]), start_server(Host,Port).
:- main, halt.
This will fire up an xle process if one is not already running, and open
a pipe to it, as follows
exec('xle',[pipe(Input),pipe(Output),Err], PID),
The identity of the process Id (PID) and Input and Output pipes are
asserted as
What is done with the error stream can be controlled by
:- assert(xleinterface:xle_err(_,null)). % suppress errors (default)
%:- assert(xleinterface:xle_err(_,std)). % write errors to std
The exec commands write their strings to the Input pipe, and read
what the XLE returns from the Output pipe. The FormatString
versions of the exec commands use prolog's format/3 to write to
Input. You should always include a new line character in the format
string (the CommandString version automatically inserts a new
line). Normally you would want to call check_xle_running before
executing a command, just to check it is still running.
silent_xle_exec reads the response XLE writes to Output into the
ResultString. The other two commands print Output directly to
user_output. The silent option is useful if you want to have the
XLE running completely hidden in the background, and have prolog
munge over the results.
To interact with the XLE via Tk windows etc, call
xle_exec("set no_Tk 0~n",[]).
This forces the XLE to open up its usual windows. To prevent XLE
from showing its windows, use xle_exec("set no_Tk 1~n",[])
Clicking on command buttons in the Tk windows will not have any
effect on the transfer process, which remains resolutely in control
of things. However, you can use xle_exec to issue Tcl commands that
get the XLE to write results to specified files, and then have
prolog read the contents of the files, e.g.
get_fs_from_xle(FS) :-
xle_exec("print-fs-as-prolog ~a~n",['']),
open('', read, Stream),
This looks up all the process ids of active XLE processes, and
shuts the processes down. A common problem is that the XLE does
not like one of the commands it has been sent. It does not usually
complain straight away, but next time you issue a command you get
an exception complaining about a format error in trying to write -1
where a character was expected. Under these circumstances, your
best bet is just to shut the xle process down, and start up all
over again.
The user defined predicate my_listener reads the Data and Command Files. If the Command file contains what should be taken as an instruction to break out of the loop (e.g. the command "break."), then the Break flag is set to true. This acts as an instruction to listen/2 to break out of its loop.
:- module(my_module, [go/0]).
my_listener(DataFile,CommandFile,Break) :-
Command = process ->
Break = fail,
Command = break ->
Break = true
go :- listen(my_module, my_listener).
:- communication_pipes('.mydatafile', '.mycommandfile', '.my_ping').
To initiate the listener: start the prolog process; start the XLE process; call install-my-listener-menu in XLE to set up the right buttons; call go/0 from the prolog process; parse a sentence in the XLE process; and click on one of the newly installed command buttons. The order in which the processes are started up does not matter. However, clicking on listener buttons will have no effect if go/0 is not running.
proc install-my-listener-menu {} {
global fsCommands fsChartCommands
add-item-to-xle-menu \
{command -label "Process" \
-command "data-to-listener $self" \
-doc "Runs process on fs."} \
add-item-to-xle-menu \
{command -label "Break " \
-command "command-to-listener break." \
-doc "Breaks out of listener."} \
# Specify communication pipes (must coincide with set_communication_pipes
set datapipe [glob ~/]/.mydatafile
set commandpipe [glob ~/]/.mycommandfile
set pingpipe [glob ~/]/.my_ping
set pinged 0
proc command-to-listener {command} {
global pinged
global datapipe
global commandpipe
global pingpipe
set cmd1 "echo '$command' > '$datapipe'"
set cmd2 "echo '$command' > '$commandpipe'"
exec sh -c $cmd1
exec sh -c $cmd2
if {$pinged == 0} {
check-ping-result $datapipe $pingpipe
proc data-to-listener {window} {
global pinged
global datapipe
global commandpipe
global pingpipe
print-fs-as-prolog $datapipe $window
set cmd "echo 'process.' > '$commandpipe'"
exec sh -c $cmd
if {$pinged == 0} {
check-ping-result $datapipe $pingpipe
# The function check-ping-result is defined in translate.tcl
The following procedures map between XLE-internal, Prolog-external and Named-external context notations. Note that in converting collections of facts, compound terms are inspected to find occurrences of cf(Context, Pred) expressions at any level. However, no descent is made inside such expressions. This allows you to convert two discrete collections of facts at the same time, e.g. [FS_Facts, CS_Facts]
create_fs_choice_space(+PrologChoices, +PrologEquivs, -XLEChoices)
ext2int_contexts(+PrologChoices, +PrologEquivs, -XLEChoices)
PrologChoices and PrologEquivs are the choice space and set of
equivalences such as might be taken from a prolog f-structure (in
external, prolog variable context form). XLEChoices is a pointer
to the XLE internal choice space (graph).
This procedure will create a new chart if one is not already in
existence using create_parser('',Chart). Otherwise it picks up the
existing chart and resets its storage. It then goes through the
prolog equivalences instantiating any context selections made.
Finally it constructs a choice space within the chart, essentially
by calling create_disjunction/4 whenever a new choice is
encountered, and instantiating prolog variables representing
contexts to the corresponding pointers to XLE contexts. As a side
effect of calling this procedure, prolog variables in contexted
facts will also be instantiated to their corresponding context
xle_safe_context(+Boolean, +XLEChoices, -XLEContext)
xle_context(+Boolean, +XLEChoices, -XLEContext)
Given a boolean context expression (with context variables
instantiated to XLEContext pointers), and the XLE choice space,
this converts the boolean expression into a pointer to a
context. This is the simplest procedure to use for creating new
boolean combinations of context. The boolean connectives permitted
are and(C1,C2), or(C1,C2), not(C1). However, negation tends to
be an expensive operation --- you would be well advised to use the
lower level subtract_clause/4 if possible.
The safe procedure (a) checks that all context variables are indeed
instantiated, and (b) binarizes all n-ary boolean expressions so
that and(C1,C2,C3,...Cn) becomes
and(C1,and(C2,and(C3,...and(Cn-1,Cn)..))). The non-safe procedure
performs neither of these steps. It thus provides a more efficient
way of evaluating boolean expressions against the choice space, but
will cause unpredictable results if the boolean expression is
either not ground or not binarized.
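The two steps performed by the safe procedure can be illustrated with a small Python sketch. Boolean expressions are encoded here as tuples such as ("and", c1, c2, c3), and an uninstantiated context variable is represented by None. Both encodings, and the function names, are assumptions made for illustration.

```python
def is_ground(expr):
    """True if no context in the expression is uninstantiated (None)."""
    if isinstance(expr, tuple):
        return all(is_ground(arg) for arg in expr[1:])
    return expr is not None


def binarize(expr):
    """Rewrite n-ary and/or into right-nested binary form.

    ("and", c1, c2, c3) becomes ("and", c1, ("and", c2, c3)).
    """
    if not isinstance(expr, tuple):
        return expr
    op = expr[0]
    args = [binarize(arg) for arg in expr[1:]]
    if op in ("and", "or") and len(args) > 2:
        result = args[-1]
        # Fold from the right so the last two arguments are innermost.
        for arg in reversed(args[:-1]):
            result = (op, arg, result)
        return result
    return (op, *args)
```

A caller mirroring the safe procedure would first reject any expression for which is_ground is false, then pass the result of binarize on for evaluation.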
Returns the true or false XLE contexts. These are in fact the
integers 1 and 0 respectively.
Sometimes it is necessary to keep more than one choice space around
at a single time. By calling use_alternate_choice_space(Id) before
creating a choice space (e.g. with create_fs_choice_space), a
choice space will be set up under an alternative chart, identified
by Id. By calling use_primary_choice_space, you will revert to using the
principal chart (Id = 0). By making sure you pass the pointer to
the correct XLEChoice to functions such as xle_context, you
can manipulate several choice spaces at the same time. For example
create_fs_choice_space(PlgChoices0, PlgEquivs0, XleCS0),
create_fs_choice_space(PlgChoices1, PlgEquivs1, XleCS1),
xle_context(and(C0_1, C0_2), XleCS0, C0_12),
xle_context(or(and(C1_1,C1_2), C1_3), XleCS1, C1_123),
% reset and overwrite XleCS1:
create_fs_choice_space(PlgChoices2, PlgEquivs2, XleCS2),
Resets the storage / choice space either for the chart identified
by Id, or for whichever chart is current as identified by
either use_primary_choice_space or use_alternate_choice_space.
unpack_choice_space(+XLEChoices, -Solution)
Successive backtracking through this will unpack a sequence of
choice space solutions. When there are no more solutions left,
it will set Solution = 0.
Typical calling sequence:
unpack_choice_space(ChoiceSpace, Solution),
do_something(Solution, ChoiceSpace, ContextedFacts),
Solution == 0,
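The calling sequence above is a failure-driven loop: each solution is processed for its side effects, and the final test only succeeds once the 0 marker is reached. A minimal sketch of that control flow follows. It assumes a stand-in generator, demo_unpack/2, in place of unpack_choice_space/2, and the solution values 1 and 2 are invented for illustration. demo_loop/1 and seen/1 are likewise hypothetical names.

```prolog
:- use_module(library(lists)).
:- dynamic seen/1.

% Stand-in for unpack_choice_space/2: yields two made-up solutions
% on backtracking, then the end marker 0.
demo_unpack(_ChoiceSpace, Solution) :-
    member(Solution, [1, 2, 0]).

% Failure-driven loop: record every non-zero solution, and only
% succeed once the end marker 0 has been reached.
demo_loop(ChoiceSpace) :-
    retractall(seen(_)),
    demo_unpack(ChoiceSpace, Solution),
    (   Solution == 0
    ->  true
    ;   assertz(seen(Solution))      % stands in for do_something/3
    ),
    Solution == 0,
    !.
```

After calling demo_loop(cs), the facts seen(1) and seen(2) have been recorded, one per solution visited before the end marker.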
xle_unpack_fstr(+PackedFStr, +XLEChoices, -UnPackedFStr)
This uses unpack_choice_space to backtrack through unpackings of an
fschart (in internal, XLE choice form). It is defined as follows:
xle_unpack_fstr(PackedFstr, ChoiceSpace, Fstr) :-
PackedFstr = fstructure(Sent,Props,_Choice,_Eqv,PFS,PCS),
Fstr = fstructure(Sent,Props,[],[],FS,CS),
unpack_choice_space(ChoiceSpace, Solution),
\+ Solution = 0,
collect_true_facts(PFS, ChoiceSpace, FS),
collect_true_facts(PCS, ChoiceSpace, CS).
collect_true_facts(+PackedFacts, +XLEChoices, -TrueFacts)
Collects whichever cf(Ctx, Pred) facts are in the true context
(i.e. Ctx evaluates to 1). Typically this is called after a
particular solution has been imposed on the choice space.
collect_true_facts([cf(C,Fact)|Facts],ChoiceSpace,TrueFacts) :-
Value == 1 ->
TrueFacts = [cf(1,Fact)|TrueFacts1]
otherwise ->
TrueFacts = TrueFacts1
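The filtering performed by collect_true_facts can be sketched in a self-contained form. The sketch makes several assumptions that are not taken from the XLE interface: contexts are boolean terms over atoms rather than XLE pointers, the current solution is represented as a list of the atoms that are true, and eval_ctx/3 is a hypothetical evaluator standing in for the evaluation against the real choice space.

```prolog
:- use_module(library(lists)).

% eval_ctx(+Ctx, +TrueAtoms, -Value): hypothetical evaluator.
% Value is 1 if Ctx holds given the list of true atoms, else 0.
eval_ctx(1, _, 1) :- !.
eval_ctx(0, _, 0) :- !.
eval_ctx(and(A,B), T, V) :- !,
    eval_ctx(A, T, VA), eval_ctx(B, T, VB), V is min(VA, VB).
eval_ctx(or(A,B), T, V) :- !,
    eval_ctx(A, T, VA), eval_ctx(B, T, VB), V is max(VA, VB).
eval_ctx(not(A), T, V) :- !,
    eval_ctx(A, T, VA), V is 1 - VA.
eval_ctx(C, T, V) :-
    (   memberchk(C, T) -> V = 1 ; V = 0 ).

% sketch_collect_true_facts(+PackedFacts, +TrueAtoms, -TrueFacts):
% keeps the facts whose context evaluates to 1 and rewrites their
% context to 1, mirroring the shape of the definition above.
sketch_collect_true_facts([], _, []).
sketch_collect_true_facts([cf(C,Fact)|Facts], True, TrueFacts) :-
    eval_ctx(C, True, Value),
    (   Value == 1
    ->  TrueFacts = [cf(1,Fact)|TrueFacts1]
    ;   TrueFacts = TrueFacts1
    ),
    sketch_collect_true_facts(Facts, True, TrueFacts1).
```

With the true atoms [a1, b2], a fact in context a2 is dropped, while facts in contexts 1, a1 and and(a1,b2) are kept with their context rewritten to 1.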
count_solutions(+XLEChoices, -Num)
Num is the number of solutions encoded in the XLEChoices.
The following functions are more or less directly derived from the corresponding C-functions, described in the relevant .h files. Bear in mind that when a C-function returns a result, this is reflected by an additional final argument in the corresponding prolog predicate. A few functions place additional C wrappers around the C library functions to make them easier to access from prolog. These are listed first.
ext2int_contexts(+PrologChoices, +PrologEquivs, +PrologFacts,
-XLEChoices, -IntFacts)
This is like ext2int_contexts/3, except that in addition
xle_safe_context is applied to all the contexted Facts in
PrologFacts. This produces contexted facts in internal context
format, where each context is a pointer rather than a boolean
The inverse of ext2int_contexts. Given a set of contexted facts,
IntFacts in internal context form, returns an external prolog
choice space, list of equivalences (usually empty) and contexted
facts in external prolog form, with boolean combinations of prolog
variables for contexts.
Like int2ext_contexts, except that instead of prolog variables for
context choices, they are replaced by mnemonic names, e.g.
[choice([cv('A',1), cv('A',2)], 1),
choice([cv('B',1), cv('B',2)], cv('A',1))],
[cf(cv('A',1), SUBJ(var(0), var(1)))]
This is used, e.g. when writing contexted structures to a prolog
file, so that context variables get their familiar names like A1,
A2, instead of the arbitrary prolog variables of external format like
_12983, _12985.
Inverse mapping is
ext2named_contexts(+ExtChoices, +ExtEquivs, +ExtFacts,
-NChoices, -NEquivs, -NFacts)
Replaces prolog context variables by their mnemonic names in
copies of the external structures.
Inverse mapping is
name_internal_choices(+IntChoices, -NChoices)
Given an internal XLEChoice e.g. 109367, returns a named choice
space, e.g.
[choice([cv('A',1), cv('A',2)], 1),
choice([cv('B',1), cv('B',2)], cv('A',1))]
name_internal_equivs(+IntChoices, -NChoices)
Given an internal XLEChoice e.g. 109367, returns a named list of
equivalences, usually = []
name_internal_context(+XLEContext, -NamedContext)
Given a pointer to an XLE context, return a named boolean
expression, e.g. and(cv('AQ', 1), cv('B',5))
write_named_context(+Stream, +NamedContext)
Writes the named context either to Stream or user_output so that
it looks as though it contains prolog variables, e.g.
and(cv('AQ', 1), cv('B',5))
==> and(AQ1, B5)
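The mapping from cv(Name, N) terms to variable-like names can be sketched as follows. This is an illustration only: named_to_atoms/2 and cv_atom/3 are hypothetical helpers, not the code behind write_named_context, and the input is assumed to be ground.

```prolog
:- use_module(library(lists)).

% cv_atom(+Name, +N, -Atom): joins a choice name and its index,
% e.g. 'AQ' and 1 give the atom 'AQ1'.
cv_atom(Name, N, Atom) :-
    atom_codes(Name, NameCodes),
    number_codes(N, NumCodes),
    append(NameCodes, NumCodes, Codes),
    atom_codes(Atom, Codes).

% named_to_atoms(+Named, -Plain): replaces every cv(Name, N) inside
% a ground named context by the corresponding atom.
named_to_atoms(cv(Name, N), Atom) :- !,
    cv_atom(Name, N, Atom).
named_to_atoms(Term, Plain) :-
    compound(Term), !,
    Term =.. [F|Args],
    named_to_atoms_list(Args, PlainArgs),
    Plain =.. [F|PlainArgs].
named_to_atoms(Term, Term).

named_to_atoms_list([], []).
named_to_atoms_list([A|As], [P|Ps]) :-
    named_to_atoms(A, P),
    named_to_atoms_list(As, Ps).
```

Printing the result with write/1 rather than writeq/1 leaves the atoms unquoted, which is what makes them look like prolog variables.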
write_named_context_no_commas(+Stream, +NamedContext)
Writes the named context either to Stream or user_output so that
it looks as though it contains prolog variables, but without
commas (used for triples notation)
and(cv('AQ', 1), cv('B',5))
==> and(AQ1 B5)
Given a list (i.e. not a compound expression) of named contexted
facts, named choice definitions, named selections, and/or named
definitions, will print out a list of the same, but using
write_named_context throughout.
fs2graph(+PrologFStructure, -XleFSGraph)
graph2fs(+XleFSGraph, -PrologFStructure)
Converts between prolog fstructures and pointers to XLE internal
representations of f-structures. This is not stable at the
moment. It is implemented at present by writing structures to
files and reading them back in again using
print_prolog_graph_file(File,XleFSGraph) and
The trouble with this is that reading in a prolog file to create
an FS Graph is dependent on having the Chart set up for the
correct parser/grammar. However, prolog's Charts are usually
created with a null grammar, so that the file gets read in
The following are listed without description; see the relevant C header files:
These call the C functions read_prolog_graph and print_prolog_graph
(which require stream arguments) after having first opened the File
to create the stream.
Calls C generate_from_graph, but returns a pointer to the WordNet
rather than the WordNet itself
Calls C print_net_as_regexp, but inputs a pointer to the WordNet
rather than the WordNet itself
assert_nogood(+XLEChoices, +XLEContext)
Calls C assert_nogood, but with NULL pointers to all the items
documenting the nogood. (Note: asserting nogoods can be expensive)
init_xle/2, create_parser/2, parse_sentence/4,
next_graph_solution/3, free_graph_solution/1,
reset_storage/1, create_graph/2, create_generator/2,
make_new_choice_disjunction/3, create_disjunction/4,
get_choice/4, conjoin_clauses/5, disjoin_clauses/4,
subtract_clause/4, negate_clause/4, not_clause/3,
evaluate_clause/3, evaluate_choices/2, covers_clause/3,
get_edge_solutions/2, first_dnf_solution/3, next_dnf_solution/3,
set_solution_choice_values/2, select_choice/1, set_choice_values/2,
Like member and memberchk, except that Item is strictly identical
(==) to some element in list
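The difference from ordinary member can be seen in a short sketch. The name sketch_eq_member/2 is hypothetical and is not the library predicate described here; it only illustrates membership tested with == instead of unification.

```prolog
% sketch_eq_member(+Item, +List): succeeds if Item is strictly
% identical (==) to some element of List. No variables are bound.
sketch_eq_member(Item, [Elem|_]) :-
    Item == Elem.
sketch_eq_member(Item, [_|Rest]) :-
    sketch_eq_member(Item, Rest).
```

With distinct unbound variables A and B, the goal sketch_eq_member(A, [B, c]) fails, whereas member(A, [B, c]) would succeed by unifying A with B.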
Item is strictly identical to some subexpression of Expr
vartail(+VTList, -VariableTail)
Returns the variable tail of a variable tail list, e.g.
Appends two variable lists by setting the tail of List1 = List2
vt_member(Item, +VTList)
Gets members of variable tail lists
Converts an ordinary list to a variable tail list
VTList is an empty variable tail list
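A variable tail list is a partial list such as [a,b|T] whose tail T is still unbound. The operations described above can be sketched as follows. The sk_ names are hypothetical and chosen to avoid any claim about the actual library definitions.

```prolog
% sk_vartail(+VTList, -Tail): Tail is the unbound tail of VTList.
sk_vartail(List, Tail) :-
    var(List), !,
    Tail = List.
sk_vartail([_|Rest], Tail) :-
    sk_vartail(Rest, Tail).

% sk_vt_member(?Item, +VTList): enumerates the members of VTList,
% stopping at the unbound tail instead of extending the list.
sk_vt_member(_, List) :-
    var(List), !,
    fail.
sk_vt_member(Item, [Item|_]).
sk_vt_member(Item, [_|Rest]) :-
    sk_vt_member(Item, Rest).

% sk_vt_append(+VTList1, +VTList2): appends by binding the tail of
% VTList1 to VTList2, so VTList1 itself becomes the result.
sk_vt_append(List1, List2) :-
    sk_vartail(List1, Tail),
    Tail = List2.

% sk_list_to_vt(+List, -VTList): copies an ordinary list into a
% variable tail list.
sk_list_to_vt([], _Tail).
sk_list_to_vt([X|Xs], [X|Rest]) :-
    sk_list_to_vt(Xs, Rest).
```

Because appending only binds one variable, it takes time proportional to the length of the first list to find the tail and no copying is needed.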
Removes the keys from a key-sorted list
concat_list(+ListOfAtoms, -Atom)
Concatenates all the atoms (and/or integers) in ListofAtoms
together to create a new atom
Generates a unique atom with the specified prefix
Time the Goal, repeated either 1 or Repetitions times
Conditional format --- only print if current format level is less
than or equal to Level
Set current format levels
Print String followed by CPU time (in secs) since either last call
to time_msg, or last call to reset_time_msgs. Conditional version
depends on current format_level
Call statistics(runtime,_) to reset timings
Sets the value of a parameter. Retracts all previous parameter
settings and then asserts user:Parameter
setsys(Parameter, Value)
getsys(Parameter, Value)
An alternative to setp (and not integrated with it). Used only to
control pretty printer (below).
Pretty print the expression (can be a bit ropey on printing
anonymous variables). You can control the line length with
setsys(pagewidth, N)
and the print depth with
setsys(ppdepth, M)
where N and M are integers
Set the prolog debugger to use pp rather than write, and unset it.
get_option(Option, Value, ArgList)
get_optional(Option, Value, ArgList)
Value is the item following Option in the ArgList. get_option
prints an error message if Option is not present in the list,
whereas get_optional fails silently
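The lookup behaviour can be illustrated with a small sketch. The predicates sk_get_optional/3 and sk_get_option/3 are hypothetical stand-ins. The wording of the error message, and the choice to fail after printing it, are assumptions rather than documented behaviour.

```prolog
:- use_module(library(lists)).

% sk_get_optional(+Option, -Value, +ArgList): Value is the item
% immediately following the first occurrence of Option.
% Fails silently if Option is absent or is the last item.
sk_get_optional(Option, Value, ArgList) :-
    append(_, [Option, Value|_], ArgList),
    !.

% sk_get_option(+Option, -Value, +ArgList): as above, but reports
% a missing option on user_error before failing (assumed behaviour).
sk_get_option(Option, Value, ArgList) :-
    (   sk_get_optional(Option, Value, ArgList)
    ->  true
    ;   format(user_error, "Missing option: ~w~n", [Option]),
        fail
    ).
```

For the argument list ['-grammar', 'eng.lfg', '-n', '3'], looking up '-n' gives the atom '3', which is still an atom and not an integer.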
get_option_list(Option, Values, ArgList, Terminators)
Values is the sublist of ArgList lying between Option and the
first member of Terminators. Prints an error message if Option is
Returns the suffix following the final . in File
dir_and_file(+DirFile, -Dir, -File)
Splits DirFile into its slash terminated directory and the file
Concatenates directory to file (Dir must be slash terminated)
Adds a trailing slash to Dir, if missing.
Removes the specified suffix from the filename
Converts e.g. '3' to 3. Useful for manipulating command line
arguments, where numerical items get read as atoms not integers.
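A conversion of this kind can be written with standard builtins. The sketch below uses a hypothetical name, sk_atom_to_number/2, and assumes that atoms which do not spell a number should simply make the goal fail.

```prolog
% sk_atom_to_number(+Atom, -Number): converts an atom such as '3'
% to the number it spells. Fails if the atom is not numeric.
sk_atom_to_number(Atom, Number) :-
    atom(Atom),
    atom_codes(Atom, Codes),
    catch(number_codes(Number, Codes), _Error, fail).
```

This is the step needed before doing arithmetic on a command line argument such as the '3' in ['-n', '3'].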