HPSG Parser Interface
The Standard Interface for HPSG Parsers defines the interface
between grammars, parsers, and other applications (including will, the
graphical user interface). Currently, this interface assumes that
grammars have only binary and unary rules and no empty categories (CHP
and TNTP conform to this). The interface consists of the following:
Grammar Interface defines the interface between a grammar and
a parser. Grammar writers must implement the following predicates to
supply the parser with the data it needs.
sentence_to_word_list(+$SENTENCE, -$MORPHs)
$SENTENCE: 'string'
$MORPHs: 'list' of 'bot'
This lets $MORPHs be the list of morphemes of the sentence, $SENTENCE.
lexical_entry(+$MORPH, -$SIGN)
lexical_entry(+$MORPH, -$NAME, -$SIGN)
ext_lexical_entry(+$MORPHLIST, +$TAIL, -$SIGN)
ext_lexical_entry(+$MORPHLIST, +$TAIL, -$NAME, -$SIGN)
$MORPH: 'bot'
$SIGN: 'bot'
$NAME: 'bot'
$MORPHLIST: 'list'
$TAIL: 'list'
These let $SIGN be a lexical entry for $MORPH (or for the sequence of
morphemes described by the difference list ($MORPHLIST, $TAIL)). You
can provide $NAME to identify each lexical entry (it is stored in the
LEX_NAME\ feature of 'edge_info').
On backtracking, these predicates enumerate all solutions for $SIGN.
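To make the difference-list convention concrete, here is a minimal Python sketch (not part of the interface) of what ext_lexical_entry computes. It assumes morphemes are plain strings and signs are category names, whereas in the real interface both are feature structures of type 'bot'; the lexicon EXT_LEXICON and its entries are hypothetical. Each result pairs a sign with the tail, so the entry covers exactly the morphemes of $MORPHLIST that precede $TAIL, and a Python generator stands in for enumeration on backtracking.

```python
from typing import Iterator

# Hypothetical multi-word lexicon: each key is a sequence of morphemes,
# each value lists the signs (here just category names) it can have.
EXT_LEXICON: dict[tuple[str, ...], list[str]] = {
    ("New", "York"): ["noun_proper"],
    ("New",): ["adj"],
    ("kick", "the", "bucket"): ["verb_idiom"],
}


def ext_lexical_entry(morphs: list[str]) -> Iterator[tuple[str, list[str]]]:
    """Enumerate (sign, tail) pairs, mimicking Prolog backtracking.

    The entry covers the prefix of `morphs` that precedes `tail`, i.e.
    the difference list (morphs, tail) describes exactly its morphemes.
    """
    for words, signs in EXT_LEXICON.items():
        n = len(words)
        if tuple(morphs[:n]) == words:
            for sign in signs:
                yield sign, morphs[n:]


for sign, tail in ext_lexical_entry(["New", "York", "is", "big"]):
    print(sign, tail)
```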
id_schema_binary($NAME, $LEFT, $RIGHT, $MOTHER, $HEAD, $DCP1, $DCP2)
id_schema_unary($NAME, $DAUGHTER, $MOTHER, $DCP1, $DCP2)
$NAME: string
$MOTHER, $LEFT, $RIGHT, $HEAD, $DAUGHTER: bot
$DCP1, $DCP2: pred
Specify schemata of HPSG grammars. $NAME can be used to identify
each schema. The mother's sign is specified by $MOTHER, and its
daughters are specified by $LEFT and $RIGHT ($DAUGHTER in unary rules).
You can specify definite clause programs (DCPs) $DCP1 and $DCP2,
which are evaluated when the schema application succeeds.
If you do not need a DCP, write "true".
In TNTP, $DCP1 is executed at compile time, and $DCP2 in phase-2 parsing.
$HEAD is instantiated when compiling with one daughter.
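The following Python sketch illustrates what applying a binary schema involves: the daughter edges are unified with $LEFT and $RIGHT, the DCP is run, and the instantiated $MOTHER becomes the new edge. It is a simplification under stated assumptions: feature structures are modeled as dicts with a fixed feature set plus logic variables (Var), there is no type hierarchy or occurs check, the DCP is a Python callable over the variable bindings, and $HEAD and unary schemata are omitted. The schema head_comp and the signs loves and mary are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Var:
    """A logic variable shared between parts of a schema."""
    name: str


def walk(t, s):
    # Follow variable bindings in substitution s to a non-bound term.
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Unify two terms under substitution s; return the extended s or None.

    Terms are Var, atoms (str), or dicts with a fixed feature set. Unlike
    typed feature structures: no type hierarchy, no open records, no occurs check.
    """
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, dict) and isinstance(b, dict) and a.keys() == b.keys():
        for feat in a:
            s = unify(a[feat], b[feat], s)
            if s is None:
                return None
        return s
    return None


def resolve(t, s):
    """Replace every bound variable in t by its value."""
    t = walk(t, s)
    if isinstance(t, dict):
        return {feat: resolve(v, s) for feat, v in t.items()}
    return t


@dataclass
class BinarySchema:
    name: str
    left: object
    right: object
    mother: object
    dcp: Callable[[dict], bool] = lambda s: True  # like writing "true"


def apply_binary(schema, left, right):
    """Unify the daughters with the schema, run its DCP, return the mother."""
    s = unify(schema.left, left, {})  # daughters are assumed variable-free
    if s is not None:
        s = unify(schema.right, right, s)
    if s is None or not schema.dcp(s):
        return None
    return resolve(schema.mother, s)


H, C, R = Var("H"), Var("C"), Var("R")
# Hypothetical head-complement schema: the left daughter is the head and
# selects the right daughter as the first element of its SUBCAT list.
head_comp = BinarySchema(
    "head_comp",
    left={"head": H, "subcat": {"first": C, "rest": R}},
    right=C,
    mother={"head": H, "subcat": R},
)
mary = {"head": "noun", "subcat": "nil"}
loves = {"head": "verb", "subcat": {"first": mary, "rest": "nil"}}
print(apply_binary(head_comp, loves, mary))
```

Because each application starts from an empty substitution, the schema's variables need no renaming as long as the daughter signs contain no variables; a real implementation would copy the schema before each application.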
'*grammar_name*'(+$NAME)
'*grammar_version*'(+$VERSION)
$NAME: string
$VERSION: string
The name and version of the grammar.
'*dtrs_paths*'(+$PATHs)
'*ignored_paths*'(+$PATHs)
$PATHs: list
Lists of paths to the daughters, and of paths to be ignored, when
checking the equivalence of edges for factoring. The values at ignored
paths are stored in the SIGN_PLUS\ feature, while those at daughter
paths are not.
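A minimal Python sketch of the factoring check described above, assuming signs are nested dicts and paths are tuples of feature names (both hypothetical stand-ins for feature structures and LiLFeS paths). Two signs fall into the same group when they are equal once the daughter paths and ignored paths are removed; the values found at ignored paths are collected per sign, as SIGN_PLUS would hold them, while daughter values are simply dropped.

```python
import copy


def freeze(t):
    """Turn nested dicts/lists into hashable tuples so signs can be dict keys."""
    if isinstance(t, dict):
        return tuple(sorted((k, freeze(v)) for k, v in t.items()))
    if isinstance(t, list):
        return tuple(freeze(x) for x in t)
    return t


def remove_path(sign, path):
    """Return (copy of sign without the value at path, removed value or None)."""
    sign = copy.deepcopy(sign)
    node = sign
    for feat in path[:-1]:
        node = node.get(feat)
        if not isinstance(node, dict):
            return sign, None
    return sign, node.pop(path[-1], None)


def factor(signs, dtrs_paths, ignored_paths):
    """Group signs that are equal once daughter and ignored paths are removed.

    Returns {equivalence key: [SIGN_PLUS-like values of each grouped sign]}.
    """
    packed = {}
    for sign in signs:
        plus = {}
        for path in dtrs_paths:
            sign, _ = remove_path(sign, path)  # daughters: discarded
        for path in ignored_paths:
            sign, value = remove_path(sign, path)
            plus[path] = value  # ignored values: kept as SIGN_PLUS
        packed.setdefault(freeze(sign), []).append(plus)
    return packed


edges = [
    {"cat": "np", "dtrs": {"l": 0, "r": 1}, "sem": "a"},
    {"cat": "np", "dtrs": {"l": 2, "r": 3}, "sem": "b"},
    {"cat": "vp", "dtrs": {"l": 4, "r": 5}, "sem": "c"},
]
packed = factor(edges, dtrs_paths=[("dtrs",)], ignored_paths=[("sem",)])
for key, plus in packed.items():
    print(key, plus)
```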
'*limit_words*'(+$NUM)
'*limit_edges*'(+$NUM)
$NUM: integer
The maximum number of words (morphemes) and of edges.
'*left_bracket*'(+$LB)
'*right_bracket*'(+$RB)
$LB, $RB: bot
The feature structures used as the left and right brackets
in the morpheme list.
parse(+$SENTENCE)
$SENTENCE: string
Parse a sentence $SENTENCE and store the results in the CKY table.
parse_with_brackets(+$SENTENCE)
$SENTENCE: string
Parse a sentence $SENTENCE and store the results in the CKY table.
The brackets specified by '*left_bracket*' and '*right_bracket*' are
used to constrain the shape of the parse tree.
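The interface does not specify how a parser uses the brackets. One common interpretation, assumed in the following Python sketch, is that no edge may cross a bracketed span: cell (i, j) is allowed unless it partially overlaps some span. The bracket morphemes '[' and ']' are hypothetical stand-ins for the feature structures given by '*left_bracket*' and '*right_bracket*'.

```python
LB, RB = "[", "]"  # hypothetical bracket morphemes


def strip_brackets(morphs):
    """Remove bracket morphemes; return (words, bracketed spans (i, j)).

    Assumes the brackets are balanced (an unmatched RB raises IndexError).
    """
    words, stack, spans = [], [], []
    for m in morphs:
        if m == LB:
            stack.append(len(words))
        elif m == RB:
            spans.append((stack.pop(), len(words)))
        else:
            words.append(m)
    return words, spans


def crosses(i, j, spans):
    """True if cell (i, j) partially overlaps (crosses) some bracketed span."""
    return any(i < a < j < b or a < i < b < j for a, b in spans)


words, spans = strip_brackets(["[", "old", "men", "]", "and", "women"])
print(words, spans)
```

With these brackets, cell (1, 3) ("men and") crosses the bracketed span (0, 2) and would be ruled out, while (0, 4) and (2, 4) are allowed.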
parse_word_list(+$WORDLIST)
$WORDLIST: list
Parse a preprocessed sentence $WORDLIST and store the results in the CKY table.
'*parser_mode*'(+$MODE, +$VALUE)
$MODE, $VALUE: string
A parser-dependent switch for changing the behavior of the parser.
Parser Output defines how applications access the parser outputs.
Parsers are assumed to store the parsing results in the following
CKY-table-like data structure:
                    (0,n)
                 ...........
             ...................
      (0,2)    (1,3)    ...    (n-2,n)
  (0,1)    (1,2)    (2,3)    ...    (n-1,n)
   w_0      w_1      w_2     ...    w_{n-1}
Each pair (i, j) corresponds to the cell of the CKY table that covers
the words w_i through w_{j-1}. Each cell stores edges, which are indexed
by an integer (an ID number); you can get an edge by specifying its ID.
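The following Python sketch models the table above together with the accessors described below: edges get consecutive integer IDs, and each cell (i, j) keeps the list of IDs of the edges that cover w_i .. w_{j-1}. The class Chart, the toy cky function, and the use of category strings as signs are hypothetical simplifications; there are no unary rules or factoring, and the grammar is a dict from (left, right) category pairs to mother categories.

```python
from collections import defaultdict


class Chart:
    """Hypothetical CKY table: cell (i, j) covers words w_i .. w_{j-1}."""

    def __init__(self, n):
        self.n = n
        self.cells = defaultdict(list)  # (i, j) -> list of edge IDs
        self.edges = []                 # edge ID -> sign

    def add_edge(self, i, j, sign):
        self.edges.append(sign)
        edge_id = len(self.edges) - 1
        self.cells[(i, j)].append(edge_id)
        return edge_id

    def edge_id_list(self, i, j):
        return list(self.cells.get((i, j), []))

    def top_edge_id_list(self):
        return self.edge_id_list(0, self.n)

    def edge(self, edge_id):
        return self.edges[edge_id]


def cky(words, lexicon, rules):
    """Fill the chart bottom-up with binary rules (left, right) -> mother."""
    chart = Chart(len(words))
    for i, w in enumerate(words):
        for cat in lexicon.get(w, []):
            chart.add_edge(i, i + 1, cat)
    for length in range(2, len(words) + 1):
        for i in range(len(words) - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the daughters
                for l in chart.edge_id_list(i, k):
                    for r in chart.edge_id_list(k, j):
                        mother = rules.get((chart.edge(l), chart.edge(r)))
                        if mother is not None:
                            chart.add_edge(i, j, mother)
    return chart


chart = cky(["John", "loves", "Mary"],
            {"John": ["NP"], "loves": ["V"], "Mary": ["NP"]},
            {("V", "NP"): "VP", ("NP", "VP"): "S"})
print([chart.edge(e) for e in chart.top_edge_id_list()])
```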
Applications can use the following predicates to get the result of
parsing.
apply_id_schema_binary(+$ID, -$MOTHER, -$LEFT_DAUGHTER, -$RIGHT_DAUGHTER)
apply_id_schema_unary(+$ID, -$MOTHER, -$DAUGHTER)
$ID: integer
$MOTHER, $LEFT_DAUGHTER, $RIGHT_DAUGHTER, $DAUGHTER: bot
Get the feature structures of the $ID-th schema. $ID is given by the
APPLIED_SCHEMA_ID\ feature of the 'nonterminal' type.
edge_id_list(+$I, +$J, -$LIST)
$I, $J: integer
$LIST: list of integers
Get the list of IDs of a specified cell ($I, $J) in the CKY table.
top_edge_id_list(-$LIST)
$LIST: list of integers
Get the list of IDs of the parsing results. This is the same as
edge_id_list(0, $LEN, $LIST), where $LEN is the sentence length.
edge(+$ID, -$SIGN)
$ID: integer
$SIGN: bot
Get the feature structure of the edge with the specified ID.
edge_infos(+$ID, -$LIST)
$ID: integer
$LIST: list of 'edge_info'
Get additional information about the edge with the specified ID.
'edge_info' is defined as follows:
edge_info <- [bot] +
    [SIGN_PLUS\bot].
terminal <- [edge_info] +
    [LEAF_CELL\',' (0),
     LEX_NAME\bot (4),
     LEAF_ID\integer (5)].
nonterminal <- [edge_info] +
    [APPLIED_SCHEMA\bot (4),
     APPLIED_SCHEMA_ID\integer (5)].
nonterminal_binary <- [nonterminal] +
    [L_DTR\integer(0),
     R_DTR\integer(1),
     L_CELL\','(2),
     R_CELL\','(3)].
nonterminal_unary <- [nonterminal] +
    [U_DTR\integer(0),
     U_CELL\','(2)].
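As an illustration of how an application can use these types, here is a Python sketch that mirrors 'terminal', 'nonterminal_binary', and 'nonterminal_unary' as dataclasses and rebuilds bracketed derivation trees by following the daughter IDs (L_DTR, R_DTR, U_DTR). It assumes the edge_infos results are available as a dict from edge ID to list, and it treats each element of such a list as an alternative derivation of the same edge; that reading of the list is an assumption. The words, lexical names, and schema names in the example are hypothetical, and SIGN_PLUS is omitted.

```python
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class Terminal:
    leaf_cell: tuple[int, int]
    lex_name: str
    leaf_id: int


@dataclass
class NonterminalBinary:
    applied_schema: str
    applied_schema_id: int
    l_dtr: int
    r_dtr: int
    l_cell: tuple[int, int]
    r_cell: tuple[int, int]


@dataclass
class NonterminalUnary:
    applied_schema: str
    applied_schema_id: int
    u_dtr: int
    u_cell: tuple[int, int]


EdgeInfo = Union[Terminal, NonterminalBinary, NonterminalUnary]


def derivations(edge_id: int, infos: dict[int, list[EdgeInfo]],
                words: list[str]) -> Iterator[str]:
    """Yield every bracketed derivation tree rooted at edge_id."""
    for info in infos[edge_id]:
        if isinstance(info, Terminal):
            i, _ = info.leaf_cell  # a leaf cell (i, i+1) covers word w_i
            yield f"({info.lex_name} {words[i]})"
        elif isinstance(info, NonterminalBinary):
            for left in derivations(info.l_dtr, infos, words):
                for right in derivations(info.r_dtr, infos, words):
                    yield f"({info.applied_schema} {left} {right})"
        else:
            for dtr in derivations(info.u_dtr, infos, words):
                yield f"({info.applied_schema} {dtr})"


words = ["John", "runs"]
infos = {
    0: [Terminal((0, 1), "john_np", 0)],
    1: [Terminal((1, 2), "runs_v", 1)],
    2: [NonterminalUnary("vp_rule", 1, 1, (1, 2))],
    3: [NonterminalBinary("subj_head", 0, 0, 2, (0, 1), (1, 2))],
}
for tree in derivations(3, infos, words):
    print(tree)
```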
get_variable('*word_list*', $LIST)
$LIST: list
The result of preprocessing.
get_variable('*sentence*', $SENTENCE)
$SENTENCE: string
The current sentence to be parsed.
get_variable('*sentence_length*', $LEN)
$LEN: integer
The length of the current sentence.
get_variable('*edge_number*', $NUM)
$NUM: integer
The number of edges for the current sentence.
get_variable('*preprocessing_time*', $TIME)
$TIME: integer
The preprocessing time of the current sentence (msec.)
get_variable('*lexicon_lookup_time*', $TIME)
$TIME: integer
The lexicon-lookup time of the current sentence (msec.)
get_variable('*parsing_time*', $TIME)
$TIME: integer
The parsing time of the current sentence (msec.)
'*parser*'(+$NAME)
'*parser_version*'(+$VERSION)
$NAME, $VERSION: string
The name and the version of the parser.
Section 5 describes the interface for the graphical user
interface called will.
filter_sentence_id(+$INPUT, -$OUTPUT)
$INPUT, $OUTPUT: list of integers
Filter the sentence IDs in $INPUT and return the result in $OUTPUT.
word_to_string(+$MORPH, -$STRING)
$MORPH: bot
$STRING: string
Map an element of the morpheme list into a string.
schema_to_string(+$NAME, -$STRING)
$NAME: bot
$STRING: string
Map a name of a schema into a string.
sign_to_string(+$SIGN, -$STRING)
$SIGN: bot
$STRING: string
Map a sign into the string of words covered by the sign.
sign_to_symbol(+$SIGN, -$SYMBOL)
$SIGN: bot
$SYMBOL: string
Map a sign into a grammar symbol.
schema_to_labels(+$NAME, -$LEFT_LABEL, -$RIGHT_LABEL)
schema_to_label(+$NAME, -$LABEL)
$NAME: bot
$LEFT_LABEL, $RIGHT_LABEL, $LABEL: string
Map a schema into edge labels for the two daughters (for binary
rules) or an edge label for the single daughter (for unary rules).
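This document does not say how will draws trees. The Python sketch below only shows how these mappings fit together in a plain-text rendering: each node shows its grammar symbol (as sign_to_symbol would give it), and each daughter is prefixed with the edge label from schema_to_labels or schema_to_label. The tree tuples, symbols, schema names, and label tables are hypothetical.

```python
# Hypothetical stand-ins for the grammar-supplied label predicates.
SCHEMA_LABELS = {"head_comp": ("H", "C"), "subj_head": ("S", "H")}
UNARY_LABELS = {"np_from_n": "H"}


def schema_to_labels(name):
    return SCHEMA_LABELS[name]


def schema_to_label(name):
    return UNARY_LABELS[name]


def render(node, label="", depth=0):
    """Return the tree as indented lines, 'LABEL: SYMBOL' for each daughter.

    node is ("leaf", symbol, word), ("bin", symbol, schema, left, right),
    or ("un", symbol, schema, daughter).
    """
    kind, symbol = node[0], node[1]
    prefix = "  " * depth + (f"{label}: " if label else "")
    lines = [prefix + symbol]
    if kind == "leaf":
        lines.append("  " * (depth + 1) + node[2])
    elif kind == "bin":
        left_label, right_label = schema_to_labels(node[2])
        lines += render(node[3], left_label, depth + 1)
        lines += render(node[4], right_label, depth + 1)
    else:  # "un"
        lines += render(node[3], schema_to_label(node[2]), depth + 1)
    return lines


tree = ("bin", "S", "subj_head",
        ("leaf", "NP", "John"),
        ("bin", "VP", "head_comp",
         ("leaf", "V", "loves"),
         ("un", "NP", "np_from_n", ("leaf", "N", "Mary"))))
print("\n".join(render(tree)))
```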
will_parse(+$SENTENCE)
$SENTENCE: string
Parse $SENTENCE. Usually you should specify 'parse/1' as the body of
this predicate, but you can add any preprocessors or postprocessors.
Tsujii lab.
Send comments to:
MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)