HPSG Parser Interface

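The Standard Interface for HPSG Parsers defines the interface among grammars, parsers, and other applications (including the graphical user interface will). Currently, this interface presupposes grammars with only binary and unary rules and no empty categories (CHP and TNTP follow these assumptions). The interface consists of the following parts: the grammar interface, parser input, parser output, and the will interface.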

Grammar Interface

The grammar interface defines the interface between a grammar and a parser. Grammar writers must implement the following predicates to provide the parser with the necessary data.
sentence_to_word_list(+$SENTENCE, -$MORPHs)
    $SENTENCE: 'string'
    $MORPHs:   'list' of 'bot'
Unifies $MORPHs with the list of morphemes of the sentence $SENTENCE.
lexical_entry(+$MORPH, -$SIGN)
lexical_entry(+$MORPH, -$NAME, -$SIGN)
ext_lexical_entry(+$MORPHLIST, +$TAIL, -$SIGN)
ext_lexical_entry(+$MORPHLIST, +$TAIL, -$NAME, -$SIGN)
    $MORPH: 'bot'
    $SIGN:  'bot'
    $NAME:  'bot'
    $MORPHLIST: 'list'
    $TAIL: 'list'
These unify $SIGN with a lexical entry for $MORPH (or, in the case of ext_lexical_entry, for the morphemes described by the difference list ($MORPHLIST, $TAIL)). You can provide $NAME to identify each lexical entry (it is stored in the LEX_NAME\ feature of 'edge_info'). By backtracking, you can enumerate all solutions for $SIGN.
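The difference-list convention of ext_lexical_entry can be illustrated with a small Python model. This is a sketch under stated assumptions, not the LiLFeS API: morphemes are plain strings, signs are dicts, and MWE_LEXICON and ext_lexical_entries are hypothetical names. An entry applies when its morphemes form a prefix of the input; the returned suffix plays the role of $TAIL, and yielding every match stands in for enumerating solutions by backtracking.

```python
from typing import Iterator

# Hypothetical lexicon (not part of the interface): morpheme tuple -> [(name, sign)].
# Signs are plain dicts standing in for LiLFeS feature structures.
MWE_LEXICON = {
    ("in", "spite", "of"): [("in_spite_of_prep", {"HEAD": "prep"})],
    ("in",): [("in_prep", {"HEAD": "prep"}), ("in_adv", {"HEAD": "adv"})],
}

def ext_lexical_entries(morphs: list[str]) -> Iterator[tuple[str, dict, list[str]]]:
    """Yield (name, sign, tail) for every entry whose morphemes are a prefix of morphs.

    The consumed prefix together with the returned tail corresponds to the
    difference list ($MORPHLIST, $TAIL); yielding all matches mimics backtracking.
    """
    for key, entries in MWE_LEXICON.items():
        n = len(key)
        if tuple(morphs[:n]) == key:
            for name, sign in entries:
                yield name, sign, morphs[n:]

for name, sign, tail in ext_lexical_entries(["in", "spite", "of", "rain"]):
    print(name, sign["HEAD"], tail)
```

Here "in spite of" yields one multi-word entry whose tail is ['rain'], while the single-morpheme entries for "in" leave ['spite', 'of', 'rain'] for further lookup.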
id_schema_binary($NAME, $LEFT, $RIGHT, $MOTHER, $HEAD, $DCP1, $DCP2)
id_schema_unary($NAME, $DAUGHTER, $MOTHER, $DCP1, $DCP2)
   $NAME: string
   $MOTHER, $LEFT, $RIGHT, $HEAD, $DAUGHTER: bot
   $DCP1, $DCP2: pred
Specify the schemata of an HPSG grammar. $NAME identifies each schema. The mother sign of the schema is specified by $MOTHER, and its daughters by $LEFT and $RIGHT ($DAUGHTER in unary rules). You can specify definite clause programs (DCPs) $DCP1 and $DCP2, which are evaluated when the schema application succeeds. If you do not need a DCP, write "true". In TNTP, $DCP1 is executed at compile time, and $DCP2 in phase-2 parsing. $HEAD is instantiated when the schema is compiled with only one daughter.
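To make the roles of $LEFT, $RIGHT, $MOTHER and the DCP concrete, the following Python sketch models a binary schema as a function from two daughter signs to a mother sign. It assumes signs are plain dicts and replaces unification with simple equality checks; head_complement and apply_binary are hypothetical names, and only a single DCP, run after a successful application, is modeled.

```python
from typing import Callable, Optional

Sign = dict  # stand-in for a LiLFeS feature structure

def head_complement(left: Sign, right: Sign) -> Optional[Sign]:
    """Hypothetical binary schema: the left daughter is the head and the right
    daughter saturates its first complement. Returns the mother, or None when
    the schema does not apply (the analogue of unification failure)."""
    comps = left.get("COMPS", [])
    if not comps or comps[0] != right.get("HEAD"):
        return None
    return {"HEAD": left["HEAD"], "COMPS": comps[1:]}

def apply_binary(schema: Callable[[Sign, Sign], Optional[Sign]],
                 left: Sign, right: Sign,
                 dcp: Callable[[Sign], bool] = lambda mother: True) -> Optional[Sign]:
    """Apply a schema, then run its DCP; the default DCP mirrors writing "true"."""
    mother = schema(left, right)
    if mother is None or not dcp(mother):
        return None
    return mother

verb = {"HEAD": "verb", "COMPS": ["noun"]}
noun = {"HEAD": "noun", "COMPS": []}
print(apply_binary(head_complement, verb, noun))  # {'HEAD': 'verb', 'COMPS': []}
print(apply_binary(head_complement, noun, verb))  # None
```

Returning None corresponds to a failed schema application; a DCP that fails also rejects the mother.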
'*grammar_name*'(+$NAME)
'*grammar_version*'(+$VERSION)
    $NAME:    string
    $VERSION: string
The name and the version of the grammar.
'*dtrs_paths*'(+$PATHs)
'*ignored_paths*'(+$PATHs)
    $PATHs:   list
The list of paths to the daughters and the list of paths ignored when checking the equivalence of edges for factoring. The values of ignored paths are stored in the SIGN_PLUS\ feature of 'edge_info', while those of the daughter paths are not.
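The following Python sketch illustrates how '*dtrs_paths*' and '*ignored_paths*' might be used in factoring. It assumes that signs are nested dicts, that a path is a tuple of feature names, and that two edges in a cell are merged when they are equal after the daughter paths and ignored paths are removed; the exact factoring condition in CHP or TNTP may differ. get_path, strip_paths, equivalent and sign_plus are hypothetical names.

```python
import copy

Path = tuple[str, ...]  # a feature path, e.g. ("SYNSEM", "HEAD")

def get_path(sign: dict, path: Path):
    """Return the value at path, or None if the path does not exist."""
    node = sign
    for feat in path:
        if not isinstance(node, dict) or feat not in node:
            return None
        node = node[feat]
    return node

def strip_paths(sign: dict, paths: list[Path]) -> dict:
    """Return a deep copy of sign with the values at the given non-empty paths removed."""
    result = copy.deepcopy(sign)
    for path in paths:
        parent = get_path(result, path[:-1])
        if isinstance(parent, dict):
            parent.pop(path[-1], None)
    return result

def equivalent(a: dict, b: dict, dtrs_paths: list[Path], ignored_paths: list[Path]) -> bool:
    """Hypothetical factoring test: equal once daughter and ignored paths are removed."""
    paths = dtrs_paths + ignored_paths
    return strip_paths(a, paths) == strip_paths(b, paths)

def sign_plus(sign: dict, ignored_paths: list[Path]) -> dict:
    """Values at the ignored paths, kept aside (cf. SIGN_PLUS); daughters are not kept."""
    return {path: get_path(sign, path) for path in ignored_paths}

DTRS_PATHS = [("DTRS",)]
IGNORED_PATHS = [("SEM",)]
a = {"SYNSEM": {"HEAD": "verb"}, "DTRS": {"L": 1, "R": 2}, "SEM": "run(x)"}
b = {"SYNSEM": {"HEAD": "verb"}, "DTRS": {"L": 3, "R": 4}, "SEM": "walk(x)"}
print(equivalent(a, b, DTRS_PATHS, IGNORED_PATHS))  # True
print(equivalent(a, b, DTRS_PATHS, []))             # False
print(sign_plus(b, IGNORED_PATHS))                  # {('SEM',): 'walk(x)'}
```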
'*limit_words*'(+$NUM)
'*limit_edges*'(+$NUM)
    $NUM:   integer
The maximum number of words (morphemes) and of edges, respectively.
'*left_bracket*'(+$LB)
'*right_bracket*'(+$RB)
    $LB, $RB:  bot
The feature structures used as the left bracket and the right bracket in the morpheme list.

Parser Input

parse(+$SENTENCE)
   $SENTENCE: string
Parse a sentence $SENTENCE and store the results into the CKY table.
parse_with_brackets(+$SENTENCE)
   $SENTENCE: string
Parse a sentence $SENTENCE and store the results into the CKY table. The brackets specified by '*left_bracket*' and '*right_bracket*' restrict the shape of the parse tree.
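A common way to implement bracket restrictions is to forbid any edge whose span partially overlaps a bracketed span. The Python sketch below assumes that this non-crossing constraint is what parse_with_brackets enforces, and uses the strings "[" and "]" as stand-ins for the '*left_bracket*' and '*right_bracket*' feature structures; extract_brackets, crosses and cell_allowed are hypothetical names.

```python
def extract_brackets(tokens: list[str], lb: str = "[", rb: str = "]"):
    """Split a morpheme list containing brackets into plain words and
    bracketed spans (i, j), using the same cell indexing as the CKY table."""
    words: list[str] = []
    stack: list[int] = []
    spans: list[tuple[int, int]] = []
    for tok in tokens:
        if tok == lb:
            stack.append(len(words))  # span starts before the next word
        elif tok == rb:
            spans.append((stack.pop(), len(words)))  # assumes balanced brackets
        else:
            words.append(tok)
    return words, spans

def crosses(i: int, j: int, a: int, b: int) -> bool:
    """True if span (i, j) partially overlaps (a, b) without nesting."""
    return (i < a < j < b) or (a < i < b < j)

def cell_allowed(i: int, j: int, spans: list[tuple[int, int]]) -> bool:
    """An edge may be built in cell (i, j) only if it crosses no bracketed span."""
    return not any(crosses(i, j, a, b) for a, b in spans)

words, spans = extract_brackets(["I", "saw", "[", "the", "man", "]", "with", "a", "telescope"])
print(spans)                     # [(2, 4)]
print(cell_allowed(1, 3, spans)) # False: "saw the" cuts into the bracket
print(cell_allowed(1, 4, spans)) # True: "saw the man" contains it
```

With the brackets around "the man", the cell (1, 3) is rejected, while (1, 4) is allowed because it contains the whole bracketed span.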
parse_word_list(+$WORDLIST)
   $WORDLIST: list
Parse a preprocessed sentence $WORDLIST and store the results into the CKY table.
'*parser_mode*'(+$MODE, +$VALUE)
    $MODE, $VALUE: string
A parser-dependent switch for changing the behavior of the parser.

Parser Output

The parser output interface defines how applications access the parsing results. Parsers are assumed to store the parsing results in the following CKY-table-like data structure:
              (0,n)
             .......
        ................
    (0,2)  (1,3) ... (n-2,n)
 (0,1) (1,2) (2,3) ... (n-1,n)

  w_0   w_1   w_2  ... w_{n-1}
Each pair (i, j) corresponds to the cell of the CKY table that covers the words w_i through w_{j-1}. The edges are stored in the cells, where each edge is indexed by an integer (its ID number); you can retrieve an edge by specifying its ID. Applications can use the following predicates to get the parsing results.
apply_id_schema_binary(+$ID, -$MOTHER, -$LEFT_DAUGHTER, -$RIGHT_DAUGHTER)
apply_id_schema_unary(+$ID, -$MOTHER, -$DAUGHTER)
    $ID: integer
    $MOTHER, $LEFT_DAUGHTER, $RIGHT_DAUGHTER, $DAUGHTER: bot
Get the feature structures of the schema whose ID is $ID. $ID is given by the APPLIED_SCHEMA_ID\ feature of the 'nonterminal' type.
edge_id_list(+$I, +$J, -$LIST)
    $I, $J: integer
    $LIST: list of integers
Get the list of IDs of a specified cell ($I, $J) in the CKY table.
top_edge_id_list(-$LIST)
    $LIST: list of integers
Get the list of IDs of the parsing results. Equivalent to edge_id_list(0, $LEN, $LIST), where $LEN is the sentence length.
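The cell and ID scheme above can be modeled in a few lines of Python. In this sketch, Chart and its methods are hypothetical stand-ins for edge_id_list/3, top_edge_id_list/1 and edge/2; assigning edge IDs sequentially is an assumption, not a documented property of any parser.

```python
from collections import defaultdict

class Chart:
    """Hypothetical model of the CKY table: cell (i, j) holds the IDs of
    edges covering words w_i .. w_{j-1}."""

    def __init__(self, length: int):
        self.length = length
        self.cells: dict[tuple[int, int], list[int]] = defaultdict(list)
        self.signs: dict[int, dict] = {}  # edge ID -> sign

    def add_edge(self, i: int, j: int, sign: dict) -> int:
        """Store an edge in cell (i, j) and return its sequentially assigned ID."""
        edge_id = len(self.signs)
        self.signs[edge_id] = sign
        self.cells[(i, j)].append(edge_id)
        return edge_id

    def edge_id_list(self, i: int, j: int) -> list[int]:
        """Analogue of edge_id_list(+$I, +$J, -$LIST)."""
        return list(self.cells.get((i, j), []))

    def top_edge_id_list(self) -> list[int]:
        """Analogue of top_edge_id_list(-$LIST): the IDs in cell (0, n)."""
        return self.edge_id_list(0, self.length)

    def edge(self, edge_id: int) -> dict:
        """Analogue of edge(+$ID, -$SIGN)."""
        return self.signs[edge_id]

words = ["John", "runs"]
chart = Chart(len(words))
chart.add_edge(0, 1, {"HEAD": "noun"})  # ID 0, covers "John"
chart.add_edge(1, 2, {"HEAD": "verb"})  # ID 1, covers "runs"
chart.add_edge(0, 2, {"HEAD": "verb"})  # ID 2, covers "John runs"
print(chart.top_edge_id_list())         # [2]
```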
edge(+$ID, -$SIGN)
    $ID: integer
    $SIGN: bot
Get the feature structure (sign) of the edge with ID $ID.
edge_infos(+$ID, -$LIST)
    $ID: integer
    $LIST: list of 'edge_info'
Get the additional information of the edge with ID $ID. 'edge_info' and its subtypes are defined as follows:
edge_info <- [bot] +
    [SIGN_PLUS\bot].
terminal <- [edge_info] +
    [LEAF_CELL\',' (0),
     LEX_NAME\bot (4),
     LEAF_ID\integer (5)].
nonterminal <- [edge_info] +
    [APPLIED_SCHEMA\bot (4),
     APPLIED_SCHEMA_ID\integer (5)].
nonterminal_binary <- [nonterminal] +
    [L_DTR\integer(0),
     R_DTR\integer(1),
     L_CELL\','(2),
     R_CELL\','(3)].
nonterminal_unary <- [nonterminal] +
   [U_DTR\integer(0),
    U_CELL\','(2)].
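As an illustration of how the daughter IDs in 'edge_info' link edges into a parse tree, the Python sketch below mirrors a subset of the features (LEAF_CELL, LEX_NAME, LEAF_ID, APPLIED_SCHEMA, L_DTR, R_DTR, U_DTR) as dataclasses and renders a derivation as a bracketed string. The dataclasses, to_bracketed, and the choice of the first element of each edge_infos list are assumptions for illustration; schema and lexical names are plain strings instead of 'bot' values.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical Python mirrors of a subset of the edge_info subtypes.
@dataclass
class Terminal:
    leaf_cell: tuple[int, int]  # LEAF_CELL
    lex_name: str               # LEX_NAME
    leaf_id: int                # LEAF_ID

@dataclass
class NonterminalBinary:
    applied_schema: str  # APPLIED_SCHEMA
    l_dtr: int           # L_DTR: edge ID of the left daughter
    r_dtr: int           # R_DTR: edge ID of the right daughter

@dataclass
class NonterminalUnary:
    applied_schema: str  # APPLIED_SCHEMA
    u_dtr: int           # U_DTR: edge ID of the daughter

EdgeInfo = Union[Terminal, NonterminalBinary, NonterminalUnary]

def to_bracketed(edge_id: int, edge_infos: dict[int, list[EdgeInfo]]) -> str:
    """Render the first recorded derivation of an edge as a bracketed string."""
    info = edge_infos[edge_id][0]
    if isinstance(info, Terminal):
        return info.lex_name
    if isinstance(info, NonterminalBinary):
        left = to_bracketed(info.l_dtr, edge_infos)
        right = to_bracketed(info.r_dtr, edge_infos)
        return f"({info.applied_schema} {left} {right})"
    return f"({info.applied_schema} {to_bracketed(info.u_dtr, edge_infos)})"

infos = {
    0: [Terminal((0, 1), "john_noun", 0)],
    1: [Terminal((1, 2), "runs_verb", 1)],
    2: [NonterminalBinary("subj_head", 0, 1)],
    3: [NonterminalUnary("root", 2)],
}
print(to_bracketed(3, infos))  # (root (subj_head john_noun runs_verb))
```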
get_variable('*word_list*', $LIST)
    $LIST: list
The result of preprocessing.
get_variable('*sentence*', $SENTENCE)
    $SENTENCE: string
The current sentence to be parsed.
get_variable('*sentence_length*', $LEN)
    $LEN: integer
The length of the current sentence.
get_variable('*edge_number*', $NUM)
    $NUM: integer
The number of edges for the current sentence.
get_variable('*preprocessing_time*', $TIME)
    $TIME: integer
The preprocessing time of the current sentence (msec.)
get_variable('*lexicon_lookup_time*', $TIME)
    $TIME: integer
The lexicon-lookup time of the current sentence (msec.)
get_variable('*parsing_time*', $TIME)
    $TIME: integer
The parsing time of the current sentence (msec.)
'*parser*'(+$NAME)
'*parser_version*'(+$VERSION)
    $NAME, $VERSION: string
The name and the version of the parser.

will interface

This section describes the interface for the graphical user interface called will.
filter_sentence_id(+$INPUT, -$OUTPUT)
    $INPUT, $OUTPUT: list of integers
Filter the sentence IDs in $INPUT and return the result in $OUTPUT.
word_to_string(+$MORPH, -$STRING)
    $MORPH: bot
    $STRING: string
Map an element of the morpheme list into a string.
schema_to_string(+$NAME, -$STRING)
    $NAME: bot
    $STRING: string
Map a name of a schema into a string.
sign_to_string(+$SIGN, -$STRING)
    $SIGN: bot
    $STRING: string
Map a sign into the string of words that the sign covers.
sign_to_symbol(+$SIGN, -$SYMBOL)
    $SIGN: bot
    $SYMBOL: string
Map a sign into a grammar symbol.
schema_to_labels(+$NAME, -$LEFT_LABEL, -$RIGHT_LABEL)
schema_to_label(+$NAME, -$LABEL)
    $NAME: bot
    $LEFT_LABEL, $RIGHT_LABEL, $LABEL: string
Map a schema into edge labels for the two daughters (for binary rules) or an edge label for the single daughter (for unary rules).
will_parse(+$SENTENCE)
    $SENTENCE: string
Parse $SENTENCE. Usually you should specify 'parse/1' as the body of this predicate, but you can add any preprocessing or postprocessing steps.
Tsujii lab.
Send comments to:
MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)