pecco - C++ library for efficient classification with conjunctive features

developed by Naoki Yoshinaga at Yoshinaga Lab., IIS, University of Tokyo

About

Pecco is a C++ library for efficient classification with conjunctive features [1,2]. Pecco is fast because it performs adaptive classification: it enumerates common classification problems (feature sequences) in the target task and computes their weights in advance. Pecco then solves an input problem by computing its margin from the pre-calculated weight of the common problem most similar to the input.

The latest version of pecco enumerates common classification problems from an input stream of classification problems; in other words, it adaptively speeds up classification while processing the stream.

Pecco is x2-10 faster than a linear classifier and x30-500 faster than a kernel-based classifier in several NLP tasks, including dependency parsing and base-phrase chunking, yielding a transition-based dependency parser that processes >10,000 sentences/sec. on a laptop. Note that pecco does not support training (testing only); consider opal for training.

If you make use of pecco for research or commercial purposes, please cite:

N. Yoshinaga and M. Kitsuregawa. A Self-adaptive Classifier for Efficient Text-stream Processing. Proc. COLING 2014, pp. 1091--1102. 2014.
N. Yoshinaga and M. Kitsuregawa. Polynomial to Linear: Efficient Classification with Conjunctive Features. Proc. EMNLP 2009, pp. 1542--1551. 2009. A longer journal version is here.

Features

License (except the test directory): GNU GPLv2, LGPLv2.1, and BSD; or e-mail me if you need another license.

NOTE: The distribution includes the labeled examples for a transition-based dependency parser, along with log-linear models (LLMs) and SVMs trained on them; all of these are used in the paper.

Download & Setup

> wget http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/pecco/pecco-latest.tar.gz
> tar zxvf pecco-latest.tar.gz
> cd pecco-YYYY-MM-DD

# the default classifier is usually efficient enough for most users
> ./configure

# you can obtain a x2-3 speed-up by enabling the array-based pseudo trie
> ./configure --enable-array-trie

# a slightly slower but x2-3 more space-efficient classifier (esp. for d≥3 models);
#   --enable-abuse-trie is meant for speeding up (binary) kernel-based models;
#     it loses 1 bit of float precision in the classifier outputs (scores)
> ./configure --with-trie-impl=darts-clone --enable-float (--enable-abuse-trie)

> make install

For those who want to use pecco in hyponymy extraction tool 1.0: apply this patch to the tool so that it can use the current version of pecco.

For Mac OS X users: try port pecco via MacPorts (special thanks to @hjym_u).

For those who want to run pecco with kernel-based models (d≥3): darts-clone is recommended to save disk/memory usage in storing the weights of features; this is because a Directed Acyclic Word Graph (darts-clone) effectively collapses nodes (features) with the same weights (1/2 to 1/3 of the memory is needed). The default (included) library, cedar, is slightly faster than darts-clone but requires more memory (see the size of the binary dataset).

Requirements

ToDo

History

Usage

command-line options

Typing ./pecco --help shows the following usage information; test/ includes models used in the paper.

pecco - please enjoy classification w/ conjunctive features
Copyright (c) 2008-2014 Naoki Yoshinaga, All rights reserved.

Usage: src/pecco [options] model [test]

model   model file             model
test    test file              test examples; read STDIN if omitted

Optional parameters:
  -t, --classifier=TYPE        select classifier type
                                 0 - PKI (kernel model only)
                                 1 - PKE | SPLIT
                               * 2 - FST
                                 3 - PMT
  -p, --pmsize=INT             set upper limit of partial margins (1 << INT; default: 20)
  -e, --event=FILE             examples to obtain feature count for reordering ()
  -f, --fst-event=FILE         examples to enumerate feature sequence in FST ()
  -i, --fst-prune-factor=INT   use FST with 2^-INT of feature sequences (0)
  -b, --fst-build-verbose      build FSTs with 2^-i (i > 0) of feature sequences
  -c, --force-compile          force to recompile model
  -O, --output=TYPE            select output type of testing
                               * 0 - report accuracy
                                 1 - + labels
                                 2 - + labels + classifier outputs
                                 3 - + labels + probabilities
  -o, --output!=TYPE           output examples with labels/margins

Optional parameters in kernel model:
  -m, --pke-minsup=INT         threshold to feature frequency (1)
  -s, --pke-sigma=FLOAT        threshold to feature weight (0)
  -r, --split-ratio=FLOAT      threshold to feature frequency ratio (0)

Misc.:
  -v, --verbose=INT            verbosity level (0)
  -h, --help                   show this help and exit

file format for [-e] and [-f] options

(BNF-like notation)
<class>   := +1 | -1
<feature> := integer // >=1
<value>   := 1       // value will be ignored
<line>    := <class> <feature>:<value> <feature>:<value> ... <feature>:<value>

example

# analyze examples in test using a PKE classifier
#   (-s: threshold \sigma to prune insignificant features [3])
> pecco -t 1 -s 0.01 -v 1 model test

# analyze examples in test using a SPLIT classifier
> pecco -t 1 -r 0.01 -v 1 model test

# analyze examples in test using an FST (+ SPLIT) classifier with event
# (you can guide feature ordering by using event [-e] to count features;
#  this is useful if the test data has a feature distribution different
#  from the training data's)
> pecco -t 2 -r 0.01 -v 1 -e event -f event model test

# analyze examples in test using an FST classifier with a pruned fstrie
# (the pruning scheme reduces the # of feature sequences (to build the
#  fstrie) to 1/2^5 = 1/32)
> pecco -t 2 -i 5 -e train -f event -v 1 model test

Performance Comparison

See the performance comparison section in opal.

Tuning hyper-parameters

You may want to set either [-r] or [-s] to reduce the memory footprint of the classifier for a kernel model (d≥2). If you stick to exact computation of margins, set [-t 1] with a positive SPLIT ratio [-r]. Starting from [-r 1.0] (taking features appearing in 1% of support vectors as common features), decrease the value step-wise (0.5 → 0.1 → 0.05 → 0.01 ...) as long as the classifier speeds up and remains space-efficient enough for your needs (a value that is too small makes the classifier slower). Although the optimal value depends on the degree of the polynomial kernel and the sparsity of the feature distribution, rule-of-thumb values are around 0.005, 0.05, and 0.5 for d=2, 3, and 4, respectively.

If you can accept some approximation, consider setting a positive PKE sigma [-s] instead of [-r]. You should carefully tune [-s] using a development set; a larger PKE sigma will result in a `looser' classifier. You may want to know the exact accuracy in advance using the SPLIT classifier (see above). Generally, setting [-s] yields a faster classifier than setting [-r], and the speed-up gained by FST [-t 2] will be more significant.

After fixing the [-r] or [-s] parameter, consider using the FST classifier [-t 2] to build the fastest classifier for d≥2 models if you can prepare some examples in the task (e.g., the training data). For d=2 models, you may want to set [-i n] to use 1/2^n of the fundamental classification problems in order to build the fastest classifier; the [-b] option will help with this. Note that you obtain the same classifier outputs regardless of the [-t] value.

For a linear model, you need to set neither [-r] nor [-s]. For d≥2 models, FST [-t 2] is usually faster than PKI [-t 0].

Disclaimer

We do not guarantee that the implemented algorithms other than those proposed by us are patent-free; we regard them as patent-free simply because their implementations are available as existing open-source software (or after a simple patent look-up). Please be careful when you use this software for commercial purposes.

How to pronounce `pecco'

Read it as you want; one may imagine a naming like ``Portable Efficient Classifier with COnjunctive features'' or ``Polynomial-to-linear Efficient Classifier Carefully Optimized'', but I just think of it as ``Please Enjoy Classification with COnjunctive features''. The name came to my mind when I felt hungry (hara-peco in Japanese), and you can call the software `pecco-peco' when you are really hungry.

References

  1. N. Yoshinaga and M. Kitsuregawa. Polynomial to Linear: Efficient Classification with Conjunctive Features. Proc. EMNLP 2009, pp. 1542--1551. 2009. A longer journal version is here.
  2. N. Yoshinaga and M. Kitsuregawa. A Self-adaptive Classifier for Efficient Text-stream Processing. Proc. COLING 2014, pp. 1091--1102. 2014.
  3. T. Kudo and Y. Matsumoto. Fast Methods for Kernel-Based Text Analysis. Proc. ACL 2003, pp. 24--31. 2003.
  4. Y. Goldberg and M. Elhadad. splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL 2008, short papers, pp. 237--240. 2008.

Copyright © 2009-2014 Naoki Yoshinaga, All Rights Reserved.
Last modified: Thu May 12 01:44:21 2016