Pecco is a C++ library for efficient classification with conjunctive features [1,2]. Pecco is fast because it performs adaptive classification; it enumerates common classification problems (feature sequences) in the target task, and computes their weights in advance. Pecco solves an input problem by computing a margin from the pre-calculated weight of a common problem most similar to the input.
The latest version of pecco enumerates common classification problems from the input stream of classification problems; in other words, it adaptively speeds up classification while processing the stream.
Pecco is x2-10 faster than a linear classifier and x30-500 faster than a kernel-based classifier in several NLP tasks, including dependency parsing and base-phrase chunking; it yields a transition-based dependency parser that processes >10,000 sentences/sec. on a laptop. Note that pecco does not support training (testing only); consider opal for training.
If you make use of pecco for research or commercial purposes, please cite:
N. Yoshinaga and M. Kitsuregawa. A Self-adaptive Classifier for Efficient Text-stream Processing. Proc. COLING 2014, pp. 1091--1102. 2014.
N. Yoshinaga and M. Kitsuregawa. Polynomial to Linear: Efficient Classification with Conjunctive Features. Proc. EMNLP 2009, pp. 1542--1551. 2009. (A longer journal version is also available.)
Pecco offers several classifier types:

- [-t 2] (FST) needs sub-linear time with respect to the number of (conjunctive) features.
- [-t 3] (PMT) adapts the classifier to a text stream to avoid duplicated computation.
- [-t 1 -s (>0)] (PKE) can reduce the memory footprint of kernel models by ignoring useless features with negligible weights (approximation).
- [-t 1 -r (>0)] (SPLIT) can control a trade-off between the time and space needed, without losing the exactness of the model (kernel model only).
- [-t 0] (PKI) uses inverted indices to reduce the cost of kernel evaluation; slower, but the most space-efficient and with the smallest preparation time (kernel model only).
`to_pecco.py model' checks whether pecco can handle model, and re-formats it if needed. The swig/ directory contains SWIG interfaces.
License (except the test directory): GNU GPLv2, LGPLv2.1, and BSD; e-mail me if you need another license.
NOTE: the distribution includes labeled examples for a transition-based dependency parser, along with log-linear models and SVMs trained on them, all of which are used in the papers above.
> wget http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/pecco/pecco-latest.tar.gz
> tar zxvf pecco-latest.tar.gz
> cd pecco-YYYY-MM-DD
# default classifier is usually efficient enough for most users
> configure
# you can obtain x2-3 speed up by enabling array-based pseudo trie
> configure --enable-array-trie
# a bit slower, x2-3 more space-efficient classifier (esp. d≥3 models)
# --enable-abuse-trie is meant for speeding up (binary) kernel-based models;
# it loses 1-bit float precision of the classifier outputs (scores)
> configure --with-trie-impl=darts-clone --enable-float (--enable-abuse-trie)
> make install
For those who want to use pecco in the hyponymy extraction tool 1.0: apply this patch to the tool so that it can use the current version of pecco.
For Mac OS X users: try port pecco via MacPorts (special thanks to @hjym_u).
For those who want to run pecco with kernel-based models (d≥3): darts-clone is recommended to save disk/memory usage in storing the weights of features; this is because the directed acyclic word graph (darts-clone) effectively collapses nodes (features) with the same weights (1/2 to 1/3 of the memory is needed). The default (included) library, cedar, is slightly faster than darts-clone, but requires more memory (see the size of the binary dataset).
Changes in recent versions include:

- [-o] to output examples with labels/margins, instead of [-O];
- --enable-float;
- a faster [-r] (esp. d = 3), by using popcnt (SSE4.2 or a lookup table) to compute inner products;
- --disable-model-suffix, which disables model suffixes (keep the original SPLIT ratio [-r]);
- [-m] in mining conjunctive features (experimental);
- tool/to_pecco.py to check whether pecco can handle an input model;
- [-O 1,2,3] to output labels, classifier outputs (raw values of the decision function), and probabilities;
- [-c] to force re-compiling a model.

command-line options
Typing ./pecco --help
shows the following usage information; test/
includes models used in the paper.
pecco - please enjoy classification w/ conjunctive features
Copyright (c) 2008-2014 Naoki Yoshinaga, All rights reserved.
Usage: src/pecco [options] model [test]
model model file model
test test file test examples; read STDIN if omitted
Optional parameters:
-t, --classifier=TYPE select classifier type
0 - PKI (kernel model only)
1 - PKE | SPLIT
* 2 - FST
3 - PMT
-p, --pmsize=INT set upper limit of partial margins (1 << INT; default: 20)
-e, --event=FILE examples to obtain feature count for reordering ()
-f, --fst-event=FILE examples to enumerate feature sequence in FST ()
-i, --fst-prune-factor=INT use FST with 2^-INT of feature sequences (0)
-b, --fst-build-verbose build FSTs with 2^-i (i > 0) of feature sequences
-c, --force-compile force to recompile model
-O, --output=TYPE select output type of testing
* 0 - report accuracy
1 - + labels
2 - + labels + classifier outputs
3 - + labels + probabilities
-o, --output!=TYPE output examples with labels/margins
Optional parameters in kernel model:
-m, --pke-minsup=INT threshold to feature frequency (1)
-s, --pke-sigma=FLOAT threshold to feature weight (0)
-r, --split-ratio=FLOAT threshold to feature frequency ratio (0)
Misc.:
-v, --verbose=INT verbosity level (0)
-h, --help show this help and exit
file format for the [-e] and [-f] options (BNF-like notation)
<class> := +1 | -1
<feature> := integer // >=1
<value> := 1 // value will be ignored
<line> := <class> <feature>:<value> <feature>:<value> ... <feature>:<value>
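As a concrete (hypothetical) illustration of this format, the following snippet creates a two-line event file with made-up feature indices and sanity-checks it against the grammar above:

```shell
# create a tiny event file in the format above (feature indices are made up)
cat > event.sample <<'EOF'
+1 1:1 3:1 7:1
-1 2:1 3:1 5:1
EOF

# every line: a class label (+1/-1) followed by feature:value pairs;
# count the lines that match the grammar (prints 2)
grep -cE '^[+-]1( [0-9]+:1)+$' event.sample
```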
example
# analyze examples in test using a PKE classifier
# (-s: threshold \sigma to prune insignificant features [3])
> pecco -t 1 -s 0.01 -v 1 model test
# analyze examples in test using a SPLIT classifier
> pecco -t 1 -r 0.01 -v 1 model test
# analyze examples in test using an FST (+ SPLIT) classifier with event
# (you can guide the ordering of features by using event [-e] to count features;
# this would be useful if the test data have a feature distribution different
# from that of the training data)
> pecco -t 2 -r 0.01 -v 1 -e event -f event model test
# analyze examples in test using an FST classifier with a pruned fstrie
# (the pruning scheme reduces the # of feature sequences (used to build the fstrie) to 1/2^5)
> pecco -t 2 -i 5 -e train -f event -v 1 model test
See the performance comparison section in opal.
You may want to set either [-r] or [-s] to reduce the memory footprint of the classifier for a kernel model (d≥2). If you stick to the exact computation (margin), set [-t 1] with a positive SPLIT ratio [-r]. Starting from [-r 1.0] (taking features appearing in 1% of the support vectors as common features), decrease the value in a step-wise manner (0.5 → 0.1 → 0.05 → 0.01, ...) as long as the classifier speeds up and remains space-efficient enough for your needs (a value that is too small makes the classifier slower). Although the optimal value depends on the degree of the polynomial kernel and the sparsity of the feature distribution, rule-of-thumb values are around 0.005, 0.05, and 0.5 for d = 2, 3, and 4, respectively.
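The step-wise search above can be scripted. The sketch below only prints the candidate command lines (model and dev are placeholder file names; remove the echo to actually run pecco once it is installed):

```shell
# dry-run sweep over SPLIT ratios [-r], from large to small as suggested above;
# `model' and `dev' are placeholder file names, and `echo' prevents execution
for r in 1.0 0.5 0.1 0.05 0.01; do
  echo pecco -t 1 -r "$r" -v 1 model dev
done
```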
If you can accept some approximation, consider setting a positive PKE sigma [-s] instead of [-r]. You should carefully tune [-s] using a development set; a larger PKE sigma will result in a `looser' classifier. You may want to know the exact accuracy in advance by using a SPLIT classifier (see above). Generally, setting [-s] yields a faster classifier than setting [-r], and the speed-up gained by FST [-t 2] will be more significant.
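One way to check how `loose' a given [-s] makes the classifier is to dump raw classifier outputs with [-O 2] for both the exact SPLIT setting and the approximate PKE setting, and compare them. This is a dry-run sketch with placeholder file names (remove the echo to actually run pecco):

```shell
# compare exact (SPLIT, [-r]) and approximate (PKE, [-s]) margins via [-O 2];
# `model' and `dev' are placeholders, and `echo' keeps this a dry run
echo "pecco -t 1 -r 0.01 -O 2 model dev > out.exact"
echo "pecco -t 1 -s 0.005 -O 2 model dev > out.approx"
echo "diff out.exact out.approx"
```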
After fixing the [-r] or [-s] parameter, consider using the FST classifier [-t 2] to build the fastest classifier for d≥2 models, if you can prepare some examples from the task (e.g., the training data). You may want to set [-i n] for d=2 models to use 1/2^n of the fundamental classification problems in building the fastest classifier; the [-b] option will help you choose n. Note that you can obtain the same classifier outputs regardless of the [-t] value.
For a linear model, you need to set neither the [-r] nor the [-s] option. FST [-t 2] is usually faster than [-t 0] for d≥2 models.
We do not guarantee that the implemented algorithms, other than those proposed by us, are patent-free; we regarded them as patent-free simply because their implementations are available as existing open-source software (or after a simple patent look-up). Please be careful when you use this software for commercial purposes.
Read it as you want; one may imagine a naming like ``Portable Efficient Classifier with COnjunctive features'' or ``Polynomial-to-linear Efficient Classifier Carefully Optimized'', but I just think of ``Please Enjoy Classification with COnjunctive features''. The name came to my mind when I felt hungry (hara-peco in Japanese), and you can call the software `pecco-peco' when you are really hungry.