pecco - C++ library for efficient classification with conjunctive features

developed by Naoki Yoshinaga at Yoshinaga Lab., IIS, University of Tokyo

About

Pecco is a C++ library for efficient classification with conjunctive features [1,2]. Pecco is fast because it performs adaptive classification: it enumerates common classification problems (feature sequences) in the target task and computes their weights in advance. Pecco then solves an input problem by computing its margin from the pre-calculated weight of the common problem most similar to the input.

The latest version of pecco enumerates common classification problems from an input stream of classification problems; in other words, it adaptively speeds up classification while processing the stream.

Pecco is x2-10 faster than a linear classifier and x30-500 faster than a kernel-based classifier in several NLP tasks, including dependency parsing and base-phrase chunking, yielding a transition-based dependency parser that processes >10,000 sentences/sec. on a laptop. Note that pecco does not support training (testing only); consider opal for training.

If you make use of pecco for research or commercial purposes, please cite:

N. Yoshinaga and M. Kitsuregawa. A Self-adaptive Classifier for Efficient Text-stream Processing. Proc. COLING 2014, pp. 1091--1102. 2014.
N. Yoshinaga and M. Kitsuregawa. Polynomial to Linear: Efficient Classification with Conjunctive Features. Proc. EMNLP 2009, pp. 1542--1551. 2009. A longer journal version is here.

Features

License (except the test directory): GNU GPLv2, LGPLv2.1, and BSD; or e-mail me if you need another license.

NOTE: The distribution includes the labeled examples for a transition-based dependency parser, along with log-linear models (LLMs) and SVMs trained on them; all of these are used in the paper.

Download & Setup

> wget http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/pecco/pecco-latest.tar.gz
> tar zxvf pecco-latest.tar.gz
> cd pecco-YYYY-MM-DD

# the default classifier is usually efficient enough for most users
> ./configure

# you can obtain a x2-3 speed-up by enabling the array-based pseudo trie
> ./configure --enable-array-trie

# a slightly slower but x2-3 more space-efficient classifier (esp. for d≥3 models);
#   --enable-abuse-trie is meant for speeding up (binary) kernel-based models;
#     it loses 1 bit of float precision in the classifier outputs (scores)
> ./configure --with-trie-impl=darts-clone --enable-float (--enable-abuse-trie)

> make install

For those who want to use pecco in hyponymy extraction tool 1.0: apply this patch to the tool so that it can use the current version of pecco.

For Mac OS X users: try port pecco via MacPorts (special thanks to @hjym_u).

For those who want to run pecco with kernel-based models (d≥3): darts-clone is recommended to save disk/memory usage in storing the weights of features; this is because a Directed Acyclic Word Graph (darts-clone) effectively collapses nodes (features) with the same weights (1/2 to 1/3 of the memory is needed). The default (included) library, cedar, is slightly faster than darts-clone but requires more memory (see the size of the binary dataset).

Requirements

ToDo

History

Usage

command-line options

Typing ./pecco --help shows the following usage information; test/ includes models used in the paper.

pecco - please enjoy classification w/ conjunctive features
Copyright (c) 2008-2014 Naoki Yoshinaga, All rights reserved.

Usage: src/pecco [options] model [test]

model   model file             model
test    test file              test examples; read STDIN if omitted

Optional parameters:
  -t, --classifier=TYPE        select classifier type
                                 0 - PKI (kernel model only)
                                 1 - PKE | SPLIT
                               * 2 - FST
                                 3 - PMT
  -p, --pmsize=INT             set upper limit of partial margins (1 << INT; default: 20)
  -e, --event=FILE             examples to obtain feature count for reordering ()
  -f, --fst-event=FILE         examples to enumerate feature sequence in FST ()
  -i, --fst-prune-factor=INT   use FST with 2^-INT of feature sequences (0)
  -b, --fst-build-verbose      build FSTs with 2^-i (i > 0) of feature sequences
  -c, --force-compile          force to recompile model
  -O, --output=TYPE            select output type of testing
                               * 0 - report accuracy
                                 1 - + labels
                                 2 - + labels + classifier outputs
                                 3 - + labels + probabilities
  -o, --output!=TYPE           output examples with labels/margins

Optional parameters in kernel model:
  -m, --pke-minsup=INT         threshold to feature frequency (1)
  -s, --pke-sigma=FLOAT        threshold to feature weight (0)
  -r, --split-ratio=FLOAT      threshold to feature frequency ratio (0)

Misc.:
  -v, --verbose=INT            verbosity level (0)
  -h, --help                   show this help and exit

file format for [-e] and [-f] options

(BNF-like notation)
<class>   := +1 | -1
<feature> := integer // >=1
<value>   := 1       // value will be ignored
<line>    := <class> <feature>:<value> <feature>:<value> ... <feature>:<value>

example

# analyze examples in test using a PKE classifier
#   (-s: threshold \sigma to prune insignificant features [3])
> pecco -t 1 -s 0.01 -v 1 model test

# analyze examples in test using a SPLIT classifier
> pecco -t 1 -r 0.01 -v 1 model test

# analyze examples in test using an FST (+ SPLIT) classifier with event
# (you can guide feature ordering by using event [-e] to count features;
#  this is useful if the test data has a feature distribution different
#  from the training data's)
> pecco -t 2 -r 0.01 -v 1 -e event -f event model test

# analyze examples in test using an FST classifier with a pruned fstrie
# (the pruning scheme reduces the # of feature sequences (to build the
#  fstrie) to 1/2^5 = 1/32)
> pecco -t 2 -i 5 -e train -f event -v 1 model test

Performance Comparison

See the performance comparison section in opal.

Tuning hyper-parameters

You may want to set either [-r] or [-s] to reduce the memory footprint of the classifier for a kernel model (d≥2). If you stick to exact computation of margins, set [-t 1] with a positive SPLIT ratio [-r]. Starting from [-r 1.0] (taking features appearing in 1% of support vectors as common features), decrease the value step-wise (0.5 → 0.1 → 0.05 → 0.01 ...) as long as the classifier speeds up and remains space-efficient enough for your needs (a value that is too small makes the classifier slower). Although the optimal value depends on the degree of the polynomial kernel and the sparsity of the feature distribution, rule-of-thumb values are around 0.005, 0.05, and 0.5 for d=2, 3, and 4, respectively.

If you can accept some approximation, consider setting a positive PKE sigma [-s] instead of [-r]. You should carefully tune [-s] using a development set; a larger PKE sigma will result in a `looser' classifier. You may want to know the exact accuracy in advance using the SPLIT classifier (see above). Generally, setting [-s] yields a faster classifier than setting [-r], and the speed-up gained by FST [-t 2] will be more significant.

After fixing the [-r] or [-s] parameter, consider using the FST classifier [-t 2] to build the fastest classifier for d≥2 models if you can prepare some examples in the task (e.g., the training data). For d=2 models, you may want to set [-i n] to use 1/2^n of the fundamental classification problems in order to build the fastest classifier; the [-b] option will help with this. Note that you obtain the same classifier outputs regardless of the [-t] value.

For a linear model, you need to set neither [-r] nor [-s]. For d≥2 models, FST [-t 2] is usually faster than PKI [-t 0].

Disclaimer

We do not guarantee that the implemented algorithms other than those proposed by us are patent-free; we regard them as patent-free simply because their implementations are available as existing open-source software (or after a simple patent look-up). Please be careful when you use this software for commercial purposes.

How to pronounce `pecco'

Read it as you want; one may imagine a naming like ``Portable Efficient Classifier with COnjunctive features'' or ``Polynomial-to-linear Efficient Classifier Carefully Optimized'', but I just think of it as ``Please Enjoy Classification with COnjunctive features''. The name came to my mind when I felt hungry (hara-peco in Japanese), and you can call the software `pecco-peco' when you are really hungry.

References

  1. N. Yoshinaga and M. Kitsuregawa. Polynomial to Linear: Efficient Classification with Conjunctive Features. Proc. EMNLP 2009, pp. 1542--1551. 2009. A longer journal version is here.
  2. N. Yoshinaga and M. Kitsuregawa. A Self-adaptive Classifier for Efficient Text-stream Processing. Proc. COLING 2014, pp. 1091--1102. 2014.
  3. T. Kudo and Y. Matsumoto. Fast Methods for Kernel-Based Text Analysis. Proc. ACL 2003, pp. 24--31. 2003.
  4. Y. Goldberg and M. Elhadad. splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL 2008, short papers, pp. 237--240. 2008.

Copyright © 2009-2014 Naoki Yoshinaga, All Rights Reserved.
Last modified: Thu May 12 01:44:21 2016