opal - C++ header library of online learning with kernel slicing

developed by Naoki Yoshinaga at Kitsuregawa, Toyoda Lab., IIS, University of Tokyo
Quick Link: [ What is this? | Download & Setup | Usage | Performance | References ]

What is this?

Opal is a C++ header library of online learning [1,2] with kernel slicing [3], which enables large-scale training with binary conjunctive features (polynomial kernel). The kernel slicing generalizes kernel splitting [4] in a feature-wise manner to combine the merits of linear and kernel-based training. To improve the scalability of the training with redundant data (in NLP), the kernel slicing reuses the partial margins computed in past rounds and truncates margin computations that do not contribute to parameter updates. The resulting learner is faster than a vanilla kernel-based online learner (w/ inverted indexing technique) by a factor of 30-250, while retaining its space-efficiency. Opal supports both linear and kernel-based training (polynomial kernel) with binary features; the models trained with the polynomial kernel are saved in the SVMlight-like format.

Main features of opal are summarized as follows:

The following table summarizes the features of opal variants and pecco in training/testing.

featuresopal [-k -p]opal [-k]opal [-p]opalopalkpecco
trainingkernel expansion/splittingyesyesyesyesnon/a
kernel slicingyesyesnononon/a
truncated marginsyesnoyesnonon/a
testingkernel expansion/splittingyesyesyesyesnoyes
kernel slicing (or fstrie)yesyesnononoyes
adaptive classificationyesyesnononono
truncated marginsyesnoyesnonono

The libraries are distributed under the GNU General Public License (Version 2) and the GNU Lesser General Public License (Version 2.1) (e-mail me when you prefer some license other than GNU GPL/LGPL). Note that this license doesn't apply to the training/dev/test data in test/ directory, which are generated from the annotated data by the way described in [3,6].

Please keep in mind that opal is not a versatile learner; the efficiency is based on the assumption of binary feature space with skewed (or sparse) and redundant examples (typically, NLP).

Download & Setup

> wget http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/opal/opal-latest.tar.bz2
> tar jxvf opal-latest.tar.bz2
> cd opal-YYYY-MM-DD
> configure
> make install

Alternatively, Mac OS X users can exploit macports to install opal (renamed as opal-ml to avoid package name confliction; again special thanks to @hjym_u); when you train a binary classifier, you may want to use opal instead of opal-multiclass (opal-multiclass is slow compared to opal).

See below for usage; In [3], PA-I SL, PA-I SL*, PA-I SP and PA-I KERNEL refer to opal [-k -p], opal [-p], opal, and opalk, respectively.

Requirements

ToDo

History

Usage

command-line options

Copyright (c) 2009-2012 Naoki Yoshinaga, All rights reserved.

Usage: src/opal-multi [options] train model test

train   training file       set '-' to skip training
model   model file          set '-' to training/test w/o saving a mode
test    test file           set '-' to skip testing

Optional parameters in training and testing:
  -t, --kernel=TYPE         select type of kernel function
                            * 0 - linear     (w^T * x)
                              1 - polynomial (s^T * x + 1)^d
  -d, --kernel-degree=INT   parameter d in polynomial kernel (0)
  -N, --kernel-splitN=INT   parameter N in kernel splitting (0)
                            (a learner [-N 0] will automatically calibrates N
                             to keep the weight trie within [-T] MiB bytes)
  -T, --max-trie-size=INT   max. RAM size (MiB) for feature weight trie (32)
                            (ignored if kernel-splitN [-N] is specified)

Optional parameters in training:
  -m, --model0=FILE         re-train a model trained w/ opal ('-')
  -l, --learner=TYPE        select learning algorithm
                              0 - Perceptron
                              1 - Passive Aggressive    (PA)
                            * 2 - Passive Aggressive I  (PA-I)
                              3 - Passive Aggressive II (PA-II)
  -c, --reg-cost=FLOAT      PA-I/-II aggressiveness parameter C (1.0)
  -i, --iteration=INT       # iterations (10)
  -a, --averaging           average parameters
  -s, --shuffling           shuffle training examples on RAM
  -b, --buffer=TYPE         select type of buffer to read examples
                            * 0 - Use RAM to load examples
                              1 - Use DISK to cache examples
                              2 - Do not cache examples
  -M, --max-examples=INT    max. # examples used in training (0: all)
  -k, --kernel-slicing      perform kernel slicing
  -p, --pruning-margin      terminate margin computation if unnecessarily
  -C, --max-classes=INT     max. # classes (needed for [-b 2])

Optional parameters in testing:
  -O, --output=TYPE        select output type of testing
                            * 0 - report accuracy
                              1 - report accuracy per iteration
                              2 - output margins

Misc.:
  -h, --help               show this help and exit

file format for train / test

(BNF-like notation)
<class>   := +1 | -1 (or any string in multi-class classification)
<feature> := integer // >=1
<value>   := 1       // value is ignored
<line>    := <class> <feature>:<value> <feature>:<value> ... <feature>:<value>

See opal/test/*.{train,dev,test}. As in SVMlight, feature/value pairs must be ordered by increasing feature number (values will be ignored and assumed to be 1, though).

example

# linear training w/ Averaged Perceptron [-l 0, -a] with shuffling examples
> opal -t 0 -l 0 -c 1.0 -i 5 -as train model -

# linear training and testing w/ Passive Aggressive-I w/o saving model
> opal -t 0 -c 0.005 -i 20 train - test

# kernel-based training and testing (polynomial kernel of d=2) with Passive Aggressive-I
#  (event on Disk; shuffling ignored)
> opal -t 1 -d 2 -i 20 -c 0.0005 -a -b 1 train model test

# kernel-based testing
#  (only combinations among 500 common features are explicitly considered)
> opal -N 500 - model test

NOTE:


Performance comparison

The following tables show a comparison among linear learners (opal; ver. in 2011, oll 0.2, AROW++ 0.05, libarow 0.2.0, Vowpal Wabbit 4.1, SVMlin 1.0 and LIBLINEAR 1.7) and a comparison among kernel-based learners (opal, LIBLINEAR-POLY2 1.7 ([-d 2] only) and TinySVM 0.09), and LIBSVM 3.0: for LIBLINEAR, results with the fastest algorithm and the algorithm that obtained the most accurate model are shown. pecco (FST [-t 1]) is also used to test the models obtained with opal ([-d 2,3]; fstrie is build from the training data). The experiments were conducted on Mac OS X 10.5 over Intel Xeon E5462 3.2Ghz CPU with 32GB main memory. Only oll uses 4-byte float as floating-point numbers (others use 8-byte double). AROW++ and libarow cannot output margins of examples (others (can) do). opal, LIBLINEAR-POLY2 (modified to handle a large number of conjunctive features; int->int64_t) and TinySVM (modified for the experiments in [3]) and LIBSVM (add -m64, to make speed and memory comparable to 64-bit TinySVM) adopt 64-bit addressing, while others adopt 32bit addressing. These differences may cause 10-20% difference in time, and at most 100% difference in space (32bit v.s. 64bit). opal with [-b 1] and Vowpal Wabbit do not load events on memory; this leads to slower but memory-efficient training (demand additional Disk space for caching data, though); in the experiments, examples for these learners were offline shuffled to make the accuracy of the obtained models comparable to the others.

We have tried parameters # iters = 1, 5, 10, 20 (online learners only), and c/r = 100.0, 50.0, ..., 0.000005, 0.000001; the table lists the model that achieved the best accuracy on the development set. Due to intricate parameters related to learning rate in SGD, we have only tried l= 1.0, 0.5, 0.2, .., 0.02, 0.01 for Vowpal Wabbit. The gettimeofday command is used to measure the run-time (namely, the whole training or testing time).

Dependency Parsing (DEP in [3], opal/test/tl.*, # training / test examples = 296,776 / 113,716)

Linear training
SoftwareAlgorithmd#
iters
c (r), lAcc.
(%)
Mem
[MiB]
Time
[s]
Time (test)
[s]
#
SVs
opalP [-l 0 -as]-10-90.7661.51.60.18-
opalPA-I [-l 2]-200.00500090.9057.52.40.18-
opalPA-I [-l 2 -b 1 -s]-200.00500090.9010.92.40.18-
opalPA-I [-l 2 -as]-200.01000091.2161.52.70.18-
ollAP-5-90.82126.817.36.08-
ollPA-I-200.00500091.06125.518.36.07-
ollCW-10.00005088.83126.817.26.13-
AROW++AROW-550.00000090.41186.716.95.33-
libarowAROW-10100.00000090.48149.24.9-
Vowpal WabbitSGD-200.50000091.0121.4161.64.11-
SVMlinL2-SVM-MFN [-A 1]-80.10000090.94132.3126.31.24-
LIBLINEARL1-reg L2-SVM [-s 5]-190.10000090.84219.14.50.65-
LIBLINEARL2-reg L1-SVM (dual) [-s 3]-2850.50000091.10114.412.70.6877700
Kernel-based training (polynomial kernel)
SoftwareAlgorithmd#
iters
cAcc.
(%)
Mem
[MiB]
Time
[s]
Time (test)
[s]
#
SVs
opal (+ pecco)P [-l 0 -N 2000 -a]25-92.94146.07.41.26 (0.51)59427
opal (+ pecco)PA-I [-l 2 -N 4000]2200.00010093.04162.628.31.46 (0.51)94813
opal (+ pecco)PA-I [-l 2 -N 2000 -a]2200.00050093.20152.927.91.46 (0.51)83816
LIBLINEAR-POLY2L2-reg L1-SVM (dual) [-s 3]2660.00500093.2416013.8422.8310.2766056
TinySVM (+ pecco)SVM24388470.00500093.24246.430040.31739.92 (0.68)65158
LIBSVM (+ pecco)SVM32619160.00500093.24266.723302.21694.98 (0.68)65104
opal (+ pecco)P [-l 0 -N 250 -as]35-93.04150.920.12.75 (1.38)57295
opal (+ pecco)PA-I [-l 2 -N 250 -s]3200.00000593.34161.6119.33.73 (1.67)89578
opal (+ pecco)PA-I [-l 2 -N 250 -as]3200.00001093.36172.4111.43.64 (1.61)85870
TinySVM (+ pecco)SVM33062910.00010093.38244.826048.61896.05 (1.94)68530
LIBSVM (+ pecco)SVM32027410.00010093.38266.122457.01785.52 (1.97)68499

NOTE: the above accuracy is computed against the test examples, and is not directly comparable to those in [3], which shows dependency accuracy.

Hyponymy Relation Identification (REL in [3], opal/test/wiki.*, # training / test examples = 201,664 / 10,000)

Linear training
SoftwareAlgorithmd#
iters
c (r), lAcc.
(%)
Mem
[MiB]
Time
[s]
Time (test)
[s]
#
SVs
opalP [-l 0 -as]-10-91.7439.01.10.04-
opalPA-I [-l 2 -s]-200.010092.0837.11.60.04-
opalPA-I [-l 2 -s -b 1]-200.010092.085.01.60.04-
opalPA-I [-l 2 -as]-200.050091.7439.01.80.04-
ollAP-20-91.2040.57.50.34-
ollPA-I-200.010091.8939.47.50.35-
ollCW-10.500091.0240.66.60.35-
AROW++AROW-2050.000091.6059.99.30.30-
libarowAROW-1050.000091.6268.42.0-
Vowpal WabbitSGD-201.000091.6521.8110.80.38-
SVMlinL2-SVM-MFN [-A 1]-110.500091.8461.8104.20.17-
LIBLINEARL2-reg L2-SVM (dual) [-s 1]-210.100091.7249.72.80.25110084
LIBLINEARL2-reg L2-SVM (primal) [-s 2]-400.500091.7665.833.30.33-
Kernel-based training (polynomial kernel)
SoftwareAlgorithmd#
iters
cAcc.
(%)
Mem
[MiB]
Time
[s]
Time (test)
[s]
#
SVs
opal (+ pecco)P [-a -N 500 -as]210-92.78102.16.70.53 (0.14)42261
opal (+ pecco)PA-I [-l 2 -N 2000 -s]2200.001092.86153.717.10.92 (0.18)81441
opal (+ pecco)PA-I [-l 2 -N 2000 -as]2200.005092.86151.712.90.80 (0.17)67918
TinySVM (+ pecco)SVM22580810.010092.85139.812378.359.60 (0.20)77151
LIBSVM (+ pecco)SVM32059930.010092.85175.016695.3110.14 (0.20)77240
opal (+ pecco)P [-l 0 -N 250 -as]310-92.89120.47.30.63 (0.21)35902
opal (+ pecco)PA-I [-l 2 -N 250 -s]3200.000592.95156.313.90.83 (0.27)56768
opal (+ pecco)PA-I [-l -N 250 -as]3200.000593.05156.714.20.88 (0.28)56768
TinySVM (+ pecco)SVM33517580.010093.09140.817403.379.88 (0.35)91873
LIBSVM (+ pecco)SVM32398810.010093.09174.718082.2129.64 (0.35)92009

NOTE: LIBLINEAR-POLY2 did not work in this dataset (it should require ~400G memory space).

Disclaimer

We do not guarantee that the implemented algorithms other than those proposed by us are patent-free; we regarded them to be patent-free simply because their implementations are available as (existing) open-source softwares (otherwise a simple patent look-up). Please be careful when you use this software for commercial use.

How to pronounce `opal'?

Opal was originally meant for online passive-agressive algorithms; the efficiency attributes to obsolete partial-margins in the active lifespan.

References

  1. Y. Freund and R. E. Schapire. Large Margin Classification using the Perceptron Algorithm. Machine Learning 37(3):277--296, 1999.
  2. K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. JMLR 7(Mar):551--585. 2006.
  3. N. Yoshinaga and M. Kitsuregawa. Kernel Slicing: Scalable Online Training with Conjunctive Features. Proc. COLING (oral), pp. 1245--1253. 2010.
  4. Y. Goldberg and M. Elhadad. splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL, short papers, pp. 237--240. 2008.
  5. S. Yata and M. Tamura and K. Morita and M. Fuketa and J. Aoe. Sequential Insertions and Performance Evaluations for Double-arrays. Proc. the 71st National Convention of IPSJ, pp. 1263--1264. 2009.
  6. N. Yoshinaga and M. Kitsuregawa. Polynomial to Linear: Efficient Classification with Conjunctive Features. Proc. EMNLP, pp. 1542--1551. 2009.
  7. R. McDonald, K. Hall, and G. Mann. Distributed Training Strategies for the Structured Perceptron. Proc. NAACL, pp. 456--464. 2010.
  8. K. Crammer and Y. Singer. Ultraconservative Online Algorithms for Multiclass Problems. JMLR 3(Jan):951--991, 2003.
  9. S. Matsushima, N. Shimizu, K. Yoshida, T. Ninomiya, H. Nakagawa. Exact Passive-Aggressive Algorithm for Multiclass Classification using Support Class. Proc. SDM, pp. 301--314. 2010.
  10. G. S. Manku and R. Motwani, Approximate Frequency Counts over Data Streams, Proc. VLDB, pp. 346--357. 2002.

Copyright © 2010 Naoki Yoshinaga, All right Reserved.
XHTML 1.1 $ Last modified at Tue May 15 01:39:08 2012 $