Opal is a C++ header library of online learning [1,2] with kernel slicing [3], which enables large-scale training with binary conjunctive features (polynomial kernel). The kernel slicing generalizes kernel splitting [4] in a feature-wise manner to combine the merits of linear and kernel-based training. To improve the scalability of the training with redundant data (in NLP), the kernel slicing reuses the partial margins computed in past rounds and truncates margin computations that do not contribute to parameter updates. The resulting learner is faster than a vanilla kernel-based online learner (w/ inverted indexing technique) by a factor of 30-250, while retaining its space-efficiency. Opal supports both linear and kernel-based training (polynomial kernel) with binary features; the models trained with the polynomial kernel are saved in the SVMlight-like format.
Main features of opal are summarized as follows:
[-t 1] based on kernel slicing [-k]: currently, perceptron [P] [1] [-t 0], passive-aggressive algorithms [PA, PA-I, PA-II] [2] [-t 1,2,3] (opal), and their multi-class extensions [8,9] (opal-multi) are implemented. You can easily add your own additive online learning algorithms (need to modify 4 lines at the shortest).[-k]): in testing, opal builds fstrie online; it requires memory space (sub-)linear to the test data size, though ;-) If you want to have the memory footprint fixed, do not enable kenrel slicing ([-k]). Opal also avoids computation irrelevant to determine the label ([-p]). If you want to obtain an exact margin value, do not set [-p]. Generally, we recommend pecco [6] for testing; faster and more space-efficient than opal (in terms of testing) thanks to the static double array library.[-t 0]: significantly faster than oll, LIBLINEAR, Vowpal Wabbit or other major open-source linear learners/classifiers.[-m]: you can use the model that has been trained w/ opal as an initial model.[-a] and shuffling [-s]: both stabilize the model accuracy; averaged PA trains a stable model with a larger c value (aggressive update), which requires a smaller number of iterations than PA to obtain the same accuracy.[-b]: buffer event on memory (time-efficient) [-b 0], buffer event on disk (space-efficient) [-b 1], or no buffering (slowest but no resource needed) [-b 2].std::tr1::unordered_map by a factor of 2-3, when the data access is skewed.The following table summarizes the features of opal variants and pecco in training/testing.
| features | opal [-k -p] | opal [-k] | opal [-p] | opal | opalk | pecco | |
|---|---|---|---|---|---|---|---|
| training | kernel expansion/splitting | yes | yes | yes | yes | no | n/a |
| kernel slicing | yes | yes | no | no | no | n/a | |
| truncated margins | yes | no | yes | no | no | n/a | |
| testing | kernel expansion/splitting | yes | yes | yes | yes | no | yes |
| kernel slicing (or fstrie) | yes | yes | no | no | no | yes | |
| adaptive classification | yes | yes | no | no | no | no | |
| truncated margins | yes | no | yes | no | no | no |
The libraries are distributed under the GNU General Public License (Version 2) and the GNU Lesser General Public License (Version 2.1) (e-mail me when you prefer some license other than GNU GPL/LGPL). Note that this license doesn't apply to the training/dev/test data in test/ directory, which are generated from the annotated data by the way described in [3,6].
Please keep in mind that opal is not a versatile learner; the efficiency is based on the assumption of binary feature space with skewed (or sparse) and redundant examples (typically, NLP).
> wget http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/opal/opal-latest.tar.bz2 > tar jxvf opal-latest.tar.bz2 > cd opal-YYYY-MM-DD > configure > make install
Alternatively, Mac OS X users can exploit macports to install opal (renamed as opal-ml to avoid package name confliction; again special thanks to @hjym_u); when you train a binary classifier, you may want to use opal instead of opal-multiclass (opal-multiclass is slow compared to opal).
See below for usage; In [3], PA-I SL, PA-I SL*, PA-I SP and PA-I KERNEL refer to opal [-k -p], opal [-p], opal, and opalk, respectively.
[-k] and terminating marging computations [-]N [# features expanded].command-line options
Copyright (c) 2009-2012 Naoki Yoshinaga, All rights reserved.
Usage: src/opal-multi [options] train model test
train training file set '-' to skip training
model model file set '-' to training/test w/o saving a mode
test test file set '-' to skip testing
Optional parameters in training and testing:
-t, --kernel=TYPE select type of kernel function
* 0 - linear (w^T * x)
1 - polynomial (s^T * x + 1)^d
-d, --kernel-degree=INT parameter d in polynomial kernel (0)
-N, --kernel-splitN=INT parameter N in kernel splitting (0)
(a learner [-N 0] will automatically calibrates N
to keep the weight trie within [-T] MiB bytes)
-T, --max-trie-size=INT max. RAM size (MiB) for feature weight trie (32)
(ignored if kernel-splitN [-N] is specified)
Optional parameters in training:
-m, --model0=FILE re-train a model trained w/ opal ('-')
-l, --learner=TYPE select learning algorithm
0 - Perceptron
1 - Passive Aggressive (PA)
* 2 - Passive Aggressive I (PA-I)
3 - Passive Aggressive II (PA-II)
-c, --reg-cost=FLOAT PA-I/-II aggressiveness parameter C (1.0)
-i, --iteration=INT # iterations (10)
-a, --averaging average parameters
-s, --shuffling shuffle training examples on RAM
-b, --buffer=TYPE select type of buffer to read examples
* 0 - Use RAM to load examples
1 - Use DISK to cache examples
2 - Do not cache examples
-M, --max-examples=INT max. # examples used in training (0: all)
-k, --kernel-slicing perform kernel slicing
-p, --pruning-margin terminate margin computation if unnecessarily
-C, --max-classes=INT max. # classes (needed for [-b 2])
Optional parameters in testing:
-O, --output=TYPE select output type of testing
* 0 - report accuracy
1 - report accuracy per iteration
2 - output margins
Misc.:
-h, --help show this help and exit
file format for train / test
(BNF-like notation) <class> := +1 | -1 (or any string in multi-class classification) <feature> := integer // >=1 <value> := 1 // value is ignored <line> := <class> <feature>:<value> <feature>:<value> ... <feature>:<value>
See opal/test/*.{train,dev,test}. As in SVMlight, feature/value pairs must be ordered by increasing feature number (values will be ignored and assumed to be 1, though).
example
# linear training w/ Averaged Perceptron [-l 0, -a] with shuffling examples > opal -t 0 -l 0 -c 1.0 -i 5 -as train model - # linear training and testing w/ Passive Aggressive-I w/o saving model > opal -t 0 -c 0.005 -i 20 train - test # kernel-based training and testing (polynomial kernel of d=2) with Passive Aggressive-I # (event on Disk; shuffling ignored) > opal -t 1 -d 2 -i 20 -c 0.0005 -a -b 1 train model test # kernel-based testing # (only combinations among 500 common features are explicitly considered) > opal -N 500 - model test
NOTE:
model='-') with PA-I [-l 2] or PA-II [-l 3] and polynomial kernel [-t 1], the reported accuracy can be slightly different from the one obtained with non-instant mode; this is due to rounding errors.[-m model0]) with averaging ignores #examples processed in a prior training (e.g., with [-a], a model trained with [-i 20] will be different from a model retrained with [-i 10] from a model trained with [-i 10]).The following tables show a comparison among linear learners (opal; ver. in 2011, oll 0.2, AROW++ 0.05, libarow 0.2.0, Vowpal Wabbit 4.1, SVMlin 1.0 and LIBLINEAR 1.7) and a comparison among kernel-based learners (opal, LIBLINEAR-POLY2 1.7 ([-d 2] only) and TinySVM 0.09), and LIBSVM 3.0: for LIBLINEAR, results with the fastest algorithm and the algorithm that obtained the most accurate model are shown. pecco (FST [-t 1]) is also used to test the models obtained with opal ([-d 2,3]; fstrie is build from the training data). The experiments were conducted on Mac OS X 10.5 over Intel Xeon E5462 3.2Ghz CPU with 32GB main memory. Only oll uses 4-byte float as floating-point numbers (others use 8-byte double). AROW++ and libarow cannot output margins of examples (others (can) do). opal, LIBLINEAR-POLY2 (modified to handle a large number of conjunctive features; int->int64_t) and TinySVM (modified for the experiments in [3]) and LIBSVM (add -m64, to make speed and memory comparable to 64-bit TinySVM) adopt 64-bit addressing, while others adopt 32bit addressing. These differences may cause 10-20% difference in time, and at most 100% difference in space (32bit v.s. 64bit). opal with [-b 1] and Vowpal Wabbit do not load events on memory; this leads to slower but memory-efficient training (demand additional Disk space for caching data, though); in the experiments, examples for these learners were offline shuffled to make the accuracy of the obtained models comparable to the others.
We have tried parameters # iters = 1, 5, 10, 20 (online learners only), and c/r = 100.0, 50.0, ..., 0.000005, 0.000001; the table lists the model that achieved the best accuracy on the development set. Due to intricate parameters related to learning rate in SGD, we have only tried l= 1.0, 0.5, 0.2, .., 0.02, 0.01 for Vowpal Wabbit. The gettimeofday command is used to measure the run-time (namely, the whole training or testing time).
| Linear training | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Software | Algorithm | d | # iters | c (r), l | Acc. (%) | Mem [MiB] | Time [s] | Time (test) [s] | # SVs |
| opal | P [-l 0 -as] | - | 10 | - | 90.76 | 61.5 | 1.6 | 0.18 | - |
| opal | PA-I [-l 2] | - | 20 | 0.005000 | 90.90 | 57.5 | 2.4 | 0.18 | - |
| opal | PA-I [-l 2 -b 1 -s] | - | 20 | 0.005000 | 90.90 | 10.9 | 2.4 | 0.18 | - |
| opal | PA-I [-l 2 -as] | - | 20 | 0.010000 | 91.21 | 61.5 | 2.7 | 0.18 | - |
| oll | AP | - | 5 | - | 90.82 | 126.8 | 17.3 | 6.08 | - |
| oll | PA-I | - | 20 | 0.005000 | 91.06 | 125.5 | 18.3 | 6.07 | - |
| oll | CW | - | 1 | 0.000050 | 88.83 | 126.8 | 17.2 | 6.13 | - |
| AROW++ | AROW | - | 5 | 50.000000 | 90.41 | 186.7 | 16.9 | 5.33 | - |
| libarow | AROW | - | 10 | 100.000000 | 90.48 | 149.2 | 4.9 | - | |
| Vowpal Wabbit | SGD | - | 20 | 0.500000 | 91.01 | 21.4 | 161.6 | 4.11 | - |
| SVMlin | L2-SVM-MFN [-A 1] | - | 8 | 0.100000 | 90.94 | 132.3 | 126.3 | 1.24 | - |
| LIBLINEAR | L1-reg L2-SVM [-s 5] | - | 19 | 0.100000 | 90.84 | 219.1 | 4.5 | 0.65 | - |
| LIBLINEAR | L2-reg L1-SVM (dual) [-s 3] | - | 285 | 0.500000 | 91.10 | 114.4 | 12.7 | 0.68 | 77700 |
| Kernel-based training (polynomial kernel) | |||||||||
| Software | Algorithm | d | # iters | c | Acc. (%) | Mem [MiB] | Time [s] | Time (test) [s] | # SVs |
| opal (+ pecco) | P [-l 0 -N 2000 -a] | 2 | 5 | - | 92.94 | 146.0 | 7.4 | 1.26 (0.51) | 59427 |
| opal (+ pecco) | PA-I [-l 2 -N 4000] | 2 | 20 | 0.000100 | 93.04 | 162.6 | 28.3 | 1.46 (0.51) | 94813 |
| opal (+ pecco) | PA-I [-l 2 -N 2000 -a] | 2 | 20 | 0.000500 | 93.20 | 152.9 | 27.9 | 1.46 (0.51) | 83816 |
| LIBLINEAR-POLY2 | L2-reg L1-SVM (dual) [-s 3] | 2 | 66 | 0.005000 | 93.24 | 16013.8 | 422.8 | 310.27 | 66056 |
| TinySVM (+ pecco) | SVM | 2 | 438847 | 0.005000 | 93.24 | 246.4 | 30040.3 | 1739.92 (0.68) | 65158 |
| LIBSVM (+ pecco) | SVM | 3 | 261916 | 0.005000 | 93.24 | 266.7 | 23302.2 | 1694.98 (0.68) | 65104 |
| opal (+ pecco) | P [-l 0 -N 250 -as] | 3 | 5 | - | 93.04 | 150.9 | 20.1 | 2.75 (1.38) | 57295 |
| opal (+ pecco) | PA-I [-l 2 -N 250 -s] | 3 | 20 | 0.000005 | 93.34 | 161.6 | 119.3 | 3.73 (1.67) | 89578 |
| opal (+ pecco) | PA-I [-l 2 -N 250 -as] | 3 | 20 | 0.000010 | 93.36 | 172.4 | 111.4 | 3.64 (1.61) | 85870 |
| TinySVM (+ pecco) | SVM | 3 | 306291 | 0.000100 | 93.38 | 244.8 | 26048.6 | 1896.05 (1.94) | 68530 |
| LIBSVM (+ pecco) | SVM | 3 | 202741 | 0.000100 | 93.38 | 266.1 | 22457.0 | 1785.52 (1.97) | 68499 |
NOTE: the above accuracy is computed against the test examples, and is not directly comparable to those in [3], which shows dependency accuracy.
| Linear training | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Software | Algorithm | d | # iters | c (r), l | Acc. (%) | Mem [MiB] | Time [s] | Time (test) [s] | # SVs |
| opal | P [-l 0 -as] | - | 10 | - | 91.74 | 39.0 | 1.1 | 0.04 | - |
| opal | PA-I [-l 2 -s] | - | 20 | 0.0100 | 92.08 | 37.1 | 1.6 | 0.04 | - |
| opal | PA-I [-l 2 -s -b 1] | - | 20 | 0.0100 | 92.08 | 5.0 | 1.6 | 0.04 | - |
| opal | PA-I [-l 2 -as] | - | 20 | 0.0500 | 91.74 | 39.0 | 1.8 | 0.04 | - |
| oll | AP | - | 20 | - | 91.20 | 40.5 | 7.5 | 0.34 | - |
| oll | PA-I | - | 20 | 0.0100 | 91.89 | 39.4 | 7.5 | 0.35 | - |
| oll | CW | - | 1 | 0.5000 | 91.02 | 40.6 | 6.6 | 0.35 | - |
| AROW++ | AROW | - | 20 | 50.0000 | 91.60 | 59.9 | 9.3 | 0.30 | - |
| libarow | AROW | - | 10 | 50.0000 | 91.62 | 68.4 | 2.0 | - | |
| Vowpal Wabbit | SGD | - | 20 | 1.0000 | 91.65 | 21.8 | 110.8 | 0.38 | - |
| SVMlin | L2-SVM-MFN [-A 1] | - | 11 | 0.5000 | 91.84 | 61.8 | 104.2 | 0.17 | - |
| LIBLINEAR | L2-reg L2-SVM (dual) [-s 1] | - | 21 | 0.1000 | 91.72 | 49.7 | 2.8 | 0.25 | 110084 |
| LIBLINEAR | L2-reg L2-SVM (primal) [-s 2] | - | 40 | 0.5000 | 91.76 | 65.8 | 33.3 | 0.33 | - |
| Kernel-based training (polynomial kernel) | |||||||||
| Software | Algorithm | d | # iters | c | Acc. (%) | Mem [MiB] | Time [s] | Time (test) [s] | # SVs |
| opal (+ pecco) | P [-a -N 500 -as] | 2 | 10 | - | 92.78 | 102.1 | 6.7 | 0.53 (0.14) | 42261 |
| opal (+ pecco) | PA-I [-l 2 -N 2000 -s] | 2 | 20 | 0.0010 | 92.86 | 153.7 | 17.1 | 0.92 (0.18) | 81441 |
| opal (+ pecco) | PA-I [-l 2 -N 2000 -as] | 2 | 20 | 0.0050 | 92.86 | 151.7 | 12.9 | 0.80 (0.17) | 67918 |
| TinySVM (+ pecco) | SVM | 2 | 258081 | 0.0100 | 92.85 | 139.8 | 12378.3 | 59.60 (0.20) | 77151 |
| LIBSVM (+ pecco) | SVM | 3 | 205993 | 0.0100 | 92.85 | 175.0 | 16695.3 | 110.14 (0.20) | 77240 |
| opal (+ pecco) | P [-l 0 -N 250 -as] | 3 | 10 | - | 92.89 | 120.4 | 7.3 | 0.63 (0.21) | 35902 |
| opal (+ pecco) | PA-I [-l 2 -N 250 -s] | 3 | 20 | 0.0005 | 92.95 | 156.3 | 13.9 | 0.83 (0.27) | 56768 |
| opal (+ pecco) | PA-I [-l -N 250 -as] | 3 | 20 | 0.0005 | 93.05 | 156.7 | 14.2 | 0.88 (0.28) | 56768 |
| TinySVM (+ pecco) | SVM | 3 | 351758 | 0.0100 | 93.09 | 140.8 | 17403.3 | 79.88 (0.35) | 91873 |
| LIBSVM (+ pecco) | SVM | 3 | 239881 | 0.0100 | 93.09 | 174.7 | 18082.2 | 129.64 (0.35) | 92009 |
NOTE: LIBLINEAR-POLY2 did not work in this dataset (it should require ~400G memory space).
We do not guarantee that the implemented algorithms other than those proposed by us are patent-free; we regarded them to be patent-free simply because their implementations are available as (existing) open-source softwares (otherwise a simple patent look-up). Please be careful when you use this software for commercial use.
Opal was originally meant for online passive-agressive algorithms; the efficiency attributes to obsolete partial-margins in the active lifespan.