opal - C++ implementation of online learning with kernel slicing

developed by Naoki Yoshinaga at Yoshinaga Lab., IIS, University of Tokyo

About

Opal is a C++ library of online learning with kernel slicing [1], which enables large-scale training with binary conjunctive features (polynomial kernel). Opal is fast because it reuses temporal partial margins computed in past rounds and truncates margin computations that do not contribute to parameter updates.

Opal is x30-x250 faster than a kernel-based online learner, while retaining its space-efficiency in several NLP tasks. It supports training of linear and kernel-based model (polynomial kernel) with binary features.

If you make use of opal for research or commercial purposes, the reference will be:

N. Yoshinaga and M. Kitsuregawa. Kernel Slicing: Scalable Online Training with Conjunctive Features. Proc. COLING 2010 (oral), pp. 1245--1253. 2010.

Features

Fast training of a kernel-based model [-t 1] based on kernel slicing [-kp]: currently, Perceptron [2] [-l 0], passive-aggressive algorithms [PA, PA-I, PA-II] [3] [-l 1,2,3], and their multi-class extensions [4,5] are implemented.
Fast training of a linear model [-t 0]: usually faster than LIBLINEAR, Vowpal Wabbit, oll or other major OSS.
Adaptive testing ([-k]; optional): though it takes additional memory space (sub-)linear to the test data size, testing with opal (with [-k] option) becomes faster as it processes more data (I recommend pecco for testing a kernel-based model).
Probability output [-O 4]: Platt's sigmoid fitting is used to convert a margin to a probability (binary classifier only) [6]. Remember to set [-P] to perform the fitting.
Retraining [-m]: you can use the model that has been trained w/ opal as an initial model.
Instant training & testing [set model to '-'] performs training/testing w/o saving a model as libarow does.
Various example buffering [-b 1,2,3]: you can buffer training example on memory (time-efficient) [-b 0], buffer event on disk (space-efficient) [-b 1], or no buffering (slowest but no resource needed) [-b 2].
Averaging [-a] & shuffling [-s] stabilize the model accuracy; averaged PA trains a model with a larger c value (aggressive update), which requires a smaller number of iterations than PA to obtain the same accuracy.
String feature support: configure --enable-string-feature will support string features.
Dumping feature weights: opal[-multi] - model - will dump weights of features; for a kernel model, it will mine effective conjunctive features.
C++ library header of an online double-array builder supports Darts-compatible APIs; implement fast random insertion algorithm [7]; insertion/retrieval is x2-3 faster than std::unordered_map, when the data access is skewed; this library is now released as cedar.
SWIG-based Perl/Python/Ruby/Lua bindings: you can use opal from your favorite script languages, although it may undermine the virtue of opal (efficiency); follow instructions under swig/.

The following table summarizes the features of opal variants and pecco in training/testing.

	features	opal `[-kp]`	opal `[-k]`	opal `[-p]`	opal	opalk	pecco
training	kernel expansion/splitting	yes	yes	yes	yes	no	n/a
	kernel slicing	yes	yes	no	no	no	n/a
	truncated margins	yes	no	yes	no	no	n/a
testing	kernel expansion/splitting	yes	yes	yes	yes	no	yes
	kernel slicing (or fstrie)	yes	yes	no	no	no	yes
	adaptive classification	yes	yes	no	no	no	no
	truncated margins	yes	no	yes	no	no	yes

License (except test directory): GNU GPLv2, LGPLv2.1, and BSD; or e-mail me for other licenses you want.

NOTE: The distribution includes in test directory the labeled examples for a transition-based dependency parser and hyponymy relation identification, all of which are used in the paper.

Download & Setup

> wget http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/opal/opal-latest.tar.gz
> tar zxvf opal-latest.tar.gz
> cd opal-YYYY-MM-DD
> configure
> make install

For Mac OS X users: try port opal-ml via MacPorts (special thanks to @hjym_u).

In the paper, PA-I SL, PA-I SL^*, PA-I SP and PA-I KERNEL refer to opal [-k -p], opal [-p], opal, and opalk, respectively.

Requirements

OS: 32/64-bit UNIX-compatible OS (tested on Linux / Mac OS X)
Compiler: tested with GNU gcc (≥ 4.0) or clang (≥ 2.9)

ToDo

Support other online learning algorithms (e.g., averaged SGD or Soft Confidence-Weighted).
Support reading/writing a compiled model for quick start-up in testing.

History

March 18th 2022 (xz) (development; minor bug/typo fixes and docs only):
- Support Python3 instead of Python in SWIG bindings
October 5th 2015 (xz):
- Change License from GNU GPL/LGPL to GNU GPL/LGPL and 2-clause BSD license.
February 8th, 2015 (xz):
- Install rand_shuf, intended for users to shuffle training data on disk prior to training from them [-b 1,2].
- Allow to disable an array-based pseudo trie (configure --disable-array-trie).
- Fix a compilation error with configure --enable-float
- Fix a compilation error under Ubuntu 14.04 (thanks: Dr. Wenliang).
December 25th, 2013 (xz):
- Package with the latest version of cedar; the reduced trie significantly reduces time and space needed in trainig/testing when --kernel-splitN is set to large.
- Add getLabel () for a multiclass classifier.
- Support a multi-class classifer and string features in SWIG-based Perl/Python/Ruby/Lua bindings.
- Fix a bug in SWIG-based Perl/Python/Ruby/Lua bindings.
- Fix a configuration error in clang (Mac OS X Mavericks).
- Note: A model trained by a previous verion of opal with string feature support (--enable-string-feature) is not compatible with this version (thanks: Dr. Shinzato). This is because a double-array trie library used to store string features has been updated not to accept empty strings. To make a model trained with the previous version compatible to this version, run 'sed -i.bak -e 's/^$/ /' model.features'.
April 23rd, 2013 (xz):
- Support parallel training [8] for linear training [-n] (configure --enable-openmp).
- Support raw string features (configure --enable-string-feature).
- Support dumping feature weights (bias with polynomial kernel is outputted as `f_0').
- Add an option to output test examples with predicted labels ([-o] instead of [-O]).
- Fix a compilation error with configure --with-trie-impl=map.
- Support comments in training/test data (lines starting with '#' will be skipped).
- Omit less important short options.
- Disable truncating margin computations using partial margins (to enable, configure --enable-temporal-pruning).
January 12th, 2013 (xz):
- x1.1-1.2 speed up in a lower-order polynomial kernel ([-d 1,2]).
- Export cedar.h as independent header library.
- Fix a bug in saving a multi-class models (in the last version).
- Fix a bug in setting [-C].
December 12th, 2012 (xz):
- x1.4 speed up in linear training [-t 0] by implementing raw strtol to read feature vectors.
- Support feature thresholding [-f].
- Remove most of the paddings and static variables (verified with clang++ -Weverything).
- Update swig files (incl. a workaround for SSE4.2 popcnt).
October 30th, 2012 (xz):
- x1.5 speed up with small [-N] (esp. [-t 1 -d 3]), by using popcnt (implemented by SSE4.2 or look-up table) to compute inner products.
- x1.1-1.2 speed up (esp. [-t 1] and [-t 0 -s]) and reduce memory footprint by revisiting the implementation.
- Fix a bug in USE_MAP_TRIE and USE_HASH_TRIE.
October 6th, 2012 (xz):
- Perform sigmoid fitting [-P] to support probability output (binary classifier only).
- Revisit option [-O] to output labels/classifier_outputs/probabilities.
- Disable pruning [-p] in testing if needed exact margin [-O 3,4]
- Fix a bug that causes a crash in loading a multi-class linear model.
September 26th, 2012 (xz):
- Automatically adjust [-c] if 0 is set to it (experimental).
- Resolve tiny memory leak in [-t 0]
August 31st, 2012 (xz):
- Slightly speed up training by truncating margin computations using partial margins [-k -p].
- Revisit partial margin management (guarantee amortized complexity of O(1)).
- Support 32-bit cygwin environments (very experimental; RTDSC timer doesn't work).
- Oops, I forgot to include test data and swig wrappers in the last version.
May 15th, 2012:
- Support GNU autoconf/automake.
- Change License from GNU GPL to GNU GPL/LGPL.
- Add option to enable kernel slicing [-k] and terminating margin computations [-p]
- Speed up the training/testing with multi-class classifier.
- Resolve memory leak in cedar.h.
December 21st, 2011:
- Rewrite interface with GNU Getopt.
- x2-3 speed up using an array-based pseudo trie.
- Support multi-class classification using multi-class perceptron [4] and support-class passive aggressive algorithms [5].
- Support automatic shrinking of parameter N [# features expanded].
- Support automatic model identification.
- Suppress rounding errors by reducing the number of multiplications.
August 29th, 2011:
- Add SWIG-based Perl/Python/Ruby/Lua bindings (experimental).
- Rename dda.h to cedar.h.
- Remove intricacy in Makefile (you can make install).
- Cache std::strtol() in a double array to speed up reading examples.
March 27th, 2011:
- Cache test examples as well to speed up per-iteration testing.
- Support per-iteration testing w/ averaging (no side effect).
- Add event=Null not to cache training examples (poor man's training).
- Adopt double-precision floating point for conjunctive feature weight to suppress rounding errors (slightly more training time and memory footprint).
January 21st, 2011:
- Fix a (tiny) bug in PA-II and USE_HASH_TRIE.
- Add USE_MAP_TRIE option.
October 28th, 2010:
- Fix a (tiny) bug in averaging (update on the last example at the last iteration had been wrongly ignored).
September 16th, 2010:
- Improve the performance with Disk mode; 1) caching examples into disk and 2) on-disk shuffling (≠ std::random_shuffle(), though).
- Optimize the option order.
- Resolve memory leak in dda.h.
- Fix a bug in retraining with polynomial kernel.
August 14th, 2010:
- Fix a bug in linear retraining with averaging.
- Revisit debug option (=1: output margins of examples in testing; =2: report model accuracy on test data at each iteration; debug=2 with development data will greatly help you to optimize # iterations (iter), but do not tune it on the test-set).
August 12th, 2010:
- Replace random() with XorShift RNG (also set as default RNG instead of TR1 mt19937) for shuffling examples (results updated).
- Fix a bug in feature mapping.
August 1st, 2010:
- Make dda.h compatible to darts.
- Improve the header portability by adding hash-based trie implementation to pa.h.
- Clean up codes & fix a bug in counting # features and retraining with kernel.
July 27th, 2010:
- Optimize the option order and support instant mode (w/o saving a model; like libarow).
July 23rd, 2010:
- Revisit debug option.
July 21st, 2010:
- initial release.

Usage

command-line options

Typing ./opal --help shows the following usage information; test/ includes training and testing data used in the paper [1].

opal - online learning with kernel slicing
Copyright (c) 2009-2013 Naoki Yoshinaga, All rights reserved.

Usage: src/opal [options] train model test

train   training file       set '-' to skip training
model   model file          set '-' to training/test w/o saving a model
test    test file           set '-' to skip testing

Optional parameters in training and testing:
  -t, --kernel=TYPE         select type of kernel function
                            * 0 - linear     (w^T * x)
                              1 - polynomial (s^T * x + 1)^d
  -d, --kernel-degree=INT   parameter d in polynomial kernel (0)
      --kernel-splitN=INT   # common features in kernel splitting (0)
      --max-trie-size=INT   adjust # common features to fit the feature trie
                            within RAM size (MiB) (32)
  -p, --pruning-margin      terminate margin computation if unnecessarily
  -k, --kernel-slicing      perform kernel slicing
  -O, --output=TYPE         select output type of testing
                             * 0 - report accuracy
                               1 - report accuracy per iteration
                               2 - labels
                               3 - labels and margins
                               4 - labels and probabilities
  -o, --output!=TYPE        output examples with labels/margins/probabilities

Optional parameters in training:
  -l, --learner=TYPE        select learning algorithm
                              0 - Perceptron
                              1 - Passive Aggressive    (PA)
                            * 2 - Passive Aggressive I  (PA-I)
                              3 - Passive Aggressive II (PA-II)
  -c, --reg-cost=FLOAT      PA-I/-II aggressiveness parameter C (1.0)
                            ([i * avg. k(x,x)]^-1 if C is set to 0)
  -i, --iteration=INT       # iterations (10)
  -a, --averaging           average parameters
  -s, --shuffling           shuffle training examples on RAM
  -b, --buffer=TYPE         select type of buffer to read examples
                            * 0 - Use RAM to load examples
                              1 - Use DISK to cache examples
                              2 - Do not cache examples
      --model0=FILE         re-train a model trained w/ opal ('-')
      --feat-threshold=INT  thresholding features by frequency (1)
  -M, --max-examples=INT    max. # examples used in training (0: all)
  -P, --probability-output  perform sigmoid fitting to output probabilities

Misc.:
  -h, --help               show this help and exit

file format for train / test

(BNF-like notation)
<class>   := +1 | -1 (or any string starting w/o '#' in multi-class classification)
<feature> := integer // >=1
<value>   := string (w/o ' ' and '\t') // value is ignored (always assumed to be 1)
<line>    := <class> <feature>:<value> <feature>:<value> ... <feature>:<value> | # ...

Refer to test/*.{train,dev,test}. As in SVM^light, feature/value pairs must be ordered by increasing feature number (values are ignored and assumed to be 1).

If you want to use a string feature, configure --enable-string-feature then you can use any string (without spaces and ':') as a feature (and you can omit a feature value). Namely,

(BNF-like notation)
<class>   := +1 | -1 (or any string starting w/o '#' in multi-class classification)
<feature> := string (w/o ' ', '\t' and ':')
<value>   := string (w/o ' ' and '\t') // value is ignored (always assumed to be 1)
<line>    := <class> <feature>[:<value>] <feature>[:<value>] ... <feature>[:<value>] | # ...

example

# linear training w/ Averaged Perceptron [-l 0, -a] with shuffling examples
> opal -t 0 -l 0 -c 1.0 -i 5 -as train model -

# linear training and testing w/ Passive Aggressive-I w/o saving model
> opal -t 0 -c 0.005 -i 20 train - test

# kernel-based training and testing (polynomial kernel of d=2) with Passive Aggressive-I
#  (event on Disk; shuffling ignored)
> opal -t 1 -d 2 -i 20 -c 0.0005 -a -b 1 train model test

# kernel-based testing
#  (only combinations among 500 common features are explicitly considered)
> opal --kernel-splitN 500 - model test

NOTE:

When you run opal as instant mode (model='-') with PA-I [-l 2] or PA-II [-l 3] and polynomial kernel [-t 1], the reported accuracy can be slightly different from the one obtained with non-instant mode; this is due to rounding errors.
The retraining ([-m model0]) with averaging ignores #examples processed in a prior training (e.g., with [-a], a model trained with [-i 20] will be different from a model retrained with [-i 10] from a model trained with [-i 10]).

Performance comparison

The following tables show a comparison among linear learners (opal; ver. in 2011, oll 0.2, AROW++ 0.05, libarow 0.2.0, Vowpal Wabbit 4.1, SVMlin 1.0 and LIBLINEAR 1.7) and a comparison among kernel-based learners (opal, LIBLINEAR-POLY2 1.7 ([-d 2] only) and TinySVM 0.09), and LIBSVM 3.0: for LIBLINEAR, results with the fastest algorithm and the algorithm that obtained the most accurate model are shown. pecco (FST [-t 1]) is also used to test the models obtained with opal ([-d 2,3]; fstrie is build from the training data). The experiments were conducted on Mac OS X 10.5 over Intel Xeon E5462 3.2Ghz CPU with 32GB main memory. Only oll uses 4-byte float as floating-point numbers (others use 8-byte double). AROW++ and libarow cannot output margins of examples (others (can) do). opal, LIBLINEAR-POLY2 (modified to handle a large number of conjunctive features; int->int64_t) and TinySVM (modified for the experiments in the paper) and LIBSVM (add -m64, to make speed and memory comparable to 64-bit TinySVM) adopt 64-bit addressing, while others adopt 32bit addressing. These differences may cause 10-20% difference in time, and at most 100% difference in space (32bit v.s. 64bit). opal with [-b 1] and Vowpal Wabbit do not load events on memory; this leads to slower but memory-efficient training (demand additional Disk space for caching data, though); in the experiments, examples for these learners were offline shuffled to make the accuracy of the obtained models comparable to the others.

We have tried parameters # iters = 1, 5, 10, 20 (online learners only), and c/r = 100.0, 50.0, ..., 0.000005, 0.000001; the table lists the model that achieved the best accuracy on the development set. Due to intricate parameters related to learning rate in SGD, we have only tried l= 1.0, 0.5, 0.2, .., 0.02, 0.01 for Vowpal Wabbit. The gettimeofday command is used to measure the run-time (namely, the whole training or testing time).

Dependency Parsing (DEP in the paper, opal/test/tl.*, # training / test examples = 296,776 / 113,716)

Linear training:

Software	Algorithm	d	# iters	c (r), l	Acc. (%)	Mem [MiB]	Training [s]	Test [s]	# SVs
opal	P [-l 0 -as]	-	10	-	90.76	49.9	1.1	0.10	-
opal	PA-I [-l 2 -s]	-	20	0.005000	90.90	46.1	1.7	0.10	-
opal	PA-I [-l 2 (-s) -b 1]	-	20	0.005000	90.90	8.6	2.2	0.10	-
opal	PA-I [-l 2 -as]	-	20	0.010000	91.21	49.9	2.0	0.10	-
oll	AP	-	5	-	90.82	126.8	17.3	6.08	-
oll	PA-I	-	20	0.005000	91.06	125.5	18.3	6.07	-
oll	CW	-	1	0.000050	88.83	126.8	17.2	6.13	-
AROW++	AROW	-	5	50.000000	90.41	186.7	16.9	5.33	-
libarow	AROW	-	10	100.000000	90.48	149.2	4.9		-
Vowpal Wabbit	SGD	-	20	0.500000	91.01	21.4	161.6	4.11	-
SVMlin	L2-SVM-MFN [-A 1]	-	8	0.100000	90.94	132.3	126.3	1.24	-
LIBLINEAR	L1-reg L2-SVM [-s 5]	-	19	0.100000	90.84	219.1	4.5	0.65	-
LIBLINEAR	L2-reg L1-SVM (dual) [-s 3]	-	285	0.500000	91.10	114.4	12.7	0.68	77700

Kernel-based training (polynomial kernel):

Software	Algorithm	d	# iters	c	Acc. (%)	Mem [MiB]	Training [s]	Test [s]	# SVs
opal (+ pecco)	P [-l 0 --kernel-splitN 2000 -p -as]	2	5	-	92.94	72.1	3.5	0.72 (0.32)	59427
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 2000 -p -s]	2	20	0.000100	93.03	79.7	13.5	0.90 (0.35)	94820
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 4000 -p -as]	2	20	0.000500	93.20	84.6	14.1	1.19 (0.35)	83867
LIBLINEAR-POLY2	L2-reg L1-SVM (dual) [-s 3]	2	66	0.005000	93.24	16013.8	422.8	310.27	66056
TinySVM (+ pecco)	SVM	2	438847	0.005000	93.24	246.4	30040.3	1739.92 (0.33)	65158
LIBSVM (+ pecco)	SVM	2	261916	0.005000	93.24	266.7	23302.2	1694.98 (0.33)	65104
opal (+ pecco)	P [-l 0 --kernel-splitN 250 -kp -as]	3	5	-	93.07	151.2	16.1	2.16 (1.17)	57295
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 250 -kp -s]	3	20	0.000005	93.32	164.1	84.6	2.89 (1.29)	89651
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 250 -kp -as]	3	20	0.000010	93.36	178.9	80.7	2.83 (1.26)	85863
TinySVM (+ pecco)	SVM	3	306291	0.000100	93.38	244.8	26048.6	1896.05 (1.19)	68530
LIBSVM (+ pecco)	SVM	3	202741	0.000100	93.38	266.1	22457.0	1785.52 (1.20)	68499

NOTE: the above accuracy is computed against the test examples, and is not directly comparable to those in the paper, which shows dependency accuracy.

Hyponymy Relation Identification (REL in the paper, opal/test/wiki.*, # training / test examples = 201,664 / 10,000)

Linear training:

Software	Algorithm	d	# iters	c (r), l	Acc. (%)	Mem [MiB]	Training [s]	Test [s]	# SVs
opal	P [-l 0 -as]	-	10	-	91.74	22.4	0.6	0.02	-
opal	PA-I [-l 2 -s]	-	20	0.0100	92.08	20.7	1.0	0.02	-
opal	PA-I [-l 2 (-s) -b 1]	-	20	0.0100	92.08	4.4	1.2	0.02	-
opal	PA-I [-l 2 -as]	-	20	0.0500	91.74	22.9	1.2	0.02	-
oll	AP	-	20	-	91.20	40.5	7.5	0.34	-
oll	PA-I	-	20	0.0100	91.89	39.4	7.5	0.35	-
oll	CW	-	1	0.5000	91.02	40.6	6.6	0.35	-
AROW++	AROW	-	20	50.0000	91.60	59.9	9.3	0.30	-
libarow	AROW	-	10	50.0000	91.62	68.4	2.0		-
Vowpal Wabbit	SGD	-	20	1.0000	91.65	21.8	110.8	0.38	-
SVMlin	L2-SVM-MFN [-A 1]	-	11	0.5000	91.84	61.8	104.2	0.17	-
LIBLINEAR	L2-reg L2-SVM (dual) [-s 1]	-	21	0.1000	91.72	49.7	2.8	0.25	110084
LIBLINEAR	L2-reg L2-SVM (primal) [-s 2]	-	40	0.5000	91.76	65.8	33.3	0.33	-

Kernel-based training (polynomial kernel):

Software	Algorithm	d	# iters	c	Acc. (%)	Mem [MiB]	Training [s]	Test [s]	# SVs
opal (+ pecco)	P [-l 0 --kernel-splitN 2000 -p -as]	2	10	-	92.78	56.5	4.0	0.30 (0.15)	42261
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 2000 -p -s]	2	20	0.0010	92.89	64.6	14.8	0.53 (0.23)	81429
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 2000 -p -as]	2	20	0.0050	92.93	62.2	11.9	0.46 (0.20)	67919
TinySVM (+ pecco)	SVM	2	258081	0.0100	92.85	139.8	12378.3	59.60 (0.22)	77151
LIBSVM (+ pecco)	SVM	2	205993	0.0100	92.85	175.0	16695.3	110.14 (0.22)	77240
opal (+ pecco)	P [-l 0 --kernel-splitN 125 -kp -as]	3	10	-	92.89	113.5	6.2	0.46 (0.29)	35902
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 250 -kp -s]	3	20	0.0005	92.99	152.7	11.6	0.64 (0.35)	56772
opal (+ pecco)	PA-I [-l 2 --kernel-splitN 250 -kp -as]	3	20	0.0005	93.07	153.1	11.8	0.65 (0.36)	56772
TinySVM (+ pecco)	SVM	3	351758	0.0100	93.09	140.8	17403.3	79.88 (0.50)	91873
LIBSVM (+ pecco)	SVM	3	239881	0.0100	93.09	174.7	18082.2	129.64 (0.50)	92009

NOTE: LIBLINEAR-POLY2 did not work in this dataset (it should require ~400G memory space).

Tuning hyper-parameters

You may want to consider the following optimization strategy for tuning parameter [-c] and the number of iterations [-i] in the default learner PA-I [-l 1]. Keeping other options fixed as default (other than parameters mentioned below), first tune [-c] on development set, and then halve [-c] and double [-i] to conduct a more cautious optimization. Following this simple procedure, You can easily obtain a model whose accuracy is comparable to those trained with batch learning.

Averaging [-a] is must for Perceptron [-l 0] while optional for variants of PA [-l 1,2,3]; however, I still recommend to set [-a] in PA because averaging allows us to reduce the number of iterations needed to reach the best performance. Shuffling [-s] is must if the training examples are sorted by some biased order, otherwise optional (generally, recommended). I usually unset [-a] or [-s] in a task where neighboring examples are closely related with each other (e.g., deterministic dependency parsing); in such a case, shuffling just slows down the training, and the accuracy of the resulting model could (slightly) drop.

Setting [-p] or [-kp] is highly recommended to speed up training with [-t 1 -d 2] or [-t 1 -d 3], respectively. Remember that when you use variants of PA [-l 1,2,3] as a learning algorithm, kernel slicing [-k] will introduce some `negligible' rounding errors (meaning the resulting model is not identical in practice though theoretically the same), while truncating margin computations [-p] is unlikely to suffer from rounding errors (You can usually get the identical model). Never mind, anyway.

Finally, you do not need to worry about tuning parameter [--kernel-splitN]; otherwise the memory footprint will be adjusted according to the value of [-T]. If you plan to perform training several times with the same dataset, you may want to set [--kernel-splitN] to the value finally reached in the first training; this will slightly speed up the training.

Disclaimer

We do not guarantee that the implemented algorithms other than those proposed by us are patent-free; we regarded them to be patent-free simply because their implementations are available as (existing) open-source softwares (otherwise a simple patent look-up). Please be careful when you use this software for commercial use.

How to pronounce `opal'?

Opal was originally meant for online passive-aggressive algorithms; the efficiency attributes to obsolete partial-margins in the active lifespan.

References

N. Yoshinaga and M. Kitsuregawa. Kernel Slicing: Scalable Online Training with Conjunctive Features. Proc. COLING 2010 (oral), pp. 1245--1253. 2010.
Y. Freund and R. E. Schapire. Large Margin Classification using the Perceptron Algorithm. Machine Learning 37(3):277--296, 1999.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. JMLR 7(Mar):551--585. 2006.
K. Crammer and Y. Singer. Ultraconservative Online Algorithms for Multiclass Problems. JMLR 3(Jan):951--991, 2003.
S. Matsushima, N. Shimizu, K. Yoshida, T. Ninomiya, H. Nakagawa. Exact Passive-Aggressive Algorithm for Multiclass Classification using Support Class. Proc. SDM, pp. 301--314. 2010.
H.-T. Lin, C.-J. Lin, and R. C. Weng. A Note on Platt's Probabilistic Outputs for Support Vector Machines. Journal of Machine Learning 68(3):267 - 276. 2007
S. Yata and M. Tamura and K. Morita and M. Fuketa and J. Aoe. Sequential Insertions and Performance Evaluations for Double-arrays. Proc. the 71st National Convention of IPSJ, pp. 1263--1264. 2009.
R. McDonald, K. Hall, and G. Mann. Distributed Training Strategies for the Structured Perceptron. Proc. NAACL, pp. 456--464. 2010.