IJCAI-16

This page provides the source code and datasets used in our IJCAI-16 paper.

Paper: Ordering Concepts Based on Common Attribute Intensity
Authors: Tatsuya Iwanari, Naoki Yoshinaga, Nobuhiro Kaji, Toshiharu Nishina, Masashi Toyoda, and Masaru Kitsuregawa.
Slides: [IJCAI 2016] Ordering Concepts Based on Common Attribute Intensity, from Tatsuya Iwanari

Errata

There were errors in the gold standard ordering in the paper. Please see this.

Source

See a snapshot (not refactored) at GitHub.
- We plan to maintain the code in an open repository (under construction…).

Dataset

Two types of datasets are available on this page.

humans (for gold-standard)

This data contains the human annotations used to build the gold-standard ordering. The annotations are written in Japanese.

Download:

data

Please see the human annotation format section below for more details.

human annotation format

The archive above contains files such as data0.csv, each of which holds the annotations of one person.

The format is as follows (the numbers at the beginning of each line are line numbers):

1: # adjective,antonym
2: # conceptA = conceptB = conceptC > conceptD > conceptE = conceptF = conceptG > conceptH
3: conceptA,1,conceptB,1,conceptC,1,conceptD,4,conceptE,5,conceptF,5,conceptG,5,conceptH,8
...

Line 1 gives the adjective and its antonym for the query.
Line 2 gives one annotator's ordering. Volunteers were allowed to use “equals (=)” for sub-orderings they were uncertain about.
Line 3 gives the interpretation of Line 2 as “concept,rank,concept,rank,…”; concepts joined by “=” share the same rank.
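
The relation between Line 2 and Line 3 can be sketched in a few lines of Python. This is only an illustration inferred from the example above, not the code used for the paper: the function names `ranks_from_annotation` and `to_csv_line` are hypothetical, and the sketch assumes that concept names contain neither “=” nor “>” and that those two symbols appear as plain ASCII characters.

```python
def ranks_from_annotation(annotation: str) -> list[tuple[str, int]]:
    """Turn '# A = B > C' into [('A', 1), ('B', 1), ('C', 3)]."""
    text = annotation.lstrip("#").strip()
    ranked = []
    seen = 0  # number of concepts already placed
    for group in text.split(">"):
        members = [c.strip() for c in group.split("=") if c.strip()]
        rank = seen + 1  # tied concepts share the rank of the first one in the group
        for concept in members:
            ranked.append((concept, rank))
        seen += len(members)
    return ranked


def to_csv_line(ranked: list[tuple[str, int]]) -> str:
    """Serialize as 'concept,rank,concept,rank,...' (the Line 3 layout)."""
    return ",".join(f"{concept},{rank}" for concept, rank in ranked)


line2 = ("# conceptA = conceptB = conceptC > conceptD > "
         "conceptE = conceptF = conceptG > conceptH")
line3 = to_csv_line(ranks_from_annotation(line2))
print(line3)
```

With the Line 2 example above as input, the sketch reproduces the Line 3 example: tied concepts share one rank, and the next group resumes at its position in the list (1, 1, 1, 4, 5, 5, 5, 8).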

orderings (system-generated orderings)

This data contains the gold and system-generated orderings. They are written in Japanese and English.

Download:

Please see the orderings format section below for more details.

orderings format

The archive contains the following four files.

gold.csv

This is the gold-standard ordering based on the human annotations.

baseline.csv

This is the ordering generated by the baseline (the PMI of co-occurrence).

svm.csv

This is the ordering generated by the ranking SVM.

svr.csv

This is the ordering generated by the SVR.

All four files share the same format, with two lines per query:

# beautiful
lavender,1,sakura,2,camellia,3,rose,4,daisy,5,sunflower,6,platycodon,7,lily,8
# elegant
pearl,1,amethyst,2,sapphire,3,opal,4,turquoise,5,ruby,6,emerald,7,tourmaline,8
...

Line 1 of each query gives the adjective.
Line 2 gives the gold or system-generated ordering as “concept,rank,concept,rank,…”.
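
As a rough illustration of how these files might be read, the following Python sketch parses the two-line records into a mapping from adjective to concept ranks. The function name `parse_orderings` is hypothetical, and the sketch assumes that concept names contain no commas; the inline sample is copied from the example above. To read a real file, an open file object can be passed instead of the `io.StringIO` sample (the encoding, for example UTF-8, is an assumption to check against the actual archive).

```python
import io


def parse_orderings(lines):
    """Yield (adjective, {concept: rank}) for each two-line query record."""
    adjective = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Header line: '# adjective'
            adjective = line.lstrip("#").strip()
            continue
        # Data line: fields alternate concept, rank, concept, rank, ...
        fields = line.split(",")
        ranks = {fields[i]: int(fields[i + 1])
                 for i in range(0, len(fields) - 1, 2)}
        yield adjective, ranks


sample = io.StringIO(
    "# beautiful\n"
    "lavender,1,sakura,2,camellia,3,rose,4,daisy,5,sunflower,6,platycodon,7,lily,8\n"
    "# elegant\n"
    "pearl,1,amethyst,2,sapphire,3,opal,4,turquoise,5,ruby,6,emerald,7,tourmaline,8\n"
)
orderings = dict(parse_orderings(sample))
print(orderings["beautiful"]["rose"])
```

Keeping the ranks in a dictionary per adjective makes it straightforward to look up the same concept in gold.csv and in a system file when comparing orderings.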

Contact

If you have trouble using the code or data, please contact me (Tatsuya Iwanari, nari@tkl.iis.u-tokyo.ac.jp).