Analyzing microblogs, where people post what they experience, enables applications such as social-trend analysis and entity recommendation. To track emerging trends in a variety of areas, we want to categorize information on emerging entities (e.g., Avatar 2) in microblog posts according to their types (e.g., Film). We thus introduce a new entity typing task that assigns a fine-grained type to each emerging entity when a burst of posts containing that entity is first observed in a microblog. The challenge is to perform typing from noisy microblog posts without relying on prior knowledge of the target entity. To tackle this task, we build large-scale Twitter datasets for English and Japanese using time-sensitive distant supervision. We then propose a modular neural typing model that encodes not only the entity and its contexts but also meta information across multiple posts. To type 'homographic' emerging entities (e.g., 'Go' can refer to an emerging programming language or a classic board game), whose contexts are noisy, we devise a context selector that finds contexts related to the target entity. Experiments on the Twitter datasets confirm the effectiveness of our typing model and the context selector.
Download the data with the `-O` option. For installation, this document might be useful.
The `dataset` directory contains English and Japanese data.

- `{en,ja}_{train,test}_{nonamb,amb}`: each line contains an entity ID, an entity name, a fine-grained type, and a coarse-grained type. Note that `nonamb` and `amb` refer to "non-homographic entities" and "homographic entities" in the paper, respectively.
- `{en,ja}_{train,test}_ent_{emerging,prevalent,amb,nonamb}`: directories containing files named by entity ID. Each file contains the tweet IDs of contexts used for training and testing our proposed model. Note that `emerging` and `prevalent`
refer to "emerging contexts" and "prevalent contexts" in the paper, respectively.

If you use this dataset, please cite the following paper:

```
@inproceedings{akasaki2021ee,
  title     = {Fine-grained Typing of Emerging Entities in Microblogs},
  author    = {Akasaki, Satoshi and Yoshinaga, Naoki and Toyoda, Masashi},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  pages     = {4667--4679},
  year      = {2021},
}
```
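The entity files above are described as holding an entity ID, entity name, fine-grained type, and coarse-grained type per record. As a minimal loading sketch: the tab delimiter and the `load_entities` helper below are assumptions for illustration, not part of the released data.

```python
import csv

def load_entities(path):
    """Parse a {en,ja}_{train,test}_{nonamb,amb} file.

    Assumes one entity per line with tab-separated fields in the
    documented order: entity ID, entity name, fine-grained type,
    coarse-grained type. (The tab delimiter is an assumption.)
    """
    entities = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) != 4:
                continue  # skip blank or malformed lines
            ent_id, name, fine_type, coarse_type = row
            entities.append({
                "id": ent_id,
                "name": name,
                "fine_type": fine_type,
                "coarse_type": coarse_type,
            })
    return entities
```

The per-entity context files could then be read as plain lists of tweet IDs, one per line, and joined to these records via the entity ID in the file name.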