Research on Novel Database Core Technologies for the Era of Big Data
Our research group explores and develops novel database core technologies that enable Big Data management and analytics at a scale, depth, and efficiency once thought impossible. The work focuses on systems software such as database systems, storage systems, and operating systems, but is not limited to them. The group also works on fusing infrastructure systems with cutting-edge social and business applications. One recent result is a super-fast database engine that the group developed based on a novel execution principle. This new engine is being deployed in a growing number of commercial production systems.

Very large-scale web solutions
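Cyberspace, including the web and social media, is closely interlinked with the real world. We pursue research that aims to solve social problems by jointly analyzing cyberspace data and real-world sensor data. Since 1999 we have continuously crawled Japanese web pages at large scale and built a web archive containing tens of billions of URLs, billions of blog posts, and tens of billions of Twitter tweets. We also collect and store real-world data such as dashcam (drive recorder) data, road traffic data, and weather data, and we are developing systems that analyze the structure, content, and temporal changes of all this data. We analyze these massive cyberspace and real-world datasets with data mining, machine learning, link analysis, natural language processing, and image processing. We have also implemented visualization systems on a large-scale display wall that let users explore the data from many different perspectives.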

Natural Language Processing and Computational Linguistics: towards understanding humans and society with fast and accurate language technologies
Humans think in language and put their real-world experiences into words to convey them to others. We study natural language processing, which aims to process language efficiently and accurately by computer. Pursuing such technologies also leads to computational linguistics, which seeks to reveal the mechanisms underlying language and human intellectual ability. The recent growth of social media and mobile devices allows our experiences and opinions to accumulate as social big data. We analyze the massive text in this social big data in depth to read social trends. We also develop technologies that support communication through language, and we aim to understand how we think from what we write.
- Simple and Effective Domain Adaptation in Neural Machine Translation using Vocabulary Adaptation (EMNLP 2020, Findings)
- Robust Estimation of Out-of-Vocabulary Embeddings Inspired by the Process of Creating Words (EMNLP 2020, Findings)
- Uncertainty-aware Evaluation Metrics for Open-domain Dialogues (ACL 2020 SRW) [paper]
- Accurate Task-specific Multilingual Model Applicable to Any Languages and Tasks (CoNLL 2019) [paper]
- Analyzing and Improving Neural Machine Translation for Longer Inputs (CoNLL 2019) [paper]
- Early Discovery of Emerging Entities in Microblogs (IJCAI 2019) [paper]
- Neural Description Generator for Unknown Phrases (NAACL 2019) [paper]
- Modeling Interpersonal Semantic Variations in Word Meanings (NAACL 2019, short) [paper]
- Accurate linguistically-motivated Neural Machine Translation (ACL 2017) [paper]
- Situation-Aware Neural Conversational Model (ACL 2017 SRW) [paper]
- Acquiring Values from Social Media (IJCAI 2016) [paper]
- Mathematical modeling of word meanings and its translations (CoNLL 2015, short) [paper]
- Self-adaptive classifier for Efficient Language Analysis (COLING 2014) [paper]
- Emotion-aware Conversational Model with Humanity (ACL 2013) [paper]
- Managing World Knowledge Acquired from time-series text (EMNLP 2012) [paper]
- Social-media Text Analysis and its Visualization for Social Analysis (PacificVis 2012, IUI 2016, PacificVis 2018) [paper] [demo]


Petabyte-class global environment digital library
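We are building a platform for applications that integrate and analyze massive, highly diverse Earth observation data and turn it into scientifically and socially useful information. Our research and development covers large-scale data archiving, metadata management, high-performance data analysis, and visualization. We also work on long-term, stable system operation and on building an international global environment portal. Our goal is an integrated data infrastructure with a robust database and a vast analysis space. It will not only store enormous volumes of global environmental data from many fields but also provide a wide range of data processing and analysis tools. [Details]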