We create a real-world, long-term, open-domain dialogue dataset from Japanese conversation logs on Twitter. Ideally, dialogue systems should naturally understand the dependencies on past sessions. Therefore, we use our archive of tweets retrieved through the Twitter API to construct multi-session Twitter dialogue datasets.
We regard one reply tree as one dialogue session and use only sessions consisting of utterances posted alternately by two specific users. After creating the test dataset, we remove tweets of the users appearing in it from the development dataset, and then eliminate tweets of the users seen in the test or development dataset from the training dataset. We also remove dialogues containing URLs, images, or posts tweeted by bots. To exclude dialogues that are too short or too long, we use only episodes with 11-25 sessions, each of which consists of 5-30 turns.
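As an illustration only, the filtering criteria and the user-disjoint split described above could be sketched as follows. The `Tweet` and `Episode` types, their field names, the URL pattern, and the function names are hypothetical and not part of the original pipeline. The sketch also assumes that one turn is one tweet, that an episode is discarded as a whole if any of its sessions fails a check, and that "removing tweets of a user" means dropping every episode that user takes part in.

```python
import re
from dataclasses import dataclass

# Hypothetical URL pattern; the actual URL detection is not specified.
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    user: str
    text: str
    has_image: bool = False  # assumed flag, e.g. derived from tweet metadata
    from_bot: bool = False   # assumed flag; bot detection is out of scope here


@dataclass
class Episode:
    sessions: list  # each session is a list of Tweet (one reply tree)

    @property
    def users(self):
        return {t.user for s in self.sessions for t in s}


def is_clean_session(session, min_turns=5, max_turns=30):
    """Two alternating users, 5-30 turns, no URLs, images, or bot posts."""
    users = [t.user for t in session]
    return (
        min_turns <= len(session) <= max_turns
        and len(set(users)) == 2
        # With exactly two users, "no two adjacent turns by the same user"
        # is equivalent to strict alternation.
        and all(a != b for a, b in zip(users, users[1:]))
        and not any(
            URL_RE.search(t.text) or t.has_image or t.from_bot for t in session
        )
    )


def is_valid_episode(episode, min_sessions=11, max_sessions=25):
    """11-25 clean sessions, all between the same two users."""
    return (
        min_sessions <= len(episode.sessions) <= max_sessions
        and len(episode.users) == 2
        and all(is_clean_session(s) for s in episode.sessions)
    )


def make_user_disjoint(test, dev, train):
    """Drop dev episodes with test users, then train episodes with test or dev users."""
    test_users = set().union(*(ep.users for ep in test))
    dev = [ep for ep in dev if not (ep.users & test_users)]
    seen = test_users.union(*(ep.users for ep in dev))
    train = [ep for ep in train if not (ep.users & seen)]
    return test, dev, train
```

Filtering in the order test, development, training keeps the test set untouched and guarantees that no user in an evaluation split contributes tweets to the training data.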
When training and testing our dialogue-context retriever for dialogue systems, we regard the user who starts the conversation in the final session as the human user and the other user as the dialogue system. The dialogue system is then asked to generate responses for the 2n-th (n > 0) user utterances in the final session.
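One possible reading of this setup, shown here only as a sketch, counts utterances in the final session from 1, so that the session starter (the human user) speaks at odd positions and the system's responses are the utterances at even positions 2n (n > 0). The function name `response_targets` is hypothetical, and the 1-indexed interpretation is an assumption rather than something the text states. Context from past sessions is omitted for brevity.

```python
def response_targets(final_session):
    """Return (context, response) pairs for the system's turns.

    Assumes 1-indexed positions: the session starter speaks at odd
    positions and the system speaks at even positions 2n (n > 0).
    """
    pairs = []
    for pos in range(2, len(final_session) + 1, 2):
        context = final_session[: pos - 1]  # everything before the response
        response = final_session[pos - 1]   # the 2n-th utterance
        pairs.append((context, response))
    return pairs
```

Under this assumption, a five-utterance final session yields two targets: the second and the fourth utterances, each paired with all preceding utterances of that session.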