Building a Question-Answering Chatbot using Forum Data in the Semantic Space

EasyChair Preprint 4979
6 pages • Date: February 5, 2021

Abstract

We attempt to combine conversational agents and conceptual semantic representations by creating a Chatbot based on an online autism forum and aimed at answering the questions of parents of autistic children. First, we collect the titles and posts of all threads in two forums. We filter the threads by title, keeping only those whose titles are questions. We remove threads without replies and obtain 36,000 threads, totalling 600,000 replies. Then, to provide the best answers, we use Amazon Mechanical Turk to obtain usefulness labels on five levels for part of the data set: about 10,000 replies. We train a variety of models on these labels and apply them to the unlabelled replies. We use seven standardized continuous features, three of which are based on sent2vec cosine similarity. The Random Forest Classifier came out on top with an F1-score of 0.66. Afterwards, we compute the sentence vectors of questions and replies by averaging word2vec embeddings. When the Chatbot is asked a question, its sentence representation is computed and compared to all forum questions. The replies to the most cosine-similar question are first filtered to keep those with the highest usefulness label, and then the most cosine-similar reply is returned as the answer. An example of how the Chatbot works is its answer to “What is Autism?”: “Autism has always been difficult for some people to to explain, but I do know what it is not: Pretty colors and sparkly gems”.

Keyphrases: Chatbot, Natural Language Processing, Question Answering, Representation Learning, Sentence embeddings, semantic representations, word embeddings
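
The retrieval step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the tiny `EMBEDDINGS` table stands in for trained word2vec vectors, the `THREADS` data and the names `sentence_vector`, `cosine` and `answer` are hypothetical, the usefulness labels are assumed to be already predicted, and the final reply is assumed to be ranked by similarity to the user's question (the abstract does not say what the reply is compared against).

```python
import numpy as np

# Toy 3-dimensional word vectors (hypothetical); a real system would load
# trained word2vec embeddings instead.
EMBEDDINGS = {
    "autism": np.array([1.0, 0.0, 0.0]),
    "what": np.array([0.0, 1.0, 0.0]),
    "is": np.array([0.0, 0.5, 0.5]),
    "sleep": np.array([0.0, 0.0, 1.0]),
    "help": np.array([0.5, 0.0, 0.5]),
}

# Hypothetical forum data: each thread has a question title and replies
# carrying a usefulness label on five levels (assumed already predicted).
THREADS = [
    {"question": "What is autism?", "replies": [
        {"text": "Autism is a developmental condition.", "usefulness": 4},
        {"text": "Sleep is what you need.", "usefulness": 4},
        {"text": "Autism autism autism.", "usefulness": 1},
    ]},
    {"question": "Does anything help with sleep?", "replies": [
        {"text": "A routine can help sleep.", "usefulness": 3},
    ]},
]


def sentence_vector(text, embeddings=EMBEDDINGS):
    """Average the vectors of the known words; zero vector if none are known."""
    tokens = [t.strip("?.,!").lower() for t in text.split()]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        dim = len(next(iter(embeddings.values())))
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def answer(query, threads=THREADS, embeddings=EMBEDDINGS):
    q = sentence_vector(query, embeddings)
    # Step 1: find the forum question most cosine-similar to the query.
    best = max(
        threads,
        key=lambda t: cosine(q, sentence_vector(t["question"], embeddings)),
    )
    # Step 2: keep only the replies with the highest usefulness label.
    top = max(r["usefulness"] for r in best["replies"])
    candidates = [r for r in best["replies"] if r["usefulness"] == top]
    # Step 3 (assumption): return the candidate most similar to the query.
    chosen = max(
        candidates,
        key=lambda r: cosine(q, sentence_vector(r["text"], embeddings)),
    )
    return chosen["text"]


if __name__ == "__main__":
    print(answer("What is autism?"))
```

With the toy data above, the query "What is autism?" matches the first thread, the low-usefulness reply is discarded, and the remaining two replies are ranked by cosine similarity to the query. At the scale reported in the abstract, the forum question vectors would more likely be precomputed once and compared in a single matrix operation rather than recomputed per query as in this sketch.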