Robust Authorship Verification with Transfer Learning
EasyChair Preprint 865
12 pages
Date: March 29, 2019

Abstract
We address the problem of open-set authorship verification, a classification task that consists of attributing texts of unknown authorship to a given author when the testing set may differ significantly from the training set in terms of documents and candidate authors. We present an end-to-end model-building process that is applicable to a wide variety of corpora with little to no modification or fine-tuning. It relies on transfer learning of a deep language model and uses a generative adversarial network and a number of text augmentation techniques to improve the model's generalization ability. The language model encodes documents of known and unknown authorship into a domain-invariant space, pairing them as input to the classifier while keeping the two document representations separate. The resulting embeddings are used to train an ensemble of recurrent and quasi-recurrent neural networks. The entire pipeline is bidirectional: the results of the forward and backward passes are averaged. We perform experiments on four traditional authorship verification datasets, a collection of machine learning papers mined from the web, and a large Amazon Reviews dataset. In these experiments, the proposed approach outperforms baseline and state-of-the-art techniques.

Keyphrases: Transfer Learning, authorship verification, language modeling
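The abstract states that the forward- and backward-pass results are averaged and that an ensemble of recurrent and quasi-recurrent networks makes the final decision. The following is a minimal sketch of that final scoring step only. It assumes each ensemble member outputs a same-author probability for a document pair from both the forward and the backward pipeline. The function names, the uniform averaging across ensemble members, and the 0.5 decision threshold are hypothetical illustrations, not details taken from the paper.

```python
from statistics import fmean


def bidirectional_score(forward_prob: float, backward_prob: float) -> float:
    """Average the same-author probabilities from the forward and backward passes."""
    return (forward_prob + backward_prob) / 2.0


def ensemble_verify(model_outputs, threshold=0.5):
    """Combine bidirectional scores from several classifiers into one decision.

    model_outputs: list of (forward_prob, backward_prob) tuples, one per
    ensemble member (e.g. an RNN and a QRNN classifier).
    Assumption: members are weighted equally and `threshold` is a
    hypothetical cut-off for declaring "same author".
    Returns (score, same_author).
    """
    if not model_outputs:
        raise ValueError("need at least one ensemble member")
    # Average forward/backward results per member, then average across members.
    scores = [bidirectional_score(f, b) for f, b in model_outputs]
    score = fmean(scores)
    return score, score >= threshold


if __name__ == "__main__":
    # Two hypothetical members: an RNN and a QRNN.
    print(ensemble_verify([(0.8, 0.6), (0.7, 0.9)]))
```

Uniform averaging is the simplest way to combine members. A weighted average, or a small meta-classifier trained on the members' scores, would be natural alternatives if some members proved more reliable.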