GPT-K: a GPT Based Model for Generation of Text in Kannada

EasyChair Preprint 9323 · 6 pages · Date: November 15, 2022

Abstract

Large AI-based language models are changing how we work with language. They are becoming increasingly popular because they allow us to create complex linguistic structures without requiring extensive resources. A language model must have access to a large corpus of linguistic data (e.g., word frequencies) to learn and generate new words. GPT-2, one such language model, can generate coherent paragraphs on its own, without any input on what to write about or guidance on grammar rules. Although multiple pre-trained GPT-2 models exist for English and other high-resource languages, there are few to no such models for Indic languages like Kannada. In this study, we propose GPT-K, a GPT-2-based model for language modeling in Kannada. GPT-K has been trained on a large corpus of Kannada text and can effectively perform language modeling tasks in Kannada. The model generated syntactically correct text in most cases.

Keyphrases: GPT-2, hyperparameter fine-tuning, language modeling, language models, model training
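The abstract does not include code, but the workflow it describes (fine-tuning a GPT-2 causal language model on a Kannada corpus, then sampling text from it) can be sketched with the Hugging Face transformers library. This is a minimal illustrative sketch, not the authors' actual implementation: the corpus file path, training hyperparameters, and output directory below are assumptions, and the paper's real setup may differ (e.g., a tokenizer trained specifically on Kannada).

```python
# Illustrative sketch of GPT-2-style language modeling for Kannada using
# Hugging Face transformers. "kannada_corpus.txt" and all hyperparameters
# are hypothetical; the paper's actual configuration is not public.
from transformers import (
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    TextDataset,  # deprecated in recent transformers versions, shown for brevity
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2's byte-level BPE can encode any Unicode text, including Kannada
# script, although a tokenizer trained on Kannada would be more efficient.
dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="kannada_corpus.txt",  # hypothetical corpus file
    block_size=128,
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt-k-sketch",       # hypothetical output directory
    num_train_epochs=3,              # hypothetical value
    per_device_train_batch_size=8,   # hypothetical value
    save_steps=5000,
)
Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=dataset,
).train()

# Sample a continuation from a Kannada prompt.
prompt_ids = tokenizer.encode("ಕನ್ನಡ", return_tensors="pt")
output = model.generate(prompt_ids, max_length=50, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Top-k and nucleus (top-p) sampling are shown here because they are the standard decoding choices for GPT-2-style generation; whether GPT-K uses them is not stated in the abstract.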