Enhancing Protein Language Models for Remote Homology Detection: A Study on Parameter-Efficient Fine-Tuning Techniques

EasyChair Preprint 15579
6 pages • Date: December 16, 2024

Abstract

Remote homology detection is a critical task in structural biology, essential for understanding evolutionary relationships between proteins. This study explores the application of Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically Low-Rank Adaptation (LoRA), to enhance pre-trained protein language models for remote homology detection. We experimented with several state-of-the-art models, spanning a range of architectures and parameter sizes, to investigate the trade-offs between model complexity and performance. The dataset was divided into training (85%, 127,500 pairs) and test (15%, 22,500 pairs) sets using stratified sampling. Models were fine-tuned for 5 epochs using the Adam optimizer with a learning rate of 2e-4 and a weight decay of 0.01, with iterative evaluation to tune each model's performance. Results indicate that ProGen2 achieved the highest accuracy and F1 scores, demonstrating superior capability in detecting remote homologs. This study highlights the potential of PEFT techniques such as LoRA for efficiently adapting large protein language models, even with limited computational resources, thereby advancing the field of protein sequence analysis and evolutionary biology.

Keyphrases: Fine-tuned, Low-Rank Adaptation, PEFT, Parameter-Efficient Fine-Tuning, Pre-trained Protein Language Models, Structural Classification of Proteins, detecting remote homologs, efficient fine-tuning of large language models, large language models, protein language models, protein remote homology detection, remote homology detection, SCOP database, task of protein remote homology detection, techniques for protein remote homology detection
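The abstract specifies a concrete training recipe: an 85/15 stratified split, LoRA adapters on a pre-trained protein language model, and 5 epochs of Adam with learning rate 2e-4 and weight decay 0.01. A minimal sketch of such a pipeline follows, using the Hugging Face transformers and peft libraries. The base model name, LoRA rank, alpha, target modules, toy data, and pair-encoding scheme are illustrative assumptions, since the abstract does not specify them; only the split ratio, optimizer settings, and epoch count come from the paper.

```python
import torch
from torch.optim import Adam
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "facebook/esm2_t12_35M_UR50D"  # illustrative backbone; the paper tests several models

# pairs: (sequence_a, sequence_b); labels: 1 = remote homologs, 0 = not.
# Toy data shown here; the paper uses 150,000 SCOP-derived pairs.
pairs = [("MKTAYIAKQR", "MKSAYIAKQR"), ("GAVLIPFYW", "MKTAYIAKQR")] * 8
labels = [1, 0] * 8

# 85/15 stratified split, as described in the abstract.
train_pairs, test_pairs, train_labels, test_labels = train_test_split(
    pairs, labels, test_size=0.15, stratify=labels, random_state=0
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# LoRA adapter; rank, alpha, and target module names are assumptions and
# depend on the backbone's attention-layer naming.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters and classifier head train

def collate(batch):
    # Pair encoding relies on the tokenizer's sentence-pair support;
    # adjust for backbones that encode pairs differently.
    seqs_a, seqs_b, ys = zip(*batch)
    enc = tokenizer(list(seqs_a), list(seqs_b), padding=True,
                    truncation=True, max_length=512, return_tensors="pt")
    enc["labels"] = torch.tensor(ys)
    return enc

train_data = [(a, b, y) for (a, b), y in zip(train_pairs, train_labels)]
loader = DataLoader(train_data, batch_size=8, shuffle=True, collate_fn=collate)

# Optimizer settings stated in the abstract: Adam, lr 2e-4, weight decay 0.01.
optimizer = Adam(model.parameters(), lr=2e-4, weight_decay=0.01)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()
for epoch in range(5):  # 5 epochs, per the abstract
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

After training, the held-out test pairs would be scored the same way (forward pass without gradients) and summarized with accuracy and F1, the metrics the paper reports.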