Classifying Protein Families with Learned Compressed Representations

11 pages • Published: May 1, 2023

Abstract

Classifying proteins into families is an important task when studying newly discovered proteins. If we can identify the family a protein belongs to, we can predict its features without knowing its exact structure. However, this grouping process is challenging. We propose a two-stage algorithm that classifies proteins into families by combining dimensionality reduction using a variational autoencoder (VAE) with learned fingerprint representations using a Convolutional Neural Network (CNN). Our models use fewer parameters than existing methods but perform better: the variational autoencoder achieves 94% accuracy in reconstructing the most common amino acid in a sequence alignment, and the neural network achieves 98-100% accuracy in classifying protein families. We developed a software framework to access our algorithms. All code and data are publicly available at https://github.com/ramindehghanpoor/CLI.

Keyphrases: cnn, machine learning, neural network, protein family classification, vae

In: Hisham Al-Mubaid, Tamer Aldwairi and Oliver Eulenstein (editors). Proceedings of International Conference on Bioinformatics and Computational Biology (BICOB-2023), vol 92, pages 47-57.