From Unsupervised Multi-Instance Learning to Identification of Near-Native Protein Structures

10 pages • Published: March 11, 2020

Abstract

A major challenge in computational biology is recognizing one or more biologically-active/native tertiary protein structures among thousands of physically-realistic structures generated via template-free protein structure prediction algorithms. Clustering structures based on structural similarity remains a popular approach. However, clustering organizes structures into groups and does not directly provide a mechanism to select individual structures for prediction. In this paper, we provide a few algorithms for this selection problem. We approach the problem under unsupervised multi-instance learning and address it in three stages: first organizing structures into bags, then identifying relevant bags, and finally drawing individual structures/instances from these bags. We present both non-parametric and parametric algorithms for drawing individual instances. In the latter, parameters are trained over training data and evaluated over testing data via rigorous metrics.

Keyphrases: multi instance learning, protein structure prediction, protein tertiary structure, unsupervised learning

In: Qin Ding, Oliver Eulenstein and Hisham Al-Mubaid (editors). Proceedings of the 12th International Conference on Bioinformatics and Computational Biology, vol 70, pages 59-68.
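
The three stages named in the abstract (organizing structures into bags, identifying relevant bags, drawing instances) can be illustrated with a minimal Python sketch. It is not the paper's algorithm, and every concrete choice in it is an assumption made for illustration: structures are represented as plain feature vectors, Euclidean distance stands in for a structural dissimilarity measure, bags are formed by greedy leader clustering with a hypothetical `radius` parameter, bag relevance is taken to be bag size, and the instance drawn from each bag is its medoid (a non-parametric choice).

```python
import numpy as np


def pairwise_distances(X):
    # Assumption: Euclidean distance between feature vectors stands in for
    # a structural dissimilarity measure between protein structures.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def organize_into_bags(D, radius):
    # Stage 1 (illustrative): greedy leader clustering. Each structure joins
    # the first bag whose leader lies within `radius`, else starts a new bag.
    leaders, bags = [], []
    for i in range(D.shape[0]):
        for k, leader in enumerate(leaders):
            if D[i, leader] <= radius:
                bags[k].append(i)
                break
        else:
            leaders.append(i)
            bags.append([i])
    return bags


def select_relevant_bags(bags, top_k):
    # Stage 2 (illustrative): treat the largest bags as the relevant ones.
    return sorted(bags, key=len, reverse=True)[:top_k]


def draw_instance(bag, D):
    # Stage 3 (illustrative, non-parametric): return the medoid, i.e. the
    # member with the smallest total distance to the other bag members.
    sub = D[np.ix_(bag, bag)]
    return bag[int(np.argmin(sub.sum(axis=1)))]


def select_structures(X, radius, top_k):
    # Run the three stages and return one structure index per relevant bag.
    D = pairwise_distances(X)
    bags = organize_into_bags(D, radius)
    return [draw_instance(bag, D) for bag in select_relevant_bags(bags, top_k)]
```

Calling `select_structures(X, radius, top_k)` on an array `X` with one row per structure returns one index per selected bag. A parametric variant would replace the fixed medoid rule in `draw_instance` with a scoring function whose parameters are fit on training data.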