Classification of DNA Sequence Using Machine Learning Techniques

EasyChair Preprint 8603
5 pages • Date: August 4, 2022

Abstract

DNA, the blueprint of life, is a long chain of nucleotides that contains the genetic information of living organisms. Extracting information from DNA is an important research topic in genomics. The process of determining the order of base pairs is called DNA sequencing, and the task of identifying whether an unlabeled sequence belongs to an existing class is known as DNA sequence classification. This paper presents several machine learning techniques for DNA sequence classification, evaluated on two public datasets, Promoters and Splice, on which noteworthy improvements are achieved. Among all the schemes tested, only two have less than 90 percent training accuracy, and most achieve more than 90 percent test accuracy. The experimental results show that several techniques outperform the other models.

Keyphrases: AdaBoost, DNA sequence, DNA sequence classification, Decision Tree, Gaussian processes, K-Nearest Neighbour, Multi Layer Perceptron, Naive Bayes, Random Forest, Support Vector Machine, logistic regression, machine learning
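
To make the classification task described in the abstract concrete, the following is a minimal sketch of one of the listed techniques, K-Nearest Neighbour, applied to short DNA strings. It is an illustration under stated assumptions, not the paper's implementation: the Hamming distance, the helper names `hamming` and `knn_classify`, the value `k=3`, and the toy sequences are all hypothetical and are not taken from the Promoters or Splice datasets.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_classify(query, labelled, k=3):
    """Label `query` by majority vote among its k nearest labelled sequences."""
    # Sort the (sequence, label) pairs by distance to the query, keep the k closest.
    nearest = sorted(labelled, key=lambda pair: hamming(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, hand-made data (hypothetical; NOT from the Promoters or Splice datasets).
TRAIN = [
    ("TATAATGC", "promoter"),
    ("TATAATCC", "promoter"),
    ("TATTATGC", "promoter"),
    ("GCGCGCGC", "non-promoter"),
    ("GCGCGGGC", "non-promoter"),
    ("GCCCGCGC", "non-promoter"),
]

print(knn_classify("TATAATGG", TRAIN))
print(knn_classify("GCGCGCGG", TRAIN))
```

Each unlabeled sequence receives the majority label of its closest labelled neighbours. Real datasets would typically call for a richer encoding (for example one-hot or k-mer counts) and a held-out test split to measure accuracy.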