A Code Naturalness Based Defect Prediction Method at Slice Level
EasyChair Preprint 3918, version 2
24 pages • Date: November 20, 2020

Abstract
Software defect prediction is an active research topic in software quality assurance. It helps developers find potential defects and make better use of resources. Designing more discriminative metrics for the prediction system, while balancing performance and interpretability, remains an open research direction. To address this challenge, a code naturalness feature based defect predictor method (CNDePor) is proposed. The method improves the language model by measuring code sequences bidirectionally and by weighting samples with quality information, so as to increase the defect discrimination of the cross-entropy (CE) type metrics obtained from the model. To address the shortcomings of coarse-grained defect prediction (e.g., difficulty in focusing on defect areas and the high cost of code reviews), a new fine-grained defect prediction problem, statement-oriented slice-level defect prediction, is studied. Four metrics are designed for this problem, and the effectiveness of these metrics and of CNDePor is verified on two types of security defect datasets. The experimental results show that: CE-type metrics are learnable and contain the relevant knowledge learned from the corpus by the language model; the improved CE metrics are significantly better than the original metrics and traditional size metrics; and the CNDePor method has significant advantages over traditional defect prediction methods and an existing method based on code naturalness, with performance comparable to, and interpretability stronger than, a state-of-the-art method based on deep learning.
Keyphrases: Software fault prediction, code naturalness, cross-entropy, deep learning, defect prediction, language model, machine learning, slice level, software defect prediction, slice granularity, software quality assurance
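
To make the idea of a bidirectional cross-entropy metric concrete, the following is a minimal sketch, not the CNDePor implementation. The abstract does not specify the language model, so an add-one smoothed bigram model is used here as a stand-in. The names `BigramLM` and `bidirectional_ce`, the toy token sequences, and the choice to average the forward and backward scores are all assumptions made for illustration. The sample weighting by quality information mentioned in the abstract is not modeled.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model over code tokens.

    A stand-in for illustration only; the paper's actual model is not
    described in the abstract.
    """

    def __init__(self, sequences):
        self.context_counts = Counter()  # how often a token appears as context
        self.bigram_counts = Counter()   # how often (context, token) appears
        self.vocab = {"<s>"}
        for seq in sequences:
            prev = "<s>"
            for tok in seq:
                self.vocab.add(tok)
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, tok)] += 1
                prev = tok

    def prob(self, prev, tok):
        # The extra +1 in the vocabulary size reserves mass for unseen tokens.
        v = len(self.vocab) + 1
        return (self.bigram_counts[(prev, tok)] + 1) / (self.context_counts[prev] + v)

    def cross_entropy(self, seq):
        """Average negative log2 probability per token (bits per token)."""
        if not seq:
            return 0.0
        prev = "<s>"
        total = 0.0
        for tok in seq:
            total -= math.log2(self.prob(prev, tok))
            prev = tok
        return total / len(seq)


def bidirectional_ce(forward_lm, backward_lm, seq):
    """Assumed combination: mean of forward CE and CE of the reversed sequence."""
    return 0.5 * (forward_lm.cross_entropy(seq) + backward_lm.cross_entropy(seq[::-1]))


# Hypothetical toy corpus of tokenized code slices.
corpus = [
    ["if", "(", "x", ")", "{", "free", "(", "x", ")", ";", "}"],
    ["if", "(", "y", ")", "{", "free", "(", "y", ")", ";", "}"],
]
forward_lm = BigramLM(corpus)
backward_lm = BigramLM([s[::-1] for s in corpus])

seen = ["if", "(", "x", ")", "{", "free", "(", "x", ")", ";", "}"]
odd = ["free", "(", "x", ")", ";", "free", "(", "x", ")", ";"]

print("seen:", bidirectional_ce(forward_lm, backward_lm, seen))
print("odd: ", bidirectional_ce(forward_lm, backward_lm, odd))
```

In this sketch, the slice that resembles the training corpus receives a lower score than the slice with an unfamiliar token order. This mirrors the intuition behind code naturalness: less "natural" code has higher cross-entropy under a model trained on typical code.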