"Software analysis and mining" multiple kernel ensemble learning for software defect prediction


Summary:

A classifier is built from the historical defect data of the software and used to predict software defects.

Multiple kernel learning (MKL): map historical defect data into high-dimensional feature spaces so that the data can be represented more effectively;

Ensemble learning: combine a series of classifiers to reduce the classification errors caused by the majority class, giving better detection results.

In this paper, an ensemble learning method is used to construct a multiple-kernel classifier, combining the advantages of multiple kernel learning and ensemble learning: the authors propose a multiple kernel ensemble learning (MKEL) approach for software defect classification and prediction. Considering the cost risk in software defect prediction, a new sample weight vector updating strategy is designed to reduce the cost risk caused by misclassification.

Working with Datasets: NASA MDP.

S1 Introduction

Multiple kernel learning + ensemble learning:

       A. Map historical defect data into high-dimensional spaces to mine more useful information;

       B. Handle the class-imbalance problem;

       C. Avoid complex parameter-optimization problems;

       D. Adjust the classifier's accuracy according to different application needs.

Misclassification is costly, especially when a defective module is predicted to be defect-free.

Contribution:

        A. Introduces the multiple kernel learning technique into the software defect prediction field for the first time;

        B. Considers the cost risk and designs a new sample weight vector update strategy: during the training stage, the weights of defective samples are increased and the weights of non-defective samples are reduced, so that the learner pays more attention to defective modules, reducing the cost risk of predicting a defective module as defect-free and obtaining better results.

S2 Related work

Software defect prediction technology is divided into two categories: static and dynamic.

Static prediction (as illustrated in the paper): modules are first labeled as defective or defect-free, then feature extraction produces training samples, and a classifier is constructed from them.

Many traditional machine learning classification algorithms, such as naive Bayes, SVM, and decision trees, can be used for static defect prediction. To address the class-imbalance problem, resampling, ensemble learning, and cost-sensitive methods have been adopted.

In this paper, a multiple kernel learning algorithm is used to predict the defect proneness of software modules, and a multiple kernel ensemble learning (MKEL) algorithm is proposed. Unlike previous studies, MKEL has the following characteristics:

  • Combining the advantages of multiple kernel learning and ensemble learning, the multiple kernel learning method is applied to software defect prediction for the first time.
  • In the weight vector update step, cost risk is incorporated, based on the historical defect data, to tackle the defect prediction problem.
S3 MKEL

A. Problem definition

Given a well-labeled training set;

Given m basic kernel functions;

MKEL is designed to learn a classifier f(x) based on these multiple kernel functions.

An effective boosting mechanism learns the best kernel-based weak classifier f_t and its weight α_t in each boosting trial. After the T boosting trials are complete, the T classifiers f_t(x) and weights α_t are obtained, and ensemble learning combines them into the final classifier.
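The formulas referenced above are not reproduced in this note. A standard reconstruction consistent with the surrounding text (the notation is assumed, not copied from the paper) is:

$$
D = \{(x_i, y_i)\}_{i=1}^{n}, \quad y_i \in \{-1, +1\}, \qquad \{\kappa_1, \ldots, \kappa_m\} \ \text{the basic kernel functions},
$$

$$
F(x) = \operatorname{sign}\!\Big(\sum_{t=1}^{T} \alpha_t f_t(x)\Big),
$$

where each weak classifier f_t is built on one of the basic kernels and α_t is its ensemble weight.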

B. Multiple kernel learning

Kernel mapping: the original problem is mapped into a new feature space through a mapping φ, and the original problem is then considered in this new feature space.
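For reference (this identity is standard and not taken from the paper), the kernel trick underlying such a mapping is

$$
k(x, z) = \langle \phi(x), \phi(z) \rangle,
$$

so the learner only needs the kernel values k(x, z) and never has to compute the mapping φ explicitly.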

Multiple kernel learning:

Multiple kernel learning is a new hotspot in the field of kernel machine learning. Kernel methods are an effective way to solve nonlinear pattern analysis problems, but in some complex cases a kernel machine built from a single kernel function cannot meet the requirements of practical applications such as heterogeneous or irregular data, large sample sizes, and uneven sample distributions; combining multiple kernel functions to achieve better results is then a natural choice.

Optimization problem: (the MKL optimization formula is not reproduced in this note)

Solving this optimization directly requires a lot of complex computation; to avoid this disadvantage, a boosting method is used to solve the multiple-kernel problem.

-------------------- Supplementary background on multiple kernel learning (begin) --------------------

Multiple kernel learning approaches

Because kernel functions have different characteristics, their performance varies greatly across applications, and there is as yet no complete theory for constructing or selecting kernel functions. In addition, when the sample features contain heterogeneous information [20–26], the sample size is large [27–30], the multidimensional data are irregular (unnormalised data) [31–32], or the data are unevenly (non-flat) distributed in the high-dimensional space [33–34], it is unreasonable to process all samples with a single, simple kernel mapping. In response to these problems, a large body of research on kernel combination approaches, namely multiple kernel learning methods, has emerged in recent years [23, 31, 35–40].

Multiple kernel models are a more flexible class of kernel-based learning models; recent theory and applications have shown that using multiple kernels instead of a single kernel can enhance the interpretability of the decision function and achieve better performance than single-kernel models or combinations of single-kernel machines [41–42]. One of the simplest and most common ways to construct a multiple kernel model is the convex combination of several basic kernel functions:

$$
k(x, z) = \sum_{j=1}^{M} \beta_j k_j(x, z), \qquad \beta_j \ge 0, \quad \sum_{j=1}^{M} \beta_j = 1,
$$

where k_j is a basic kernel function, M is the total number of basic kernels, and β_j is the weight coefficient. In the multiple-kernel framework, the problem of representing the samples in the feature space is therefore transformed into the choice of basic kernels and weight coefficients. In this composite space constructed from multiple feature spaces, the selection of kernel functions and of variables and models related to kernel target alignment (KTA) [43–44] can be handled well by combining the feature-mapping abilities of the individual basic kernels [31, 45–46]. Moreover, by feeding the different feature components of heterogeneous data into the corresponding kernel functions, the data can be expressed better in the new feature space, and classification accuracy improves significantly.

However, the key problem is how to obtain this combined feature space, that is, how to learn the weight coefficients. Many effective multiple kernel learning theories and methods have been proposed for this problem, such as the early boosting-based multiple kernel combination learning methods [21, 47], methods based on semidefinite programming (SDP) [41], on quadratically constrained quadratic programming (QCQP) [36], on semi-infinite linear programming (SILP) [24, 37], and on hyperkernels [31], as well as the more recent SimpleMKL methods [27, 29] and multiple kernel learning methods based on the group-lasso idea. In the combination of weight coefficients and kernel functions, researchers have also made various improvements to multiple kernel methods, such as non-stationary multiple kernel learning [23], localized multiple kernel learning [40], and non-sparse multiple kernel learning [30]. In addition, building on kernel function families with multi-scale representations, multiple kernel learning has been extended in the direction of multi-scale kernel methods [32–34, 48–52]. All of these methods assume a finite set of basic kernel functions; clearly, the combination of finitely many kernels is limited by that choice, and to extend it to a large number of kernel combinations, learning methods based on infinite kernels have recently appeared [39, 53–54].
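As a minimal illustration (not from the paper), the convex kernel combination above can be computed as follows; the choice of RBF and polynomial base kernels and their parameters is an assumption for the example:

```python
import numpy as np

def combined_kernel(X, Z, betas, gammas, degrees):
    """Convex combination k(x, z) = sum_j beta_j * k_j(x, z) of RBF and
    polynomial base kernels (the base-kernel choices are illustrative)."""
    base = []
    for g in gammas:                               # Gaussian base kernels
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        base.append(np.exp(-g * d2))
    for deg in degrees:                            # polynomial base kernels
        base.append((X @ Z.T + 1.0) ** deg)
    betas = np.asarray(betas, dtype=float)
    betas = betas / betas.sum()                    # enforce sum_j beta_j = 1
    return sum(b * K for b, K in zip(betas, base))

X = np.random.randn(5, 3)
K = combined_kernel(X, X, betas=[0.5, 0.3, 0.2],
                    gammas=[0.1, 1.0], degrees=[2])
print(K.shape)   # (5, 5)
```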

Also refer to: Http://zipperary.com/2014/11/27/mkl/?utm_source=tuicool

-------------------- Supplementary background on multiple kernel learning (end) --------------------

C. Multiple kernel ensemble learning

Based on the same samples, different kernel functions with different weights are used to construct T classifiers, and the boosting method is then used to compute their weights.

Steps:

    1. An initial training set is established by randomly sampling from the training data;
    2. At first all sample weights are equal; after each boosting iteration, the weights are adjusted according to a certain strategy so that the samples that deserve attention in the next boosting round receive it;
    3. Once the initial set and the weights are determined, the boosting algorithm runs: in the t-th boosting trial it measures the misclassification performance, and the classifier is chosen to make the chance of misclassification as small as possible;
    4. To obtain the final classification result from all weak classifiers, each classifier is assigned a weight; for the t-th boosting iteration, its weight α_t can be obtained by calculation;
    5. The sample weights for the next boosting round are then updated. A general boosting algorithm updates the weights only according to the classification result: samples classified correctly in the t-th boosting round have their weights reduced and misclassified ones have their weights increased, so the next round focuses on the misclassified samples. Because predicting a defective module as defect-free is far more serious than the reverse, MKEL updates the weights asymmetrically: for defective samples, if they are classified correctly the weight is unchanged, otherwise it increases; for defect-free samples, if they are misclassified the weight is unchanged, otherwise it decreases;
    6. After the iterations, the final classifier is obtained (a code sketch of the whole loop follows the flow chart below).
Flow chart (figure not reproduced in this note).
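A minimal sketch of the boosting loop described above, assuming a weighted SVM over a precomputed base-kernel matrix as the per-trial weak learner. The Gaussian-only kernel bank, the kernel-selection rule, and the exact update factors are illustrative assumptions rather than the paper's exact procedure, and the initial random subsampling of step 1 is omitted:

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mkel_boost(X, y, gammas, T=20):
    """Hedged sketch of the MKEL boosting loop.

    X: (n, d) training features; y: labels in {-1, +1} (+1 = defective).
    gammas: widths of the candidate RBF base kernels (assumption; the paper
    also uses polynomial kernels). Returns (kernel index, weak SVM, alpha) triples.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                        # uniform initial sample weights
    kernels = [rbf_kernel(X, X, g) for g in gammas]
    ensemble = []
    for t in range(T):
        best = None
        # pick the base kernel whose weak SVM has the lowest weighted error
        for j, K in enumerate(kernels):
            clf = SVC(kernel="precomputed", C=1.0)
            clf.fit(K, y, sample_weight=w)
            pred = clf.predict(K)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, j, clf, pred)
        err, j, clf, pred = best
        err = min(max(err, 1e-10), 0.5 - 1e-10)    # keep alpha finite and positive
        alpha = 0.5 * np.log((1 - err) / err)      # AdaBoost-style classifier weight
        ensemble.append((j, clf, alpha))
        # Asymmetric, cost-aware update (step 5 above): misclassified defective
        # samples gain weight; correctly classified defect-free samples lose weight;
        # the other two cases stay unchanged.
        up = (y == 1) & (pred != y)
        down = (y == -1) & (pred == y)
        w[up] *= np.exp(alpha)
        w[down] *= np.exp(-alpha)
        w /= w.sum()                               # re-normalise the weights
    return ensemble

def mkel_predict(ensemble, X_train, X_test, gammas):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = np.zeros(len(X_test))
    for j, clf, alpha in ensemble:
        K_test = rbf_kernel(X_test, X_train, gammas[j])
        score += alpha * clf.predict(K_test)
    return np.sign(score)
```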

S4 Experiment

This section mainly covers the basic data sets, the evaluation criteria, and the experimental design.

4.1 Basic Data Set

The 20 basic metrics of the 12 NASA data sets (table not reproduced in this note).

The numbers of defective and defect-free modules in each of the 12 data sets (table not reproduced in this note).

4.2 Evaluation Indicators

Defect prediction indicators, where A = defective modules predicted as defective, B = defective modules predicted as defect-free, C = defect-free modules predicted as defective, and D = defect-free modules predicted as defect-free:

    • PD = A/(A + B): probability of detection; the higher it is, the more defective modules are identified;
    • PF = C/(C + D): probability of false alarm; lower is better;
    • precision = A/(A + C): the higher it is, the more accurately defective modules are predicted;
    • accuracy = (A + D)/(A + B + C + D).

Because a higher PD usually comes with a lower precision (and vice versa), a new metric is introduced as a trade-off: the F-measure.

F-measure = 2 · PD · precision / (PD + precision).

The F-measure lies between 0 and 1; a higher F-measure indicates better performance.
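A small helper (not from the paper) that computes these indicators directly from the confusion-matrix counts defined above:

```python
def defect_metrics(a, b, c, d):
    """a = defective predicted defective, b = defective predicted defect-free,
    c = defect-free predicted defective, d = defect-free predicted defect-free."""
    pd_ = a / (a + b)                        # probability of detection (recall)
    pf = c / (c + d)                         # probability of false alarm
    precision = a / (a + c)
    accuracy = (a + d) / (a + b + c + d)
    f_measure = 2 * pd_ * precision / (pd_ + precision)
    return {"PD": pd_, "PF": pf, "precision": precision,
            "accuracy": accuracy, "F-measure": f_measure}

# Example: 60 of 80 defective modules found, 30 false alarms among 320 clean modules.
print(defect_metrics(a=60, b=20, c=30, d=290))
```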

4.3 Experimental design

The steps are as follows:

    1. Randomly select 50% of the defective and 50% of the defect-free samples as the training set; the remaining 50% is used for testing;
    2. To obtain a more general conclusion, the algorithm is repeated 20 times on each data set and every result is recorded;
    3. 30 basic kernel functions are used: 21 Gaussian (radial basis function) kernels with widths 2^-10, 2^-9, ..., 2^10, and 9 polynomial kernels with degrees 1 to 9 (a construction sketch follows this list);
    4. The SVM implementation used is LIBSVM;
    5. The boosting training set: 40% of the training samples are randomly selected as the initial training samples, and the default number of boosting trials is 100, so the final classifier integrates 100 kernel-based weak classifiers.
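A sketch of how such a bank of 30 base kernels could be built; the exact parameterisation (how the width enters the Gaussian, the polynomial offset) is an assumption, not taken from the paper:

```python
import numpy as np

def make_base_kernels():
    """Build 30 base kernels as described above: 21 Gaussian kernels with
    widths 2^-10 ... 2^10 and 9 polynomial kernels of degree 1 ... 9.
    gamma = 1/(2*sigma^2) and coef0 = 1 are assumed parameterisations."""
    kernels = []
    for sigma in [2.0 ** p for p in range(-10, 11)]:           # 21 widths
        def rbf(X, Z, s=sigma):
            d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * s ** 2))
        kernels.append(rbf)
    for degree in range(1, 10):                                 # 9 degrees
        def poly(X, Z, d=degree):
            return (X @ Z.T + 1.0) ** d
        kernels.append(poly)
    return kernels

kernels = make_base_kernels()
assert len(kernels) == 30
```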
S5 Experimental Conclusion

The experiments evaluate MKEL and compare it with several other methods, including methods that address class imbalance: coding-based ensemble learning (CEL) and the dynamic version of AdaBoost.NC (DVA, with 10-fold cross-validation, where 9/10 of the data is used to build the model, of which 8/9 is used for training and 1/9 for validation), as well as other representative defect prediction methods: naive Bayes, decision trees, cost-sensitive neural networks, asymmetric kernel principal component classification, etc.

The experimental results show that MKEL's PD is the highest, while its PF is not low either.

The F-measure is also not low:

McNemar's test is also performed (a test for paired nominal data, equivalent to a paired chi-square test). Here MKEL is compared with each of the other methods; when the p-value is less than 0.05, the difference between the two methods is statistically significant.
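A compact sketch of such a McNemar comparison, using the continuity-corrected chi-square version (the note does not state which variant the paper uses):

```python
from scipy.stats import chi2

def mcnemar_test(only_a_correct, only_b_correct):
    """McNemar's test on the discordant counts: modules that only classifier A
    got right versus modules that only classifier B got right."""
    b, c = only_a_correct, only_b_correct
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)   # continuity-corrected statistic
    p_value = chi2.sf(stat, df=1)                   # one degree of freedom
    return stat, p_value

# Example: MKEL alone correct on 25 modules, the baseline alone correct on 10.
stat, p = mcnemar_test(25, 10)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")            # p < 0.05 -> significant difference
```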

The experiments also compare the effect of multiple kernel learning versus single kernel learning on the defect prediction results, as well as the effects of different initial sampling ratios and different numbers of boosting trials on performance.

Conclusion:

  • Multiple kernel learning performs better than single kernel learning;
  • The initial sampling ratio has little effect on performance;
  • The number of boosting trials has little effect on performance.
S6 Threats to validity

As with the methods discussed in the article, there is a problem when initializing the training set: a random sampling strategy is used, and because of class imbalance there is no guarantee that the initial training set contains both defective and defect-free modules.

"Software analysis and mining" multiple kernel ensemble learning for software defect prediction

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.