Index of all articles in this series: http://www.cnblogs.com/asxinyu/p/4288836.html
Microsoft Infer.NET machine learning component series: http://www.cnblogs.com/asxinyu/p/4329742.html
About this document
This document is based on Infer.NET 2.6 and is a translation of the Infer.NET user guide. It follows the ideas of the original site but has been simplified and refined, and does not strictly follow the original order.
You are welcome to share this document with attribution to the original author, but using it directly for commercial profit is forbidden.
I am studying the Infer.NET component and plan to apply it to real prediction tasks. It is powerful and well packaged, but it also has many hard-to-understand parts. The official site provides a number of examples, but my personal time is limited and updates are slow, so friends who are interested are welcome to help complete this work.
Email:[email protected]
This article address: http://www.cnblogs.com/asxinyu/p/InferNet_Demo_Learner_1.html
1. Basic Introduction
A "learner" is a complete machine learning application solution, such as a classification system or referral system. These learners can be called directly from the command line or the. NET program without having to learn the Infer.net API interface. Each learner includes training, forecasting and evaluation capabilities. Learners's source code includes some examples of using infer.net to build complex and stable machine learning function programs. This article is based on the previous article to introduce. This article is in the original English address:
2. Standard Data Format Mapping
First, let's review the classifier mapping interface IClassifierMapping, which declares four methods:
2.1 GetInstances
IEnumerable<TInstance> GetInstances(TInstanceSource instanceSource);
The GetInstances method supplies the learner with a set of instance samples (a training set or a test set) and is used in both training and prediction. The generic types TInstance and TInstanceSource can be chosen freely: for example, TInstance may be bound to a class holding the features and label, or it may be a reference or index to an object of such a class. To allow caching, the Bayes Point Machine classifier assumes that the same instance source always provides the same instances.
2.2 GetFeatures
TFeatures GetFeatures(TInstance instance, TInstanceSource instanceSource = default(TInstanceSource));
GetFeatures returns all feature values of a specified sample instance. This method, too, is used in both training and prediction. Note that if the instance itself contains the corresponding feature values, the instance source need not be specified. The Bayes Point Machine classifier requires TFeatures to be bound to MicrosoftResearch.Infer.Maths.Vector. To allow caching, GetFeatures is assumed to always return the same features for the same instance. You may want to add a constant feature to all instances, for example a feature that is always 1; this makes the classifier invariant to shifts of the feature values (by effectively adding a bias for each class). In addition, the less correlated the features are, the better: highly correlated features can cause slow convergence of the training algorithm.
2.3 GetLabel
TLabel GetLabel(TInstance instance, TInstanceSource instanceSource = default(TInstanceSource), TLabelSource labelSource = default(TLabelSource));
GetLabel supplies the ground-truth label (class value) of an instance and is called during training. Keeping the label source separate from the instance source adds flexibility, because in some cases features and labels come from separate data sources. If the instance itself includes its label, the label source can be omitted. To cache label data and avoid the cost of repeated conversion to the native data format, the Bayes Point Machine classifier assumes that calling GetLabel for the same instance always returns the same label.
2.4 GetClassLabels
IEnumerable<TLabel> GetClassLabels(TInstanceSource instanceSource = default(TInstanceSource), TLabelSource labelSource = default(TLabelSource));
The GetClassLabels method returns the set of all class labels in the classification data. It establishes not only how many distinct labels the current task has, but also their type. In the simplest case, when the label is of type bool, GetClassLabels returns {true, false}. GetClassLabels guarantees that every class label is known during classification. The IClassifierMapping interface requires this method because the instance sources alone often cannot provide the complete set of labels; that is, the training or test set may not contain instances of every class.
Given the freedom to choose suitable types for TInstance, TInstanceSource, TLabelSource and TLabel (TFeatures must be a Vector), it is usually possible to provide a simple and efficient implementation of IClassifierMapping, which makes the standard data format a very convenient way to use the classifier. As mentioned earlier, the Bayes Point Machine classifier ultimately converts the standard data format into the native format accepted by the Infer.NET algorithms. In many cases, however, it is more convenient to supply data in the standard format and let the classifier convert it than to provide the native format directly. The gender prediction example earlier in this tutorial shows an implementation of IClassifierMapping.
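As a concrete illustration, here is a minimal sketch of a standard-format mapping in the spirit of the gender prediction example. The Person and GenderMapping types are hypothetical; only IClassifierMapping and Vector come from Infer.NET, and the namespace layout is assumed from version 2.6:

```csharp
using System.Collections.Generic;
using MicrosoftResearch.Infer.Learners.Mappings;
using MicrosoftResearch.Infer.Maths;

// A toy instance type: a feature vector plus a Boolean label (hypothetical).
public class Person
{
    public Vector Features { get; set; }
    public bool IsFemale { get; set; }
}

// Standard-format mapping: the instance source is simply a list of Person objects.
public class GenderMapping
    : IClassifierMapping<IList<Person>, Person, IList<Person>, bool, Vector>
{
    public IEnumerable<Person> GetInstances(IList<Person> instanceSource)
    {
        return instanceSource;
    }

    public Vector GetFeatures(Person instance, IList<Person> instanceSource = null)
    {
        // The instance carries its own features, so the source is not needed.
        return instance.Features;
    }

    public bool GetLabel(Person instance, IList<Person> instanceSource = null,
        IList<Person> labelSource = null)
    {
        return instance.IsFemale;
    }

    public IEnumerable<bool> GetClassLabels(IList<Person> instanceSource = null,
        IList<Person> labelSource = null)
    {
        // A bool label has exactly these two classes.
        return new[] { true, false };
    }
}
```

Note how GetClassLabels enumerates both classes even if a particular data source happens to contain only one of them.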
3. Native Data Format Mapping
The IBayesPointMachineClassifierMapping interface supplies data in the exact format that the Bayes Point Machine consumes during training and prediction, so this format is also known as the native data format.
The native data format has two different feature representations: dense and sparse. In the dense representation, all feature values of a single instance are stored in a double array; the sparse representation stores only the non-zero feature values together with their corresponding indexes. The two representations yield identical results, but their computational cost differs: for data containing many zero feature values, training and prediction are faster with the sparse representation.
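The difference between the two representations can be seen on a small feature vector. This is plain illustrative C#, not Infer.NET API:

```csharp
// One instance with five features, two of which are non-zero.
double[] dense = { 0.0, 3.5, 0.0, 0.0, 1.2 };

// Sparse form: only the non-zero values plus their positions.
double[] sparseValues  = { 3.5, 1.2 };
int[]    sparseIndexes = { 1, 4 };

// Both describe the same instance; for data with many zeros the
// sparse form saves memory and computation.
```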
Besides the feature representation, the native data format also fixes the label types: in binary classification the label must be a Boolean, and in multi-class classification the labels must be consecutive integers starting at 0. Providing data in the native format requires implementing the following eight methods:
1. IsSparse: indicates whether features are stored in the sparse representation. Note that the representation must not change between calls.
bool IsSparse(TInstanceSource instanceSource);
2. GetFeatureCount: gets the total number of features in the classification data. It is needed to set up the inference algorithms when the sparse representation is used.
int GetFeatureCount(TInstanceSource instanceSource);
3. GetClassCount: returns the total number of class labels.
int GetClassCount(TInstanceSource instanceSource = default(TInstanceSource), TLabelSource labelSource = default(TLabelSource));
These first three methods are used to set up the Bayes Point Machine classifier's features and labels and to validate them; they are called during both training and prediction. IsSparse and GetFeatureCount partly determine the return values of GetFeatureValues and GetFeatureIndexes. The next two methods supply the feature values of a single instance during prediction; their return values are not cached.
4. GetFeatureValues (single instance): returns the feature values of a given instance. With the sparse representation (i.e. IsSparse returns true), this method returns only the non-zero feature values of the instance; with the dense representation it returns all feature values, whether zero or not.
double[] GetFeatureValues(TInstance instance, TInstanceSource instanceSource = default(TInstanceSource));
5. GetFeatureIndexes (single instance): returns null with the dense representation; otherwise returns the array of feature indexes corresponding to the non-zero feature values returned by GetFeatureValues. Together, these two methods supply the feature values and their indexes to the prediction algorithm. During training, instances may further be divided into subsets called batches, so that data too large to fit in memory can still be processed; whether or not batching is used, the following two batch-oriented methods are the ones called during training.
int[] GetFeatureIndexes(TInstance instance, TInstanceSource instanceSource = default(TInstanceSource));
6. GetFeatureValues (multi-instance): returns the feature values of all instances in the specified batch (as opposed to the single-instance overload above). By default, the features and labels of all instances are processed as one batch. If that is impractical, for example because it would require too much memory, the Bayes Point Machine classifier allows the training data to be split into several batches by setting the BatchCount property. Batches are indexed from 0 to BatchCount - 1. When training on multiple batches, also see the IterationCount setting.
double[][] GetFeatureValues(TInstanceSource instanceSource, int batchNumber = 0);
7. GetFeatureIndexes (multi-instance): returns null for the dense representation; otherwise returns the feature indexes of the instances in the specified batch, consistent with the feature values returned by GetFeatureValues for the same batch. The last method of IBayesPointMachineClassifierMapping provides the ground-truth labels:
int[][] GetFeatureIndexes(TInstanceSource instanceSource, int batchNumber = 0);
8. GetLabels: provides the ground-truth class labels for a given instance source and label source. Note that TLabel is bool for binary classification and int for multi-class classification. This method is not needed during prediction.
TLabel[] GetLabels(TInstanceSource instanceSource, TLabelSource labelSource = default(TLabelSource), int batchNumber = 0);
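To make the eight methods concrete, here is a hedged sketch of a native-format mapping over an in-memory dense dataset. DenseData and DenseNativeMapping are invented names; the interface name and its generic parameter order (instance source, instance, label source, label) are assumed from the 2.6 API:

```csharp
using MicrosoftResearch.Infer.Learners.Mappings;

// A toy instance source: dense features and integer class labels held in arrays
// (hypothetical type, not part of Infer.NET).
public class DenseData
{
    public double[][] Features { get; set; }
    public int[] Labels { get; set; }
    public int ClassCount { get; set; }
}

// Native-format mapping over DenseData; a single instance is identified by its row index.
public class DenseNativeMapping
    : IBayesPointMachineClassifierMapping<DenseData, int, DenseData, int>
{
    public bool IsSparse(DenseData instanceSource) { return false; }

    public int GetFeatureCount(DenseData instanceSource)
    {
        return instanceSource.Features[0].Length;
    }

    public int GetClassCount(DenseData instanceSource = null, DenseData labelSource = null)
    {
        return instanceSource.ClassCount;
    }

    // Single-instance access, used during prediction.
    public double[] GetFeatureValues(int instance, DenseData instanceSource = null)
    {
        return instanceSource.Features[instance];
    }

    public int[] GetFeatureIndexes(int instance, DenseData instanceSource = null)
    {
        return null;  // dense representation: no index arrays
    }

    // Batch access, used during training; here all data fits in one batch.
    public double[][] GetFeatureValues(DenseData instanceSource, int batchNumber = 0)
    {
        return instanceSource.Features;
    }

    public int[][] GetFeatureIndexes(DenseData instanceSource, int batchNumber = 0)
    {
        return null;
    }

    public int[] GetLabels(DenseData instanceSource, DenseData labelSource = null,
        int batchNumber = 0)
    {
        return instanceSource.Labels;
    }
}
```

Because IsSparse returns false, both GetFeatureIndexes overloads return null, as the text above requires for the dense representation.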
The NativeClassifierMapping class and its subclasses in the MicrosoftResearch.Infer.Learners.BayesPointMachineClassifierInternal namespace are example implementations of IBayesPointMachineClassifierMapping. These wrappers around IClassifierMapping show how to convert from the standard data format to the native one, and how to cache data in batches during training.
4. Evaluation Data Format Mapping
A simple way to measure the performance of a trained classifier is to use an evaluator. An evaluator reads ground-truth labels through an implementation of the IClassifierEvaluatorMapping interface. Because evaluation is independent of the data format required by any particular classifier, such as the Bayes Point Machine, IClassifierEvaluatorMapping essentially declares the same generic standard data format mapping as IClassifierMapping, just without the GetFeatures method. Predictions are passed directly as input parameters to the evaluation methods instead of through the mapping.
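A typical evaluation might look like the sketch below. This assumes the 2.6 Learners API exposes a ClassifierEvaluator class and a Metrics helper as in the official examples; evaluatorMapping, testSet and predictions are placeholders that would come from your own code:

```csharp
using System.Collections.Generic;
using MicrosoftResearch.Infer.Learners;

// Sketch: count classification errors on a test set.
// Generic parameters mirror the standard mapping minus TFeatures.
var evaluator = new ClassifierEvaluator<IList<Person>, Person, IList<Person>, bool>(
    evaluatorMapping);

// Predictions are passed in directly; the mapping only supplies ground truth.
double errorCount = evaluator.Evaluate(
    testSet, testSet, predictions, Metrics.ZeroOneError);
```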
5. Creation and Serialization
5.1 Creating a Bayes Point Machine classifier
Once you have implemented a mapping, you can easily create a Bayes Point Machine classifier by calling one of the following two factory methods, depending on the type of classification problem:
BayesPointMachineClassifier.CreateBinaryClassifier (binary classification problems)
BayesPointMachineClassifier.CreateMulticlassClassifier (multi-class classification problems)
Both methods come in two versions: one takes a mapping to the native data format, the other a mapping to the standard data format, i.e. implementations of IBayesPointMachineClassifierMapping and IClassifierMapping respectively. The factory methods return a classifier of type IBayesPointMachineClassifier, with appropriate settings for training and prediction.
For example, suppose we have implemented SqlNativeMapping, a mapping from a SQL database to the native data format required by the Bayes Point Machine. Creating a multi-class classifier then takes a single call:
var mapping = new SqlNativeMapping();
var classifier = BayesPointMachineClassifier.CreateMulticlassClassifier(mapping);
Creating a Bayes Point Machine classifier is a lightweight operation that performs no data-dependent computation.
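With the classifier created, training and prediction are equally direct. The sketch below assumes the SqlNativeMapping from the example above (a hypothetical name) and the training settings mentioned earlier (BatchCount, IterationCount); trainingData and testData are placeholders for your instance sources:

```csharp
using MicrosoftResearch.Infer.Learners;

var mapping = new SqlNativeMapping();  // hypothetical mapping from the text
var classifier = BayesPointMachineClassifier.CreateMulticlassClassifier(mapping);

// Optional: split the training data into batches if it does not fit in memory.
classifier.Settings.Training.BatchCount = 4;
classifier.Settings.Training.IterationCount = 30;

// Training reads instances, features and labels through the mapping.
classifier.Train(trainingData);

// Prediction returns per-instance distributions over the class labels.
var predictions = classifier.PredictDistribution(testData);
```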
5.2 Serialization
Serialization is implemented through the Save extension methods of the ILearner interface. The method has two overloads: one saves to a file, the other to a stream using a formatter. Both trained and untrained classifiers can be serialized and deserialized. A Bayes Point Machine classifier is serialized to a file as follows:
classifier.Save("Bpm.bin");
This serializes both the classifier's parameters and the user-defined mapping. It does not serialize any training data, nor any attached event handlers.
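The stream-based overload can be sketched as follows, assuming the Save(Stream, IFormatter) overload mentioned above:

```csharp
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Serialize the classifier to a stream using a binary formatter.
using (Stream stream = File.Open("Bpm.bin", FileMode.Create))
{
    classifier.Save(stream, new BinaryFormatter());
}
```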
5.3 Deserialization
A previously serialized classifier can be loaded by calling a static Load method of BayesPointMachineClassifier. As with Save, a classifier can be deserialized from either a file or a stream with a formatter.
var classifier = BayesPointMachineClassifier.Load<SqlInstanceSource, Instance, SqlLabelSource, int, Discrete, BayesPointMachineClassifierTrainingSettings, MulticlassBayesPointMachineClassifierPredictionSettings<int>>("Bpm.bin");
The generic parameters required by this method are:
- TInstanceSource: the type of the instance source
- TInstance: the type of a single instance
- TLabelSource: the type of the label source
- TLabel: the type of a single label (bool for a binary classifier, int for a multi-class classifier)
- TLabelDistribution: the distribution type of the labels
- TTrainingSettings: the type of the training settings
- TPredictionSettings: the type of the prediction settings, which differs between binary and multi-class problems (BinaryBayesPointMachineClassifierPredictionSettings and MulticlassBayesPointMachineClassifierPredictionSettings, respectively)
The generic parameters supplied at deserialization must exactly match those used when the classifier was serialized. A version check is also performed during deserialization, and an exception is thrown if the serialized version does not match. For convenience, however, there are overloads that require fewer generic type parameters, for example:
var classifier = BayesPointMachineClassifier.LoadMulticlassClassifier<SqlInstanceSource, Instance, SqlLabelSource, int, Discrete>("Bpm.bin");
This is equivalent to the previous call: it deserializes a multi-class Bayes Point Machine classifier from a file.
.NET Machine Learning Component Infer.NET (3): Learner API - Data Mapping and Serialization