Discover articles, news, trends, analysis, and practical advice about machine learning algorithm books on alibabacloud.com.
the curve is above the curve. Common convex functions include:
Exponential function $f(x) = a^x$, $a > 1$
Negative logarithm function $f(x) = -\log_a x$, $a > 1$, $x > 0$
Quadratic function opening upward ($f(x) = ax^2 + bx + c$ with $a > 0$)
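A minimal numerical sketch that checks the functions listed above against the first-order convexity condition $f(y) \ge f(x) + f'(x)(y - x)$ on sample points. The base a = 2, the sample range (0.1, 5) (chosen so the logarithm is defined), and the helper `satisfies_first_order` are assumptions for illustration. A finite sample can only refute convexity, never prove it; sine is included as a non-convex contrast.

```python
import math
import random

def satisfies_first_order(f, df, points):
    """Hypothetical helper: check f(y) >= f(x) + f'(x)(y - x) for every ordered pair of points."""
    tol = 1e-9  # small slack for floating-point rounding
    return all(f(y) >= f(x) + df(x) * (y - x) - tol for x in points for y in points)

a = 2.0  # assumed base, a > 1 as in the list above
candidates = {
    "exponential a^x": (lambda x: a ** x, lambda x: a ** x * math.log(a)),
    "negative log -log_a x": (lambda x: -math.log(x, a), lambda x: -1.0 / (x * math.log(a))),
    "quadratic x^2 + 1": (lambda x: x * x + 1, lambda x: 2 * x),
    "sine (not convex)": (math.sin, math.cos),
}

random.seed(0)
pts = [random.uniform(0.1, 5.0) for _ in range(50)]  # x > 0 so the logarithm is defined
results = {name: satisfies_first_order(f, df, pts) for name, (f, df) in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```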
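Determining convexity: 1. If f is first-order differentiable, f is convex if and only if, for any x, y in its (convex) domain, $f(y) \ge f(x) + f'(x)(y - x)$. 2. If f is twice differentiable, f is convex if and only if $f''(x) \ge 0$ throughout its domain.
Examples of convex optimization applications: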
SVM: maximizing the margin $\frac{1}{\|w\|}$ is converted into $\min \frac{1}{2}\|w\|^2$
Least squares
The loss function of L
Nine algorithms for machine learning: regression. Reposted from: http://blog.csdn.net/xiaohai1232/article/details/59551240 Regression analysis quantifies how strongly the dependent variable is affected by the independent variables and establishes a linear or nonlinear regression equation, so as to predict the dependent variable or to interpr
training samples. The two or three methods above work when the inverse exists, but what if the data have more features than sample points? In that case $X^TX$ is not invertible. Ridge regression solves this problem by replacing $(X^TX)^{-1}$ with $(X^TX + \lambda I)^{-1}$; otherwise it proceeds as before. There is also forward stepwise regression, which at each step increases or decreases one weight by a small value
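The following NumPy sketch illustrates the normal-equation solution and the ridge variant described above. The function names `ols` and `ridge`, the penalty `lam`, and the synthetic data are illustrative assumptions; no intercept term is fitted, and `np.linalg.solve` is used instead of forming the inverse explicitly, which is numerically safer.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam):
    """Ridge regression: (X^T X + lam * I) w = X^T y, solvable for any lam > 0."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                  # 20 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=20)   # small noise
w_ols = ols(X, y)
print(w_ols)                                  # close to true_w

# More features (10) than samples (5): X^T X has rank at most 5, so it is singular
X_wide = rng.normal(size=(5, 10))
y_wide = rng.normal(size=5)
print(np.linalg.matrix_rank(X_wide.T @ X_wide))
w_ridge = ridge(X_wide, y_wide, lam=0.1)      # still well-defined thanks to lam * I
print(w_ridge.shape)
```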
paper is usually Euclidean distance, the Pearson coefficient, or cosine similarity. Assume an m×n matrix A is built whose m rows are all users and n columns are all items, where each element is a user's rating of an item. Item-based or user-based recommendation then computes the similarity between all columns or between all rows, respectively. In real life this matrix is very sparse. Topic: recommend the top-N items for users to buy. Matrix C is an m×n matrix in which each row represents a user,
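A small sketch of item-based similarity on a user-item rating matrix, assuming cosine similarity, 0 meaning "not rated", and a simple similarity-weighted sum of the user's ratings as the score of each unrated item. The matrix values and function names are hypothetical; real systems usually store such a matrix in a sparse format because it is so sparse.

```python
import numpy as np

def item_similarity(A):
    """Cosine similarity between every pair of columns (items) of an m x n rating matrix."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0            # items nobody rated: avoid division by zero
    normalized = A / norms
    return normalized.T @ normalized   # n x n item-item similarity matrix

def recommend_top_n(A, user, n):
    """Score the user's unrated items by similarity-weighted ratings; return the top-n item indices."""
    sim = item_similarity(A)
    ratings = A[user]
    scores = sim @ ratings             # item j: sum over items i of sim(j, i) * rating(i)
    scores[ratings > 0] = -np.inf      # never recommend items the user already rated
    return [int(i) for i in np.argsort(scores)[::-1][:n]]

# Rows are users, columns are items; 0 means "not rated".
A = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 1],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)
print(recommend_top_n(A, user=0, n=2))  # [4, 2]
```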
One of the top ten machine learning algorithms: the EM algorithm. Being one of the top ten makes it sound impressive. What makes something impressive? We usually call someone impressive because they can solve problems that others cannot. A god is a god because it can do things that many people cannot. So what problems can the EM algorithm solve? Or, the reason why the EM algorit
The idea of clustering: divide a dataset into several disjoint subsets (called clusters), each of which may correspond to some concept. However, the practical meaning of each cluster is determined by the user; the clustering algorithm only performs the partition. Uses of clustering: 1) as a standalone process for discovering the distribution pattern of the data; 2) as a preprocessing step for classification: the data are first clustered, and then the
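The text does not name a specific algorithm; as one common illustration, the sketch below implements k-means, which partitions the rows of a data matrix into k disjoint clusters. The function name, the random initialization, and the toy data are assumptions. As the text says, the algorithm only produces the partition; interpreting what each cluster means is left to the user.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means: partition the rows of X into k disjoint clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct points as initial centers
    for _ in range(n_iters):
        # Assign each point to its nearest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep the old center if a cluster is empty
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centers = kmeans(X, k=2)
print(labels)  # the first three points share one label, the last three the other
```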
The Naive Bayes algorithm looks for the maximum a posteriori (MAP) hypothesis, that is, the candidate hypothesis with the largest posterior probability: $h_{MAP} = \arg\max_{h} P(h \mid D) = \arg\max_{h} P(D \mid h)\,P(h)$. A Naive Bayes classifier assumes that the sample features are mutually independent given the class: $P(x_1, \dots, x_n \mid c) = \prod_{i} P(x_i \mid c)$. Compute the posterior probability of each hypothesis and choose the largest; the corresponding category is the classification result for the sample. Advantages and disadvantages: performs very well on small-scale data, suitable for mu
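A minimal categorical Naive Bayes sketch that applies the independence assumption and picks the class with the largest posterior (the evidence term P(x) is the same for every class, so it is dropped). Laplace (add-one) smoothing is an added assumption to avoid zero probabilities for unseen values; the toy weather data and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> distinct values seen in training
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(c, i)][v] += 1
            feature_values[i].add(v)
    return class_counts, value_counts, feature_values

def predict_map(model, x):
    """Return the class c maximizing P(c) * prod_i P(x_i | c), with add-one smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, v in enumerate(x):
            k = len(feature_values[i])  # number of distinct values of feature i
            score *= (value_counts[(c, i)][v] + 1) / (n_c + k)  # smoothed P(x_i | c)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

samples = [("sunny", "hot"), ("sunny", "hot"), ("overcast", "hot"),
           ("rainy", "mild"), ("rainy", "cool"), ("sunny", "mild")]
labels = ["no", "no", "yes", "yes", "yes", "no"]
model = train_naive_bayes(samples, labels)
print(predict_map(model, ("sunny", "hot")))   # no
print(predict_map(model, ("rainy", "cool")))  # yes
```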
In machine learning, many problems have no analytic solution, or have an analytic solution whose computation is very expensive (for example, the least-squares solution). For such problems we usually use an iterative optimization method. Commonly used optimization algorithms include gradient descent
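As an illustration of the iterative approach, here is a batch gradient descent sketch for the least-squares problem, compared against NumPy's direct `np.linalg.lstsq` solution. The learning rate, iteration count, and noise-free synthetic data are assumptions chosen so that this small problem converges.

```python
import numpy as np

def gradient_descent_least_squares(X, y, lr=0.1, n_iters=2000):
    """Minimize (1/2n) * ||Xw - y||^2 with batch gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n_samples  # gradient of the mean squared loss
        w -= lr * grad                        # step against the gradient
    return w

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])  # intercept column + one feature
y = 3.0 + 2.0 * X[:, 1]
w_gd = gradient_descent_least_squares(X, y)
w_exact = np.linalg.lstsq(X, y, rcond=None)[0]  # direct least-squares solution for comparison
print(w_gd, w_exact)
```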
for example, to get the classification result of class 1. The same input is passed to different nodes and produces different results, because each node has its own weights and bias. This is forward propagation.
10. Markov
A Markov chain is made up of states and transitions. Example: build a Markov chain from the phrase 'The quick brown fox jumps over the lazy dog'. Steps: treat each word as a state, then calculate the transition probabilities between states. This is the proba
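A sketch of the word-level Markov chain described above: each word is a state, and transition probabilities are estimated from how often one word follows another. Lowercasing, so that "The" and "the" count as the same state, is an assumption; with it, "the" is followed once by "quick" and once by "lazy". The function name is hypothetical.

```python
from collections import Counter, defaultdict

def build_markov_chain(text):
    """Treat each word as a state and estimate P(next word | current word) from bigram counts."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    chain = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        chain[state] = {w: c / total for w, c in followers.items()}
    return chain

chain = build_markov_chain("The quick brown fox jumps over the lazy dog")
print(chain["the"])  # {'quick': 0.5, 'lazy': 0.5}
print(chain["fox"])  # {'jumps': 1.0}
```

The last word, "dog", has no outgoing transitions in this phrase, so it does not appear as a key.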
This section uses sklearn for voting classification through a concrete example. The dataset is the Iris dataset, using only two features, sepal width and petal length, and only two classes, Iris-versicolor and Iris-virginica; the evaluation metric is ROC AUC. Python Machine Learning Chinese table of contents (http://www.aibbt.com/a/20787.html).
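A sketch of this setup with scikit-learn's `VotingClassifier` and `cross_val_score(..., scoring="roc_auc")`. The choice of base classifiers (logistic regression, a depth-1 decision tree, and 1-nearest-neighbor), soft voting, and 10-fold cross-validation are assumptions for illustration, not necessarily the configuration used in the original section.

```python
from sklearn import datasets
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
# Rows 50..149 are Iris-versicolor (target 1) and Iris-virginica (target 2);
# columns 1 and 2 are sepal width and petal length.
X = iris.data[50:, [1, 2]]
y = (iris.target[50:] == 2).astype(int)  # 0 = versicolor, 1 = virginica

lr = make_pipeline(StandardScaler(), LogisticRegression())
tree = DecisionTreeClassifier(max_depth=1, random_state=0)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))

voting = VotingClassifier(
    estimators=[("lr", lr), ("tree", tree), ("knn", knn)],
    voting="soft",  # average the predicted class probabilities
)

scores = {}
for name, clf in [("logistic regression", lr), ("decision tree", tree),
                  ("knn", knn), ("soft voting", voting)]:
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
    scores[name] = auc.mean()
    print(f"{name}: ROC AUC {auc.mean():.3f} (+/- {auc.std():.3f})")
```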
minimizing the impurity at each step. CART can handle outliers and missing values. Termination conditions for splitting the tree: 1. the node reaches complete purity; 2. the depth of the tree reaches the user-specified depth; 3. the number of samples in the node falls below the user-specified number. The tree is pruned with cost-complexity pruning. See details: http://blog.csdn.net/tianguokaka/article/details/9018933
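A minimal sketch of how an impurity-minimizing split can be chosen for one numeric feature, using Gini impurity (the impurity measure CART uses for classification). Stopping rules and cost-complexity pruning are not shown; the function names and toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 (0 means the node is completely pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (threshold, weighted impurity) of the split 'value <= threshold' with the lowest impurity."""
    best = (None, gini(labels))          # baseline: impurity of the unsplit node
    n = len(labels)
    for t in sorted(set(values))[:-1]:   # the largest value would leave the right side empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best[1]:
            best = (t, impurity)
    return best

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(gini(labels))                       # 0.5
print(best_binary_split(values, labels))  # (3.0, 0.0)
```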
the probability that the message belongs to class C. When many words appear, a precision problem arises. It can be resolved with a joint probability: compute $P(c \mid w_i)$ for each word, keep the n largest probabilities (for example n = 10 or n = 15), and combine them with the joint probability formula:
$$p = \frac{p_1 p_2 \cdots p_n}{p_1 p_2 \cdots p_n + (1 - p_1)(1 - p_2) \cdots (1 - p_n)}$$
where $p_1, \dots, p_n$ are the chosen top-n probabilities.
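A sketch of the top-n combination formula, assuming each $P(c \mid w_i)$ lies strictly between 0 and 1. The products are evaluated in log space, an added implementation choice, so that long products of small probabilities do not underflow; the result is algebraically the same as the joint probability formula. The function names and sample probabilities are hypothetical.

```python
import math

def top_n_probabilities(word_probs, n):
    """Keep the n largest per-word probabilities P(c|w_i), as described in the text."""
    return sorted(word_probs.values(), reverse=True)[:n]

def combine_probabilities(probs):
    """p = prod(p_i) / (prod(p_i) + prod(1 - p_i)), evaluated in log space; each p_i must be in (0, 1)."""
    log_c = sum(math.log(p) for p in probs)            # log of p_1 * ... * p_n
    log_not_c = sum(math.log(1.0 - p) for p in probs)  # log of (1 - p_1) * ... * (1 - p_n)
    d = log_c - log_not_c
    # p = 1 / (1 + exp(-d)); the two branches avoid overflow in math.exp
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

word_probs = {"free": 0.9, "winner": 0.9, "meeting": 0.1, "offer": 0.8}  # hypothetical values
top = top_n_probabilities(word_probs, n=2)
print(top)                         # [0.9, 0.9]
print(combine_probabilities(top))  # about 0.9878 (= 0.81 / 0.82)
```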
1. Linear model: simple form, easy to build, good interpretability.
2. Logistic regression: makes no prior assumptions about the data distribution; yields approximate probability predictions; the logistic (log-odds) loss is a convex function that is differentiable to any order, so many numerical optimization algorithms can be applied directly to find the optimal solution.
3. Linear discriminant analysis (LDA): when the two classes have equal priors, are Gaussian, and have equal covaria
MySpace Qizmt is a MapReduce framework for developing and running distributed computing applications on large Windows Server clusters. It is an open-source project initiated by MySpace to build reliable, scalable, and extremely simple distributed applications. Open-source address: http://code.google.com/p/qizmt/.
Infer.NET is an open-source framework for running Bayesian inference in graphical models. It can also be used for probabilistic programming. Open
(Continued from Chapter 1) 1.2.5 Linalg linear algebra library
Built on basic matrix operations, NumPy's linalg library covers most linear algebra operations: the determinant of a matrix, the inverse of a matrix, matrix symmetry, the rank of a matrix, and solving linear equations with an invertible matrix.
1. Determinant of a matrix
from numpy import array
from numpy.linalg import det
A = array([[1,2,3],[4,5,6],[7,8,9]])  # determinant of an n-order matrix
print(det(A))
6.66133814775e-16
A is singular, so its exact determinant is 0; the tiny nonzero value is floating-point rounding error.
2. Inverse of the matrix
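The operations listed above, written against the current `numpy.linalg` API. Plain `np.array` is used instead of `mat` (NumPy recommends regular arrays over the matrix class), and the 2x2 matrix `B` is a hypothetical invertible example, since the 3x3 matrix `A` from the text is singular and has no inverse.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)  # singular matrix from the text
B = np.array([[2.0, 1.0], [1.0, 3.0]])                          # hypothetical invertible, symmetric matrix

print(np.linalg.det(A))          # tiny value near 0 (the exact determinant is 0)
print(np.linalg.matrix_rank(A))  # 2: A is rank-deficient, so it has no inverse
print(np.linalg.det(B))          # approximately 5.0
B_inv = np.linalg.inv(B)
print(B_inv)                     # approximately [[0.6, -0.2], [-0.2, 0.4]]
print(np.allclose(B, B.T))       # True: B equals its transpose, so it is symmetric
x = np.linalg.solve(B, np.array([3.0, 5.0]))  # solve B @ x = [3, 5]
print(x)                         # approximately [0.8, 1.4]
```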
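A decision tree selects the attribute with the largest information gain to split on. The core idea is to use information gain to judge how well an attribute separates the classes. Information gain is computed from the information entropy of a sample set D, $\mathrm{Ent}(D) = -\sum_{k} p_k \log_2 p_k$, where $p_k$ is the proportion of class k (any number of classes is allowed), as $\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)$, where $D^v$ is the subset of samples that take value v on attribute a. Compute the information gain of every attribute and choose the largest as the root node of the decision tree. Then branch the samples and continue computing the information gain of the remaining attributes. Information gain has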
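A minimal sketch of the entropy and information-gain computation for categorical attributes, used to pick the root attribute. The attribute names and the four toy samples are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy Ent(D) = -sum_k p_k * log2(p_k) over the class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(D, a) = Ent(D) - sum_v |D_v|/|D| * Ent(D_v), where D_v holds the rows with attr == v."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
root = max(gains, key=gains.get)  # attribute with the largest information gain
print(gains, root)  # outlook gains 1.0 bit, windy gains 0.0, so outlook becomes the root
```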
Reference:
NB: high efficiency, easy to implement.
LR: few assumptions about the data, strong adaptability, usable for online learning; requires the data to be linearly separable.
Decision tree: easy to interpret, works whether or not the data are linear; prone to overfitting, no online learning support.
RF: fast and scalable, few parameters to tune; may overfit.
SVM: high accuracy, handles data that are not linearly separable (and high-dimensional data); memory-intensive, difficult