Mathematical Principles of Machine Learning: Notes (III)


    1. Positive definite matrices

1.1 Definitions

General definition: Let M be an n-order matrix. If for every non-zero vector z we have z^T M z > 0, where z^T denotes the transpose of z, then M is called a positive definite matrix. [1]

Narrow definition: an n-order real symmetric matrix M is positive definite if and only if z^T M z > 0 for every non-zero real vector z, where z^T denotes the transpose of z.

1.2 Theorems and properties

Under a congruence transformation, a positive definite matrix can be reduced to its canonical form, namely a diagonal matrix.

• Every symmetric matrix (or Hermitian matrix) whose eigenvalues are all greater than 0 is a positive definite matrix.

• Criterion 1: A symmetric matrix A is positive definite if and only if all eigenvalues of A are positive.

• Criterion 2: A symmetric matrix A is positive definite if and only if all of its leading principal minors are positive.

• Criterion 3: A symmetric matrix A is positive definite if and only if A is congruent to the identity matrix.

The properties of positive definite matrices:

• A positive definite matrix must be non-singular. (Definition of a singular matrix: an n-order matrix A is singular if its determinant is zero, i.e. |A| = 0.)

• Every principal submatrix of a positive definite matrix is itself positive definite.

• If A is an n-order symmetric positive definite matrix, then there exists a unique lower triangular matrix L with positive main-diagonal entries such that A = L·L′. This is called the Cholesky decomposition of the positive definite matrix.

• If A is an n-order positive definite matrix, then A is an n-order invertible matrix.

    2. Inverse matrix

2.1 Concept of the inverse matrix

Let A be an n-order matrix over a number field. If there exists another n-order matrix B over the same field such that AB = BA = E, then B is called the inverse matrix of A, and A is called an invertible matrix.

2.2 Matrix Inversion

a) Adjugate (adjoint) matrix method

If |A| ≠ 0, the matrix A is invertible and

A^-1 = A* / |A|

where A* is the adjugate (adjoint) matrix of A.
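The adjugate is the transpose of the cofactor matrix, so the formula above can be sketched directly. A minimal illustration, assuming NumPy; the function name `adjugate` and the example matrix are my own, not from the original:

```python
import numpy as np

def adjugate(A):
    """Adjugate (classical adjoint) of A: transpose of the cofactor matrix."""
    n = A.shape[0]
    C = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            # Minor: delete row i and column j, then take the determinant.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])                 # |A| = 1, so A is invertible
A_inv = adjugate(A) / np.linalg.det(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```

This cofactor expansion is O(n!)-ish and only sensible for small matrices; it is shown to mirror the formula, not as a practical inversion routine.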

b) Elementary transformation method

Finding the inverse matrix by elementary row operations:

Write the n-order invertible matrix A and the n-order identity matrix I side by side as an n×2n matrix B = (A | I).

Apply elementary row operations to B; that is, perform exactly the same row operations on A and on I, with the goal of reducing A to the identity matrix. When the left half of B becomes the identity matrix I, the right half of B has simultaneously been transformed into A^-1.

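The elementary-transformation procedure above can be written out in code. A minimal Gauss–Jordan sketch, assuming NumPy; the function name and test matrix are illustrative assumptions:

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Invert A by row-reducing the augmented matrix (A | I) to (I | A^-1)."""
    n = A.shape[0]
    B = np.hstack([A.astype(float), np.eye(n)])   # the n x 2n matrix B = (A | I)
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot below.
        pivot = col + np.argmax(np.abs(B[col:, col]))
        B[[col, pivot]] = B[[pivot, col]]
        B[col] /= B[col, col]                     # scale the pivot row so the pivot is 1
        for row in range(n):
            if row != col:
                B[row] -= B[row, col] * B[col]    # eliminate the rest of the column
    return B[:, n:]                               # right half is now A^-1

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.allclose(inverse_by_row_reduction(A) @ A, np.eye(2)))  # True
```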

2.3 Properties

• An invertible matrix must be a square matrix.

• (Uniqueness) If a matrix A is invertible, its inverse matrix is unique.

• The inverse of the inverse of A is A itself, written (A^-1)^-1 = A.

• If A is invertible, its transpose A^T is also invertible, and (A^T)^-1 = (A^-1)^T (the inverse of the transpose equals the transpose of the inverse).

• If the matrix A is invertible, then A satisfies the cancellation law: if AB = O (or BA = O), then B = O; if AB = AC (or BA = CA), then B = C.

• The product of two invertible matrices is still invertible.

• A matrix is invertible if and only if it is a full-rank matrix.

    3. Sigmoid function

The sigmoid function is an S-shaped function commonly seen in biology, also known as the S-shaped growth curve. [1]

The sigmoid function is defined by the following formula:

S(x) = 1 / (1 + e^(-x))

Its derivative with respect to x can be expressed in terms of the function itself:

S′(x) = S(x)(1 − S(x))

In information science, because it is monotonically increasing and its inverse function is also monotonically increasing, the sigmoid function is often used as the threshold (activation) function of neural networks, mapping variables into the interval (0, 1).
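The definition and its self-referential derivative can be sketched in a few lines, assuming NumPy:

```python
import numpy as np

def sigmoid(x):
    """S(x) = 1 / (1 + e^(-x)); maps any real x into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative expressed through the function itself: S'(x) = S(x)(1 - S(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_grad(0.0))   # 0.25 (the maximum slope, attained at x = 0)
```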

    4. Maximum likelihood estimation

4.1 Definitions

The maximum likelihood method (Maximum Likelihood, ML), also called maximum likelihood estimation, is a theoretical point-estimation method. Its basic idea is: when n sample observations are drawn at random from a model's population, the most reasonable parameter estimates are those that maximize the probability of drawing exactly those n observations from the model. This differs from the least squares estimation method, whose criterion is to choose the parameter estimates that make the model best fit the sample data.

4.2 Features

It is a parameter estimation method used when the form of the population distribution is already known.

4.3 Maximum likelihood estimation method

To find the maximum likelihood estimate for a parameter:

(1) Write out the likelihood function

(2) Take the logarithm

(3) Take the partial derivative of the log-likelihood function with respect to each parameter and set it to zero, obtaining the system of log-likelihood equations. If the population distribution contains only one unknown parameter, this is a single equation, called the log-likelihood equation.

(4) Solve θ1, θ2, ..., θk from the system of equations; the solutions, denoted θ̂1, θ̂2, ..., θ̂k, are the maximum likelihood estimates.
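The four steps above can be walked through for a concrete distribution. A minimal sketch for the exponential density f(x; λ) = λ·e^(-λx), assuming NumPy; the true rate λ = 0.5 and the sample size are illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)   # samples with true rate λ = 1/2

# Steps (1)-(4) for f(x; λ) = λ e^(-λx):
#   (1) likelihood:      L(λ) = Π λ e^(-λ x_i)
#   (2) take logarithm:  log L(λ) = n log λ - λ Σ x_i
#   (3) differentiate:   d(log L)/dλ = n/λ - Σ x_i = 0
#   (4) solve:           λ̂ = n / Σ x_i = 1 / mean(x)
lam_hat = 1.0 / x.mean()
print(abs(lam_hat - 0.5) < 0.05)   # True: the estimate is close to the true λ
```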

    5. Least squares

5.1 Definitions

Least squares (also known as the method of least squares) is a mathematical optimization technique. It finds the best-fitting function for the data by minimizing the sum of squared errors. With least squares, unknown values can be estimated simply, such that the sum of squared errors between the estimates and the actual data is minimized. Least squares can also be used for curve fitting. Some other optimization problems can likewise be expressed in least-squares form by minimizing an energy or maximizing an entropy.

5.2 Basic formula for linear least squares

Consider an overdetermined system of equations (more equations than unknowns):

Σ_(j=1..n) X_ij · β_j = y_i,  i = 1, 2, ..., m

where m denotes the number of equations and n the number of unknowns, with m > n. In matrix form:

X β = y

In general this system has no exact solution, so in order to choose the β that makes the equations hold "as nearly as possible", introduce the residual sum of squares function S:

S(β) = || X β − y ||²

(in statistics, the residual sum of squares can be regarded as m times the mean squared error, MSE)

When β = β̂, S takes its minimum value, written:

β̂ = argmin_β S(β)

Differentiating and setting the derivative to zero [2] gives the normal equations:

X^T X β̂ = X^T y

If the matrix X^T X is non-singular, there is a unique solution [3]:

β̂ = (X^T X)^(-1) X^T y
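The normal-equation solution above can be verified numerically. A minimal sketch, assuming NumPy; the line-fit data points are an illustrative assumption of mine:

```python
import numpy as np

# Overdetermined system: m = 5 equations, n = 2 unknowns (fit a line y = b0 + b1*t).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X = np.column_stack([np.ones_like(t), t])   # design matrix

# Normal equations: X^T X beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's dedicated least-squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_ref))   # True
```

Solving the normal equations directly is fine for small well-conditioned problems; `lstsq` (based on a factorization of X) is the numerically safer routine in general.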

    6. Covariance

6.1 Definitions

In probability theory and statistics, covariance is used to measure the joint variability of two variables. Variance is a special case of covariance, namely the case where the two variables are identical.

The covariance cov(X, Y) between two real random variables X and Y, with expected values E[X] and E[Y] respectively, is defined as:

cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Visually, covariance represents the expectation of the total error of two variables.

If the two variables tend to move together, that is, when one of them is above its own expected value the other is also above its own expected value, then the covariance between the two variables is positive. If the two variables tend to move in opposite directions, that is, when one of them is above its own expected value the other is below its own expected value, then the covariance between the two variables is negative.

If X and Y are statistically independent, then the covariance between the two is 0, because two independent random variables satisfy e[xy]=e[x]e[y].

However, the reverse is not true. That is, if the covariance of X and Y is 0, they are not necessarily statistically independent.

The unit of measure of the covariance cov(X, Y) is the unit of X multiplied by the unit of Y. Dividing the covariance by the product of the standard deviations of X and Y yields the correlation coefficient, a dimensionless quantity that measures the degree of linear dependence.

Two random variables whose covariance is 0 are said to be uncorrelated.
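The definition and the sign behavior described above can be checked on simulated data. A minimal sketch, assuming NumPy; the synthetic variables are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)      # y moves with x, so cov(x, y) > 0

# cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
cov_manual = np.mean(x * y) - x.mean() * y.mean()
cov_numpy = np.cov(x, y, bias=True)[0, 1]   # bias=True matches the 1/n definition
print(np.isclose(cov_manual, cov_numpy))    # True
print(cov_manual > 0)                       # True: same-direction variables

# Independent variables have covariance (near) 0; the converse does not hold.
z = rng.normal(size=100_000)
print(abs(np.mean(x * z) - x.mean() * z.mean()) < 0.05)   # True
```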

