On Manifold Learning

Source: Internet
Author: User

Although the name borrows the word "learning", which at first glance makes machine learning sound like just another way of saying artificial intelligence, the meaning of "learning" here is actually much simpler. Let's look at the typical machine learning workflow, which sometimes feels like applied mathematics or, more colloquially, mathematical modeling. Usually we have data that needs to be analyzed or processed. Based on some experience and assumptions, we build a model, and this model has some parameters (even non-parametric methods can be viewed in a similar way). The process of solving for the model parameters from the data is called parameter estimation or model fitting, but machine learning people usually call it learning (or, in other contexts, training), because it induces a useful model from the data.

Here, building a model is a bit like writing a class: the data is the argument to the constructor, learning is the process of running the constructor, and once an object has been successfully constructed, the learning is done. Some machine learning problems end there; others use the resulting model (object) to process later data, usually to perform inference. At this point it starts to resemble statistics, with its so-called "statistical inference". In fact, many of the problems studied in statistics and in machine learning overlap; the two camps just look at the same problems from different angles. And indeed there is such a thing as "statistical learning", which can be seen as a subfield of machine learning (or a part of it, or even machine learning itself).
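The class analogy can be made concrete with a small sketch. The class name LineModel, the straight-line model y = a*x + b, and the toy data are all hypothetical choices made for illustration; the point is only that the constructor does the parameter estimation and the resulting object is then used for inference.

```python
import numpy as np

class LineModel:
    """Hypothetical model y = a*x + b; constructing it performs the 'learning'."""

    def __init__(self, x, y):
        # Parameter estimation: least-squares fit of slope a and intercept b.
        design = np.column_stack([x, np.ones_like(x)])
        (self.a, self.b), *_ = np.linalg.lstsq(design, y, rcond=None)

    def predict(self, x_new):
        # Inference: apply the learned parameters to later data.
        return self.a * np.asarray(x_new) + self.b

# Toy data generated from y = 2x + 1 (an assumption for this example).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

model = LineModel(x, y)           # "learning" happens here
prediction = model.predict([4.0])  # "inference" on new data
```

Once the object exists, the data is no longer needed: everything the model knows is stored in its two parameters.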

So we move on to the next topic: the manifold. Were you puzzled by the picture of the Earth? It is there because the sphere is a typical example of a manifold, and the Earth is a very typical "sphere" (let's just treat it as a sphere).

The phrase "a low-dimensional manifold embedded in a high-dimensional space" shows up often in papers, but high-dimensional data is always hard to picture for us poor low-dimensional creatures, so the most intuitive examples are usually two-dimensional or one-dimensional manifolds embedded in three-dimensional space. Take a piece of cloth: you can think of it as a two-dimensional plane, that is, a two-dimensional Euclidean space. Now we twist it (in three-dimensional space) and it becomes a manifold (of course, it is also a manifold when untwisted; Euclidean space is a special case of a manifold). So, intuitively, a manifold is like the result of distorting a d-dimensional space inside an m-dimensional space (m > d). Note that a manifold is not a "shape" but a "space"; if a "distorted space" is hard to imagine, just recall the piece of cloth.

If I am not mistaken, general relativity treats our spacetime as a four-dimensional manifold (three spatial dimensions plus one of time), and gravity is the result of this manifold being curved. Of course, these are all intuitive notions. In fact, a manifold does not need to be embedded in an "ambient space" in order to exist. Slightly more formally, a d-dimensional manifold is a space that is locally homeomorphic to d-dimensional Euclidean space at every point (roughly, there is a one-to-one map that is continuous, or for smooth manifolds smooth, in both directions). It is exactly this local homeomorphism with Euclidean space that benefits us so much: many geometric problems in daily life can be solved with simple Euclidean geometry, because compared with the scale of the Earth, our daily life occupies only a tiny patch of it. This suddenly reminds me of King Kai's little private planet in "Dragon Ball", where a few steps take you all the way around. It seems King Kai needs not only a strong body (gravity there is supposedly 10 times that of Earth) but also a good brain: he must have studied Riemannian geometry in junior high school!

So, apart from a simple example like the Earth, how do we know whether data from a real application lies on a manifold? Here we may have to fall back on intuition again. Start once more from the sphere: if we did not know the sphere existed, a point on it would just be a point in three-dimensional Euclidean space, whose coordinates can be written as a triple. But unlike ordinary points in space, these points are only allowed to lie on the sphere. Look at its parametric equation (written here for a sphere of radius r centered at the origin):

x = r sin θ cos φ,   y = r sin θ sin φ,   z = r cos θ
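As a minimal sketch of this idea, assume a sphere of radius r centered at the origin with the usual parametrization x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ. The function name sphere_point and the random sampling are illustrative choices, not part of the original text. Two parameters go in, three coordinates come out, and every generated point stays on the sphere.

```python
import numpy as np

def sphere_point(theta, phi, r=1.0):
    """Map two parameters (theta, phi) to a 3-D point on a sphere of radius r."""
    return np.array([
        r * np.sin(theta) * np.cos(phi),
        r * np.sin(theta) * np.sin(phi),
        r * np.cos(theta),
    ])

# Sample 100 random parameter pairs (fixed seed so the run is repeatable).
rng = np.random.default_rng(0)
thetas = rng.uniform(0.0, np.pi, 100)
phis = rng.uniform(0.0, 2.0 * np.pi, 100)

points = np.stack([sphere_point(t, p) for t, p in zip(thetas, phis)])

# Each row has three coordinates, but all rows have the same distance to the origin.
radii = np.linalg.norm(points, axis=1)
```

The array points looks like ordinary three-dimensional data, yet it was produced from only two free variables, which is the sense in which the sphere is two-dimensional.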

You can see that these three coordinates are actually generated by two variables; in other words, the number of degrees of freedom is two, which corresponds to a two-dimensional manifold. With this intuition in hand, the face example often used in manifold learning becomes natural. The following figure (not reproduced here) is a result from the Isomap paper.

The images here all come from the same face (a face model). Each one is a 64x64 grayscale image; if the columns (or rows) of the bitmap are concatenated, we get a 4096-dimensional vector, so each image can be regarded as a point in 4096-dimensional Euclidean space. Obviously, not every point in that 4096-dimensional space corresponds to a face image. This is similar to the sphere case: we can assume that all the 4096-dimensional vectors that can be faces actually lie in a d-dimensional subspace (d < 4096). For this specific Isomap face example, we know that all 698 images were taken of the same face (model), only under different poses and lighting. If we count pose (up-down and left-right) as two degrees of freedom and lighting as one, then these images really have only three degrees of freedom. In other words, there is a parametric equation analogous to that of the sphere (although its analytic form cannot be written down): given a set of parameters (the three values for up-down pose, left-right pose, and lighting), it generates the corresponding 4096-dimensional coordinates. Put differently, this is a 3-dimensional manifold embedded in 4096-dimensional Euclidean space.
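The step from image to vector is simple enough to sketch. The random array below is only a stand-in for a real 64x64 grayscale face image (the actual Isomap face data is not used here), and stacking by columns is one of the two orderings the text allows.

```python
import numpy as np

# Stand-in for one 64x64 grayscale image; real pixel data is an assumption away.
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# Concatenate the columns into one long vector (order="F" means column-major).
vector = image.flatten(order="F")

# The mapping loses nothing: the image can be rebuilt from the vector.
restored = vector.reshape((64, 64), order="F")
```

A data set of 698 such images would then be a 698 x 4096 matrix, one row per point in the 4096-dimensional space.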

In fact, the figure above shows the result of using Isomap to reduce the data set from 4096 dimensions to 3, displaying 2 of those dimensions. Each point in the figure is the coordinate of one face in this two-dimensional space. Some points are marked with red circles, and the original image corresponding to each is shown next to it. You can see quite intuitively that these two dimensions correspond exactly to smooth changes in the two degrees of freedom of pose.

As far as I know, manifolds have been brought into machine learning for two main purposes. The first is to adapt algorithms originally designed for Euclidean space so that they work on manifolds, exploiting the structure and properties of the manifold directly or indirectly. The second is to analyze the structure of the manifold directly, try to map it into a Euclidean space, and then apply algorithms designed for Euclidean space to the result.

Isomap happens to be a very typical example of both, because it "maps a manifold into a Euclidean space" by "adapting an algorithm originally designed for Euclidean space".

The method Isomap adapts is called Multidimensional Scaling (MDS). MDS is a dimensionality reduction method whose goal is to keep the pairwise distances between points after reduction as unchanged as possible (that is, as close as possible to the distances between the corresponding points in the original space). However, MDS was designed for Euclidean space, and it computes distances using the Euclidean distance. If the data is distributed on a manifold, the Euclidean distance is no longer appropriate.
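To make the goal of MDS tangible, here is a minimal sketch of classical (eigendecomposition-based) MDS, which is one standard variant; the text does not say which formulation is meant, so treat this as an assumption. The helper names pairwise_distances and classical_mds and the toy data are invented for the example. The data is built to lie in a 2-D plane inside 5-D space, so a 2-D embedding can preserve all pairwise distances.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k):
    """Embed points in k dimensions from a distance matrix D (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    scale = np.sqrt(np.maximum(eigvals[idx], 0.0))
    return eigvecs[:, idx] * scale

# Toy data: 20 points in 5-D that really only vary in the first 2 coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 2:] = 0.0

D = pairwise_distances(X)
Y = classical_mds(D, k=2)   # 2-D coordinates with the same pairwise distances
```

Note that classical_mds only ever sees the distance matrix D, never the coordinates X. That is what makes it possible to swap in a different notion of distance later.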

Let's go back to the Earth, this two-dimensional manifold in three-dimensional space. Suppose we want the distance between the North and South Poles in three-dimensional space. That is easy: it is the length of the line segment joining the two points. But if we want the distance within the manifold, we cannot do that: we cannot drill a hole straight through from the North Pole to the South Pole, we have to walk along the Earth's surface. Of course, it would not do to walk along an arbitrary route and count the total number of steps as the distance, because different routes would give different values. In short, we need a new definition of distance on the Earth's surface (the manifold). In theory, any function that satisfies the four conditions of a metric can be defined as a distance; but to stay consistent with Euclidean space, we choose a generalization of straight-line distance.

Remember "the shortest path between two points is a line segment" from junior high school? We now turn that around and extend the notion of a line segment to "the shortest curve between two points is a line segment". The definition of distance on a manifold then matches that of Euclidean space: the distance between two points on a manifold is the length of the "line segment" connecting them. This is just a swap of concepts, but it unifies the two. Of course, a "line segment" on a manifold is not necessarily "straight" (and so neither is a "line"); it is usually called a geodesic. For a simple manifold like the sphere, every such segment must be part of a great circle, so the "lines" on a sphere are the great circles. This leads to awkward situations such as there being no parallel lines on a sphere (any two lines intersect).
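The difference between the two distances can be checked numerically for the pole example. This sketch assumes a unit sphere centered at the origin; the function names chord_distance and great_circle_distance are made up for the illustration.

```python
import numpy as np

def chord_distance(p, q):
    """Straight-line (Euclidean) distance through the interior of the sphere."""
    return float(np.linalg.norm(p - q))

def great_circle_distance(p, q, r=1.0):
    """Geodesic distance along the surface of a sphere of radius r."""
    cos_angle = np.clip(np.dot(p, q) / r ** 2, -1.0, 1.0)
    return float(r * np.arccos(cos_angle))

north_pole = np.array([0.0, 0.0, 1.0])
south_pole = np.array([0.0, 0.0, -1.0])

through_the_earth = chord_distance(north_pole, south_pole)          # 2 * r
along_the_surface = great_circle_distance(north_pole, south_pole)   # pi * r
```

On the unit sphere the tunnel has length 2, while the walk along the surface has length pi, about 3.14. The geodesic distance is the one that respects the manifold.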


Back to Isomap. It mainly does one thing: it changes the distance that MDS computes in the original space from the Euclidean distance to the geodesic distance on the manifold. Of course, if the structure of the manifold is not known in advance, this distance cannot be computed, so Isomap connects the data points into a neighborhood graph to approximate the original manifold, and approximates the geodesic distance by the shortest path on that graph. The following figure (not reproduced here) illustrates this.

This thing is called the Swiss roll; think of it as a rolled-up piece of cloth. The two points circled in black in the figure are very close if measured by Euclidean distance in the ambient Euclidean space, but on the manifold they are actually far apart: the red line is the distance on the manifold as computed by Isomap. You can imagine that plain MDS would brutally project the data into two-dimensional space, completely ignoring the manifold structure, whereas Isomap successfully "unrolls" the manifold before projecting it.
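The whole pipeline (neighborhood graph, shortest paths, then MDS) can be sketched in a few lines. This is a simplified illustration under several assumptions, not the reference Isomap implementation: the data is a one-dimensional arc in the plane rather than a Swiss roll, the graph uses k = 2 nearest neighbors, shortest paths use plain Floyd-Warshall, and all function names are invented here.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def graph_geodesics(D, k):
    """Approximate geodesic distances by shortest paths on a k-NN graph."""
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]       # edge to each neighbor
        G[nearest[i], i] = D[i, nearest[i]]       # make the graph symmetric
    for m in range(n):                            # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def classical_mds(D, k):
    """Embed points in k dimensions from a distance matrix D (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))

# A "rolled-up" 1-D manifold: three quarters of a unit circle in the plane.
angles = np.linspace(0.0, 1.5 * np.pi, 60)
X = np.column_stack([np.cos(angles), np.sin(angles)])

D = pairwise_distances(X)        # distances in the ambient space
G = graph_geodesics(D, k=2)      # approximate distances along the curve
Y = classical_mds(G, k=1)        # "unrolled" 1-D coordinates

euclidean_gap = D[0, -1]                  # the two ends look close: sqrt(2)
geodesic_gap = G[0, -1]                   # along the curve: about 1.5 * pi
embedded_gap = abs(Y[0, 0] - Y[-1, 0])    # the embedding keeps them far apart
```

The only difference from plain MDS is the matrix handed to classical_mds: G instead of D. With D, the two ends of the arc would be placed close together; with G, the arc is straightened out first.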

Besides Isomap, there are many other manifold embedding algorithms, including Locally Linear Embedding, Laplacian Eigenmaps, Hessian Eigenmaps, Local Tangent Space Alignment, Semidefinite Embedding (Maximum Variance Unfolding), and so on. It is hard to say which is better; each has its own characteristics and many variants. There is a Matlab demo online that runs several popular manifold embedding algorithms on some synthetic manifolds and compares the results, which gives a good intuitive feel.

On the other side, that of adapting existing algorithms to the manifold structure or even designing new algorithms around the properties of manifolds, a typical example is graph-regularized semi-supervised learning. Simply put, in supervised learning we can only use labeled data, and the (usually plentiful) unlabeled data is wasted. Under the manifold assumption, although this data has no labels, it can still help us learn the structure of the manifold, and once we know that structure we understand the original problem a little better, so it is natural to expect better results.
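One simple way to see unlabeled data helping is the harmonic-function form of graph-based label propagation. The text does not say which graph-regularized method it has in mind, so this particular formulation is an assumption, as are the two synthetic clusters, the Gaussian edge weights with bandwidth 1, and the choice of one labeled point per cluster.

```python
import numpy as np

# Two well-separated clusters of 20 points each (synthetic data).
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(20, 2))
X = np.vstack([cluster_a, cluster_b])

# Only one point per cluster carries a label; the other 38 are unlabeled.
labeled = np.array([0, 20])
y_labeled = np.array([0.0, 1.0])
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

# Similarity graph over ALL points, labeled or not.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq_dist / 2.0)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W          # graph Laplacian

# Harmonic solution: each unlabeled score is the weighted mean of its neighbors.
L_uu = L[np.ix_(unlabeled, unlabeled)]
L_ul = L[np.ix_(unlabeled, labeled)]
f_u = np.linalg.solve(L_uu, -L_ul @ y_labeled)

predicted = (f_u > 0.5).astype(int)     # class 0 or 1 for each unlabeled point
```

The labels spread along the graph, so the structure revealed by the unlabeled points decides the outcome even though only two labels were given.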

Of course, all of this rests on the same assumption: that the data is distributed on a manifold (some algorithms may have slightly looser assumptions). But which real-world data actually lies on a manifold? That is hard to say. Still, beyond typical examples such as faces and handwritten digits, some people have applied manifold-based algorithms directly to data such as text, which seems to have nothing to do with manifolds, and the results seem to be quite good.

That said, although connecting with practical applications is a very important issue, it is not decisive, especially for people working on the theory side. I think for them it is really a kind of application too, only they are applying things from mathematics to machine learning problems, and the relationships involved are intricate: graph theory, algebra, topology, geometry... It is as if the eighteen warlords had all gathered in one hall, which always makes me feel that I would need another 500 years of cramming mathematics!

However, I can understand the joy hidden in this. It is like playing games: you first play "Heaven and earth Rob: God Evil Supreme Biography", and then when you play "Heaven and earth Rob Preface: Phantom Sword Record" you find that the characters and plots gradually link together. Perhaps a completely insignificant supporting character who appeared once in one place suddenly turns up somewhere else, a whole series of clues instantly connects, and the overall picture gradually emerges. That joyful feeling is probably what people mean by everything suddenly becoming clear!

Finally, to conclude: the "learning" in machine learning means building a model and solving for its parameters from the given data. Manifold learning is the case where the model includes a manifold assumption about the data.

Reference:

http://blog.pluskid.org/?p=533

Corresponding article by the blogger on CSDN:

http://blog.csdn.net/xuexiyanjiusheng/article/details/46928771
