First lesson in deep learning

Tags: theano, mxnet, xeon e5

The concept of deep learning has become very hot in recent years, and we are fortunate to have caught up with and witnessed the rise of this wave. Before 2012, most people were unfamiliar with deep learning when it was mentioned, and for some time many remained skeptical, suspecting that this wave might go the way of sparse coding before it: popular for two or three years and then replaced by some newer technique. Later on, researchers in both academia and industry came to regret not following the wave from the very start. Indeed, since AlexNet won the ImageNet championship in 2012, deep learning methods have dominated the field of artificial intelligence for five years and counting.

Riding this wave, some people have become surfers, catching one swell after another and leading the shift from traditional methods to deep learning in their own fields, hoping to borrow what works in other domains to improve their own work. Some are working hard to apply deep learning to raise their company's performance and want to track and deploy the latest techniques as soon as they appear. Some graduate students need to keep up with the newest techniques and the reasoning behind them while also facing the pressure to publish papers and find jobs. And some practitioners, such as editors and reporters, frequently cover AI but never have time to actually study deep learning; after all that news, they nervously ask, "Will Skynet be built in our lifetime?" or "How big a threat is AI to humans?"

Obviously, no single course or book can solve all of the problems above. Given that machine learning materials in China are still relatively scarce, mostly theoretical and lacking practice modules, we began writing a deep learning tutorial last year, hoping that a series of chapters built around real cases would help everyone become familiar with, and eventually master, deep learning. Each chapter of this tutorial centers on a real problem, from the background all the way to code experiments on the PaddlePaddle platform, so that you can fully understand how the whole problem is solved with deep learning and bid farewell to wading through papers. Before the event I did not expect so many applicants. Seeing that the enrollment group includes some advanced users, I know this course will inevitably disappoint a few students: as the first lecture it can only cater to the majority, so it is designed as a moderately difficult session that introduces the most basic concepts of deep learning and makes it easier to get started. If you are an advanced user (able to run a deep learning model yourself, or already familiar with the common ideas), you may prefer to go straight to the tutorial for self-study; of course, if you are interested, please continue to follow our series of courses.

First, a preview of the follow-up courses in this deep learning series. In this tutorial, we will cover the following:

    1. Getting started for beginners
    2. Digit recognition
    3. Image classification
    4. Word vectors
    5. Sentiment analysis
    6. Text sequence tagging
    7. Machine translation
    8. Personalized recommendation
    9. Automatic image generation

In this lesson, we will take a first look at deep learning and, through some of its useful or interesting applications, learn its fundamentals and how to work with it.

First, what is deep learning

In traditional machine learning, we have to design a specific solution for each task. For images, people used to spend a great deal of time designing all kinds of descriptors to represent them; for text, a single machine translation task was often built from multiple models, such as word alignment, word segmentation or tokenization, rule extraction, and syntactic analysis, where the errors of each step accumulate into the next, making the whole translation unreliable and making it very complicated to track down where an error came from. The advantage of deep learning is that it compensates for these problems: on the one hand, it reduces the dependence on heavy manual feature engineering, so that images, text, and other domains can be modeled directly from raw data; on the other hand, an end-to-end network model (a single network that models the mapping from input to output directly, without intermediate steps) avoids the accumulation of errors across multiple stages.

Deep learning uses multi-layer neural networks, and it relies on big data and hardware.

    • Big Data
      In this era of data explosion, the general perception is that big data is no longer a problem. But that is not entirely true. From a domain perspective, general image classification and language-model training can obtain large numbers of samples from search engines, but for fine-grained image classification (such as telling different kinds of flowers apart) or dialogue data in professional domains (such as legal consultation), data is scarce. From an application perspective, images, text, and speech are easy to collect, but for supervised training you also need the corresponding labels, such as who the speaker of a recording is or what text it corresponds to, and producing those is a big project. This requires us to make good use of existing resources; the simplest approach, for example, is to first learn data representations from a large amount of unlabeled data, which can reduce the amount of data that needs to be labeled.

    • Hardware
      Deep learning requires strong computational power, so GPUs are needed for parallel acceleration, and using such hardware has become a broad consensus in academia and industry when training deep networks. During 2016, the share prices of GPU makers such as NVIDIA and AMD soared. It is fair to say that this leap in growth comes from the application of GPU chips to gaming, virtual reality, autonomous driving, data centers, and other fields with high-performance-computing needs.

      Each GPU card has multiple (typically dozens of) streaming multiprocessors (SMs), and each multiprocessor contains hundreds of CUDA cores. A kernel instance of a multithreaded program executes on a single SM, and the operations within the kernel instance are distributed to different CUDA cores to execute independently. So, as long as the work is allocated properly, the more processors a GPU has, the faster it runs. For example, the Titan X (GM200) card has 24 multiprocessors with 128 CUDA cores each, 3,072 CUDA cores in total, and is 5.3~6.7 times faster than a 16-core Xeon E5 CPU [1], which matters a great deal for applications with real-time requirements. A quick arithmetic check of these numbers follows below.
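As a quick sanity check of the figures above, the short Python sketch below recomputes the core count and puts the quoted speedup in perspective; the numbers are taken from the text (and the whitepaper it cites), not queried from a real device, and real cards vary by architecture.

    # Figures quoted in the text above; real devices vary by architecture.
    sms = 24                              # streaming multiprocessors (SMs) on a Titan X
    cores_per_sm = 128                    # CUDA cores per SM
    total_cuda_cores = sms * cores_per_sm
    print(total_cuda_cores)               # 3072

    # The cited whitepaper [1] reports a 5.3x-6.7x speedup over a 16-core Xeon E5.
    cpu_cores = 16
    print(total_cuda_cores / cpu_cores)   # 192 CUDA cores per CPU core, yet "only"
                                          # ~6x faster: each CUDA core is far weaker
                                          # than a CPU core, so the gain comes from
                                          # massive parallelism, not per-core speed.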

Second, applications of deep learning

Deep learning covers a very wide range of applications. We start here with a few interesting ones to give you a basic picture; the examples commonly used in industry will be explained in detail in the follow-up courses.

A minimalist self-driving car

The concept of self-driving cars has been very hot in recent years, and researchers from both traditional automakers and Internet companies are working in this direction. For students just getting started with neural networks, we first introduce a small task. As shown in the figure, a remote-controlled toy car drives along a lane with a GoPro camera mounted on top. The blue line in the figure represents the vertical baseline, and the red line indicates the car's heading at each moment. Our goal is to produce a driving plan from the car's current heading and the current image data.

A neural network can be used here: the network's input is the current image, its output is the direction to steer, and the whole thing is treated as a regression problem, with the input image processed by a multi-layer convolutional neural network. At this point some readers may object: couldn't I just use basic image processing (say, binarize the image and then detect connected components) to find the left and right lane lines, and then steer toward the point between them? Indeed you could. This example only serves to illustrate end-to-end training in deep learning, using the simplest possible version of a self-driving car: clear lane markings and no traffic lights, obstacles, or other interference. In real situations we have to handle many more cases, such as the vehicle ahead, lane keeping, obstacle detection, and traffic-light detection, which requires designing and integrating multiple models. In this simplest setting, lane detection really can be done with image processing plus hand-written rules and no training data at all, but then the programmer has to patch the rules by hand for every bad case, and the next programmer who inherits the code can only sigh. A small sketch of the regression formulation is shown below.
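To make the "image in, steering value out" formulation concrete, here is a minimal sketch, assuming PyTorch is available; it is not the course's actual model, and all layer sizes and the dummy data are purely illustrative.

    # A small convolutional network that maps one camera frame to a single steering
    # value, trained as a regression problem (sketch only; sizes are illustrative).
    import torch
    import torch.nn as nn

    class SteeringNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(64, 1)        # one scalar: the steering direction

        def forward(self, x):                   # x: (batch, 3, H, W) camera frames
            return self.head(self.features(x).flatten(1))

    model = SteeringNet()
    frames = torch.randn(8, 3, 120, 160)        # dummy batch standing in for GoPro frames
    angles = torch.randn(8, 1)                  # recorded headings for those frames
    loss = nn.MSELoss()(model(frames), angles)  # regression loss against the recorded angle
    loss.backward()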

Turning photos into paintings

A paper from 2015 [5] linked the painter Van Gogh with deep learning, achieving an "artistic photo" effect by transferring the style of an art painting onto an everyday photograph. The approach is to design a neural network whose loss function is a weighted sum of two terms, diff(photo, generated work) and diff(art painting, generated work), where diff measures the difference between two images. If diff were computed pixel by pixel, that would clearly be unreasonable: the generated work must differ greatly from the photo at the pixel level, and while it may be close to the art painting in color, it is certainly far from it pixel by pixel. What we really want is something more abstract: for example, the generated image should still contain the "cat" from the photo, while its painting style resembles the artwork. So the hidden layers of a neural network are used as the space in which these diffs are measured.

Photo reproduced from: http://phunter.farbox.com/post/mxnet-tutorial2
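For readers who prefer code, the sketch below spells out the weighted-sum loss just described. It is only a sketch: `features` is a small random convolutional network standing in for the fixed, pre-trained network whose hidden layers define the comparison space, and the actual method of Gatys et al. [5] additionally compares Gram matrices of activations for the style term, which is omitted here.

    import torch
    import torch.nn as nn

    # Stand-in for a fixed, pre-trained feature extractor (illustrative only).
    features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

    def style_transfer_loss(photo, artwork, generated, alpha=1.0, beta=1e3):
        f_photo, f_art, f_gen = features(photo), features(artwork), features(generated)
        content_diff = torch.mean((f_gen - f_photo) ** 2)  # diff(photo, generated work)
        style_diff = torch.mean((f_gen - f_art) ** 2)      # diff(art painting, generated work)
        return alpha * content_diff + beta * style_diff    # weighted sum of the two diffs

    photo, art = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    generated = photo.clone().requires_grad_(True)         # optimize the generated image itself
    loss = style_transfer_loss(photo, art, generated)
    loss.backward()                                        # gradients flow back to `generated`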

Machine translation

The two examples above are applications of deep learning to images; it matters just as much for text. Unlike images, text is sequential information, and deep neural networks handle such data differently, but the basic ideas transfer. For example, once you understand how image classification is done with deep learning, text classification is only a small change: map a short piece of text to a feature vector and classify it, replacing the convolutional neural network used to understand images with a recurrent neural network that handles sequential information. Similarly, machine translation (using a computer to translate between languages) can be done in much the same way. In the deep learning approach, a recurrent neural network first "understands" a Chinese sentence, mapping it to a semantic representation, which may be a vector or a sequence of states; this step is called "encoding". Then another recurrent neural network takes that semantic representation and emits one English word at a time; this step is called "decoding". The encoding-decoding structure completes the machine translation. This is only a plain-language sketch of the idea; interested students can refer to the machine translation chapter of the tutorial or the follow-up courses.
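The encoding-decoding idea can be written down in a few lines. The sketch below (PyTorch, with illustrative vocabulary sizes and dimensions, and teacher forcing on the target side) is only meant to show the structure, not a production translation model.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab=5000, tgt_vocab=5000, dim=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, dim)
            self.encoder = nn.GRU(dim, dim, batch_first=True)  # "coding": source -> semantic state
            self.decoder = nn.GRU(dim, dim, batch_first=True)  # "decoding": state -> target words
            self.out = nn.Linear(dim, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            _, state = self.encoder(self.src_emb(src_ids))     # semantic summary of the source
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
            return self.out(dec_out)                           # one word distribution per step

    model = Seq2Seq()
    src = torch.randint(0, 5000, (2, 12))   # toy batch of source-language token ids
    tgt = torch.randint(0, 5000, (2, 15))   # target-language tokens (teacher forcing)
    logits = model(src, tgt)                # shape: (2, 15, tgt_vocab)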

Writing poems for you

Having read the above, how would you go about generating a poem from a given word? Some students have probably already thought of it: write poems with the translation model. Indeed, machine translation can be reused in many places; you only need to change the data set. If we want the machine to "create" a poem from a given word, we simply set the translation model's input to that word and its output to the verses. In practice this is usually problematic, because the input sequence is very short while the output is very long, so the dependency between them cannot be fully exploited, and forcing two such sequences together tends to make the machine "recite" the training corpus rather than truly understand the semantics. Therefore, some work generates the first line of the poem from the keyword, then generates the second line from the first, and so on; alternatively, the first n-1 lines can be used to generate the n-th line. Interested students can try it themselves, or try the poem-writing modules already built into some existing apps.

Product recommendation

Product recommendation is a focus for e-commerce sites and news apps: they care about capturing user interests, and the quality of the recommendation system often has a large impact on user retention and purchases. The most basic recommendation strategies are recommending popular items and recommending items related to what the user has browsed, purchased, or favorited. For the large number of items a user has never browsed, traditional recommendation generally uses collaborative filtering, i.e., recommending what users with similar interests liked, or content-based filtering, i.e., recommending items similar to those the user has browsed; both involve obtaining user similarity and product/item similarity. On one hand, deep learning can be used to model these similarities; on the other hand, user features and product/item features can be mapped into the same space and compared there, turning strategy A (collaborative filtering) and strategy B (content-based filtering) into a strategy C.
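As a sketch of "mapping user features and product features into the same space", the toy model below scores a user-item pair by the dot product of two learned embeddings; the sizes and IDs are illustrative, and a real system would feed in far richer features than raw IDs.

    import torch
    import torch.nn as nn

    class TwoTower(nn.Module):
        def __init__(self, n_users=10000, n_items=50000, dim=64):
            super().__init__()
            self.user_emb = nn.Embedding(n_users, dim)
            self.item_emb = nn.Embedding(n_items, dim)

        def forward(self, user_ids, item_ids):
            u = self.user_emb(user_ids)   # user representation
            v = self.item_emb(item_ids)   # item representation
            return (u * v).sum(-1)        # similarity score in the shared space

    model = TwoTower()
    # Score two candidate items for user 3; higher score = stronger recommendation.
    scores = model(torch.tensor([3, 3]), torch.tensor([10, 999]))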

Third, the shortcomings of deep learning

Having talked about how good neural networks are, let us look at some of their flaws, or at least problems that are currently hard to solve.

The Tesla accident

Those who follow Tesla will have noticed that last year a 23-year-old Chinese man, driving a Tesla electric car along the Handan section of the Beijing-Macau Expressway in Hebei, failed to dodge a road sweeper working ahead of him; the resulting crash killed him.

Tesla has not disclosed its internal algorithms; we only know that Autopilot used technology from the Israeli company Mobileye in its driver-assistance system [3]. Mobileye is a vision-based company with many years of work on advanced driver assistance systems (ADAS), mainly doing vehicle detection and lane recognition with deep neural networks on images returned from a monocular camera. Mobileye itself had reminded Tesla that the system could only serve as an assistance feature and was neither perfect nor a guarantee of the owner's safety. Tesla has said that the family was unwilling to provide more information, so the exact cause of the Autopilot failure cannot be determined. But whether it was because China's distinctive road sweepers never appeared in the monocular-vision training data, or because of image-quality problems caused by special factors such as lighting, it shows that the safety of autonomous driving cannot be guaranteed by vision technology alone. We cannot simply attribute this accident to the flaws of deep learning, but in real systems it is indeed very hard at present to fully locate and fix problems within an end-to-end system; the following example may deepen this understanding.

Interpretability

In the earlier section "What is deep learning", we said that deep learning can use end-to-end learning to avoid the accumulated errors of multi-step pipelines, but this is also a flaw: we cannot pinpoint where a problem occurs. Below, a bad case in image classification is used as an illustration.

Krizhevsky, author of AlexNet, the champion of the 2012 ImageNet competition, has pointed out that although AlexNet works very well, optimizing performance on the data set requires building a deep neural network with an enormous number of parameters, and such a network overfits very easily. At CVPR 2015, Anh Nguyen presented a method for generating samples that can "fool" deep neural networks used for image recognition [4]. For the eight pictures shown below, the labels underneath are the recognition results of the best-performing network on the ImageNet data set (each with confidence above 99.6%): the network identifies what we would regard as mere ripples and noise as a king penguin, starfish, baseball, electric guitar, freight car, remote control, peacock, and African grey parrot. Samples like these, which easily "fool" a neural network, are called adversarial examples.

Deep learning tries to imitate the neurons of the brain and fit them with a neural network, but the learning process is different. When people learn, they review whatever they do not understand and correct whatever they got wrong, that is, they make local adjustments; deep learning usually determines all of the network's parameters from all of the samples, hoping to reach a solution that is optimal over the whole training set. And when people learn what a "penguin" is, they neither deliberately memorize a fixed set of features (such as color or posture) nor need to look at hundreds of pictures to grasp the pattern: we can tell that the three pictures below all show the same species, while a neural network does not learn such a concept easily and often needs pictures of penguins of many varieties and in many poses.

Similarly, when a neural network gives a wrong result, we cannot adjust just a few parameters the way the human brain learns; even if we could, it is unclear for an end-to-end network which parameters to adjust and how. This is the limitation of deep learning's interpretability.

Acknowledgements

Thank you for subscribing to this GitChat session. The tutorial mentioned at the beginning was written to make our PaddlePaddle deep learning platform easier to use; we welcome you to follow it, study it, and give us your valuable feedback. Thanks also go to the many volunteer students who worked on this tutorial together. Open source is not easy in China, and writing tutorial and demo documents is harder still; we hope that interested partners will join us in promoting interesting tutorials that can be shared with the open source community.

References:

    1. https://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf
    2. https://www.quora.com/What-is-the-difference-between-Teslas-Autopilot-system-and-Googles-driver-less-car
    3. http://wccftech.com/tesla-autopilot-story-in-depth-technology/4/
    4. Nguyen A, Yosinski J, Clune J. Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 427-436.
    5. Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.

Chat transcript

Speaker profile: Zhang Ruiqing, member of the official PaddlePaddle development team, graduated from the College of Computer Science at Zhejiang University. Focuses on deep learning; current research directions are dialogue and question answering over images and text. Weibo: Rachel____zhang.

Q: Most of the deep learning cases we see are in image processing, speech recognition, and so on. Is there any advantage to using deep learning for regression-style prediction on ordinary data, such as forecasting the sales of a product on an e-commerce site?

A: Compared with basic linear regression, the advantage of deep learning is that as the model goes deeper there are more parameters, and the non-linearities strengthen the model's capacity to understand the data. As for whether you can forecast sales, it really depends on your data.

    1. For products with little data (which in practice is most of them), piling on parameters is useless, because the model will overfit;
    2. The input features must be chosen correctly. If the product is seasonal (firecrackers, say), then training on January 2016 data and predicting January 2017 is fine, but predicting from March 2016 data will definitely fail.

Q: For the poem-writing example, is the first line generated with a seq2seq model from one word or from several?

A: Yes. Keywords (one or more) generate the first line, and line i generates line i+1 (or the first i lines generate line i+1); this is a common approach.

Q: When doing Chinese sequence tagging with BI_LSTM+CRF, how do I add a well-proven manual feature, such as a "word suffix"?

A: To add suffixes, you can set the input to a word embedding plus a suffix embedding; but in fact I do not recommend adding suffix features directly.

The paper "Learning Character-level Representations for Part-of-Speech Tagging", for example, shows that there is no need to add suffixes by hand. Word-level embeddings capture semantic information, while char-level embeddings capture morphological information, including suffixes, so you can simply use char-level embeddings directly, which is also more convenient.
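A minimal sketch of that suggestion: concatenate a word-level embedding with a character-level representation (here, the final state of a small char LSTM), so suffix and other morphological information is learned rather than hand-engineered. All sizes are illustrative, and the result would feed into the BI_LSTM+CRF tagger.

    import torch
    import torch.nn as nn

    class WordCharEmbedding(nn.Module):
        def __init__(self, word_vocab=20000, char_vocab=100, word_dim=100, char_dim=25):
            super().__init__()
            self.word_emb = nn.Embedding(word_vocab, word_dim)
            self.char_emb = nn.Embedding(char_vocab, char_dim)
            self.char_rnn = nn.LSTM(char_dim, char_dim, batch_first=True)

        def forward(self, word_id, char_ids):
            # word_id: (batch,), char_ids: (batch, max_chars) for the same words
            w = self.word_emb(word_id)                   # semantic information
            _, (h, _) = self.char_rnn(self.char_emb(char_ids))
            c = h[-1]                                    # morphological (suffix) information
            return torch.cat([w, c], dim=-1)             # joint representation per word

    emb = WordCharEmbedding()
    vec = emb(torch.tensor([42]), torch.randint(0, 100, (1, 6)))  # shape: (1, 125)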

Q: I work in fintech and have two questions. First, on the deep learning side, if I just use TensorFlow or PaddlePaddle roughly as a black box and mainly make minor changes to existing models, how much of an effect can that have? Second, in fintech I now see people building hedge funds on top of DL; in your judgment, is there room for more achievements in finance? There is a lot of hesitation when the results are hard to explain.

A: On whether a hedge fund can be built on deep learning: I can only say that it is doable from the point of view of time-series signals, and there is a company that (at least claims it) has used deep learning to model its investments, but they will not disclose their algorithms.

Personally, I think the biggest problem for hedge funds is overcoming noise in the data. Natural signals (real images, speech) contain relatively little noise, and the noise can even be modeled fairly accurately; financial data, especially domestic financial data, is extremely noisy.

I have used financial data to classify quantitative investors, and even that was very hard; predicting a trend, as you want to, is harder still. In other words, if you really want to do it well, you have to bring many factors into the model: the overall news (an interest-rate cut), collective sentiment (money being tight before the New Year), news about an individual stock (the two largest shareholders getting divorced), and the historical trend. Then it might be worth a try.

Q: Can traditional image-processing methods be combined with deep learning? If so, can you give an example, say for image segmentation? Thank you; I have not started with deep learning yet, hence this question.

A: On combining them, my understanding is that, for image classification for example, the classifier on top of manual features can be swapped from an SVM to something else, such as a neural network classifier. For segmentation, FCN ("Fully Convolutional Networks for Semantic Segmentation") is a classic paper: the input is an image, the target output is the ground-truth segmentation, and a fully convolutional network is fitted to it. One issue is that an ordinary image-recognition network outputs one score per category, with no spatial information; FCN addresses this by treating the fully connected layer as a convolution applied over the full extent of the feature map.
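The "fully connected layer as a convolution" point can be shown in a few lines: the sketch below copies the weights of an ordinary classification head into a 1x1 convolution, so the same parameters yield a spatial map of class scores instead of a single vector. The class count and feature sizes are illustrative, and the original FCN paper converts larger fully connected layers in the same spirit.

    import torch
    import torch.nn as nn

    n_classes, channels = 21, 512
    fc = nn.Linear(channels, n_classes)                    # ordinary classifier head
    conv = nn.Conv2d(channels, n_classes, kernel_size=1)   # "convolutionized" head

    # Reuse the same weights: each class row of the FC layer becomes a 1x1 filter.
    conv.weight.data = fc.weight.data.view(n_classes, channels, 1, 1).clone()
    conv.bias.data = fc.bias.data.clone()

    feat = torch.randn(1, channels, 16, 16)                # spatial feature map from a backbone
    score_map = conv(feat)                                 # (1, 21, 16, 16): per-location class scores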

Q: The article focuses on applications of deep learning and their PaddlePaddle implementations. Can you say more about the implementation of the PaddlePaddle framework itself?

A: Request noted; wait for the series to schedule it, and we will devote a separate session to it next time.

Q: Many current applications use supervised learning. In which directions might unsupervised learning make substantial progress?

A: Among the more successful applications of unsupervised learning are clustering and dimensionality reduction. Earlier work such as PCA and K-means has competitive counterparts in deep learning, for example autoencoders based on signal reconstruction. There are also some fairly mature papers based on unsupervised learning, although the examples we gave here are all supervised: "Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition" and "Unsupervised Feature Learning for Audio Classification Using Convolutional Deep Belief Networks" are two such papers, both rather old; most work today is still supervised.

Another class of unsupervised learning is generative models, including the recently popular GAN as well as the VAE, which can generate pictures without supervision: the model is trained so that the images it generates look like the pictures in the training set. We will add this content in the second phase of the tutorial, so stay tuned. Tutorial address: https://github.com/paddlepaddle/book.
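As a minimal illustration of the "signal reconstruction" idea mentioned above, the sketch below shows the shape of an autoencoder: no labels are needed, and the reconstruction error is the training signal. Sizes are illustrative, and this is only one unsupervised approach among those mentioned.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
    decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

    x = torch.rand(16, 784)                   # e.g. flattened 28x28 images, no labels needed
    recon = decoder(encoder(x))               # compress to 32 dimensions, then reconstruct
    loss = nn.functional.mse_loss(recon, x)   # reconstruction error drives the training
    loss.backward()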

Q: What are the basic skills needed for deep learning?

A: I am not quite sure what is meant by "basic" here. If I had to define it, deep learning essentially helps the data express itself better: where you once had to define many features by hand, you can now feed in the raw data (the article gave the corresponding image and text examples). As for basic skills, the most fundamental is the ability to program; I consider everything else secondary. The mathematics is not difficult at the start; later on, analyze each specific problem as it comes. My suggestion is that you do not need to begin by working through basic probability theory.

Q: Deep learning has applications in poetry, image generation, news writing, and other fields. Are there also applications in music composition? What is the current state of the art? Can you recommend some related papers?

A: Search Google for "RNN-RBM"; the second result is a blog post where I summarized this. There are also six more examples there of using deep learning to generate music.

Q: I am currently doing hot-topic analysis on text, using association analysis algorithms. The difficulties we have run into: 1. How do we turn the phrases produced by the Apriori algorithm into concrete issues? Right now people have to read them to locate the problems, which takes a huge amount of labor; is there a machine learning method for this? 2. Besides association analysis, what other algorithms would you recommend? 3. Later we will have to handle long texts; is association analysis still appropriate, and do you have any algorithm suggestions?

A: I see. That can actually be turned into a text classification problem; your input is presumably long text. Doing it without supervision is difficult, and it also depends on how much accuracy you require. For direct text classification, see GitHub; that is the supervised route. For the unsupervised route, you can try exploiting the relationships between adjacent phrases.

Q: I have some broader questions. There are many machine learning algorithms; in which scenarios does deep learning apply, and when does it not? That is the first. Second, when should one use an existing framework and when develop one's own algorithms? Third, do you optimize algorithms and programs for the GPU?

A: You can also use text similarity to do clustering, or exploit the assumption that adjacent sentences are semantically related, as I said. In general, deep learning only suits big data: if the amount of data is small, the large number of parameters in deep learning will cause overfitting, so for small data I recommend rules, or trying to reduce the number of parameters. On the second question, there are already too many frameworks, so I recommend not reinventing the wheel. On the third: yes, every layer of a PaddlePaddle neural network has both a CPU and a GPU implementation.

Q: What are PaddlePaddle's advantages over Google's TensorFlow?

A: As an insider my own words do not count, so here is Caffe author Jia Yangqing's evaluation of PaddlePaddle. To summarize:

    1. High-quality GPU code;
    2. Very good RNN design;
    3. Very clean design, without too many abstractions;
    4. Let me add a few more: support for single-machine and distributed multi-machine computation on both CPU and GPU;
    5. Very friendly to newcomers, unlike TensorFlow, which demands a deeper understanding of neural networks from its users;
    6. The outer layers wrap a number of mainstream models, some of which are already covered in our tutorial.

Q: I am a complete novice in this area; I currently work as a front-end developer and am studying linear algebra and Python. I am prepared to spend a long time getting started, but I do not have a good study plan. Should I learn Python first and then start working with a framework, or first build a solid foundation in linear algebra and statistics and then look at neural network material?

A: I suggest learning Python alongside everything else. Deep learning requires a lot of hands-on experience, so learn while doing; it will also help you understand neural networks.

Q: Where should a programmer begin learning math? How can one apply it to AI faster? How should an ordinary programmer navigate the new wave of AI?

A: Where should a programmer begin learning mathematics? That depends on what you plan to learn; "mathematics" is very broad...

How to apply it to AI faster? Clone the PaddlePaddle code this evening, get it running tomorrow, and start on the tutorial the day after, for about a month. (If you set up the development environment with Docker it takes seconds; compiling from source is slower and may have pitfalls.) In the end it comes down to engineering experience; in short, if the build goes smoothly you are fine, and if you run into problems you can open an issue.

How should an ordinary programmer navigate the new wave of AI? Do what you are good at.

Q: When an existing network does not perform well, apart from improving the features, how can one quickly find other avenues for improvement?

A: Personally I feel that changing the features/data is the quickest. Beyond that, it is case by case: to change the network you need to work out whether it is overfitting or underfitting, and then decide whether to add more layers or add tricks such as dropout.

Q: Does PaddlePaddle have an autograd feature, and if not, will it be developed? The current PaddlePaddle documentation says that to add a new layer you must write the forward and backward computations by hand, whereas Theano/TensorFlow/MXNet have autograd and only the forward pass needs to be written.

A: As far as I understand it has not been developed yet. TensorFlow has autograd, but TensorFlow and Theano are comparatively slow; there is an efficiency benchmark comparing PaddlePaddle with several other platforms.

Q: How large does the training set need to be when using deep learning algorithms? I experimented earlier without really understanding this, and with too little data the results were poor; how should one judge the amount of training data required?

A: Look at the training curves and check whether the model is overfitting.

Q: For various reasons the training data cannot fully guarantee a reasonable distribution of the features. If DL is used without manually engineered features, will this lead to worse results than other machine learning algorithms?

A: It is possible, which is why I suggested above that when the amount of data is too small you should use rules.

Q: First, as a beginner, how should I start using TensorFlow and other open-source AI software? Second, in what situations should supervised learning be used? And in what situations should unsupervised learning be used, or the two mixed?

A: The official websites actually have detailed download and build instructions. Supervised learning: there are labels, and the labels are plentiful. Unsupervised: no labels. Mixed: there are labels but not enough labeled samples, so you first learn features with unsupervised training to initialize the model, and then use the labeled data for supervised learning.

Q: How can I find areas where deep learning could be applied? Or: how does an ordinary researcher, through everyday conversations, find a domain problem on which some DL algorithm could be tried? And for a problem in a given area, how can one decide early on that it is not worth trying?

A: Read papers: see what problems their conclusions claim to solve, and how other papers criticize them; that way you will find problems still waiting to be solved, or pick up other people's ideas. Honestly, the best advice is to first have a target problem, and then find out whether any paper addresses it.

The best early judgment is whether the scenario is likely to have enough data.

Q: 1) How should an ordinary Java programmer learn the basics to get started? Please recommend a few thin introductory books. 2) What should a programmer in the big-data field (MapReduce, Spark, and the like) add to their knowledge for machine learning, and what is a relatively easy development path? 3) What does the full project structure of a practical deep learning application look like, and what kinds of people are needed?

A: 1) For basic knowledge, I honestly think our online tutorial is a very good start. For introductory books, on reflection the best are actually foreign doctoral dissertations; you can also refer to Professor Zhou Zhihua's "Machine Learning" (the watermelon book), and Andrew Ng's course is fine too. 2) Same as above. 3) The process is basically: 1. process the data; 2. design the network; 3. tune/adjust the network; 4. if some functionality is missing from a framework such as PaddlePaddle, develop it yourself; then iterate.

Q: Apart from getting more training data, is there any good advice on the problem of feature distribution? The issue is that the distribution of features within each category of the training data is not ideal. For example, in some sentiment classification tasks the attribute words should be evenly distributed across categories, with the sentiment words carrying the signal, but in the actual training data the attribute distribution is often uneven. There are many similar feature-distribution issues, and they bias the results obtained from those features.

A: In other words, the features are not discriminative. For example, "this movie is really good" vs. "this movie is really not good" differ by a single word; if the model cannot tell them apart, the model is not good enough.

That usually means the embedding for "movie" is poorly trained. Traditional methods do suffer from this, but DL with a sequence model can in theory understand the semantics, unless your data is extremely unbalanced, for example, all the movie reviews you collected happen to be negative... In that case you can only fabricate data, that is, artificially create positive samples about "movies".

Thanks to the People's Posts and Telecommunications Press; the winners of this chat were presented with the book "The Acme of Science: A Ramble on Artificial Intelligence".
