Machine learning is a multi-disciplinary subject that has emerged over the past 20 years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other fields. Machine learning theory is primarily concerned with designing and analyzing algorithms that allow computers to "learn" automatically. Machine learning algorithms automatically analyze data to discover rules, then use those rules to make predictions about unknown data. Because learning algorithms rely heavily on statistical theory, machine learning is especially closely related to inferential statistics and is also known as statistical learning theory. On the algorithm-design side, machine learning theory focuses on learning algorithms that are achievable and effective. Many inference problems are computationally intractable, so part of machine learning research is devoted to developing tractable approximation algorithms.
Machine learning has been widely used in data mining, computer vision, natural language processing, biometrics, search engines, medical diagnosis, credit card fraud detection, securities market analysis, DNA sequencing, speech and handwriting recognition, strategy games, and robotics.
What is machine learning?
Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.
Iteration is important in machine learning: because models can adapt independently as they are exposed to new data, they learn from previous computations to produce reliable, repeatable decisions and results. Machine learning is not a new discipline, but it is one that has gained fresh momentum.
Due to the emergence of new computing technologies, machine learning today is very different from machine learning of the past. Although many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data, over and over, faster and faster, is a recent development. You may be familiar with the following widely publicized examples of machine learning applications:
- The heavily hyped Google self-driving car? The essence of machine learning.
- Online recommendation services like those from Amazon and Netflix? Machine learning applied to everyday life.
- Knowing what customers are saying about you on Twitter? Machine learning combined with the creation of linguistic rules.
- Fraud detection? One of the more obvious and important uses in our lives today.
Why are more and more people interested in machine learning?
The revival of interest in machine learning is due to the same factors that have made data mining and Bayesian analysis more popular than ever: growing volumes and varieties of available data, computational processing that is cheaper and more powerful, and affordable data storage.
All of the above factors mean that machine learning can produce models faster and more automatically to analyze bigger, more complex data and deliver faster, more accurate results, even on a very large scale. The result? High-value predictions that can guide better decisions and smarter actions in the real world without human intervention.
Automated model building is key to producing smart actions in the real world. Analytics thought leader Thomas H. Davenport wrote in The Wall Street Journal that with constantly changing, growing data, "...you need fast-moving modeling streams to keep up," and that you can do this with machine learning. He also said, "Humans can typically create one or two good models a week; machine learning can create thousands of models a week."
What is the application of machine learning today?
Have you ever wondered how an online retailer can instantly provide you with a quote for a product that might be of interest? Or how can a lender provide a near real-time response to your loan request? Many of our daily activities are driven by machine learning algorithms, including:
- Fraud detection
- Web search results
- Real-time ads on web pages and mobile devices
- Text-based sentiment analysis
- Credit scoring and next best offers
- Prediction of equipment failures
- New pricing models
- Network intrusion detection
- Pattern and image recognition
- Email spam filtering
What are the most popular learning methods in machine learning?
The two most widely adopted machine learning methods are supervised learning and unsupervised learning. Most machine learning (about 70%) is supervised learning. Unsupervised learning accounts for about 10%-20%. Semi-supervised and reinforcement learning techniques are sometimes used.
- Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known. For example, a piece of equipment could have data points labeled either "F" (failed) or "R" (running). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and it learns by comparing its actual output with the correct outputs to find errors; it then modifies the model accordingly. Through methods such as classification, regression, prediction, and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
- Unsupervised learning is used, by contrast, against data that has no historical labels. The system is not told the "right answer"; the algorithm must figure out what is being shown. The goal is to explore the data and find some structure within it. Unsupervised learning works well on transactional data. For example, it can identify groups of customers with similar attributes (who can then be treated similarly in marketing campaigns), or it can find the main attributes that distinguish customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering, and singular value decomposition. These algorithms are also used to segment text topics, recommend items, and identify data outliers.
- Semi-supervised learning is used for the same applications as supervised learning, but it uses both labeled and unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire). This type of learning can be used with methods such as classification, regression, and prediction. Semi-supervised learning is useful when the cost of producing a fully labeled training set is too high. An early example of this includes identifying a person's face on a webcam.
- Reinforcement learning is often used for robotics, gaming, and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three main components: the agent (the learner or decision maker), the environment (everything the agent interacts with), and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach its goal much faster by following a good policy, so the goal of reinforcement learning is to learn the best policy.
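To make the supervised case above concrete, here is a minimal, self-contained sketch (not a SAS implementation; the data and the one-feature threshold rule are purely illustrative): a classifier learns from equipment readings labeled "F" (failed) or "R" (running), then predicts labels for new, unlabeled readings.

```python
# Supervised learning sketch: learn a decision threshold from labeled
# equipment readings ("R" = running, "F" = failed). Data are illustrative.

def train_threshold(examples):
    """Pick the midpoint between the highest 'R' and lowest 'F' reading."""
    running = [x for x, label in examples if label == "R"]
    failed = [x for x, label in examples if label == "F"]
    return (max(running) + min(failed)) / 2.0

def predict(threshold, reading):
    """Label a new, unlabeled reading using the learned threshold."""
    return "F" if reading >= threshold else "R"

data = [(40, "R"), (45, "R"), (50, "R"), (80, "F"), (85, "F"), (90, "F")]
threshold = train_threshold(data)
print(threshold)                # 65.0
print(predict(threshold, 42))   # R
print(predict(threshold, 88))   # F
```

The "training" here is just finding the boundary that separates the two labeled classes; real supervised algorithms generalize this idea to many features and more flexible decision rules.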
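For the unsupervised case, here is a minimal sketch of k-means clustering, one of the popular techniques named above, on one-dimensional unlabeled data; the customer-spend values and the choice of two clusters are illustrative assumptions.

```python
# Unsupervised learning sketch: 1-D k-means with k = 2. No labels are
# given; the algorithm discovers two groups on its own.

def kmeans_1d(points, iters=20):
    centers = [min(points), max(points)]   # simple deterministic init
    for _ in range(iters):
        clusters = [[], []]
        for p in points:                   # assign each point to the
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)    # nearest cluster center
        centers = [sum(c) / len(c) if c else centers[i]   # recompute means
                   for i, c in enumerate(clusters)]
    return centers

spend = [10, 12, 11, 95, 100, 98]
centers = kmeans_1d(spend)
print(centers)   # two cluster means: low spenders and high spenders
</test>
```

No "correct answer" is supplied anywhere; the structure (two spending groups) emerges from the data itself, which is exactly the customer-segmentation use described above.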
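Semi-supervised learning can be sketched with self-training, a common approach (one of several; the data and the 1-nearest-neighbor labeler here are illustrative assumptions): a few labeled points propagate their labels to many unlabeled points, most-confident first.

```python
# Semi-supervised learning sketch: self-training. Two labeled points and
# four unlabeled points; labels spread outward, closest points first.

def self_train(labeled_pairs, unlabeled):
    labels = dict(labeled_pairs)           # point -> label
    remaining = list(unlabeled)
    while remaining:
        # Label the point we are most confident about: the unlabeled
        # point closest to any already-labeled point.
        point = min(remaining,
                    key=lambda u: min(abs(u - x) for x in labels))
        nearest = min(labels, key=lambda x: abs(point - x))
        labels[point] = labels[nearest]
        remaining.remove(point)
    return labels

labels = self_train([(0, "A"), (100, "B")], [5, 10, 90, 95])
print(labels[10], labels[90])   # A B
```

Only two examples were labeled by hand, yet all six end up labeled, mirroring the cheap-unlabeled-data economics described above.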
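Reinforcement learning's trial-and-error loop can be sketched with tabular Q-learning; the corridor environment, reward, and hyperparameters below are illustrative choices, not part of any particular product or benchmark.

```python
# Reinforcement learning sketch: Q-learning in a 5-state corridor.
# The agent moves left (-1) or right (+1); reaching the last state
# yields reward 1. Everything here is illustrative.
import random

N, ACTIONS = 5, (-1, 1)
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1
random.seed(0)

for _ in range(200):                        # episodes of trial and error
    s = 0
    while s != N - 1:
        if random.random() < epsilon:       # explore: try a random action
            a = random.choice(ACTIONS)
        else:                               # exploit current knowledge
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)      # environment transition
        r = 1.0 if s2 == N - 1 else 0.0
        # Update toward the reward plus discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(policy)
```

The three components from the paragraph above are all visible: the agent (the Q-table and action choice), the environment (the state transition and reward), and the actions (left/right). The learned policy is the "best strategy" the paragraph describes.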
What is the difference between data mining, machine learning and deep learning?
The difference between machine learning and other statistics and learning methods, such as data mining, is another hot topic of debate. In simple terms, although machine learning uses many of the same algorithms and techniques as data mining, one of the differences lies in the predictions of these two disciplines:
Data mining is the discovery of previously unknown patterns and knowledge.
Machine learning is used to reproduce known patterns and knowledge, automatically apply them to other data, and then automatically apply those results to decisions and actions.
The increasing power of computers is also stimulating the evolution of data mining into machine learning. For example, neural networks have been used in data mining applications for a long time. As computing power has increased, it has become possible to create neural networks with many layers; in machine learning parlance, these are called "deep neural networks." It is this improvement in computing power that enables machine learning to quickly process many neural network layers.
Going further, artificial neural networks (ANNs) are simply a set of algorithms based on our understanding of the brain. In theory, ANNs can simulate any kind of relationship in a data set, but in practice getting reliable results from neural networks can be very tricky. The study of artificial intelligence dates back to the 1950s and has been marked by both the successes and failures of neural networks.
Today, a new field of neural network research called "deep learning" has achieved great success in many areas where past artificial intelligence methods have failed.
Deep learning combines computing power with a special type of neural network to learn complex patterns in large amounts of data. Deep learning techniques currently work best at identifying objects in images and words in sounds. Researchers are now looking to apply these successes in pattern recognition to more complex tasks such as automatic language translation, medical diagnosis, and many other important social and business problems.
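As a minimal illustration of the "many layers" idea, here is a sketch of a forward pass through a two-layer network in plain Python; the weights, biases, and inputs are arbitrary illustrative values, and a real deep network would have many more layers with weights learned from data rather than chosen by hand.

```python
# Neural network forward-pass sketch: each layer multiplies its inputs
# by weights, adds biases, and (for hidden layers) applies a
# nonlinearity. Deep networks simply stack many such layers.

def relu(vec):
    """Rectified linear unit, a common nonlinearity in deep networks."""
    return [max(0.0, v) for v in vec]

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sums plus biases."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

x = [1.0, 2.0]                                          # input features
hidden = relu(layer(x, [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]))
output = layer(hidden, [[1.0, -1.0]], [0.0])            # one output unit
print(hidden, output)
```

Training such a network means adjusting all those weights to reduce prediction error, and it is the layer count, in the hundreds for modern deep networks, that makes the computing-power point above matter.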
Machine learning algorithms and processes
Algorithm
SAS's graphical user interfaces help you build machine learning models and implement an iterative machine learning process. You are not required to be a senior statistician. Our comprehensive selection of machine learning algorithms can help you quickly get value from your big data and is included in many SAS products. SAS machine learning algorithms include:
- Neural networks
- Decision trees
- Random forests
- Associations and sequence discovery
- Gradient boosting and bagging
- Support vector machines
- Nearest-neighbor mapping
- k-means clustering
- Self-organizing maps
- Local search optimization techniques (e.g., genetic algorithms)
- Expectation maximization
- Multivariate adaptive regression splines
- Bayesian networks
- Kernel density estimation
- Principal component analysis
- Singular value decomposition
- Gaussian mixture models
- Sequential covering rule building
Tools and processes
As we know by now, it's not just about the algorithms. Ultimately, the secret to getting the most value out of your big data is pairing the best algorithms for the task at hand with:
- Comprehensive data quality and management
- GUIs for building models and process flows
- Interactive data exploration and visualization of model results
- Comparisons of different machine learning models to quickly identify the best one
- Automated ensemble model evaluation to identify the best performers
- Easy model deployment so you can get repeatable, reliable results quickly
- An integrated, end-to-end platform for the automation of the data-to-decision process
SAS machine learning experience and expertise
SAS is constantly seeking out and evaluating new methods. We have a long history of implementing statistical methods that best solve the problems you face, and we combine that rich, complex heritage in statistics and data mining with the latest advances to ensure your models run as fast as possible, even in large enterprise environments.
We understand that fast time to value means not only fast, automated model performance, but also less time spent moving data between platforms, especially with big data. Our high-performance, distributed analytics technology takes advantage of massively parallel processing and works with Hadoop and all major databases, so you can quickly cycle through all the steps of the modeling process without moving data.