Learning usually means making mistakes and taking wrong paths first, and then figuring out how to avoid those pitfalls in the future.
Machine learning is no exception.
When you use machine learning in your business, be careful: some technical marketing may tell you that machine learning is fast and effective, but that is an unrealistic expectation of the technology. The fact is that errors are bound to occur in the machine learning process. And at least for a considerable period of time, these errors will be built into business processes. As a result, they happen at scale and usually outside direct human control.
Ray Johnson, the chief data scientist of SPR Consulting, said that pushing ahead blindly, without due pragmatism and diligence, can render the benefits of machine learning almost useless.
Detecting errors in the machine learning process and dealing with them will help you achieve greater technical success and meet your expectations for machine learning.
The following are some of the ways machine learning efforts go wrong. These problems can increase the number of errors and prolong the time they persist, and the machine learning tools themselves may never recognize or correct them.
Lack of business understanding of the problem makes machine learning fail
Some data workers who use machine learning models do not really understand the business problem that machine learning is trying to solve, and this may introduce errors into the process.
Akshay Tandon, vice president and head of strategic analysis at financial services website LendingTree, said that when his team uses machine learning tools, he encourages the team to start with a hypothesis statement. The statement should spell out what problem you are trying to solve and what model you want to build to solve it.
Tandon said that from a statistical point of view, the machine learning tools available today are very powerful. That makes using them correctly an even greater responsibility, because powerful tools, used carelessly, can lead to wrong decisions with far-reaching consequences. If the data analysis team is not careful, the model it ends up with may not suit the specific data the team is trying to learn from. The result can be rapid deterioration, he said, and serious failures may follow quickly.
In addition, many business users do not understand that the quality of a model starts to decline from the moment it is put into production, Tandon said. Once they understand this, users need to monitor the model continuously, as they would a car or any other machine, and pay attention to how it affects decision-making.
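The continuous monitoring Tandon describes could be sketched as follows. This is a minimal illustration, not a method from the article: the `AccuracyMonitor` class, the window size, the tolerance, and the sample stream are all hypothetical, and it assumes the true outcome of each production prediction eventually becomes known.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy  # accuracy measured at deployment
        self.tolerance = tolerance                  # allowed drop before alerting
        self.outcomes = deque(maxlen=window_size)   # 1 = correct, 0 = wrong

    def record(self, predicted, actual):
        """Store whether one production prediction matched the observed outcome."""
        self.outcomes.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_degraded(self):
        """True once the window is full and accuracy is below baseline minus tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance


# Hypothetical stream of (predicted, actual) labels: 7 correct, 3 wrong.
monitor = AccuracyMonitor(baseline_accuracy=0.90, window_size=10, tolerance=0.05)
stream = [(1, 1)] * 7 + [(1, 0)] * 3
for predicted, actual in stream:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy())
print(monitor.has_degraded())
```

In practice the alert would feed a review or retraining process; the thresholds here are placeholders that each team would need to set for its own model.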
Poor data quality can lead to machine learning errors
Garbage in, garbage out. If the data quality is not up to standard, machine learning will be negatively affected. Poor data quality is one of the most worrying issues for data administrators. No matter how good the original intentions of data scientists and other information professionals are, poor data quality can jeopardize big data analysis and ruin their efforts. It can completely mess up the machine learning model.
Organizations of all kinds often overestimate the resilience of machine learning algorithms and underestimate the impact of bad data. Johnson said that poor data quality leads to poor results, which in turn can lead to unwise business decisions. Those decisions harm business performance and make it harder to win support for future initiatives.
You can often detect low-quality data from the results of machine learning: measured against past and current experience, the results seem to make no sense.
Johnson said that exploratory data analysis (EDA) is a proactive way to solve this problem. EDA can identify basic data quality issues, such as outliers, missing values, and inconsistent threshold values. You can also use techniques such as statistical sampling to determine whether there are enough data point instances to adequately reflect the overall distribution, and to define rules and strategies for data quality remediation.
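As a rough illustration of the kind of check EDA involves, the sketch below counts missing values and flags outliers in a single numeric column. The `summarize_column` helper and the sample `ages` data are hypothetical, and the 1.5 x IQR cutoff is a common convention assumed here, not something the article prescribes.

```python
import statistics


def summarize_column(values):
    """Report missing values and IQR-based outliers for one numeric column."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the values that are present (needs at least two values).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


# Hypothetical column: two missing entries and one implausible age.
ages = [34, 29, None, 41, 38, 250, 33, None, 36, 30]
report = summarize_column(ages)
print(report)
```

A report like this does not fix anything by itself; it gives the team concrete cases to investigate when defining remediation rules.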
Incorrect use of machine learning
Sally Epstein, an expert machine learning engineer at the consulting firm Cambridge Consultants, said: "The most common problem we still see from companies is that companies are eager to use machine learning for no other reason, just because it's fashionable." But to achieve success, she said, it must be the right tool for the job. Traditional engineering methods may provide a solution faster and at a much lower cost.
Johnson said that when machine learning may not be the best choice to solve the problem and the use case is not fully understood, it may lead to the wrong problem being solved.
In addition, solving the wrong problem results in lost opportunities, because organizations struggle to tailor their use cases to specific, inappropriate models. It also wastes the personnel and infrastructure committed to obtaining a result that simpler alternative methods could have delivered.
To avoid misusing machine learning, consider the required business outcome, the complexity of the problem, the amount of data and the number of attributes. Johnson said that relatively simple problems, such as classification, clustering, and association rules on a small amount of data with a small number of attributes, can be handled through visualization or statistical analysis. In these cases, adopting machine learning may cost more time and resources than the simpler approach.
When the amount of data becomes huge, machine learning may be more appropriate. However, it is not uncommon to go through a machine learning exercise only to discover that the business outcome was never clearly defined, so the wrong problem was solved.
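One way to test whether a problem is simple enough for plain statistics is to measure how far trivial rules get before reaching for a learned model. The sketch below is an assumption-laden illustration: the function names, the `order_value` attribute, and the `churned` labels are all hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def best_threshold_rule(values, labels):
    """Find the cutoff on one attribute that best separates two classes (0/1).

    The rule predicts 1 when value >= cutoff. Returns (cutoff, accuracy).
    """
    best = (None, 0.0)
    for cutoff in sorted(set(values)):
        correct = sum((v >= cutoff) == bool(y) for v, y in zip(values, labels))
        accuracy = correct / len(labels)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Hypothetical data: one attribute and a binary outcome.
order_value = [12, 15, 18, 22, 40, 45, 52, 60]
churned = [0, 0, 0, 1, 0, 1, 1, 1]

baseline = majority_baseline_accuracy(churned)
cutoff, rule_accuracy = best_threshold_rule(order_value, churned)
print(baseline, cutoff, rule_accuracy)
```

If a one-line rule already meets the business requirement, a full machine learning project may not be justified; if it falls well short, that gap is a concrete argument for something more powerful.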
Machine learning models may be biased
Using poor quality data sets may lead to misleading conclusions. Not only does it introduce inaccuracies and missing data, it also introduces bias. Humans may be biased, so models created or inspired by people may also contain biases.
Epstein says that each machine learning algorithm has a different sensitivity to unbalanced classes or distributions. If these problems are not addressed, you may end up with, for example, facial recognition tools whose performance depends on skin color, or models with gender bias. In fact, this has happened many times in commercial services.
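A first check for the imbalance Epstein mentions is to look at class proportions and to break accuracy down by group instead of reporting a single overall number. The following is a minimal sketch with hypothetical helper names, group labels, and data; real bias audits involve considerably more than this.

```python
from collections import Counter, defaultdict


def class_shares(labels):
    """Fraction of examples in each class; a very small share signals imbalance."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def accuracy_by_group(records):
    """Accuracy per group from (group, predicted, actual) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation results for two groups, "A" and "B".
records = (
    [("A", 1, 1)] * 9 + [("A", 0, 1)] * 1
    + [("B", 1, 1)] * 3 + [("B", 0, 1)] * 2
)

shares = class_shares([group for group, _, _ in records])
per_group = accuracy_by_group(records)
print(shares)
print(per_group)
```

In this made-up data the overall accuracy is 0.8, which hides the fact that the smaller group is served noticeably worse than the larger one.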
The accuracy of a conclusion, whether it is reached by an algorithm or a human, depends on the breadth and quality of the information being processed. Vic Katyal, head of consulting and analysis services at the consulting firm Deloitte, said that the financial, legal and reputational risks that algorithmic bias creates for organizations and individuals are why any company that uses machine learning should treat ethics as an organizational requirement.
Katyal said that signs of algorithmic bias have been well documented in public areas such as credit scoring, education, recruitment and criminal justice decisions. Improperly collected, planned, or applied data can introduce bias into even the most carefully designed and planned machine learning applications.
He said that inherently biased machine learning systems may put some customer groups or social stakeholders at a disadvantage, and may cause or perpetuate unfair outcomes.
Consulting firm McKinsey pointed out in a 2017 report that algorithmic bias is one of the biggest risks of machine learning because it compromises the very purpose of machine learning. The company stated that this is an often overlooked defect that can lead to costly mistakes and, if left unchecked, can lead projects and organizations in completely wrong directions.
McKinsey said that efforts to address this problem effectively from the beginning will pay off handsomely, allowing the true potential of machine learning to be realized.
Insufficient resources to do machine learning well
When starting a machine learning program, an organization can easily underestimate the resources it needs in terms of personnel and infrastructure. Machine learning may have a lot of infrastructure requirements, especially in image, video, and audio processing.
Johnson said that if you don't have the necessary processing power, developing a machine learning-based solution on schedule will be difficult at best and impossible at worst.
There are also deployment and consumption issues. If there is no prerequisite infrastructure to allow its deployment and users to consume the results, then what is the use of developing machine learning solutions?
Deploying a scalable infrastructure to support machine learning can be expensive and difficult to maintain. However, several cloud services provide scalable machine learning platforms that can be provisioned on demand. Johnson said that a cloud approach makes it possible to perform machine learning at scale without being constrained by the acquisition, configuration, and deployment of physical hardware.
Some organizations want to keep their infrastructure in-house. If so, cloud services can serve as a stepping stone and an educational experience, so that these organizations understand what machine learning needs from an infrastructure perspective before investing heavily.
From a personnel perspective, a shortage of knowledgeable staff, such as data scientists and machine learning engineers, may derail the development and deployment of machine learning. It is important to have people who understand machine learning concepts, their application, and how to interpret the results in order to determine whether specific business outcomes have been achieved.
Johnson said that the importance of having a wealth of machine learning skills should not be underestimated. Knowledgeable people can help identify data quality issues, ensure that machine learning tools are used and deployed correctly, and help establish best practices and management strategies.
Poor planning and lack of management can destroy machine learning
Machine learning efforts may start with enthusiasm, but then lose momentum and come to a halt. This is a sign of poor planning and lack of management.
If proper guidelines and restrictions are not adopted, machine learning work will continue indefinitely and may result in huge resource expenditures without any benefit, Johnson said.
Organizations need to remember that machine learning is an iterative process, and model modifications may continue over time to support changing requirements. With no clear end point, the people doing the work may lose interest in seeing it through, which can lead to undesirable results. Project sponsors may move on to other work, and the machine learning effort will eventually stagnate.
Johnson said that machine learning work needs to be monitored regularly to ensure things go smoothly. If progress starts to slow down, it may be time to pause and reassess the project.
Original: https://www.infoworld.com/article/3310076/machine-learning/6-ways-to-make-machine-learning-fail.html