Many people frown when they encounter technical debt, but technical debt is not inherently bad. For example, when a release has to ship before a deadline, taking on technical debt is a reasonable tool. But technical debt has the same problem as financial debt: when the time comes to repay it, we pay back more than we borrowed, because technical debt compounds.
Experienced teams know when to pay down accumulated debt, but technical debt in machine learning piles up very quickly. You can take on months' worth of debt in a single day. Even the most experienced team can, through a moment of carelessness, run up a debt that takes half a year to repay, which is enough to kill a fast-iterating project.
Here are three wonderful papers that explore this issue:
Machine Learning: The High-Interest Credit Card of Technical Debt, NIPS'14 (NIPS: Conference on Neural Information Processing Systems)
Hidden Technical Debt in Machine Learning Systems, NIPS'15
What's Your ML Test Score? A Rubric for ML Production Systems, NIPS'16
These papers introduce dozens of machine learning anti-patterns that may become time bombs in the software infrastructure. Here, I will only discuss the three most important anti-patterns.
Feedback loop
A feedback loop arises when the output of a machine learning model is indirectly fed back into its own input. It sounds easy to avoid, but in practice it is not. Feedback loops come in many variations. The NIPS'14 paper gives a very typical example; I will give a more realistic one here.
Example
Suppose your company runs a shopping site. The back-end team proposes a recommendation system that decides whether to pop up a notification based on the customer's profile and purchase history. Naturally, you train this system on whether customers clicked or ignored previous notifications (this is not yet a feedback loop). After launching the feature, you are thrilled to see the notification click rate climb higher and higher, and you credit the growth to artificial intelligence. What you don't know is that the front-end team has implemented a fixed threshold: if a recommendation's confidence is below 50%, the notification is hidden, because they don't want to show customers potentially bad recommendations. Over time, recommendations that used to fall in the 50-60% confidence range are scored below 50%, so that in the end only the most confident recommendations in the 50-100% range are left. This is a feedback loop. Your metric has grown, but the system has not improved. Rather than only exploiting the machine learning system behind a fixed threshold, you also need to let it explore.
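The shopping-site scenario can be sketched as a toy simulation. Everything here is an assumption made for illustration: customers are grouped into hypothetical segments with a fixed true click probability, the "model" is just the click rate logged so far for each segment, and hidden notifications are still logged as ignored. It is a minimal sketch of how a fixed threshold can inflate the metric, not a description of any real recommender.

```python
import random

THRESHOLD = 0.5  # the front end hides notifications below this confidence


def simulate(rounds=30, segments=200, per_round=20, seed=0):
    """Return one (segments_shown, expected_click_rate) pair per round."""
    rng = random.Random(seed)
    # Hypothetical ground truth: a fixed click probability per customer segment.
    true_rate = [rng.uniform(0.3, 0.9) for _ in range(segments)]
    # Initial training data, collected before the threshold existed.
    clicks = [sum(rng.random() < p for _ in range(per_round)) for p in true_rate]
    sent = [per_round] * segments

    history = []
    for _ in range(rounds):
        # The "model": confidence is the click rate logged so far.
        confidence = [c / n for c, n in zip(clicks, sent)]
        shown = {i for i in range(segments) if confidence[i] >= THRESHOLD}
        # Metric the team watches: expected click rate of what is displayed.
        metric = sum(true_rate[i] for i in shown) / len(shown) if shown else 0.0
        history.append((len(shown), metric))
        for i in range(segments):
            sent[i] += per_round  # every recommendation is logged as sent
            if i in shown:
                clicks[i] += sum(rng.random() < true_rate[i] for _ in range(per_round))
            # Hidden notifications cannot be clicked, so they count as ignored
            # and push the segment's confidence further down.
    return history


history = simulate()
print("first round:", history[0])
print("last round: ", history[-1])
```

In this sketch a segment that once drops below the threshold can never come back, because its logged click rate can only fall. The displayed click rate rises while the underlying click probabilities stay exactly the same.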
In a small company, controlling feedback loops is relatively easy, but in a large company where dozens of teams work on dozens of interrelated complex systems, a feedback loop can easily go unnoticed.
If a metric slowly improves over time even though no optimization has been launched, a feedback loop is probably at work. Finding and fixing such loops is not easy because it requires cross-team collaboration.
Correction cascade
A correction cascade arises when the machine learning model does not learn what you wanted it to learn and you end up patching its output. As the patches pile up, you build a thick layer of heuristics on top of the machine learning model; this layer is the correction cascade. It is very tempting to put filters on the output of a machine learning system to handle the rare special cases the model has not learned.
The correction cascade decorrelates the metric that the machine learning model optimizes during training from the overall metric of the entire system. As the cascade grows more complex, you can no longer tell whether a change to the machine learning model will improve the final metric, and ultimately you no longer know how to improve the system.
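The decorrelation can be illustrated with a small, fully made-up example. The data, both model versions, and the two patches below are hypothetical; the patches are assumed to have been tuned against the first model. The sketch shows how a strictly better model can make the patched system worse.

```python
THRESHOLD = 50  # scores are integer percentages to avoid float rounding

# Hypothetical examples: (interest, is_new_customer, on_sale, clicked)
EXAMPLES = [
    (70, True, False, False),
    (40, False, True, True),
    (90, True, False, True),
    (20, False, True, False),
    (60, False, False, True),
    (30, False, False, False),
]


def model_v1(interest, is_new, on_sale):
    """Old model: has not learned the new-customer and sale effects."""
    return interest


def model_v2(interest, is_new, on_sale):
    """Retrained model: has learned both effects itself."""
    return interest - 30 * is_new + 20 * on_sale


# Correction cascade: heuristics stacked on the output, tuned against model_v1.
PATCHES = [
    lambda score, is_new, on_sale: score - 30 if is_new else score,
    lambda score, is_new, on_sale: score + 20 if on_sale else score,
]


def system(model):
    """Wrap a model with the full patch layer, as the product would run it."""
    def predict(interest, is_new, on_sale):
        score = model(interest, is_new, on_sale)
        for patch in PATCHES:
            score = patch(score, is_new, on_sale)
        return score
    return predict


def accuracy(predict):
    hits = sum((predict(i, n, s) >= THRESHOLD) == clicked
               for i, n, s, clicked in EXAMPLES)
    return hits / len(EXAMPLES)


for name, model in [("v1", model_v1), ("v2", model_v2)]:
    print(name, "model:", round(accuracy(model), 2),
          "system:", round(accuracy(system(model)), 2))
```

Here the model-level accuracy goes up from v1 to v2 while the system-level accuracy goes down, because the patches now correct errors the model no longer makes.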
Hobo features
Hobo features are features that are useless to the machine learning system yet cannot simply be dropped. There are three types of hobo features:
Bundle features
Sometimes, when a new group of features has to be evaluated together, we bundle them and submit them as a whole. Unfortunately, only some of the features in the bundle are useful, while others may even be counterproductive.
ε-features
Sometimes we are tempted to add a feature even if it yields only a tiny improvement in quality. But if the underlying data changes slightly, such a feature can quickly become useless or even counterproductive.
Legacy features
Over time, we keep adding new features to the project without re-evaluating the existing ones. After a few months, some of these features may have become completely useless or been superseded by newer features.
In a complex machine learning system, the only practical way to remove hobo features is to try removing them one at a time: delete a single feature, retrain the machine learning system, and evaluate it with your metrics. If training takes a day and we can run 5 trainings in parallel, then with 500 features it takes 100 days to try removing each of them. Unfortunately, features may interact with each other, which means you would have to try removing every possible subset of features, and the cost grows exponentially.
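The one-at-a-time procedure and its cost can be sketched as follows. The function names and the toy scoring table are hypothetical; `toy_evaluate` merely stands in for "retrain the system and compute the metric", which in practice is the expensive step.

```python
import math


def removable_features(features, evaluate):
    """Leave-one-out ablation: one retraining per candidate feature."""
    baseline = evaluate(features)
    removable = []
    for name in features:
        remaining = [f for f in features if f != name]
        if evaluate(remaining) >= baseline:  # metric did not get worse
            removable.append(name)
    return removable


def leave_one_out_days(n_features, parallel_runs, days_per_run=1):
    """Wall-clock days needed to try removing each feature once."""
    return math.ceil(n_features / parallel_runs) * days_per_run


def subset_count(n_features):
    """Number of feature subsets an exhaustive search would have to consider."""
    return 2 ** n_features


# Hypothetical stand-in for training: each feature adds a fixed amount to the metric.
TOY_GAIN = {"price": 30, "recency": 20, "legacy_flag": 0, "noisy_id": -5}


def toy_evaluate(features):
    return sum(TOY_GAIN[f] for f in features)


print(removable_features(list(TOY_GAIN), toy_evaluate))
print(leave_one_out_days(500, 5), "days for 500 features, 5 runs in parallel")
print(len(str(subset_count(500))), "digits in the number of subsets of 500 features")
```

The toy metric is additive, so features do not interact and leave-one-out is enough. With real interactions that assumption fails, and the subset count shows why an exhaustive search is out of reach.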
The combined effect
If your machine learning infrastructure contains all three of these anti-patterns, it spells disaster for the entire project.
Because of the feedback loops, your metrics will not reflect the true quality of the system, and the machine learning model will learn to exploit these loops instead of learning something useful. In addition, over time the engineering team may inadvertently shape your model to exploit these loops even more.
The correction cascade weakens the correlation between the metric measured directly on the machine learning model and the metric of the system as a whole. A change may have a positive effect on the machine learning model, yet its impact on the overall system metric is random.
Because of hobo features, you won't even know which of your hundreds of features are useful, and the cost of clearing them all out is too high. In day-to-day work, the metrics you monitor may fluctuate randomly because some junk features behave randomly.
In the end, the project's metrics fluctuate randomly, do not reflect actual quality, and cannot be improved. The only way out is to rewrite the whole project from scratch.