In machine learning, there are many ways to build a product or solution, and each rests on different assumptions. It is often hard to tell which of those assumptions are justified. Newcomers to machine learning tend to run into a variety of problems in their work, so this article shares six mistakes that machine learning engineers often make. Take a look; we hope it helps with your work and study.

Taking the Default Loss Function for Granted

As a starting point, mean squared error is great. It is a remarkably good default, but in real-world applications an off-the-shelf loss function that was not designed for your problem rarely gives the optimal solution. Take fraud detection as an example. To align with your business goals, what you really want is to penalize false negatives in proportion to the dollar amount lost to fraud. Mean squared error may give you decent results, but it will never give you state-of-the-art results.

Related reading: Become a Machine Learning Engineer | Step 3: Choose Your Tool, which looks at what you can do with different ML tools.

Important: Always build a custom loss function that closely matches your solution goals.

Using One Algorithm/Method for All Problems

Many people finish their first tutorial and immediately start applying the same algorithm to every use case they can imagine. It is familiar, and they assume it will work as well as any other algorithm. This is a false assumption and leads to poor results. Let your data choose the model for you. Once the data is preprocessed, feed it into many different models and compare the results. You will get a good idea of which models work best and which are a poor fit.

Related reading: Become a Machine Learning Engineer | Step 2: Select a Process, on mastering your process.
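As a rough illustration of letting the data choose the model, the sketch below fits three deliberately simple candidates to the same synthetic data and compares their error on held-out points. Everything in it is an assumption made for illustration: the data-generating function, the choice of candidates, and the helper names (`fit_mean`, `fit_linear`, `fit_knn`, `mse`) are hypothetical. The code is kept dependency-free; in practice you would more likely rely on a library's models and cross-validation tools.

```python
import math
import random


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature (closed form)."""
    xm = sum(xs) / len(xs)
    ym = sum(ys) / len(ys)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    sxx = sum((x - xm) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ym - slope * xm
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression: average the k closest training targets."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: a nonlinear signal plus a little noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 6.0) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]

# Hold out the last 50 points for evaluation.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Fit every candidate on the same training data and score it on the same held-out data.
candidates = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}
scores = {
    name: mse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: held-out MSE = {score:.3f}")
print("best candidate:", best)
```

Because the synthetic signal is nonlinear, the nearest-neighbour candidate should come out ahead here. On a different dataset the ranking could easily change, which is the reason for comparing in the first place.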
Important: If you find yourself using the same algorithm over and over again, you are probably not getting the best results.

Ignoring Outliers

Depending on context, outliers may be important or may be safely ignored. In pollution forecasting, for example, large spikes in air pollution can occur, and it is worth looking at them and understanding why they happened. Outliers caused by some kind of sensor error, on the other hand, can safely be ignored and removed from the data.

From a modeling point of view, some models are more sensitive to outliers than others. AdaBoost, for example, puts a great deal of weight on outliers, whereas a decision tree may simply count each outlier as one misclassification.

Related reading: Become a Machine Learning Engineer | Step 2: Select a Process, on avoiding this error through best practices.

Important: Always look closely at your data before you start working, and decide whether outliers should be ignored or examined more closely.

Not Handling Cyclical Features Correctly

Hours of the day, days of the week, months of the year, and wind direction are all examples of cyclical features. Many new machine learning engineers do not think to convert these features into a representation that preserves information such as hour 23 and hour 0 being close to each other rather than far apart. Staying with the hour example, the best way to handle this is to compute the sin and cos components, so that the cyclical feature is represented as the (x, y) coordinates of a point on a circle. In this representation, hour 23 and hour 0 are numerically right next to each other, just as they should be.

Important: If you have cyclical features and you have not converted them, you are feeding your model garbage data from the start.
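The sin/cos encoding described above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function names `encode_cyclical` and `circle_distance` are hypothetical, and the hour-of-day period of 24 is taken from the example in the text.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value (e.g. an hour 0-23 with period 24) to (x, y) on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.cos(angle), math.sin(angle)


def circle_distance(a, b, period):
    """Euclidean distance between two cyclical values after encoding."""
    ax, ay = encode_cyclical(a, period)
    bx, by = encode_cyclical(b, period)
    return math.hypot(ax - bx, ay - by)


# As raw numbers, hour 23 and hour 0 are 23 apart; on the circle they are neighbours.
d_23_0 = circle_distance(23, 0, 24)
d_0_1 = circle_distance(0, 1, 24)
d_0_12 = circle_distance(0, 12, 24)

print(f"hour 23 vs hour 0 : {d_23_0:.3f}")
print(f"hour 0 vs hour 1  : {d_0_1:.3f}")
print(f"hour 0 vs hour 12 : {d_0_12:.3f}")
```

Both components are needed: with only the sine, for instance, hour 0 and hour 12 would map to the same value. The same function should work for days of the week (period 7), months (period 12), or wind direction in degrees (period 360).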
L1/L2 Regularization Without Standardization

L1 and L2 regularization, which penalize large coefficients, are a common way to regularize linear or logistic regression. However, many machine learning engineers do not realize that it is important to standardize features before applying regularization. Imagine a linear regression model with a transaction amount as a feature. The size of that feature's coefficient depends on the unit the amount is recorded in, so the same penalty treats it differently from features on other scales. Standardize all features to put them on an equal footing, so that regularization acts the same way across all of them.

Important: Regularization is great, but it can cause trouble if your features are not standardized.

Interpreting Coefficients of Linear or Logistic Regression as Feature Importance

Linear regression routines usually return a p-value for each coefficient. These coefficients often lead novice machine learning engineers to believe that the larger the coefficient, the more important the feature is to a linear model. This is rarely true, because the scale of a variable changes the absolute value of its coefficient. If features are collinear, coefficient weight can shift from one feature to another. The more features a dataset has, the more likely they are to be collinear, and the less reliable simple interpretations of feature importance become.

Important: It is important to understand which features affect the results, but do not assume you can simply read this off the coefficients. They often do not tell the whole story.

Doing a few projects and getting good results can feel like winning a million dollars. You worked hard and the results show you did well, but as in any other industry, the devil is in the details, and even fancy plots can hide bias and error. This list is not meant to be exhaustive; it is simply meant to get the reader thinking about all the small problems that might be hiding in a solution. To achieve good results, it is important to follow your process and to check carefully and often that you have not made some common mistake.

Source: Internet
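As a footnote to the regularization section above, the following sketch shows why units matter under an L2 penalty. It uses the closed-form ridge slope for a single centered feature, which is a simplification chosen for illustration. The transaction amounts, the target values, the penalty strength `lam`, and the helper names `ridge_slope` and `standardize` are all hypothetical.

```python
from statistics import mean, pstdev


def ridge_slope(x, y, lam):
    """Ridge slope for one feature with an unpenalized intercept: Sxy / (Sxx + lam)."""
    xm, ym = mean(x), mean(y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    return sxy / (sxx + lam)


def standardize(x):
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(xi - m) / s for xi in x]


# Hypothetical data: the same transaction amounts recorded in dollars and in cents.
dollars = [5.0, 12.0, 20.0, 35.0, 50.0]
cents = [100.0 * d for d in dollars]
y = [1.0, 2.1, 3.9, 7.2, 9.8]
lam = 100.0  # the same penalty strength in every fit

# Without standardization, the effect per dollar depends on the unit of measurement:
# the dollar-scaled feature needs a larger coefficient, so it is shrunk more.
per_dollar_from_dollars = ridge_slope(dollars, y, lam)
per_dollar_from_cents = ridge_slope(cents, y, lam) * 100.0

# After standardization, both versions of the feature are identical, and so is the fit.
slope_std_dollars = ridge_slope(standardize(dollars), y, lam)
slope_std_cents = ridge_slope(standardize(cents), y, lam)

print(f"raw, dollars: {per_dollar_from_dollars:.4f} per dollar")
print(f"raw, cents  : {per_dollar_from_cents:.4f} per dollar")
print(f"standardized: {slope_std_dollars:.4f} vs {slope_std_cents:.4f}")
```

The two raw fits describe the same data yet disagree about the effect of a dollar, purely because of the unit. The standardized fits agree, since the penalty now sees the same feature either way.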
What are the mistakes that novice machine learning engineers often make?