Eight common mistakes in smart applications

Source: Internet
Author: User


At this point, we have covered the basics of smart applications, and you now have an overall understanding of what they are and how to use them. You are probably eager to start writing code, and you will not be disappointed: every chapter after this one introduces new and valuable code.

But before we enter the exciting and financially rewarding world of smart applications, let's look at some common mistakes, or misunderstandings, in projects with smart functionality. You may be familiar with the eight fallacies of distributed computing (if not, see Van den Hoogen's industry commentary), a list of mistakes commonly made by programmers who develop distributed applications for the first time. In the same tradition, we will introduce eight common mistakes in smart application development.

Misunderstanding 1: Data is reliable

Many factors can make data unreliable. Before considering any specific intelligent algorithm, you must first determine whether the data can be trusted. If the data is flawed, even the smartest person will usually draw the wrong conclusions.

Data can go wrong in more ways than we can list here. The following are some representative causes:

· The data used during development may not be representative of the data in the production environment. For example, suppose the users of a social network are classified as "tall", "average", or "short" by height. If even the shortest user in the development data is quite tall, the application may reach the ridiculous conclusion that a tall person is "short".

· Data may contain missing values. In fact, unless the data is artificial, it will almost certainly contain missing values. Handling them takes considerable skill. Generally, you can either leave the missing values as they are or fill them with default or calculated values. Either choice may lead to an unstable implementation.

· Data may change. The database schema may change, or the semantics of the data stored in the database may change.

· Data may not be normalized. Suppose we are interested in the weight of each person. To draw meaningful conclusions from the weights, all of them must use the same unit: either pounds or kilograms, but never a mixture of the two.

· The chosen algorithm may not suit the data. Data comes in different forms, that is, data types. Some datasets are numerical, while others are not. Some can be ordered, while others cannot. Some are discrete (for example, the number of people in a room), while others are continuous (for example, temperature).
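
As a minimal sketch of two of the points above (missing values and mixed units), the following Python fragment converts every weight to kilograms and fills missing weights with the median of the known ones. The record layout, the field names, and the choice of the median as the fill value are assumptions made for illustration only; a real application may well need a different strategy.

```python
from statistics import median

LB_TO_KG = 0.45359237  # international pound expressed in kilograms

# Hypothetical records: a weight may be missing (None) and units are mixed.
records = [
    {"user": "a", "weight": 70.0, "unit": "kg"},
    {"user": "b", "weight": 154.0, "unit": "lb"},
    {"user": "c", "weight": None, "unit": "kg"},
    {"user": "d", "weight": 90.0, "unit": "kg"},
]


def to_kg(weight, unit):
    """Convert a weight to kilograms and reject units we do not recognize."""
    if unit == "kg":
        return weight
    if unit == "lb":
        return weight * LB_TO_KG
    raise ValueError(f"unknown unit: {unit!r}")


def normalize_weights(rows):
    """Return (weights_in_kg, missing_count); missing values get the median."""
    known = [to_kg(r["weight"], r["unit"]) for r in rows if r["weight"] is not None]
    if not known:
        raise ValueError("no known weights to derive a fill value from")
    fill = median(known)
    weights = [
        to_kg(r["weight"], r["unit"]) if r["weight"] is not None else fill
        for r in rows
    ]
    missing = sum(1 for r in rows if r["weight"] is None)
    return weights, missing


weights_kg, missing_count = normalize_weights(records)
```

Reporting the number of filled values alongside the cleaned data makes it easier to notice when the share of missing values becomes large enough to distort the results.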

Misunderstanding 2: Computing can be completed immediately

Computing any solution takes time, and the responsiveness of an application is crucial to the economic success of the business. We cannot blindly assume that the application will complete every calculation on every dataset within the required response time. Therefore, we need to test the performance of our algorithms carefully under a variety of operating conditions.
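
One simple way to act on this advice is to time the algorithm on inputs of increasing size and compare the results with the response time the application can afford. The sketch below is only an illustration: the `nearest_neighbors` function is a stand-in workload, and the half-second budget is a hypothetical figure, not a recommendation.

```python
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def nearest_neighbors(query, points, k=3):
    """Stand-in workload: return the k points closest to the query value."""
    return sorted(points, key=lambda p: abs(p - query))[:k]


RESPONSE_BUDGET_SECONDS = 0.5  # hypothetical response-time limit

timings = {}
for n in (1_000, 10_000, 100_000):
    points = [(i * 37) % 1_009 for i in range(n)]  # deterministic test data
    timings[n] = time_call(nearest_neighbors, 500, points)

# For each input size, record whether the measured time fits the budget.
within_budget = {n: t <= RESPONSE_BUDGET_SECONDS for n, t in timings.items()}
```

Taking the best of several runs reduces noise from other activity on the machine; measurements on production-like hardware and data are what ultimately matter.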

Misunderstanding 3: No need to consider the data size

When we discuss smart applications, scale matters a great deal. The scale of the data affects the application in two ways. The first is the response speed discussed in Misunderstanding 2. The second is the ability to obtain meaningful results from massive data. With only 100 users, a system may recommend movies or music very reliably. When the same algorithm faces 100,000 users, however, its results may become very poor.

In some cases, the more data an application has, the more intelligent it becomes. The impact of data scale is therefore multidimensional. You should always ask yourself: Do I have enough data? What will happen to my smart application if the data grows tenfold?
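
A little arithmetic shows why the tenfold question deserves a serious answer. Assume, purely for illustration, a recommendation method that compares every pair of users; the number of comparisons then grows roughly with the square of the number of users.

```python
def user_pairs(n_users):
    """Number of distinct user pairs a pairwise comparison must consider."""
    return n_users * (n_users - 1) // 2


small = user_pairs(100)        # 4,950 pairs
large = user_pairs(100_000)    # 4,999,950,000 pairs

# Growing the user base tenfold multiplies the work by about one hundred.
growth_factor = user_pairs(1_000) / user_pairs(100)
```

Under this assumption, moving from 100 to 100,000 users multiplies the number of comparisons by about a million, which is why an approach that works on a small dataset cannot simply be assumed to work on a large one.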

Misunderstanding 4: No need to consider the scalability of the solution

Another misunderstanding, related to Misunderstandings 2 and 3, is that a smart application can always be scaled by simply adding more computers. Do not assume that every solution is scalable: some algorithms scale, while others do not. Suppose, for example, that we want to divide billions of news headlines into groups based on similarity. Not all clustering algorithms (see Chapter 4) can run in parallel, so scalability must be considered at the application design stage. In some cases, the data can be split and the algorithm applied to the smaller pieces in parallel. If you are lucky, the algorithms chosen in the design will have parallel versions. However, because algorithms are so central to smart applications, much of the infrastructure and business logic is built around them, so the scalability of the chosen algorithms deserves attention from the very beginning of the design.
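
The idea of splitting the data and processing the pieces independently can be sketched as follows. The example computes a mean, which was chosen only because its partial results (a sum and a count per chunk) merge exactly; the helper names and the chunk count are hypothetical. Many algorithms, including a number of clustering algorithms, do not decompose this neatly, which is precisely the point made above.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def partial_stats(chunk):
    """Per-chunk work: a (sum, count) pair that can be merged exactly."""
    return sum(chunk), len(chunk)


def parallel_mean(items, n_chunks=4):
    """Compute the mean by merging independent per-chunk results."""
    if not items:
        raise ValueError("cannot compute the mean of an empty dataset")
    chunks = split(items, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in results)
    count = sum(c for _, c in results)
    return total / count


values = list(range(1, 1001))
mean_value = parallel_mean(values)  # 500.5
```

Threads are used here only to keep the sketch short and portable. For CPU-bound work in CPython, `ProcessPoolExecutor` from the same module, or separate machines, would be the more usual choice.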

Misunderstanding 5: Use the same method everywhere

It is undoubtedly tempting to reuse the same mature technology for every problem related to intelligent behavior in an application. Do your best to resist this temptation! I have seen people try to solve every problem in the world with the Lucene search engine. If you find yourself doing the same, remember the saying: if you have a hammer in your hand, everything looks like a nail.

 

Like other software, smart application software has specific areas of applicability and specific limitations. Test your favorite solution thoroughly before applying it to a new field. In addition, look at every problem from a fresh perspective: a different algorithm may solve a new problem more efficiently.

Misunderstanding 6: The computing time is always known

A typical example of this misunderstanding involves optimization problems. In some applications, a minor change in the parameters can change the computing time significantly, yet people still expect an answer within the usual response time after the parameters change. If we are merely computing the distance between two geographic locations on the earth, the computing time clearly does not depend on which locations are chosen, but this is not true of every problem. In some cases, a slight change in the data changes the computing time dramatically, sometimes from a few seconds to several hours.
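
A toy search problem illustrates how a small change in the input can change the amount of work drastically. The sketch below uses a deliberately naive exhaustive search for a subset of numbers that adds up to a target; the problem and the function name were chosen for illustration and are not taken from the book.

```python
def subset_sum_steps(numbers, target):
    """Naive include/exclude search; returns (found, nodes_visited)."""
    steps = 0

    def search(index, remaining):
        nonlocal steps
        steps += 1
        if remaining == 0:
            return True
        if index == len(numbers):
            return False
        # Try including numbers[index] first, then excluding it.
        return (search(index + 1, remaining - numbers[index])
                or search(index + 1, remaining))

    found = search(0, target)
    return found, steps


numbers = [2] * 16

easy = subset_sum_steps(numbers, 4)  # (True, 3): found almost immediately
hard = subset_sum_steps(numbers, 5)  # (False, 131071): every subset is tried
```

Changing the target from 4 to 5 turns a three-step search into one that visits every node of the search tree, because no combination of even numbers adds up to an odd target. Smarter algorithms would prune this particular case, but the general lesson stands: measure running time on realistic inputs instead of assuming it.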

Misunderstanding 7: Complex models are better

Nothing could be further from the truth. Always start with the simplest model, and then gradually add other smart elements to the solution to improve its results. The KISS (Keep It Simple, Stupid) principle will always be a good friend to software engineers.
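
In practice, starting simple often means building a trivial baseline first and requiring every more elaborate model to beat it. The sketch below assumes a rating-prediction task; the ratings, the error measure, and the `min_gain` margin are hypothetical values chosen for illustration.

```python
from statistics import mean


def mean_absolute_error(predictions, actual):
    """Average absolute difference between predicted and actual values."""
    return mean(abs(p - a) for p, a in zip(predictions, actual))


# Hypothetical movie ratings on a scale of 1 to 5.
train_ratings = [4, 5, 3, 4, 4, 2, 5, 4]
test_ratings = [4, 3, 5, 4]

# Simplest possible model: always predict the mean of the training ratings.
baseline_prediction = mean(train_ratings)
baseline_error = mean_absolute_error(
    [baseline_prediction] * len(test_ratings), test_ratings
)


def worth_adopting(candidate_error, baseline_error, min_gain=0.05):
    """Accept a more complex model only if it beats the baseline clearly."""
    return candidate_error <= baseline_error - min_gain
```

A more complex model that cannot improve on this baseline by a clear margin adds cost and risk without adding value.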

Misunderstanding 8: Models can be free of bias

Anyone who claims that a model is free of bias does so for one of two reasons: ignorance or prejudice! Bias is introduced as soon as we select a model and the data used to train the algorithm. We cannot discuss bias in learning systems rigorously here, but note that the bias in a solution always pulls it toward describing our model and our data. In other words, bias confines our solutions to the facts we already know or to the methods we used to obtain those facts, whereas generalization attempts to infer the unknown from the known.

 

This article is excerpted from the book Algorithms of the Intelligent Web.

Book details: http://blog.csdn.net/broadview2006/article/details/6702401

 
