Website Data Analysis: Hard-to-Explain Data Anomalies

Source: Internet
Author: User
Keywords Website Data analysis


When analyzing data, we always run into some anomalies for which no reasonable explanation can be found. It may help to look at them from a different angle. Why did the data fluctuate so sharply? We rack our brains and still cannot find a plausible cause. What kind of anomalies are these, and do they share anything in common? Perhaps they are not anomalies in the usual sense at all and belong in a category of their own; we might call them "hard-to-explain anomalies."

I have recently been reading "Thinking, Fast and Slow," and some of author Daniel Kahneman's points seem to offer answers. Kahneman is a master of psychology and decision science, and he explains how to avoid the pitfalls of our thinking so that we can perceive and decide more rationally. Two points from the book bear on the "hard-to-explain anomalies" above:

Regression to the mean: things fluctuate randomly for better or worse, but eventually return to their average level.

Explaining random events with causal stories: people always try to find an explanation for every change.

Di Matteo and Rafa Benitez

For regression to the mean, Kahneman gives a number of sports-related examples, and the phenomenon is indeed common in sports: why a golfer cannot repeat the previous day's form, why a player cannot reproduce the previous season's glory in the second season... This reminds me of the recent change of manager at Chelsea.

In fact, Di Matteo and Rafael Benitez have some interesting things in common: (1) both have won the Champions League as managers, and (2) neither has been fully recognized. If Di Matteo never had enough time to prove his coaching ability, Benitez's own choices are clearly what raised doubts about his.

  

In the second half of last season, Di Matteo stepped up from assistant coach to take over Chelsea as caretaker manager and went all the way to win the Champions League. Bringing the club its first "big ears" trophy was enough to earn him the permanent job at the end of the season, but his lack of managerial experience meant the demanding owner never fully trusted him. So while the fans and the club were still basking in the glory of last season's Champions League title, and the team's results no longer "lived up to" that glory, his dismissal was a foregone conclusion.

In a league as competitive as the Premier League, Chelsea could not escape regression to the mean. If last season's Champions League triumph rested on a combination of favorable factors and some luck, this season those favorable factors no longer converged on Chelsea and the luck seemed to have "run out." Returning to the earlier average level was perfectly normal, yet the fans and the club, still looking back at the recent past, clearly saw it as an "anomaly," and Di Matteo became a victim of regression to the mean.

Such events are common in football. The World Cup winners France in 1998, five-star Brazil in 2002, and Italy in 2006 all failed to escape regression to the mean and declined after winning the title. Many coaches resign right after winning, because they know how hard it is to sustain the glory (that is, to escape regression to the mean); Scolari, Lippi and others made the wise choice. The manager who takes over a champion team often has the bumpiest ride. After all, coaches like del Bosque, who kept Spain's brilliance going, are rare, and Benitez is exactly such an unlucky successor.

In 2010 Rafael Benitez replaced Jose Mourinho as manager of treble-winning Inter Milan. The aura of the treble was too dazzling, and with an aging squad and other setbacks on top, Inter were bound to regress toward the mean; less than halfway through the season, Benitez was fired. His earlier career had not been bad at all. He made his name with the wild "Night of Istanbul," but precisely because that legendary match became an insurmountable monument, even helping Liverpool win several other trophies afterwards could not truly satisfy the club and its fans. And Benitez did not choose to leave at any brilliant or near-brilliant moment (in 2007 Milan took their revenge in Athens, but Liverpool were still Champions League runners-up); he stayed until Liverpool's results had become dismal and left as a loser. Benitez really should learn from the "Silver Fox" Lippi or the cunning Mourinho what it means to quit while ahead.

This time Benitez has once again chosen a team under the halo of a Champions League title, Chelsea, although that halo has already faded. We can only wish him good luck.

Collapsed bridges and suddenly quiet classrooms

  

Resonance is one of the most common phenomena in nature; it is even said to have given rise to the Big Bang and formed the stars, the Sun, and the world. When a troop of soldiers marched across a bridge in Angers, France, resonance brought the bridge down. The example made it into junior high school physics textbooks and became our first memory of the resonance principle. But what actually caused the resonance, and with it so unusual an event as a bridge collapse? Under normal circumstances troops march across the same bridge, perhaps tens of thousands of times, before one collapse occurs. The soldiers are ordinary soldiers and the bridge is an ordinary bridge; the resonance is a random event. But precisely because the probability of such an event is so small, people always try to find a cause in the soldiers or the bridge (though sometimes the bridge really is to blame).

Then there is a question asked on Zhihu: why does a noisy classroom where everyone is talking suddenly go quiet? Most of us have probably experienced this, and it is a similar low-probability event. Everyone in the classroom is talking on and off, and normally the overall volume fluctuates around a steady level. But at some moment fewer people happen to be talking at the same time, and the volume randomly dips to a low point. At that moment everyone wonders whether something has happened. Has the teacher arrived? So they all stop talking, and the classroom suddenly falls completely silent. Everyone sensed an "anomaly" in the classroom noise and tried to find a possible cause for it.

What caused these "anomalies"?

First, regression to the mean. It usually shows up as a process in which something performs very well for a period and then returns to its normal level. This is normal, because anything influenced by many factors will always show some random fluctuation. The catch is that people always hope the good state will continue; when things fall from an excellent state, the gap is large, so the state after returning to the mean is easily seen as an "anomaly." See the following figure:

  

The curve in segment A swings up and down, but it is generally not considered unusual. Segment C, however, is easily mistaken for an anomaly, because it is natural to compare C with B rather than with the mean level of segment A (the green line shows that C and A are not significantly different). Because the figure gives the complete trend of the curve, the chance of making this mistake is reduced. But when we compare data changes over a short time, or simply look at year-over-year and period-over-period comparisons, it is easy to mistake regression to the mean for an anomaly. Data analysis should therefore take long-term trends into account. When nothing has changed qualitatively and the data has risen to a higher level, do not assume the good performance can be sustained, because it may be nothing more than normal random fluctuation.
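To make regression to the mean concrete, here is a minimal Python sketch. It assumes daily visits are pure random noise around a fixed mean; the mean of 1000, the noise of 50, and all function names are illustrative assumptions, not figures from the article. It looks at the day that follows each unusually good day and compares it with the long-run mean.

```python
import random
import statistics


def simulate_daily_visits(days, mean=1000.0, noise=50.0, seed=42):
    """Generate daily visits that fluctuate randomly around a fixed mean (assumed model)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, noise) for _ in range(days)]


def next_day_after_peaks(series, threshold):
    """Collect the value of the day that follows each day above the threshold."""
    return [series[i + 1] for i in range(len(series) - 1) if series[i] > threshold]


visits = simulate_daily_visits(10_000)
long_run_mean = statistics.fmean(visits)
# A "peak" here means more than two standard deviations above the long-run mean.
peak_threshold = long_run_mean + 2 * statistics.pstdev(visits)
followers = next_day_after_peaks(visits, peak_threshold)

# Share of peak days whose following day is back below the peak threshold.
fell_back = sum(1 for v in followers if v < peak_threshold) / len(followers)

print(f"long-run mean:              {long_run_mean:.0f}")
print(f"mean of days after a peak:  {statistics.fmean(followers):.0f}")
print(f"peak days followed by a lower value: {fell_back:.0%}")
```

In this simulated setting nothing about the site changes, yet the day after a peak almost always looks like a drop. Compared with the peak it seems like an anomaly; compared with the long-run mean it is an ordinary day.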

With regression to the mean explained, one more point needs to be made clear. Most of the time random fluctuations are small, but occasionally they are large, producing an excellent or a very poor state such as segment B above. How do we determine that such a state is also random rather than abnormal? Surely we cannot decline to treat a large fluctuation as an anomaly just because it is hard to explain?

This can be explained from the point of view of physics. First, look at the superposition principle for waves:

  

In the left figure, the two lower waves combine into a wave of larger amplitude after superposition, while in the right figure the two lower waves interfere with each other and the amplitude of the combined wave is reduced to zero. The same applies to data. An indicator is generally affected by several factors. For example, a site's visits are affected by fluctuations in multiple channels: the traffic brought by search engines, external links, social media, paid ads and other external channels is always changing, as in the following figure:
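The superposition described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: two sine waves of equal amplitude and frequency, sampled at 360 points, with hypothetical helper names. Adding them in phase doubles the amplitude; adding them half a cycle apart cancels them.

```python
import math


def sine_wave(amplitude, phase, samples=360):
    """Sample one full cycle of a sine wave with the given amplitude and phase shift."""
    return [amplitude * math.sin(2 * math.pi * i / samples + phase)
            for i in range(samples)]


def superpose(wave_a, wave_b):
    """Superposition: add the two waves point by point."""
    return [a + b for a, b in zip(wave_a, wave_b)]


def peak(wave):
    """Largest absolute displacement of the wave."""
    return max(abs(v) for v in wave)


base = sine_wave(1.0, 0.0)
in_phase = superpose(base, sine_wave(1.0, 0.0))      # crests line up with crests
opposite = superpose(base, sine_wave(1.0, math.pi))  # crests line up with troughs

print(f"single wave peak:    {peak(base):.3f}")
print(f"in-phase sum peak:   {peak(in_phase):.3f}")
print(f"opposite-phase peak: {peak(opposite):.3f}")
```

Neither input wave is unusual in either case; only their alignment differs, which is the point carried over to traffic channels.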

  

When the traffic of a single channel is abnormal, as at line a, or when external factors such as the Spring Festival or other holidays reduce traffic across all channels, as at line b, the total number of visits becomes abnormal, and these anomalies can be explained. At line c, no individual channel shows an obvious anomaly, but the random fluctuations of several channels happen to reach a low point at the same time, so the total number of visits also falls well below its normal level. The result is a "hard-to-explain anomaly."

So the mystery of these "hard-to-explain anomalies" can be solved. When many factors act on an indicator at the same time, the indicator may still behave abnormally even if none of the factors shows a significant anomaly. The probability is low, but it does happen, and it is caused by the superposition of several factors. If drilling down into the contributing sub-indicators reveals no obvious anomaly, do not strain to find an explanation for the "hard-to-explain anomaly."
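A small simulation shows how often this can happen. The sketch below is hedged: the four channel names, their means and standard deviations, and the two-sigma cutoff are all assumptions chosen for illustration, and the channels are modeled as independent Gaussian noise. It counts days on which total visits are abnormally low even though every single channel stays within its own normal range.

```python
import random
import statistics

# Hypothetical channels with assumed (mean, standard deviation) of daily visits.
CHANNELS = {
    "search": (500.0, 40.0),
    "external_links": (200.0, 25.0),
    "social": (150.0, 20.0),
    "paid_ads": (150.0, 20.0),
}


def simulate(days, seed=7):
    """Draw independent random daily visits for every channel."""
    rng = random.Random(seed)
    return [{name: rng.gauss(mu, sd) for name, (mu, sd) in CHANNELS.items()}
            for _ in range(days)]


def z_scores(values):
    """Standardize values against their own mean and standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def unexplained_low_days(days_data, limit=2.0):
    """Indexes of days whose total is below -limit sigma while every
    individual channel stays within +/- limit sigma of its own mean."""
    channel_z = {name: z_scores([d[name] for d in days_data]) for name in CHANNELS}
    total_z = z_scores([sum(d.values()) for d in days_data])
    return [
        i for i, tz in enumerate(total_z)
        if tz < -limit and all(abs(channel_z[name][i]) <= limit for name in CHANNELS)
    ]


data = simulate(20_000)
hits = unexplained_low_days(data)
share = len(hits) / len(data)
print(f"{len(hits)} of {len(data)} days ({share:.2%}) have an abnormally low total "
      "with no abnormal channel")
```

Under these assumptions such days are rare but not absent, which matches the description of line c: the drill-down finds nothing, because there is nothing to find beyond coinciding random dips.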

When reprinting, please credit the source: Website Data Analysis » "Hard-to-Explain Data Anomalies"

Original: http://webdataanalysis.net/personal-view/unexplained-anomaly/
