Data Basketball
The NBA's measurable data is now vast, and every small action a player takes on the court can deepen our understanding of his value. Let's step into the big data age together.
Author: Kirk Goldsberry
On February 13, 2013, the San Antonio Spurs visited Cleveland to play the Cavaliers in a tense game. Late in the fourth quarter, Cavaliers rookie guard Dion Waiters hit the biggest shot of his rookie season, a difficult jumper that put his team ahead by 2 points with 9.5 seconds left. The problem was that he had left the Spurs 9.5 seconds. The crowd could already taste the win, and the fans at Quicken Loans Arena were starting to lose control.
The Spurs called a timeout, which let them inbound the ball in the frontcourt, and they decided to run one of their favorite plays. Matt Bonner quickly inbounded to Tony Parker, 30 feet from the rim. Tim Duncan then set a solid screen that forced Tyler Zeller to switch onto Parker, and Parker seized the moment and drove hard to his left. With 6.7 seconds left in the game, Parker was under control and ready to attempt one of his high-percentage layups to tie the score. Suddenly he saw something and changed his mind.
Kawhi Leonard waited quietly in the weak-side corner, and no one noticed him. Because Parker's drive was so decisive and aggressive, it drew Leonard's defender, Waiters, into the paint. Leonard stood alone in that no-man's-land. Parker noticed at once and, with a flick of the wrist, delivered a precise pass along the baseline straight to Leonard; Waiters's desperate leap was in vain. The rest was routine. The hard part of the play was already done, and Leonard only had to take his favorite shot, a corner three, from his favorite spot. He made it, and the Spurs won by one point.
In the box score, this clever play is reduced to a few basic numbers: the Spurs' second-year player Kawhi Leonard is credited with 1 shot attempt, 1 make and 3 points. Tim Duncan's screen goes unrecorded, and Parker's sharp drive off the screen and brilliant pass are credited as a single assist.
After the game, Parker talked about the final play: "I think I could have made that layup, but I saw the open man. I'm not just trying to win the game. I'm trying to get my teammates involved so they play well all night, so I'm willing to make the right decision in the end."
The creation of a research group
Shortly after the 2012 MIT Sloan Sports Analytics Conference, I received a call from Brian Kopp, the evangelist of the NBA player-tracking data world and head of the SportVU project at Chicago-based Stats LLC. I was working at Harvard at the time, and Kopp offered to share a remarkable basketball dataset for academic use. He asked whether I would like to "play with some optical tracking data." I jumped at the chance, even though I had absolutely no idea what I was going to do with it.
A few weeks after the call, I saw the raw data for the first time, data that could change the way basketball is analyzed in many ways, perhaps forever. It was definitely a "holy shit!" moment. I was using a huge 27-inch Apple computer at the time, but when I double-clicked the first SportVU file, the data immediately filled the entire screen. All I could see was a vast ocean of decimal points and tracking data, interspersed with hundreds of XML tags. I realized at once that this was the "biggest" data I had ever seen. I'll never forget how surprised I was to see the tracking data follow the players on the screen from one game to the next. I had thousands of these files, and I knew I needed help.
I found Luke Bornn, a young professor of spatial statistics, and told him about my predicament. Luke suggested that we set up a research group at the school to build projects with the data. The group soon attracted 4 doctoral students in statistics and computer science. By the beginning of 2013, each student had started a different project. We call these projects "XY Hoops."
Dan Cervone and Alex D'Amour were our first two members. Both are 27-year-old, fourth-year doctoral students in statistics at Harvard. They both like sports, but they like coding with data even more. After looking over the pile of data, we held a few brainstorming sessions. They then joined our team and proposed a revolutionary, almost impossible idea.
The plight of the forerunners
We all want the best analytical tools, but pioneers inevitably face the problem that the best tools do not yet exist. No single metric can explain life, and you will not find a single measure that explains basketball either. In contemporary sports analytics it is hard not to oversell the role of "big data," but idealizing it is very risky. Data is necessarily a simplified intermediary that links players' performance to statistical analysis, and sports analytics is built on a large mechanism of encoding and decoding whose premise is a flawed assumption: that data can represent sport.
But the reality is that the NBA's new commissioner, Adam Silver, had cameras installed in every arena in 2014 to measure each player's movements. The tracking cameras hanging at the top of each arena generate thousands of megabytes of data, potentially vital information for video staff and trainers. Our new bottleneck is not data but people. Analysts are always overworked, and they lack hardware and software support as well as professional training. The hardest problem since the project began, though, has been how to carry out these newly created tasks.
Still, in the hands of smart, well-equipped statisticians, the SportVU data is truly astonishing, and the vast information it may hold will help us better understand the league we love. In Kopp's words, "We've only done some basic research on the data, and it takes a lot of time and effort to turn this data into advanced analytics and methodologies." "The NBA's big data age is just beginning, and people tend to treat it as a slam dunk that will excite teams, players, the media and, more importantly, the fans." We can't guarantee that, but to borrow Parker's words, we are just trying to make sure we "make the right decision in the end."
Why Innovate
Tony Parker is one of the best offensive playmakers in the world. For more than 10 years he has driven the Spurs and powered their disciplined, methodical offense. Although he has won 3 championships and a Finals MVP (Note 1), Parker remains underrated and is not considered a true superstar. This year Parker is again an All-Star reserve, behind a young shooting prodigy. Perhaps this is because he is a foreigner; perhaps it is because he plays in central Texas, a small market.
Note 1: Not to mention that last year Parker almost won his fourth championship and a second Finals MVP.
But perhaps it is because our data undervalues some of the "details" of what players like Parker do on the court and overvalues the numbers that are easiest to quantify, such as points, rebounds and assists.
On the one hand, we can't deny the importance of Leonard's game-winning three-pointer in Cleveland; after all, he is the one who made the shot. On the other hand, giving Leonard all the applause is like giving George Clooney all the credit for Gravity.
"We practiced that play 1,000 times, so I knew we could do it," San Antonio coach Gregg Popovich said after the game.
If we compare traditional basketball statistics to chess, we find that we attach too much importance to each individual move and ignore the overall strategy that connects those moves. A chess game is rarely decided by the last move alone, and the same is true of every basketball possession. The final shot is not the whole story; players like Parker and Chris Paul help their teams gain good position in many ways.
In the big data age, our current statistical system, the box score, is a pure record-keeping mechanism, and an honest and reliable one. But it is simply a product of pencil-and-paper record-keeping that does not really measure the role and contribution of the 10 players on the court. It is true that the box score has been useful enough to carry us from the Bill Russell period through the Michael Jordan years and into the LeBron James era. The concepts derived from it have evolved into what we call "advanced stats" and "basketball analytics."
Over the past few decades, pioneers such as Ken Pomeroy, Dean Oliver and John Hollinger have brought basketball analysis into the computer age. They have made effective use of spreadsheets and of other formulas and analytical methods specific to the computer age. We need to keep learning from their thinking, because innovation in this field continues.
Concept, definition and presentation
Early in the spring semester of 2013, Cervone and D'Amour set out to build a new measure of NBA performance: expected possession value (EPV). The motivation behind the idea is simple, but the work needed to produce the estimates is not. Their core assumptions are:
Every "state" of a basketball possession has a value. This value is determined by the probabilities of the basketball events that may follow, and it equals the total expected points for the possession. The average NBA possession is worth close to 1 point, and the exact expected value fluctuates over time; these fluctuations are caused by the many unpredictable events that happen on the court.
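The core assumption above, that a possession's value is the probability-weighted total of the points its possible outcomes would yield, can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' actual model: the `Outcome` type, the function name and every number below are hypothetical and are not taken from SportVU data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One way the possession could continue from the current state."""
    name: str
    probability: float      # chance this outcome happens next (hypothetical)
    expected_points: float  # average points the possession yields if it does


def expected_possession_value(outcomes):
    """Return the probability-weighted average of points over all outcomes."""
    total_probability = sum(o.probability for o in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * o.expected_points for o in outcomes)


# Made-up numbers for illustration only.
state = [
    Outcome("shot", 0.5, 1.2),
    Outcome("pass", 0.4, 1.0),
    Outcome("turnover", 0.1, 0.0),
]
value = expected_possession_value(state)
print(round(value, 2))
```

As the ball and the players move, the probabilities change, so recomputing this value moment by moment would show the fluctuations the assumption describes.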
Not only that, they are also convinced that, using the SPORTVU data, the inspiration we can &m