Competitors used EMI data in an attempt to predict the ratings given to songs by listeners. Photograph: Judith Collins/Alamy
As finales go, it couldn't have been much more tense. With the finish tantalisingly in sight, the relatively unknown frontrunner held a clear and seemingly unbreakable lead, only to find a veteran champion breaking through. And then, as the two grappled for first place, in a true Cinderella story, a third darted in from nowhere in the final moments to steal it from them both and claim the victory.
But this nailbiting finish had nothing to do with the Tour de France, the Olympics, or any other kind of traditional sporting event for that matter. Instead, it involved a battle between hundreds of data scientists around the world racing to help shape the future of the music industry. Their task: to develop an algorithm capable of predicting if a listener will love a new song.
Not that long ago such a pursuit would have been considered utter folly and best left to soothsayers and astrologers. Thanks to the sheer scale and quality of data that's now becoming available, and to the development of better algorithms through events such as this, it is now not only quite feasible but rapidly becoming a way of doing business in production industries.
This event, the Music Data Science Hackathon, is clear evidence of that because it involved the music giant EMI Music sharing its highly prized EMI Million Interview Dataset for the very first time. This is a vast and uniquely rich dataset compiled from 20-minute interviews with 800,000 music lovers from 25 different countries, recording their interests, attitudes, behaviours, and their familiarity with and appreciation of music. For the data science community in London and those further afield, through Kaggle's online platform, this was a chance to show just what can be achieved when the right kind of data meets the right minds.
Held in partnership with Data Science London, EMI Music, EMC, Lightspeed Research and Kaggle, the challenge was to use this dataset to predict the rating someone would give a song based on their demographic, the artist and track ratings, their answers to questions about musical preferences and the words they use to describe EMI artists.
With a prize fund of £6,500, we saw more than 1,300 entries submitted by 138 different teams. Some of these attended the event in person, while the rest were drawn from Kaggle's online community of 45,000 data scientists. We saw a broad range of approaches, from generalised boosted methods to random forests, singular value decomposition to matrix factorisation and collaborative filtering, with no one class of model outperforming all the others.
The results were outstanding, both in terms of quality and quantity of algorithms. However, in the end there was a very clear winning team, which came from Shanda Innovations, a tech incubator based in Shanghai and Beijing and a rising star in the Kaggle community. As in several previous Kaggle and Data Science London collaborations, the winners' code and algorithms will be open sourced.
But besides showing that it is possible to make these kinds of predictions, this event also uncovered some other nice gems, such as how women tended to be generally more positive than men, using words like "current", "edgy" and "cool" to describe songs, as opposed to "cheap", "unoriginal" and "superficial". Retired people tended to rate songs higher, while students and unemployed people often gave lower ratings. And it was interesting to see correlations between the words people used to describe the same song, often seemingly at odds with each other.
The words "noisy" and "uplifting" are one example. And similarly one person's "superficial" is another's "playful". Another consistent theme was that the characteristics commonly used by the music industry to inform their marketing, such as "age" and "gender", turned out to be not the most powerful predictors after all.
Perhaps the loudest message to take from this is how very qualitative data sets (extremely subjective survey questions about people, their relationship with the music they like, and the words they associate with different tracks) can be mined. It's a great reminder that collaboration, bright minds, and machine learning can be used to understand even a very non-technical question such as "will you like a new song?"
Jeremy Howard is president and chief scientist at Kaggle, a platform for competitive data science, specialising in predictive modelling.
http://www.guardian.co.uk/news/datablog/2012/jul/28/music-data-science-emi-predict-song-preferences?INTCMP=SRCH