Big data has become the object of media hype. The ability to process vast amounts of information at very high speed has begun to upend business models worldwide. But big data, like any other technology, carries risks. Everyone who works with it should be aware: big data means great responsibility.
A recent study that I co-authored concluded that publicly available Facebook "Like" information is enough on its own to generate extremely detailed psycho-demographic profiles of users, covering race, personality, IQ scores, happiness, drug use, sexual orientation, political opinions, religious beliefs and other personal attributes.
Once you have the data, it is not hard to build a model that automatically updates itself. We drew Likes and personal attribute information for 58,000 Facebook users from their profiles and from our own questionnaires, and we have no reason to think the findings are unrepresentative.
The inferences do not depend on a handful of Likes whose link to a trait is obvious. Liking science is associated with high intelligence, but liking curly fries or Morgan Freeman is informative too. By aggregating thousands of such weak signals, one can infer personal characteristics effectively.
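As a rough illustration of the aggregation idea only, here is a minimal Python sketch. It is not the model used in the study: the page names, the weights and the `trait_probability` function are all invented for this example. It assumes each Like carries a small positive or negative weight toward some yes/no trait, adds the weights up, and converts the total into a probability.

```python
import math

# Hypothetical per-Like weights (log-odds) for a made-up yes/no trait.
# These numbers are invented for illustration; they are not study results.
LIKE_WEIGHTS = {
    "science": 0.30,
    "morgan_freeman": 0.05,
    "reality_tv": -0.20,
}


def trait_probability(likes, weights=LIKE_WEIGHTS, bias=0.0):
    """Add up the weak signal from each Like and map the sum to a probability."""
    # Likes with no known weight contribute nothing to the score.
    score = bias + sum(weights.get(like, 0.0) for like in likes)
    # Logistic function: squashes any real-valued score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # One Like barely moves the estimate; several together move it further.
    print(trait_probability(["morgan_freeman"]))
    print(trait_probability(["science", "morgan_freeman"]))
```

No single Like settles the question in this sketch. Each one shifts the estimate slightly, and the shifts accumulate as more Likes are added.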
Facebook is just the beginning. A Like is one type of digital record that can be used for inference; others include Twitter messages, emails, web searches, browsing histories, credit card transactions, and online and offline shopping records.
Like any great technology, this predictive capability can be used for good or for ill.
Rapid, automated psychological assessment could revolutionize hiring. Why not first assess millions of candidates (with their consent) and invite only a small group to interview? That would save time for recruiters and candidates alike. Why not automatically tailor products and services to personal characteristics? Imagine the Financial Times recommending online articles according to a reader's personality and mood. Or consider that open, outgoing people and conservative, introverted people searching for "overnight London" could get different results.
Of course, there is also a downside. Personalized ads may benefit both users and advertisers, but if the balance of power tilts toward advertisers, they may be able to manipulate customers. An emotionally unstable user might be nudged into buying unnecessary insurance because of their psychological traits. The ability to infer certain characteristics can even put people in danger. It is already possible to infer a user's sexual orientation or religious beliefs, which could jeopardize their safety, and not only in less liberal countries.
Once people realize that playlists, shopping histories and Likes can reveal so much, many may be put off online technology. In my opinion, such "digital exclusion" is bad for both the individual and the economy. The potential of inferring personal traits and preferences is enormous. I am not a policymaker, but I believe we should devise policies and tools that minimize the associated risks. They should follow two principles: transparency and control.
First, we need to help users understand which of their personal data is public and how that data is used now and could be used in the future. Second, we need to give users full control over their own data and let them decide for themselves how it is used. Technical solutions may already be emerging in both areas, but we also need to raise user awareness and establish a suitable legal framework.
Users should have full control over the data from which inferences can be drawn. Storage and management of personal data by third parties, such as companies and governments, has become common practice. But does it have to be this way? Imagine that your Likes or purchase records were stored not by social networks or online stores but securely on your own PC or personal cloud account. Inference would still work, but under the control of users, who could review what is being inferred about their personal characteristics.
I love Facebook. It is a great technology for connecting people. I hope to help make sure that we can keep using it knowing that our personal information is safe.
The writer is a research fellow at the Psychometrics Center of Cambridge University in Cambridge, UK. He co-authored the study of personal characteristics with David Stillwell, a colleague at the Psychometrics Center, and Thore Graepel of Microsoft Research.