The big data age cannot be anonymous

Source: Internet
Author: User

Big data has been very popular in recent years, but privacy has become a concern. Big data systems collect highly private information, such as medical records or shopping histories, and then run a separate de-identification step to anonymize it. In theory, this means people do not have to worry about their personal information leaking. But is it really that simple?

Anonymity in the big data era is not 100% guaranteed

In a paper published in 2006, Princeton computer scientist Arvind Narayanan showed that in a movie dataset that was supposed to be anonymous, Netflix users' rental histories could be exposed by cross-referencing it with other data. According to a report dated July 9, supporters of anonymization (people who believe anonymization is feasible) may not be aware of the dangerous vulnerabilities in big data.

Both in theory and in practice, anonymization offers only limited protection. Those who believe anonymized data is safe are creating a "false sense of security" and seriously underestimating the threat from attackers, who may well extract personal information from big data.
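The cross-referencing idea described above can be illustrated with a minimal sketch. Everything here is hypothetical: the pseudonyms, names, titles, dates, and the `link` helper are invented for illustration, and the exact-match rule is a simplification rather than the method used in the published research. The sketch only shows why a handful of overlapping records can tie a pseudonym to a named person.

```python
from collections import defaultdict

# Hypothetical "anonymized" rental records: (pseudonym, movie, rental date).
anonymized = [
    ("user_17", "Movie A", "2005-03-01"),
    ("user_17", "Movie B", "2005-03-04"),
    ("user_17", "Movie C", "2005-03-09"),
    ("user_42", "Movie A", "2005-03-02"),
    ("user_42", "Movie D", "2005-03-05"),
]

# Hypothetical public reviews posted under real names on another site.
public = [
    ("Alice", "Movie A", "2005-03-01"),
    ("Alice", "Movie C", "2005-03-09"),
    ("Bob", "Movie D", "2005-03-05"),
]


def link(anonymized, public, min_overlap=2):
    """Map each public name to the pseudonym sharing the most (movie, date) pairs."""
    by_pseudonym = defaultdict(set)
    for pseudonym, movie, date in anonymized:
        by_pseudonym[pseudonym].add((movie, date))

    by_name = defaultdict(set)
    for name, movie, date in public:
        by_name[name].add((movie, date))

    matches = {}
    for name, events in by_name.items():
        # Pick the pseudonym whose records overlap most with this person's reviews.
        best, score = max(
            ((p, len(events & seen)) for p, seen in by_pseudonym.items()),
            key=lambda pair: pair[1],
        )
        # Only report a link when enough records coincide.
        if score >= min_overlap:
            matches[name] = best
    return matches


matches = link(anonymized, public)
print(matches)
```

With this toy data, two matching records are enough to link "Alice" to one pseudonym, while "Bob" stays unlinked because only one record coincides. A realistic attack would presumably also need to tolerate approximate dates and noisy data, which this sketch does not attempt.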

Personal location data cannot be hidden

According to a report from May 2013, 95% of mobile phone users can be tracked using the large datasets collected from their phones, because they constantly leave traces online, such as logins and real-time photo sharing. Anonymization experts will tell you frankly that there is no way to hide users' geographic location information.
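One way to see why location traces are so hard to hide is to count how many people can be singled out by just a few known points. The following is a rough sketch under stated assumptions: the `traces` data, the (antenna, hour) representation, and both helper functions are hypothetical, and the toy numbers are not meant to reproduce the 95% figure from the report.

```python
from itertools import combinations

# Hypothetical traces: user -> set of (antenna, hour) observations.
traces = {
    "u1": {("A", 8), ("B", 9), ("C", 18)},
    "u2": {("A", 8), ("B", 9), ("D", 18)},
    "u3": {("A", 8), ("E", 12), ("C", 18)},
    "u4": {("F", 7), ("B", 9), ("C", 18)},
}


def is_unique(user, points, traces):
    """True if no other user's trace contains all of the given points."""
    return all(
        not set(points) <= other
        for u, other in traces.items()
        if u != user
    )


def fraction_identifiable(traces, k):
    """Fraction of users for whom at least one set of k points is unique to them."""
    identifiable = 0
    for user, trace in traces.items():
        if any(
            is_unique(user, combo, traces)
            for combo in combinations(sorted(trace), k)
        ):
            identifiable += 1
    return identifiable / len(traces)


for k in (1, 2, 3):
    print(k, fraction_identifiable(traces, k))
```

Even in this tiny example, most users are singled out by a single observation that nobody else shares, and all of them by three. Removing names from such a dataset does not change this, because the trace itself acts as the identifier.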

Experts cannot predict how fragile database defenses are

In a case study of the medical records of 110,000 patients, anonymization expert Khaled El Emam estimated that less than 1% of patient records could be re-identified. Another estimate, however, holds that more than 12% of patient records can be pinpointed, meaning attackers can easily single out target records in the database.

Anonymization is difficult, and data re-identification can be permanent

The data anonymization process is challenging and error-prone. In a recent release of roughly 100 million New York City taxi records, the people involved, including drivers, could be re-identified because the hashing method applied to driver's license numbers (hashing converts a character string into a fixed-length value or index) was too crude.
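To illustrate why a crude hash offers little protection, here is a minimal sketch. It assumes, purely for illustration, that identifiers are short all-digit strings and that they were hashed without a salt; the four-digit format, the choice of MD5, and the helper names `pseudonymize` and `build_lookup` are all assumptions, not details taken from the taxi release. When the set of possible identifiers is small, an attacker can hash every candidate in advance and simply look up each released value.

```python
import hashlib
from itertools import product
from string import digits


def pseudonymize(license_id: str) -> str:
    """Unsalted hash of an identifier (MD5 chosen only for illustration)."""
    return hashlib.md5(license_id.encode("ascii")).hexdigest()


def build_lookup(length: int = 4) -> dict:
    """Hash every possible all-digit ID of the given length."""
    table = {}
    for combo in product(digits, repeat=length):
        candidate = "".join(combo)
        table[pseudonymize(candidate)] = candidate
    return table


# Hypothetical ID format: four digits, so only 10,000 candidates exist.
lookup = build_lookup(length=4)

# A value as it might appear in the released dataset.
released = pseudonymize("4821")

# The attacker reverses it with a single dictionary lookup.
print(lookup[released])  # prints 4821
```

The weakness is not the hash function itself but the small, predictable input space. Replacing identifiers with random values, or using a keyed hash whose secret key is never published, would avoid this particular attack, although it does nothing about re-identification through the other fields in the data.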

Once someone's anonymized data has been exposed, it remains on the Internet and cannot be erased. This is much more serious than a breach of a single company or application. When a company's database is breached, the company only needs to handle security properly: fix the vulnerabilities, alert users, and carry on as usual. Users do not have to abandon all their data either; they only need to discard the compromised account.

So, do we need to smash our cell phones, give up medical care (because medical data may leak), and live in seclusion? Professor El Emam does not think so. He strongly supports anonymization and said: "It was claimed that more than 12% of patient data could be pinpointed, yet he did not re-identify the data of a single patient. If Narayanan, a leader in the re-identification field, could not do it, anonymization is very feasible."

This is good news for us in the big data age. However, the fact that big data anonymization has not collapsed does not mean that the technology is indestructible.
