Big data has been very popular in recent years, but privacy has become a concern. Big data systems collect highly private information about people, such as medical records or shopping histories, and then run it through a de-identification process to make it anonymous. In this way, people supposedly do not have to worry about their personal privacy being leaked. But is it really that simple?
Anonymity in the big data era is not 100%
In an article published in 2006, Princeton computer scientist Arvind Narayanan said that in online movie data that was supposed to be anonymous, Netflix users' rental histories could be exposed by cross-referencing the data with other sources. According to a July 9 report, supporters of anonymization (people who believe anonymization is workable) may have no inkling of the dangerous vulnerabilities in big data.
Anonymization falls short both in theory and in practice. Those who believe it is safe are creating a "false sense of security" and seriously underestimating the threat posed by data hackers, who may well be able to extract personal information from big data.
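The cross-referencing described above can be illustrated with a small sketch. It is not the method used in the 2006 article; it only assumes that an attacker has a second, public dataset in which named people have logged some of the same (title, date) events. The function name `link_records`, the threshold `min_matches`, and all sample data are hypothetical.

```python
from collections import defaultdict

def link_records(anonymous_ratings, public_reviews, min_matches=2):
    """Link a pseudonymous user to a named reviewer when exactly one
    reviewer shares at least `min_matches` (title, date) events."""
    # Index the public data: (title, date) -> names that logged that event.
    public_index = defaultdict(set)
    for name, title, date in public_reviews:
        public_index[(title, date)].add(name)

    matches = {}
    for pseudo_id, events in anonymous_ratings.items():
        # Count how many events each named person shares with this user.
        overlap = defaultdict(int)
        for event in events:
            for name in public_index.get(event, ()):
                overlap[name] += 1
        candidates = [n for n, c in overlap.items() if c >= min_matches]
        if len(candidates) == 1:  # accept only unambiguous links
            matches[pseudo_id] = candidates[0]
    return matches

# Hypothetical sample data for illustration only.
anonymous_ratings = {
    "user_0193": [("Movie A", "2005-03-01"), ("Movie B", "2005-03-04"),
                  ("Movie C", "2005-04-10")],
    "user_0777": [("Movie A", "2005-03-01"), ("Movie D", "2005-05-02")],
}
public_reviews = [
    ("alice", "Movie A", "2005-03-01"),
    ("alice", "Movie B", "2005-03-04"),
    ("bob", "Movie D", "2005-05-02"),
]

print(link_records(anonymous_ratings, public_reviews))
```

In this toy data, only "user_0193" shares two events with a single named reviewer, so only that pseudonym is linked. Removing names from the first dataset did nothing to prevent the match.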
Personal location data cannot be hidden
According to a May 2013 report, 95% of mobile phone users can be singled out from the large datasets collected from people's phones, because users constantly leave traces online, such as logins and real-time photo sharing. Anonymization experts will tell you frankly that there is no way to hide users' geographic location information.
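One way to see why location data resists anonymization is to measure how many users have a trace that is unique given only a few known points. The sketch below is an illustrative assumption, not the procedure of the report cited above: each trace is modeled as a set of hypothetical (cell, hour) points, and `fraction_unique` is a made-up helper name.

```python
from itertools import combinations

def fraction_unique(traces, k):
    """Share of users for whom at least one set of k (cell, hour) points
    is contained in no other user's trace."""
    unique_users = 0
    for user, points in traces.items():
        others = [p for u, p in traces.items() if u != user]
        for combo in combinations(sorted(points), k):
            # The user is exposed if nobody else shares all k points.
            if not any(set(combo) <= other for other in others):
                unique_users += 1
                break
    return unique_users / len(traces)

# Hypothetical traces: sets of (cell tower, hour of day) observations.
traces = {
    "u1": {("cell_12", 8), ("cell_40", 13), ("cell_12", 19)},
    "u2": {("cell_12", 8), ("cell_40", 13), ("cell_07", 19)},
    "u3": {("cell_12", 8), ("cell_33", 13), ("cell_07", 19)},
}

print(fraction_unique(traces, 1))
print(fraction_unique(traces, 2))
```

Even in this tiny example, knowing two points is enough to single out every user, although no trace carries a name.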
Experts cannot predict how fragile a database's defenses are
In a case study of the medical records of 110,000 patients, anonymization expert El Emam estimated that less than 1% of patient records could be re-identified. Another estimate, however, put the share of patient records that could be pinpointed at more than 12%. Attackers can easily pinpoint target records in the database.
Anonymization is difficult, and data re-identification can be permanent
The data anonymization process is challenging and error-prone. In the recent release of roughly 100 million New York City taxi records, drivers could be re-identified because the hashing applied to driver's license numbers (hashing converts character strings into fixed-length values or index values) was too crude.
If someone's anonymized data is exposed, it stays on the Internet and cannot be erased. This is far more serious than an intrusion into a company or application. When a company's database is breached, the company only needs to handle security properly: fix the vulnerabilities, alert its users, and carry on as usual. It does not have to throw away its data; only the compromised accounts need to be discarded.
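A short sketch shows why hashing an identifier drawn from a small set of possible values offers little protection. The details here are assumptions for illustration: the 6-digit numeric format is hypothetical, and MD5 stands in for any unsalted hash, since the text above does not say which hash function was used.

```python
import hashlib

def build_reverse_table(num_digits=6):
    """Hash every possible ID of a hypothetical fixed-width numeric format
    and map each digest back to the ID that produced it."""
    table = {}
    for n in range(10 ** num_digits):
        candidate = f"{n:0{num_digits}d}"
        # Unsalted hash: the same input always yields the same digest.
        digest = hashlib.md5(candidate.encode()).hexdigest()
        table[digest] = candidate
    return table

# A "pseudonymized" value as it might appear in a published dataset
# (the ID 482913 is made up for this example).
published = hashlib.md5(b"482913").hexdigest()

table = build_reverse_table()
print(table[published])  # prints 482913
```

Because every possible input can be enumerated in about a second, the hash acts as a lookup key rather than as a disguise. A keyed hash with a secret key, or random identifiers unrelated to the original values, would avoid this particular weakness.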
So do we need to smash our cell phones, give up medical care (to avoid medical data leaks), and go live in seclusion? Professor El Emam does not think so. He strongly supports anonymization, pointing out that although Narayanan estimated that more than 12% of patient records could be pinpointed, he did not actually re-identify a single patient. In El Emam's view, if even Narayanan, a leader in the field of re-identification, could not do so, then anonymization is very feasible.
This is good news for those of us living in the big data age. Still, the fact that big data anonymization has not collapsed does not mean that the technology is unbreakable.
The big data age cannot be anonymous