Proactive strategies for managing big data privacy

Source: Internet
Author: User
Keywords: big data, personal data


It seems that everyone is paying attention to big data, even the US federal government. At the end of last year, the Federal Trade Commission (FTC) ordered nine companies in the data brokerage industry to provide information about the consumer data they collect and how they use it. The FTC's action makes it clear that while the rise of big data offers many commercial benefits, it also raises significant privacy concerns.



Why use big data?



Big data differs from the data warehouses of the past because it can analyze almost any type of data file or format, including images, video, and data collected from social media. Another characteristic of big data is that it has no one-to-one relationship with a particular data store. Instead, it relies on virtualized architectures that draw content from large content stores and files and treat it as a single global resource.



For business executives and line managers, the biggest motivation for using big data is to produce more accurate and detailed forecasts or predictions that can benefit the enterprise. Big data offers a wide range of business advantages, from developing and improving products and optimizing pricing to screening job applicants' resumes and designing effective marketing campaigns. Political campaigns have begun to use big data analysis as well: the 2012 Obama campaign used it to identify likely voters and then persuade them, raising campaign funds and winning votes in the process. This was a key strategy in Obama's eventual victory.



Big data privacy issues



The FTC's recent action is aimed specifically at data brokers: companies that collect and analyze data about consumer behavior and then sell the results to companies that want to improve their marketing and sales performance. It is important to recognize, however, that the growing privacy problems created by big data are not limited to these traditional data brokers. The Economist Intelligence Unit (the independent business unit within the Economist Group) has identified 19 industry sectors that use big data, including manufacturing, IT and technology, financial services, professional services, healthcare, pharmaceuticals and biotechnology, and consumer goods. There is no doubt that the big data revolution has begun.



Given the characteristics of big data and the business motives for using it, the most critical privacy issue is, simply put, the quality or accuracy of the data and the harm an enterprise may do to individuals when it uses that data to make decisions. For example, how accurate is personal information obtained from social media? Can information from social media or other sources be used to screen or rank job applicants, or to raise the price of health insurance? Basic personal data, such as age, marital status, education, or employment, is usually unverified. Free e-mail services perform no such validation, and almost all users simply click to accept the terms of use and privacy statement, thereby agreeing to give up some degree of privacy so that their data can be aggregated.



Another quality problem is that Internet search terms or phrases, once collected, may be misinterpreted. Examples of companies using bad data include using Internet search terms to set product pricing or to identify potential target customers. Bear in mind that a home computer may have multiple users, and there are many reasons why someone might search the web for topics that have nothing to do with them personally. Collecting, analyzing, and using this kind of data can produce flawed results and lead to erroneous decisions that ultimately hurt both the individuals concerned and the business doing the analysis. This lack of control over big data quality points to another privacy principle: collect only personal data that is relevant to and appropriate for a stated purpose.



Best practices for big data privacy



Best practices for how enterprises should handle big data are still taking shape, but some lessons have already been learned that help drive big data innovation without sacrificing the privacy of personal data.



The first step in using big data effectively is to procure and manage cloud services properly. This is a prerequisite for making big data cost-effective: most businesses cannot or will not invest in the IT infrastructure needed to support big data initiatives, and instead rely on cloud applications, infrastructure, and processing capacity. Even those willing to invest will find it difficult to succeed without the flexibility that cloud computing offers. This dependence exposes a weakness of many businesses, which are generally unable to ensure the security and privacy of their data in the cloud. It is not enough for an enterprise to rely on standard, generic security terms in a contract. For each specific data privacy control, the cloud service provider and the cloud service customer must both be clear about the responsibilities each one bears. Cloud services must also be continuously monitored and audited, with relevant metrics that demonstrate data integrity, confidentiality, and availability. A good resource on data protection in cloud services is the Cloud Security Alliance (CSA), which makes a number of guidance documents available on its website.



Experience suggests that when deploying cloud services, it is best to prototype big data projects in a public cloud and then move them to a private cloud. Why? Public cloud deployments, as the name suggests, run in a third party's environment and may be accessible to "untrusted" parties. Private cloud deployments are controlled and managed directly by the organization or enterprise, and even though the computing facilities may be located outside the enterprise, they can be accessed only by trusted parties.



The next strategy for making better use of big data is to deploy converged storage. Converged storage is more efficient and reduces the likelihood of errors that affect data quality or accuracy. Its key feature with respect to data quality and accuracy is data deduplication, which is also cost-effective.
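To make the idea of deduplication concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: storage products typically deduplicate at the block or file level, whereas this example works on individual records, and the function name `deduplicate` and the sample fields are hypothetical.

```python
import hashlib


def deduplicate(records):
    """Return records with duplicates removed, keeping the first occurrence.

    Each record is a dict. Two records count as duplicates when their
    normalized field values produce the same SHA-256 digest.
    (Hypothetical helper for illustration only.)
    """
    seen = set()
    unique = []
    for record in records:
        # Normalize case and surrounding whitespace so trivial
        # differences do not hide a duplicate.
        canonical = "|".join(
            f"{key}={str(record[key]).strip().lower()}" for key in sorted(record)
        )
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Made-up sample data: the first two rows describe the same person.
customers = [
    {"name": "Ann Lee", "zip": "30322"},
    {"name": "ann lee ", "zip": "30322"},
    {"name": "Bob Ray", "zip": "30301"},
]
deduped = deduplicate(customers)
print(len(deduped))
```

Keeping a single copy of each record means there is only one place where an error has to be corrected, which is why deduplication helps accuracy as well as cost.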



Another best practice is to clean the data properly, which helps avoid some of these privacy issues. Amy Dean, a data warehousing expert at Emory University, says: "Filter, clean, subtract, align, match, connect, and diagnose data as early as possible." Given the impact of data quality on analysis, Dean recommends measuring or evaluating diverse and varied data. Dean also suggests keeping data sources linked or available for querying, so that any problematic data element can be traced back to its source.
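The advice to clean data early and keep it traceable to its source can be sketched in a few lines of Python. This is an assumption-laden illustration, not Dean's method: the function `clean_records`, the validation rules, and the file name `signup_form.csv` are all hypothetical.

```python
def clean_records(raw_records, source):
    """Split raw records into cleaned and rejected rows.

    Every output row carries its source name and line number, so a
    problematic value can be traced back to where it came from.
    (Hypothetical helper; the validation rules are examples only.)
    """
    cleaned, rejected = [], []
    for line_no, record in enumerate(raw_records, start=1):
        lineage = {"source": source, "line": line_no}
        age = str(record.get("age", "")).strip()
        email = str(record.get("email", "")).strip().lower()
        # Reject values that are not plausible instead of guessing.
        if not age.isdecimal() or not 0 < int(age) < 120:
            rejected.append({"reason": "invalid age", **lineage})
            continue
        if "@" not in email:
            rejected.append({"reason": "invalid email", **lineage})
            continue
        cleaned.append({"age": int(age), "email": email, **lineage})
    return cleaned, rejected


# Made-up sample input with one good row and two bad ones.
rows = [
    {"age": "34", "email": " Ann@Example.com "},
    {"age": "abc", "email": "bob@example.com"},
    {"age": "51", "email": "no-at-sign"},
]
cleaned, rejected = clean_records(rows, source="signup_form.csv")
print(cleaned)
print(rejected)
```

Rejected rows are recorded rather than silently dropped, so the owner of the source can be asked to correct them.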



Ultimately, the best way to ensure the accuracy of personal data (and thus better data privacy) is to encourage and enable consumers to view, review, and correct the information gathered about them, rather than leaving that job to the enterprise alone. The consumer review process should also be easy to use and free of charge. This is hard work for big data users, because they typically collect a great deal of data that they never use and that can be difficult to handle. Businesses may also worry about letting consumers see how much detailed personal information they collect. But this kind of transparency is the best way to let consumers decide how their data is used and to build their confidence in big data. Credit reporting agencies have long given consumers the ability to access, review, and correct their data, as US regulators require of that industry. Similarly, privacy notices and website statements (which include detailed contact information for answering questions) can provide greater transparency and offer a way to deal with erroneous data.



The big data puzzle



The most controversial concept in corporate privacy is obtaining consent or permission to collect and use personal data. If we could turn back time and start over, this would be an ideal ground rule. It is too late, however, to seek individual consent for collection, because a vast amount of personal data has already been collected and widely shared. The plain fact is that it is impossible to identify every business that may have collected a person's data.



One way to help individuals regain control of their personal data is to let them delete and erase it completely. Big data users are, of course, not eager to provide this capability, and it serves as a litmus test of whether consumers understand and believe that the use of their data brings them benefits. Regulators are bound to ask for a deletion capability when they consider how to protect consumers' privacy rights. As the use of big data continues to evolve, companies should consider, during the technical design and architecture phases of a big data deployment, building in the ability for individuals to delete specific data fields.
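Designing for deletion from the start might look like the following minimal Python sketch. The class `ProfileStore` and its methods are hypothetical and keep everything in memory; a real deployment would presumably also have to purge backups, caches, and any data sets derived from the erased fields.

```python
class ProfileStore:
    """In-memory store that can erase single fields or whole profiles.

    Hypothetical illustration of a deletion capability built in at
    design time; not a production data store.
    """

    def __init__(self):
        self._profiles = {}

    def save(self, person_id, **fields):
        """Add or update fields for one person."""
        self._profiles.setdefault(person_id, {}).update(fields)

    def get(self, person_id):
        """Return a copy of the stored profile (empty if none exists)."""
        return dict(self._profiles.get(person_id, {}))

    def delete_fields(self, person_id, *field_names):
        """Erase the named fields and return those actually removed."""
        profile = self._profiles.get(person_id, {})
        removed = []
        for name in field_names:
            if name in profile:
                del profile[name]
                removed.append(name)
        return removed

    def delete_person(self, person_id):
        """Erase the whole profile; return True if one existed."""
        return self._profiles.pop(person_id, None) is not None


store = ProfileStore()
store.save("c-1", email="ann@example.com", zip="30322", income=72000)
removed = store.delete_fields("c-1", "income", "phone")
remaining = store.get("c-1")
print(removed, remaining)
```

Returning the list of fields that were actually removed gives the consumer, or an auditor, a simple confirmation of what the request accomplished.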



Similarly, from the perspective of protecting individual privacy rights, a better way to use personal data is to anonymize all of it. However, anonymization (that is, deleting any identifying field or attribute) has not proven to be reliable. As early as 2000, Dr. Latanya Sweeney (now a professor at Harvard) showed that only three pieces of information are needed to uniquely identify 87% of Americans: ZIP code, date of birth, and sex, all of which can be found in public records. Given these findings, even when an anonymization system is deployed, it may still be possible to re-identify many individual consumers residing in the United States.
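The re-identification risk described above can be estimated on any data set by counting how many records have a unique combination of such "quasi-identifier" fields. The Python sketch below is an illustration only: the function `unique_fraction` is hypothetical, and the four sample records are made up, so the result says nothing about real populations.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier values are unique.

    A record with a unique combination can be singled out even after
    names and other direct identifiers have been removed.
    (Hypothetical helper for illustration only.)
    """
    if not records:
        return 0.0
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Made-up "anonymized" records: no names, but ZIP, birth date, and sex remain.
people = [
    {"zip": "30322", "dob": "1980-02-11", "sex": "F"},
    {"zip": "30322", "dob": "1980-02-11", "sex": "F"},
    {"zip": "30322", "dob": "1975-07-04", "sex": "M"},
    {"zip": "30301", "dob": "1990-12-30", "sex": "F"},
]
fraction_all = unique_fraction(people, ["zip", "dob", "sex"])
fraction_zip = unique_fraction(people, ["zip"])
print(fraction_all, fraction_zip)
```

In this toy data, half of the records are unique on all three fields, while only a quarter are unique on ZIP code alone, which shows how combining a few harmless-looking fields sharply increases the chance of singling someone out.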



In view of all these issues and strategies, the way to protect individual privacy rights in the booming big data field is to ensure that personal data is reliable and accurate and that it is interpreted appropriately. Enterprises should also build the privacy principles above into their development and use of big data. Only then can they obtain the best results, or at least meet the least consumer resistance.


 





(Responsible editor: Fumingli)

