Big data governance is part of a broader information governance program that aligns the objectives of multiple functions to formulate policies on the optimization, privacy, and monetization of big data. However, big data governance is meaningless without an understanding of the underlying data types.
Figure 1. A three-dimensional framework for big data governance
Graphic: Industry and function, big data type, information governance
This article presents a framework for big data governance. As shown in Figure 1, the framework consists of three dimensions:
Big data types: Big data can be divided into five categories: web and social media, machine-to-machine (M2M), big transaction data, biometrics, and human-generated.
Information governance disciplines: The traditional disciplines of information governance (organization, metadata, privacy, data quality, business process integration, master data integration, and information lifecycle management) also apply to big data. For example, sensor data needs to be integrated into a preventive maintenance process. However, it is difficult to streamline the maintenance process if sensors on different machines generate inconsistent event codes.
Industry and function: Big data analytics is driven by use cases specific to a given industry or function, such as marketing, customer service, information security, or information technology.
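The event-code problem mentioned under the information governance disciplines can be illustrated with a minimal sketch. It assumes a steward-maintained lookup table that maps each vendor's raw code to one canonical code. The vendor names, raw codes, and canonical names below are hypothetical and do not come from any real system.

```python
# Hypothetical lookup table: (vendor, raw event code) -> canonical event code.
# In practice a data steward would own and maintain this mapping.
CANONICAL_EVENT_CODES = {
    ("vendor_a", "E17"): "OVERHEAT",
    ("vendor_b", "TEMP_HI"): "OVERHEAT",
    ("vendor_a", "E42"): "VIBRATION",
    ("vendor_b", "VIB-3"): "VIBRATION",
}


def normalize_event(vendor: str, raw_code: str) -> str:
    """Translate a vendor-specific event code into the canonical code.

    Codes without a mapping are returned as "UNMAPPED" so that they can be
    routed to a steward for review instead of being silently dropped.
    """
    key = (vendor.strip().lower(), raw_code.strip().upper())
    return CANONICAL_EVENT_CODES.get(key, "UNMAPPED")


if __name__ == "__main__":
    # Two machines report the same condition under different codes.
    print(normalize_event("vendor_a", "E17"))
    print(normalize_event("vendor_b", "temp_hi"))
```

Once both machines report the same canonical code, a single maintenance rule can act on it regardless of which sensor raised the event.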
As mentioned above, big data is divided into five categories:
1. Web and social media data includes clickstream and interaction data from social media, such as Facebook, Twitter, LinkedIn, and blogs.
2. Machine-to-machine (M2M) data includes readings from sensors, meters, and other devices that are part of the so-called "Internet of Things."
3. Big transaction data includes medical claims, telecommunications call detail records (CDRs), and utility billing records, which are increasingly available in semi-structured and structured formats.
4. Biometric data includes fingerprints, genes, handwriting, retinal scans, and similar types of data.
5. Human-generated data includes large volumes of unstructured and semi-structured data, such as call center agents' notes, recordings, e-mails, paper documents, surveys, and electronic medical records.
The big data governance framework looks different depending on the industry and function.
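One way to make the three dimensions concrete is to record each use case as a small structured entry. The sketch below is only an illustration: the class and field names are assumptions, while the category and discipline values are taken from the lists above and the two entries from the healthcare examples that follow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class BigDataType(Enum):
    WEB_AND_SOCIAL_MEDIA = "web and social media"
    MACHINE_TO_MACHINE = "machine-to-machine"
    BIG_TRANSACTION = "big transaction data"
    BIOMETRIC = "biometrics"
    HUMAN_GENERATED = "human-generated"


class Discipline(Enum):
    ORGANIZATION = "organization"
    METADATA = "metadata"
    PRIVACY = "privacy"
    DATA_QUALITY = "data quality"
    BUSINESS_PROCESS_INTEGRATION = "business process integration"
    MASTER_DATA_INTEGRATION = "master data integration"
    INFORMATION_LIFECYCLE_MANAGEMENT = "information lifecycle management"


@dataclass(frozen=True)
class UseCase:
    """One point in the framework: industry/function, data type, disciplines."""
    industry: str
    solution: str
    data_type: BigDataType
    disciplines: Tuple[Discipline, ...]


USE_CASES = [
    UseCase("Healthcare providers", "Patient care",
            BigDataType.MACHINE_TO_MACHINE,
            (Discipline.DATA_QUALITY,
             Discipline.INFORMATION_LIFECYCLE_MANAGEMENT,
             Discipline.PRIVACY)),
    UseCase("Healthcare providers",
            "Predictive modeling based on electronic medical records",
            BigDataType.HUMAN_GENERATED,
            (Discipline.DATA_QUALITY,)),
]


def by_discipline(discipline: Discipline) -> List[UseCase]:
    """Return the use cases that involve the given governance discipline."""
    return [u for u in USE_CASES if discipline in u.disciplines]
```

Calling `by_discipline(Discipline.PRIVACY)` on these entries returns only the patient care use case, which shows how the same framework yields a different view for each discipline.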
Healthcare providers
Solution: Patient Care
Big data type: Machine-to-machine data
Disciplines: Data quality, information lifecycle management, privacy
In the neonatal intensive care unit, the hospital uses streaming analytics to monitor the health of newborns. With these techniques, the hospital can predict the onset of disease 24 hours before any symptoms appear. The techniques rely on large volumes of time series data, but this data is sometimes missing: when a patient moves, a lead can detach and stop providing readings. In these cases, the streaming platform applies linear and polynomial regression to historical readings to fill the gaps in the time series. The hospital also flags all time series data that has been modified by a software algorithm. In the event of a lawsuit or medical inquiry, the hospital feels it needs to be able to produce both the original readings and the modified readings. In addition, the hospital has established policies around safeguarding protected health information.
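The gap-filling and flagging described above can be sketched as follows. This is a simplified illustration, not the hospital's actual platform: it uses only a linear least-squares fit over a hypothetical window of recent observed readings (a polynomial fit would follow the same pattern), and the `Reading` fields are assumed names. Each record keeps the raw device value next to the value used for analytics, with a flag on every estimate, so both versions can be produced later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    t: float                # timestamp, e.g. seconds since monitoring began
    raw: Optional[float]    # value reported by the device; None while the lead is detached
    value: Optional[float]  # value passed to analytics (raw, or an estimate)
    imputed: bool = False   # True when `value` was produced by the algorithm


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of value = a + b * t; returns (a, b)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    b = sxy / sxx if sxx else 0.0
    return mean_v - b * mean_t, b


def fill_gaps(samples: List[Tuple[float, Optional[float]]],
              window: int = 5) -> List[Reading]:
    """Fill missing values from a line fitted to the last `window` observed readings.

    Only readings that the device actually reported are used for fitting,
    so estimates never build on earlier estimates.
    """
    history: List[Tuple[float, float]] = []
    out: List[Reading] = []
    for t, raw in samples:
        if raw is not None:
            history.append((t, raw))
            out.append(Reading(t, raw, raw))
        elif len(history) >= 2:
            a, b = fit_line(history[-window:])
            out.append(Reading(t, None, a + b * t, imputed=True))
        else:
            out.append(Reading(t, None, None))  # not enough history to estimate
    return out
```

Filtering the result on `imputed` separates algorithm-generated values from device readings, which is the distinction the hospital wants to preserve.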
Solution: Predictive modeling based on electronic medical records
Big data type: Human-generated data
Discipline: Data quality
The hospital's analytics department built a predictive model based on 150 variables and 20,000 patient admissions to determine the likelihood that a patient would be readmitted within 30 days of treatment for congestive heart failure. While validating the model, the analytics team identified the patient's smoking status as a key variable. Initially, the structured smoking status field was populated with a binary yes/no answer in only 25 percent of cases. By applying content analytics to electronic medical records, including doctors' orders, discharge summaries, and patient check-ups, the team raised the share of admissions with a populated smoking status to 85 percent. In this way, the analytics team improved the quality of sparse structured data by drawing on unstructured sources.
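The idea of filling a sparse structured field from free text can be sketched with a few keyword rules. This is a toy stand-in for real content analytics, which would need far more careful handling of clinical language. The regular expressions, the record layout, and the field names are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword rules. Negative phrases are checked first because
# "non-smoker" and "denies smoking" also contain the positive keywords.
NEGATIVE = re.compile(
    r"\b(non-?smoker|never smoked|denies (smoking|tobacco)|"
    r"no (history of )?(smoking|tobacco))\b", re.I)
POSITIVE = re.compile(
    r"\b(smoker|smokes|smoking|tobacco use|packs? (a|per) day)\b", re.I)


def smoking_status_from_text(note: str) -> Optional[str]:
    """Return "yes", "no", or None when the note says nothing about smoking."""
    if NEGATIVE.search(note):
        return "no"
    if POSITIVE.search(note):
        return "yes"
    return None


def fill_smoking_status(record: Dict) -> Dict:
    """Keep an existing structured answer; otherwise try the free-text notes.

    The source of the value is recorded so that text-derived answers can be
    audited separately from answers entered in the structured field.
    """
    if record.get("smoking_status") in ("yes", "no"):
        return dict(record, smoking_status_source="structured")
    for note in record.get("notes", []):
        status = smoking_status_from_text(note)
        if status is not None:
            return dict(record, smoking_status=status, smoking_status_source="text")
    return dict(record, smoking_status=None, smoking_status_source=None)


def fill_rate(records: List[Dict]) -> float:
    """Share of records whose smoking status is populated."""
    filled = sum(1 for r in records if r.get("smoking_status") in ("yes", "no"))
    return filled / len(records)
```

Comparing `fill_rate` before and after applying `fill_smoking_status` to a set of records gives the kind of before-and-after population rate that the analytics team reported.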