A framework that focuses on "data" in big data governance


Big data governance is part of a broader information governance program that aligns the objectives of multiple functions to develop strategies for big data optimization, privacy, and monetization. However, big data governance is meaningless without an understanding of the underlying data types.

Figure 1. A three-dimensional framework for big data governance (dimensions: industry and function, big data type, information governance discipline)

This article provides a big data governance framework. As shown in Figure 1, the framework is composed of three dimensions:

Big data types. Big data can be divided into five categories: web and social media, machine-to-machine (M2M), big transaction data, biometrics, and human-generated data.

Information governance disciplines. The traditional disciplines of information governance – organization, metadata, privacy, data quality, business process integration, master data integration, and information lifecycle management – also apply to big data. For example, sensor data needs to be integrated into a preventive maintenance process, but it is difficult to streamline that process if sensors on different machines generate inconsistent event codes.
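To make the event-code example concrete, here is a minimal Python sketch, assuming hypothetical vendor codes and a hand-built mapping table (none of which come from the article), of how inconsistent machine event codes might be normalized into a canonical vocabulary before they feed a preventive maintenance process.

# Minimal sketch with assumed codes and mappings, for illustration only:
# normalize vendor-specific sensor event codes to a canonical vocabulary
# so data from different machines can drive one maintenance process.
CANONICAL_CODES = {
    ("vendor_a", "E17"): "OVERHEAT",
    ("vendor_b", "TMP_HIGH"): "OVERHEAT",
    ("vendor_a", "E42"): "VIBRATION",
    ("vendor_b", "VIB_ALRM"): "VIBRATION",
}

def normalize_event(vendor, raw_code):
    """Map a vendor-specific event code to the canonical vocabulary."""
    return CANONICAL_CODES.get((vendor, raw_code), "UNKNOWN")

print(normalize_event("vendor_a", "E17"))       # OVERHEAT
print(normalize_event("vendor_b", "TMP_HIGH"))  # OVERHEAT

Maintaining such a mapping is itself a metadata and master data integration task, which is why these disciplines appear in the framework.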

Industry and function. Data analysis is driven by use cases specific to a given industry or function, such as marketing, customer service, information security, or information technology.

As mentioned above, big data falls into five categories:

1. Web and social media data includes streaming and interaction data from social media such as Facebook, Twitter, LinkedIn, and blogs.

2. Machine-to-machine data includes readings from sensors, meters, and other devices that belong to the so-called "Internet of Things".

3. Big transaction data includes medical claims, telecommunications call detail records (CDRs), and utility billing records, increasingly available in semi-structured and structured formats.

4. Biometric data includes fingerprints, genetic data, handwriting, retinal scans, and similar types of data.

5. Human-generated data includes large volumes of unstructured and semi-structured data, such as call center agents' notes, voice recordings, e-mails, paper documents, surveys, and electronic medical records.

The big data governance framework looks different depending on the industry and function involved.

Healthcare providers

Solution: Patient Care
Big data type: Machine-to-machine data
Disciplines: Data quality, information lifecycle management, privacy

In the neonatal intensive care unit, a hospital uses streaming analytics to monitor the health of newborns. With these techniques, the hospital can predict the onset of illness up to 24 hours before any symptoms appear. The techniques rely on large volumes of time-series data, but when a patient moves, the sensor leads can become detached and stop providing readings. In those cases, the streaming platform uses linear and polynomial regression against historical readings to fill the gaps in the time series. The hospital also tags every time-series value that has been modified by the software algorithm, so that in the event of a lawsuit or medical investigation it can produce both the original and the adjusted readings. In addition, the hospital has established policies around the retention of protected health information.
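The gap-filling step can be illustrated with a short Python sketch. This is an assumed, simplified stand-in for the hospital's streaming platform: it fits a low-degree polynomial to the observed readings, fills the missing values, and returns flags marking which values were estimated by the algorithm.

# Minimal sketch (assumed, not the hospital's actual system): fill gaps in a
# vital-sign time series with a polynomial fit to the surrounding readings,
# and flag every value that the algorithm has filled in.
import numpy as np

def fill_gaps(timestamps, readings, degree=2):
    """Return (filled_readings, modified_flags) for a series with NaN gaps."""
    readings = np.asarray(readings, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    missing = np.isnan(readings)
    if missing.all() or not missing.any():
        return readings, missing  # nothing to fit, or nothing to fill

    # Fit a low-degree polynomial to the observed (historical) readings.
    coeffs = np.polyfit(timestamps[~missing], readings[~missing], deg=degree)
    filled = readings.copy()
    filled[missing] = np.polyval(coeffs, timestamps[missing])

    # The flags let auditors distinguish original readings from estimates.
    return filled, missing

# Example: heart-rate samples with two readings lost while leads were detached.
ts = [0, 1, 2, 3, 4, 5, 6]
hr = [122, 124, np.nan, np.nan, 131, 133, 134]
values, estimated = fill_gaps(ts, hr)
print(values)     # gaps replaced by polynomial estimates
print(estimated)  # True where a value was filled in by the algorithm

Keeping the flags alongside the values is what lets the hospital produce both the original and the adjusted readings on demand.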

Solution: Predictive modeling based on electronic medical records
Big data type: Human-generated data
Discipline: Data quality

The hospital's analytics department built a predictive model on 150 variables and 20,000 patient admissions to estimate the likelihood that a congestive heart failure patient would be readmitted within 30 days. While validating the model, the analytics team identified the patient's smoking status as a key variable. Initially, only 25 percent of the structured smoking-status fields were populated with a binary yes/no answer. However, by applying content analytics to the unstructured portions of the electronic medical records, including physicians' orders, discharge summaries, and patient examination notes, the team was able to determine smoking status for 85 percent of admissions. In effect, the team improved the quality of a sparse structured field by drawing on unstructured sources.
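The idea of backfilling a sparse structured field from unstructured text can be sketched with a few keyword rules. The patterns below are assumptions for illustration, not the hospital's actual content analytics, which would use far richer clinical text mining.

# Minimal sketch (assumed keyword rules, not the hospital's actual content
# analytics): derive a binary smoking-status flag from unstructured clinical
# text when the structured field is empty.
import re

# Negation patterns are checked first so "denies smoking" is not read as "yes".
NEGATIVE = re.compile(r"\b(non[- ]?smoker|never smoked|denies smoking|quit smoking)\b", re.I)
POSITIVE = re.compile(r"\b(current smoker|smokes|smoking|tobacco use|pack[- ]years?)\b", re.I)

def smoking_status(structured_value, note_text):
    """Prefer the structured yes/no field; otherwise infer from the note."""
    if structured_value in ("yes", "no"):
        return structured_value
    if NEGATIVE.search(note_text):
        return "no"
    if POSITIVE.search(note_text):
        return "yes"
    return None  # still unknown; leave the gap rather than guess

admissions = [
    {"structured": None, "note": "Patient denies smoking, no alcohol use."},
    {"structured": None, "note": "Current smoker, 20 pack-years."},
    {"structured": "yes", "note": "Follow-up for CHF."},
]
for a in admissions:
    print(smoking_status(a["structured"], a["note"]))
# -> no, yes, yes

Even a crude extraction like this shows how unstructured sources can raise the coverage of a structured variable before it is fed to a predictive model.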
