Seven Steps to Big Data Delivery


First, we need to know what big data is. Here is how I define the concept:


"Emerging technologies and practices that make it fast and cost-effective to collect, process, discover, and store large amounts of structured and http://www.aliyun.com/zixun/aggregation/13739.html > unstructured data." ”


Big data covers a wide range of domains, from financial transactions to the human genome, from automotive telemetry sensors to social media logs on the Internet. Processing and storing this data with traditional database methods is quite expensive. To address this problem, new technologies store data efficiently on open source software and commodity hardware, parallelize workloads, and provide fast processing.


As more and more IT departments begin to investigate big data alternatives, the discussion centers on stacks, processing speed, and platforms. These IT departments do not have a good grasp of the limitations of their existing technologies, many cannot articulate the business value of the alternatives, and fewer still can say how they will fold big data into their data governance.


In fact, the new big data requirements we see, and the discussion of processing platforms and processes, are just one part of the picture. In reality, delivering on the full potential of big data requires seven steps:


Collection: Data is collected from data sources and distributed across multiple nodes, typically a grid, each of which processes a subset of the data in parallel.
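As a rough illustration of this step, here is a minimal Python sketch that partitions incoming records across a handful of simulated nodes by hashing each record's key. The node count and record shape are assumptions made for the example, not details from the article.

```python
# Minimal sketch of the collection step: distribute records across nodes
# (simulated here as in-memory buckets) so each node holds a subset that
# can later be processed in parallel. Node count and record shape are
# illustrative assumptions.
from collections import defaultdict

def partition(records, num_nodes=4):
    """Assign each (key, value) record to a node by hashing its key."""
    nodes = defaultdict(list)
    for key, value in records:
        nodes[hash(key) % num_nodes].append((key, value))
    return nodes

records = [("user42", "clicked"), ("user7", "purchased"), ("user42", "viewed")]
for node_id, subset in sorted(partition(records).items()):
    print(f"node {node_id} holds {len(subset)} record(s)")
```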


Processing: The system then uses the same high-powered parallelism to quickly compute the data on each node. Each node "compresses" its results into more consumable data, and the resulting dataset can be used by people (in the case of analytics) or by machines (in the case of large-scale result interpretation).
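To make the "compression" idea concrete, here is a hedged sketch in which each simulated node reduces its subset to per-key counts in parallel, and the partial results are merged into one small dataset. The multiprocessing pool stands in for a real grid, and the counting logic is purely illustrative.

```python
# Sketch of the processing step: each node reduces ("compresses") its
# subset in parallel, and the partial results are merged into one dataset
# that people or downstream machines can consume.
from collections import Counter
from multiprocessing import Pool

def reduce_subset(subset):
    """Count events per key on one node -- the per-node 'compression'."""
    return Counter(key for key, _ in subset)

if __name__ == "__main__":
    subsets = [
        [("user42", "clicked"), ("user42", "viewed")],  # node 0's data
        [("user7", "purchased")],                       # node 1's data
    ]
    with Pool(2) as pool:
        partials = pool.map(reduce_subset, subsets)
    merged = sum(partials, Counter())  # combine per-node results
    print(merged)  # Counter({'user42': 2, 'user7': 1})
```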


Management: The big data being processed is often heterogeneous, originating from different transaction systems. This data usually needs to be understood, defined, annotated and, for security reasons, scanned and audited.
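Here is a minimal sketch of what that management step might look like in practice: wrapping a raw record with its source system, a definition, a security flag, and an audit timestamp. The field names and the pii flag are hypothetical, not a standard schema.

```python
# Sketch of the management step: annotate heterogeneous records with
# metadata and an audit trail. Field names are illustrative assumptions.
import datetime

def annotate(record, source_system, definition, pii=False):
    return {
        "data": record,
        "source": source_system,   # which transaction system it came from
        "definition": definition,  # what the record means
        "pii": pii,                # flagged during the security scan
        "audited_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

row = annotate({"amount": 120.5}, "orders_db", "settled trade amount")
print(row["source"], row["audited_at"])
```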


Measurement: Companies often measure the rate at which data can be integrated with other customer behavior or records, and decide over time whether to integrate it or correct it. Business requirements should inform the type of measurement and its ongoing tracking.
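For example, one common measurement is a match rate: the share of incoming records that can be joined to existing customer records. The sketch below computes such a rate; the 90% threshold is an assumed business rule, not something the article specifies.

```python
# Sketch of the measurement step: compute a match rate and flag data that
# falls below an assumed business threshold before integration.
def match_rate(incoming_ids, known_ids):
    known = set(known_ids)
    matched = sum(1 for i in incoming_ids if i in known)
    return matched / len(incoming_ids) if incoming_ids else 0.0

rate = match_rate(["a1", "b2", "c3", "d4"], ["a1", "c3"])
print(f"match rate: {rate:.0%}")  # 50%
if rate < 0.9:  # assumed business rule
    print("below threshold: correct before integrating")
```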


Consumption: The resulting use of the data should conform to the original request for processing. For example, if hundreds of terabytes of social media interaction data help us understand how social media drives users to buy additional products, we should establish rules for how that social media data is accessed and updated. This matters just as much when machines, rather than people, consume the data.
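A toy sketch of such access rules follows, checking whether a given consumer role, human analyst or downstream machine, may read a dataset. The dataset name, roles, and rule table are hypothetical illustrations.

```python
# Sketch of the consumption step: enforce simple access rules before a
# person or machine reads a dataset. Roles and rules are assumptions.
ACCESS_RULES = {
    "social_media_logs": {"analyst", "recommendation_service"},
}

def can_read(dataset, consumer_role):
    return consumer_role in ACCESS_RULES.get(dataset, set())

print(can_read("social_media_logs", "analyst"))          # True
print(can_read("social_media_logs", "billing_service"))  # False
```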


Storage: As the "data as a service" trend takes shape, more and more data is kept in a single location so that many processes can access it. Whether data is stored for short-term batch processing or long-term retention, storage solutions should be chosen carefully.
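As a sketch of one such retention decision, the snippet below routes data to a short-term ("hot") or long-term ("cold") tier based on its age. The tier names and the 30-day cutoff are assumptions for illustration.

```python
# Sketch of the storage step: pick a storage tier from a retention policy.
# Tier names and the 30-day cutoff are illustrative assumptions.
import datetime

RETENTION_DAYS_HOT = 30  # assumed cutoff for the short-term tier

def storage_tier(created_at, now=None):
    now = now or datetime.datetime.now(datetime.timezone.utc)
    age = now - created_at
    return "hot" if age.days < RETENTION_DAYS_HOT else "cold_archive"

created = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=90)
print(storage_tier(created))  # cold_archive
```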


Data governance: Data governance is the business-driven decision-making and oversight of data. By this definition, data governance applies to each of the six preceding phases of big data delivery. It sanctions behavior around data by establishing processes and guidelines. Big data needs to be governed according to its intended consumption; otherwise the risk is consumer dissatisfaction with the data being delivered, not to mention overinvestment.


Most staff responsible for investigating and acquiring big data solutions focus on the collection and storage steps at the expense of the others. Their question is: "How do we collect all this data, and where do we store it?"


But many IT departments still avoid defining discrete big data business requirements, and business people often regard the big data trend as just an excuse for an IT refresh with no clear endgame. This cynical environment is why big data often never moves beyond the preliminary investigation stage.


As ITBusinessEdge author Lorinrausen said in a recent blog post, "The only way to ensure that your share is reasonable is to make sure you have an effective plan for managing big data."


Tap into the data governance process and do your best to ensure that:


The business value and desired outcomes are clear


Policies for handling critical data have been approved


Expertise is applied to big data issues


Rules defining key data are clear


There is a process for escalating conflicts and problems


Data management--the tactics that implement data governance policies--is engaged


Decision rights for key issues are established as they emerge


Data privacy policies are enforced


In short, data governance ensures that big data applications are useful and relevant. It is an insurance policy that the right questions are being asked, so that we do not squander the new technologies that make big data processing, storage, and delivery more cost-effective and flexible than ever before.
