Mining new business insights with big data

Source: Internet
Author: User

Market power

In recent years, the web and enterprises have seen an explosion of data. Several factors contribute to this: inexpensive terabyte-scale storage hardware has become a commodity, enterprise data has accumulated toward critical mass over time, and standards now make information easy to access and exchange.

From an enterprise perspective, this growing volume of information is hard to store in standard relational databases or even in data warehouses. It raises difficult problems that have existed in practice for many years. For example: How do you query a table with 1 billion rows? How do you run a query across all the logs on all the servers in a data center? To complicate matters, much of the data that needs to be processed is unstructured or semi-structured, which makes it even harder to query.

When data exists at this scale, one processing limitation is the time it takes to move the data. Apache Hadoop addresses this by moving the work to the data rather than moving the data to the work. Hadoop is a cluster technology that consists of two separate but integrated runtime components: the Hadoop Distributed File System (HDFS), which provides redundant storage of data, and Map/Reduce, which runs user-submitted jobs in parallel to process the data stored in HDFS. Although Hadoop does not fit every scenario, it can provide a good performance benefit where it does. As the community adopted Hadoop, it found that Hadoop was not just a tool for data processing but also opened the door to a variety of interesting data analyses.
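The idea of moving the work to the data can be illustrated with a small in-memory sketch. This is not Hadoop code and uses no Hadoop API; the node names, the log lines, and the helper functions (`count_levels_locally`, `merge_counts`, `run_job`) are all hypothetical and exist only for illustration. Each simulated node summarizes the log lines it already holds, and only the small summaries are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each "node" holds its own block of log lines locally.
NODE_BLOCKS = {
    "node-a": ["INFO start", "ERROR disk full", "INFO stop"],
    "node-b": ["ERROR timeout", "ERROR disk full"],
    "node-c": ["INFO start", "WARN slow response"],
}


def count_levels_locally(lines):
    """Work shipped to a node: count log levels in the lines stored there."""
    return Counter(line.split(" ", 1)[0] for line in lines)


def merge_counts(left, right):
    """Combine two small partial summaries into one."""
    return left + right


def run_job(blocks):
    # Each node processes only its local block; the raw lines never move.
    partials = [count_levels_locally(lines) for lines in blocks.values()]
    # Only the compact partial counts are brought together and merged.
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    print(dict(run_job(NODE_BLOCKS)))
```

In a real cluster the per-node step would run on the machines that store each block, so the cost of the job is dominated by local reads rather than by network transfer of the raw data.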

With Hadoop, we can linearly scale clusters running on commodity hardware to integrate larger and richer datasets. These datasets offer new perspectives: first, by running analysis on heterogeneous data sources that could not previously be combined, and then by running that analysis at scale on the same data. The result is something of a paradigm shift, as described by Flip Kromer (founder of Infochimps): "The web has grown from a place that knows a little about everything to a place that knows everything about one thing." Kromer continues with an example: one day, a baseball fan may want to know the details of every game in the last 100 years (player details, game scores, game venues). If you joined that dataset with the shared location values for all weather stations over the same period, you could predict how a 38-year-old pitcher will perform at Wrigley Field in 90-degree heat.

Big data ecosystem

It is important to note that the big data space is still relatively new, and some technical hurdles remain before these opportunities can be exploited. As mentioned above, data in Hadoop is processed as "jobs", which are written in the Java™ programming language using a paradigm called Map/Reduce. While some work has been done to let Hadoop support other languages, it is still not straightforward to understand how to analyze a business problem and break it down into a solution that can run as Map/Reduce jobs.
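To make the decomposition step more concrete, here is a minimal sketch of how one business question could be split into a map function and a reduce function. It is plain Python rather than a Hadoop job, and everything in it is an assumption made for illustration: the question (total order value per region), the `region,amount` record format, the sample records, and all function names.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group all values that share a key, as the framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in grouped.items()}


# Hypothetical business question: what is the total order value per region?
def order_mapper(record):
    # Assumed record format: "region,amount"
    region, amount = record.split(",")
    yield region, float(amount)


def sum_reducer(key, values):
    return sum(values)


# Hypothetical sample records, for illustration only.
ORDERS = ["north,10.0", "south,5.5", "north,2.5", "east,7.0"]


def total_by_region(orders):
    pairs = map_phase(orders, order_mapper)
    return reduce_phase(shuffle_phase(pairs), sum_reducer)


if __name__ == "__main__":
    print(total_by_region(ORDERS))
```

The only problem-specific parts are `order_mapper` and `sum_reducer`; the three phase functions stay the same for any question that can be phrased as "emit a key and a value per record, then combine the values per key". Finding that phrasing for a given business problem is the step the text describes as not straightforward.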

To really take advantage of the opportunities around Hadoop, a range of supporting technologies is needed to move Hadoop beyond the developer's domain and bring it to a wider audience.

Figure 1. An overview of the big data ecosystem

An ecosystem has emerged that provides tools and support around Hadoop. Each component works together with the others, offering a number of ways to implement most user scenarios.
