Must read! Big Data: Hadoop, Business Analytics and more (1)

Keywords: big data, traditional business analytics, data warehousing

Translation: Cady Wang (Wang Nanan)

Do you want to understand big data but feel lost in the obscure terminology? Do you want to learn about the big data market and its applications but cannot find good case studies and explanations? Don't worry: this little book from the Wikibon community is here to help.

Yes, this is a small book rather than an article, because it is detailed and thorough enough to take you from a layman who knows nothing about big data technology and its applications to an insider who is familiar with its concepts and significance. That is what makes it great!

The translator, Cady Wang (Wang Nanan), put a great deal of effort into translating this small book; readers are welcome to point out any shortcomings. We have divided it into several articles.

--World wide data observation, sing

Main content

· 1 Big Data manifesto from the Wikibon community
· 2 Data processing and analysis: the traditional way
· 3 The changing nature of big data
· 4 New approaches to big data processing and analysis
  · 4.1 Hadoop
  · 4.2 NoSQL
  · 4.3 Massively parallel analytic databases
· 5 Complementarity of big data approaches
· 6 Big data vendor landscape
· 7 Big Data: real-world use cases
· 8 The big data skills gap
· 9 Big Data: next steps for enterprises and vendors

Big Data manifesto from the Wikibon community

Providing effective business analytics tools and technologies to the enterprise is a top priority for chief information officers. Effective business analytics (from basic reporting to advanced data mining and predictive analytics) enables data analysts and business users to gain insights from data; when translated into action, those insights can make the company more efficient and more profitable.

All business analytics is based on data. Traditionally, this meant structured data that the organization created and stored itself, such as customer data in CRM systems, operational data in ERP systems, and financial data in accounting databases. Thanks to the spread of social media and web services (such as Facebook and Twitter), data sensors and networked devices, machine- and human-generated online transactions, and other sources of unstructured and semi-structured data, the volume and variety of data available to enterprises are growing rapidly, as is the demand for near-real-time analysis to extract maximum business value from it. We call this big data.

Traditional data management and business analytics tools and technologies are straining under big data, while new approaches that help enterprises gain insights from big data are emerging. These new approaches process, analyze, and apply data in ways that are completely different from traditional tools and techniques. They include the open source framework Hadoop, NoSQL databases (such as Cassandra and Accumulo), and massively parallel analytic databases (such as EMC's Greenplum, HP's Vertica, and Teradata's Aster Data). This means that enterprises also need to rethink their approach to business analytics from both a technical and a cultural perspective.

For most companies, this shift will not be easy, but those that embrace it and make big data the cornerstone of their business analytics practices will have a significant competitive advantage over more timid rivals. Complex analytics powered by big data can deliver unprecedented insight into customer behavior and volatile market conditions, enabling these companies to make data-driven business decisions faster and more effectively than their competitors.

The emergence of big data offers significant opportunities for hardware, software, and services vendors, from server-side technologies that store big data and support its processing to front-end data visualization tools that bring fresh insights to end users. Vendors that help enterprises become big data practitioners, whether by providing big data use cases that increase business value or by developing the technologies and services that make big data a reality, will thrive.

Big data is the new definitive source of competitive advantage in every industry. Enterprises and technology vendors that dismiss big data as a flash in the pan will soon find themselves struggling to keep up with more forward-thinking competitors. In our opinion, they are in serious danger. For businesses that understand and embrace the reality of big data, the possibilities for new innovation, greater agility, and higher profitability are almost limitless.

Data processing and analysis: the traditional way

Traditionally, data processing for analytical purposes has followed a fairly static blueprint. Through regular business processes, organizations create structured data based on stable data models in applications such as CRM, ERP, and financial systems. Data integration tools extract, transform, and load the data from enterprise applications and transactional databases into a staging area, where data quality checks and data normalization are performed and the data is eventually modeled into neat rows and columns. This modeled and cleansed data is then loaded into the enterprise data warehouse. The process runs periodically, for example daily or weekly, and sometimes more frequently.

Figure: Traditional data processing/analysis. Source: Wikibon 2011
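The extract, transform, and load flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular data integration product works: the source records, the `customers` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the enterprise data warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might come out of a CRM system.
SOURCE_ROWS = [
    {"id": 1, "name": " Alice ", "country": "us", "revenue": "1200.50"},
    {"id": 2, "name": "Bob", "country": "DE", "revenue": "not-a-number"},
    {"id": 3, "name": "", "country": "fr", "revenue": "300"},
    {"id": 4, "name": "Dana", "country": "Fr", "revenue": "89.9"},
]


def extract():
    """Extract: read raw rows from the source system (here, a plain list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: apply quality checks and normalize values in staging."""
    clean, rejected = [], []
    for row in rows:
        name = row["name"].strip()
        try:
            revenue = float(row["revenue"])
        except ValueError:
            rejected.append(row)  # fails the quality check: revenue is not numeric
            continue
        if not name:
            rejected.append(row)  # fails the quality check: name is missing
            continue
        # Normalize country codes to upper case so they group consistently.
        clean.append((row["id"], name, row["country"].upper(), revenue))
    return clean, rejected


def load(conn, rows):
    """Load: write the modeled rows into a fixed-schema warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "country TEXT NOT NULL, revenue REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run one batch, as a scheduler might do daily or weekly."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    clean, rejected = transform(extract())
    load(conn, clean)
    return conn, rejected


if __name__ == "__main__":
    conn, rejected = run_etl()
    query = "SELECT country, SUM(revenue) FROM customers GROUP BY country ORDER BY country"
    for country, total in conn.execute(query):
        print(country, total)
    print("rejected rows:", len(rejected))
```

Note that the schema is fixed before any data arrives, and rows that do not fit it are simply rejected. That works well for stable, structured sources, which is exactly the assumption that big data breaks.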

In a traditional data warehouse, administrators create schedules to run standard computations periodically over the normalized data in the warehouse and distribute the resulting reports to business units. They also create dashboards and other limited-functionality visualization tools for management.

At the same time, business analysts use data analysis tools to perform advanced analytics on the data warehouse or, more commonly, because of data volume constraints, on sample data imported into a local database. Non-expert users perform basic data visualization and limited analysis against the data warehouse through front-end business intelligence tools (such as SAP BusinessObjects and IBM Cognos). Traditional data warehouses rarely hold more than a few terabytes of data, because larger volumes consume warehouse resources and degrade performance.

The changing nature of big data

The advent of the Web, mobile devices, and other technologies has fundamentally changed the nature of data. Big data has important and unique characteristics that distinguish it from "traditional" enterprise data. No longer centralized, highly structured, and easily manageable, data is now highly distributed, loosely structured (if structured at all), and larger in volume than ever before.

Specifically:

· Volume: The amount of data generated by the Web, mobile devices, IT infrastructure, and other sources multiplies every year.
· Variety: Data types are increasingly diverse, including unstructured text data and semi-structured data (such as social media data, location-based data, and log file data).
· Velocity: Thanks to digital transactions, mobile computing, and the sheer number of Internet and mobile device users, new data is being created ever faster, and the need for real-time analysis is growing.

Broadly speaking, big data is generated from multiple sources, including:

· Social networks and media: There are more than 700 million Facebook users, 250 million Twitter users, and 156 million public blogs. Every Facebook update, tweet, blog post, and comment creates several new data points (structured, semi-structured, and unstructured), sometimes referred to as "data exhaust".
· Mobile devices: More than 5 billion mobile phones are in use worldwide. Every call, SMS, and instant message is recorded as data. Mobile devices, especially smartphones and tablets, make it easier to use applications such as social media, which in turn produce a lot of data. Mobile devices also collect and transmit location data.
· Online transactions: Billions of online purchases, stock trades, and other transactions take place every day, including countless automated transactions. Each transaction generates a large number of data points that are collected by retailers, banks, credit card companies, credit agencies, and other institutions.
· Network devices and sensors: Electronic devices of all kinds (including servers and other IT hardware, smart meters, and temperature sensors) create semi-structured log data that records every action.

Figure: Traditional data vs. big data. Source: Wikibon 2011

In terms of both time and cost-effectiveness, traditional data warehouses and other data management tools are not up to the job of processing and analyzing big data. The data must first be organized into relational tables (neat rows and columns) before a traditional enterprise data warehouse can process it. Imposing this structure on massive amounts of unstructured data is impractical because of the time and manpower required. In addition, scaling a traditional enterprise data warehouse to accommodate potentially petabytes of data requires huge investments in new dedicated hardware. Data loading bottlenecks also degrade the performance of traditional data warehouses.

Therefore, new methods for processing and analyzing big data are needed.
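To make the cost of forcing loosely structured data into rows and columns more concrete, here is a small Python sketch. The log lines, the `WEB_LOG` pattern, and the `to_rows` helper are all made up for illustration; the point is only that each source format needs its own hand-written parser before a relational table can accept it.

```python
import re

# Hypothetical log lines from three different devices, each with its own layout.
LOG_LINES = [
    '192.0.2.10 - - [12/Mar/2011:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120',
    "meter=SM-42 ts=2011-03-12T10:15:40Z kwh=3.7",
    "temp sensor 7 reports 21.5C",
]

# A parser for one layout only (a common web server access log format).
WEB_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+)$'
)


def to_rows(lines):
    """Turn matching lines into fixed-width tuples; collect everything else."""
    rows, unparsed = [], []
    for line in lines:
        match = WEB_LOG.match(line)
        if match:
            rows.append((
                match["ip"], match["ts"], match["method"], match["path"],
                int(match["status"]), int(match["size"]),
            ))
        else:
            # No schema fits yet: someone has to write another parser by hand.
            unparsed.append(line)
    return rows, unparsed


if __name__ == "__main__":
    rows, unparsed = to_rows(LOG_LINES)
    print("rows ready for a relational table:", rows)
    print("lines still waiting for a schema:", unparsed)
```

With only three sources the extra parsers are manageable, but the same manual effort repeated across many constantly changing sources is the time and manpower cost described above.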

To be continued. Please see Part II: Must read! Big Data: Hadoop, Business Analytics and more (2)
