Application Scenarios for the Open Source HBase Database and the Hive Management Tool

Source: Internet
Author: User

One. Hive Application Scenario

This article mainly describes hands-on use of Hive. The business itself is not the focus, so the scenario is described only briefly: the task is statistical analysis of search log data.

Group search had only recently gone online, so the log volume was not large. The logs are spread across 5 front-end machines and saved by the hour. Each hour, the previous hour's data is synchronized to the log analysis machine, and the statistics must be updated hourly. The statistical items include keyword search volume (PV), category visits, per-second traffic (TPS), and so on.
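As a rough illustration of the statistical items listed above, the sketch below counts keyword PV, category visits, and the busiest second in a batch of records. The record layout `(timestamp, keyword, category)` and the sample values are assumptions, since the article does not give the real log format.

```python
from collections import Counter

# Hypothetical log records: (unix_timestamp_seconds, keyword, category).
# The real log format is not given in the article.
records = [
    (1300000000, "phone", "electronics"),
    (1300000000, "dress", "clothing"),
    (1300000001, "phone", "electronics"),
    (1300000003, "phone", "electronics"),
]

def hourly_stats(records):
    """Return keyword PV, category visit counts, and the peak per-second count."""
    keyword_pv = Counter(keyword for _, keyword, _ in records)
    category_visits = Counter(category for _, _, category in records)
    per_second = Counter(ts for ts, _, _ in records)
    # default=0 keeps an empty hour from raising on max().
    peak_tps = max(per_second.values(), default=0)
    return keyword_pv, category_visits, peak_tps

keyword_pv, category_visits, peak_tps = hourly_stats(records)
print(keyword_pv.most_common(1))
print(category_visits["clothing"], peak_tps)
```

In the setup the article describes, these counts would come from queries over the day's table rather than from in-memory lists; the sketch only shows what each item measures.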

With Hive, we store this data in one table per day. Every hour, a background script merges the log data synchronized from the 5 front-end machines into a single log file by timestamp and imports it into Hive, where the hourly data is appended to that day's table. When the import completes, the statistics for the day are recalculated and output.
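The hourly flow described above (merge the per-machine logs by timestamp, append to the day's table, recompute the day's statistics) can be sketched roughly as follows. This is a toy model under stated assumptions: each machine's log is already in time order, a record is `(timestamp, keyword)`, and a plain dictionary stands in for the per-day Hive tables. None of the names come from the original script.

```python
import heapq
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Toy stand-in for the per-day tables: "YYYYMMDD" -> list of (timestamp, keyword).
day_tables = defaultdict(list)

def merge_hourly_logs(per_machine_logs):
    """Merge per-machine logs, each already sorted by timestamp, into one sorted list."""
    return list(heapq.merge(*per_machine_logs, key=lambda rec: rec[0]))

def import_hour(per_machine_logs):
    """Append one hour of merged records to its day table and recompute that day's PV."""
    touched_days = set()
    for ts, keyword in merge_hourly_logs(per_machine_logs):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
        day_tables[day].append((ts, keyword))
        touched_days.add(day)
    # Recompute from the whole day table, not just the new hour.
    return {day: Counter(k for _, k in day_tables[day]) for day in touched_days}

# Hypothetical logs from two of the front-end machines for the same hour.
machine_a = [(1300000000, "phone"), (1300000005, "dress")]
machine_b = [(1300000002, "phone")]
day_stats = import_hour([machine_a, machine_b])
print(day_stats)
```

Recomputing from the full day table after each append mirrors the "recalculate the day's statistics" step; an incremental update would be cheaper but is not what the article describes.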

If these requirements were implemented directly on Hadoop, we would have to manage the data ourselves, develop separate map/reduce jobs for each statistical requirement, customize the merge and sort operations, and monitor the status of each job, which is a considerable amount of work. With Hive, every step from import through analysis, sorting, deduplication, and result output can be handled with HQL statements, each of which is compiled into several tasks to run. Even a more complex requirement that needs several days of data, such as the increase in keyword visits, can be completed automatically with statements such as table joins, which saves a great deal of work.
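To make the multi-day case concrete, here is a minimal sketch of a keyword-visit increase computed with a join between two per-day tables. It uses Python's built-in `sqlite3` as a local stand-in so the statement can actually run; it is not Hive, and whether the same text runs unchanged on a given Hive version is not verified here. The table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-day tables; names and columns are illustrative only.
conn.executescript("""
    CREATE TABLE search_log_20110312 (ts INTEGER, keyword TEXT);
    CREATE TABLE search_log_20110313 (ts INTEGER, keyword TEXT);
""")
conn.executemany("INSERT INTO search_log_20110312 VALUES (?, ?)",
                 [(1, "phone"), (2, "dress")])
conn.executemany("INSERT INTO search_log_20110313 VALUES (?, ?)",
                 [(3, "phone"), (4, "phone"), (5, "phone"), (6, "shoes")])

# Today's per-keyword PV joined against yesterday's PV to get the increase.
# Keywords that did not appear yesterday count as starting from 0.
PV_DELTA_SQL = """
    SELECT t.keyword, t.pv - COALESCE(y.pv, 0) AS pv_delta
    FROM (SELECT keyword, COUNT(*) AS pv
          FROM search_log_20110313 GROUP BY keyword) t
    LEFT OUTER JOIN
         (SELECT keyword, COUNT(*) AS pv
          FROM search_log_20110312 GROUP BY keyword) y
    ON t.keyword = y.keyword
    ORDER BY pv_delta DESC, t.keyword
"""
pv_delta = conn.execute(PV_DELTA_SQL).fetchall()
print(pv_delta)
```

The point of the sketch is the shape of the work: one declarative statement covers grouping, joining, and sorting, where a hand-written Hadoop solution would need separate jobs for each step.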

Two. HBase Application Scenario

1. Writing the URLs of websites collected by a crawler.

2. Before 2011, nearly all of Taobao's back-end persistent storage ran on MySQL (with a small amount of Oracle, BDB, Tair, MongoDB, and so on). Because MySQL is open source, has a healthy ecosystem, and offers solutions such as splitting data across databases and tables, it met the needs of a large share of Taobao's business for a long time.

However, as the business diversified, the requirements of more and more business systems began to change. Broadly, the changes fall into the following categories:

Data volume keeps growing. Almost every user-related online business at Taobao holds data at the billion-record level, daily system calls range from billions to tens of billions, and historical data cannot be easily deleted. This calls for a massive distributed file system that can serve terabytes or even petabytes of data online.

Data volume grows very quickly and cannot always be forecast accurately. For most application systems, data volume rises rapidly for some time after launch, so from a cost perspective there is strong demand for horizontal scalability, with no single point of constraint.

Only simple key-value reads are needed, with no complex joins. However, the demands on concurrency, throughput, and response latency are high, and the system is expected to maintain strong consistency.

Writes are usually very frequent, especially in the many systems that rely on real-time log analysis.

Bulk data must be readable quickly.
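Two of the requirements above, simple key-value reads and fast bulk reads, fit a store that keeps rows sorted by key, which is how HBase orders rows by row key. The sketch below is a toy in-memory model of that idea, not the HBase client API. The class name, the `user#date` key layout, and the values are all hypothetical.

```python
import bisect

class SortedKVStore:
    """Toy in-memory table that keeps row keys sorted, so a range scan is one slice."""

    def __init__(self):
        self._keys = []    # sorted row keys
        self._values = {}  # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point lookup by exact row key; returns None if the key is absent."""
        return self._values.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

store = SortedKVStore()
# Hypothetical row keys: user id plus a date, so one user's rows sit next to each other.
store.put("user42#20110313", "order-b")
store.put("user07#20110312", "order-x")
store.put("user42#20110312", "order-a")

print(store.get("user07#20110312"))
# "$" sorts immediately after "#", so this range covers every "user42#..." key.
print(store.scan("user42#", "user42$"))
```

Because related rows share a key prefix, a bulk read becomes a contiguous range rather than many scattered lookups, which is the design choice the key layout is meant to illustrate.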

Three. Summary

Hive: large-scale data computation, based on MapReduce.

HBase: large-scale data storage, for both writes and reads.
