What Is HBase, a Column-Oriented Open Source Distributed Database?
Source: Internet
Author: User
Release time: 2012.04.16 14:41. Source: News. Author: News.
HBase is a distributed, column-oriented open source database developed under the Apache Software Foundation. Unlike a general-purpose relational database, HBase is suited to storing unstructured data, and it uses a column-based rather than a row-based storage model. Users store data rows in a table. Each data row has a sortable key and any number of columns. Users can fetch a row by its key, and can scan and filter a range of rows. HBase is a table storage system that scales horizontally and provides very fast low-level updates for large-scale data. It is mainly used for random access and real-time reads and writes on large data sets, which is a capability many information systems need.
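The access pattern described above (fetch a row by key, or scan and filter a key range) can be illustrated with a small in-memory model. This is only a sketch of the idea, not HBase client code: the class TinyTable, its methods, and the sample keys and column names are all hypothetical.

```python
import bisect


class TinyTable:
    """Toy in-memory stand-in for a table whose rows are sorted by key."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        # A row may hold any number of columns; none are declared up front.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Point lookup by row key.
        return self._rows.get(row_key)

    def scan(self, start, stop, keep=lambda row: True):
        # Yield (key, row) for start <= key < stop where keep(row) is true.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if keep(row):
                yield key, row


table = TinyTable()
table.put(b"blog-001", "author", b"alice")
table.put(b"blog-002", "author", b"bob")
table.put(b"blog-002", "title", b"notes")
table.put(b"blog-010", "author", b"alice")

by_key = table.get(b"blog-002")
in_range = [key for key, _ in table.scan(b"blog-001", b"blog-010")]
alice_rows = [
    key
    for key, _ in table.scan(
        b"blog-000", b"blog-999", keep=lambda row: row.get("author") == b"alice"
    )
]
```

Because the keys are kept sorted, a range scan only touches the slice between the start and stop keys, while a filter on a column value still has to inspect every row in the scanned range.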
The following example shows how blog data originally stored in MySQL can be migrated into HBase:
Existing table structure in MySQL: Table blogtable represents the blog itself and has 5 fields. blogid is the blog ID number for each user, of type int, and is the primary key; author is the user's blog name, of type varchar; title is the user's blog signature, of type varchar; url is the blog's URL, of type varchar; text is the content of the blog, of type varchar. Table comment represents blog comments and also has 5 fields. id is the ID of the user who publishes the comment, of type int, and is the primary key; blogid is the ID of the original blog, of type varchar, and serves as a foreign key of the comment table that points to the primary key of blogtable; title is the comment title, of type varchar; author is the name of the user who publishes the comment, of type varchar; text is the comment content, of type varchar.
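The two tables described above can be sketched as follows. This is an illustration under assumptions rather than the original schema: SQLite's in-memory engine stands in for MySQL so the example runs without a server, the varchar length of 255 is invented because the source gives none, and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blogtable (
        blogid INTEGER PRIMARY KEY,   -- blog ID, primary key
        author VARCHAR(255),          -- user's blog name
        title  VARCHAR(255),          -- user's blog signature
        url    VARCHAR(255),          -- blog URL
        text   VARCHAR(255)           -- blog content
    );
    CREATE TABLE comment (
        id     INTEGER PRIMARY KEY,   -- primary key of the comment table
        blogid VARCHAR(255) REFERENCES blogtable(blogid),  -- foreign key
        title  VARCHAR(255),          -- comment title
        author VARCHAR(255),          -- commenting user's name
        text   VARCHAR(255)           -- comment content
    );
    """
)

# Hypothetical sample data.
conn.execute(
    "INSERT INTO blogtable VALUES (1, 'alice', 'My blog', 'http://example.com/1', 'Hello')"
)
conn.execute("INSERT INTO comment VALUES (10, '1', 'Re: hello', 'bob', 'Nice post')")

# comment.blogid is varchar in the description, so it is cast for the join.
rows = conn.execute(
    """
    SELECT b.author, c.author, c.text
    FROM comment AS c
    JOIN blogtable AS b ON b.blogid = CAST(c.blogid AS INTEGER)
    """
).fetchall()
```

The join at the end is the kind of relational query that has no direct counterpart once the data lives in HBase, which is why the migrated layout is organized around the row key instead.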
Table structure after migration to HBase: HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families. Table blogtable represents the blog itself. The ID is the row key, which acts as the primary key of the table and is used to retrieve records. In the horizontal direction, a table has one or more column families. The table blogtable contains 5 column families, and the info data is split into the columns info:author, info:title, and info:url. To add another property x, the corresponding column is info:x. Note that column families support dynamic extension: the number and types of columns do not need to be defined in advance. However, every column value is stored in binary format, so the user has to perform type conversion.
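The mapping from one relational blog row to an HBase-style row can be sketched in a few lines. This is a minimal illustration, not HBase client code. The function names are hypothetical, the 4-byte big-endian row key encoding is an assumption chosen so that byte order matches numeric order, and content:text is an invented column because the source does not say where the blog text is stored.

```python
import struct


def to_hbase_row(blog):
    """Map one relational blog row (a dict) to (row_key, cells), all bytes."""
    # Assumption: 4-byte big-endian keys so byte order matches numeric order.
    row_key = struct.pack(">I", blog["blogid"])
    cells = {
        b"info:author": blog["author"].encode("utf-8"),
        b"info:title": blog["title"].encode("utf-8"),
        b"info:url": blog["url"].encode("utf-8"),
        # Hypothetical column; the source does not name a family for the text.
        b"content:text": blog["text"].encode("utf-8"),
    }
    return row_key, cells


def from_hbase_row(row_key, cells):
    """Reverse the mapping; the caller does all type conversion itself."""
    return {
        "blogid": struct.unpack(">I", row_key)[0],
        "author": cells[b"info:author"].decode("utf-8"),
        "title": cells[b"info:title"].decode("utf-8"),
        "url": cells[b"info:url"].decode("utf-8"),
        "text": cells[b"content:text"].decode("utf-8"),
    }


blog = {
    "blogid": 1,
    "author": "alice",
    "title": "My blog",
    "url": "http://example.com/1",
    "text": "Hello",
}
row_key, cells = to_hbase_row(blog)

# A new property needs no schema change: just write another qualifier.
cells[b"info:x"] = b"extra"

restored = from_hbase_row(row_key, cells)
```

Encoding the integer ID as fixed-width big-endian bytes matters because keys are compared as bytes: the text key "10" would sort before "2", while the packed values keep 2 before 10.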
In short, HBase relies mainly on horizontal scaling: computing and storage capacity grow by adding low-cost commodity servers. Put simply, HBase sits between NoSQL stores and an RDBMS. HBase can only retrieve data by primary key (row key) or by a range of primary keys, does not support conditional queries or sorting, and supports only single-row transactions. HBase is primarily used to store unstructured and semi-structured loose data. Hive, a data warehousing tool for Hadoop, makes up for these limitations. Hive can map structured data files to a database table, provide SQL-like query functionality (HiveQL), and convert SQL statements into MapReduce tasks. Hive also provides extensions that address HBase's single-row transaction restriction. Facebook is said to have chosen HBase because it needed to handle two types of data patterns: 1. A small set of temporal data that changes often; 2. An ever-growing set of data that is rarely accessed.
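To give a feel for what converting a SQL statement into a MapReduce task means, the sketch below expresses a query like SELECT author, COUNT(*) FROM blogtable GROUP BY author as map, shuffle, and reduce steps in plain Python. It is a conceptual illustration only and does not reflect the plan Hive actually generates; the function names and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows from blogtable.
rows = [
    {"blogid": 1, "author": "alice"},
    {"blogid": 2, "author": "bob"},
    {"blogid": 3, "author": "alice"},
]


def map_phase(row):
    # Emit one (key, 1) pair per row, keyed by the GROUP BY column.
    yield row["author"], 1


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # COUNT(*) for one group is the sum of the emitted ones.
    return key, sum(values)


pairs = [pair for row in rows for pair in map_phase(row)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Each map call looks at a single row and each reduce call at a single group, which is what lets the work be spread across many servers.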