Features of tables in HBase
Big: A table can hold tens of billions of rows and millions of columns (when a row has very many columns, inserts slow down).
Column-oriented: Storage and permission control are organized by column (family), and each column (family) can be retrieved independently.
Sparse: Empty (null) columns take up no storage space, so a table can be designed to be very sparse.
Multiple versions: Each cell can hold multiple versions of its data. By default the version number is assigned automatically and is the timestamp of the write.
Single data type: All data in HBase is stored as uninterpreted bytes (in effect, strings). There are no column types.
So a single table can end up with a very large number of column qualifiers.
Big data emphasizes the 3V characteristics: Volume (amount of data), Variety (kinds of data), and Velocity (speed).
Most companies are only beginning to understand big data. People spend most of their time on the roughly 20% of data that is structured, when in fact about 80% of data is unstructured.
NoSQL Database Advantages
1 Strong scalability
2 Good concurrency performance
NoSQL databases perform well on large data volumes because the data is only weakly relational and the database structure is simple.
MySQL generally relies on its query cache, and that cache is invalidated every time the table is updated. It is a coarse-grained cache, so in Web 2.0 applications, where data is updated frequently, it performs poorly. A NoSQL cache works at the record level and is fine-grained, so NoSQL performs much better in this respect.
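The difference between the two cache granularities can be illustrated with a small simulation. This is only a sketch: the class names QueryCache and RecordCache, the run helper, and the workload are made up for illustration and do not model the real internals of MySQL or of any NoSQL product.

```python
class QueryCache:
    """Coarse-grained cache: any write to the table drops every cached result."""

    def __init__(self, table):
        self.table = table      # dict standing in for a table: key -> value
        self.cache = {}
        self.hits = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        value = self.table.get(key)
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.table[key] = value
        self.cache.clear()      # whole-table invalidation


class RecordCache(QueryCache):
    """Fine-grained cache: a write invalidates only the record it touched."""

    def put(self, key, value):
        self.table[key] = value
        self.cache.pop(key, None)


def run(cache_cls):
    """Warm the cache, then repeatedly update one record and re-read all five."""
    cache = cache_cls({f"user{i}": i for i in range(5)})
    for i in range(5):
        cache.get(f"user{i}")
    for round_no in range(3):
        cache.put("user0", round_no)
        for i in range(5):
            cache.get(f"user{i}")
    return cache.hits


if __name__ == "__main__":
    print("table-level cache hits:", run(QueryCache))
    print("record-level cache hits:", run(RecordCache))
```

With this workload the table-level cache never scores a hit, because every update empties it, while the record-level cache misses only on the one record that keeps changing.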
3 Flexible Data Model
HBase Properties
HBase is a typical NoSQL database and supports only single-row transactions. Its design relies mainly on scaling out: computing power is increased by adding more inexpensive commodity servers.
1 Large capacity
A single HBase table can hold tens of billions of rows and millions of columns, so the data matrix is highly elastic in both its horizontal and vertical dimensions. A query over tens or hundreds of millions of rows may time out; restricting the query to specific column qualifiers avoids the timeout problem.
2 Column-oriented
HBase organizes storage and permission control by column (family), and each column (family) can be retrieved independently.
A column store keeps a table's data column by column, which greatly reduces the amount of data read when a query needs only a few fields. The values of a single field are stored together, and it is easier to design a good compression algorithm for data clustered in this way.
Traditional row-oriented databases have the following characteristics:
Data is stored row by row.
Queries that cannot use an index consume a large amount of I/O.
Building indexes and materialized views takes a lot of time and resources.
To meet query requirements, the database has to be expanded considerably.
Column-oriented databases have the following characteristics:
Data is stored by column, that is, each column is stored separately.
The data itself serves as the index.
Only the columns involved in a query are accessed, which greatly reduces system I/O.
Each column is handled by its own thread, so queries are processed with high concurrency.
Values in a column share the same type and have similar characteristics, so they compress efficiently.
Column storage not only solves the problem of data sparsity but also minimizes storage overhead. When a query runs, only the columns it involves are retrieved, which significantly reduces disk I/O. This is also one of the things that lets HBase guarantee a certain level of read and write performance.
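To make the I/O argument concrete, here is a minimal sketch that stores the same three records in a row layout and in a column layout, then counts how many values each layout has to touch to average a single field. The sample data and the function names are invented for illustration; real engines read blocks from disk rather than Python lists.

```python
# Three sample records with four fields each (hypothetical data).
records = [
    {"id": 1, "name": "ann", "city": "x", "age": 30},
    {"id": 2, "name": "bob", "city": "y", "age": 40},
    {"id": 3, "name": "cid", "city": "z", "age": 50},
]

# Row layout: every record is stored as one contiguous tuple.
row_store = [tuple(r.values()) for r in records]

# Column layout: every field is stored as its own contiguous list.
column_store = {field: [r[field] for r in records] for field in records[0]}

AGE_INDEX = list(records[0]).index("age")


def avg_age_row_store():
    """Read whole rows even though only one field is needed."""
    touched = 0
    total = 0
    for row in row_store:
        touched += len(row)     # the entire row is read
        total += row[AGE_INDEX]
    return total / len(row_store), touched


def avg_age_column_store():
    """Read only the one column the query asks for."""
    ages = column_store["age"]
    return sum(ages) / len(ages), len(ages)


if __name__ == "__main__":
    print("row store    (average, values touched):", avg_age_row_store())
    print("column store (average, values touched):", avg_age_column_store())
```

Both layouts return the same average, but the row layout touches every field of every record while the column layout touches only the age values. The gap grows with the number of fields per record.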
3 Sparsity
In most cases, row-stored data tends to be sparse: there are many empty (null) columns that still occupy storage space, so storage is wasted. In HBase an empty column occupies no storage space, so a table can be designed to be very sparse.
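One way to picture why empty columns cost nothing, and how multiple versions fit in, is a map keyed by (row, column) whose values are maps from timestamp to data. The sketch below assumes exactly that simplified model; the SparseTable class and its method names are hypothetical and are not the HBase client API.

```python
class SparseTable:
    """Toy model: only cells that were actually written exist in memory."""

    def __init__(self):
        # (rowkey, "family:qualifier") -> {timestamp: value}
        self.cells = {}

    def put(self, row, column, value, ts):
        self.cells.setdefault((row, column), {})[ts] = value

    def get(self, row, column, ts=None):
        """Return the newest version, or the newest one not later than ts."""
        versions = self.cells.get((row, column))
        if not versions:
            return None         # the cell was never written
        candidates = [t for t in versions if ts is None or t <= ts]
        if not candidates:
            return None
        return versions[max(candidates)]

    def stored_cells(self):
        """Count stored cell versions; absent columns contribute nothing."""
        return sum(len(v) for v in self.cells.values())


if __name__ == "__main__":
    t = SparseTable()
    t.put("row1", "info:name", b"Ann", ts=1)
    t.put("row1", "info:name", b"Anna", ts=2)   # second version of the same cell
    t.put("row2", "info:email", b"b@example.com", ts=1)
    print(t.get("row1", "info:name"))           # newest version
    print(t.get("row1", "info:name", ts=1))     # older version by timestamp
    print(t.get("row2", "info:name"))           # never written, nothing stored
    print(t.stored_cells())
```

Here row2 has no info:name and row1 has no info:email, yet neither absence takes up an entry. Values are kept as bytes with no type attached, in line with the untyped data model described earlier.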
4 Scalability
The underlying storage relies on HDFS. In addition, HBase's Region and RegionServer concepts allow data to be partitioned, with the partitions placed on different machines, so the core HBase architecture is scalable. HBase scales hot: nodes can be added or removed at any time without stopping running services.
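The partitioning idea can be sketched as a sorted list of region start keys, where each region covers the Rowkey range from its start key up to the next one. Everything below is an assumption made for illustration: the RegionMap class, the start keys, and the server names rs1 to rs4 are invented, and real HBase keeps this mapping in its own metadata table rather than in a Python list.

```python
from bisect import bisect_right


class RegionMap:
    """Toy routing table: region i covers [starts[i], starts[i + 1])."""

    def __init__(self, starts, servers):
        self.starts = list(starts)      # sorted region start keys
        self.servers = list(servers)    # server hosting each region

    def locate(self, rowkey):
        """Find the region index and server responsible for a Rowkey."""
        idx = bisect_right(self.starts, rowkey) - 1
        return idx, self.servers[idx]

    def split(self, idx, split_key, new_server):
        """Split region idx at split_key; the upper half moves to new_server."""
        self.starts.insert(idx + 1, split_key)
        self.servers.insert(idx + 1, new_server)


if __name__ == "__main__":
    regions = RegionMap(["", "g", "n", "t"], ["rs1", "rs2", "rs1", "rs3"])
    print(regions.locate("apple"))
    print(regions.locate("zebra"))
    regions.split(0, "c", "rs4")        # a new machine takes over part of region 0
    print(regions.locate("dog"))
```

Because routing depends only on key ranges, adding a server amounts to updating the map for the affected range; other ranges keep being served while that happens.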
5 High reliability
The WAL (write-ahead log) mechanism ensures that written data is not lost when the cluster hits an exception. The Replication mechanism ensures that data is not lost or corrupted when the cluster has a serious problem. In addition, HBase stores its data on HDFS, and HDFS itself keeps replicas.
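The write-ahead idea is simply that every write is appended to a durable log before the in-memory state is updated, so the state can be rebuilt by replaying the log. The sketch below assumes a one-file JSON-lines log; TinyStore, its methods, and the demo helper are hypothetical and far simpler than HBase's actual WAL format.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy store: log first, then update memory; replay the log on startup."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memstore = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state from the log, oldest entry first."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.memstore[entry["row"]] = entry["value"]

    def put(self, row, value):
        # 1. Append to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"row": row, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply the write in memory.
        self.memstore[row] = value


def demo():
    """Write, simulate a crash by dropping the object, then recover."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyStore(path)
        store.put("row1", "v1")
        store.put("row1", "v2")
        store.put("row2", "x")
        del store                       # in-memory state is gone
        recovered = TinyStore(path)     # replay brings it back
        return recovered.memstore


if __name__ == "__main__":
    print(demo())
```

The ordering matters: if memory were updated first and the process died before the log append, the write would have been acknowledged but could not be recovered.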
6 High Performance
The LSM (log-structured merge) data structure underneath, together with the ordered arrangement of Rowkeys, gives HBase very high write performance. Region splitting, the primary-key (Rowkey) index, and caching give HBase good random read performance over massive amounts of data, with Rowkey lookups reaching millisecond latency. HBase also adapts well to high-concurrency scenarios. This is an important reason why many companies in the industry choose HBase as their storage database.
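A rough sketch of the LSM write and read paths: writes go into an in-memory buffer, a full buffer is flushed as an immutable file sorted by Rowkey, and reads check the buffer first and then the files from newest to oldest using binary search. The TinyLSM class and its flush threshold are assumptions for illustration; compaction, the WAL, and block caching are deliberately left out.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy LSM store: in-memory buffer plus sorted, immutable flushed files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.store_files = []   # oldest first; each is (sorted_keys, values)

    def put(self, rowkey, value):
        """Writes only touch memory, which is what makes them cheap."""
        self.memstore[rowkey] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Turn the buffer into one file sorted by Rowkey."""
        if not self.memstore:
            return
        items = sorted(self.memstore.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.store_files.append((keys, values))
        self.memstore = {}

    def get(self, rowkey):
        """Check memory first, then files from newest to oldest."""
        if rowkey in self.memstore:
            return self.memstore[rowkey]
        for keys, values in reversed(self.store_files):
            i = bisect_left(keys, rowkey)       # binary search on sorted keys
            if i < len(keys) and keys[i] == rowkey:
                return values[i]
        return None


if __name__ == "__main__":
    lsm = TinyLSM(flush_threshold=2)
    lsm.put("b", 1)
    lsm.put("a", 2)     # buffer is full, so it is flushed as a sorted file
    lsm.put("a", 3)     # newer value for "a" lives in memory for now
    print(lsm.get("a"), lsm.get("b"), lsm.get("z"))
```

Searching newest to oldest is what lets a later write shadow an earlier one without rewriting the older file, and keeping each file sorted by Rowkey is what makes the per-file lookup a binary search.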