Hadoop Internship Exercise 1 (Hive and HBase)

Source: Internet
Author: User

Chapter 1: Introduction

Recently, the telecommunications group where I work held a big data technology training class. As required, this Hadoop beginner put together a comparison of Hive and HBase and kept a record of the hands-on practice. Here it is.

The similarities between the two:

1. Both HBase and Hive are architected on top of Hadoop and use HDFS as their underlying storage.

The differences between the two:

2. Hive is a batch system built on top of Hadoop to reduce the work of writing MapReduce jobs; HBase is a project that makes up for Hadoop's weakness at real-time operations.

3. Imagine you are working with an RDBMS: for a full table scan, use Hive + Hadoop; for indexed access, use HBase + Hadoop.

4. A Hive query is compiled into MapReduce jobs and can take anywhere from five minutes to several hours; HBase lookups are very efficient, certainly more efficient than Hive queries.

5. Hive itself neither stores nor computes data; a Hive table is pure logic over files in HDFS, and all storage and computation are delegated to HDFS and MapReduce.

6. Hive borrows Hadoop's MapReduce to execute some of its commands.

7. HBase tables are physical tables, not logical tables, and provide a large memory-backed hash table; search engines store their indexes in it for convenient query operations.

8. HBase is a column-oriented store, so it can update and delete data; Hive is row-oriented and can only append data.

9. With HDFS as the underlying storage, HDFS is the system that stores the files, and HBase is responsible for organizing them.

10. Hive needs HDFS to store its files and the MapReduce framework for computation.
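Point 3 can be made concrete with a pair of commands. This is a sketch only: the table name sales and row key row-00042 are assumptions, and both commands require a running cluster.

```
# Hive: full-table analytical query, compiled into MapReduce jobs (batch latency)
hive -e "SELECT COUNT(*) FROM sales;"

# HBase: keyed point lookup, answered directly by a region server (low latency)
echo "get 'sales', 'row-00042'" | hbase shell
```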


Chapter 2: Hive Operation

Basic operations:

Log in to Hadoop's master node → switch to the hadoop account → start hive, view the tables, and exit:
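A minimal session along these lines might look like the following sketch (host and account names are assumptions):

```
$ ssh master            # log in to the master node
$ su - hadoop           # switch to the hadoop account
$ hive                  # start the Hive CLI
hive> show tables;
hive> quit;
```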




Join queries:

Inner join:
hive> SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

To see how many MapReduce jobs Hive uses for a query:
hive> EXPLAIN SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
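The sales and things tables used in these join examples are not defined in the original; a hypothetical pair of definitions consistent with the queries would be:

```sql
-- Assumed schemas: only the id join key is implied by the queries above
CREATE TABLE sales  (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
CREATE TABLE things (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```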

Outer joins:
hive> SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);
hive> SELECT sales.*, things.* FROM sales RIGHT OUTER JOIN things ON (sales.id = things.id);
hive> SELECT sales.*, things.* FROM sales FULL OUTER JOIN things ON (sales.id = things.id);

IN subqueries: not supported by Hive, but LEFT SEMI JOIN can be used instead:
hive> SELECT * FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);

CREATE TABLE ... AS SELECT: the new table must not already exist:
hive> CREATE TABLE target AS SELECT col1, col2 FROM source;


Views:

To create a view:
hive> CREATE VIEW valid_records AS SELECT * FROM records2 WHERE temperature != 9999;

To view details about a view:

hive> DESCRIBE EXTENDED valid_records;
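A view is queried like any ordinary table and removed with DROP VIEW:

```sql
SELECT * FROM valid_records LIMIT 10;

DROP VIEW IF EXISTS valid_records;
```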

Differences and operations of external tables, internal tables, partitioned tables:

1. Internal table:

At this point, a new data storage location for the tt table is created on HDFS; for example, the author's is hdfs://master/input/table_data.
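The CREATE statement that produced this layout did not survive in the original; a minimal sketch, in which the schema and delimiters are assumptions, would be:

```sql
-- Assumed schema; the LOCATION matches the directory described in the text
CREATE TABLE tt (name STRING, age STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/input/table_data';
```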

Load HDFS data into the table:

load data inpath '/input/data' into table tt;

The data under the /input/data directory on HDFS is moved to the /input/table_data directory.

After the tt table is dropped, both its data and metadata are deleted; that is, nothing is left under /input/table_data, and of course there was already nothing under /input/data after the previous step!

If no location is specified when creating an internal table, a new table directory is created under /user/hive/warehouse/, and the rest behaves the same.

The point to note is that load data moves the data!

2. External table:

create external table et (name string, age string);

At this point, a new table directory is created at /user/hive/warehouse/et.

load data inpath '/input/edata' into table et;

The data under /input/edata/ on HDFS is moved to /user/hive/warehouse/et. After this external table is dropped, the data under /user/hive/warehouse/et is not deleted, but the data under /input/edata/ is gone, moved by the previous step! The location of the data has changed! The essence is that data already on HDFS is moved when it is loaded!
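A common alternative, not shown in the original, is to declare the data's existing directory at creation time so that no move happens at all. A minimal sketch, reusing the assumed schema (the table name et2 is hypothetical):

```sql
-- Hypothetical external table bound directly to the existing data directory
CREATE EXTERNAL TABLE et2 (name STRING, age STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/input/edata';
-- DROP TABLE et2 removes only the metadata; the files under /input/edata remain
```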

3. Partition table


To import data into a partition of an internal table, Hive creates the partition directory and copies the data into it:

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

To add data to a partition of an external table:

ALTER TABLE log_messages ADD IF NOT EXISTS PARTITION (year = 2012, month = 1, day = 2)
LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';

Note: Hive does not check the partition (whether the directory exists, or whether it contains data), which can result in a query returning no results.
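To verify that partition metadata and data line up, the partitions can be listed and queried with a filter on the partition columns:

```sql
SHOW PARTITIONS log_messages;

-- The partition filter lets Hive read only the matching directory
SELECT * FROM log_messages WHERE year = 2012 AND month = 1 AND day = 2;
```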

Loading from the local file system: LOAD DATA LOCAL INPATH '/opt/data/1.txt' INTO TABLE table1 loads the local file /opt/data/1.txt into table1. Hive reads the file and writes its contents to the location of table1 in HDFS.

Loading from HDFS: LOAD DATA INPATH '/data/datawash/1.txt' INTO TABLE table1 moves /data/datawash/1.txt on HDFS into the directory where table1 resides.

The same applies to OVERWRITE when loading: LOAD DATA LOCAL INPATH '/opt/data/1.txt' OVERWRITE INTO TABLE table1 overwrites any data already present; if you are sure there is none, you can omit it.


Chapter 3: HBase Operations

Installation: Install zookeeper before installing HBase, see http://jeffxie.blog.51cto.com/1365360/329212
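To point HBase at the external ZooKeeper ensemble, the relevant hbase-site.xml settings look roughly like the following sketch (host names and port are assumptions):

```xml
<!-- hbase-site.xml: minimal sketch -->
<configuration>
  <!-- Comma-separated list of ZooKeeper hosts HBase should connect to -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```

In hbase-env.sh, also set export HBASE_MANAGES_ZK=false so that HBase does not start its own ZooKeeper.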

Login:


Grammar:


Operations and their command expressions:

Create a table:
create 'table_name', 'family1', 'family2', ..., 'familyN'

Add a record:
put 'table_name', 'rowkey', 'family:column', 'value'

View a record:
get 'table_name', 'rowkey'

View the total number of records in a table:
count 'table_name'

Delete a record:
delete 'table_name', 'rowkey', 'family:column'

Delete an entire row:
deleteall 'table_name', 'rowkey'

Delete a table:
first disable 'table_name', then drop 'table_name'

View all records (very dangerous on a large table; it is best to add a limit):
scan 'table_name', {LIMIT => 10}

View all data in certain column families of a table (VERSIONS is optional):
scan 'table_name', {COLUMNS => ['family1', 'family2'], VERSIONS => 2}
Practice:


status                                          // view server status

version                                         // query the HBase version

create 'test', 'course', 'device'               // create a table

list                                            // list all tables

exists 'test'                                   // check whether the table exists

put 'test', 'Li Lei', 'course:math', '90'

put 'test', 'Han Meimei', 'course:english', '92'    // insert records

get 'test', 'Li Lei'                            // get all data for a row key

get 'test', 'Li Lei', 'device'                  // get all data in one column family for a row key

get 'test', 'Li Lei', 'device:laptop'           // get one column of one column family for a row key

Update a record (an update in HBase is simply another put to the same cell):

put 'test', 'Li Yang', 'device:laptop', 'Asus'

count 'test'                                    // see how many rows are in the table

delete 'test', 'Li Yang', 'device:laptop'       // remove the 'device:laptop' cell of the row with key 'Li Yang'

deleteall 'test', 'Li Yang'                     // delete the entire row

To delete a table:
disable 'test'
drop 'test'

Exit:

exit

