First, create a table in Hive as follows:
hive> CREATE TABLE wyp
    > (id INT, name STRING,
    > age INT, tel STRING)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 2.832 seconds
This table is very simple, with only four fields whose specific meanings I will not explain here. The local file system has a file /home/wyp/wyp.txt whose rows match this schema (tab-separated id, name, age, and tel values).
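For completeness, a minimal sketch of how such a local file is typically loaded into the table with standard HiveQL; the path is the one mentioned above, and loading this way is an assumption about the workflow rather than a step shown in the original:

hive> LOAD DATA LOCAL INPATH '/home/wyp/wyp.txt'
    > INTO TABLE wyp;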
We can see that the logs directory contains two sub-directories, dt= and country=cn.
Query operation:
SELECT ts, dt, line FROM logs WHERE country = 'cn';
In this case, the query operation only scans the file1.txt and file2.txt files.
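For reference, a partitioned table like the logs table queried above is typically declared along the following lines; the column types and delimiter are assumptions, and only the names ts, dt, line, and country come from the query:

hive> CREATE TABLE logs (ts BIGINT, line STRING)
    > PARTITIONED BY (dt STRING, country STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';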
4. Buckets: The tables and partitions above split data at the directory level, while buckets split the data files of the data source themselves. Using a bucketed table splits the source data file into multiple files according to certain rules. To u…
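As an illustration of bucketing, here is a minimal sketch; the table name, columns, and bucket count are assumptions, not from the original article:

hive> SET hive.enforce.bucketing = true;  -- on older Hive versions, makes inserts honor the declared buckets
hive> CREATE TABLE bucketed_users (id INT, name STRING)
    > CLUSTERED BY (id) INTO 4 BUCKETS;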
hive --service hwi
Browser access: http://hostname:9999/hwi
Common Hive commands
First, we will introduce several frequently used Hive commands. Later, we will introduce the usage of the various commands in detail.
View all databases: SHOW DATABASES;
Switch to the hive database: USE hive;
List tables: SHOW TABLES;
Create a table: CREATE TABLE test ( …
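A minimal interactive session illustrating these commands; the columns of the test table are assumptions, since the original CREATE TABLE statement is cut off:

hive> SHOW DATABASES;
hive> USE hive;
hive> SHOW TABLES;
hive> CREATE TABLE test (id INT, name STRING);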
(sample rows with URL paths such as /selectcat/er//m/er/49.html are elided here)
Description: platform and user_id represent the platform and the user ID; the seq field records the order of a user's page views sorted by time; from_url and to_url record, for each page view, which page the user jumped from and which page they jumped to. For a user's first access record on a given platform, from_url is null (NULL).
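A sketch of a table that could hold such access records; the column names follow the description above, while the types are assumptions:

hive> CREATE TABLE page_views (
    >   platform STRING,
    >   user_id STRING,
    >   seq INT,
    >   from_url STRING,
    >   to_url STRING);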
Requirements: the problem needs to be solved in two ways:
1. Implement a …
Hive architecture: Hive is the data warehouse infrastructure built on top of Hadoop. It is similar to a database, except that a database focuses on transactional operations such as modifications, deletions, and queries, which occur frequently there, whereas a data warehouse is focused mainly on querying. For the same amount of data, querying a database is relatively slow, while querying a data warehouse is relatively fast. The data warehouse is query-oriented, and the amount of data it processes …
CREATE INDEX table08_index ON TABLE table08 (columns)
AS 'COMPACT' WITH DEFERRED REBUILD
TBLPROPERTIES ("prop3" = "value3", "prop4" = "value4");
I. If the index exists, delete it.
DROP INDEX IF EXISTS table09_index ON table09;
J. Re-indexing on partitions
ALTER INDEX table10_index ON table10 PARTITION (columnx = 'valueq', columny = 'valuer') REBUILD;
4. index test
(1) query the number of rows in the table
hive (hive)> SELECT …
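The statement above is cut off; a typical row-count query for such an index test looks like the following, with the table name borrowed from the earlier index examples as an assumption:

hive (hive)> SELECT COUNT(*) FROM table08;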
… to access the metadata database, a MetastoreServer is started on the server side, and the client accesses the metadata database through the MetastoreServer using the Thrift protocol.
Attached is the metadata database data dictionary (if you use MySQL, run SHOW TABLES):
BUCKETING_COLS: Hive table bucketing column information
COLUMNS: Hive table field information (field comment, field name, field type, field ordinal)
DBS: database information, storing the HDFS path information
PARTITION_KEYS: partition keys of Hive partitioned tables
SDS: the HDFS data directory and data format for all …
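As a sketch of how this data dictionary can be inspected directly in MySQL (assuming the metastore database is named hive; adjust to your installation):

mysql> USE hive;
mysql> SHOW TABLES;
mysql> SELECT DB_ID, NAME, DB_LOCATION_URI FROM DBS;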
Copy the MySQL JDBC connector from the previous Hive client's lib directory to the lib of the new hive0140:
cp mysql-connector-java-5.1.23-bin.jar ../hive0140/lib/
6. Put the hive-site.xml, hive-env.sh, and hive-log4j.properties configured for the earlier version back under the conf directory of the current version.
7. The upgrade is complete. You can perform …
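As a quick, non-authoritative sanity check after the upgrade, the existing metadata should still be visible from the new client, for example:

hive> SHOW DATABASES;
hive> SHOW TABLES;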
… converts the SQL statement into the corresponding MapReduce program, runs the job through the MapReduce computation framework, and then produces our final analysis results. When running Hive, the user only needs to create tables, import data, and write SQL analysis statements; the rest of the process is completed automatically by the Hive framework. Creating tables, importing data, and writing SQL analysis statements is actually the kn…
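To see the MapReduce plan Hive generates for a statement, EXPLAIN can be used; the table and columns below are borrowed from the earlier logs example and serve only as a sketch:

hive> EXPLAIN
    > SELECT country, COUNT(*) FROM logs GROUP BY country;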
I. The historical value of Hive
1. Big data is known because of Hadoop, and Hadoop is useful because of Hive. Hive is the killer application on Hadoop: Hive is the data warehouse on Hadoop, and it contains both the storage format and the query engine of a data warehouse. Spark SQL, by contrast, is a much better and more advanced query engine …
Cause: The above problem is usually caused by the script that runs Hive under the bin/ directory.
Explanation: Assume the Hive source is checked out to a local hive-trunk directory and the source is compiled without specifying the "target.dir" property. If the HIVE_HOME variable points to the hive-trunk directory, $HIVE_ …
What is Hive?
Hive provides a way for you to query data using SQL. However, it is best not to use Hive for real-time queries: Hive is very slow, because its implementation principle is to convert SQL statements into multiple MapReduce tasks. The official docume…
I. Hive functions
1. Hive built-in functions
(1) For more content, see the Hive official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
(2) Detailed explanation: http://blog.sina.com.cn/s/blog_83bb57b70101lhmk.html
(3) A shortcut for testing built-in functions:
1. Create a dual table: CREATE t…
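The dual-table shortcut is cut off above; a common sketch is to create a one-column table, load a single dummy row into it, and call built-in functions against it (the file path is an assumption):

hive> CREATE TABLE dual (dummy STRING);
hive> LOAD DATA LOCAL INPATH '/tmp/dual.txt' INTO TABLE dual;  -- a local file containing a single line
hive> SELECT upper('hive'), length('hive') FROM dual;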
Overview: Hive's metadata is usually stored in a relational database, and MySQL is commonly used as the metadata database. The Hive installation described earlier also stores its metadata in a MySQL database. Hive's metadata occupies 57 tables in the MySQL database.
I. The metadata table (VERSION) that stores the …
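As an illustration, the VERSION table can be inspected directly in MySQL (assuming the metastore database is named hive; the column names are those of the standard metastore schema):

mysql> SELECT VER_ID, SCHEMA_VERSION, VERSION_COMMENT FROM hive.VERSION;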
Kylin 2.3 enables JDBC data sources (you can generate Hive tables directly from SQL, eliminating the hassle of manually importing data into Hive and building the Hive tables). Description: the JDBC data source is essentially a Hive data source. Performance is still not good because of the database big-table associa…
1. Hive architecture and basic composition. Below is the architecture diagram for Hive. Figure 1.1 Hive architecture
The architecture of Hive can be divided into the following parts: (1) There are three main user interfaces: the CLI, the Client, and the WUI. The most commonly used is the CLI; when the CLI starts, it initiates a …
Questions guide:
1. What three types of user access does Hive provide?
2. When using HiveServer, which service needs to be started first?
3. What is HiveServer's start command?
4. Through which service does HiveServer provide remote JDBC access?
5. How do you modify the default startup port of HiveServer?
6. Which packages are required for a Hive JDBC driver connection?
7. How do HiveServer2 and HiveServer differ in use …