Conditions for using PHP to connect to Hive
1. Install Thrift
# ./configure --without-ruby
# make && make install
If libevent-devel is not installed, install the dependent libraries first: yum -y install libevent-devel
Start the Hive Thrift service after installation:
# ./hive --service hiveserver >/dev/null 2>/dev/null
Check whether the default port 10000 of h
A Hive query is first converted to a physical query plan, which typically contains multiple MapReduce jobs; the output of one MapReduce job can serve as the input to another. The MapReduce job designed by Hive for
When creating a table, Hive lets you specify delimiters. For example, a tab can separate attribute columns and \n can separate records.
However, if the uploaded file does not match this format, the records are still saved, but the query result is a column of NULL.
The format of the TXT file to be uploaded,
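The delimiter mismatch described above can be illustrated with a small Python sketch. This is a simplified model, not Hive's actual SerDe code: the function `parse_row`, the three-column schema, and the sample lines are all hypothetical, and the only assumption is that a field which is missing or cannot be cast to its column type is read as NULL (`None` here).

```python
from typing import List, Optional


def parse_row(line: str, column_types: List[type], delimiter: str = "\t") -> List[Optional[object]]:
    """Split one record on the table delimiter and cast each field.

    Simplified model (assumption): a missing or unparseable field becomes None.
    """
    fields = line.rstrip("\n").split(delimiter)
    row: List[Optional[object]] = []
    for i, col_type in enumerate(column_types):
        if i >= len(fields):
            # The line has fewer fields than the table has columns.
            row.append(None)
            continue
        try:
            row.append(col_type(fields[i]))
        except ValueError:
            # The field cannot be cast to the declared column type.
            row.append(None)
    return row


# Hypothetical table: (id INT, name STRING, age INT), fields terminated by tab.
schema = [int, str, int]
good = parse_row("1\talice\t30\n", schema)  # file uses the declared delimiter
bad = parse_row("1,alice,30\n", schema)     # file uses commas instead of tabs

print(good)
print(bad)
```

In this model the comma-separated line is never split, so the whole line lands in the first (integer) column, fails to parse, and every column comes back empty.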
you to use multiple map tasks to complete.
SET mapred.reduce.tasks=10;
CREATE TABLE a_1 AS
SELECT * FROM a
DISTRIBUTE BY rand(123);
This scatters the records of table a randomly into table a_1, which contains 10 files. Replace table a with a_1 in the SQL, and the query will use 10 map tasks to complete.
Each map task then handles more than 12M (millions of records) of data, which is certainly much more efficient.
These two approaches look contradictory: one is to merge small files, one is to take
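The effect of `DISTRIBUTE BY rand(123)` with 10 reducers can be sketched in plain Python. This is only a model of the idea: the helper `distribute_by_rand` is hypothetical, and Python's random generator is not the one Hive uses, so the exact bucket contents would differ from a real run.

```python
import random


def distribute_by_rand(records, num_reducers, seed=123):
    """Assign each record to a pseudo-random reducer; each reducer writes one file."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(num_reducers)]
    for record in records:
        buckets[rng.randrange(num_reducers)].append(record)
    return buckets


records = list(range(10_000))  # stand-in for the rows of table a
files = distribute_by_rand(records, num_reducers=10)

# Every record ends up in exactly one of the 10 output "files",
# and the files are of roughly similar size.
print([len(f) for f in files])
```

Because rows are spread without regard to their content, the output files come out roughly equal in size, which is what lets a later query split its work across a matching number of map tasks.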
Today, when using Hive to query the maximum value of some analysis data, I ran into a problem. In Hive, the symptom was as follows:
Caused by: java.io.FileNotFoundException: http://slave1:50060/tasklog?attemptid=attempt_201501050454_0006_m_00001_1
Then take a look at the JobTracker log:
2015-01-05 21:43:23,724 INFO org.apache.hadoop.mapred.JobInProgress
statement can use a regular expression to select columns. The following statement queries all columns except ds and hr:
SELECT `(ds|hr)?+.+` FROM test
For example, to query by condition:
hive> SELECT a.foo FROM invites a WHERE a.ds='
To output query data to a directory:
hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='
output query results to a local direct
Query the number of distinct androidid values:
SELECT COUNT(DISTINCT androidid) FROM table WHERE dt='date' AND androidid IS NOT NULL AND androidid
AND androidid
Query the total number of unique users. Because a user is identified by the combination of four attributes, concatenate them and then deduplicate:
hive> SELECT COUNT(DISTINCT concat(nvl(idfa, ''), nvl(mac, ''), nvl(imei, ''), nvl(androidid, ''))) FROM table WHERE
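The concatenate-then-deduplicate idea can be tried locally with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name `events` and the sample rows are made up, SQLite's `ifnull` stands in for `nvl`, and `||` stands in for `concat`. In SQLite, concatenating with NULL yields NULL, which is why each attribute is defaulted to an empty string first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (idfa TEXT, mac TEXT, imei TEXT, androidid TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("A1", None, None, None),   # iOS-style user, only idfa is set
        ("A1", None, None, None),   # the same user seen again
        (None, "m1", "i1", "d1"),   # Android-style user
        (None, "m1", "i1", "d2"),   # differs only in androidid, so a different user
    ],
)

# Replace NULLs with '' before concatenating, then count distinct keys.
(unique_users,) = conn.execute(
    """
    SELECT COUNT(DISTINCT ifnull(idfa, '') || ifnull(mac, '')
                          || ifnull(imei, '') || ifnull(androidid, ''))
    FROM events
    """
).fetchone()

print(unique_users)
```

One caveat of concatenating without a separator is that different attribute combinations can produce the same string (for example `'ab' || 'c'` and `'a' || 'bc'`), so a separator may be worth adding when the values have no fixed length.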
Subqueries like the following can be executed in databases such as Oracle and MySQL but are not supported in Hive. We can rewrite them as join operations:
-- 1. Subquery
SELECT
*
FROM
a a
WHERE
a.update_time = (SELECT min(b.update_time) FROM a b)
-- 2. IN operation
SELECT
*
FROM
a a
WHERE
a.dept = 'IT'
AND
Rewritten as join operations:
-- 1
SELECT
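As a hedged illustration of the first rewrite (the scalar `min` subquery), the sketch below checks with Python's built-in `sqlite3` module that a join against a one-row aggregate returns the same rows as the subquery form. The join shown is one possible rewrite, not necessarily the one the original article used; the columns `id` and `dept` and the sample rows are assumptions, and SQLite (unlike the Hive versions discussed here) accepts both forms, which is what makes the comparison possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, dept TEXT, update_time INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [(1, "IT", 100), (2, "HR", 50), (3, "IT", 50), (4, "OPS", 70)],
)

# Original form: scalar subquery in the WHERE clause.
subquery_sql = """
    SELECT a.* FROM a
    WHERE a.update_time = (SELECT MIN(b.update_time) FROM a b)
    ORDER BY a.id
"""

# Possible join form: join against a derived table holding the minimum.
join_sql = """
    SELECT a.* FROM a
    JOIN (SELECT MIN(update_time) AS min_time FROM a) b
      ON a.update_time = b.min_time
    ORDER BY a.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows)
print(join_rows)
```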
same key is aggregated together, and what follows must be an aggregation operation
ORDER BY and SORT BY
ORDER BY ensures a global order.
SORT BY only ensures that the output of each reducer is ordered; if there is only one reducer, the effect is the same as ORDER BY.
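The difference can be modeled in a few lines of Python. This is a sketch, not Hive's execution logic: the helpers `sort_by` and `order_by` are hypothetical, and rows are handed to reducers round-robin purely for illustration.

```python
def sort_by(rows, num_reducers):
    """Each reducer sorts only its own share of the rows (SORT BY model)."""
    partitions = [rows[i::num_reducers] for i in range(num_reducers)]
    return [sorted(partition) for partition in partitions]


def order_by(rows):
    """All rows pass through a single sort (ORDER BY model)."""
    return sorted(rows)


rows = [5, 3, 9, 1, 8, 2, 7]

per_reducer = sort_by(rows, num_reducers=2)
concatenated = [x for part in per_reducer for x in part]

print(per_reducer)    # each reducer's output is sorted on its own
print(concatenated)   # but the outputs put together are not globally sorted
print(order_by(rows)) # globally sorted
```

With `num_reducers=1` the two helpers return the same sequence, which matches the single-reducer case described above.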
Application scenarios
Too many small files (control the number of output files through the number of reducers)
A file is extremely large
The sizes of the map output files are uneven
The sizes of the reduce output files are uneven
CLUSTER BY
Bring togeth
In big data scenarios, be aware that using Hive for query-based statistical analysis has very high latency: a very complex statistical analysis may need to run for more than an hour. Still, compared with doing the same analysis in MySQL or other relational databases, execution is much faster. Using HiveQL to write SQL-like query parsing statem
1. WHERE clause
Query the employees whose English score is greater than or equal to 70:
SELECT name, ceil(salary) AS salary, age FROM employees WHERE score['中文版'] >= 70;
Output Result:
Name Salary Age
WANGWU1 5500 20
WANGWU3 8400 20
Wangwu4 8400 20
Use the LIKE operator for a fuzzy match on the list:
SELECT name, ceil(salary) AS salary, age, address.province FROM employees WHERE address.province LIKE 'river%';
Output Result:
Name Salary Age Province
Hive error when executing a query statement: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.IOException:
hive> SELECT product_id, track_time FROM trackinfo LIMIT 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
org.apache.hadoop.ipc.RemoteException: java.io.IOExcepti
Recently, users have complained that the Hive web client does not return results to the front end after they submit certain queries, such as a statement that joins five tables. The only workaround was to remove one join:
write the query result into a temporary table, and then join that with the last table.
Later, while debugging, I found that the statement had in fact executed successfully and that the result file had been dumped into the
First, look at the query syntax on the official website:
[WITH CommonTableExpression (, CommonTableExpression)*] (Note: Only available starting with Hive 0.13.0)
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
WHE
MySQL uses indexes for query optimization
The purpose of indexing is to improve query efficiency. An index is analogous to a dictionary: if you want to look up the word "mysql", you must locate t
result set type returned by the subquery is a simple value.
b) Single-row subquery.
The result set returned by the subquery is zero or one tuple. It is similar to the scalar subquery, but may return zero tuples.
c) Multi-row single-column subquery.
The result set returned by the subquery contains multiple tuples but only one simple column.
d) Table sub-
Database optimization tutorial (3): Recording slow queries
1. Finding slow queries
In the previous section, we prepared the data for slow queries. In this section, we find slow queries and record them to a file.
3. Slow query of reco
Yesterday's article, SQL Server query performance optimization: index creation principles (I), mainly introduced the theory. Today's article covers some of the main principles and how to check the indexes that have been created.
III. Indexing principles
In general, building indexes depends on the data usage scenarios. In other words, which SQL statements are commonly used to access data? Are these statements missing indexes (or there may be too many