I have already set up the Hadoop and Hive environments, created a table named page in Hive, and loaded data into it. Now I want to count the traffic for each URL in this table and either store the result in a relational database or display it on a web page. How should I do this?
According to the official website, this can be implemented in Java, Python, or PHP. The following is a simple script written in Python.
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

# Maps each URL to its count as returned by Hive.
urlcount = {}

def getflowbyhive():
    try:
        # The host is masked in the original post; replace the host and port
        # with the address where the Hive service is listening.
        transport = TSocket.TSocket('2017.*.*.100', 219)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute("select url, count(*) from page group by url")
        while True:
            row = client.fetchOne()
            # Check for the end of the result set before splitting the row.
            if not row:
                break
            # Each row is a tab-separated string: url, count.
            sp = row.split('\t')
            if len(sp) < 2:
                continue
            urlcount[sp[0]] = sp[1]
            print('%s %s' % (sp[0], sp[1]))
        transport.close()
    except Thrift.TException as tx:
        print('%s' % tx.message)
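The row handling in the loop above can be tried without a running Hive server. This is only a minimal sketch, assuming each fetched row is a tab-separated string of URL and count; the function name parse_rows and the sample rows are made up for illustration and are not part of the Hive client.

```python
def parse_rows(rows):
    """Build a {url: count} dict from tab-separated 'url<TAB>count' strings."""
    counts = {}
    for row in rows:
        # An empty or None row is treated as the end of the result set.
        if not row:
            break
        sp = row.split('\t')
        # Skip rows that do not have both a URL and a count.
        if len(sp) < 2:
            continue
        counts[sp[0]] = int(sp[1])
    return counts

# Hypothetical sample data, including one malformed row and an end marker.
sample_rows = ['/index.html\t42', 'malformed', '/about.html\t7', '']
print(parse_rows(sample_rows))
```

Checking for the empty row before splitting matters: splitting first would either fail on None or keep skipping an empty string forever.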
Start the Hive service on the server 219.*.*.200 so that programs written in Python, Java, and so on can connect to Hive:
bin/hive --service hiveserver 10001
If no port is given, the service listens on the default port 10000. The command can be run in the background with nohup.
Once urlcount has been filled, it holds the URL-to-count key-value pairs, which you can then store in a database or display on a page.
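Storing the counts in a relational database could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for whatever database you actually run; the table name url_traffic, the column names, the function save_counts, and the sample data are all assumptions made for illustration.

```python
import sqlite3

def save_counts(conn, counts):
    """Write a {url: count} dict into a table (hypothetical name url_traffic)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_traffic "
        "(url TEXT PRIMARY KEY, hits INTEGER)"
    )
    # Hive returns counts as strings, so convert them to integers here.
    conn.executemany(
        "INSERT OR REPLACE INTO url_traffic (url, hits) VALUES (?, ?)",
        [(url, int(n)) for url, n in counts.items()],
    )
    conn.commit()

# In-memory database and made-up counts, only to show the flow.
conn = sqlite3.connect(':memory:')
save_counts(conn, {'/index.html': '42', '/about.html': '7'})
for url, hits in conn.execute(
        "SELECT url, hits FROM url_traffic ORDER BY hits DESC"):
    print(url, hits)
```

With another database, the same pattern applies through its own Python driver, although the placeholder style and the upsert statement may differ.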
HWI is short for Hive Web Interface, a web-based alternative to the Hive CLI. Hive 0.7 ships with HWI built in, and the default settings in conf/hive-default.xml can be used as they are. Start the HWI service in the background:
nohup bin/hive --service hwi > /dev/null 2> /dev/null &
Enter http://10.20.151.7:9999/hwi/ in the browser. There you can browse the data warehouse and run statements.