Requirements: Plot the daily trend of users per channel. There is one data point per minute (1,440 per day) for each of 2,000+ channels, split into new and old users, so 2 × 1,440 × 2,000 = 5.76 million+ rows per day. The data must be kept for 90 days.
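The sizing above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are illustrative, and 2,000 is the lower bound of the "2,000+" channels in the requirement.

```python
# Back-of-the-envelope row count for the requirement above.
MINUTES_PER_DAY = 24 * 60   # 1440 data points per series per day
CHANNELS = 2000             # lower bound; the requirement says "2000+"
USER_TYPES = 2              # new users and old users
RETENTION_DAYS = 90

rows_per_day = USER_TYPES * MINUTES_PER_DAY * CHANNELS
rows_retained = rows_per_day * RETENTION_DAYS

print(rows_per_day)    # 5760000
print(rows_retained)   # 518400000
```

At the lower bound, the table therefore holds roughly 518 million rows once the 90-day window is full.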
Query criteria: channel number, new or old user, date
Rowkey: channel_date_new-or-old-user flag_hour and minute (HHMM)
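Because the three query criteria are the leading fields of the rowkey, one day of one channel and user type is a contiguous, time-ordered key range. The sketch below illustrates this. The helper names `make_rowkey` and `day_prefix`, the sample channel `1001`, the date format `YYYYMMDD`, and the flag value `new` are assumptions for illustration only.

```python
def make_rowkey(channel, day, user_type, hhmm):
    """Join the rowkey fields in order: channel, date, user type, HHMM."""
    return '_'.join([channel, day, user_type, hhmm])


def day_prefix(channel, day, user_type):
    """Prefix shared by every minute row of one channel/date/user type.

    The trailing '_' keeps the prefix from also matching a longer
    field value that merely starts with the same characters.
    """
    return '_'.join([channel, day, user_type]) + '_'


# Keys built out of order (23:59, 00:00, 09:05) ...
keys = [make_rowkey('1001', '20150601', 'new', '%02d%02d' % (h, m))
        for h, m in [(23, 59), (0, 0), (9, 5)]]

# ... sort back into time order, because HHMM is zero-padded and
# HBase stores rows in lexicographic key order.
ordered = sorted(keys)
prefix = day_prefix('1001', '20150601', 'new')
print(ordered)
print(all(k.startswith(prefix) for k in ordered))
```

This ordering only holds if HHMM is zero-padded and the channel and flag values do not themselves contain the `_` separator.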
Connect to HBase:
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket
from thrift.transport import TTransport
from hbase import Hbase

def hbase_connect():
    # Thrift server host and port; the host is masked here.
    transport = TSocket.TSocket('*', 9090)
    # transport = TSocket.TSocket('10.50.14.105', 9090)
    # Wrap the raw socket in a buffered transport and run the protocol over it.
    hbase_transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(hbase_transport)
    hbase_client = Hbase.Client(protocol)
    hbase_transport.open()
    return hbase_transport, hbase_client
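Create the table:
def create_hbase_table():
    transport, client = hbase_connect()
    # List the tables before and after to confirm the new table exists.
    tables = client.getTableNames()
    print(tables)
    # One column family named 'default' holds all columns.
    client.createTable('client_rt_pv', [Hbase.ColumnDescriptor(name='default')])
    tables = client.getTableNames()
    print(tables)
    transport.close()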
Insert data:
mutationsbatch = []
### Loop over the per-minute records (loop header not shown)
# Rowkey: channel_date_user flag_HHMM
rowkey = '_'.join([tmp_pub, daystr, 'AC', tmp_ct])
mutations = [
    Hbase.Mutation(column="default:pv", value=str(tmp_pv)),
    Hbase.Mutation(column="default:uv", value=str(tmp_uv)),
    # PV per UV, guarded against division by zero
    Hbase.Mutation(column="default:pvdivuv", value=str('%.2f' % (tmp_pv / float(tmp_uv) if tmp_uv != 0 else 0,))),
    Hbase.Mutation(column="default:tm", value=str(tmp_ct)),
    Hbase.Mutation(column="default:pub", value=str("".join([tmp_pub, ' ']))),
    Hbase.Mutation(column="default:pubname", value=pub_id.get(tmp_pub, 'Unknown')),
]
mutationsbatch.append(Hbase.BatchMutation(row=rowkey, mutations=mutations))
### End Loop
# Send all rows in one batched call, then close the HBase transport.
hbase_client.mutateRows("client_rt_pv", mutationsbatch, None)
hbase_transport.close()
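Reading one day's trend back is then a prefix scan over the same rowkey layout. The following is a minimal sketch, not the original author's code. It assumes the HBase Thrift (version 1) client calls `scannerOpenWithPrefix`, `scannerGetList` and `scannerClose` with the signatures used below. The helper name `fetch_day_trend` and the batch size are hypothetical. Like the code above, it passes plain strings as under Python 2; Python 3 builds of the Thrift bindings may require bytes.

```python
def fetch_day_trend(client, table, prefix, columns, batch_size=1000):
    """Return [(rowkey, {column: value})] for all rows whose key starts with prefix.

    client  -- an Hbase.Client as returned by hbase_connect() (assumed API)
    columns -- list of 'family:qualifier' names, e.g. ['default:pv', 'default:uv']
    """
    # Last argument is the attributes map; None, as in the mutateRows call.
    scanner_id = client.scannerOpenWithPrefix(table, prefix, columns, None)
    rows = []
    try:
        while True:
            batch = client.scannerGetList(scanner_id, batch_size)
            if not batch:
                break  # an empty batch means the scanner is exhausted
            for result in batch:
                cells = dict((col, cell.value)
                             for col, cell in result.columns.items())
                rows.append((result.row, cells))
    finally:
        # Always release the server-side scanner.
        client.scannerClose(scanner_id)
    return rows

# Hypothetical usage against a live Thrift server:
#   transport, client = hbase_connect()
#   prefix = '_'.join([channel, daystr, 'AC']) + '_'
#   trend = fetch_day_trend(client, 'client_rt_pv', prefix, ['default:pv', 'default:uv'])
#   transport.close()
```

Rows come back in rowkey order, which for a fixed channel, date and user flag is minute order, so the result can be plotted directly.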
HBase Rowkey Design Example