Python data analysis: counting real IP requests with pandas, in detail

Source: Internet
Author: User
Objective

Pandas is a data analysis package built on NumPy that provides higher-level data structures and tools. Just as NumPy is built around the ndarray, pandas revolves around two core data structures, Series and DataFrame, which correspond to a one-dimensional sequence and a two-dimensional table, respectively. The conventional way to import pandas is:

from pandas import Series, DataFrame
import pandas as pd
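As a small illustration of the two structures described above, the sketch below builds one Series and one DataFrame. The IP addresses and column names are made-up sample values, not data from this article.

```python
import pandas as pd

# A Series is a one-dimensional labelled sequence.
requests_per_ip = pd.Series([3, 1], index=["10.0.0.1", "10.0.0.2"], name="count")

# A DataFrame is a two-dimensional table; each column is a Series.
log_df = pd.DataFrame({
    "real_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "status": [200, 404, 200],
})

print(requests_per_ip.ndim)     # number of dimensions of the Series
print(log_df.ndim)              # number of dimensions of the DataFrame
print(type(log_df["real_ip"]))  # selecting a single column yields a Series
```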


1.1. Pandas Analysis steps

1. Load the log data

2. Load the area_ip data

3. Count the number of requests per real_ip. The equivalent SQL looks like this:

SELECT inet_aton(l.real_ip),
       count(*),
       a.addr
FROM log AS l
INNER JOIN area_ip AS a
    ON a.start_ip_num <= inet_aton(l.real_ip)
   AND a.end_ip_num >= inet_aton(l.real_ip)
GROUP BY real_ip
ORDER BY count(*)
LIMIT 0, 100;
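The join condition in the SQL above relies on two ideas: turning a dotted IPv4 string into a number, and finding the range that contains that number. The sketch below shows both in Python. The helper names `ip_to_num` and `addr_for`, the tiny `area_ip` table, and the region labels are all hypothetical; the address ranges are reserved documentation ranges, not real area data. It unpacks with `"!I"` (network byte order), which should give the same number as the `ntohl` approach used in the script in section 1.2.

```python
import socket
import struct

import pandas as pd


def ip_to_num(ip):
    """Return an IPv4 address as an unsigned 32-bit integer, or -1 if invalid."""
    try:
        # "!I" reads the 4 packed bytes as a big-endian unsigned int,
        # the same numbering that MySQL's INET_ATON uses.
        return struct.unpack("!I", socket.inet_aton(str(ip)))[0]
    except (socket.error, struct.error):
        return -1


# Hypothetical lookup table with the same column names as the article's CSV.
area_ip = pd.DataFrame({
    "ip_start_num": [ip_to_num("192.0.2.0"), ip_to_num("198.51.100.0")],
    "ip_end_num": [ip_to_num("192.0.2.255"), ip_to_num("198.51.100.255")],
    "addr": ["region A", "region B"],
})


def addr_for(ip):
    """Return the addr of the first range containing ip, or None if no range matches."""
    ip_num = ip_to_num(ip)
    match = area_ip[(area_ip.ip_start_num <= ip_num) & (ip_num <= area_ip.ip_end_num)]
    return match["addr"].iloc[0] if not match.empty else None


print(addr_for("192.0.2.7"))
print(addr_for("-"))  # not a valid IP, so no range can match
```

A value such as `-` cannot be converted, so it falls outside every range and gets no address.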


1.2. Code

cat pd_ng_log_stat.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from ng_line_parser import NgLineParser

import pandas as pd
import socket
import struct


class PDNgLogStat(object):

    def __init__(self):
        self.ng_line_parser = NgLineParser()

    def _log_line_iter(self, pathes):
        """Parse each line of every file and yield it as a dict"""
        for path in pathes:
            with open(path, 'r') as f:
                for line in f:
                    self.ng_line_parser.parse(line)
                    yield self.ng_line_parser.to_dict()

    def _ip2num(self, ip):
        """Convert an IP address to a number (-1 if it cannot be converted)"""
        ip_num = -1
        try:
            # Convert the IP to an int/long number
            ip_num = socket.ntohl(struct.unpack("I", socket.inet_aton(str(ip)))[0])
        except Exception:
            pass
        return ip_num

    def _get_addr_by_ip(self, ip):
        """Look up the address for the given IP"""
        ip_num = self._ip2num(ip)
        try:
            # Keep only the ranges that contain ip_num
            addr_df = self.ip_addr_df[(self.ip_addr_df.ip_start_num <= ip_num) &
                                      (ip_num <= self.ip_addr_df.ip_end_num)]
            addr = addr_df.at[addr_df.index.tolist()[0], 'addr']
            return addr
        except Exception:
            # No matching range
            return None

    def load_data(self, path):
        """Build the DataFrame from the log files at the given paths"""
        # Each parsed log line becomes one row
        self.df = pd.DataFrame(self._log_line_iter(path))

    def uv_real_ip(self, top=100):
        """Count requests per CDN IP"""
        group_by_cols = ['real_ip']  # columns to group on; only these are computed and shown

        # Count directly
        url_req_grp = self.df[group_by_cols].groupby(self.df['real_ip'])
        return url_req_grp.agg(['count'])['real_ip'].nlargest(top, 'count')

    def uv_real_ip_addr(self, top=100):
        """Count requests per real IP and attach the address"""
        cnt_df = self.uv_real_ip(top)

        # Add the IP address column
        cnt_df.insert(len(cnt_df.columns),
                      'addr',
                      cnt_df.index.map(self._get_addr_by_ip))
        return cnt_df

    def load_ip_addr(self, path):
        """Load the IP address ranges"""
        cols = ['id', 'ip_start_num', 'ip_end_num',
                'ip_start', 'ip_end', 'addr', 'operator']
        self.ip_addr_df = pd.read_csv(path, sep='\t', names=cols, index_col='id')
        return self.ip_addr_df


def main():
    file_pathes = ['www.ttmark.com.access.log']

    pd_ng_log_stat = PDNgLogStat()
    pd_ng_log_stat.load_data(file_pathes)

    # Load the IP address ranges
    area_ip_path = 'area_ip.csv'
    pd_ng_log_stat.load_ip_addr(area_ip_path)

    # Count requests per real user IP and show the address
    print(pd_ng_log_stat.uv_real_ip_addr())


if __name__ == '__main__':
    main()
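The script's `_log_line_iter` yields one dict per parsed log line, and the counting step then groups on `real_ip`. Since `NgLineParser` is not shown in this article, the sketch below substitutes a hypothetical `iter_records` generator and an invented three-field log format, purely to show how records from a generator can become a DataFrame and be reduced to a top-N count. It uses `groupby(...).size().nlargest(...)`, a shorter alternative to the `agg(['count'])` form in the script.

```python
import pandas as pd


def iter_records(lines):
    """Yield one dict per line (hypothetical 'real_ip status url' format)."""
    for line in lines:
        real_ip, status, url = line.split()
        yield {"real_ip": real_ip, "status": int(status), "url": url}


# Made-up sample lines standing in for an access log.
sample_lines = [
    "192.0.2.7 200 /index.html",
    "198.51.100.9 404 /missing",
    "192.0.2.7 200 /about.html",
]

# pandas consumes the generator and turns each dict into one row.
df = pd.DataFrame(iter_records(sample_lines))

# Number of requests per real_ip, keeping only the busiest address.
top = df.groupby("real_ip").size().nlargest(1)
print(top)
```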


Run the statistics and view the output

python pd_ng_log_stat.py

                 count  addr
real_ip
60.191.123.80   101013  Hangzhou city, Zhejiang
-                32691  None
218.30.118.79    22523  Beijing
...
136.243.152.18     889  Germany
157.55.39.219      889  US
66.249.65.170      888  USA

[... rows x 2 columns]

Summary

That is all for this article. I hope it is of some help in your study or work. If you have any questions, feel free to leave a message.
