Python Data Analysis: Counting Real IP Requests with Pandas

Source: Internet
Author: User

Preface

Pandas is a data analysis package built on NumPy that provides higher-level data structures and tools. Just as NumPy is centered on the ndarray, pandas is centered on two core data structures: Series and DataFrame. A Series is a one-dimensional sequence, and a DataFrame is a two-dimensional table. Pandas is usually imported as follows:

from pandas import Series, DataFrame
import pandas as pd
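As a minimal sketch of the two structures (the IP addresses, status codes, and column names below are made up for illustration):

```python
import pandas as pd

# A Series is a one-dimensional labelled sequence.
counts = pd.Series([3, 1, 2], index=["10.0.0.1", "10.0.0.2", "10.0.0.3"], name="count")

# A DataFrame is a two-dimensional table; each column is a Series.
log_df = pd.DataFrame(
    {
        "real_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1"],
        "status": [200, 404, 200],
    }
)

print(counts["10.0.0.1"])  # look up a value by its label
print(log_df["real_ip"])   # selecting one column returns a Series
print(log_df.shape)        # (number of rows, number of columns)
```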

1.1 Pandas Analysis Steps

1. Load Log Data

2. Load area_ip data

3. Count the number of requests per real_ip and look up the address of each IP, similar to the following SQL:

SELECT inet_aton(l.real_ip),
  count(*),
  a.addr
FROM log AS l
INNER JOIN area_ip AS a
  ON a.start_ip_num <= inet_aton(l.real_ip)
  AND a.end_ip_num >= inet_aton(l.real_ip)
GROUP BY real_ip
ORDER BY count(*) DESC
LIMIT 0, 100;
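The SQL relies on MySQL's inet_aton, which turns a dotted IPv4 address into an integer so that the range comparison in the join works. A rough Python equivalent is sketched below. The helper names ip_to_num and in_range are hypothetical and are not part of the script in this article:

```python
import socket
import struct


def ip_to_num(ip):
    """Convert a dotted IPv4 string to an unsigned 32-bit integer, or -1 if invalid."""
    try:
        # inet_aton returns 4 bytes in network (big-endian) order;
        # "!I" reads them as one unsigned integer.
        return struct.unpack("!I", socket.inet_aton(ip))[0]
    except (OSError, TypeError):
        return -1


def in_range(ip, start_num, end_num):
    """Mirror the join condition: start_ip_num <= inet_aton(ip) <= end_ip_num."""
    return start_num <= ip_to_num(ip) <= end_num


print(ip_to_num("1.2.3.4"))  # 1*256**3 + 2*256**2 + 3*256 + 4 = 16909060
print(in_range("1.2.3.4", ip_to_num("1.2.3.0"), ip_to_num("1.2.3.255")))  # True
```

Returning -1 for unparsable values such as "-" means they never fall inside any range, so they simply get no address.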

1.2 Code

cat pd_ng_log_stat.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from ng_line_parser import NgLineParser

import pandas as pd
import socket
import struct


class PDNgLogStat(object):

    def __init__(self):
        self.ng_line_parser = NgLineParser()

    def _log_line_iter(self, pathes):
        """Parse each line of the given files and yield it as a dict."""
        for path in pathes:
            with open(path, 'r') as f:
                for index, line in enumerate(f):
                    self.ng_line_parser.parse(line)
                    yield self.ng_line_parser.to_dict()

    def _ip2num(self, ip):
        """Convert an IP address to a number; return -1 if it cannot be parsed."""
        ip_num = -1
        try:
            # Convert the IP address to an INT/LONG number
            ip_num = socket.ntohl(struct.unpack("I", socket.inet_aton(str(ip)))[0])
        except Exception:
            pass
        return ip_num

    def _get_addr_by_ip(self, ip):
        """Get the address for the given IP; return None if no range matches."""
        ip_num = self._ip2num(ip)

        try:
            addr_df = self.ip_addr_df[(self.ip_addr_df.ip_start_num <= ip_num) &
                                      (ip_num <= self.ip_addr_df.ip_end_num)]
            addr = addr_df.at[addr_df.index.tolist()[0], 'addr']
            return addr
        except Exception:
            return None

    def load_data(self, path):
        """Build a DataFrame from the log files at the given list of paths."""
        self.df = pd.DataFrame(self._log_line_iter(path))

    def uv_real_ip(self, top=100):
        """Count requests per real IP and return the top entries."""
        group_by_cols = ['real_ip']  # column to group by; only this column is counted and shown

        # Count the number of occurrences directly
        url_req_grp = self.df[group_by_cols].groupby(self.df['real_ip'])
        return url_req_grp.agg(['count'])['real_ip'].nlargest(top, 'count')

    def uv_real_ip_addr(self, top=100):
        """Count requests per real IP and attach the address of each IP."""
        cnt_df = self.uv_real_ip(top)

        # Add the IP address column
        cnt_df.insert(len(cnt_df.columns),
                      'addr',
                      cnt_df.index.map(self._get_addr_by_ip))
        return cnt_df

    def load_ip_addr(self, path):
        """Load the IP range table."""
        cols = ['id', 'ip_start_num', 'ip_end_num',
                'ip_start', 'ip_end', 'addr', 'operator']
        self.ip_addr_df = pd.read_csv(path, sep='\t', names=cols, index_col='id')
        return self.ip_addr_df


def main():
    file_pathes = ['www.ttmark.com.access.log']

    pd_ng_log_stat = PDNgLogStat()
    pd_ng_log_stat.load_data(file_pathes)

    # Load the IP address ranges
    area_ip_path = 'area_ip.csv'
    pd_ng_log_stat.load_ip_addr(area_ip_path)

    # Count requests per real user IP and show the address of each IP
    print(pd_ng_log_stat.uv_real_ip_addr())


if __name__ == '__main__':
    main()
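The script imports NgLineParser from ng_line_parser, which is not shown in this article. To try the script without that module, a stand-in offering the same two methods (parse and to_dict) could look roughly like the sketch below. Everything in it is an assumption: the log layout (nginx's combined format followed by a quoted X-Forwarded-For value that is treated as real_ip), the regular expression, and every field name other than real_ip.

```python
import re

# Assumed layout: nginx "combined" log format followed by a quoted X-Forwarded-For value.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<real_ip>[^"]*)"'
)


class NgLineParser(object):
    """Hypothetical stand-in exposing the parse() / to_dict() interface the script uses."""

    def __init__(self):
        self._fields = {}

    def parse(self, line):
        """Parse one log line; a line that does not match yields an empty record."""
        match = LINE_RE.match(line)
        self._fields = match.groupdict() if match else {}

    def to_dict(self):
        """Return a copy of the fields from the most recently parsed line."""
        return dict(self._fields)


# Illustrative log line using documentation-reserved IP addresses.
sample = ('203.0.113.7 - - [10/Oct/2016:13:55:36 +0800] "GET / HTTP/1.1" 200 612 '
          '"-" "curl/7.47.0" "198.51.100.23"')
parser = NgLineParser()
parser.parse(sample)
print(parser.to_dict()["real_ip"])  # 198.51.100.23
```

If the real log format differs, only LINE_RE needs to change, as long as a real_ip field is still produced.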

Running the Script and Its Output

python pd_ng_log_stat.py

                  count  addr
real_ip
60.191.123.80    101013  Hangzhou City, Zhejiang Province
-                 32691
                  22523  Beijing city
......
136.243.152.18      889  Germany
157.55.39.219       889  USA
66.249.65.170       888  USA

[100 rows x 2 columns]
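The script finds each address by filtering the whole range table once per IP. For larger inputs, one alternative (not what the original script does) is a single sorted join with pandas.merge_asof. The sketch below uses made-up IPs, ranges, and region names, and assumes the ranges do not overlap:

```python
import socket
import struct

import pandas as pd


def ip_to_num(ip):
    """Dotted IPv4 string -> unsigned 32-bit integer, or -1 if it cannot be parsed."""
    try:
        return struct.unpack("!I", socket.inet_aton(str(ip)))[0]
    except (OSError, TypeError):
        return -1


# Toy request log and toy range table (all values are illustrative).
log_df = pd.DataFrame({"real_ip": ["10.0.0.5", "10.0.0.5", "10.0.1.9", "-", "10.0.0.5"]})
area_df = pd.DataFrame(
    {
        "ip_start_num": [ip_to_num("10.0.0.0"), ip_to_num("10.0.1.0")],
        "ip_end_num": [ip_to_num("10.0.0.255"), ip_to_num("10.0.1.255")],
        "addr": ["Region A", "Region B"],
    }
)

# GROUP BY real_ip / COUNT(*): one row per address with its request count.
cnt_df = log_df.groupby("real_ip").size().reset_index(name="count")
cnt_df["ip_num"] = cnt_df["real_ip"].map(ip_to_num)

# merge_asof requires both sides to be sorted on their join keys.
cnt_df = cnt_df.sort_values("ip_num")
area_df = area_df.sort_values("ip_start_num")

# For each ip_num, pick the last range whose start is <= ip_num ...
merged = pd.merge_asof(cnt_df, area_df, left_on="ip_num", right_on="ip_start_num")
# ... then discard the match if ip_num lies beyond that range's end.
merged.loc[~(merged["ip_num"] <= merged["ip_end_num"]), "addr"] = None

result = merged.sort_values("count", ascending=False)[["real_ip", "count", "addr"]]
print(result)
```

Unparsable values such as "-" map to -1, match no range, and end up with an empty addr.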

Summary

That is all for this article. I hope it helps you in your study or work. If you have any questions, please leave a message.
