Using pandas in Python to split data into equal time spans: example code

Source: Internet
Author: User

This article shows how to use pandas in Python to split data into blocks that each cover the same time span. The details are as follows.

The data is in the DataFrame format shown below, with the columns ip and date. I need to find which IP addresses appear in each five-second interval and how often each one appears.

   ip          date
0  127.0.0.21  15/Jul/2017:18:22:16
1  127.0.0.13  15/Jul/2017:18:22:16
2  127.0.0.11  15/Jul/2017:18:22:17
3  127.0.0.11  15/Jul/2017:18:22:20
4  127.0.0.21  15/Jul/2017:18:22:21
5  127.0.0.13  15/Jul/2017:18:22:22
6  127.0.0.14  15/Jul/2017:18:26:36
7  127.0.0.16  15/Jul/2017:18:32:15
8  127.0.0.11  15/Jul/2017:18:36:03
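
To experiment with the steps that follow, the sample rows can be loaded into a DataFrame by hand. This is only a sketch: the article does not show how date_ip was created, so the rows list and the construction below are assumptions, while the column names and values are copied from the sample data.

```python
import pandas as pd

# Sample rows copied from the article's data; how the author actually
# loaded them is not shown, so this construction is an assumption.
rows = [
    ("127.0.0.21", "15/Jul/2017:18:22:16"),
    ("127.0.0.13", "15/Jul/2017:18:22:16"),
    ("127.0.0.11", "15/Jul/2017:18:22:17"),
    ("127.0.0.11", "15/Jul/2017:18:22:20"),
    ("127.0.0.21", "15/Jul/2017:18:22:21"),
    ("127.0.0.13", "15/Jul/2017:18:22:22"),
    ("127.0.0.14", "15/Jul/2017:18:26:36"),
    ("127.0.0.16", "15/Jul/2017:18:32:15"),
    ("127.0.0.11", "15/Jul/2017:18:36:03"),
]
date_ip = pd.DataFrame(rows, columns=["ip", "date"])

print(date_ip.shape)  # (9, 2)
```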

I searched the Internet for a Python solution for a long time without finding one, but I did find an R solution on Stack Overflow. If you are interested, please refer to it.

Inspired by it, I implemented what I needed in a less elegant way. If you have a better solution, please let me know:

Step 1: Convert the date column to the standard datetime format
    # date_ip is my DataFrame
    date_ip['date'] = pd.to_datetime(date_ip['date'], format='%d/%b/%Y:%H:%M:%S')
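
As a small check of the format string, the sketch below parses two of the sample timestamps. The names raw and parsed are illustrative only. It assumes English month abbreviations, since %b matches names such as Jul; note also that seconds are %S in upper case.

```python
import pandas as pd

# Two timestamps from the sample data, in the log-style format.
raw = pd.Series(["15/Jul/2017:18:22:16", "15/Jul/2017:18:36:03"])

# %d day, %b abbreviated month name, %Y year, %H:%M:%S time of day.
parsed = pd.to_datetime(raw, format="%d/%b/%Y:%H:%M:%S")

print(parsed[0])  # 2017-07-15 18:22:16
```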
Step 2: Split the span from the start time to the end time into 5 s intervals (the span may not be an exact multiple of 5 s, so add 5 s to the end to avoid losing the last interval)
    frequency = 5
    time_range = pd.date_range(date_ip['date'][0],
                               date_ip['date'][date_ip.shape[0] - 1] + frequency * Second(),
                               freq='%sS' % frequency)
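
The effect of the extra 5 s can be seen with the first and last timestamps of the sample data. This is a sketch under stated assumptions: it uses pd.Timedelta for the step instead of the Second offset (the two should be interchangeable here), and the names start, end, step, unpadded and padded are illustrative.

```python
import pandas as pd

start = pd.Timestamp("2017-07-15 18:22:16")  # first sample timestamp
end = pd.Timestamp("2017-07-15 18:36:03")    # last sample timestamp
step = pd.Timedelta(seconds=5)

# Without padding, the last edge falls before the final row (18:36:03),
# so no pair of consecutive edges encloses that row.
unpadded = pd.date_range(start, end, freq=step)

# With 5 s added to the end, one more edge is generated after the final row.
padded = pd.date_range(start, end + step, freq=step)

print(unpadded[-1])  # 2017-07-15 18:36:01
print(padded[-1])    # 2017-07-15 18:36:06
```

With the padded range, the pair of edges 18:36:01 and 18:36:06 encloses the last row.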
Step 3: Set the date column as the index
    date_ip = date_ip.set_index('date')

Step 4: Count the IP addresses in each interval (label-based slicing includes both the first and the last label, so subtract 1 s from the end of each slice to avoid counting a row twice)

    for i in range(0, len(time_range) - 1):
        print(get_frequency(date_ip.loc[time_range[i]:time_range[i + 1] - 1 * Second()]))
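
The reason for subtracting 1 s can be illustrated on three rows. In this sketch the names demo, left, right, inclusive and trimmed are hypothetical. It relies on the timestamps having whole-second resolution, as in the sample data; with sub-second timestamps a 1 s trim would skip rows.

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-07-15 18:22:16",
    "2017-07-15 18:22:20",
    "2017-07-15 18:22:21",
])
demo = pd.DataFrame({"ip": ["127.0.0.21", "127.0.0.11", "127.0.0.21"]}, index=idx)

left = pd.Timestamp("2017-07-15 18:22:16")
right = left + pd.Timedelta(seconds=5)  # 18:22:21, also the next window's start

# .loc slicing by label includes both endpoints, so the 18:22:21 row
# would be counted in this window and again in the next one.
inclusive = demo.loc[left:right]

# Trimming 1 s from the right edge keeps the row in the next window only.
trimmed = demo.loc[left:right - pd.Timedelta(seconds=1)]

print(len(inclusive))  # 3
print(len(trimmed))    # 2
```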
Complete code
    import pandas as pd
    from pandas.tseries.offsets import Second


    def get_frequency(date_ip):
        # Count how many times each ip occurs in the given slice
        ip_frequency = {}
        for i in range(0, date_ip.shape[0]):
            # Positional access, because the index is now the date
            ip = date_ip['ip'].iloc[i]
            ip_frequency[ip] = ip_frequency.get(ip, 0) + 1
        # Return the per-ip counts and the number of rows in the slice
        return ip_frequency, date_ip.shape[0]


    if __name__ == '__main__':
        # date_ip is assumed to be already loaded as a DataFrame
        # with the columns 'ip' and 'date'
        date_ip['date'] = pd.to_datetime(date_ip['date'], format='%d/%b/%Y:%H:%M:%S')

        frequency = 5
        time_range = pd.date_range(date_ip['date'][0],
                                   date_ip['date'][date_ip.shape[0] - 1] + frequency * Second(),
                                   freq='%sS' % frequency)

        date_ip = date_ip.set_index('date')
        for i in range(0, len(time_range) - 1):
            print(get_frequency(date_ip.loc[time_range[i]:time_range[i + 1] - 1 * Second()]))

Result of running the code on the data at the beginning of the article:

    ({'127.0.0.21': 1, '127.0.0.13': 1, '127.0.0.11': 2}, 4)
    ({'127.0.0.21': 1, '127.0.0.13': 1}, 2)
    ({'127.0.0.14': 1}, 1)
    ({'127.0.0.16': 1}, 1)
    ({'127.0.0.11': 1}, 1)
Summary

That is all for this example of splitting data with pandas in Python. I hope it is helpful to you. If you notice any shortcomings, please leave a message. Thank you for your support!
