[Data analysis tool] An introduction to pandas functions (I)
If you use pandas (the Python Data Analysis Library), the following notes should help.
First, we will introduce some simple concepts.
DataFrame: two-dimensional data with rows and columns, similar to a sheet in Excel or a table in a relational database
Series: a single column of data
Axis: 0 refers to the rows (the index), 1 refers to the columns
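A small sketch of what the axis argument means in practice, using a made-up 2x3 DataFrame: with axis=0 the reduction runs down the rows and gives one result per column, while axis=1 runs across the columns and gives one result per row.

```python
import pandas as pd

# Hypothetical 2x3 DataFrame used only to illustrate the axis argument
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

col_sums = df.sum(axis=0)  # collapse the rows: one sum per column
row_sums = df.sum(axis=1)  # collapse the columns: one sum per row

print(col_sums.tolist())  # [3, 7, 11]
print(row_sums.tolist())  # [9, 12]
```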
This article describes how arithmetic on pandas Series is vectorized and aligned on the index:
1. Both Series have the same index labels in the same order:
    s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
    s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
    print(s1 + s2)
    a    11
    b    22
    c    33
    d    44
    dtype: int64
The values for each index label are added directly.
2. Both Series have the same index labels, but in a different order:
    s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
    s2 = pd.Series([
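A minimal sketch of the alignment behaviour described above; the reordered Series `s3` and the partially overlapping `s4` are hypothetical values chosen for illustration. pandas matches values by label rather than by position, and any label present in only one of the operands yields NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
print((s1 + s2).tolist())  # [11, 22, 33, 44]

# Same labels in a different order: values are matched by label, not position
s3 = pd.Series([40, 10, 30, 20], index=["d", "a", "c", "b"])
print((s1 + s3).sort_index().tolist())  # [11, 22, 33, 44]

# Labels found in only one Series produce NaN in the result
s4 = pd.Series([10, 20], index=["a", "e"])
print(s1 + s4)
```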
, what should you do? For more details, see other blogs, which have more thorough instructions.
pandas: importing time data and converting its format; drawing multiple graphs on one canvas and adding a legend:
    import matplotlib.pyplot as plt
    from matplotlib.font_manager import FontProperties
    font = FontProperties(fname=r"C:\windows\fonts\STKAITI.TTF", size=14)
    colors = ["red", "green"]  # the color of each line
    labels = ["Jingdong", "12306"]  # the legend entry of each line
    plt.plot(
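Because the snippet above breaks off at `plt.plot(`, here is a minimal sketch of how several lines and a legend can share one canvas. The two time series are invented data, and the Agg backend is used only so the sketch runs without a display; the original's `FontProperties` font is omitted (it could be passed to `ax.legend(prop=font)` if that font file exists).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily series standing in for the imported time data
dates = pd.date_range("2018-01-01", periods=5, freq="D")
series = [
    pd.Series([3, 5, 4, 6, 7], index=dates),
    pd.Series([2, 2, 3, 5, 4], index=dates),
]
colors = ["red", "green"]       # one color per line
labels = ["Jingdong", "12306"]  # one legend entry per line

fig, ax = plt.subplots()
for s, color, label in zip(series, colors, labels):
    ax.plot(s.index, s.values, color=color, label=label)
ax.legend(loc="upper left")  # collects the label= of every line on this axes
# fig.savefig("lines.png") or plt.show() would output the figure
```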
This article shows how to use the Python pandas library to analyze CDN logs. It shares complete sample code for CDN log analysis with pandas and then explains the relevant parts of the pandas library in detail, for readers who need a reference.
Preface
Recent work encountered a dema
This article explains in detail how to exclude specific rows from a pandas DataFrame in Python. It includes detailed sample code that should be a useful reference for understanding and learning.
pandas.DataFrame: excluding specific rows
If we want an Excel-style filter that keeps only one or more specific rows, you c
I spent the whole afternoon staring at red error messages.
Python 2 and Python 3 version conflicts
pip version issue: pip -V
Update: sudo apt-get update
sudo apt-get install python-dev
In the end I am not sure what fixed the installation; I think it was one of these two commands:
    sudo easy_install -U setuptools
    sudo pip install --upgrade setuptools
(At first neither worked, and then it suddenly started working for no clear reason.) If it still fails, try running both; one answer suggested running both of them.
]] = 1  # replace the data at the selected positions with 1
4) Use a DataFrame condition to filter the data (like WHERE in SQL):
    alist = ['023-18996609823']
    df_obj['user number'].isin(alist)  # put the values to filter on into a list and filter with isin(); returns each row index with True where the row matches
    df_obj[df_obj['user number'].isin(alist)]  # get the rows whose match result is tru
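A runnable sketch of the `isin` filter above. The `df_obj` contents (phone numbers and a `duration` column) are hypothetical; only the column name `user number` comes from the original. `isin` returns a boolean Series, and indexing the DataFrame with it keeps the rows marked True.

```python
import pandas as pd

# Hypothetical call records; only the column name comes from the text above
df_obj = pd.DataFrame({
    "user number": ["023-18996609823", "023-13800000000", "023-18996609823"],
    "duration": [30, 45, 12],
})

alist = ["023-18996609823"]               # values to keep
mask = df_obj["user number"].isin(alist)  # boolean Series, one entry per row
print(mask.tolist())                      # [True, False, True]

matched = df_obj[mask]                    # only the rows where mask is True
print(matched["duration"].tolist())       # [30, 12]
```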
']], columns=['p1', 'p2', 'p3'])
    In [4]: df
    Out[4]:
        p1   p2   p3
    0   GD   GX   FJ
    1   SD   SX   BJ
    2   HN   HB   AH
    3  HEN  HEN  HLJ
    4   SH   TJ   CQ
If you only want the two rows whose p1 is GD or HN, you can do this:
    In [8]: df[df.p1.isin(['GD', 'HN'])]
    Out[8]:
       p1  p2  p3
    0  GD  GX  FJ
    2  HN  HB  AH
However, if we want every row except those two, we need a workaround.
The idea is to first extract p1 and convert it to a list, then remove the unwanted values from that list, and the
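A sketch of excluding rows, rebuilt from the p1/p2/p3 table above. Negating the `isin` mask with `~` is a common shortcut that we swap in here; the list-based approach the text describes is shown afterwards for comparison, and both select the same rows.

```python
import pandas as pd

df = pd.DataFrame(
    [["GD", "GX", "FJ"], ["SD", "SX", "BJ"], ["HN", "HB", "AH"],
     ["HEN", "HEN", "HLJ"], ["SH", "TJ", "CQ"]],
    columns=["p1", "p2", "p3"],
)

# ~ inverts the boolean mask: keep every row EXCEPT those whose p1 is listed
rest = df[~df.p1.isin(["GD", "HN"])]
print(rest.p1.tolist())  # ['SD', 'HEN', 'SH']

# The list-based approach: build the list of values to keep, then filter
keep = [v for v in df.p1.tolist() if v not in ("GD", "HN")]
rest2 = df[df.p1.isin(keep)]
print(rest.equals(rest2))  # True
```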
the original value, which differs from ndarray: for example, after calling drop on a row, the original object turns out to be unchanged. Drop a column: obj4.drop('Nevada', axis=1)
Many pandas functions operate on rows by default, which is why they take an axis parameter: axis=1 is vertical, i.e. the columns; axis=0 is horizontal, i.e. the rows.
4.2 Selection, slicing, and indexing
A: Selecting a single column returns a Series: df['a'] an
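A short sketch of the `drop` behaviour described above. The `obj4` data is hypothetical (only the name and the `'Nevada'` label come from the text): `drop` returns a new object and leaves the original untouched, and the axis argument chooses between dropping columns and rows.

```python
import pandas as pd

# Hypothetical frame reusing the obj4 / 'Nevada' names from the text
obj4 = pd.DataFrame({"Ohio": [1, 2], "Nevada": [3, 4]}, index=["2000", "2001"])

dropped = obj4.drop("Nevada", axis=1)  # axis=1: drop a column, returns a new DataFrame
print(list(dropped.columns))           # ['Ohio']
print(list(obj4.columns))              # ['Ohio', 'Nevada'] (the original is unchanged)

rows_dropped = obj4.drop("2000", axis=0)  # axis=0 (the default): drop a row by label
print(list(rows_dropped.index))           # ['2001']

col = obj4["Ohio"]                      # selecting one column returns a Series
print(type(col).__name__)               # Series
```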
Preface
Recently I ran into a requirement at work: extracting data from CDN logs, such as traffic and status-code statistics, and the TOP IPs, URLs, UAs (user agents), and Referers. I used to do this with bash shell scripts, but the log volume is large (gigabytes of logs and a huge number of rows), so shell processing was too slow. So I studied the Python data processing library pandas.
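A minimal sketch of the kind of CDN statistics listed above (traffic, status codes, TOP IP, TOP URL). The space-separated log format with four fields is an assumption made for illustration; real CDN logs have more fields, so the `names` and `sep` arguments would need to match the actual format. This is not the original article's code.

```python
import io
import pandas as pd

# Hypothetical, simplified log lines: client IP, status code, bytes sent, URL
raw = io.StringIO(
    "1.1.1.1 200 512 /a.jpg\n"
    "2.2.2.2 404 0 /b.jpg\n"
    "1.1.1.1 200 1024 /c.jpg\n"
    "3.3.3.3 200 256 /a.jpg\n"
)
log = pd.read_csv(raw, sep=" ", header=None, names=["ip", "status", "bytes", "url"])

total_bytes = log["bytes"].sum()              # traffic
status_counts = log["status"].value_counts()  # status-code statistics
top_ips = log["ip"].value_counts().head(2)    # TOP IPs by request count
top_urls = log["url"].value_counts().head(2)  # TOP URLs by request count

print(total_bytes)  # 1792
print(status_counts)
print(top_ips)
```

For multi-gigabyte files, `pd.read_csv(..., chunksize=N)` returns an iterator of DataFrames, so counts can be accumulated chunk by chunk instead of loading everything at once.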
This article introduces a method for excluding specific rows from a pandas DataFrame in Python, with detailed sample code that should be a useful reference for understanding and learning.
Preface
When using Python for data analysis, one of the most frequently used structures is the pandas DataFrame. About
Python pandas usage: a complete guide
1. Generate a data table
1. First import the pandas library. NumPy is usually needed as well, so import both first:
    import numpy as np
    import pandas as pd
2. Import CSV or xlsx files:
    # read_csv / read_excel already return a DataFrame, so no pd.DataFrame() wrapper is needed
    df = pd.read_csv('name.csv', header=1)
    df = pd.read_excel('name.xlsx')
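A note on `header=1` in the line above: it tells `read_csv` to take the column names from the second line of the file (row 1, counting from 0) and to skip everything above it. A small sketch with an in-memory CSV whose first line is a hypothetical title row:

```python
import io
import pandas as pd

# Hypothetical CSV whose first line is a title, not the column names
csv_text = "report title,,\nid,city,sales\n1,GD,100\n2,HN,200\n"

df = pd.read_csv(io.StringIO(csv_text), header=1)  # line 1 (0-based) holds the names
print(list(df.columns))       # ['id', 'city', 'sales']
print(df["sales"].tolist())   # [100, 200]
```

With the default `header=0`, the title line would become the column names instead, which is usually not what you want for files like this.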
3. Create a da
specific value
Boolean indexing
1. Select data using the values of a single column
2. Select data using the where operation
3. Filter using the isin() method
Setting values
1. Set a new column
2. Set a new value by label
3. Set a new value by position
4. Set a new set of values from a NumPy array
The results of the above operations are as follows:
5. Set new values with a where operation
IV. Handling missing values
In
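A compact sketch of the selecting and setting operations outlined above, on a hypothetical 4x3 frame. The calls (`where`, `isin`, `at`, `iat`, `loc`, boolean assignment) are standard pandas; the data and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows w..z, columns A..C holding 0..11
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("wxyz"), columns=["A", "B", "C"])

sel_col = df[df["A"] > 3]           # 1. boolean index on one column's values
sel_where = df.where(df > 5)        # 2. where(): cells failing the test become NaN
sel_isin = df[df["B"].isin([1, 7])] # 3. isin() filter

df["D"] = [10, 20, 30, 40]               # set a new column
df.at["w", "A"] = 100                    # set a value by label
df.iat[0, 1] = 200                       # set a value by position (row 0, column "B")
df.loc[:, "C"] = np.array([0, 0, 0, 0])  # set a column from a NumPy array
df[df > 25] = -1                         # set values through a where-style condition

print(sel_col.index.tolist())  # ['y', 'z']
print(df.loc["w"].tolist())    # [-1, -1, 0, 10]
```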
Contents: I. Creating objects; II. Viewing data; III. Selecting and setting; IV. Handling missing values; V. Related operations; VI. Aggregation; VII. Reshaping; VIII. Time series; IX. Categorical type; X. Plotting; XI. Importing and saving data
    # coding=utf-8
    import pandas as pd
    import numpy as np
    ### I. Creating objects
    # 1. You can pass a list to create a Series; the integer index is
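A minimal sketch of the first "creating objects" steps: a Series built from a list gets a default integer index, and a DataFrame can be built from a NumPy array with a date index and labeled columns. The values are illustrative only.

```python
import numpy as np
import pandas as pd

# A list becomes a Series with a default integer index 0..n-1
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.index.tolist())  # [0, 1, 2, 3, 4, 5]

# A NumPy array plus a date index and column labels becomes a DataFrame
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
print(df.shape)  # (6, 4)
```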
…labels as a NumPy array of Python objects
Int64Index: a special Index for integers (removed in pandas 2.0; integer labels now use a plain Index with an int64 dtype)
MultiIndex: a hierarchical Index object representing multiple index levels on a single axis; it can be thought of as an array of tuples
DatetimeIndex: stores nanosecond timestamps (represented with NumPy's datetime64 type)
PeriodIndex: a special Index for Period data (time spans)
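A short sketch that constructs each of the Index types listed above; the labels and dates are hypothetical.

```python
import pandas as pd

plain = pd.Index(["a", "b", "c"])  # generic Index holding Python objects
multi = pd.MultiIndex.from_tuples([("GD", 2017), ("GD", 2018), ("HN", 2017)])
dt = pd.date_range("2018-01-01", periods=3, freq="D")   # timestamps
per = pd.period_range("2018-01", periods=3, freq="M")   # monthly time spans

print(type(multi).__name__)  # MultiIndex
print(multi.nlevels)         # 2 levels on one axis
print(type(dt).__name__)     # DatetimeIndex
print(type(per).__name__)    # PeriodIndex
print(str(per[0]))           # 2018-01
```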
2.2.d.1 Main Index methods and properties
This article introduces a method for excluding specific rows from a pandas DataFrame in Python, with detailed sample code that should be a useful reference for understanding and learning. Let's take a look.
transformations, we can save the result to a file.
    groupd = df.groupby('CID')['Atimes']
    # save to a CSV file, keeping the index
    groupd.mean().to_csv(r'E:\log\channel_add\group10.csv', index=True)
    # write to an Excel file (needs an Excel writer engine such as openpyxl installed)
    df.to_excel(r'E:\log\token0722v1.xlsx')
    # save to a CSV file without the index
    df.to_csv(r'E:\log\lost.txt', index=False)
3. Filtering data
Often we don't need all of the data in a file, only part of it; pandas provides many ways to extract it.
    #Lo
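A self-contained sketch of the group-then-save steps above. The data is hypothetical (only the `CID` and `Atimes` column names come from the text), and `io.StringIO` stands in for the file paths so nothing is written to disk.

```python
import io
import pandas as pd

# Hypothetical data using the column names from the snippet above
df = pd.DataFrame({"CID": ["c1", "c1", "c2"], "Atimes": [2, 4, 9]})

groupd = df.groupby("CID")["Atimes"]
means = groupd.mean()  # one mean per CID: c1 -> 3.0, c2 -> 9.0

buf = io.StringIO()              # stands in for a CSV file path
means.to_csv(buf, index=True)    # index=True keeps the CID labels in the output
print(buf.getvalue())

buf2 = io.StringIO()
df.to_csv(buf2, index=False)     # index=False drops the 0..n-1 row numbers
print(buf2.getvalue())
```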
method
Ranking: rank()
Axis indexes with duplicate values: the index's is_unique property tells you whether its values are unique
Summarizing and computing descriptive statistics: sum(), mean(), describe()
Descriptive and summary statistics functions
Correlation and covariance: the Series and DataFrame methods compute these from pairs of arguments
Unique values, value counts, and membership
Unique values: the unique() method
Value counts: the value_counts() method computes how often each value
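A quick sketch of the methods listed above on a small hypothetical Series that deliberately has a duplicated index label.

```python
import pandas as pd

# Hypothetical data; label "b" appears twice on purpose
s = pd.Series([7, -5, 7, 4, 2], index=["a", "b", "b", "c", "d"])

print(s.rank().tolist())    # [4.5, 1.0, 4.5, 3.0, 2.0]: tied values share the average rank
print(s.index.is_unique)    # False
print(s.sum(), s.mean())    # 15 3.0
stats = s.describe()        # count, mean, std, min, quartiles, max
print(s.unique().tolist())  # [7, -5, 4, 2], in order of first appearance
print(s.value_counts()[7])  # 2

# Correlation between two Series (pairs of values aligned on the index)
x = pd.Series([1, 2, 3, 4])
y = pd.Series([2, 4, 6, 8])
print(x.corr(y))            # close to 1.0, since y is a linear function of x
```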
How do I remove empty strings from a list?
The easiest way: new_list = [x for x in li if x != '']
This section covers basic pandas operations on the two data structures introduced earlier.
Suppose DataFrame a contains the following data:
           a  b  c
    one    4  1  1
    two    6  2  0
    three  6  1  6
First, viewing data (these methods also work for Series)
1. View the first or last N rows of a DataFrame
    a = pd.DataFrame(data)
    a.head(6) indicates that
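A minimal sketch of `head` and `tail`, rebuilding the DataFrame `a` from the table above.

```python
import pandas as pd

# The DataFrame a from the table above
a = pd.DataFrame({"a": [4, 6, 6], "b": [1, 2, 1], "c": [1, 0, 6]},
                 index=["one", "two", "three"])

print(a.head(2).index.tolist())  # ['one', 'two']: the first 2 rows
print(a.tail(1).index.tolist())  # ['three']: the last row
print(a.head(6).shape)           # (3, 3): asking for more rows than exist returns them all
```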