index-feature name-Attribute-easy to understand
2. Filter the row and column data of a DataFrame
import pandas as pd
import numpy as np
from pandas import DataFrame
df = DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'))
1. df[] and df.<column>: select column data
df.a, df[['a', 'b']]
2. df.loc[[index], [column]]: use labels to select data
When you do not filter rows, enter ":" in "[index]" (it cannot be left blank), that is, "df
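The column and label selection described above can be sketched as follows. This is a minimal illustration that reuses the 4x5 frame with columns a to e from the text; the variable names other than df are made up for the example.

```python
import numpy as np
import pandas as pd

# Same 4x5 frame as in the text, with columns a..e.
df = pd.DataFrame(np.arange(20).reshape((4, 5)), columns=list("abcde"))

col_a = df["a"]            # a single column name returns a Series
cols_ab = df[["a", "b"]]   # a list of column names returns a DataFrame

# ":" keeps every row; the list selects columns by label.
all_rows_ab = df.loc[:, ["a", "b"]]

# Rows labelled 0 and 2, columns "c" and "e".
subset = df.loc[[0, 2], ["c", "e"]]
print(subset)
```

Passing ":" in the row position is what "do not filter rows" amounts to in practice.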
The previous article, Pandas array (Pandas Series) - (3) Vectorization, noted that when two pandas Series are combined in a vectorized operation and an index key exists in only one of them, the result for that key is NaN. So how can NaN be handled?
1. dropna() method: this method discards all values in the result that are NaN, which is equivalent to calculating only the va
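A small sketch of the situation described above, assuming two made-up Series whose indexes only partly overlap. The dropna() call follows the text; Series.add with fill_value is shown as one other option and is not taken from the truncated original.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# "a" and "d" exist in only one Series, so their sums are NaN.
total = s1 + s2

# Option 1: discard the NaN results, keeping only keys present in both.
dropped = total.dropna()

# Option 2 (an alternative, not from the text): treat a missing side as 0.
filled = s1.add(s2, fill_value=0)

print(dropped)
print(filled)
```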
Selecting data in pandas: iloc and loc are not used the same way. iloc selects by integer position, while loc selects by row label.
>>> import pandas as pd
>>> import os
>>> os.chdir("d:\\")
>>> d = pd.read_csv("Gwas_water.qassoc", delimiter=r"\s+")
>>> d.loc[1:3]
   CHR SNP   BP  NMISS    BETA      SE       R2      T       P
1    1   .  447     44  0.1800  0.1783  0.02369  1.009  0.3185
2    1   .  449     44  0.2785  0.2473  0.02931  1.126  0.2665
3    1   .  452     44  0.1800  0
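The difference between the two slices is easiest to see on a frame whose index does not start at 0. The sketch below uses a made-up single-column frame, not the Gwas_water.qassoc file from the snippet.

```python
import pandas as pd

# Hypothetical data: the index labels start at 1, not 0.
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[1, 2, 3, 4])

# loc slices by label and includes the end label: labels 1, 2 and 3.
by_label = df.loc[1:3]

# iloc slices by position and excludes the end: positions 1 and 2.
by_position = df.iloc[1:3]

print(by_label)
print(by_position)
```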
Sometimes you need to transform the values in a pandas Series and no built-in function does the job. In that case you can write a function yourself and pass it to the Series's apply method, which calls the function on each value and returns a new Series.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
def add_one(x):
    return x + 1
print(s.apply(add_one))
# results:
# 0    2
# 1    3
# 2    4
# 3    5
# 4    6
# dtype: int64
An example:
names = pd.Serie
Data conversion
Delete duplicate elements
The duplicated() function of the DataFrame object detects duplicate rows and returns a Series of Boolean type. Each element corresponds to a row: if the row repeats an earlier row (that is, it is not the first occurrence), the element is True; if it does not repeat any preceding row, the element is False.
A Series whose elements are Boolean is of great use and is particularly useful for fil
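The behaviour described above can be sketched as follows. The frame contents are invented for the example; duplicated() and drop_duplicates() are the standard DataFrame methods.

```python
import pandas as pd

# Hypothetical data: rows 3 and 4 repeat rows 2 and 0.
df = pd.DataFrame({
    "color": ["white", "white", "red", "red", "white"],
    "value": [2, 1, 3, 3, 2],
})

# True marks a row that repeats an earlier row.
mask = df.duplicated()

repeats_only = df[mask]          # filter with the Boolean Series
deduplicated = df.drop_duplicates()  # keep first occurrences only

print(mask.tolist())
print(deduplicated)
```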
data (like select in SQL):
from pandas import DataFrame  # import DataFrame from the pandas library
df_obj = DataFrame()  # create a DataFrame object
df_obj.dtypes  # view the data type of each column
df_obj.head()  # view the first few rows, default first 5 rows
df_obj.tail()  # view the last few rows, default last 5 rows
df_obj.index  # view the index
df_obj.columns  # view the column names
df_obj.values  # view the data values
df_obj.describe()  # descriptive statistics
df_obj.T  # transpose
df_obj.sort(columns='')  # sort by column name
df_obj.sort_index(by=['', ''])  # multi-column sort; pandas reports this usage as obsolete, use sort_values
df_obj.sort_values(by=['',
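Since the listing above points to sort_values as the replacement for the obsolete sorting calls, here is a minimal sketch of how it might be used. The frame and its column names a and b are assumptions made for the illustration.

```python
import pandas as pd

# Hypothetical two-column frame.
df = pd.DataFrame({"b": [2, 1, 2], "a": [3, 2, 1]})

# Multi-column sort: by "b" first, ties broken by "a".
by_b_then_a = df.sort_values(by=["b", "a"])

# Single column, descending.
by_a_desc = df.sort_values(by="a", ascending=False)

# Sort by the row index instead of by values.
by_index_desc = df.sort_index(ascending=False)

print(by_b_then_a)
```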
The following shares a pandas approach to selecting rows by specific index values. It has good reference value, and I hope it will be helpful to everyone. Let's take a look.
As shown below:
>>> import numpy as np
>>> import pandas as pd
>>> index = np.array([2, 4, 6, 8, 10])
>>> data = np.array([3, 5, 7, 9, 11])
>>> data = pd.DataFrame({'num': data}, index=index)
>>> print(data)
    num
2     3
4     5
6     7
8     9
10   11
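Building on the frame above, selecting rows with specific index values might look like the sketch below. The label lists [4, 8] and [4, 8, 99] are chosen only for illustration; the isin variant is one way to tolerate labels that may not exist.

```python
import numpy as np
import pandas as pd

index = np.array([2, 4, 6, 8, 10])
data = pd.DataFrame({"num": np.array([3, 5, 7, 9, 11])}, index=index)

# A list of labels passed to loc returns exactly those rows.
picked = data.loc[[4, 8]]

# loc raises KeyError for a label that is absent; index.isin simply
# skips it. The label 99 is deliberately not in the index.
tolerant = data[data.index.isin([4, 8, 99])]

print(picked)
```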
7jeff-5ryan 3
DataFrame
Pandas reading files
In [29]: df = pd.read_table('pandas_test.txt', sep=' ', names=['name', 'age'])
In [30]: df
Out[30]:
    name  age
0    Bob   26
1   Loya   22
2  Denny   20
3   Mars   25
DataFrame Column Selection
df[name]
In [31]: df['name']
Out[31]:
0      Bob
1     Loya
2    Denny
3     Mars
Name: name, dtype: object
DataFrame Row Selection
df.iloc[0, :]  # the first argument is the row, the second argument is the column; this selects all columns of row 0
df.iloc[:, 0]  # all rows, column 0
This article describes how pandas Series that carry an index are vectorized:
1. The index arrays are the same:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s1 + s2)
a    11
b    22
c    33
d    44
dtype: int64
The values corresponding to each index are added directly.
2. The index arrays hold the same values, in a different order:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10,
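For the second case, where both Series share the same index values in a different order, a sketch is given below. The ordering "b, a, d, c" for s2 is an assumption, since the original example is cut off. pandas aligns on the labels, not on position.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
# Assumed ordering for illustration: same labels, shuffled.
s2 = pd.Series([10, 20, 30, 40], index=["b", "a", "d", "c"])

# Values are matched by label before adding, so "a" pairs 1 with 20,
# "b" pairs 2 with 10, and so on.
total = s1 + s2
print(total)
```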
, what can be done? For more information, please see other blogs, where more detailed instructions are available.
Pandas: importing time data and converting its format
Draw multiple graphs on one canvas and add legends
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"C:\windows\fonts\STKAITI.TTF", size=14)
colors = ["red", "green"]  # the colors used for the lines
labels = ["Jingdong", "12306"]  # the labels used for the legend
plt.plot(
First, generate the data table
1. First import the pandas library. The NumPy library is generally used as well, so import both up front:
import numpy as np
import pandas as pd
2. Import csv or xlsx files:
df = pd.DataFrame(pd.read_csv('name.csv', header=1))
df = pd.DataFrame(pd.read_excel('name.xlsx'))
3. Create a data table with pandas:
df = pd.DataFrame({"id": [1001, 1002, 1003, 1004, 1005, 1006],
                   "date": pd.date_range('20130102', periods=6),
                   "city": ['
the original value, which is different from ndarray. For example, after calling drop to remove a row, checking the original object shows that it has not changed.
Drop a column: obj4.drop('Nevada', axis=1)
Many Python functions operate on rows by default, which is why they take an axis parameter. axis=1 refers to the columns and axis=0 refers to the rows.
4.2 Selection, slicing, and indexing
A: Select a single column, which returns a Series: df['a' an
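The point that drop returns a new object and leaves the original untouched can be checked with a short sketch. The frame below is hypothetical; only the column name Nevada comes from the text.

```python
import pandas as pd

# Hypothetical frame; "Nevada" is the column name used in the text.
obj4 = pd.DataFrame({"Ohio": [1, 2], "Nevada": [3, 4]}, index=["r1", "r2"])

without_column = obj4.drop("Nevada", axis=1)  # axis=1: drop a column
without_row = obj4.drop("r1", axis=0)         # axis=0: drop a row

# The original object still has both columns and both rows.
print(obj4.shape)
```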
. Display the index, columns, and underlying NumPy data:
3. The describe() function gives a quick statistical summary of the data:
4. Transpose the data:
5. Sort by axis:
6. Sort by value:
Third, selection
While the standard Python/NumPy expressions for selecting and setting can come in handy, we recommend the optimized pandas data access methods for production code: .at, .iat, .loc, .iloc and .ix. For details see
Ming    6.0
Name: price, dtype: float64
Zhang San    1.2
Reese        1.0
Harry        2.3
Chen Jiu     5.0
Xiao Ming    6.0
Name: price, dtype: float64
In general, we often need to get values by column, and DataFrame provides loc and iloc for this, but the two differ.
print(frame2)
print(frame2.loc['Harry'])  # loc can use an index of string type, whereas iloc only accepts integer positions
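The loc versus iloc distinction for a string index can be tried out with the sketch below. The Series named prices is a stand-in built from the printed output above; how frame2 was actually constructed is not shown in the snippet.

```python
import pandas as pd

# Stand-in for the object printed above (its real construction is unknown).
prices = pd.Series(
    [1.2, 1.0, 2.3, 5.0, 6.0],
    index=["Zhang San", "Reese", "Harry", "Chen Jiu", "Xiao Ming"],
    name="price",
)

by_label = prices.loc["Harry"]  # loc accepts the string label
by_position = prices.iloc[2]    # iloc needs the integer position

# Passing a string to iloc is rejected with a TypeError.
try:
    prices.iloc["Harry"]
    iloc_accepts_label = True
except TypeError:
    iloc_accepts_label = False

print(by_label, by_position, iloc_accepts_label)
```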
Preface
A recent task at work required filtering data out of CDN logs: traffic, status code statistics, top IPs, URLs, UAs, Referers, and so on. This used to be done in bash shell scripts, but the logs are large, with files measured in gigabytes and row counts reaching the billions, so the shell struggles and processing takes too long. I therefore looked into pandas, the Python data processing library
from:76713387
How to iterate through rows in a DataFrame in pandas (DataFrame row-by-row iteration)
https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas
When manipulating a DataFrame, we inevitably need to view or process the data row by row, so what is an efficient and fast way to do it?
Index o
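Two commonly used ways of walking a DataFrame row by row are iterrows() and itertuples(). The sketch below shows both on a made-up frame with columns c1 and c2. It is not taken from the truncated article, and for simple arithmetic a vectorized expression is usually preferable to any loop.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# iterrows() yields (index, row) pairs, where row is a Series.
pairs = [(idx, row["c1"], row["c2"]) for idx, row in df.iterrows()]

# itertuples() yields namedtuples and is generally faster.
tuples = [(t.Index, t.c1, t.c2) for t in df.itertuples()]

# Vectorized alternative when no per-row Python logic is needed.
sums = (df["c1"] + df["c2"]).tolist()

print(pairs)
print(sums)
```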
I spent the whole afternoon looking at red error boxes.
Python2 and Python3 version conflicts
pip version issue: pip -V
Update: sudo apt-get update
sudo apt-get install python-dev
In the end I am not sure which step made the install work; I think it was one of the following two:
sudo easy_install -U setuptools
sudo pip install --upgrade setuptools
(Neither worked when I first tried them, and I do not know why they suddenly did.) If it still fails, run both; one answer I saw recommended running both
Function prototype: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna
pad/ffill: fills the missing value with the previous non-missing value
backfill/bfill: fills the missing value with the next non-missing value
None: specify a value to replace the missing value
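The three fill strategies listed above can be compared on a small Series. This sketch uses the ffill() and bfill() methods, which perform the same forward and backward fills as the pad/ffill and backfill/bfill options, plus fillna with a constant; the data is made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()     # previous non-missing value is carried forward
backward = s.bfill()    # next non-missing value is carried backward
constant = s.fillna(0)  # every missing value replaced by a given value

print(forward.tolist(), backward.tolist(), constant.tolist())
```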
Query and write operations
pandas offers powerful query functions like SQL, and they are simple to use:
print(tips[['total_bill', 'tip', 'smoker', 'time']])  # shows the 'total_bill', 'tip', 'smoker', 'time' columns, functionally similar to the SELECT command in SQL
print(tips[tips['time'] == 'Dinner'])  # shows the rows whose time column equals Dinner, functionally similar to the WHERE command in SQL
print(tips[(tips['size'] >= 5) | (tips['total_bill'] >
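A runnable sketch of these SQL-like queries follows. The tips frame here is a tiny made-up stand-in with the same column names, and the threshold 45 in the last filter is an assumption, because the original condition is cut off.

```python
import pandas as pd

# Tiny invented stand-in for the tips data set.
tips = pd.DataFrame({
    "total_bill": [16.99, 48.27, 10.34, 50.81],
    "tip": [1.01, 6.73, 1.66, 10.00],
    "smoker": ["No", "No", "No", "Yes"],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
    "size": [2, 6, 3, 3],
})

# Like SELECT total_bill, tip, smoker, time FROM tips
selected = tips[["total_bill", "tip", "smoker", "time"]]

# Like WHERE time = 'Dinner'
dinner = tips[tips["time"] == "Dinner"]

# Like WHERE size >= 5 OR total_bill > 45 (45 is an assumed threshold).
# Each condition needs its own parentheses around it.
large = tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]

print(large)
```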