What are the methods for querying a DataFrame in pandas?

Last Update: 2018-04-12 Source: Internet

Author: User

Tags fecha python list


This article covers the methods pandas offers for querying a DataFrame, and the points to watch out for with each, using a worked example.

pandas provides several ways to slice a DataFrame, and they are easy to confuse if you don't know them well. The examples below show how each one works.

Sample data

First, generate a random data set:

In [5]: import random
   ...: import datetime as dt
   ...: import pandas as pd
   ...:
   ...: rnd_1 = [random.randrange(1, 20) for x in range(1000)]
   ...: rnd_2 = [random.randrange(1, 20) for x in range(1000)]
   ...: rnd_3 = [random.randrange(1, 20) for x in range(1000)]
   ...: fecha = pd.date_range('2012-4-10', '2015-1-4')   # exactly 1000 days
   ...:
   ...: data = pd.DataFrame({'fecha': fecha, 'rnd_1': rnd_1, 'rnd_2': rnd_2, 'rnd_3': rnd_3})

In [6]: data.describe()
Out[6]:
             rnd_1        rnd_2        rnd_3
count  1000.000000  1000.000000  1000.000000
mean      9.946000     9.825000     9.894000
std       5.553911     5.559432     5.423484
min       1.000000     1.000000     1.000000
25%       5.000000     5.000000     5.000000
50%      10.000000    10.000000    10.000000
75%      15.000000    15.000000    14.000000
max      19.000000    19.000000    19.000000

[] slicing

You can slice a DataFrame with square brackets, much like slicing a Python list. Depending on what you pass in, this selects rows, columns, or a block of both.

# row selection
In [7]: data[1:5]
Out[7]:
       fecha  rnd_1  rnd_2  rnd_3
1 2012-04-11      1     16      3
2 2012-04-12      7      6      1
3 2012-04-13      2     16      7
4 2012-04-14      4     17      7

# column selection
In [10]: data[['rnd_1', 'rnd_3']]
Out[10]:
     rnd_1  rnd_3
0        8     12
1        1      3
2        7      1
3        2      7
4        4      7
5       12      8
6        2     12
7        9      8
8       13     17
9        4      7
10      14     14
11      19     16
12       2     12
13      15     18
14      13     18
15      13     11
16      17      7
17      14     10
18       9      6
19      11     15
20      16     13
21      18      9
22       1     18
23       4      3
24       6     11
25       2     13
26       7     17
27      11      8
28       3     12
29       4      2
..     ...    ...
970      8     14
971     19      5
972     13      2
973      8     10
974      8     17
975      6     16
976      3      2
977     12      6
978     12     10
979     15     13
980      8      4
981     17      3
982      1     17
983     11      5
984      7      7
985     13     14
986      6     19
987     13      9
988      3     15
989     19      6
990      7     11
991     11      7
992     19     12
993      2     15
994     10      4
995     14     13
996     12     11
997     11     15
998     17     14
999      3      8

[1000 rows x 2 columns]

# block selection
In [11]: data[:7][['rnd_1', 'rnd_2']]
Out[11]:
   rnd_1  rnd_2
0      8     17
1      1     16
2      7      6
3      2     16
4      4     17
5     12     19
6      2      7
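The three kinds of [] selection above can be reproduced on a small frame. This is a minimal sketch, assuming a hypothetical 8-row DataFrame (made-up values) that only borrows the article's column names:

```python
import pandas as pd

# Hypothetical 8-row frame with the same column names as the article's `data`
data = pd.DataFrame({
    "fecha": pd.date_range("2012-04-10", periods=8),
    "rnd_1": range(10, 18),
    "rnd_2": range(20, 28),
    "rnd_3": range(30, 38),
})

rows = data[1:5]                      # integer slice -> rows at positions 1..4 (end excluded)
cols = data[["rnd_1", "rnd_3"]]       # list of labels -> those columns, all rows
block = data[:7][["rnd_1", "rnd_2"]]  # first 7 rows, then two of their columns

print(rows.index.tolist())    # [1, 2, 3, 4]
print(cols.columns.tolist())  # ['rnd_1', 'rnd_3']
print(block.shape)            # (7, 2)
```

Note that the block selection is two separate [] calls chained together; that is fine for reading, but see the update note on loc before using this pattern for assignment.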

However, you cannot select a range of columns with a colon the way you select a range of rows; putting a slice inside the column list is a syntax error:

In [13]: data[['rnd_1':'rnd_3']]
  File "<ipython-input-13-6291b6a83eb0>", line 1
    data[['rnd_1':'rnd_3']]
                 ^
SyntaxError: invalid syntax
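Although [] cannot take a column range, loc accepts a label slice on the column axis, and iloc accepts a positional one. A minimal sketch, assuming a small hypothetical frame with the article's column names in the same order:

```python
import pandas as pd

# Hypothetical frame; column order matters for label and position slices
data = pd.DataFrame({
    "fecha": pd.date_range("2012-04-10", periods=4),
    "rnd_1": [1, 2, 3, 4],
    "rnd_2": [5, 6, 7, 8],
    "rnd_3": [9, 10, 11, 12],
})

# loc: label slice on columns, both ends included
col_range = data.loc[:, "rnd_1":"rnd_3"]
print(col_range.columns.tolist())  # ['rnd_1', 'rnd_2', 'rnd_3']

# iloc: positional slice on columns, end position excluded
col_range_pos = data.iloc[:, 1:4]
print(col_range_pos.columns.tolist())  # ['rnd_1', 'rnd_2', 'rnd_3']
```

The leading `:` in both calls means "all rows"; the column slice goes after the comma.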

loc

loc selects rows and columns by index label.

In [13]: data.loc[1:5]
Out[13]:
       fecha  rnd_1  rnd_2  rnd_3
1 2012-04-11      1     16      3
2 2012-04-12      7      6      1
3 2012-04-13      2     16      7
4 2012-04-14      4     17      7
5 2012-04-15     12     19      8

Note how loc differs from [] slicing here: loc slices by label and includes both endpoints, so the row labeled 5 is returned, whereas data[1:5] slices by position and stops at row 4.
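The inclusive/exclusive difference is easy to check directly. A minimal sketch on a hypothetical frame with the default integer index 0..9, where labels and positions happen to coincide:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..9
df = pd.DataFrame({"rnd_1": range(10)})

by_position = df[1:5]   # [] with an integer slice: positional, end excluded
by_label = df.loc[1:5]  # loc: label-based, end included

print(by_position.index.tolist())  # [1, 2, 3, 4]
print(by_label.index.tolist())     # [1, 2, 3, 4, 5]
```

With a non-default index the two can diverge further, since loc then looks up the labels themselves rather than counting rows.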

In [14]: data.loc[2:4, ['rnd_2', 'fecha']]
Out[14]:
   rnd_2      fecha
2      6 2012-04-12
3     16 2012-04-13
4     17 2012-04-14

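Once the dates are the index, loc can also select the rows between two specific dates. If the index is not sorted, both dates must be present in it; on a sorted DatetimeIndex like this one, bounds that fall between index entries are accepted as well.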

In [15]: data_fecha = data.set_index('fecha')
    ...: data_fecha.head()
Out[15]:
            rnd_1  rnd_2  rnd_3
fecha
2012-04-10      8     17     12
2012-04-11      1     16      3
2012-04-12      7      6      1
2012-04-13      2     16      7
2012-04-14      4     17      7

In [16]: # Create two specific dates
    ...: fecha_1 = dt.datetime(2013, 4, 14)
    ...: fecha_2 = dt.datetime(2013, 4, 18)
    ...:
    ...: # Slice the rows between them
    ...: data_fecha.loc[fecha_1:fecha_2]
Out[16]:
            rnd_1  rnd_2  rnd_3
fecha
2013-04-14    ...    ...    ...
2013-04-15      1      2     18
2013-04-16    ...    ...    ...
2013-04-17    ...    ...    ...
2013-04-18    ...    ...    ...
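Date-range selection with loc can be sketched on a short, sorted daily index. This assumes a hypothetical 10-day frame (made-up values) and shows both an exact-label window and a window whose start bound is not an index entry:

```python
import datetime as dt

import pandas as pd

# Hypothetical daily frame, index sorted ascending: 2013-04-10 .. 2013-04-19
data_fecha = pd.DataFrame(
    {"rnd_1": range(10)},
    index=pd.date_range("2013-04-10", periods=10, freq="D", name="fecha"),
)

fecha_1 = dt.datetime(2013, 4, 14)
fecha_2 = dt.datetime(2013, 4, 18)

# Both bounds are index labels; loc includes both ends -> 5 rows
window = data_fecha.loc[fecha_1:fecha_2]
print(len(window))  # 5

# Sorted index: noon on the 14th is not a label, but loc still finds its place,
# so the 2013-04-14 00:00 entry (earlier than noon) is excluded
window_2 = data_fecha.loc[dt.datetime(2013, 4, 14, 12):fecha_2]
print(len(window_2))  # 4
```

On an unsorted index the second call would not work, which is why sorting the index (for example with `sort_index()`) is usually worthwhile before slicing by date.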

Update: unless you have a specific reason not to, prefer loc over []. When you assign values into a DataFrame, a single loc call avoids chained indexing, whereas chained [] selections such as data[...][...] = value are likely to make pandas emit a SettingWithCopyWarning.

See the official documentation for details: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
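The recommended assignment pattern puts the row selector and the column label in one loc call. A minimal sketch, assuming a hypothetical four-row frame; the chained form is shown only as a comment and is not executed:

```python
import pandas as pd

# Hypothetical frame
data = pd.DataFrame({"rnd_1": [3, 15, 8, 19], "rnd_2": [1, 2, 3, 4]})

# Chained form to avoid (not executed here):
#   data[data["rnd_1"] > 10]["rnd_2"] = 0
# The first [] may return a copy, so the assignment can miss `data` entirely;
# pandas versions of this article's era warn with SettingWithCopyWarning.

# Single loc call: boolean row mask and column label together, written into `data`
data.loc[data["rnd_1"] > 10, "rnd_2"] = 0
print(data["rnd_2"].tolist())  # [1, 0, 3, 0]
```

Newer pandas releases with Copy-on-Write handle the chained form differently, but the single loc call behaves the same way in both.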

iloc

Whereas loc selects by index label, iloc selects by integer position. iloc ignores the index values entirely and looks only at positions, so only integers (single integers, integer slices, or lists of integers) go inside its square brackets.

# row selection
In [17]: data_fecha.iloc[10:15]
Out[17]:
            rnd_1  rnd_2  rnd_3
fecha
2012-04-20     14      6     14
2012-04-21     19    ...     16
2012-04-22      2      6     12
2012-04-23     15      8     18
2012-04-24     13      8     18

# column selection
In [18]: data_fecha.iloc[:, [1, 2]].head()
Out[18]:
            rnd_2  rnd_3
fecha
2012-04-10     17     12
2012-04-11     16      3
2012-04-12      6      1
2012-04-13     16      7
2012-04-14     17      7

# rows and columns by position
In [19]: data_fecha.iloc[[1, 12, 34], [0, 2]]
Out[19]:
            rnd_1  rnd_3
fecha
2012-04-11      1      3
2012-04-22      2     12
2012-05-14    ...    ...
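Positional selection can be sketched on a five-row frame with a date index like data_fecha. This assumes illustrative values and also shows two edge cases: iloc rejects a label, and out-of-range positions are tolerated in a slice but not as a single position:

```python
import pandas as pd

# Hypothetical five-row frame with a date index (values are illustrative)
data_fecha = pd.DataFrame(
    {"rnd_1": [8, 1, 7, 2, 4], "rnd_2": [17, 16, 6, 16, 17], "rnd_3": [12, 3, 1, 7, 7]},
    index=pd.date_range("2012-04-10", periods=5, name="fecha"),
)

rows = data_fecha.iloc[1:3]               # positions 1 and 2, end excluded
cols = data_fecha.iloc[:, [1, 2]]         # 2nd and 3rd columns by position
cells = data_fecha.iloc[[0, 4], [0, 2]]   # rows 0 and 4, columns 0 and 2

print(rows.index.strftime("%Y-%m-%d").tolist())  # ['2012-04-11', '2012-04-12']
print(cols.columns.tolist())                     # ['rnd_2', 'rnd_3']
print(cells.to_numpy().tolist())                 # [[8, 12], [4, 7]]

# A label is rejected: iloc only accepts integer positions
try:
    data_fecha.iloc["2012-04-10"]
    label_rejected = False
except TypeError:
    label_rejected = True

# A slice past the end is clipped, but a single out-of-range position raises
tail = data_fecha.iloc[3:100]  # only positions 3 and 4 exist
try:
    data_fecha.iloc[10]
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True
```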

at

at is label-based like loc, but it retrieves only a single element (one row label plus one column label) and cannot return multiple elements. In exchange, it is faster than loc for this kind of scalar access.

In [20]: %timeit data_fecha.at[fecha_1, 'rnd_1']
The slowest run took 3783.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.3 µs per loop

In [21]: %timeit data_fecha.loc[fecha_1, 'rnd_1']
The slowest run took 121.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 192 µs per loop

In [22]: data_fecha.at[fecha_1, 'rnd_1']
Out[22]: 17
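Besides reading a cell, at can also write one. A minimal sketch, assuming a hypothetical three-row frame; the timing lines use the standard-library `timeit` module and only print results, since the actual speedup depends on the pandas version and machine:

```python
import timeit

import pandas as pd

# Hypothetical frame with a date index
df = pd.DataFrame(
    {"rnd_1": [8, 1, 7], "rnd_2": [17, 16, 6]},
    index=pd.date_range("2012-04-10", periods=3, name="fecha"),
)
day = pd.Timestamp("2012-04-11")

value = df.at[day, "rnd_1"]  # one row label + one column label -> scalar
print(value)                 # 1

df.at[day, "rnd_1"] = 99     # at can also set a single cell in place
print(df.loc[day, "rnd_1"])  # 99

# Rough timing comparison; numbers vary, so nothing is asserted about them
t_at = timeit.timeit(lambda: df.at[day, "rnd_1"], number=10_000)
t_loc = timeit.timeit(lambda: df.loc[day, "rnd_1"], number=10_000)
print(f"at: {t_at:.4f}s  loc: {t_loc:.4f}s")
```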

iat

iat is to iloc what at is to loc: a faster, position-based accessor that, like at, can return only a single element.

In [23]: data_fecha.iat[1, 0]
Out[23]: 1

In [24]: %timeit data_fecha.iat[1, 0]
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.77 µs per loop

In [25]: %timeit data_fecha.iloc[1, 0]
10000 loops, best of 3: 158 µs per loop
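The same pair of accessors on a small frame: iat and iloc address the same cell by (row position, column position), and iat can also set it. A minimal sketch, assuming a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame with a date index
df = pd.DataFrame(
    {"rnd_1": [8, 1, 7], "rnd_2": [17, 16, 6]},
    index=pd.date_range("2012-04-10", periods=3, name="fecha"),
)

print(df.iat[1, 0])   # row position 1, column position 0 -> 1
print(df.iloc[1, 0])  # same cell through iloc -> 1

df.iat[2, 1] = 0             # set a single cell by position
print(df["rnd_2"].tolist())  # [17, 16, 0]
```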

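ix

The methods above generally expect the labels you query to exist in the index, or the positions to lie within the frame's length (loc on a sorted index is an exception for slice bounds). ix, by contrast, accepts slice bounds that are not in the DataFrame's index. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0; on a sorted index such as data_fecha's, loc accepts the same bounds, so data_fecha.loc[date_1:date_2] is the modern equivalent.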

In [28]: date_1 = dt.datetime(2013, 1, 10, 8, 30)
    ...: date_2 = dt.datetime(2013, 1, 13, 4)
    ...:
    ...: # Slice the rows between them
    ...: data_fecha.ix[date_1:date_2]
Out[28]:
            rnd_1  rnd_2  rnd_3
fecha
2013-01-11    ...    ...    ...
2013-01-12    ...    ...    ...
2013-01-13    ...    ...    ...

In the example above, 2013-01-10 is not returned: its index entry is 00:00 on that day, which is earlier than the 08:30 start bound.
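Because ix no longer exists in current pandas, here is a minimal sketch of the same query with loc, assuming a hypothetical sorted daily frame covering early January 2013; the 04:00 time on `date_2` is an illustrative choice:

```python
import datetime as dt

import pandas as pd

# Hypothetical daily frame, index sorted ascending: 2013-01-06 .. 2013-01-15
data_fecha = pd.DataFrame(
    {"rnd_1": range(10)},
    index=pd.date_range("2013-01-06", periods=10, freq="D", name="fecha"),
)

date_1 = dt.datetime(2013, 1, 10, 8, 30)  # 08:30 on the 10th, not an index label
date_2 = dt.datetime(2013, 1, 13, 4)      # 04:00 on the 13th (illustrative), not a label

# On a sorted DatetimeIndex, loc places non-label bounds by sort order,
# so the 2013-01-10 00:00 entry falls before date_1 and is excluded
window = data_fecha.loc[date_1:date_2]
print(window.index.strftime("%Y-%m-%d").tolist())
# ['2013-01-11', '2013-01-12', '2013-01-13']
```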
