The axis labeling information in pandas objects serves many purposes:
- It identifies data (that is, provides metadata) using known indicators, which is important for analysis, visualization, and interactive console display.
- It enables automatic and explicit data alignment.
- It allows you to intuitively get and set subsets of the data set.

In this section we focus on the final goal: how to slice, dice, and generally get and set subsets of pandas objects. The focus is on Series and DataFrame, as they have received more development attention in this area. More work is expected to go into higher-dimensional data structures (including Panel) in the future, especially label-based advanced indexing.
Tip: The Python and NumPy indexing operator [] and the attribute operator . provide quick and easy access to pandas data structures across a wide range of use cases. This makes interactive work intuitive, as there is little new to learn if you already know how to work with Python dictionaries and NumPy arrays. However, because the type of the data being accessed is not known in advance, using the standard operators directly has some optimization limits. For production code, we recommend that you take advantage of the optimized pandas data access methods described in this chapter.
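A minimal sketch of the tip, assuming a recent pandas release. The small frame and its labels are hypothetical. It contrasts dictionary-style [] access and attribute access with the explicit .loc and .iloc accessors recommended for production code.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

# Dictionary-style [] and attribute access return the same column.
# Attribute access only works when the label is a valid Python identifier
# that does not clash with an existing DataFrame attribute or method.
by_key = df["A"]
by_attr = df.A
assert by_key.equals(by_attr)

# Explicit accessors recommended for production code.
print(df.loc["y", "B"])   # label based
print(df.iloc[1, 1])      # position based
```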
Warning: Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called "chained assignment" and should be avoided.
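As a hedged illustration of the warning (hypothetical data, recent pandas assumed), the chained form appears only in a comment, because whether it modifies df and which warning it emits differ between pandas versions. The single .loc call selects the rows and the column in one step, so the assignment is applied to df itself.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"A": [1, -2, 3], "B": [10, 20, 30]})

# Chained assignment (avoid):
#     df[df["A"] > 0]["B"] = 0
# The first [] may return a copy, so the second assignment can update a
# temporary object instead of df; pandas versions differ in whether df
# changes and in which warning is emitted.

# One .loc call with row and column indexers assigns into df directly.
df.loc[df["A"] > 0, "B"] = 0
print(df)
```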
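Warning: As of version 0.15.0, Index has been refactored internally so that it no longer subclasses ndarray but instead subclasses PandasObject, like the rest of the pandas objects. This should be a transparent change with only very limited API implications.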
A variety of indexing methods
Object selection has received a number of user-requested additions to support more explicit location-based indexing. pandas now supports three types of multi-axis indexing.
.loc is primarily label based, but may also be used with a boolean array. .loc raises KeyError when the requested items are not found. Allowed inputs are:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position along the index)
- A list or array of labels, e.g. ['a', 'b', 'c']
- A slice object with labels, e.g. 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included!)
- A boolean array
- A callable function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing (one of the above)

.iloc is primarily integer-position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc raises IndexError if a requested indexer is out of bounds, except for slice indexers, which allow out-of-bounds indexing. Allowed inputs are:
- An integer, e.g. 5
- A list or array of integers, e.g. [3, 0, 4]
- A slice object with integers, e.g. 1:7
- A boolean array
- A callable function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing (one of the above)

.ix supports mixed integer- and label-based access. It is primarily label based, but falls back to integer-position access unless the corresponding axis is of integer type. .ix is the most general indexer and supports any of the inputs accepted by .loc and .iloc. .ix also supports floating-point label schemes, and it is especially useful when dealing with mixed positional and label-based hierarchical indexes. However, when an axis is integer based, only label-based access, not positional access, is supported. In such cases it is usually better to be explicit and use .iloc or .loc.
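The following sketch exercises the .loc and .iloc inputs listed above on a small, hypothetical Series. It assumes a recent pandas release; .ix is left out because it was deprecated in pandas 0.20 and removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical Series with string labels.
s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

# .loc: label based
one = s.loc["b"]                 # single label
picked = s.loc[["a", "c"]]       # list of labels
by_label = s.loc["a":"c"]        # label slice: start AND stop included
big = s.loc[s > 25]              # boolean array

# .iloc: integer-position based, from 0 to len(s) - 1
first = s.iloc[0]                # single integer
reordered = s.iloc[[3, 0, 4]]    # list of integers
by_pos = s.iloc[1:3]             # integer slice: stop excluded
clipped = s.iloc[3:100]          # out-of-bounds slice is allowed

try:
    s.loc["z"]                   # missing label
except KeyError:
    print(".loc raised KeyError")

try:
    s.iloc[10]                   # out-of-bounds integer
except IndexError:
    print(".iloc raised IndexError")

# With integer labels, .loc treats 0 as a label, .iloc as a position.
t = pd.Series(["x", "y", "z"], index=[5, 0, 1])
print(t.loc[0], t.iloc[0])       # label 0 is 'y', position 0 is 'x'
```

Note the asymmetry between the two slices: the label slice includes its stop label, while the positional slice excludes its stop position, as with Python lists.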
.loc, .iloc, .ix and [] indexing can also accept a callable as the indexer.

Getting values from an object with multi-axes selection uses the following notation (using .loc as an example, but the same applies to .iloc and .ix). Any of the axis accessors may be the null slice :. Axes left out of the specification are assumed to be : (e.g. p.loc['a'] is equivalent to p.loc['a', :, :]).

Object Type    Indexers
Series         s.loc[indexer]
DataFrame      df.loc[row_indexer, column_indexer]
Panel          p.loc[item_indexer, major_indexer, minor_indexer]

Basics

As mentioned when introducing the data structures in the previous section, the primary function of indexing with [] (that is, __getitem__ in Python) is selecting lower-dimensional slices. Thus:

Object Type    Selection          Return Value Type
Series         series[label]      scalar value
DataFrame      frame[colname]     Series corresponding to colname
Panel          panel[itemname]    DataFrame corresponding to the itemname
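A short sketch of callable indexers, omitted axes, and the [] return types from the tables above. It assumes a recent pandas release and a hypothetical 4x3 frame; Panel is not used because it was removed in pandas 0.25.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame with string row labels.
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=list("wxyz"), columns=["A", "B", "C"])

# Callable indexers receive the object being indexed (here df).
rows = df.loc[lambda d: d["A"] > 3]        # rows where column A > 3
cols = df.loc[:, lambda d: ["A", "C"]]     # callable on the column axis
same_rows = df[lambda d: d["A"] > 3]       # [] accepts a callable too

# A trailing axis left out of the specification is treated as ':'.
assert df.loc["x"].equals(df.loc["x", :])

# [] selects a lower-dimensional slice.
col = df["B"]      # DataFrame[colname] -> Series
val = col["y"]     # Series[label] -> scalar
print(type(col).__name__, val)
```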
Here we construct a simple time series data set to illustrate the indexing functionality:
In [1]: dates = pd.date_range('1/1/2000', periods=8)

In [2]: df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

In [3]: df
Out[3]:
                   A         B         C         D
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
2000-01-02  1.212112 -0.173215  0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
2000-01-04  0.721555 -0.706771 -1.039575  0.271860
2000-01-05 -0.424972  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427  0.524988
2000-01-07  0.404705  0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312  0.844885

In [4]: panel = pd.Panel({'one': df, 'two': df - df.mean()})

In [5]: panel
Out[5]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 8 (major_axis) x 4 (minor_axis)
Items axis: one to two
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-08 00:00:00
Minor_axis axis: A to D

Note: Unless specifically stated otherwise, none of the indexing functionality is specific to time series.

Thus, as per above, we have the most basic indexing using []:

In [6]: s = df['A']

In [7]: s[dates[5]]
Out[7]: -0.67368970808837059

In [8]: panel['two']
Out[8]:
                   A         B         C         D
2000-01-01  0.409571  0.113086 -0.610826 -0.936507
2000-01-02  1.152571  0.222735  1.017442 -0.845111
2000-01-03 -0.921390 -1.708620  0.403304  1.270929
2000-01-04  0.662014 -0.310822 -0.141342  0.470985
2000-01-05 -0.484513  0.962970  1.174465 -0.888276
2000-01-06 -0.733231  0.509598 -0.580194  0.724113
2000-01-07  0.345164  0.972995 -0.816769 -0.840143
2000-01-08 -0.430188 -0.761943 -0.446079  1.044010

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception is raised. Multiple columns can also be set in this manner:
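For readers on current pandas, the construction and basic [] selection above can be reproduced without Panel (removed in pandas 0.25). This sketch assumes a seeded NumPy Generator in place of the unseeded np.random.randn, so its numbers will not match the printed output; the missing column label 'E' is hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=8)
# Seeded generator (an assumption) so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=dates,
                  columns=["A", "B", "C", "D"])

s = df["A"]              # DataFrame[colname] -> Series
value = s[dates[5]]      # Series[label] -> scalar

subset = df[["C", "A"]]  # list of columns -> DataFrame in that order

try:
    df[["A", "E"]]       # 'E' is not a column of df
except KeyError as exc:
    print("KeyError:", exc)
```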
In [9]: df
Out[9]:
                   A         B         C         D
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
2000-01-02  1.212112 -0.173215  0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
2000-01-04  0.721555 -0.706771 -1.039575  0.271860
2000-01-05 -0.424972  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427  0.524988
2000-01-07  0.404705  0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312  0.844885

In [10]: df[['B', 'A']] = df[['A', 'B']]  # swap the values of the two columns

In [11]: df
Out[11]:
                   A         B         C         D
2000-01-01 -0.282863  0.469112 -1.509059 -1.135632
2000-01-02 -0.173215  1.212112  0.119209 -1.044236
2000-01-03 -2.104569 -0.861849 -0.494929  1.071804
2000-01-04 -0.706771  0.721555 -1.039575  0.271860
2000-01-05  0.567020 -0.424972  0.276232 -1.087401
2000-01-06  0.113648 -0.673690 -1.478427  0.524988
2000-01-07  0.577046  0.404705 -1.715002 -1.039268
2000-01-08 -1.157892 -0.370647 -1.344312  0.844885
This can be useful for applying a transform (in place) to a subset of the columns.
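A minimal sketch of both uses, assuming a recent pandas release and a hypothetical three-column frame. In the pandas versions we are aware of, setting with a list of columns pairs the target columns with the right-hand columns by position, which is why the swap works.

```python
import pandas as pd

# Hypothetical frame.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0], "C": [100.0, 200.0]})

# Swap A and B: target columns are paired with source columns by position.
df[["B", "A"]] = df[["A", "B"]]

# Transform a subset of columns and write the result back.
df[["A", "B"]] = df[["A", "B"]] * 2
print(df)
```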
Warning: pandas aligns all axes when setting Series and DataFrame objects from .loc, .iloc and .ix.
This will not modify df, because the column alignment happens before the values are assigned.
In [12]: df[['A', 'B']]
Out[12]:
                   A         B
2000-01-01 -0.282863  0.469112
2000-01-02 -0.173215  1.212112
2000-01-03 -2.104569 -0.861849
2000-01-04 -0.706771  0.721555
2000-01-05  0.567020 -0.424972
2000-01-06  0.113648 -0.673690
2000-01-07  0.577046  0.404705
2000-01-08 -1.157892 -0.370647

In [13]: df.loc[:, ['B', 'A']] = df[['A', 'B']]  # this does not swap the values of columns A and B

In [14]: df[['A', 'B']]
Out[14]:
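The sketch below reproduces the alignment behavior on a hypothetical two-column frame and shows a commonly used workaround: assigning a plain NumPy array (here via DataFrame.to_numpy(), available since pandas 0.24) so there are no column labels to align on. A recent pandas release is assumed.

```python
import pandas as pd

# Hypothetical frame.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0]})

# .loc aligns the right-hand DataFrame on its column labels first,
# so B receives B and A receives A: nothing is swapped.
df.loc[:, ["B", "A"]] = df[["A", "B"]]
after_aligned = df.copy()
print(after_aligned)

# A NumPy array carries no labels, so values are assigned by position
# and the two columns are swapped.
df.loc[:, ["B", "A"]] = df[["A", "B"]].to_numpy()
print(df)
```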