This article introduces a basic part of pandas data processing: filtering specified rows or columns. Readers who need it can use it as a reference.
Pandas has two main data structures: Series (equivalent to a single row or column of data) and DataFrame (a tabular structure with multiple rows and columns).
To make the ideas easier to grasp, this article draws analogies with row and column operations in Excel or SQL.
1. Re-indexing: reindex and ix
As described in the previous article, the default row index after reading data is the sequence 0, 1, 2, 3, ... The column index corresponds to the field names (that is, the first row of the data). Re-indexing means replacing the default index with the one you want.
1.1 Series
For example: data = Series([4, 5, 6], index=['a', 'b', 'c']), where the row index is a, b, c.
Calling data.reindex(['a', 'c', 'd', 'e']) returns the data under the new index.
In other words, reindex takes the new index, looks up each label in the original data to find the corresponding value, and fills in NaN wherever there is no match.
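The matching behaviour described above can be sketched as follows. This is a minimal example that assumes pandas is imported under the usual alias pd; the variable name result is only illustrative.

```python
import pandas as pd

# Series with the row index a, b, c from the text
data = pd.Series([4, 5, 6], index=["a", "b", "c"])

# Labels a and c keep their values; d and e have no match and become NaN
result = data.reindex(["a", "c", "d", "e"])
print(result)
```

Because NaN is a floating-point value, the result is stored as float64 even though the original values were integers.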
1.2 DataFrame
(1) Row index modification: the DataFrame row index is modified in the same way as for a Series.
(2) Column index modification: call reindex(columns=['m1', 'm2', 'm3']), using the columns parameter to specify the new column index. The logic is the same as for the row index: the new column index is matched against the original data, and anything without a match is set to NaN.
Example:
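A small sketch of the column case, using made-up sample values (the original example data is not available). The frame starts with columns m1 and m3 only, so the new column m2 has nothing to match.

```python
import pandas as pd

# Hypothetical sample data: only m1 and m3 exist at first
data = pd.DataFrame({"m1": [1, 2, 3], "m3": [7, 8, 9]}, index=["a", "b", "c"])

# m1 and m3 are matched by name; m2 is new, so it is filled with NaN
result = data.reindex(columns=["m1", "m2", "m3"])
print(result)
```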
(3) The row and column indexes can also be modified at the same time.
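One way to change both indexes in a single call is to pass index and columns to reindex together. The sketch below uses made-up sample values and is not necessarily the form used in the original example.

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({"m1": [1, 2, 3], "m3": [7, 8, 9]}, index=["a", "b", "c"])

# Row d and column m2 do not exist in the original, so they are filled with NaN
result = data.reindex(index=["a", "b", "d"], columns=["m1", "m2", "m3"])
print(result)
```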
2. Dropping entries on a specified axis (in plain terms, deleting rows or columns): drop
Use the index to choose which rows or columns to delete.
data.drop(['a', 'c']) is equivalent to delete from a where xid = 'a' or xid = 'c'
data.drop('m1', axis=1) is equivalent to delete from a where yid = 'm1'
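The two calls above can be tried on a small frame. The sample values below are made up for illustration. Unlike a SQL delete, drop returns a new object and leaves the original data unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows a, b, c and columns m1, m2, m3
data = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["m1", "m2", "m3"],
)

rows_dropped = data.drop(["a", "c"])     # removes rows a and c
column_dropped = data.drop("m1", axis=1)  # removes column m1

print(rows_dropped)
print(column_dropped)
```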
3. Selecting and filtering (in plain terms, the equivalent of querying with conditions in SQL)
Because pandas objects carry both row and column indexes, filtering data is quite convenient.
3.1 Series
(1) Select by row index:
obj['b'] is equivalent to select * from tb where xid = 'b'.
obj[['b', 'a', 'c']] is equivalent to select * from tb where xid in ('a', 'b', 'c'), except that the results are returned in the order b, a, c, which differs from SQL.
obj[0:1] and obj['a':'b'] differ as follows:
the former does not include the end point, while the latter does.
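A short sketch of these selections, using a made-up Series. The positional slice is written with iloc, which is the explicit form of obj[0:1].

```python
import pandas as pd

# Hypothetical sample data
obj = pd.Series([1.5, -0.8, 0.3, -0.2], index=["a", "b", "c", "d"])

single = obj["b"]                # one label
several = obj[["b", "a", "c"]]   # a list of labels, returned in the order given
by_position = obj.iloc[0:1]      # positional slice: end point excluded
by_label = obj["a":"b"]          # label slice: end point included

print(several)
print(by_position)
print(by_label)
```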
(2) Filter by value: obj[obj > -0.6] returns the records in obj whose value is greater than -0.6.
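As a sketch with made-up values, the comparison first builds a boolean Series, which is then used to keep only the matching entries.

```python
import pandas as pd

# Hypothetical sample data
obj = pd.Series([1.5, -0.8, 0.3, -0.2], index=["a", "b", "c", "d"])

mask = obj > -0.6     # boolean Series: True where the value passes the test
filtered = obj[mask]  # keeps only the entries where mask is True

print(filtered)
```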
3.2 DataFrame
(1) Select a single row with ix or xs:
For example, the row with index b can be selected in the following three ways:
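The sketch below shows three ways to pick out row b on made-up data. They are not necessarily the three forms from the original example: ix was removed in pandas 1.0, so loc (by label) and iloc (by position) are used in its place here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows a, b, c and columns m1, m2, m3
data = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["m1", "m2", "m3"],
)

by_label = data.loc["b"]    # label-based selection
by_xs = data.xs("b")        # cross-section by label
by_position = data.iloc[1]  # row b is at position 1

print(by_label)
```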
(2) Select multiple rows:
Selecting the two rows with index a and b:
Note that this cannot be written directly as data[['a', 'b']], because that form selects columns.
data[0:2] selects the first and second rows. Positions start at 0, and the end position 2 is not included.
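A sketch of both approaches on made-up data, with loc standing in for ix (which was removed in pandas 1.0). It also shows that data[['a', 'b']] is looked up among the columns and fails.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows a, b, c and columns m1, m2, m3
data = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["m1", "m2", "m3"],
)

rows_ab = data.loc[["a", "b"]]  # select rows by a list of labels
first_two = data[0:2]           # positional slice over rows, end excluded

# A list inside plain [] is matched against the columns, so this raises KeyError
try:
    data[["a", "b"]]
    treated_as_columns = False
except KeyError:
    treated_as_columns = True

print(rows_ab)
```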
(3) Select a single column
Selecting all rows of the m1 column:
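A minimal sketch on made-up data. Both forms return the m1 column as a Series that keeps the row index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows a, b, c and columns m1, m2, m3
data = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["m1", "m2", "m3"],
)

m1 = data["m1"]                # column by name
m1_via_loc = data.loc[:, "m1"]  # all rows, column m1

print(m1)
```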
(4) Select multiple columns
Selecting all rows of the two columns m1 and m3:
In ix[:, ['m1', 'm2']], the : before the comma indicates that all rows are selected.
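A sketch of the multi-column case on made-up data. Since ix was removed in pandas 1.0, the second form uses loc with the same "all rows" slice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows a, b, c and columns m1, m2, m3
data = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["m1", "m2", "m3"],
)

cols = data[["m1", "m3"]]                # a list of column names
cols_via_loc = data.loc[:, ["m1", "m3"]]  # ":" keeps every row

print(cols)
```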
(5) Filter rows or columns by a condition on the values
For example, selecting all records where a column's value is greater than 4 is equivalent to select * from tb where column_name > 4
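As a sketch on made-up data, the condition is applied to one column (m3 is chosen here only for illustration) and the resulting boolean Series selects the rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows a, b, c and columns m1, m2, m3
data = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["m1", "m2", "m3"],
)

# Roughly: select * from tb where m3 > 4
result = data[data["m3"] > 4]
print(result)
```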
(6) To select all records where a column's value is greater than 4 while showing only some of the columns:
Rows are filtered by the condition, and columns are selected with [0, 2], that is, the first and third columns.
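One way to combine a row condition with column positions is sketched below on made-up data. Because ix is no longer available (it was removed in pandas 1.0), the positions [0, 2] are first turned into column names so that loc can be used for both axes. The column m3 in the condition is only an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows a, b, c and columns m1, m2, m3
data = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["m1", "m2", "m3"],
)

wanted_columns = data.columns[[0, 2]]  # first and third column names
result = data.loc[data["m3"] > 4, wanted_columns]

print(result)
```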