Pandas is the data analysis and processing library for Python.
import pandas as pd
1. Read a CSV or TXT file
foodinfo = pd.read_csv("pandas_study.csv", encoding="utf-8")
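The same reader also handles delimited text files; a minimal sketch, assuming a hypothetical tab-separated file pandas_study.txt:
foodinfo_txt = pd.read_csv("pandas_study.txt", sep="\t", encoding="utf-8")  # sep sets the delimiter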
2. View the first n and last n rows
foodinfo.head(n)
foodinfo.tail(n)
3. Check the type of the object: DataFrame or ndarray
print(type(foodinfo))  # result: <class 'pandas.core.frame.DataFrame'>
4. See what columns are available
foodinfo.columns
5. See how many rows and columns there are
foodinfo.shape
6. Print one row or several rows of data
foodinfo.loc[0]
foodinfo.loc[0:2]
foodinfo.loc[[2, 5, 10]]  # note: the argument here is a list
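loc selects by index label; for purely positional selection pandas also provides iloc. A minimal sketch, not from the original:
foodinfo.iloc[0:3]  # first three rows by position, regardless of the index labels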
7. Print one column or several columns of data
foodinfo["dti"]foodinfo[["int_rate " " DTI "]] # Note that the inside is an array # or:columns = ["int_rate""dti"] Foodinfo[columns]
8. Print data types for all columns
foodinfo.dtypes
9. Some operations on columns, e.g. selecting the columns whose names end with "s"
col_columns = foodinfo.columns.tolist()
new_columns = []
for c in col_columns:
    if c.endswith("s"):
        new_columns.append(c)
        print(c)
foodinfo[new_columns]
10. Column arithmetic: multiply every value by 100 (subtraction and the other operations work the same way)
foodinfo[["int_rate""dti"] * 100
11. Add a column
new_col = foodinfo["int_rate"] * 100
foodinfo["new_col"] = new_col
12. Operations between columns
foodinfo["dti"] * foodinfo["int_rate"]
13. View the maximum, minimum, and average values for a column
foodinfo["int_rate"].max () foodinfo["int_rate" ].min () foodinfo["int_rate"].mean ()
14. Sort by a field (ascending)
# inplace controls whether a new DataFrame is returned; with inplace=True the existing one is sorted in place
foodinfo.sort_values("int_rate_one", inplace=True)
# Sort by a field (descending)
foodinfo.sort_values("int_rate_one", inplace=True, ascending=False)
15. View summary statistics of the DataFrame: maximum, minimum, mean, quartiles, etc.
foodinfo.describe()
16. Null-value operations
pin = foodinfo["pin"]
pin_isnull = pd.isnull(pin)          # boolean mask of the null values
pin_isnull_list = pin[pin_isnull]    # all rows where the value is null
len(pin_isnull_list)                 # number of null values
17. Missing-value operations
# The simple approach is to filter out the null values first
books = foodinfo["life_cycle_books"]
book_isnull = pd.isnull(books)
book_list_isnull = books[book_isnull == False]                 # keep only the non-null values
mean_books = sum(book_list_isnull) / len(book_list_isnull)     # calculate the average
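For reference, Series.mean() skips NaN values by default, so the same filtered average can also be obtained directly (a sketch, not the author's original code):
foodinfo["life_cycle_books"].mean()  # skipna=True by default, so missing values are ignored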
18. Select data that matches a condition
foodinfo[foodinfo["life_cycle_books"] = = 1]
19. Pivot Table
import numpy as np
# index: the column(s) to group by
# values: the column(s) to aggregate
# aggfunc: the aggregation function, default np.mean
data_foodinfo = foodinfo.pivot_table(index=["life_cycle_books", "potential_value_books"], values="risk_level", aggfunc=np.mean)
print(data_foodinfo)
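A small self-contained sketch with made-up data (toy column values, not from the dataset above) shows how index, values and aggfunc fit together:
import pandas as pd
import numpy as np
toy = pd.DataFrame({
    "life_cycle_books": [1, 1, 2, 2],
    "risk_level": [0.2, 0.4, 0.6, 0.8],
})
# risk_level is averaged within each life_cycle_books group
print(toy.pivot_table(index="life_cycle_books", values="risk_level", aggfunc=np.mean))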
20. Delete missing values
# Drop every column that contains missing values
na_foodinfo = foodinfo.dropna(axis=1)
# Or drop the rows that have missing values in specific columns
na_foodinfo = foodinfo.dropna(axis=0, subset=["life_cycle_books", "potential_value_books"])
21. Take arbitrary data, e.g. the life_cycle_books value of row 80
foodinfo.loc[80, "life_cycle_books"]
22. Reset the index
foodinfo.reset_index(drop=True)
23. Custom function: return the number of null values in each column
def count_null_columns(column):
    column_null = pd.isnull(column)
    list_null = column[column_null]
    count_null = len(list_null)
    return count_null

foodinfo.apply(count_null_columns)
24. Series
# pandas has three data structures: Series, DataFrame, and Panel
from pandas import Series
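A minimal sketch of constructing these structures directly (the values are illustrative only):
s = Series([1.0, 2.0, 3.0], index=["a", "b", "c"])   # one labelled column of data
df_small = pd.DataFrame({"x": [1, 2], "y": [3, 4]})  # several Series sharing one index
# Note: Panel has been removed in recent pandas versions; a MultiIndex DataFrame is used instead.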
25. A Series holds a single column of data
Series_name = taitan["name"]series_name.values
26. Locate rows by a custom index
Series_name = taitan["name"= taitan["Age" = Series (series_age.values, index = series_name) series_custom[["Ahlin, Mrs Johan (Johanna Persdotter Larsson)""asplund, Mrs Carl Oscar (Selma Augusta Emilia Johansson) c14> "]# Description: series_custom[" "] by Column series_custom[[" "]] by row
27. Take rows 5 to 10 of the data (same idea as above):
series_custom[5:10]
28. Reindexing
old_index = series_custom.index.tolist()
sort_index = sorted(old_index)
new_index = series_custom.reindex(sort_index)
print(new_index)
29. Sorting a Series by index and by value
sc1 = series_custom.sort_index()
print(sc1)
sc2 = series_custom.sort_values()
print(sc2)
30. Series Filter
Series_custom > 0.5> 0.5> 0.5) & (Series_custom < 0.9)]# Note: &, | They're all single symbols .
31. DataFrame
# A Series is a single column of data; a DataFrame holds multiple columns
# A DataFrame can be seen as consisting of multiple Series
df = pd.read_csv("titanic_train.csv")
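To illustrate that a DataFrame is made of Series, selecting a single column returns a Series (assuming the Age column used earlier):
print(type(df["Age"]))  # <class 'pandas.core.series.Series'>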
32. Changing the index of a DataFrame
# drop controls whether the "name" column is removed once it becomes the index;
# drop=False keeps it as a regular column so it can still be used in calculations
df_name = df.set_index("name", drop=False)
33. View DataFrame columns of a certain dtype
types ="float64"].indexdf_name[float_columns]
34. Compute the standard deviation (np.std) of each column in a DataFrame
float_df = df_name[float_columns]
float_df.apply(lambda x: np.std(x))
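As a design note, the same result can also be obtained with the built-in method; np.std defaults to the population standard deviation (ddof=0), while DataFrame.std defaults to the sample version (ddof=1):
float_df.std(ddof=0)  # equivalent to applying np.std column by column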