A brief introduction to Python's Pandas library

Source: Internet
Author: User

Pandas is a data analysis and processing library for Python.
import pandas as pd

1. Read a CSV or TXT file

foodinfo = pd.read_csv("pandas_study.csv", encoding="utf-8")
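As a minimal sketch of the reading step, the snippet below loads comma-separated and tab-separated text from memory rather than from disk. The sample values and the `from_txt` name are assumptions made for illustration; only `pd.read_csv` and its `sep` parameter come from pandas itself.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of "pandas_study.csv"
csv_text = "int_rate,dti\n0.25,10.0\n0.5,20.0\n"
# The same data as tab-separated text, as a TXT export might look
txt_text = "int_rate\tdti\n0.25\t10.0\n0.5\t20.0\n"

# read_csv accepts a file path or any file-like object
foodinfo = pd.read_csv(io.StringIO(csv_text))
# For a TXT file, pass the delimiter explicitly with sep
from_txt = pd.read_csv(io.StringIO(txt_text), sep="\t")

print(foodinfo.shape)
print(from_txt.equals(foodinfo))
```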

2. View the first n or the last n rows

foodinfo.head(n)
foodinfo.tail(n)

3. Check the type of the data: DataFrame or ndarray

print(type(foodinfo))  # result: <class 'pandas.core.frame.DataFrame'>

4. See what columns are available

foodinfo.columns

5. See the number of rows and columns

foodinfo.shape

6. Print one row or several rows of data

foodinfo.loc[0]
foodinfo.loc[0:2]          # label slice: both endpoints are included
foodinfo.loc[[2, 5, 10]]   # Note that the inner argument is a list
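The row selections above can be sketched on a small frame. The data here is made up for illustration; the point is that `loc` slices by label and includes both endpoints, while `iloc` slices by position and excludes the end.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..3
df = pd.DataFrame({"int_rate": [0.10, 0.15, 0.20, 0.25],
                   "dti": [12.5, 8.0, 20.1, 5.5]})

by_label = df.loc[0:2]      # labels 0, 1 and 2
by_position = df.iloc[0:2]  # positions 0 and 1 only
picked = df.loc[[1, 3]]     # an explicit list of labels

print(len(by_label), len(by_position), list(picked.index))
```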

7. Print one column or several columns of data

foodinfo["dti"]
foodinfo[["int_rate", "dti"]]    # Note that the inner argument is a list
# or:
columns = ["int_rate", "dti"]
foodinfo[columns]

8. Print the data types of all columns

foodinfo.dtypes

9. Some operations on the column names

col_columns = foodinfo.columns.tolist()
new_columns = []
for c in col_columns:
    if c.endswith("s"):
        new_columns.append(c)
        print(c)
foodinfo[new_columns]
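One possible shorter form of the column filter uses a list comprehension, or the vectorized `.str` accessor on the column index. This is a sketch on made-up data, not the original author's code.

```python
import pandas as pd

# Hypothetical frame; only the column names matter here
df = pd.DataFrame({"life_cycle_books": [1, 2],
                   "int_rate": [0.1, 0.2],
                   "potential_value_books": [3, 4]})

# Keep the column names that end with "s"
ends_with_s = [c for c in df.columns if c.endswith("s")]
subset = df[ends_with_s]

# Equivalent selection with a boolean mask over the columns
subset_alt = df.loc[:, df.columns.str.endswith("s")]

print(ends_with_s)
```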

10. Arithmetic: multiply every row by 100 (addition, subtraction, and division work the same way)

foodinfo[["int_rate", "dti"]] * 100

11. Add a column

new_col = foodinfo["int_rate"] * 100    # any scalar works; 100 mirrors item 10
foodinfo["new_col"] = new_col

12. Operations between columns

foodinfo["dti"] * foodinfo["int_rate"]
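The arithmetic in items 10 to 12 can be sketched together. The values and the `int_rate_pct` column name are hypothetical; the behavior shown (scalar broadcasting, element-wise column products, and assignment creating a column) is standard pandas.

```python
import pandas as pd

# Hypothetical values chosen so the arithmetic is exact in floating point
df = pd.DataFrame({"int_rate": [0.25, 0.5], "dti": [10.0, 20.0]})

# A scalar is broadcast to every cell; the result is a new DataFrame
scaled = df[["int_rate", "dti"]] * 100

# Column-by-column arithmetic is element-wise, aligned on the index
product = df["dti"] * df["int_rate"]

# Assigning to a new key adds a column to the existing DataFrame
df["int_rate_pct"] = df["int_rate"] * 100

print(scaled["int_rate"].tolist(), product.tolist(), list(df.columns))
```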

13. View the maximum, minimum, and average values for a column

foodinfo["int_rate"].max()
foodinfo["int_rate"].min()
foodinfo["int_rate"].mean()

14. Sort by a field

# inplace: whether to sort the existing DataFrame instead of creating a new one; with inplace=True no new DataFrame is created
# Sort by a field, ascending
foodinfo.sort_values("int_rate_one", inplace=True)
# Sort by a field, descending
foodinfo.sort_values("int_rate_one", inplace=True, ascending=False)
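To make the effect of `inplace` concrete, here is a small sketch on made-up data. Without `inplace=True`, `sort_values` returns a sorted copy and leaves the original untouched; with it, the call returns `None` and the original is reordered.

```python
import pandas as pd

df = pd.DataFrame({"int_rate": [0.3, 0.1, 0.2]})  # hypothetical values

ascending = df.sort_values("int_rate")                    # new DataFrame
descending = df.sort_values("int_rate", ascending=False)  # new DataFrame
unchanged = df["int_rate"].tolist()                       # df still in its original order

result = df.sort_values("int_rate", inplace=True)         # sorts df itself, returns None

print(unchanged, df["int_rate"].tolist(), result)
```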

15. View summary statistics of the data frame: maximum, minimum, mean, quartiles, etc.

foodinfo.describe()
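As an illustration of what `describe()` returns for numeric columns, the sketch below uses four made-up values so the quartiles are easy to check by hand.

```python
import pandas as pd

df = pd.DataFrame({"int_rate": [1.0, 2.0, 3.0, 4.0]})  # hypothetical values

summary = df.describe()

# Rows of the summary: count, mean, std, min, 25%, 50%, 75%, max
print(list(summary.index))
print(summary.loc["50%", "int_rate"])  # the median
```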

16. Null-value operations

pin = foodinfo["pin"]
pin_isnull = pd.isnull(pin)          # see which values are null
pin_isnull_list = pin[pin_isnull]    # find all null rows
len(pin_isnull_list)                 # number of null values

17. Missing-value operations

# The simple approach is to filter out null values
books = foodinfo["life_cycle_books"]
book_isnull = pd.isnull(books)
book_list_isnull = foodinfo["life_cycle_books"][book_isnull == False]
mean = sum(book_list_isnull) / len(book_list_isnull)    # calculate the average
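The manual filtering can be compared with the built-in reduction, which skips missing values by default. This is a sketch on a made-up Series; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

books = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])  # hypothetical values

is_null = books.isnull()
null_count = int(is_null.sum())  # True counts as 1

# Manual average over the non-null values
valid = books[~is_null]
manual_mean = valid.sum() / len(valid)

# Series.mean skips NaN by default (skipna=True)
builtin_mean = books.mean()

print(null_count, manual_mean, builtin_mean)
```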

18. Print the rows that meet a condition

foodinfo[foodinfo["life_cycle_books"] == 1]

19. Pivot table

import numpy as np
# index: the column(s) to pivot on
# values: the column(s) to compare
# aggfunc: the aggregation to apply; the default is the mean
data_foodinfo = foodinfo.pivot_table(index=["life_cycle_books", "potential_value_books"], values="risk_level", aggfunc=np.mean)
print(data_foodinfo)
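A minimal sketch of a pivot table on made-up data follows. It groups by one column and averages two others; the string `"mean"` is used for `aggfunc` here, which pandas accepts as an alternative to passing a function.

```python
import pandas as pd

# Hypothetical data reusing the column names from the text
df = pd.DataFrame({
    "life_cycle_books": [1, 1, 2, 2],
    "potential_value_books": [10.0, 20.0, 30.0, 50.0],
    "risk_level": [1.0, 3.0, 2.0, 4.0],
})

# One row per distinct life_cycle_books value, holding the mean of each value column
pivot = df.pivot_table(index="life_cycle_books",
                       values=["potential_value_books", "risk_level"],
                       aggfunc="mean")

print(pivot)
```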

20. Delete missing values

# axis=1 drops every column that contains a missing value
na_foodinfo = foodinfo.dropna(axis=1)
# axis=0 drops rows; subset limits the check to the specified columns
na_foodinfo = foodinfo.dropna(axis=0, subset=["life_cycle_books", "potential_value_books"])
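The difference between the two `axis` values is easy to mix up, so here is a small sketch on made-up data with column names `a`, `b`, and `c` chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [4.0, 5.0, 6.0],
                   "c": [np.nan, 8.0, 9.0]})

no_null_columns = df.dropna(axis=1)            # drops columns "a" and "c"
no_null_rows = df.dropna(axis=0)               # drops rows 0 and 1
subset_rows = df.dropna(axis=0, subset=["a"])  # drops only row 1

print(list(no_null_columns.columns), list(no_null_rows.index), list(subset_rows.index))
```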

21. Select an arbitrary value, e.g. row 80 of the life_cycle_books column

foodinfo.loc[80, "life_cycle_books"]

22. Reset the index

foodinfo.reset_index(drop=True)
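Resetting the index matters mostly after sorting or filtering, because `loc` looks rows up by label rather than by position. The sketch below, on made-up data, shows the same `loc[0, ...]` lookup before and after `reset_index(drop=True)`.

```python
import pandas as pd

df = pd.DataFrame({"v": [30, 10, 20]})  # hypothetical values

sorted_df = df.sort_values("v")          # index labels are now 1, 2, 0
by_old_label = sorted_df.loc[0, "v"]     # label 0 still points at 30

renumbered = sorted_df.reset_index(drop=True)  # labels become 0, 1, 2 again
by_new_label = renumbered.loc[0, "v"]          # label 0 is now the smallest value

print(by_old_label, by_new_label)
```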

23. Custom function: return the number of null values

def count_null_columns(column):
    column_null = pd.isnull(column)
    list_null = column[column_null]
    count_null = len(list_null)
    return count_null

foodinfo.apply(count_null_columns)
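As a cross-check of the custom function, the same per-column counts can be obtained with `isnull().sum()`. The frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": ["x", None, None]})  # hypothetical data

def count_null_columns(column):
    # Count the null entries in one column
    return len(column[pd.isnull(column)])

via_apply = df.apply(count_null_columns)  # calls the function once per column
via_builtin = df.isnull().sum()           # built-in equivalent

print(via_apply.tolist(), via_builtin.tolist())
```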

24. Series

# pandas has three types of data structures:
# Series
# DataFrame
# Panel (deprecated and removed in later pandas releases)
from pandas import Series

25. Series: show the data of one column

series_name = taitan["name"]
series_name.values

26. Locate a row of a column

series_name = taitan["name"]
series_age = taitan["Age"]
series_custom = Series(series_age.values, index=series_name)
series_custom[["Ahlin, Mrs Johan (Johanna Persdotter Larsson)", "Asplund, Mrs Carl Oscar (Selma Augusta Emilia Johansson)"]]
# Note: series_custom["label"] returns the value for one label; series_custom[["label1", "label2"]] returns a Series holding the listed labels
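A self-contained sketch of a Series with a custom index follows. The names and ages are invented stand-ins for the passenger data; the lookups show the difference between a single label, a list of labels, and a positional slice.

```python
from pandas import Series

# Hypothetical names and ages
names = ["Alice", "Bob", "Carol"]
ages = [30.0, 25.0, 41.0]

series_custom = Series(ages, index=names)

one = series_custom["Bob"]                   # a single value
several = series_custom[["Alice", "Carol"]]  # a Series with two entries
first_two = series_custom.iloc[0:2]          # positional slice

print(one, several.tolist(), first_two.index.tolist())
```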

27. Take rows 5 to 10 of the data, using the same series as above

series_custom[5:10]

28. Index transformation

old_index = series_custom.index.tolist()
sort_index = sorted(old_index)
new_index = series_custom.reindex(sort_index)
print(new_index)

29. Sorting a Series by index and by value

sc1 = series_custom.sort_index()
print(sc1)
sc2 = series_custom.sort_values()
print(sc2)
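Reindexing and sorting can be compared on a tiny made-up Series. Reindexing with the sorted labels gives the same result as `sort_index`, and asking `reindex` for a label that does not exist produces `NaN` instead of an error.

```python
import pandas as pd

s = pd.Series([1, 3, 2], index=["c", "a", "b"])  # hypothetical data

reindexed = s.reindex(sorted(s.index))  # order rows by a given label list
by_index = s.sort_index()               # same order, built in
by_value = s.sort_values()              # order by the stored values
with_missing = s.reindex(["a", "z"])    # "z" is not in s, so it becomes NaN

print(by_index.tolist(), by_value.index.tolist())
```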

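30. Filtering a Series

series_custom > 0.5
series_custom[series_custom > 0.5]
series_custom[(series_custom > 0.5) & (series_custom < 0.9)]
# Note: & and | are single symbols, and each condition needs its own parentheses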
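The filtering steps can be sketched on made-up values. A comparison produces a boolean mask, and masks are combined with `&` and `|`, with parentheses around each comparison because those operators bind more tightly than `>` and `<`.

```python
import pandas as pd

s = pd.Series([0.2, 0.6, 0.8, 0.95], index=["a", "b", "c", "d"])  # hypothetical

mask = s > 0.5                        # boolean Series
above = s[mask]                       # keep entries where the mask is True
between = s[(s > 0.5) & (s < 0.9)]    # both conditions must hold
outside = s[(s <= 0.5) | (s >= 0.9)]  # either condition may hold

print(mask.tolist(), between.index.tolist(), outside.index.tolist())
```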

31. DataFrame

# A Series is a single row or column of data; a DataFrame holds multiple rows and columns
# A DataFrame can be seen as consisting of multiple Series
df = pd.read_csv("titanic_train.csv")

32. Index transformation of a DataFrame

# drop: whether to remove the column from the DataFrame once it becomes the index
# drop=False keeps "name" as a regular column as well; otherwise it could no longer be used in column operations
df_name = df.set_index("name", drop=False)
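The effect of `drop` can be sketched on a two-row frame with made-up values. With `drop=False` the column stays available alongside the new index; with the default `drop=True` it is moved into the index.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "Age": [30.0, 25.0]})  # hypothetical

kept = df.set_index("name", drop=False)  # "name" is both index and column
moved = df.set_index("name")             # default drop=True: index only

print(list(kept.columns), list(moved.columns), kept.loc["Bob", "Age"])
```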

33. View the DataFrame columns of a given data type

types = df_name.dtypes
float_columns = types[types == "float64"].index
df_name[float_columns]
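As an alternative sketch, pandas also provides `select_dtypes`, which picks columns by type in one call. The frame and its column names below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame with string, float and integer columns
df = pd.DataFrame({"name": ["A", "B"],
                   "Age": [30.0, 25.0],
                   "Pclass": [1, 3],
                   "Fare": [7.25, 71.5]})

# Manual approach: filter the dtypes Series, then use its index as a column list
types = df.dtypes
float_columns = types[types == "float64"].index
float_df = df[float_columns]

# Built-in approach
float_df_alt = df.select_dtypes(include="float64")

print(float_columns.tolist())
```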

34. Find the standard deviation of each DataFrame column

float_df = df_name[float_columns]
float_df.apply(lambda x: np.std(x))
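One detail worth checking when computing spread is the `ddof` setting: the pandas `std` and `var` methods default to the sample statistic (`ddof=1`), while NumPy's `np.std` on a plain array defaults to the population statistic (`ddof=0`). A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical values with mean 5 and population variance 4
df = pd.DataFrame({"x": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]})

population_std = df["x"].std(ddof=0)    # divide by n
sample_std = df["x"].std()              # pandas default: divide by n - 1
numpy_std = np.std(df["x"].to_numpy())  # NumPy default on an array: divide by n
variance = df["x"].var(ddof=0)          # variance is the square of the std

print(population_std, sample_std, variance)
```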
