[Data analysis tool] Introduction to Pandas functions (I)

Source: Internet
Author: User

If you use Pandas (the Python Data Analysis Library), the following should help.
First, some basic concepts:
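  • DataFrame: two-dimensional data organized in rows and columns, similar to a sheet in Excel or a table in a relational database
  • Series: a single column of data
  • Axis: axis 0 refers to the rows (the index), axis 1 refers to the columns
  • Shape: a tuple giving the number of rows and columns of a DataFrame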
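A minimal sketch of these concepts, using a small made-up DataFrame (the column names temp and windspeed and their values are illustrative only):

```python
import pandas as pd

# A DataFrame: two-dimensional, with labelled rows and columns.
# Column names and values are made up for illustration.
df = pd.DataFrame({
    "temp": [0.34, 0.36, 0.20],
    "windspeed": [0.16, 0.25, 0.25],
})

# Selecting one column gives a Series: one-dimensional, sharing the DataFrame's index.
temps = df["temp"]
print(type(temps).__name__)  # Series

# axis=0 aggregates down the rows (one result per column);
# axis=1 aggregates across the columns (one result per row).
col_sums = df.sum(axis=0)
row_sums = df.sum(axis=1)

# shape is a (rows, columns) tuple.
print(df.shape)  # (3, 2)
```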
1. Loading a CSV file
The read_csv method has many parameters that can reduce the amount of data preprocessing needed. Nobody enjoys cleaning data, so we do some simple processing while loading it.
  • Direct loading
    • Loading with no parameters
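A sketch of loading with no extra parameters. To keep it self-contained, io.StringIO stands in for a file; with a real file you would pass its path instead. The columns and values are made up for illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file on disk (made-up data).
csv_text = """dteday,season,temp,windspeed
2011-01-01,1,0.34,0.16
2011-01-02,1,0.36,0.25
2011-01-03,1,0.20,0.25
"""

# With no extra parameters, read_csv uses the first row as the header,
# splits on commas, and infers a dtype for every column.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 4)
print(df.dtypes)
```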


    • Select a specific column to load
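A sketch of loading only some columns with the usecols parameter, again reading made-up data from an in-memory string:

```python
import io
import pandas as pd

csv_text = """dteday,season,temp,windspeed
2011-01-01,1,0.34,0.16
2011-01-02,1,0.36,0.25
"""

# usecols keeps only the listed columns; the others are skipped during parsing,
# which saves memory on wide files.
df = pd.read_csv(io.StringIO(csv_text), usecols=["dteday", "temp"])
print(df.columns.tolist())  # ['dteday', 'temp']
```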


    • Converting date/time columns while loading
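A sketch of converting a date column during loading with parse_dates (the dteday column name and its values are illustrative):

```python
import io
import pandas as pd

csv_text = """dteday,season,temp
2011-01-01,1,0.34
2011-02-01,1,0.36
"""

# parse_dates turns the listed columns into datetime values while reading,
# so date accessors such as .dt.month work immediately.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["dteday"])
print(df["dteday"].dt.month.tolist())  # [1, 2]
```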


  • Loading in batches
Sometimes a CSV file is too large to load into memory at once. In that case, load it in batches (chunks) and analyze and process each batch in turn.
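A sketch of batch loading with the chunksize parameter, which makes read_csv return an iterator of smaller DataFrames. The data is synthetic (a hypothetical cnt column holding 0 to 9) so the result is easy to check; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Synthetic CSV: 10 rows with a hypothetical "cnt" column holding 0..9.
csv_text = "season,cnt\n" + "\n".join(f"{i % 4 + 1},{i}" for i in range(10))

total = 0
n_chunks = 0
# With chunksize, read_csv yields DataFrames of at most 4 rows each,
# so only one chunk needs to be in memory at a time.
with pd.read_csv(io.StringIO(csv_text), chunksize=4) as reader:
    for chunk in reader:
        total += chunk["cnt"].sum()  # process each batch, keep a running result
        n_chunks += 1

print(n_chunks, total)  # 3 45
```

Aggregating per chunk and combining the partial results, as here, works for sums and counts; statistics such as medians need a different strategy.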


2. Browsing DataFrame data
  • df.head(n): the first n rows. The default is 5.
  • df.tail(n): the last n rows. The default is 5.
  • df.sample(n): n randomly chosen rows. The default is 1 row.
  • df.shape: a tuple of (number of rows, number of columns)
  • df.describe(): summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns
  • df.info(): column data types, non-null counts, and memory usage
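A sketch of these browsing methods on a synthetic 20-row DataFrame (the column names are illustrative):

```python
import numpy as np
import pandas as pd

# Synthetic data: 20 rows, two numeric columns.
df = pd.DataFrame({
    "temp": np.linspace(0.1, 0.8, 20),
    "windspeed": np.linspace(0.05, 0.4, 20),
})

print(df.head())                     # first 5 rows
print(df.tail(3))                    # last 3 rows
print(df.sample(2, random_state=0))  # 2 random rows; random_state makes it repeatable
print(df.shape)                      # (20, 2)
print(df.describe())                 # count, mean, std, min, 25%, 50%, 75%, max
df.info()                            # prints dtypes, non-null counts, memory usage
```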
3. Adding columns to a DataFrame
Adding columns to a DataFrame is easy. Several methods are described below.
  • Simple method
Assign a value directly to a new column name; a single (scalar) value is repeated for every row.

df['new_column'] = 1

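  • Calculation method
Compute a new column from existing columns, for example the difference between two temperature columns (atemp is a placeholder name for the second column):
df['temp_diff'] = df['temp'] - df['atemp']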
  • Condition method
We can make a rough judgment of human comfort from the wind speed, using 0.3 m/s as the threshold for comfortable conditions.
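A sketch of the condition method using numpy.where. The direction of the comparison (at or below 0.3 counts as comfortable) and the yes/no labels are assumptions for illustration; the wind speed values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"windspeed": [0.16, 0.25, 0.31, 0.45]})

# Assumption: wind speed at or below 0.3 is "comfortable".
# np.where picks "yes" where the condition holds and "no" elsewhere, row by row.
df["comfortable"] = np.where(df["windspeed"] <= 0.3, "yes", "no")
print(df["comfortable"].tolist())  # ['yes', 'yes', 'no', 'no']
```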


  • Loop method
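A minimal sketch of the loop method, which converts a numeric season code into a season name. The 1 to 4 coding (spring, summer, fall, winter) is an assumption; adjust the mapping to your data.

```python
import pandas as pd

df = pd.DataFrame({"season": [1, 2, 3, 4, 1]})

# Assumed coding: 1 = spring, 2 = summer, 3 = fall, 4 = winter.
season_names = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}

# Loop over the column values, look up each code, and collect the names.
names = []
for code in df["season"]:
    names.append(season_names[code])
df["season_name"] = names

print(df["season_name"].tolist())  # ['spring', 'summer', 'fall', 'winter', 'spring']
```

The same mapping can be applied without an explicit loop as df['season'].map(season_names), which is usually shorter and faster.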
We convert the numeric season code into the name of the season.

4. Selecting specific cells
Selecting cells works much like selecting cells in Excel. Pandas makes this easy to do, although it is not as easy to understand as it looks. Pandas has offered three indexers for this: loc, iloc, and ix. ix was deprecated and has been removed since pandas 1.0, so use loc and iloc.
  • loc: selection by label
df.loc[start_row_label:end_row_label, [list of column names]]
  • iloc: selection by integer position
df.iloc[start_row_position:end_row_position, start_column_position:end_column_position]
  • Selecting rows
df.loc[[list of row labels]], df.iloc[[list of row positions]]
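A sketch of label-based and position-based selection on a small made-up DataFrame. String row labels are used so that labels and positions are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "season": [1, 1, 2, 2],
        "temp": [0.34, 0.36, 0.52, 0.61],
        "windspeed": [0.16, 0.25, 0.19, 0.22],
    },
    index=["a", "b", "c", "d"],  # string labels, distinct from positions 0..3
)

# loc: rows "a" through "c" (by label) and two named columns
by_label = df.loc["a":"c", ["temp", "windspeed"]]

# iloc: rows at positions 0 and 1, columns at positions 1 and 2
by_position = df.iloc[0:2, 1:3]

# Selecting specific rows with a list of labels or positions
rows_loc = df.loc[["a", "d"]]
rows_iloc = df.iloc[[0, 3]]
```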



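  • iloc slices include the start position and exclude the end position (a half-open interval).
  • loc slices include both the start label and the end label (a closed interval).
  • Selecting whole columns with loc or iloc looks much the same as df[...], but the return type depends on how the columns are given: a single column label (df['temp'], df.loc[:, 'temp']) returns a Series, while a list of labels (df[['temp']], df.loc[:, ['temp']]) returns a DataFrame.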
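A sketch that checks slice boundaries and return types on a made-up DataFrame with the default integer index. Note that the two indexers treat the slice end differently: iloc excludes it, loc includes it.

```python
import pandas as pd

df = pd.DataFrame({"temp": [0.1, 0.2, 0.3, 0.4]})  # default index: 0, 1, 2, 3

# iloc excludes the end position; loc includes the end label.
print(len(df.iloc[0:2]))  # 2 (positions 0 and 1)
print(len(df.loc[0:2]))   # 3 (labels 0, 1 and 2)

# A single column label returns a Series; a list of labels returns a DataFrame.
print(type(df["temp"]).__name__)           # Series
print(type(df[["temp"]]).__name__)         # DataFrame
print(type(df.loc[:, "temp"]).__name__)    # Series
print(type(df.loc[:, ["temp"]]).__name__)  # DataFrame
```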



Zhihu: Pandas function introduction (1)
