Using NumPy and Pandas in Python: a tutorial

Source: Internet
Author: User

Preface

This article introduces NumPy and Pandas in Python and is shared for your reference. Without further preamble, let's take a look at the details.

What are they?

NumPy is an extension library for Python. It supports large multi-dimensional arrays and matrix operations, and provides a large collection of mathematical functions for operating on arrays.

Pandas is a NumPy-based tool created for data analysis tasks. It incorporates a number of libraries and some standard data models, and provides the tools needed to work efficiently with large datasets. Pandas offers many functions and methods that let us process data quickly and conveniently.

List, Numpy, and Pandas

Numpy and List

Similarities:

  • Both let you access elements by subscript, such as a[0].
  • Both allow slice access, for example, a []
  • Both can be traversed with a for loop.

Differences:

  • In a NumPy array every element must have the same type, while a list can mix several types.
  • NumPy is more convenient to use and has many functions built in, such as mean, std, sum, min, and max.
  • A NumPy array can be multi-dimensional.
  • NumPy is implemented in C, so its operations run faster.
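
The differences above can be seen in a short sketch. The variable names here are only illustrative and do not come from the original article:

```python
import numpy as np

# A list can mix types; a NumPy array converts everything to one dtype.
mixed_list = [1, 2.5, 3]
arr = np.array(mixed_list)
print(arr.dtype)   # float64: the integers were converted to floats

# Reductions such as sum and max are available as array methods.
print(arr.sum(), arr.max())

# An array can have more than one dimension.
grid = np.array([[1, 2], [3, 4]])
print(grid.shape)  # (2, 2)
```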

Pandas and Numpy

Similarities:

  • Both let you access elements by subscript or slice.
  • Both can be traversed with a for loop.
  • Both have many convenient functions, such as mean, std, sum, min, and max.
  • Both support vector operations.
  • Both are implemented in C and are therefore fast.

Differences: Pandas has methods that NumPy lacks, such as the describe function. The main difference is that a NumPy array is like an enhanced list, while a Pandas object is like a combination of a list and a dictionary, because Pandas has indexes.
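
As a rough illustration of this comparison, the sketch below builds the same data as a NumPy array and as a Pandas Series. The names arr and ser are assumptions made for the example:

```python
import numpy as np
import pandas as pd

values = [1, 2, 3]
arr = np.array(values)
ser = pd.Series(values, index=['a', 'b', 'c'])

# Both support the same reductions.
print(arr.mean(), ser.mean())

# describe() exists on the Series but not on the NumPy array.
print(ser.describe())
print(hasattr(arr, 'describe'))

# Like a dictionary, a Series can be looked up by label.
print(ser['b'])
```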

Numpy usage

1. Basic operations

import numpy as np

# Create a NumPy array
p1 = np.array([1, 2, 3])
print(p1)
print(p1.dtype)
[1 2 3]
int64
# Calculate the mean
print(p1.mean())
2.0
# Calculate the standard deviation
print(p1.std())
0.816496580928
# Sum, maximum, and minimum
print(p1.sum())
print(p1.max())
print(p1.min())
6
3
1
# Find the position of the maximum value
print(p1.argmax())
2

2. Vector operations

p1 = np.array([1, 2, 3])
p2 = np.array([2, 5, 7])
# Add two vectors, element by element
print(p1 + p2)
[ 3  7 10]
# Multiply a vector by a constant
print(p1 * 2)
[2 4 6]
# Subtract two vectors
print(p1 - p2)
[-1 -3 -4]
# Multiply two vectors; the operation is performed element by element
print(p1 * p2)
[ 2 10 21]
# Compare a vector with a constant
print(p1 > 2)
[False False  True]

3. Index arrays

First, take a look at the following figure to get the idea.

Then, let's take a look at the code implementation.

a = np.array([1, 2, 3, 4, 5])
print(a)
[1 2 3 4 5]
b = a > 2
print(b)
[False False  True  True  True]
print(a[b])
[3 4 5]

In a[b], only the elements of a at the positions where b is True are retained.
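
A few common variations on this pattern are sketched below. The combined condition and the assignment through a mask are additions for illustration and are not part of the original example:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# The mask can be written inline, without a separate variable.
print(a[a > 2])

# Masks combine with & (and) and | (or); the parentheses are required.
print(a[(a > 1) & (a < 5)])

# A mask can also select the positions to assign to.
c = a.copy()
c[c > 3] = 0
print(c)
```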

4. In-place and out-of-place operations

Let's first look at a group of operations:

a = np.array([1, 2, 3, 4])
b = a
a += np.array([1, 1, 1, 1])
print(b)
[2 3 4 5]
a = np.array([1, 2, 3, 4])
b = a
a = a + np.array([1, 1, 1, 1])
print(b)
[1 2 3 4]

From the results above, we can see that += changes the original array, but + does not. This is because:

  • +=: The calculation is done in place. No new array is created, and the elements of the original array are modified.
  • +: The calculation is not done in place. A new array is created, and the elements of the original array are left unmodified.
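
One way to check this behaviour is to test whether the two names still refer to the same object. This is only a sketch; the flag names are made up for the example:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = a                 # b is another name for the same array
a += 1                # in place: the shared array is modified
same_after_iadd = b is a

a = a + 1             # out of place: a now names a new array
same_after_add = b is a

print(same_after_iadd, same_after_add)
print(a, b)
```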

5. Numpy slice and List slice

l1 = [1, 2, 3, 5]
l2 = l1[0:2]
l2[0] = 5
print(l2)
print(l1)
[5, 2]
[1, 2, 3, 5]
p1 = np.array([1, 2, 3, 5])
p2 = p1[0:2]
p2[0] = 5
print(p1)
print(p2)
[5 2 3 5]
[5 2]

As we can see above, changing an element in a slice of a list does not affect the original list, whereas changing an element in a slice of a NumPy array also changes the original array. This is because NumPy slicing does not create a new array: the slice refers to the same data, so modifying the slice also modifies the original array. This mechanism makes NumPy slicing faster than slicing a native list, but you must keep it in mind when programming.
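
If an independent slice is needed, one option is to copy it explicitly. The sketch below uses np.shares_memory to show the difference; the names view and copy are illustrative:

```python
import numpy as np

p1 = np.array([1, 2, 3, 5])

view = p1[0:2]          # shares memory with p1
copy = p1[0:2].copy()   # independent data

copy[0] = 99            # does not touch p1
print(np.shares_memory(p1, view), np.shares_memory(p1, copy))
print(p1)
```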

6. Operations on two-dimensional arrays

p1 = np.array([[1, 2, 3], [7, 8, 9], [2, 4, 5]])
# Get one row as a one-dimensional array
print(p1[0])
[1 2 3]
# Get a single element; both p1[0, 1] and p1[0][1] work
print(p1[0, 1])
print(p1[0][1])
2
2
# sum() adds up all elements by default; with axis=0 it sums each column
print(p1.sum())
print(p1.sum(axis=0))
41
[10 14 17]

When the axis parameter is set to 0, the result is calculated for each column and returned as a one-dimensional array. When it is set to 1, the result is calculated for each row and returned as a one-dimensional array. For two-dimensional arrays, many NumPy functions accept the axis parameter.

# Get the result for each column
print(p1.sum(axis=0))
[10 14 17]
# Get the result for each row
print(p1.sum(axis=1))
[ 6 24 11]
# The mean function also accepts axis
print(p1.mean(axis=0))
[ 3.33333333  4.66666667  5.66666667]
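
The same axis convention applies to other reductions. As a small sketch using max on the same array (the names col_max and row_max are assumptions for the example):

```python
import numpy as np

p1 = np.array([[1, 2, 3], [7, 8, 9], [2, 4, 5]])

col_max = p1.max(axis=0)   # one value per column
row_max = p1.max(axis=1)   # one value per row
print(col_max)
print(row_max)
print(p1.shape, col_max.shape)
```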

Pandas usage

Pandas has two main structures: Series and DataFrame. A Series supports the NumPy functions shown above and can be regarded as a simple one-dimensional array, while a DataFrame is a two-dimensional data structure that combines multiple Series by column, so each column is a Series.
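
The relationship between the two structures can be sketched as follows. The data here is illustrative and simply reuses names that appear later in the article:

```python
import pandas as pd

name = pd.Series(['Jack', 'Lucy', 'Coke'])
age = pd.Series([18, 19, 21])

# Combine the two Series by column.
df = pd.DataFrame({'name': name, 'age': age})

# Each column taken back out is a Series again.
print(type(df['age']))
print(df.shape)
```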

Below we mainly go through the features that NumPy does not have:

1. Simple basic use

import pandas as pd

pd1 = pd.Series([1, 2, 3])
print(pd1)
0    1
1    2
2    3
dtype: int64
# Sum and standard deviation
print(pd1.sum())
print(pd1.std())
6
1.0

2. Index

(1) Index in Series

p1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(p1)
a    1
b    2
c    3
dtype: int64
print(p1['a'])
1

(2) Index in DataFrame

p1 = pd.DataFrame({'name': ['Jack', 'Lucy', 'Coke'], 'age': [18, 19, 21]})
print(p1)
   age  name
0   18  Jack
1   19  Lucy
2   21  Coke
# Get the name column
print(p1['name'])
0    Jack
1    Lucy
2    Coke
Name: name, dtype: object
# Get the first value in the name column
print(p1['name'][0])
Jack
# p1[0] cannot be used to get the first row; use iloc instead
print(p1.iloc[0])
age       18
name    Jack
Name: 0, dtype: object

Summary:

  • To get a column, index with the column name, as in p1['name'].
  • To get a row, use p1.iloc[0].
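
When the rows have labels of their own, pandas also provides loc for label-based access alongside the position-based iloc. The sketch below assumes hypothetical row labels 'x', 'y', and 'z' that are not in the original example:

```python
import pandas as pd

p1 = pd.DataFrame(
    {'name': ['Jack', 'Lucy', 'Coke'], 'age': [18, 19, 21]},
    index=['x', 'y', 'z'],   # hypothetical row labels
)

first_by_position = p1.iloc[0]   # row at position 0
first_by_label = p1.loc['x']     # row labelled 'x'
print(first_by_position['name'], first_by_label['age'])

# A single cell: row label first, then column name.
print(p1.loc['y', 'age'])
```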

3. apply

apply runs a function on the elements of a Pandas object. When the library has no built-in method for what you need, you can wrap the logic in your own function and pass it to apply.

def func(value):
    return value * 3

pd1 = pd.Series([1, 2, 5])
print(pd1.apply(func))
0     3
1     6
2    15
dtype: int64

It can also be used on a DataFrame, where the function receives each column as a Series:

pd2 = pd.DataFrame({'name': ['Jack', 'Lucy', 'Coke'], 'age': [18, 19, 21]})
print(pd2.apply(func))
   age          name
0   54  JackJackJack
1   57  LucyLucyLucy
2   63  CokeCokeCoke
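
For simple arithmetic like this, a plain vector operation gives the same result, so apply is mostly worth reaching for when no built-in operation fits. The sketch below illustrates both cases; the label function is a hypothetical rule invented for the example:

```python
import pandas as pd

def func(value):
    return value * 3

pd1 = pd.Series([1, 2, 5])

# For simple arithmetic the vectorized form gives the same result.
print(pd1.apply(func).equals(pd1 * 3))

# apply is more useful for logic with no built-in equivalent,
# for example a hypothetical labelling rule:
def label(value):
    return 'big' if value > 1 else 'small'

print(pd1.apply(label).tolist())
```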

4. The axis parameter

In Pandas, axis can also be set by name, which differs from NumPy:

  • When axis is set to 'columns', the value of each row is calculated.
  • When axis is set to 'index', the value of each column is calculated.

pd2 = pd.DataFrame({'weight': [120, 130, 150], 'age': [18, 19, 21]})
# Calculate the value of each row
print(pd2.sum(axis='columns'))
0    138
1    149
2    171
dtype: int64
# Calculate the value of each column
print(pd2.sum(axis='index'))
age        58
weight    400
dtype: int64
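
The named axes correspond to the numeric ones used by NumPy. A small sketch of that correspondence, with illustrative variable names:

```python
import pandas as pd

pd2 = pd.DataFrame({'weight': [120, 130, 150], 'age': [18, 19, 21]})

per_row = pd2.sum(axis='columns')   # same as axis=1
per_col = pd2.sum(axis='index')     # same as axis=0

print(per_row.equals(pd2.sum(axis=1)))
print(per_col.equals(pd2.sum(axis=0)))
print(per_col['weight'], per_col['age'])
```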

5. Grouping

pd2 = pd.DataFrame({'name': ['jack', 'Lucy', 'coke', 'Pol', 'tude'], 'age': [18, 19, 21, 21, 19]})
# Group by age
print(pd2.groupby('age').groups)
{18: Int64Index([0], dtype='int64'), 19: Int64Index([1, 4], dtype='int64'), 21: Int64Index([2, 3], dtype='int64')}
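
Once the rows are grouped, each group can be summarized. The sketch below counts the rows per age and collects the names in each group; these two summaries are illustrative choices, not something the original article covers:

```python
import pandas as pd

pd2 = pd.DataFrame({
    'name': ['jack', 'Lucy', 'coke', 'Pol', 'tude'],
    'age': [18, 19, 21, 21, 19],
})

grouped = pd2.groupby('age')

# Number of rows in each age group.
sizes = grouped.size()
print(sizes)

# Names collected per group.
names_by_age = grouped['name'].apply(list)
print(names_by_age.loc[19])
```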

6. Vector operations

Note that when two Series with indexes are added, the values with the same index label are added together, regardless of their position.

pd1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
pd2 = pd.Series([1, 2, 3], index=['a', 'c', 'd'])
print(pd1 + pd2)
a    2.0
b    NaN
c    5.0
d    NaN
dtype: float64

NaN values appear for the labels that exist in only one of the two Series. If we do not want NaN to appear, we can use the add function and set the fill_value parameter.

print(pd1.add(pd2, fill_value=0))
a    2.0
b    2.0
c    5.0
d    3.0
dtype: float64

The same approach can be applied to a Pandas DataFrame, but note that both the rows and the columns must correspond to each other.
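
A minimal sketch of the DataFrame case follows. The frames df1 and df2 and their column 'x' are hypothetical and chosen only to show the alignment on row labels:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'x': [10, 20]}, index=['b', 'c'])

plain = df1 + df2                    # NaN where a label is missing on one side
filled = df1.add(df2, fill_value=0)  # the missing side is treated as 0

print(plain)
print(filled)
```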

Summary

This week, I took the basic analysis courses on the UDA campus, which use NumPy and Pandas. I had used NumPy with TensorFlow before, but found it hard to understand. After this study, it is much easier to understand.

That is all for this article. I hope it is of some help for your study or work.
