Chapter 4 NumPy Basics: Arrays and Vectorized Computation
To be honest, the main purpose of using NumPy is to apply vectorized operations. NumPy itself does not offer much advanced data analysis functionality, but understanding NumPy and array-oriented computing makes it easier to understand pandas, which is built on top of it. According to the textbook, the author focuses mainly on:
- Fast vectorized operations for data munging and cleaning, subsetting and filtering, transformation, and more
- Common array algorithms such as sorting, uniqueness, and set operations
- Efficient descriptive statistics and data aggregation/summarization
- Data alignment and relational data operations for merging and joining heterogeneous datasets
- Expressing conditional logic as array expressions (rather than loops with if-elif-else branches)
- Group-wise data operations (aggregation, transformation, function application, etc.)
The author says pandas may be the better choice for most of this, and I agree: pandas is clearly more advanced, its functions are really convenient, and the DataFrame is the best data structure. Still, the NumPy functions are the foundation, and you need to be familiar with them.
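As a rough illustration of a few items on the list above (vectorized arithmetic, conditional logic as an array expression, and uniqueness), here is a minimal sketch. The array names and values are made up for the example; `np.where` and `np.unique` are standard NumPy functions.

```python
import numpy as np

# Hypothetical sample data, invented for this illustration.
values = np.array([1.5, -2.0, 3.0, -0.5])

# Vectorized arithmetic: the multiplication is applied to every element,
# with no explicit Python loop.
doubled = values * 2

# Conditional logic as an array expression instead of an if/else loop:
# keep the positive entries and replace everything else with 0.0.
clipped = np.where(values > 0, values, 0.0)

# Uniqueness: np.unique returns the distinct values in sorted order.
labels = np.array([3, 1, 2, 3, 1])
unique_sorted = np.unique(labels)

print(doubled)
print(clipped)
print(unique_sorted)
```

The `np.where` call does the job that a loop with an if/else branch would otherwise do element by element.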
The NumPy ndarray: A Multidimensional Array Object
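The ndarray object is the most important object in NumPy, and its defining feature is vectorization. Every element of an ndarray must have the same data type, and every array has two attributes: shape (a tuple giving the size of each dimension) and dtype (an object describing the data type of the elements).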
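To make these two attributes concrete, here is a small sketch. The arrays are invented for illustration only.

```python
import numpy as np

# A 2 x 3 array of floats (illustrative values).
data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(data.shape)  # tuple with the size of each dimension
print(data.dtype)  # the single data type shared by all elements

# Because all elements must share one type, mixing an int and a float
# makes NumPy upcast the whole array to a floating-point dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```

Here `data.shape` is `(2, 3)` and both arrays end up with dtype `float64`.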
"Data analysis using Python" reading notes--fourth NumPy basics: arrays and Vector computing