Using Python for Data Analysis (1): Brief Introduction


I. Basic data processing content
Data analysis refers to the process of controlling, processing, organizing, and analyzing data. Here, "data" means structured data, such as records, multi-dimensional arrays, Excel spreadsheets, tables in relational databases, and other tabular data.

II. About the Python language
Python is one of the most popular dynamic programming languages, alongside Perl and Ruby. In recent years it has become especially popular for building websites, for example with the Django web framework. Python is often called a scripting language because it can be used to write short, quick-and-dirty programs, that is, scripts. That label can suggest that Python is unsuited to building rigorous software. In fact, after years of continuous improvement, Python not only has powerful data processing capabilities but can also be used to build production systems.
However, because Python is an interpreted language, most Python code runs much slower than code written in a compiled language such as C++ or Java. In applications that require very low latency, it can therefore be worthwhile to use a lower-level, less productive language such as C++ to maximize performance. Python is also not an ideal language for highly concurrent, multi-threaded applications, because of the GIL (Global Interpreter Lock), a mechanism that prevents the interpreter from executing more than one Python bytecode instruction at a time. This does not mean that Python cannot run truly parallel multi-threaded code; it means that such code cannot run within a single Python process.
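The effect of the GIL can be explored with a small experiment. The sketch below is only an illustration: count_primes and run_in_pool are hypothetical helper names, not part of any library, and the workload is deliberately CPU-bound so that threads have to take turns holding the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_pool(executor_cls, limits, workers=4):
    """Map count_primes over `limits` using the given executor class."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # All threads live in one process and share one interpreter, so only
    # one of them executes Python bytecode at any given moment.
    print(run_in_pool(ThreadPoolExecutor, [1000, 2000, 3000]))
```

Passing concurrent.futures.ProcessPoolExecutor instead of ThreadPoolExecutor runs the same function in separate processes, each with its own interpreter and its own GIL. On platforms that start worker processes by spawning, that call should stay under the `if __name__ == "__main__":` guard.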

III. Python libraries related to data analysis
NumPy is the fundamental package for scientific computing in Python. It provides:

  • A fast and efficient multi-dimensional array object, ndarray;
  • Functions that perform mathematical operations on whole arrays and element-wise computations on arrays;
  • Linear algebra and random number generation;
  • Tools for integrating C, C++, and Fortran code with Python.
It is designed for rigorous numerical processing. It is widely used at large financial companies and at core scientific computing organizations such as Lawrence Livermore; NASA uses it for tasks originally handled in C++, Fortran, or Matlab.
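As a rough illustration of the NumPy features listed above, the following sketch builds a small ndarray and applies whole-array arithmetic, an element-wise function, a matrix product, and random number generation. The values are made up for the example, and np.random.default_rng assumes a reasonably recent NumPy release.

```python
import numpy as np

# A 2 x 3 ndarray of floats (example values only).
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Arithmetic applies to the whole array at once, with no Python loop.
scaled = data * 10

# Element-wise function: square root of every entry.
roots = np.sqrt(data)

# Linear algebra: (2 x 3) times (3 x 2) gives a 2 x 2 matrix.
gram = data @ data.T

# Random number generation with a fixed seed for repeatability.
rng = np.random.default_rng(0)
noise = rng.standard_normal(size=data.shape)

print(scaled)
print(gram)
```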
pandas provides a large number of data structures and functions for processing structured data quickly and conveniently.
Matplotlib is the most popular Python library for drawing data charts.
IPython is an enhanced Python shell and an integral part of the standard Python scientific computing toolset. It aims to speed up writing, testing, and debugging Python code. It is mainly used for interactive data processing and for visualizing data with matplotlib.
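A minimal sketch of what working with structured data in pandas can look like is shown here. The table contents and column names (city, year, sales) are invented for the example.

```python
import pandas as pd

# A small table of made-up records.
df = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Beijing", "Shanghai"],
    "year": [2012, 2012, 2013, 2013],
    "sales": [10.0, 20.0, 15.0, 25.0],
})

# Select rows with a boolean condition.
recent = df[df["year"] == 2013]

# Group rows by a column and aggregate another column.
totals = df.groupby("city")["sales"].sum()

print(recent)
print(totals)
```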

SciPy is a collection of packages that specifically address various standard problem domains in scientific computing. It mainly includes the following packages:

  • scipy.integrate: numerical integration routines and differential equation solvers;
  • scipy.linalg: linear algebra routines and matrix factorization functions that extend those provided by numpy.linalg;
  • scipy.optimize: function optimizers and root-finding algorithms;
  • scipy.signal: signal processing tools;
  • scipy.sparse: sparse matrices and sparse linear system solvers;
  • scipy.special: a wrapper around SPECFUN, a Fortran library that implements many common mathematical functions;
  • scipy.stats: standard continuous and discrete probability distributions, various statistical tests, and descriptive statistics;
  • scipy.weave: a tool that uses inline C++ code to accelerate array computations (removed from later SciPy releases).
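To give a feel for a few of the subpackages listed above, here is a short sketch that uses scipy.integrate, scipy.optimize, and scipy.stats. The specific problems (integrating sin, finding the square root of 2, evaluating a normal CDF) are chosen only for illustration.

```python
import numpy as np
from scipy import integrate, optimize, stats

# scipy.integrate: integral of sin(x) from 0 to pi (exact value is 2).
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# scipy.optimize: root of x**2 - 2 on [0, 2], i.e. the square root of 2.
root = optimize.brentq(lambda x: x ** 2 - 2.0, 0.0, 2.0)

# scipy.stats: standard normal CDF at 0 (half the probability mass).
p = stats.norm.cdf(0.0)

print(area, root, p)
```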

IV. Environment installation and configuration
The installation procedure on Mac OS X begins by starting Terminal.
To check whether the installation succeeded, start IPython, import pandas, and enter plot(arange(100)). If a plot window containing a straight line is displayed, the installation is successful.
The next article in this series is "Using Python for data analysis (2)", which processes some JSON data and generates a bar chart. If you are interested, please follow this blog; comments and discussion are welcome.
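The installation check can also be scripted without opening a plot window. The sketch below is one possible approach, not the procedure from the original article: installed_versions is a hypothetical helper that tries to import each library and reports its version. Note that typing plot(arange(100)) without any imports only works when IPython is running in pylab mode, which places those names in the interactive namespace.

```python
import importlib


def installed_versions(names):
    """Return {package name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    wanted = ["numpy", "pandas", "matplotlib", "scipy", "IPython"]
    for name, version in installed_versions(wanted).items():
        print(name, version or "NOT INSTALLED")
```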
