I. The basics of data analysis
Data analysis refers to the process of managing, processing, organizing, and analyzing data. Here, "data" means structured data, such as records, multidimensional arrays, Excel spreadsheets, tables in relational databases, and so on.
II. About the Python language
Python is now one of the most popular dynamic programming languages (alongside Perl, Ruby, and others). In recent years it has become very popular for building websites, for example with the Django web framework. Languages like Python are often called scripting languages because they are well suited to writing short, quick-and-dirty programs, that is, scripts. This can give the impression that Python cannot be used to build rigorous software. In fact, after years of continuous improvement, Python not only has powerful data processing capabilities but can also be used to build production systems.
However, because Python is an interpreted language, most Python code runs much slower than code written in compiled languages such as C++ and Java. So in applications that require very low latency, it can be worth using a lower-level, less productive language like C++ to squeeze out as much performance as possible.
Python is also not an ideal language for highly concurrent, multithreaded applications, because of the GIL (Global Interpreter Lock), a mechanism that prevents the interpreter from executing more than one Python bytecode instruction at a time. This does not mean that Python cannot run truly parallel multithreaded code; it means that such code cannot run within a single Python process.
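As a rough illustration of the GIL point, here is a minimal sketch that runs the same CPU-bound, pure-Python function first sequentially and then in a thread pool. The function name sum_of_squares and the workload sizes are made up for this example. On a standard CPython build with the GIL, the threaded version gives the same results but is usually not noticeably faster, because only one thread executes Python bytecode at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(n):
    """CPU-bound pure-Python loop: sum of i*i for i in range(n)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(sizes):
    # Plain loop, one call after another in the main thread.
    return [sum_of_squares(n) for n in sizes]


def run_threaded(sizes):
    # These are real OS threads, but on a standard CPython build the GIL
    # lets only one of them execute Python bytecode at any moment.
    with ThreadPoolExecutor(max_workers=len(sizes)) as pool:
        return list(pool.map(sum_of_squares, sizes))


sizes = [200_000] * 4  # hypothetical workload, chosen only for illustration
sequential_results = run_sequential(sizes)
threaded_results = run_threaded(sizes)

# Both approaches compute the same values; only the scheduling differs.
print(sequential_results == threaded_results)
```

To spread work like this across several CPU cores, the usual approach is to use several Python processes (for example with the standard multiprocessing module), which matches the statement that parallel execution is not possible within a single process.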
III. Python libraries for data analysis
NumPy
NumPy is the foundational package for scientific computing in Python. It provides:
- ndarray, a fast and efficient multidimensional array object;
- functions for performing element-wise computations on arrays and mathematical operations between arrays;
- linear algebra operations and random number generation;
- tools for integrating C, C++, and Fortran code into Python, and more.
NumPy was created for rigorous numerical processing. Many large financial companies use it, as do core scientific computing organizations such as Lawrence Livermore and NASA, for tasks that were originally handled with C++, Fortran, or MATLAB.
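The NumPy features listed above can be sketched in a few lines. This is only a small illustration with made-up values, assuming a reasonably recent NumPy (np.random.default_rng requires version 1.17 or later).

```python
import numpy as np

# ndarray: a 2 x 3 array holding the values 0..5
a = np.arange(6, dtype=float).reshape(2, 3)
b = np.ones((2, 3))

# Element-wise computation on one array, and arithmetic between two arrays
elementwise = np.sqrt(a) + a * 2
between = a + b

# Linear algebra: solve the system 2x + y = 3, x + 3y = 5
m = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(m, rhs)

# Random number generation: 1000 draws from a standard normal distribution
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

print(between)
print(solution)
print(samples.mean(), samples.std())
```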
pandas
pandas provides a large number of data structures and functions for working with structured data quickly and easily.
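A minimal sketch of what working with structured data in pandas can look like. The table contents (cities, years, sales figures) are invented for this example.

```python
import pandas as pd

# A small table of structured data (all values are made up)
df = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Beijing", "Shanghai"],
    "year": [2012, 2012, 2013, 2013],
    "sales": [10.0, 12.5, 11.0, 14.5],
})

# Filter rows with a boolean condition
recent = df[df["year"] == 2013]

# Group by a column and aggregate another one
mean_by_city = df.groupby("city")["sales"].mean()

print(recent)
print(mean_by_city)
```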
matplotlib
matplotlib is the most popular Python library for plotting data.
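As a small illustration, the sketch below draws a bar chart and renders it to PNG bytes in memory. The categories and counts are made up, and the non-interactive "Agg" backend is selected only so the example also runs on a machine without a display; in an interactive session you would normally call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless use
import matplotlib.pyplot as plt

categories = ["a", "b", "c"]  # made-up data
counts = [3, 7, 5]

fig, ax = plt.subplots()
bars = ax.bar(categories, counts)
ax.set_xlabel("category")
ax.set_ylabel("count")
ax.set_title("Example bar chart")

# Render the figure as PNG into an in-memory buffer instead of a window
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)

print(len(png_bytes), "bytes of PNG data")
```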
IPython
IPython is part of the standard Python scientific computing toolset. It is an enhanced Python shell designed to speed up writing, testing, and debugging Python code. It is mainly used for interactive data processing and for visualizing data with matplotlib.
SciPy
SciPy is a collection of packages addressing a number of standard problem domains in scientific computing. It mainly includes the following packages:
- scipy.integrate: numerical integration routines and differential equation solvers;
- scipy.linalg: linear algebra routines and matrix decompositions extending those provided in numpy.linalg;
- scipy.optimize: function optimizers and root-finding algorithms;
- scipy.signal: signal processing tools;
- scipy.sparse: sparse matrices and sparse linear system solvers;
- scipy.special: a wrapper around SPECFUN, a Fortran library that implements many commonly used mathematical functions;
- scipy.stats: standard continuous and discrete probability distributions, various statistical tests, and more descriptive statistics;
- scipy.weave: a tool for accelerating array computations with inline C++ code (removed from later SciPy releases).
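To give a feel for a few of these subpackages, here is a short sketch with one call from each of scipy.integrate, scipy.optimize, scipy.special, scipy.stats, and scipy.linalg. The functions being integrated and solved are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import integrate, linalg, optimize, special, stats

# scipy.integrate: integrate x^2 from 0 to 1 (exact value is 1/3)
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# scipy.optimize: find the root of x^2 - 2 on [0, 2] (that is, sqrt(2))
root = optimize.brentq(lambda x: x ** 2 - 2.0, 0.0, 2.0)

# scipy.special: gamma function, gamma(5) = 4! = 24
gamma_5 = special.gamma(5.0)

# scipy.stats: probability that a standard normal variable is below 0
p_below_mean = stats.norm.cdf(0.0)

# scipy.linalg: LU decomposition, so that matrix = P @ L @ U
matrix = np.array([[4.0, 3.0], [6.0, 3.0]])
perm, lower, upper = linalg.lu(matrix)

print(area, root, gamma_5, p_below_mean)
```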
IV. Environment installation and configuration
This is very simple. Take installation on Mac OS X as an example:
- First, install Xcode, which provides the GCC C and C++ compilers.
- Download and install Enthought Canopy (https://store.enthought.com/downloads/).
Enthought Canopy is a Python distribution for scientific computing that includes libraries such as NumPy, SciPy, pandas, matplotlib, and IPython.
To check whether the installation succeeded, open Terminal, start IPython in pylab mode (for example with ipython --pylab, so that plot and arange are available), import pandas, and enter plot(arange(100)). If a plot window showing a straight line pops up, the installation was successful.
(Screenshot: a plot window containing a straight line.)
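The same check can be scripted outside IPython. The sketch below tries to import each library and reports its version. The helper name check_packages is made up for this example, and the package list simply mirrors the libraries mentioned above.

```python
import importlib


def check_packages(names):
    """Return a dict mapping each package name to its version string,
    or to None if the package cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown"
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


report = check_packages(["numpy", "scipy", "pandas", "matplotlib", "IPython"])
for name, version in report.items():
    print(name, "missing" if version is None else version)
```

Any package reported as missing was not installed correctly, or belongs to a different Python installation than the one running the script.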
This post is the first in a planned series. The next one, Data analysis using Python (ii), will process some JSON data and generate a bar chart. Interested readers are welcome to follow this blog and to leave comments for discussion.
Data analysis using Python (i) Brief introduction