Chapter 1: Preparation
1.3 Important Python libraries
NumPy: The foundational package for scientific computing in Python. Most of this book builds on NumPy and on the libraries built on top of it. It provides:
-A fast and efficient multidimensional array object, ndarray.
-Functions for performing element-wise computations on arrays and mathematical operations between arrays.
-Tools for reading and writing array-based datasets to disk.
-Linear algebra operations, Fourier transforms, and random number generation.
-A mature C API that lets Python extensions and native C, C++, or Fortran code access NumPy's data structures and computational facilities.
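As a small illustration of the element-wise computation described in the list above, here is a minimal sketch. It assumes NumPy is installed; the array values and variable names are made up for the example.

```python
import numpy as np

# A small 2x3 ndarray; the values are arbitrary illustration data
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

scaled = arr * 10            # every element is multiplied by 10
doubled = arr + arr          # same-shaped arrays are added element by element
col_sums = arr.sum(axis=0)   # sum down each column

print(arr.shape, arr.dtype)
print(scaled)
print(doubled)
print(col_sums)
```

No explicit Python loop is needed: the arithmetic is applied to the whole array at once.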
pandas: Provides a large number of data structures and functions for working with structured data quickly and easily. The pandas object used most in this book is the DataFrame, a column-oriented two-dimensional table structure; the other is the Series, a one-dimensional labeled array object. pandas combines the high-performance array computing of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases such as SQL. It provides sophisticated indexing that makes it easy to reshape, slice and dice, aggregate, and select subsets of data. Because data manipulation, preparation, and cleaning are among the most important skills in data analysis, pandas is the focus of this book.
-Purpose: a tool for financial and business analysis.
-Labeled data structures that support automatic or explicit data alignment, which prevents common errors caused by misaligned data and by working with differently indexed data from different sources.
-Integrated time series functionality.
-The same data structures handle both time series and non-time series data.
-Arithmetic operations and reductions that preserve metadata.
-Flexible handling of missing data.
-Merge and other relational operations found in popular databases (SQL-based ones, for example).
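The automatic data alignment mentioned in the list above can be sketched as follows. This is a minimal example assuming pandas is installed; the Series contents and labels are invented for illustration.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (illustrative data)
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition matches values by label, not by position.
# Labels present in only one Series produce NaN (missing data).
total = s1 + s2
print(total)

# To treat a missing label as 0 instead, use the add method with fill_value
filled = s1.add(s2, fill_value=0)
print(filled)
```

Here "b" and "c" are added label by label, while "a" and "d" come out as missing values in total because each appears in only one input.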
matplotlib: The most popular Python library for producing plots and other two-dimensional data visualizations. It is well suited to creating charts for use in publications.
SciPy: A collection of packages addressing a number of standard problem domains in scientific computing, including:
-scipy.integrate: Numerical integration routines and differential equation solvers.
-scipy.linalg: Linear algebra routines and matrix decompositions extending those provided in numpy.linalg.
-scipy.optimize: Function optimizers (minimizers) and root-finding algorithms.
-scipy.signal: Signal processing tools.
-scipy.sparse: Sparse matrices and sparse linear system solvers.
-scipy.stats: Standard continuous and discrete probability distributions (density functions, samplers, cumulative distribution functions), various statistical tests, and more descriptive statistics.
NumPy and SciPy together form a reasonably complete and mature computing platform for traditional scientific computing problems.
scikit-learn: A general-purpose machine learning toolkit for Python. Its submodules include:
-Classification: SVM, nearest neighbors, random forest, logistic regression, etc.
-Regression: Lasso, ridge regression, etc.
-Clustering: k-means, spectral clustering, etc.
-Dimensionality reduction: PCA, feature selection, matrix factorization, etc.
-Model selection: grid search, cross-validation, metrics.
-Preprocessing: feature extraction, normalization.
statsmodels: A statistical analysis package containing algorithms for classical statistics and econometrics, with the following submodules:
-Regression models: linear regression, generalized linear models, robust linear models, linear mixed effects models, and so on.
-Analysis of variance (ANOVA).
-Time series analysis: AR, ARMA, ARIMA, VAR, and other models.
-Nonparametric methods: kernel density estimation, kernel regression.
-Visualization of statistical model results.
Chapter 2: Python language basics, IPython, and Jupyter
Introspection: Putting a question mark (?) before or after a variable displays general information about the object:
b = [1,2,3]
b?
-Using ?? also shows the source code of a function, if possible.
-Another use of ? is searching the IPython namespace. Characters combined with the wildcard (*) show all names that match the expression.
%run command: Runs any file as a Python program inside the IPython session. Suppose you have a Python file, shili.py; you can run it as follows:
%run shili.py
-The script runs in an empty namespace, so the behavior is the same as running python script.py on the command line. All variables defined in the file (imports, functions, and globals) are then accessible in the IPython shell, up to the point where an exception, if any, is raised.
* Note: To give a script access to variables already defined in IPython, use %run -i. In Jupyter, you can also use %load, which imports the script into a code cell.
-Interrupting running code: Press Ctrl-C.
Executing code from the clipboard:
%paste   # runs the code in the clipboard directly
%cpaste  # similar, but gives you a prompt for pasting the code
Keyboard shortcuts
Magic commands: IPython's special commands are called magic commands. They make common tasks faster and make it easier to control the behavior of the IPython system. A magic command is any command prefixed with the percent symbol %. For example, you can use %timeit to measure the execution time of any Python statement. Magic commands can be thought of as command-line programs run within IPython; many have command-line options, which can be viewed with ?. By default, magic functions can be used without the % prefix as long as no variable is defined with the same name as the magic function. This feature is called automagic and can be turned on or off with %automagic. Some magic functions behave like Python functions, and their output can be assigned to a variable.
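Outside IPython, a rough plain-Python counterpart to %timeit is the standard library's timeit module. The sketch below is only an approximation of what the magic command reports; the timed statement and the repeat count are arbitrary choices for illustration.

```python
import timeit

# Time 1,000 executions of a small list comprehension.
# timeit.timeit returns the total elapsed time in seconds as a float.
elapsed = timeit.timeit("[x * x for x in range(100)]", number=1000)

print(f"1000 runs took {elapsed:.4f} seconds")
print(f"about {elapsed / 1000 * 1e6:.2f} microseconds per run")
```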
Some commonly used IPython magic commands:
Matplotlib integration: One reason IPython is popular in analytical computing is that it integrates well with data visualization and other user interface libraries such as matplotlib.
-In the IPython shell, running %matplotlib sets up the integration so that you can create multiple plot windows without interfering with the console session:
%matplotlib
Using matplotlib backend: Qt4Agg
-The command is different in Jupyter:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.random.randn(50).cumsum())
Mutable and immutable objects: Most objects in Python, such as lists, dictionaries, NumPy arrays, and most user-defined types (classes), are mutable. Others, such as strings and tuples, are immutable.
Bytes and Unicode: If you know the character encoding, you can convert between bytes and Unicode strings. For example, start with a Unicode string:
val = "dhfhfff"
val
-You can encode this Unicode string as UTF-8 bytes with the encode method:
val_utf8 = val.encode('utf8')
-If you know the encoding of a bytes object, you can convert it back with the decode method:
val_utf8.decode('utf8')
-Bytes objects are most often encountered when working with files, where blindly decoding all data to Unicode may not be desirable. Although you will seldom need to, you can define a bytes literal by prefixing a string with b:
a = b'this is shuju'
a  # b'this is shuju'
decoded = a.decode('utf8')
decoded  # 'this is shuju'
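The difference between a string and its encoded bytes is easier to see with a non-ASCII character. The following sketch uses an arbitrary example string; the variable names are chosen for illustration only.

```python
# A Unicode string containing one non-ASCII character
val = "español"

# Encoding to UTF-8: the character ñ becomes two bytes
val_utf8 = val.encode("utf8")
print(val_utf8)                  # b'espa\xc3\xb1ol'
print(len(val), len(val_utf8))   # 7 8

# Decoding with the same encoding restores the original string
roundtrip = val_utf8.decode("utf8")

# The same text in a different encoding produces different bytes
val_latin1 = val.encode("latin1")
print(val_latin1)                # b'espa\xf1ol'

# Decoding with the wrong encoding can fail, which is why the
# encoding has to be known in advance
try:
    val_latin1.decode("utf8")
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True
print(wrong_decode_failed)       # True
```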
Date and time: The built-in Python datetime module provides the datetime, date, and time types. The datetime type combines the information stored in date and time and is the most commonly used:
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
print(dt.day)     # prints 29
print(dt.minute)  # prints 30
-Given a datetime instance, you can extract the equivalent date and time objects with the date and time methods:
print(dt.date())  # prints 2011-10-29
print(dt.time())  # prints 20:30:21
# The strftime method formats a datetime as a string
# strptime parses a string into a datetime object
-When you are aggregating or otherwise grouping time series data, it is sometimes useful to replace fields of a datetime, for example, replacing the minute and second fields with zero:
print(dt.replace(minute=0, second=0))  # prints 2011-10-29 20:00:00
-Because datetime is an immutable type, methods like this always produce new objects. The difference of two datetime objects produces a datetime.timedelta; a result of timedelta(17, 7179) encodes an offset of 17 days and 7,179 seconds.
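The formatting, parsing, and timedelta behavior described above can be tried with a short sketch. The second datetime (dt2) and the format strings are assumptions picked for illustration, with dt2 chosen so that the difference matches the 17 days and 7,179 seconds mentioned in the text.

```python
from datetime import datetime

dt = datetime(2011, 10, 29, 20, 30, 21)

# strftime formats a datetime as a string
formatted = dt.strftime("%m/%d/%Y %H:%M")
print(formatted)   # 10/29/2011 20:30

# strptime parses a string into a datetime object
parsed = datetime.strptime("20091031", "%Y%m%d")
print(parsed)      # 2009-10-31 00:00:00

# replace returns a new object; dt itself is unchanged
dt_hour = dt.replace(minute=0, second=0)

# Subtracting two datetimes gives a timedelta
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
print(delta.days, delta.seconds)   # 17 7179

# Adding a timedelta to a datetime yields a shifted datetime
restored = dt + delta
```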
Chapter 3: Built-in data structures, functions, and files
Sorting: sort()
-You can sort a list in place (without creating a new object) by calling its sort method.
-sort has a few optional parameters; for example, sort(key=len) sorts a collection of strings by their lengths.
-The sorted function is covered later.
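A minimal sketch of the sorting behavior described above, using made-up lists. The sorted call is included only to contrast it with in-place sorting.

```python
# In-place sort: the list itself is reordered and sort() returns None
a = [7, 2, 5, 1, 3]
result = a.sort()
print(a)        # [1, 2, 3, 5, 7]
print(result)   # None

# Sort strings by length with key=len.
# Python's sort is stable, so strings of equal length keep their original order.
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
print(b)        # ['He', 'saw', 'six', 'small', 'foxes']

# sorted returns a new sorted list and leaves the original untouched
original = [7, 1, 2, 6, 0, 3, 2]
c = sorted(original)
print(c)        # [0, 1, 2, 2, 3, 6, 7]
```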
Binary search and maintaining a sorted list: bisect
-The bisect module implements binary search and insertion into a sorted list. bisect.bisect finds the position where an element should be inserted to keep the list sorted and returns that index without inserting anything; bisect.insort actually inserts the element at that position.
* Note: The bisect module does not check whether the list is sorted, so using it on an unsorted list raises no error but may give incorrect results.
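The two bisect functions described above can be sketched on a small list that is already sorted. The list contents are illustrative.

```python
import bisect

# The list must already be sorted; bisect does not verify this
c = [1, 2, 2, 2, 3, 4, 7]

# bisect.bisect returns the insertion index without modifying the list.
# For a value already present, the index is to the right of the existing entries.
pos_2 = bisect.bisect(c, 2)
pos_5 = bisect.bisect(c, 5)
print(pos_2, pos_5)   # 4 6

# bisect.insort inserts the value at the position that keeps the list sorted
bisect.insort(c, 6)
print(c)              # [1, 2, 2, 2, 3, 4, 6, 7]
```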
Slices: Select a section of a list with seq[start:stop], for example seq[1:5].
-Slices can also be assigned a sequence:
seq = [1,2,3,4,5,6]
seq[3:4] = [6,3]  # replaces the element at index 3 with the elements 6 and 3, so seq is now one element longer than before
-Either the start or the stop index can be omitted, in which case it defaults to the start or the end of the sequence, respectively. Negative indices slice relative to the end.
-A step can be given after a second colon: seq[::2] takes every other element, and a step of -1, as in seq[::-1], reverses the list.
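The slicing rules above can be checked with a short sketch. The list values are arbitrary; note that the stop index is not included in the result.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

# Elements at indexes 1 through 4 (the stop index 5 is excluded)
middle = seq[1:5]
print(middle)        # [2, 3, 7, 5]

# Assigning a two-element list to a one-element slice grows the list by one
seq[3:4] = [6, 3]
print(seq)           # [7, 2, 3, 6, 3, 5, 6, 0, 1]

# Omitted start or stop defaults to the beginning or the end
first_five = seq[:5]
print(first_five)    # [7, 2, 3, 6, 3]

# Negative indexes count from the end
last_four = seq[-4:]
print(last_four)     # [5, 6, 0, 1]

# A step of 2 takes every other element; a step of -1 reverses
every_other = seq[::2]
reversed_seq = seq[::-1]
print(every_other)   # [7, 3, 3, 6, 1]
print(reversed_seq)  # [1, 0, 6, 5, 3, 6, 3, 2, 7]
```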
Data analysis with Python-1