Getting started with Python for data analysis: pandas
pandas is built on NumPy.
from pandas import Series, DataFrame
import pandas as pd
I. Two kinds of data structure
1. Series
A Series is like a Python dictionary: it pairs an index with values.
Create a Series  # if no index is specified, a default integer index starting at 0 is created
In [54]: obj
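The snippet above is cut off before the Series is actually created, so here is a minimal sketch of the idea it describes. The values and labels are made up for illustration and are not from the original post.

```python
import pandas as pd

# No index given: pandas creates a default integer index 0..N-1.
obj = pd.Series([4, 7, -5, 3])
print(obj.index.tolist())

# Explicit index: values are looked up by label, much like a dictionary.
labeled = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(labeled["a"])

# A Series can also be built directly from a dictionary.
from_dict = pd.Series({"x": 1, "y": 2})
print(from_dict["y"])
```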
Original: Chapter 7
# the usual preamble
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# make the plots bigger and prettier
pd.set_option('display.mpl_style', 'default')
plt.rcParams['figure.figsize'] = (5)
plt.rcParams['font.family'] = 'sans-serif'
# needed to show a lot of columns in pandas 0.12
# in
you install Anaconda: the Anaconda Scientific Python Distribution.
It bundles many scientific computing libraries (not only numpy, but also sklearn and pandas) and is available for both Python 2.7 and Python 3.4. Use Anaconda. This is the method I recorded previously for installing numpy: A compilation error occurs when Python 3 installs a third-party library.
Download the zipp
The environment configuration can be found in the example in my next blog post: http://blog.csdn.net/tao_627/article/details/44007041
================== The following content can be ignored ==================
The installation steps above basically work, but to pass the tests below I also had to install several additional modules. The python-nose module is a unit testing module and must be installed.
sudo apt-get install ipython ipython-notebook python-
NumPy is a Python library for scientific computing and is often used in fields such as data mining, machine learning, and scientific statistics. In production use, to get the most out of NumPy's performance, NumPy is compiled against specially optimized third-party scientific computing libraries. For first-time newcomers to numpy, it is often no
have to build one from source under Windows, which is somewhat difficult. The main suggestion is to install Anaconda: the Anaconda Scientific Python Distribution.
It bundles many scientific computing libraries (including NumPy, sklearn, pandas, etc.) and is available for both Python 2.7 and Python 3.4. Please use Anaconda. This is the way I previously recorded the installation of NumPy:
Although NumPy users are rarely interested in an array's stride information, strides are an important factor in building copy-free array views. A stride can even be negative, which makes the array step backward through memory, as in the slices obj[::-1] or obj[:, ::-1].
Advanced Array Operations
In addition to fancy indexing, slicing, and Boolean subsetting, arrays support many other operations. While the advanced functions in
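To make the point about negative strides concrete, here is a small sketch. The array name obj follows the snippet; the shape and the int64 dtype are assumptions chosen so that the byte strides are easy to read.

```python
import numpy as np

# A 3x4 array of 8-byte integers: moving one row skips 32 bytes, one column 8 bytes.
obj = np.arange(12, dtype=np.int64).reshape(3, 4)
print(obj.strides)

# Reversing an axis produces a view whose stride on that axis is negative.
rev_rows = obj[::-1]
rev_cols = obj[:, ::-1]
print(rev_rows.strides)
print(rev_cols.strides)

# Both results are views on the same buffer; no data was copied.
print(np.shares_memory(obj, rev_rows), np.shares_memory(obj, rev_cols))
```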
also often use IPython, Jupyter Notebook, hdf5, pandas, six, wheel, etc., and installing each one separately is quite troublesome. Installing NumPy and SciPy alone can be painful enough. Fortunately, there is now a distribution called Anaconda that bundles basically all the common libraries, so installing Anaconda is enough and there is no need to set up each one by hand. Download: https://www.continuum.io/downloads
Two different versions of Python, all with differen
The difference between resample and groupby:
resample: resampling within a given time unit
groupby: statistics over a given data key
Function prototype:
DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)
Some of these parameters are deprecated.
Let's start practicing.
import numpy as np
import pandas as pd
Start by
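The practice code in the snippet is cut off, so the following is only a minimal sketch of the resample versus groupby distinction. The column names key and value and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Six daily observations, each tagged with a key.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b"], "value": np.arange(6)},
    index=idx,
)

# resample: bucket rows by time unit (here, 2-day bins) and aggregate.
by_time = df["value"].resample("2D").sum()
print(by_time)

# groupby: bucket rows by the value of a column and aggregate.
by_key = df.groupby("key")["value"].sum()
print(by_key)
```

resample needs a datetime-like index, while groupby works on any column or key.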
Series: a one-dimensional array, similar to a one-dimensional array in NumPy. Both resemble Python's basic list data structure; the difference is that the elements of a list can have different data types, while an array or Series stores elements of a single data type, which uses memory more efficiently and speeds up operations.
Time-series: a Series indexed by time.
DataFrame: a two-dimensional tabular data stru
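A short sketch of the distinctions described above, using made-up values: a list with mixed element types, a Series with one dtype, and a Series indexed by time.

```python
import pandas as pd

# A Python list can mix element types freely.
mixed = [1, "two", 3.0]

# A Series built from integers stores them under a single integer dtype.
s = pd.Series([1, 2, 3])
print(s.dtype)

# A time series is simply a Series whose index holds timestamps.
ts = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
print(type(ts.index).__name__)
```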
We select from a DataFrame at three levels: rows and columns, regions, and cells. The corresponding methods are as follows:
I. Rows and columns: df[]
II. Regions: df.loc[], df.iloc[], df.ix[]
III. Cells: df.at[], df.iat[]
Here's how to start the exercise:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
1. df[]: one dimension
Row dimension: integer slices, label slices,
Column dimension: label index, label list,
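The three selection levels can be sketched as follows. This example swaps the random values for np.arange so the results are predictable, and it leaves out df.ix[], which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list("abcdef"),
    columns=list("ABCD"),
)

# Rows and columns with df[]: slices select rows, labels select columns.
first_rows = df[0:2]          # integer slice, end exclusive
label_rows = df["a":"c"]      # label slice, end inclusive
one_col = df["A"]             # single column as a Series
two_cols = df[["A", "C"]]     # list of labels as a DataFrame

# Regions: .loc selects by label, .iloc by integer position.
region = df.loc["a":"c", ["A", "B"]]
region_pos = df.iloc[0:3, 0:2]

# Cells: .at selects by label, .iat by integer position.
cell = df.at["b", "C"]
cell_pos = df.iat[1, 2]
print(cell, cell_pos)
```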
Explore students' alcohol consumption
For the data, see GitHub.
Step 1 - Import the necessary libraries
import pandas as pd
import numpy as np
Step 2 - Data set
path4 = "./data/student-mat.csv"
Step 3 - Name the data student
student = pd.read_csv(path4)
student.head()
Output:
Step 4 - Slice the data from 'school' to 'guardian'
"school":"guardian"]
stud_alcoh.head()
Output:
Step 5 - Create a lambda function that captures a stringL
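The slicing statement in Step 4 is incomplete in the snippet. One common way to take a run of columns by label is .loc with a label slice, sketched here on a tiny made-up frame rather than the real student-mat.csv. The rows and the use of .loc are assumptions, not the original author's code.

```python
import pandas as pd

# Hypothetical stand-in for the student data set (values are illustrative only).
student = pd.DataFrame(
    {
        "school": ["GP", "MS"],
        "sex": ["F", "M"],
        "age": [18, 17],
        "guardian": ["mother", "father"],
        "G3": [6, 10],
    }
)

# Label slices in .loc include both endpoints, so 'guardian' is kept.
stud_alcoh = student.loc[:, "school":"guardian"]
print(stud_alcoh.columns.tolist())
```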
I. Generate a data table
1. First import the pandas library. The NumPy library is generally used as well, so import it up front:
import numpy as np
import pandas as pd
2. Import a csv or xlsx file:
df = pd.DataFrame(pd.read_csv('name.csv', header=1))
df = pd.DataFrame(pd.read_excel('name.xlsx'))
3. Create a data table with pandas:
df = pd.DataFrame({"id":[1001,1002,1003,1004,1005,1006], "date":pd.
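As a hedged illustration of the import step: header=1 tells read_csv that the second line of the file holds the column names, so the first line is skipped. The CSV text below is made up and is read from memory so the sketch runs without a file on disk.

```python
import io

import pandas as pd

# Illustrative CSV content: one title line, then the real header, then data.
csv_text = "exported by tool\nid,city\n1001,Beijing\n1002,Shanghai\n"

# header=1 uses the second line (index 1) as the column names.
df = pd.read_csv(io.StringIO(csv_text), header=1)
print(df)

# read_csv already returns a DataFrame, so wrapping it in pd.DataFrame is optional.
print(isinstance(df, pd.DataFrame))
```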
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# If copied code raises "SyntaxError: invalid character in identifier", the copied code contains a full-width Chinese space or symbol.
data = pd.DataFrame(np.arange(6).reshape((3, 2)), index=pd.Index(['A', 'B', 'C'], name='state'), columns=pd.Index(['I', 'II'], name=')] print
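The statement above is damaged at the column-axis name, so the following is only a sketch of a DataFrame with named row and column indexes. The name "number" is an assumption standing in for the value that was lost.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((3, 2)),
    index=pd.Index(["A", "B", "C"], name="state"),
    columns=pd.Index(["I", "II"], name="number"),  # "number" is an assumed name
)
print(data)

# The axis names are stored on the index objects themselves.
print(data.index.name, data.columns.name)
```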
Forgive me for not having finished writing. This article is a record of my own learning process, meant to round out my knowledge of pandas. Because existing online material is lacking and parts of the book Python for Data Analysis are outdated, I had to write this article as a record. If follow-up work leaves me time, I intend to finish studying the pandas library,
PS: This blog post is excerpted from the Chinese University MOOC course "Python Data Analysis and Display", which I recommend to beginners as a very good introductory video course.
NumPy is a scientific computing library. It provides a powerful N-dimensional array object, ndarray, and broadcasting functions. It includes tools for integrating C/C++/Fortran code, and it is the foundation of SciPy, pandas, and other libraries.
ndim: number of dimensions. shape: size of each dimension, e.g. (2, 5). size: N
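The ndarray attributes listed above can be checked on a small array. This sketch assumes a 2 by 5 array, to match the (2, 5) shape mentioned in the snippet.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

print(a.ndim)   # number of dimensions
print(a.shape)  # size of each dimension
print(a.size)   # total number of elements

# Broadcasting: the 1-D row is stretched across both rows of a.
row = np.array([10, 20, 30, 40, 50])
shifted = a + row
print(shifted)
```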
This repo records Python tips, books, learning links, and so on. Stars are welcome.
GitHub Address
NumPy, Python's scientific computing package, is a great extension tool. Its most common use is operating on the ndarray array type. Some of these operations overlap with those of Python's built-in list (which grows with append and extend), but pay attention to how usage differs. In addition, for the
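One difference worth seeing side by side is sketched below with made-up values: list.append and list.extend modify the list in place, while np.append and np.concatenate return a new array and leave the original untouched.

```python
import numpy as np

# Lists grow in place.
items = [1, 2]
items.append(3)        # add one element
items.extend([4, 5])   # add every element of another sequence
print(items)

# Arrays have a fixed size: joining returns a new array each time.
arr = np.array([1, 2])
arr_plus_one = np.append(arr, 3)
joined = np.concatenate([arr_plus_one, np.array([4, 5])])
print(arr, joined)
```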
How to quickly get started using Python for financial data analysis
Introduction:
This series of posts, "Quantitative Small Classroom", uses practical cases to teach beginners how to use Python and pandas for financial data processing, and we hope it is helpful to everyone.
"Must-read article": "10 400 times-fold strategy sharing-video-line-guided code"
"All series article summary": http://bbs.pinggu.org/thread-3950124-1-1.html
The first step: curiosity
Don't lea