Getting started with Python for data analysis: pandas
pandas is built on top of NumPy. Import it with:
from pandas import Series, DataFrame
import pandas as pd
I. Two data structures
1. Series
A Series is similar to a Python dictionary: it consists of an index and a set of values.
Create a Series
# No index specified: a default index 0..N-1 is created
In [54]: obj = Series([1,2,3,4,5])

In [55]: obj
Out[55]:
0    1
1    2
2    3
3    4
4    5
dtype: int64

# Specify the index
In
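The session above is cut off right where an explicit index is passed. Here is a minimal sketch of both cases. The labels 'a' to 'e' are illustrative and do not come from the original. It also shows the dictionary-like behaviour described earlier: lookup by label and `in` tests against the index.

```python
from pandas import Series

# No index given: pandas assigns the default integer index 0..N-1
obj = Series([1, 2, 3, 4, 5])
print(obj)

# Explicit index labels (the labels 'a'..'e' are illustrative)
obj2 = Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(obj2['c'])      # label-based lookup, like a dict key
print('c' in obj2)    # membership is tested against the index, like dict keys
print(obj2.index)
print(obj2.values)
```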
of the Python program, such as:
CSV, JSON, line-delimited JSON, and remote versions of all of the above
HDF5 (both the standard format and the pandas format), bcolz, SAS, SQL databases (via SQLAlchemy), Mongo
The into project can efficiently migrate data between any two of these formats by using a network of pairwise conversions (an intuitive explanation is at the bottom of the original article).
How to use it
The into function has two parameters: source and target. It converts data from source to target.
Lesson 10: from DataFrame to Excel, from Excel to DataFrame, from DataFrame to JSON, and from JSON to DataFrame
import pandas as pd
import sys

print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
Pyt
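The lesson's title promises DataFrame/Excel/JSON round trips, but the code is cut off after the version check. The sketch below covers only the JSON half so that it runs without extra packages. The column names and values are illustrative. `df.to_excel(...)` and `pd.read_excel(...)` follow the same write-then-read pattern, but they need an Excel engine such as openpyxl to be installed.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Jessica', 'Mary'], 'births': [968, 155, 77]})

# DataFrame -> JSON text (orient='records' yields one JSON object per row)
json_text = df.to_json(orient='records')
print(json_text)

# JSON -> DataFrame; the text is wrapped in StringIO because passing a literal
# JSON string directly to read_json is deprecated in recent pandas versions
df_back = pd.read_json(StringIO(json_text), orient='records')
print(df_back)
```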
# coding: utf-8
__author__ = 'Weekyin'
import numpy as np
import pandas as pd

dates = pd.date_range('20140729', periods=6)
# First create a time index. The index is the ID of each row of data and uniquely identifies every row.
print(dates)
# For a quick start, let's look at how to create 6x4 data: the randn function creates random numbers, its arguments give the number of rows and columns, and dates is the index created in the previous step
df =
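The last line of that snippet is truncated. Here is a minimal sketch of what the final comment describes: a 6x4 frame of random numbers indexed by `dates`. The column labels 'A' to 'D' are an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Time index: one label per row (6 consecutive days starting 2014-07-29)
dates = pd.date_range('20140729', periods=6)

# 6 rows x 4 columns of standard-normal random numbers, indexed by `dates`
# (column labels 'A'..'D' are illustrative; the original line is truncated)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
```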
from __future__ import print_function
import pandas as pd
from sklearn.cluster import KMeans  # import the K-means clustering algorithm

datafile = '../data/data.xls'  # data file for clustering
processedfile = '../tmp/data_processed.xls'  # file after data processing
typelabel = {u'coefficient of liver-qi stagnation syndrome': 'A', u'coefficient of heat-toxin accumulation syndrome': 'B', u'coefficient of Chong-Ren disharmony syndrome': 'C', u'coefficient of Qi and blood defic
import numpy as np
import pandas as pd

# Common string methods: strip
s = pd.Series(['Jack', 'Jill', 'Jease', 'Feank'])
df = pd.DataFrame(np.random.randn(3, 2), columns=['Column A', 'Column B'], index=range(3))
print(s)
print(df.columns)

print('----')
print(s.str.lstrip().values)  # remove leading (left) whitespace
print(s.str.rstrip().values)  # remove trailing (right) whitespace
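As printed above, the names contain no surrounding spaces, so the strip calls have nothing to remove. This sketch pads the names with whitespace (illustrative data) so that the difference between `lstrip`, `rstrip` and `strip` is visible.

```python
import pandas as pd

# Names padded with whitespace so the strip methods have something to remove
s = pd.Series([' Jack', 'Jill ', ' Jesse ', 'Frank'])

print(s.str.lstrip().values)  # leading whitespace removed
print(s.str.rstrip().values)  # trailing whitespace removed
print(s.str.strip().values)   # whitespace removed on both sides
```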
median.
df.var() returns the variance.
df.std() returns the standard deviation.
df.mad() returns the mean absolute deviation from the mean (deprecated in pandas 1.5 and removed in 2.0).
df.cumsum() returns the cumulative sum.
sr1.corr(sr2) returns the correlation coefficient of two Series.
df.cov() returns the covariance matrix.
df1.corrwith(df2) returns the correlation coefficients between matching columns of two DataFrames.
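A short sketch exercising the methods listed above on a small illustrative frame. Here `y` is exactly `2 * x`, so the correlations are 1. Because `df.mad()` no longer exists in pandas 2.x, the sketch computes the mean absolute deviation directly.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [2.0, 4.0, 6.0, 8.0]})
sr1, sr2 = df['x'], df['y']

print(df.var())       # sample variance per column (ddof=1)
print(df.std())       # sample standard deviation per column
print(df.cumsum())    # running (cumulative) sum down each column
print(sr1.corr(sr2))  # Pearson correlation of two Series
print(df.cov())       # covariance matrix
# corrwith: column-by-column correlation between two DataFrames with matching labels
print(df.corrwith(df * 2))

# Replacement for the removed df.mad(): mean absolute deviation from the column mean
print((df - df.mean()).abs().mean())
```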
pd.cut(array1, bins) bins one-dimensional data into intervals.
pd.qcut(array1, 4) divides the data into intervals at the specified quantiles, an
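A minimal sketch contrasting the two functions. The sample values in `array1` and the bin edges are illustrative. `cut` uses the edges you give it. `qcut(..., 4)` picks edges at the quartiles, so each bin gets roughly the same number of values.

```python
import pandas as pd

array1 = [1, 7, 5, 4, 6, 3, 9, 2]

# cut: explicit bin edges; intervals are right-closed by default, e.g. (0, 3]
binned = pd.cut(array1, bins=[0, 3, 6, 9])
print(binned)
print(binned.codes)  # integer bin number for each input value

# qcut: quartile-based bins, so each bin receives about the same number of values
quartiles = pd.qcut(array1, 4)
print(pd.Series(quartiles).value_counts(sort=False))
```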
pandas has four main time-related types: Timestamp, Period, DatetimeIndex, and PeriodIndex.

import pandas as pd
import numpy as np

## Timestamp
pd.Timestamp('9/1/2016 10:05am')
# output: Timestamp('2016-09-01 10:05:00')

## Period
pd.Period('1/2016')
# output: Period('2016-01', 'M')
pd.Period('3/5/2016')
# output: Period('2016-03-05', 'D')

## DatetimeIndex
t1 = pd.Series(list('ABC'), [
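The DatetimeIndex example above is truncated. This sketch uses illustrative dates and periods to show how the two index types arise. When a Series is indexed by Timestamps, pandas builds a DatetimeIndex. When it is indexed by Periods, pandas builds a PeriodIndex.

```python
import pandas as pd

# DatetimeIndex: a Series indexed by Timestamps gets a DatetimeIndex
t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'),
                             pd.Timestamp('2016-09-03')])
print(type(t1.index))

# PeriodIndex: a Series indexed by Periods gets a PeriodIndex
t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'),
                             pd.Period('2016-11')])
print(type(t2.index))
```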
Data conversion
Removing duplicate elements
The duplicated() method of a DataFrame detects duplicate rows. It returns a Series of Booleans with one element per row: an element is True if its row repeats an earlier row (that is, the row is not the first occurrence), and False otherwise. A Series object whose elements are Booleans is of
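A minimal sketch of `duplicated()` on illustrative data, followed by `drop_duplicates()`. The latter keeps only the first occurrence of each row, which matches the True/False rule described above.

```python
import pandas as pd

df = pd.DataFrame({'color': ['white', 'white', 'red', 'red', 'white'],
                   'value': [2, 1, 3, 3, 2]})

dups = df.duplicated()       # True where the row repeats an earlier row
print(dups)
print(df[dups])              # the repeated rows themselves
print(df.drop_duplicates())  # keeps the first occurrence of each row
```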
Original English: Lesson 11
Read data from multiple Excel files and merge it into a single DataFrame.
import pandas as pd
import matplotlib
import os
import sys
%matplotlib inline

print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Matplotlib version ' + matplotlib.__version__)
Python version 3.6.1 | Packaged b
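The lesson's merge code is not included here. This is a minimal sketch of the usual pattern: read every file in a folder and stack the pieces with `pd.concat`. To keep it runnable without an Excel engine, it writes and reads CSV files in a temporary folder with illustrative data. For Excel, swap the `*.csv` pattern for `*.xlsx` and `pd.read_csv` for `pd.read_excel`, which needs openpyxl.

```python
import glob
import os
import tempfile

import pandas as pd

# Write three small CSV files into a temporary folder (illustrative data)
folder = tempfile.mkdtemp()
for i in range(3):
    part = pd.DataFrame({'file_id': [i, i], 'value': [10 * i, 10 * i + 1]})
    part.to_csv(os.path.join(folder, f'part_{i}.csv'), index=False)

# Read every matching file and stack the pieces into one DataFrame
paths = sorted(glob.glob(os.path.join(folder, '*.csv')))
df = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
print(df)
```

`ignore_index=True` renumbers the rows 0..N-1. Without it, each file's own 0-based index would repeat in the result.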
should be used like this:
First, put the code in a separate configuration file, such as config.py.
Then, import the configuration file where you need it.
from config import con_analyze


class AnalyzeData:
    def __init__(self):
        # Initialization goes here; con_analyze can take one parameter, database, which defaults to myanalyze
        self.conn = con_analyze()
        # self.conn2 = con_analyze("myanalyze_2")

    def get_data(self, sql):
        # Save the SQL query result to df
        df = self.conn
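The body of `con_analyze` and the rest of `get_data` are not shown. The sketch below is a hypothetical stand-in built on an in-memory SQLite database, so it runs without a database server. The table and its rows are illustrative, and the real helper in config.py may look quite different. `get_data` returns the query result through `pd.read_sql_query`.

```python
import sqlite3

import pandas as pd


# Hypothetical stand-in for config.con_analyze; `database` mirrors the described
# parameter but is ignored here because an in-memory SQLite database is used
def con_analyze(database="myanalyze"):
    return sqlite3.connect(":memory:")


class AnalyzeData:
    def __init__(self, database="myanalyze"):
        self.conn = con_analyze(database)

    def get_data(self, sql):
        # Run the query and return the result as a DataFrame
        return pd.read_sql_query(sql, self.conn)


analyzer = AnalyzeData()
analyzer.conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
analyzer.conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 10.0), ("south", 5.5)])
df = analyzer.get_data("SELECT region, amount FROM sales ORDER BY region")
print(df)
```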
Besides Series and DataFrame, the two most commonly used data structures in pandas, the library also had a Panel data structure. A Panel object is typically created from a dictionary of DataFrame objects or from a three-dimensional array. (Panel was deprecated in pandas 0.20 and removed in 0.25; a DataFrame with a MultiIndex, or the xarray library, is the recommended replacement.)

#
# Created on Sat Mar 18:01:05
#
# @author: Jeremy

import numpy as np
from pandas import Series,
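Current pandas versions have no Panel. This sketch shows one common substitute for "a dictionary of DataFrame objects": pass the dictionary to `pd.concat`, which produces a single DataFrame whose row index has two levels, (item, row). The item names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Two DataFrames that a Panel would once have held as "items" (illustrative data)
frames = {
    'item1': pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b']),
    'item2': pd.DataFrame(np.arange(6, 12).reshape(3, 2), columns=['a', 'b']),
}

# Stack them into one DataFrame whose row index has two levels: (item, row)
stacked = pd.concat(frames, names=['item', 'row'])
print(stacked)
print(stacked.loc['item2'])  # select one "item", as panel['item2'] used to do
```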
There are three main kinds of data preprocessing:
1. Interval Scaling
Reading the data, processing the data, and storing the data
import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly
filename = 'Hits persecond_t20m_130.csv'
data_f = pd.read_csv(filename)  # two-dimensional
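Interval scaling (min-max scaling) maps each column onto [0, 1] with x' = (x - min) / (max - min). The CSV's columns are not shown, so this sketch uses an illustrative frame in place of `data_f` and plain pandas arithmetic. scikit-learn's `preprocessing.MinMaxScaler` performs the same transformation with its default feature range. Note that a constant column would make max - min zero and produce NaN.

```python
import pandas as pd

# Illustrative data standing in for the CSV contents
data_f = pd.DataFrame({'hits': [10.0, 20.0, 15.0, 30.0], 'errors': [1.0, 0.0, 3.0, 2.0]})

# Interval (min-max) scaling: x' = (x - min) / (max - min), column by column
scaled = (data_f - data_f.min()) / (data_f.max() - data_f.min())
print(scaled)
```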