Processing Time Series Data with Python

Source: Internet
Author: User

This was my first attempt at processing time series data with Python, and I hit a few pitfalls along the way. I am recording them here in the hope that later readers can avoid the same detours.

Background: I use an existing CSV data file as the raw material.

Objective: plot the time series and visualize its periodicity.

1. The first pitfall: when the data is imported, the time column is read as strings by default. As a result, the plot does not treat the values as dates and may not be drawn in chronological order.

So the strings need to be parsed into a datetime type.
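As a small illustration of why string dates can end up out of order, here is a sketch that uses only the standard library. The date strings are hypothetical (written without zero padding, unlike the dates in my file) and were picked to make the effect visible.

```python
from datetime import datetime

# Hypothetical date strings without zero padding (not taken from the CSV).
date_strs = ['2018/9/2', '2018/10/1', '2018/9/15']

# Sorting the raw strings compares them character by character.
as_strings = sorted(date_strs)

# Parsing first gives real datetime objects, which sort chronologically.
as_dates = sorted(datetime.strptime(s, '%Y/%m/%d') for s in date_strs)

print(as_strings)
print([d.strftime('%Y-%m-%d') for d in as_dates])
```

The string sort puts October before September, while the parsed dates come out in calendar order.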

Method 1: When reading the data, pass parse_dates=True (together with index_col) so that the date index is parsed automatically.
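A minimal sketch of Method 1 follows. It reads from an in-memory string instead of a file, and the three rows and the column names date, Week and Sales are stand-ins that I assume for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file (column names are assumptions).
csv_text = """date,Week,Sales
2018-08-01,Wed,4702986
2018-08-02,Thu,5034151
2018-08-03,Fri,5636981
"""

# Without parse_dates the index holds plain strings.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)

# With parse_dates=True pandas tries to parse the index as dates.
parsed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(raw.index))
print(type(parsed.index))
```

Checking type(data.index) right after reading is a quick way to confirm that the parsing actually happened.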

Method 2: Use parser.parse from the dateutil package to parse a time string:

    from dateutil.parser import parse

    v1 = parse('2018-09-02')
    print("The parsed time is:", v1)

Method 3: Use pandas's to_datetime to parse a list of time strings:

    import pandas as pd

    datestrs = ['2018/09/02', '2018/09/03', '2018/09/04']
    print(pd.to_datetime(datestrs))
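When the format is known in advance, to_datetime also accepts a format argument, and errors='coerce' turns unparseable entries into NaT instead of raising. The sketch below uses a deliberately bad entry that I made up for illustration.

```python
import pandas as pd

# The last entry is a made-up bad value to show what errors='coerce' does.
datestrs = ['2018/09/02', '2018/09/03', 'not a date']

# An explicit format avoids guessing; bad entries become NaT.
parsed = pd.to_datetime(datestrs, format='%Y/%m/%d', errors='coerce')

print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Counting the NaT values afterwards is one way to find out how many rows need manual attention.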

2. The second pitfall is numeric data. pandas imported the column with the object data type by default, so it has to be cast to a numeric type, but at first the cast kept failing.

The error was: ValueError: could not convert string to float
After a long search I found the reason: the data contains spaces or "," (thousands separators), so the strings cannot be converted to int.
Workaround: .replace(' ', '').replace(',', ''), which removes the spaces and deletes the ",".
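The workaround can be applied to a whole column at once with the pandas string methods. The sketch below also shows an alternative I did not use originally: read_csv has a thousands parameter that strips the separator while reading. The sample values and the in-memory CSV are stand-ins for illustration.

```python
import io
import pandas as pd

# Option A: clean an existing column of strings, then cast it.
sales = pd.Series([' 4,702,986', '5,034,151 ', '5,636,981'])
cleaned = (
    sales.str.replace(' ', '', regex=False)   # remove spaces
         .str.replace(',', '', regex=False)   # remove thousands separators
         .astype(int)
)
print(cleaned.tolist())

# Option B: let read_csv handle the separator via thousands=','.
# The values must be quoted so the commas are not treated as delimiters.
csv_text = 'date,Sales\n2018-08-01,"4,702,986"\n2018-08-02,"5,034,151"\n'
df = pd.read_csv(io.StringIO(csv_text), thousands=',')
print(df['Sales'].tolist())
```

Option B only applies when the file quotes the numbers or uses a delimiter other than a comma; otherwise Option A is the safer route.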

3. The plotting that follows is straightforward; the only part worth discussing is the periodic plot.

I use a week as the fixed period and draw each week as its own line. The code below shows the implementation.

4. One more point: when reading the file I had to set encoding='gbk'. The default is UTF-8, which raised an error for my file.
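The encoding problem can be reproduced without the original file. In this sketch the Chinese column headers are an assumption on my part (the real headers are not shown here); the point is only that GBK-encoded bytes fail to decode as UTF-8 but read fine once encoding='gbk' is given.

```python
import io
import pandas as pd

# Hypothetical file contents with Chinese headers, encoded as GBK.
raw_bytes = '日期,销量\n2018-08-01,4702986\n'.encode('gbk')

# Decoding the same bytes as UTF-8 fails.
try:
    raw_bytes.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("decodes as UTF-8:", utf8_ok)

# Telling pandas the real encoding reads the data correctly.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding='gbk')
print(df.columns.tolist())
```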

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Author: Leslie Dang

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # 01 Import data from a file
    data1 = pd.read_csv('01series.csv', parse_dates=True, index_col=0, encoding='gbk')
    print(data1)
    # print(type(data1.index))
    print(data1.dtypes)

    # 02 Cast data types
    print('*** 02 Cast data types ***')

    # ValueError: could not convert string to float
    # Cause: the data probably contains \t, a space, or ","
    # Workaround: .replace(' ', '').replace(',', '')

    # Clean the whole column at once instead of assigning element by element
    data1['Sales'] = data1['Sales'].str.replace(' ', '').str.replace(',', '')

    data1['Sales'] = data1['Sales'].astype(int)
    print(data1.dtypes)

    # 03 Drawing: line chart
    print('*** 03 Drawing ***')
    # plt.plot(data1['Sales'], label='Sales')
    # plt.show()

    # 04 Drawing: periodic analysis chart
    print('*** 04 Drawing: periodic analysis chart ***')

    # Use the weekday as the index so every week shares the same x axis
    data1 = data1.set_index('Week')
    print(data1)

    count = data1['Sales'].count()
    circle = count // 7   # number of complete weeks
    print(count, circle)

    # Draw one line per week (7 consecutive rows each)
    for i in range(circle):
        plt.plot(data1['Sales'][7*i:7*i+7])
    plt.show()

    # Thinking: how can periodicity be quantified? Which parameters can express it? How strong is the periodicity?
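On the closing question of how to quantify periodicity, one possible starting point (not something I did in the original script) is the autocorrelation at the suspected period. The sketch below uses the first 28 sales values from my data and compares lag 7 with lag 3, using pandas's Series.autocorr.

```python
import pandas as pd

# First four full weeks of the sales data (2018-08-01 to 2018-08-28).
sales = pd.Series([
    4702986, 5034151, 5636981, 6377764, 6138548, 5335273, 5055513,
    5159413, 5393767, 5920339, 6637867, 6292839, 5485055, 5274536,
    5171561, 5269780, 5359121, 6353952, 6334198, 5577552, 5276165,
    5403919, 5611874, 6073795, 6754291, 6333426, 5570875, 5327305,
])

# Correlation of the series with itself shifted by 7 days and by 3 days.
r7 = sales.autocorr(lag=7)
r3 = sales.autocorr(lag=3)

print("lag 7:", round(r7, 3))
print("lag 3:", round(r3, 3))
```

A lag-7 value close to 1 that is clearly higher than the values at other lags would suggest a strong weekly cycle. This is only a rough indicator; with four weeks of data it should be read with caution.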

For reference, here is the data I used:

                Week      Sales
    date
    2018-08-01  Wed   4,702,986
    2018-08-02  Thu   5,034,151
    2018-08-03  Fri   5,636,981
    2018-08-04  Sat   6,377,764
    2018-08-05  Sun   6,138,548
    2018-08-06  Mon   5,335,273
    2018-08-07  Tue   5,055,513
    2018-08-08  Wed   5,159,413
    2018-08-09  Thu   5,393,767
    2018-08-10  Fri   5,920,339
    2018-08-11  Sat   6,637,867
    2018-08-12  Sun   6,292,839
    2018-08-13  Mon   5,485,055
    2018-08-14  Tue   5,274,536
    2018-08-15  Wed   5,171,561
    2018-08-16  Thu   5,269,780
    2018-08-17  Fri   5,359,121
    2018-08-18  Sat   6,353,952
    2018-08-19  Sun   6,334,198
    2018-08-20  Mon   5,577,552
    2018-08-21  Tue   5,276,165
    2018-08-22  Wed   5,403,919
    2018-08-23  Thu   5,611,874
    2018-08-24  Fri   6,073,795
    2018-08-25  Sat   6,754,291
    2018-08-26  Sun   6,333,426
    2018-08-27  Mon   5,570,875
    2018-08-28  Tue   5,327,305
    2018-08-29  Wed
