R Language-Data preprocessing

Last Update:2016-11-05 Source: Internet

Author: User

I. Date/time and string processing

Date

Date: date class; stores year, month, and day

POSIXct: date-time class, accurate to the second, stored as a number

POSIXlt: date-time class, accurate to the second, stored as a list
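
To make the storage difference between the three classes concrete, here is a minimal sketch that looks underneath each object. The fixed timestamp and the UTC time zone are illustrative assumptions, not values from the original article.

```r
# Fixed values chosen only for illustration (assumption: UTC time zone)
d  <- as.Date("1970-01-10")
ct <- as.POSIXct("1970-01-02 00:00:30", tz = "UTC")
lt <- as.POSIXlt(ct, tz = "UTC")

# Date: a number of days since 1970-01-01
d_days <- unclass(d)

# POSIXct: a number of seconds since 1970-01-01 00:00:00 UTC
ct_secs <- as.numeric(ct)

# POSIXlt: a list of components (sec, min, hour, mday, mon, year, ...)
lt_is_list <- is.list(unclass(lt))
lt_year <- lt$year + 1900   # year is stored as years since 1900
lt_mon  <- lt$mon + 1       # month is stored 0-based
lt_mday <- lt$mday
lt_sec  <- lt$sec

print(c(days = d_days, seconds = ct_secs))
```

POSIXct is usually the more convenient choice for data frame columns and arithmetic, while POSIXlt is handy when individual fields such as the month or hour are needed.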

Sys.Date(), date(), difftime(), ISOdate(), ISOdatetime()

    # Get the current date and time
    (d1 = Sys.Date())   # date
    (d3 = Sys.time())   # date and time
    (d2 = date())       # date and time as a string of the form "Fri Aug 20 11:11:00 1999"

    mydate = as.Date('2007-08-09')
    class(mydate)   # "Date"
    mode(mydate)    # "numeric"

    # Date to string
    as.character(mydate)

    birday = c('01/05/1986', '08/11/1976')
    dates = as.Date(birday, '%m/%d/%Y')   # vectorized: converts the whole vector
    dates

    # %d  day of the month (01~31)
    # %a  abbreviated weekday (Mon)
    # %A  full weekday (Monday)
    # %m  month (01~12)
    # %b  abbreviated month (Jan)
    # %B  full month (January)
    # %y  two-digit year (07)
    # %Y  four-digit year (2007)
    # %H  hour
    # %M  minute
    # %S  second

    td = Sys.Date()
    format(td, format = '%B %d %Y')
    format(td, format = '%A,%a')
    format(Sys.time(), '%H %M %S')

    # Convert dates to numbers
    as.integer(Sys.Date())             # days since 1970-01-01
    as.integer(as.Date('1970-1-1'))    # 0
    as.integer(as.Date('1970-1-2'))    # 1

    sdate = as.Date('2004-10-01')
    edate = as.Date('2010-10-22')
    days = edate - sdate
    days   # subtracting two dates gives the difference in days

    ws = difftime(Sys.Date(), as.Date('1956-10-12'), units = 'weeks')   # the unit can be specified

    # ISOdate
    (d = ISOdate(2011, 10, 2)); class(d)   # ISOdate returns POSIXct
    as.Date(ISOdate(2011, 10, 2))          # convert the result to Date
    ISOdate(2011, 2, 30)                   # a nonexistent date gives NA

    # Batch conversion to dates
    years = c(2010, 2011, 2012, 2013, 2014, 2015)
    months = 1
    days = c(15, 20, 21, 19, 30, 3)
    as.Date(ISOdate(years, months, days))

    # Extract parts of a date-time
    p = as.POSIXlt(Sys.Date())
    p = as.POSIXlt(Sys.time())
    Sys.Date()
    Sys.time()
    p$year + 1900   # year is stored as years since 1900, so add 1900
    p$mon + 1       # month is stored 0-based, so add 1
    p$mday
    p$hour
    p$min
    p$sec

String processing

nchar(), length()

paste(), outer()
substr(), strsplit()
sub(), gsub(), grep(), regexpr(), gregexpr()

    # Strings
    x = 'hello\rwold\n'

    cat(x)     # shows "woldo" in a terminal: \r moves the cursor back to the start of the line, then "wold" overwrites "hell"
    print(x)   # "hello\rwold\n"

    # String length
    nchar(x)    # number of characters in the string
    length(x)   # 1, the number of elements in the vector

    # Concatenation
    board = paste('b', 1:4, sep = '-')   # "b-1" "b-2" "b-3" "b-4"
    board

    mm = paste('mm', 1:3, sep = '-')     # "mm-1" "mm-2" "mm-3"
    mm

    outer(board, mm, paste, sep = ':')   # outer product of the two vectors
    #      [,1]       [,2]       [,3]
    # [1,] "b-1:mm-1" "b-1:mm-2" "b-1:mm-3"
    # [2,] "b-2:mm-1" "b-2:mm-2" "b-2:mm-3"
    # [3,] "b-3:mm-1" "b-3:mm-2" "b-3:mm-3"
    # [4,] "b-4:mm-1" "b-4:mm-2" "b-4:mm-3"

    # Split and extract
    board
    substr(board, 3, 3)                  # substring
    strsplit(board, '-', fixed = TRUE)   # split on a literal '-'

    # Modify
    sub('-', '.', board, fixed = TRUE)   # replace the given character
    board
    mm                   # "mm-1" "mm-2" "mm-3"
    sub('m', 'p', mm)    # replace the first match: "pm-1" "pm-2" "pm-3"
    gsub('m', 'p', mm)   # replace all matches: "pp-1" "pp-2" "pp-3"

    # Search
    mm = c(mm, 'mm4')    # "mm-1" "mm-2" "mm-3" "mm4"
    mm
    grep('-', mm)        # 1 2 3: elements 1, 2, and 3 contain '-'

    regexpr('-', mm)     # position of the first match in each element, -1 if not found
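
gregexpr() appears in the function list above but not in the walkthrough. The following is a minimal sketch of how it differs from regexpr(), reusing the mm vector from the example. grepl() and regmatches() are base R functions that the original article does not mention; they are included here only for comparison.

```r
mm <- c("mm-1", "mm-2", "mm-3", "mm4")

# grepl() returns one logical per element instead of grep()'s index vector
has_dash <- grepl("-", mm, fixed = TRUE)

# regexpr() reports only the first match in each element;
# gregexpr() reports every match, as a list with one entry per element
first_m <- as.integer(regexpr("m", mm))
all_m   <- lapply(gregexpr("m", mm), as.integer)

# regmatches() extracts the matched text; here, the run of digits in each element
digits <- unlist(regmatches(mm, regexpr("[0-9]+", mm)))

print(has_dash)
print(all_m[[1]])
print(digits)
```

Because gregexpr() returns a list, its result is usually processed with lapply() or sapply() rather than used directly as a vector.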

II. Data preprocessing

Ensure data quality

Accuracy
Integrity
Consistency
Redundancy
Timeliness

...

1. Extracting valid data requires cooperation from business staff (a subjective judgment), backed by the relevant technical means.

2. Understand the data definitions, and make sure everyone shares the same understanding of them.

...

Data integration: consolidating multiple data sources
Data conversion: constructing attributes, normalization, discretization, improving the distribution
Data cleansing: anomalous (exception) data, missing data
Data reduction: condensing rows and columns
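
The article does not show code for cleansing or reduction, so the following is only a rough sketch of what those two steps can look like. The data frame, the median imputation, and the 1.5 * IQR rule are all assumptions chosen for illustration, not the author's method.

```r
# Hypothetical data: one missing value and one extreme value
df <- data.frame(id = 1:8,
                 income = c(52, 48, NA, 50, 47, 51, 49, 400))

# Missing data: count it, then fill it with the median of the observed values
n_missing <- sum(is.na(df$income))
med <- median(df$income, na.rm = TRUE)
df$income_filled <- ifelse(is.na(df$income), med, df$income)

# Anomalous data: flag values more than 1.5 * IQR beyond the quartiles
q <- unname(quantile(df$income_filled, c(0.25, 0.75)))
iqr <- q[2] - q[1]
df$is_outlier <- df$income_filled < q[1] - 1.5 * iqr |
                 df$income_filled > q[2] + 1.5 * iqr

# Data reduction: keep only the rows and columns needed downstream
clean <- df[!df$is_outlier, c("id", "income_filled")]

print(clean)
```

Whether a flagged value should be dropped, capped, or kept is a business decision; the flag column keeps that choice separate from the detection step.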

III. Data integration

Integrate data with merge()

    # Data integration
    # merge(); see also plyr::join (package::function)
    (customer = data.frame(Id = c(1:6), State = c(rep("Beijing", 3), rep("Shanghai", 3))))
    (ol = data.frame(Id = c(1, 4, 6, 7), Product = c('IPhone', 'Vixo', 'mi', 'Note2')))

    merge(customer, ol, by = ('Id'))                 # inner join
    merge(customer, ol, by = ('Id'), all = TRUE)     # full join
    merge(customer, ol, by = ('Id'), all.x = TRUE)   # left outer join: keeps every row of the left table
    merge(customer, ol, by = ('Id'), all.y = TRUE)   # right outer join: keeps every row of the right table

    # Union with duplicates removed; df1 and df2 have the same column names
    (df1 = data.frame(id = seq(0, by = 3, length.out = 5), name = paste('Zhang', seq(0, by = 3, length.out = 5))))
    (df2 = data.frame(id = seq(0, by = 4, length.out = 4), name = paste('Zhang', seq(0, by = 4, length.out = 4))))

    rbind(df1, df2)               # stacks the rows, duplicates included

    merge(df1, df2, all = TRUE)   # removes duplicates; do not pass by

    merge(df1, df2, by = ('id'))  # non-key columns with the same name are renamed (name.x, name.y)

IV. Data conversion

Constructing attributes
Normalization (range scaling, standardization)
Discretization
Improving the distribution
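
The article lists these four conversions without code. The sketch below shows one common way to do each in base R; the sample vectors, the BMI column, the bin boundaries, and the use of a log transform for "improving the distribution" are assumptions made for illustration.

```r
# Hypothetical data for illustration
x <- c(2, 4, 6, 8, 10)
age <- c(5, 17, 30, 45, 70)
skewed <- c(1, 10, 100, 1000)

# Constructing attributes: derive a new column from existing ones
people <- data.frame(height_m = c(1.6, 1.8), weight_kg = c(64, 81))
people$bmi <- people$weight_kg / people$height_m^2

# Range (min-max) normalization: rescale to [0, 1]
x_minmax <- (x - min(x)) / (max(x) - min(x))

# Standardization (z-score): mean 0, standard deviation 1
x_z <- (x - mean(x)) / sd(x)   # equivalent to as.numeric(scale(x))

# Discretization: bin a numeric variable into labelled intervals
age_group <- cut(age, breaks = c(0, 18, 40, Inf),
                 labels = c("young", "middle", "older"))

# Improving the distribution: a log transform compresses a long right tail
skewed_log <- log10(skewed)

print(x_minmax)
print(age_group)
```

cut() uses right-closed intervals by default, so a value of exactly 18 falls in the first bin; pass right = FALSE to change that.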

This article is an English version of an article originally published in Chinese on aliyun.com.
