Sharing a Python date-function UDF program


A date-processing UDF written in Python

1. Background

I have been learning Python recently, so I tried writing a date-processing UDF in it. I have tested it myself, and it can be uploaded to Hive and used. Next I plan to learn to write Python crawlers to scrape the yellow pages, and to try inflating my read counts (laughs). Eventually I will get to algorithms and machine learning. I suddenly feel like I am doing a lot of odd jobs. Oh well, who told me to be a code monkey?

2. Usage: which parameters to pass

This py file includes a bunch of date calculation functions.

These include

Each entry gives the function name, its implementation logic, what it returns, the parameters passed in, and an example.

week_begin
  Logic: get the day of the week with week_num, take it modulo 7 to get the backward offset, then apply the day-offset function n_day.
  Usage: given a date, returns the first day of the week containing it.
  Parameters: date
  Example: '20171031' or '2017-10-31'

week_end
  Logic: get the day of the week with week_num, subtract it from 7 to get the forward offset, then apply the day-offset function n_day.
  Usage: given a date, returns the last day of the week containing it.
  Parameters: date
  Example: '20171031' or '2017-10-31'

week_num
  Logic: derived with Python's built-in date functions.
  Usage: given a date, returns its day of the week.
  Parameters: date
  Example: '20171031' or '2017-10-31'

month_begin
  Logic: keep the year and month substrings, set the day to 01, and concatenate.
  Usage: given a date, returns the first day of the month containing it.
  Parameters: date
  Example: '20171031' or '2017-10-31'

month_end
  Logic: keep the year and month substrings, get the last day of the month from Python's built-in calendar functions, and concatenate.
  Usage: given a date, returns the last day of the month containing it.
  Parameters: date
  Example: '20171031' or '2017-10-31'

quarter_begin
  Logic: mostly a matter of handling the month. Divide the month by 3.1 and truncate to get the zero-based quarter (n-1), then multiply by 3 and add 1 to get the first month of that quarter.
  Usage: given a date, returns the first day of the quarter containing it.
  Parameters: date
  Example: '20171031' or '2017-10-31'

quarter_end
  Logic: same principle as quarter_begin, applied to the last month of the quarter.
  Usage: given a date, returns the last day of the quarter containing it.
  Parameters: date
  Example: '20171031' or '2017-10-31'

n_day
  Logic: convert the string to a date and add the offset.
  Usage: given a date and an offset, returns the date n days later.
  Parameters: date, offset
  Example: '20171031,3' or '2017-10-31,3'

n_week
  Logic: call week_begin to get the first day of the week containing the date, multiply the offset by 7, and call n_day.
  Usage: given a date and an offset, returns the date n weeks later (the result is the first day of that week, as returned by week_begin).
  Parameters: date, offset
  Example: '20171031,3' or '2017-10-31,3'

n_month
  Logic: add the offset divided by 12 to the year and the remainder to the month, then handle the day.
  Usage: given a date and an offset, returns the date n months later.
  Parameters: date, offset, tail (begin for the first day of the month, end for the last)
  Example: '20171031,3,begin' or '2017-10-31,3,end'

n_quarter
  Logic: call quarter_begin to get the first day of the quarter, then call n_month with the offset multiplied by 3.
  Usage: given a date and an offset, returns the date n quarters later.
  Parameters: date, offset, tail (begin for the first day of the month, end for the last)
  Example: '20171031,3,begin' or '2017-10-31,3,end'
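To make the arithmetic in the descriptions above concrete, here is a minimal sketch that uses only the standard library. The helper names (weekday_number, quarter_first_month, quarter_last_month, month_last_day) are hypothetical and are not the names used in the script; the sketch only illustrates the weekday numbering, the divide-by-3.1 trick, and the month-end lookup.

```python
import calendar
import datetime


def weekday_number(day):
    """Return 1 for Monday through 7 for Sunday, for a 'YYYYMMDD' string."""
    parsed = datetime.datetime.strptime(day, '%Y%m%d')
    return parsed.weekday() + 1


def quarter_first_month(month):
    """First month of the quarter, using the divide-by-3.1 trick."""
    return int(month / 3.1) * 3 + 1


def quarter_last_month(month):
    """Last month of the quarter, using the same trick."""
    return (int(month / 3.1) + 1) * 3


def month_last_day(year, month):
    """Number of days in the month, taken from calendar.monthrange."""
    return calendar.monthrange(year, month)[1]


# 2017-10-31 was a Tuesday, so its weekday number is 2.
print(weekday_number('20171031'))
# October falls in the quarter that runs from month 10 to month 12.
print(quarter_first_month(10))
print(quarter_last_month(10))
# The trick agrees with plain integer arithmetic for every month of the year.
print(all(quarter_first_month(m) == (m - 1) // 3 * 3 + 1 for m in range(1, 13)))
```

Dividing by 3.1 rather than 3 keeps months 3, 6, 9 and 12 in their own quarter after truncation, which is why the trick works for all twelve months.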

3. Deploying the Python script

1) First, upload the file to the server.

2) Load the Python file into the cache before using it:

ADD FILE /home/hive/zyp/python/date.py;

3) Finally, call the Python script when the Hive query runs:

SELECT TRANSFORM('n_day,20171003,3') USING 'python date.py';
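Hive's TRANSFORM clause streams each input row to the script as one line on standard input and reads result rows back from standard output. The sketch below shows the shape of such a script in isolation. The handler describe and the function run are hypothetical stand-ins and are not part of date.py.

```python
import io
import sys


def describe(fields):
    """Hypothetical handler: report the function name and argument count."""
    return '%s:%d' % (fields[0], len(fields) - 1)


def run(stream_in, stream_out):
    """Read one comma-separated request per line, write one result per line."""
    for line in stream_in:
        fields = line.strip().split(',')
        stream_out.write(describe(fields) + '\n')


# Under Hive the streams would be sys.stdin and sys.stdout. An in-memory
# input stream is used here so the sketch runs without a cluster.
demo_in = io.StringIO('n_day,20171003,3\nweek_num,2017-01-23\n')
run(demo_in, sys.stdout)
```

Keeping the loop in a function that takes its streams as arguments makes the script easy to try locally before uploading it.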

4. Notes on the logic

1) Wrapping the Python UDFs together

Originally I wrote each date function as its own script. Calling them locally in Python worked, but they did not work well after being uploaded to the server. That makes sense: Java UDFs are packaged up and have to be registered, after all.

Since the functions were already written and only needed to be called, I simply packaged them all together and select one through a parameter.

The concrete implementation:

def func(argument, date_p, offset=1, tail='begin'):
    try:
        # Every entry is evaluated when the dict is built; the requested
        # name then selects one of the results.
        switcher = {
            'week_begin': week_begin(str(date_p), week_num(str(date_p))),
            'week_end': week_end(str(date_p), week_num(str(date_p))),
            'week_num': week_num(str(date_p)),
            'month_begin': month_begin(str(date_p)),
            'month_end': month_end(str(date_p)),
            'quarter_begin': quarter_begin(str(date_p)),
            'quarter_end': quarter_end(str(date_p)),
            'n_day': n_day(str(date_p), offset),
            # n_week multiplies the offset by 7 itself, so pass it unscaled.
            'n_week': n_week(str(date_p), offset),
            'n_month': n_month(str(date_p), offset, tail),
            'n_quarter': n_quarter(str(date_p), offset, tail),
        }
        return switcher.get(argument)
    except Exception:
        return None
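The dictionary above calls every function while it is being built and then picks one result. A variation, sketched here as an assumption rather than as the author's method, stores callables instead and invokes only the one requested. The two stand-in functions month_begin_demo and n_day_demo are hypothetical simplifications that accept only 'YYYYMMDD' input, and func_lazy and DISPATCH are likewise illustrative names.

```python
import datetime


def month_begin_demo(date_p, offset, tail):
    """Simplified stand-in: first day of the month for a 'YYYYMMDD' string."""
    return date_p[0:4] + '-' + date_p[4:6] + '-01'


def n_day_demo(date_p, offset, tail):
    """Simplified stand-in: shift a 'YYYYMMDD' date by a number of days."""
    day = datetime.datetime.strptime(date_p, '%Y%m%d')
    shifted = day + datetime.timedelta(days=int(offset))
    return shifted.strftime('%Y-%m-%d')


# Map each name to a callable; nothing runs until a name is looked up.
DISPATCH = {
    'month_begin': month_begin_demo,
    'n_day': n_day_demo,
}


def func_lazy(argument, date_p, offset=1, tail='begin'):
    handler = DISPATCH.get(argument)
    if handler is None:
        return None
    try:
        return handler(str(date_p), offset, tail)
    except Exception:
        return None


print(func_lazy('n_day', '20171003', 3))
print(func_lazy('month_begin', '20171224'))
print(func_lazy('no_such_function', '20171224'))
```

Giving every handler the same three parameters keeps the call site uniform, at the cost of some handlers ignoring arguments they do not need.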

2) Padding the parameters

After packing everything together, a problem appeared: not every function needs that many parameters. Some of them take fewer, and forcing the missing parameters through causes errors.

After trying for a long time I came up with a solution, though I do not know whether it is the best one. The idea is simply to fill in the missing fields with defaults.

The code is as follows:

templist = ['1', 'begin']
for line in sys.stdin:
    day_p = line.strip().split(',')
    # Strip the hyphens from dates written as YYYY-MM-DD.
    if day_p[1][4] == '-':
        date_p = day_p[1][0:4] + day_p[1][5:7] + day_p[1][8:11]
    else:
        date_p = day_p[1]
    # Pad the missing offset and tail fields with the defaults.
    day_p.extend(templist[-int(4 - len(day_p) % 4):])
    day_list = day_p[0:4]
    print(func(str(day_list[0]), date_p, day_list[2], day_list[3]))

The block in the middle handles dates that contain hyphens.
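The padding step can be tried on its own. The sketch below isolates the slice expression from the loop; the wrapper name pad_fields and its defaults parameter are hypothetical.

```python
def pad_fields(fields, defaults=('1', 'begin')):
    """Return exactly four fields, filling missing ones from the defaults."""
    padded = list(fields)
    # With 2 fields the slice is [-2:], with 3 fields it is [-1:]. With 4
    # fields it is [-4:], which appends both defaults, and the final [0:4]
    # trims them off again.
    padded.extend(list(defaults)[-(4 - len(padded) % 4):])
    return padded[0:4]


print(pad_fields(['week_num', '20170123']))
print(pad_fields(['n_day', '20171003', '3']))
print(pad_fields(['n_month', '20171005', '5', 'end']))
```

A request with only a function name would still fail earlier in the loop, because the date is read from the second field before any padding happens.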

3) Program source code

#!/home/tops/bin/python
import calendar
import math
import sys
import datetime

# test = ['n_day,20171003,3', 'week_begin,2017-10-05', 'week_num,2017-01-23',
#         'month_begin,2017-12-24', 'month_end,2017-09-30',
#         'quarter_begin,2017-10-05,5,begin', 'quarter_end,2017-01-23,-7,begin',
#         'n_day,20171224,8,begin', 'n_week,2017-09-30,3,begin',
#         'n_month,2017-10-05,5,begin', 'n_quarter,2017-01-23,-7,begin']

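def week_end(day_p, week_num):
    try:
        # Offset forward by the days remaining in the week.
        return n_day(str(day_p), 7 - int(week_num))
    except Exception:
        return None

def week_begin(day_p, week_num):
    try:
        # Offset backward by the weekday number modulo 7.
        return n_day(str(day_p), -(int(week_num) % 7))
    except Exception:
        return None

def week_num(day_p):
    try:
        day_str = day_p[0:4] + '-' + day_p[4:6] + '-' + day_p[6:8]
        day_before = datetime.datetime.strptime(day_str, '%Y-%m-%d')
        # weekday() is 0 for Monday, so add 1 to get 1 (Monday) to 7 (Sunday).
        return day_before.weekday() + 1
    except Exception:
        return None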

def month_begin(day_p):
    try:
        return day_p[0:4] + '-' + day_p[4:6] + '-' + '01'
    except Exception:
        return None

def month_end(day_p):
    try:
        # monthrange returns (weekday of the 1st, number of days in the month).
        monthrange = calendar.monthrange(int(day_p[0:4]), int(day_p[4:6]))
        return day_p[0:4] + '-' + day_p[4:6] + '-' + str(monthrange[1])
    except Exception:
        return None

def quarter_begin(day_p):
    try:
        # int(month / 3.1) is the zero-based quarter; *3+1 is its first month.
        quarter_begin = '0' + str(int(int(day_p[4:6]) / 3.1) * 3 + 1)
        return str(day_p[0:4]) + '-' + str(quarter_begin[-2:]) + '-' + '01'
    except Exception:
        return None

def quarter_end(day_p):
    try:
        # (zero-based quarter + 1) * 3 is the last month of the quarter.
        quarter_end = '0' + str((int(int(day_p[4:6]) / 3.1) + 1) * 3)
        return month_end(str(day_p[0:4]) + str(quarter_end[-2:]) + str('01'))
    except Exception:
        return None

def n_day(day_p, offset):
    try:
        day_str = day_p[0:4] + '-' + day_p[4:6] + '-' + day_p[6:8]
        day_before = datetime.datetime.strptime(day_str, '%Y-%m-%d')
        day_after = datetime.timedelta(days=int(offset))
        n_days = day_before + day_after
        return n_days.strftime('%Y-%m-%d')
    except Exception:
        return None

def n_week(day_p, offset):
    try:
        date_p = week_begin(day_p, week_num(day_p))
        # week_begin returns YYYY-MM-DD; n_day expects YYYYMMDD.
        date_p1 = str(date_p[0:4] + date_p[5:7] + date_p[8:11])
        return n_day(str(date_p1), int(offset) * 7)
    except Exception:
        return None

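def n_month(day_p, offset, tail):
    try:
        # Floor division and modulo split the offset into years and months.
        year_m = int(day_p[0:4]) + int(offset) // 12
        month_m = int(day_p[4:6]) + int(offset) % 12
        # Carry into the next year when the month runs past December.
        if month_m > 12:
            month_m -= 12
            year_m += 1
        if tail == 'begin':
            day_m = '01'
        else:
            monthrange = calendar.monthrange(int(year_m), int(month_m))
            day_m = str(monthrange[1])
        return str(year_m) + '-' + ('0' + str(month_m))[-2:] + '-' + str(day_m)
    except Exception:
        return None

def n_quarter(day_p, offset, tail):
    try:
        date_p = quarter_begin(day_p)
        # quarter_begin returns YYYY-MM-DD; n_month expects YYYYMMDD.
        date_p1 = date_p[0:4] + date_p[5:7] + date_p[8:10]
        return n_month(date_p1, int(offset) * 3, tail)
    except Exception:
        return None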

def func(argument, date_p, offset=1, tail='begin'):
    try:
        # Every entry is evaluated when the dict is built; the requested
        # name then selects one of the results.
        switcher = {
            'week_begin': week_begin(str(date_p), week_num(str(date_p))),
            'week_end': week_end(str(date_p), week_num(str(date_p))),
            'week_num': week_num(str(date_p)),
            'month_begin': month_begin(str(date_p)),
            'month_end': month_end(str(date_p)),
            'quarter_begin': quarter_begin(str(date_p)),
            'quarter_end': quarter_end(str(date_p)),
            'n_day': n_day(str(date_p), offset),
            # n_week multiplies the offset by 7 itself, so pass it unscaled.
            'n_week': n_week(str(date_p), offset),
            'n_month': n_month(str(date_p), offset, tail),
            'n_quarter': n_quarter(str(date_p), offset, tail),
        }
        return switcher.get(argument)
    except Exception:
        return None

def main():
    try:
        templist = ['1', 'begin']
        for line in sys.stdin:
            day_p = line.strip().split(',')
            # Strip the hyphens from dates written as YYYY-MM-DD.
            if day_p[1][4] == '-':
                date_p = day_p[1][0:4] + day_p[1][5:7] + day_p[1][8:11]
            else:
                date_p = day_p[1]
            # Pad the missing offset and tail fields with the defaults.
            day_p.extend(templist[-int(4 - len(day_p) % 4):])
            day_list = day_p[0:4]
            print(func(str(day_list[0]), date_p, day_list[2], day_list[3]))
    except Exception:
        return None

if __name__ == "__main__":
    main()

'''
Zhang Yupeng
Jianshu: HTTP://WWW.JIANSHU.COM/P/5C70D4ADE0DF
cnblogs: http://www.cnblogs.com/Yuppy-Lotr/
Reprinting and reuse are welcome. Please do not change the author's name.
'''
