Luigi Study 1

Last Update: 2016-07-25 Source: Internet

Author: User

I. Introduction to Luigi

Luigi is a Python package (not a language) that helps you build complex pipelines of batch jobs and manage their workflow. These batch jobs typically include Hadoop jobs, database imports and exports, machine learning algorithms, and so on.

Luigi on GitHub: https://github.com/spotify/luigi

Existing data processing tools such as Hive, Pig, and Cascading work at a lower level of abstraction. Luigi does not replace them; it helps you manage them. A Luigi task can be a Hive query, a Hadoop job written in Java, a Spark job written in Scala, or a Python program. Luigi provides workflow management for large numbers of interdependent jobs, so programmers can focus on the jobs themselves.

There are similar projects, such as Oozie and Azkaban. One important difference is that Luigi is not limited to Hadoop jobs; it is easy to extend with other kinds of tasks.

II. The Hello World Example from Luigi's Official Documentation

2.1 Purpose of the Top Artists Example

The purpose of this example is to aggregate a set of stream records, find the top 10 artists, and save the final result to a database.
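The top 10 step itself is not shown in this article. As a minimal sketch, assuming the aggregated counts are available as a dict that maps each artist to a play count, the selection could look like the following. The names `top_artists` and `artist_count` are illustrative and not part of Luigi.

```python
import heapq


def top_artists(artist_count, n=10):
    """Return the n (artist, count) pairs with the highest counts."""
    # nlargest avoids sorting the whole dict when only n entries are needed.
    return heapq.nlargest(n, artist_count.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    sample = {"artist_%d" % i: i * 3 for i in range(25)}
    for artist, count in top_artists(sample):
        print(artist, count)
```

In a real pipeline this logic would sit in the run method of a task that requires the aggregation task.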

2.2 Aggregate Artist Streams

    from collections import defaultdict

    import luigi


    class AggregateArtists(luigi.Task):
        # Task parameter, defined at class level.
        date_interval = luigi.DateIntervalParameter()

        def output(self):
            return luigi.LocalTarget("data/artist_streams_%s.tsv" % self.date_interval)

        def requires(self):
            # One Streams task per date in the interval.
            return [Streams(date) for date in self.date_interval]

        def run(self):
            artist_count = defaultdict(int)

            for target in self.input():
                with target.open('r') as in_file:
                    for line in in_file:
                        timestamp, artist, track = line.strip().split()
                        artist_count[artist] += 1

            # Python 3 syntax; the original used iteritems() and print >>.
            with self.output().open('w') as out_file:
                for artist, count in artist_count.items():
                    print(artist, count, file=out_file)

Notes on this class:

requires method: Specifies the dependencies of this task. In this case, AggregateArtists depends on Streams tasks, and each Streams task takes a date as its parameter.

Parameters: Each task can define one or more parameters, which must be declared at class level. For example, the class above has one parameter, date_interval.

output method: Defines where the result of the task is saved.

run method: For an ordinary task, you need to implement the run method. It can contain anything: you can spawn subprocesses, perform long-running computations, and so on. Some Task subclasses do not require a run method; for example, JobTask requires you to implement the mapper and reducer methods instead.

LocalTarget: A built-in class that makes it easy to read from and write to the local disk, and ensures that those file operations are atomic.
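The atomicity mentioned for LocalTarget can be pictured as "write to a temporary file, then rename it into place". The following is a plain-Python sketch of that general pattern, not Luigi's actual code; `atomic_write` is a hypothetical helper introduced only for illustration.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write text to path so readers never see a half-written file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            tmp_file.write(text)
        # os.replace swaps the file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file; the target is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "data", "example.tsv")
        atomic_write(target, "1 2 3\n")
        with open(target) as handle:
            print(handle.read(), end="")
```

The practical benefit is that a task that crashes halfway leaves no output file behind, so it is not mistaken for a finished task.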

2.3 Streams

    import random

    import luigi


    class Streams(luigi.Task):
        date = luigi.DateParameter()

        def run(self):
            # Write 1000 fake "timestamp artist track" lines.
            with self.output().open('w') as output:
                for _ in range(1000):
                    output.write('{} {} {}\n'.format(
                        random.randint(0, 999),
                        random.randint(0, 999),
                        random.randint(0, 999)))

        def output(self):
            return luigi.LocalTarget(
                self.date.strftime('data/streams_%Y_%m_%d_faked.tsv'))

This class has no dependencies. Its only effect is to produce an output file on the local file system.
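To see how the fake lines written by Streams feed the counting loop in AggregateArtists, here is a small stand-alone sketch that uses in-memory text instead of Luigi targets. The helper names `fake_stream_lines` and `count_artists` are illustrative and do not appear in the original example.

```python
import io
import random
from collections import defaultdict


def fake_stream_lines(n, rng):
    """Produce n lines of 'timestamp artist track', each field in 0..999."""
    return ["{} {} {}\n".format(rng.randint(0, 999),
                                rng.randint(0, 999),
                                rng.randint(0, 999))
            for _ in range(n)]


def count_artists(in_file):
    """Count how many lines mention each artist (the second column)."""
    artist_count = defaultdict(int)
    for line in in_file:
        timestamp, artist, track = line.strip().split()
        artist_count[artist] += 1
    return artist_count


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    in_file = io.StringIO("".join(fake_stream_lines(1000, rng)))
    counts = count_artists(in_file)
    print(sum(counts.values()))  # one count per input line
```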

2.4 Running Locally

    PYTHONPATH='.' luigi --module top_artists AggregateArtists --local-scheduler --date-interval 2012-06

After execution, a data directory is created under the current directory. Its contents are as follows:

    (my_python_env) [data]# ls
    artist_streams_2012-06.tsv    streams_2012_06_06_faked.tsv  streams_2012_06_12_faked.tsv  streams_2012_06_18_faked.tsv  streams_2012_06_24_faked.tsv  streams_2012_06_30_faked.tsv
    streams_2012_06_01_faked.tsv  streams_2012_06_07_faked.tsv  streams_2012_06_13_faked.tsv  streams_2012_06_19_faked.tsv  streams_2012_06_25_faked.tsv
    streams_2012_06_02_faked.tsv  streams_2012_06_08_faked.tsv  streams_2012_06_14_faked.tsv  streams_2012_06_20_faked.tsv  streams_2012_06_26_faked.tsv
    streams_2012_06_03_faked.tsv  streams_2012_06_09_faked.tsv  streams_2012_06_15_faked.tsv  streams_2012_06_21_faked.tsv  streams_2012_06_27_faked.tsv
    streams_2012_06_04_faked.tsv  streams_2012_06_10_faked.tsv  streams_2012_06_16_faked.tsv  streams_2012_06_22_faked.tsv  streams_2012_06_28_faked.tsv
    streams_2012_06_05_faked.tsv  streams_2012_06_11_faked.tsv  streams_2012_06_17_faked.tsv  streams_2012_06_23_faked.tsv  streams_2012_06_29_faked.tsv

streams_*: Generated by the Streams tasks, one file per day.

artist_streams_*: Generated by AggregateArtists; there is only one such file.
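The one-file-per-day pattern in the listing follows from the strftime template used in the output method of Streams. A minimal sketch, assuming the interval covers June 2012 as the listing suggests; `days_in_month` and `stream_path` are hypothetical helpers, not Luigi functions.

```python
import calendar
import datetime


def days_in_month(year, month):
    """Return every date in the given month, in order."""
    last_day = calendar.monthrange(year, month)[1]
    return [datetime.date(year, month, day) for day in range(1, last_day + 1)]


def stream_path(date):
    """Build the per-day file name from the date, as Streams.output does."""
    return date.strftime("data/streams_%Y_%m_%d_faked.tsv")


if __name__ == "__main__":
    for date in days_in_month(2012, 6):
        print(stream_path(date))
```

Note that the template uses lowercase %m and %d for month and day; uppercase %M would insert minutes instead.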

2.5 Extensions

Running the same command again does nothing, because the outputs of all tasks already exist. This reflects the fact that Luigi tasks are meant to be idempotent: the output of a job should be the same no matter how many times it is executed.
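The skip-if-output-exists behaviour can be illustrated without Luigi. The sketch below is a simplified model under stated assumptions: `SimpleTask` and `build` are hypothetical names, and Luigi's real scheduler does considerably more than this.

```python
import os
import tempfile


class SimpleTask:
    """A toy task: one output file plus a list of required tasks."""

    def __init__(self, path, requires=()):
        self.path = path
        self.requires = list(requires)

    def complete(self):
        # A task counts as done when its output file already exists.
        return os.path.exists(self.path)

    def run(self):
        with open(self.path, "w") as out_file:
            out_file.write("done\n")


def build(task, executed):
    """Run missing dependencies first, then the task itself if needed."""
    if task.complete():
        return
    for dependency in task.requires:
        build(dependency, executed)
    task.run()
    executed.append(task.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        streams = [SimpleTask(os.path.join(workdir, "s%d.tsv" % i))
                   for i in range(3)]
        aggregate = SimpleTask(os.path.join(workdir, "agg.tsv"), streams)

        first, second = [], []
        build(aggregate, first)
        build(aggregate, second)
        print(len(first), len(second))  # prints "4 0"
```

The first call runs the three dependencies and then the aggregate task; the second call finds the aggregate output in place and runs nothing.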

The --local-scheduler flag tells Luigi not to connect to a central scheduler server. This mode is not recommended for production use; it is intended for testing.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
