The first parameter, iter, is an iterator over the key-value pairs generated by the map instances. Each word is consistently assigned to a reduce instance (by partitioning on the key), so all occurrences of the same word are processed by the same reduce instance, which ensures the final totals are correct. The second parameter, params, plays the same role as in the map function. Here we simply use disco.util.kvgroup() to group the counts for each word, sum them, and yield the result.
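To make the grouping step concrete, here is a minimal sketch of what kvgroup does with a sorted iterator of (key, value) pairs. This is an illustration built on itertools.groupby, not Disco's actual implementation of disco.util.kvgroup:

```python
from itertools import groupby
from operator import itemgetter

def kvgroup(kviter):
    # Yield (key, value-iterator) pairs from an iterator of (key, value)
    # pairs that is already sorted by key; each value-iterator must be
    # consumed before advancing to the next key.
    for key, group in groupby(kviter, key=itemgetter(0)):
        yield key, (value for _, value in group)

pairs = sorted([("dog", 1), ("cat", 1), ("dog", 1), ("cat", 1), ("cat", 1)])
counts = {word: sum(values) for word, values in kvgroup(pairs)}
print(counts)  # -> {'cat': 3, 'dog': 2}
```

This is why the reduce function sorts iter before calling kvgroup: grouping only works if equal keys are adjacent.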
Run Job
The code below starts the job. A job can be customized with many parameters, but for simple tasks typically only three are used. Besides starting the job, we also need to output its results. First we wait for the job to complete by calling wait(), which blocks until the job finishes and then returns the results. For convenience, wait() and other related methods are called through the job object.
The result_iterator() function takes the list of result file addresses returned by wait() and iterates over the key-value pairs in all of the results.
from disco.core import Job, result_iterator

def map(line, params):
    for word in line.split():
        yield word, 1

def reduce(iter, params):
    from disco.util import kvgroup
    for word, counts in kvgroup(sorted(iter)):
        yield word, sum(counts)

if __name__ == '__main__':
    job = Job().run(input=["http://discoproject.org/media/text/chekhov.txt"],
                    map=map,
                    reduce=reduce)
    for word, count in result_iterator(job.wait(show=True)):
        print(word, count)
If everything is set up properly, you can watch the job execute: it reads its input and finally prints the word counts. While the job is running, you can open the Disco master's web interface (at the master's port) to follow the job's progress in real time.
python count_words.py
You can also follow the job's events in the console by setting the DISCO_EVENTS environment variable:
DISCO_EVENTS=1 python count_words.py
As you can see, creating a new Disco job is fairly straightforward, and you can extend this simple example in many ways. For example, you could filter out a list of stop words by passing it through the params object.
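As a sketch of that idea, the map function below skips stop words carried in params. The Params class here is a plain stand-in for disco.core.Params, and the stop_words attribute name is my own choice, not part of Disco's API:

```python
class Params:
    # Stand-in for disco.core.Params: a simple attribute bag that Disco
    # passes unchanged to every map and reduce call.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def map(line, params):
    # Skip any word found in the (hypothetical) stop_words set.
    for word in line.split():
        if word.lower() not in params.stop_words:
            yield word, 1

params = Params(stop_words={"the", "a", "and"})
result = list(map("the cat and the dog", params))
print(result)  # -> [('cat', 1), ('dog', 1)]
```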
If you have stored the data in the Disco Distributed Filesystem (DDFS), you can try changing the input to tag://data:bigtxt and adding map_reader=disco.worker.task_io.chain_reader.
You can try using sum_combiner() to make the job more efficient.
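A combiner pre-aggregates counts inside each map task, so far less data crosses the network to the reduces. The sketch below shows the idea; the (key, value, buffer, done, params) signature is my assumption based on Disco's classic worker interface, and this is not Disco's own sum_combiner implementation:

```python
def sum_combiner(key, value, buf, done, params):
    # Accumulate partial sums per key inside a map task; when done is
    # True, flush the buffer as the task's output instead of emitting
    # one (word, 1) pair per occurrence.
    if done:
        return buf.items()
    buf[key] = buf.get(key, 0) + value

buf = {}
for k, v in [("dog", 1), ("cat", 1), ("dog", 1)]:
    sum_combiner(k, v, buf, False, None)
combined = dict(sum_combiner(None, None, buf, True, None))
print(combined)  # -> {'dog': 2, 'cat': 1}
```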
You can also try writing custom partition and reader functions, which are defined in the same way as the map and reduce functions. You can then chain jobs together, so that the output of one job becomes the input of the next.
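A minimal sketch of a custom partition function follows. The (key, nr_partitions, params) signature is my assumption based on Disco's classic worker interface; the key point is that hashing the key routes every occurrence of a word to the same reduce instance:

```python
def partition(key, nr_partitions, params):
    # Route a key to one of nr_partitions reduce instances. Hashing the
    # key guarantees that equal keys always land in the same partition,
    # which is what makes per-word totals correct.
    return hash(str(key)) % nr_partitions

p = partition("word", 4, None)
assert 0 <= p < 4
# Within one process, the same key always maps to the same partition.
same = partition("word", 4, None) == partition("word", 4, None)
```

Note that Python 3 randomizes str hashes per process, so in a real distributed setting you would use a stable hash (e.g. from hashlib) rather than the built-in hash().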
Disco is designed to be as simple as possible so that you can focus on your own problems, not the framework.
The road to mathematics-distributed computing-disco (4)