CATV is the abbreviation of "community antenna television", that is, shared-antenna TV.
The CATV network is an efficient and inexpensive comprehensive network. It offers wide bandwidth, large capacity, multiple functions, low cost, and strong resistance to interference, supports a variety of services, and reaches thousands of households. Its de
One: Background
In the MapReduce model, the reduce function mostly performs statistical aggregation such as totals, maximums, and minimums. For these operations, consider running a combiner on the map output, which reduces the network transfer load and lightens the burden on the reduce tasks. The combiner runs on each node and affects only the output of the local map,
Brief Introduction
The role of the combiner is to merge the multiple key-value pairs generated by a map into a new one, which then serves as the input of reduce. A combine function sits between the map function and the reduce function and shrinks the intermediate result of the map output, which reduces both the volume of map output data and the network transmission load. It is not possible to use Combiner,
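As a rough illustration of the effect described above, the following sketch simulates a word count in plain Python rather than with the Hadoop API. The function names and the two input splits are assumptions made for this example; it only compares how many key-value pairs would cross the network with and without a combine step.

```python
from collections import defaultdict

def map_words(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum the counts per key within one map task's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def reduce_counts(pairs):
    """Reduce: sum the counts per key across all map tasks."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, one per map task.
splits = ["a b a a", "b b a"]
map_outputs = [map_words(s) for s in splits]

# Pairs sent to the reducer without and with the combine step.
shuffled_plain = [p for out in map_outputs for p in out]
shuffled_combined = [p for out in map_outputs for p in combine(out)]

print(len(shuffled_plain), len(shuffled_combined))
print(reduce_counts(shuffled_plain) == reduce_counts(shuffled_combined))
```

The reducer sees fewer pairs when the combine step runs, yet the final counts are identical, because summing is safe to apply in stages.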
Article Directory
Declare a Combiner Function
Many MapReduce programs are limited by the available bandwidth on the cluster, so it pays to minimize the intermediate data that must be transmitted between map and reduce tasks. Hadoop allows you to declare a combiner function that processes the map output, and the combiner's output then serves as the reduce input. Because
What Is a Combiner Function
“Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to minimize the data transferred between map and reduce tasks. Hadoop allows the user to specify a combiner function to be run on the map output; the combiner function’s output forms the input to the reduce function. Since the
The CATV network is an efficient and inexpensive comprehensive network. It offers wide bandwidth, large capacity, multiple functions, low cost, and strong resistance to interference, supports a variety of services, and reaches millions of households. Its development has laid a foundation for the information superhighway. In recent years, as the CATV network has developed in China, the country has built more than
1. Combiner
The combiner is an optimization for MapReduce. Each map can generate a large amount of local output, and the combiner merges the map-side output first, which reduces the amount of data transferred between the map and reduce nodes and improves network I/O performance. A combiner can be set only if the operation satisfies the associative law. The role of
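To see why the operation must be safe to apply in stages, consider the mean. The sketch below is plain Python, not Hadoop code, and the node data and helper names are assumptions for illustration. Averaging per-node averages gives a wrong answer, while carrying (sum, count) pairs through the combine step gives the right one.

```python
def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def to_partial(value):
    """Represent one value as a (sum, count) pair."""
    return (value, 1)

def combine_partials(partials):
    """Merge (sum, count) pairs; this operation can be applied in stages."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return (total, count)

def finish(partial):
    """Turn the final (sum, count) pair into a mean."""
    total, count = partial
    return total / count

# Hypothetical values held by two map nodes.
node_a = [1, 2, 3]
node_b = [10]

true_mean = mean(node_a + node_b)
naive_mean = mean([mean(node_a), mean(node_b)])  # mean of means: incorrect

partials = [combine_partials([to_partial(v) for v in node])
            for node in (node_a, node_b)]
combined_mean = finish(combine_partials(partials))

print(true_mean, naive_mean, combined_mean)
```

The design choice here is to change what is transmitted (a sum and a count instead of a mean) so that the merge step becomes a simple addition.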
As we all know, the Hadoop framework uses Mapper to process data into a
In the above process, we can see at least two performance bottlenecks:
If we have 1 billion data records, the mapper will generate 1 billion key-value pairs for transmission across the network. But if we only need the maximum value of the data, the mapper obviously needs to output only the largest value it has seen. This not only reduces the load on the network but also greatly improves program efficiency.
This defini
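The maximum-value case can be sketched in a few lines of plain Python. The three splits below are made-up data standing in for the records handled by three map tasks, and local_max is a hypothetical helper playing the role of the combiner.

```python
def local_max(records):
    """Combiner role: keep only the largest value seen by one map task."""
    return max(records)

# Hypothetical records, one inner list per map task.
splits = [[3, 17, 8], [42, 5], [11, 29, 2]]

# Values sent over the network without and with the local maximum step.
sent_without = [value for split in splits for value in split]
sent_with = [local_max(split) for split in splits]

print(len(sent_without), len(sent_with))
print(max(sent_without), max(sent_with))
```

Each map task forwards a single value instead of all of its records, and the reducer still arrives at the same global maximum.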
Partitioner programming: data that shares some common characteristics is written to the same file. Sorting and grouping: when sorting in the map and reduce phases, the comparison is on K2; V2 does not take part in the sort comparison. If you want V2 to be sorted as well, you need to assemble K2 and V2 into a new class that serves as K2, so that it takes part in the comparison. To customize the sort order, the sorted object implements the WritableComparable interface and defines the ordering in its compareTo method.
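The composite-key idea can be illustrated outside Hadoop. The sketch below uses a Python dataclass in place of a WritableComparable implementation; the CompositeKey class and the sample records are assumptions for this example. Ordering is defined field by field, so K2 is compared first and V2 breaks ties.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class CompositeKey:
    """New key built from K2 and V2; compared by k2 first, then v2."""
    k2: str
    v2: int

# Hypothetical (K2, V2) records in arbitrary order.
records = [("b", 2), ("a", 9), ("b", 1), ("a", 3)]

# Sorting on the composite key also orders the values within each K2.
keys = sorted(CompositeKey(k, v) for k, v in records)
ordered = [(c.k2, c.v2) for c in keys]

print(ordered)
```

In Hadoop the same ordering would live in the key class's compareTo method; here the dataclass generates the comparison from the field order.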
1. Concept
2. References
Improve the MapReduce job Efficiency Note II of Hadoop (use combiner as much as possible): Http://sishuo (k). com/forum/blogpost/list/5829.html
Hadoop Learning notes -8. combiner and custom Combiner: http://www.tuicool.com/articles/qazujav
Hadoop in-depth learning: combiner: http://blog.csdn.net/cnbird2008/article/details/23788233 (mean Scene)
0Hadoop using
Many MapReduce jobs are limited by the bandwidth available on the cluster, so the most important thing is to minimize the data transferred between the map tasks and the reduce tasks. Hadoop allows users to specify a merge function for the output of the map task, which we also call the combiner and which works like a mapper or reducer. The output of the merge function forms the input to the reduce function. Because the merge function is an o
remains the same and the cluster scale doubles, the computing time is halved.
This chapter is arranged as follows:
Section 3.1 describes the importance of local aggregation and covers the combiner in detail. Local aggregation merges mapper output to reduce the amount of data that must be transmitted over the network. This section also describes the "in-mapper combining" design pattern.
Section 3.2 The example of constructing the word co-orrcuranc
3.1 Local Aggregation
In a data-intensive distributed processing environment, an important aspect of synchronization is the exchange of intermediate results, from the processes that generate them to the processes that ultimately consume them. In a cluster environment, except for embarrassingly parallel problems, data must be transmitted over the network. In addition, in Hadoop, intermediate results are first written to local disk and then sent over the network. Because network and disk factors ar
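The "in-mapper combining" design pattern mentioned in this chapter outline can be sketched as follows. This is plain Python, not the Hadoop Mapper API; the class name and its map and close methods are assumptions chosen to mirror a mapper that buffers counts in memory and emits them once at the end of the task.

```python
from collections import defaultdict

class InMapperCombiningMapper:
    """Word-count mapper that aggregates locally before emitting anything."""

    def __init__(self):
        # State kept across all records handled by this map task.
        self.counts = defaultdict(int)

    def map(self, line):
        """Update the in-memory counts instead of emitting (word, 1) pairs."""
        for word in line.split():
            self.counts[word] += 1

    def close(self):
        """Emit one (word, total) pair per distinct word at task end."""
        return sorted(self.counts.items())

# Hypothetical input lines handled by a single map task.
mapper = InMapperCombiningMapper()
for line in ["a b a", "b c"]:
    mapper.map(line)
emitted = mapper.close()

print(emitted)
```

The trade-off of this pattern is memory: the mapper must hold its partial results until the task finishes, so a real implementation would need a policy for flushing the buffer when it grows too large.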
comment.
It is also worth mentioning Snappy, an open-source compression algorithm developed by Google, which Cloudera officially and strongly recommends for use in MapReduce. Its characteristics: at a compression ratio similar to LZO, compression and decompression performance is greatly improved, but Snappy files are not splittable as MapReduce input.
Extended content:
Cloudera official blog introduction to Snappy:
http://blog.cloudera.
Therefore, we need a custom partitioner to choose the reducer for each record according to our own requirements. Writing a custom partitioner is simple: define a class that extends the Partitioner class and overrides its getPartition method, then register it by calling the job's setPartitionerClass. The results of the map are distributed to the reducers via the partitioner. The mapper results may be sent to the combiner for merging,
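The decision a partitioner makes can be sketched in plain Python. Neither function below is the Hadoop Partitioner API: hash_partition imitates a default hash-based assignment using zlib.crc32, and first_letter_partition is a hypothetical custom rule, both introduced only for illustration. Each returns the index of the reducer that should receive a key.

```python
import zlib

def hash_partition(key, num_reducers):
    """Default-style rule: spread keys across reducers by a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def first_letter_partition(key, num_reducers):
    """Custom rule (hypothetical): route keys by their first letter."""
    return (ord(key[0].lower()) - ord("a")) % num_reducers

# Hypothetical map output keys and a job with three reducers.
keys = ["apple", "banana", "Cherry", "date"]
num_reducers = 3

buckets = {i: [] for i in range(num_reducers)}
for key in keys:
    buckets[first_letter_partition(key, num_reducers)].append(key)

print(buckets)
```

Whatever the rule, it must be deterministic and return a value in the range 0 to num_reducers - 1, so that all records with the same key reach the same reducer.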
Additional MapReduce functions
Figure 4.6: the MapReduce data flow with a combiner inserted.
Combiner: the pipeline shown above omits a step that can optimize the bandwidth used by MapReduce jobs. This step is called the combiner, and it runs after the mapper and before the reducer. The combiner is optional. If this process is suitabl
the optical network unit (ONU) OnAccessH1001. The transmission medium is single-mode optical fiber, and you can select either single fiber or dual fiber. One OnAccess5xxx optical switch can connect to the OnAccessH1001 optical network units of up to 24 users. The OnAccessH1001 can connect up to four computers or IP phones at the same time. If you have only one computer, you can instead insert the OnAccess1001 optical fiber NIC into the computer and connect directly to the optical fiber. The
communication and CATV access cable resources in the residential area belong to all operators. Another feature of China's FTTH target market is the existence of industry barriers in the telecom business: telecom operators are not allowed to operate CATV services, and this situation will not change for quite some time.
2. Selection of FTTH Optical Fiber Access Technology in China
1) Passive O
-OLT), which is placed in the residential community's equipment room, the low-cost OnAccess5xxx series optical switch, and the optical network unit (ONU) OnAccessH1001, which is placed on the user side. The transmission medium is single-mode fiber; either single fiber or dual fiber can be used. One OnAccess5xxx optical switch can connect the OnAccessH1001 optical network units of up to 24 users, and each OnAccessH1001 can connect up to 4 computers or IP phones. If the user has only one computer, they can choose OnA