Duff's device

Source: Wikipedia

In computer science, Duff's device is an optimized implementation of serial copying that applies loop unrolling, a technique common in assembly-language programming, to improve execution efficiency. It is generally believed to have been invented in November 1983 by Tom Duff, then working at Lucasfilm, and it may be the most ingenious use of the features of the C switch statement to date.

Contents
  • 1 Optimization Background
  • 2 Original Code
  • 3 Implementation Mechanism
  • 4 Performance
  • 5 Stroustrup's Version
  • 6 Notes
  • 7 References
  • 8 Further Reading
  • 9 External Links
Optimization Background

In programming, loop unrolling reduces the total number of branches by processing data in batches. When copying data serially, if the total number of items is not divisible by the step of the unrolled loop, the usual approach is to jump directly into the middle of the unrolled loop so that the remaining items are copied as well.
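
In C, the assembly-language idiom described above can be written almost literally with goto: choose the entry point into the unrolled body once, jump there, and then loop over full blocks. The sketch below is an illustration rather than Duff's code; it assumes a memory-to-memory copy, an unrolling factor of 4 (to keep it short), a positive count, and a hypothetical function name, copy_goto.

```c
#include <assert.h>

/* Hypothetical example: assembly-style loop unrolling written with goto.
   The body is unrolled 4 times; the first pass enters partway through
   the body so that it copies only count % 4 items (or all 4 when the
   remainder is 0). Every later pass copies 4 items. Requires count > 0. */
void copy_goto(short *to, const short *from, int count)
{
    assert(count > 0);
    int passes = (count + 3) / 4;   /* passes through the unrolled body */

    switch (count % 4) {            /* pick the entry point once */
    case 1: goto copy1;
    case 2: goto copy2;
    case 3: goto copy3;
    default: break;                 /* remainder 0: start at the top */
    }

loop:
    *to++ = *from++;
copy3:
    *to++ = *from++;
copy2:
    *to++ = *from++;
copy1:
    *to++ = *from++;
    if (--passes > 0)
        goto loop;
}
```

For count = 7, the first pass enters at copy3 and copies 3 items, and the single remaining pass copies the other 4.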

Following this idea of loop unrolling, Tom Duff wrote an optimized C implementation for serial copying that copies up to 8 values per iteration, successfully improving the efficiency of serial copying.

Original Code

To copy the elements of an array into a memory-mapped output register, the straightforward approach is as follows [Note 1]:

    do {                          /* count > 0 assumed */
        *to = *from++;            /* "to" is not incremented */
    } while (--count > 0);

Duff realized that by interleaving a switch statement with the loop in this process, the loop can be unrolled as follows [Note 2]:

    send(to, from, count)
    register short *to, *from;
    register count;
    {
        register n = (count + 7) / 8;     /* passes through the loop; count > 0 assumed */
        switch (count % 8) {              /* enter the loop to copy the remainder first */
        case 0: do { *to = *from++;
        case 7:      *to = *from++;
        case 6:      *to = *from++;
        case 5:      *to = *from++;
        case 4:      *to = *from++;
        case 3:      *to = *from++;
        case 2:      *to = *from++;
        case 1:      *to = *from++;
                } while (--n > 0);
        }
    }

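The original is written in pre-ANSI (K&R) C with implicit int, which current compilers either reject or accept only with warnings. A minimal sketch of the same function in ANSI C follows. The prototype, the name duff_send (chosen to avoid clashing with the POSIX socket function send), and the use of an ordinary volatile variable to stand in for the memory-mapped register are assumptions for illustration; a real register lives at a hardware-specific address.

```c
#include <assert.h>

/* ANSI C restatement of Duff's send(). "to" points at a single output
   register, so it is never incremented; "from" walks the source array.
   Like the original, it requires count > 0. */
void duff_send(volatile short *to, const short *from, int count)
{
    assert(count > 0);
    int n = (count + 7) / 8;      /* number of passes through the loop */
    switch (count % 8) {          /* jump into the loop body for the remainder */
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
```

Because *to is never advanced, every store lands on the same location; the volatile qualifier keeps the compiler from collapsing those stores into a single write.
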
Implementation Mechanism

Duff's device implements in C a technique common in assembly-language programming: minimizing the number of tests and branches during a copy. Although the code looks as if it should not be legal, it is valid C for two reasons:

First, C's specification of the switch statement is relatively loose. When Duff's device was invented, the C language was defined by the first edition of The C Programming Language, according to which a case label may appear as a prefix of any sub-statement within the body of a switch. In addition, once the switch jumps to the matching label, execution continues from there; if no break statement is encountered, control passes over any remaining case labels and runs to the end of the switch body. This is the "fall-through" behavior of the switch statement. With these features, the code copies count items from consecutive addresses into the memory-mapped output register.
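
Fall-through can be seen in isolation with a small hypothetical function (the name count_bodies is an assumption for illustration): control enters at the matching case label and, because there is no break, runs through every later case body to the end of the switch.

```c
#include <assert.h>

/* Hypothetical example of switch fall-through. Control jumps to the
   label matching k and then executes every following case body,
   because none of them ends with break. */
int count_bodies(int k)
{
    assert(k >= 0 && k <= 3);   /* labels 1..3 exist; 0 matches none */
    int executed = 0;
    switch (k) {
    case 3: executed++;         /* falls through to case 2 */
    case 2: executed++;         /* falls through to case 1 */
    case 1: executed++;
    }
    return executed;
}
```

Entering at case 3 runs all three bodies, entering at case 1 runs only the last, which is exactly how Duff's device copies count % 8 items on its first pass.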

Second, C allows a jump into the middle of a loop, so the switch/case statement here can transfer control into the body of the do-while loop.
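
The second point can likewise be isolated. In the hypothetical function below (its name and parameters are assumptions for illustration), a case label sits inside a do-while body, so the switch can enter the loop partway through its first iteration; later iterations run the whole body normally.

```c
#include <assert.h>

/* Hypothetical example: a switch that jumps into the middle of a loop.
   With skip_first == 1, the first iteration starts at "case 1" and
   skips the increment of *a; all later iterations run the full body. */
void enter_loop(int skip_first, int iterations, int *a, int *b)
{
    assert(skip_first == 0 || skip_first == 1);
    assert(iterations > 0);
    *a = 0;
    *b = 0;
    switch (skip_first) {
    case 0: do { (*a)++;
    case 1:      (*b)++;
            } while (--iterations > 0);
    }
}
```

With three iterations, entering at case 0 increments both counters three times, while entering at case 1 leaves *a at 2 because the first pass skipped its statement.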

Many C compilers, following assembly-language practice, translate switch statements into jump tables, which improves execution efficiency. The default fall-through behavior of switch has long been controversial in C, and Duff remarked of his code: "This code forms some sort of argument in that debate, but I'm not sure whether it's for or against."

Performance

In terms of speed, loop unrolling reduces the number of branches executed, and with it the considerable overhead of handling branches, such as stalling and flushing the pipeline, so this code runs faster than a simple, direct loop. It is also easy to see that without the switch statement the code could copy only 8*n items: if count is not divisible by 8, count % 8 items (the remainder of count divided by 8) would be left unprocessed. The embedded switch/case statement handles those remaining items.
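
The bookkeeping can be checked directly. With the factor of 8 used above, the loop body runs n = (count + 7) / 8 times; the pass entered through the switch copies count % 8 items (a full 8 when the remainder is 0), and every later pass copies 8. The hypothetical helper below only restates that arithmetic so it can be tested for any positive count.

```c
#include <assert.h>

/* Hypothetical helper mirroring the bookkeeping of Duff's device with an
   unrolling factor of 8: n passes, the first one partial (entered via
   the switch), the rest full. Returns the total number of items copied. */
int items_copied(int count)
{
    assert(count > 0);
    int n = (count + 7) / 8;                         /* passes */
    int first = (count % 8 == 0) ? 8 : count % 8;    /* items in first pass */
    return first + 8 * (n - 1);                      /* later passes copy 8 */
}
```

For count = 20, n = 3 and the first pass copies 4 items, so 4 + 8 * 2 = 20.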

However, Duff's device has its limitations. Handling the leftover items with a switch/case statement is not always the best choice; a two-loop scheme (an unrolled loop that copies 8*n items, followed by a second loop that copies the remainder) may be faster. This is often because the compiler cannot optimize Duff's device well, but it can also stem from differences in the pipelining and branch-prediction mechanisms of some architectures [1]. In addition, tests showed a significant performance improvement when all instances of Duff's device were removed from the XFree86 Server 4.0 code [2]. Therefore, if you intend to use Duff's device, it is best to benchmark it on the hardware architecture, optimization level, and compiler you actually use before deciding.
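
A minimal sketch of the two-loop alternative mentioned above follows. It assumes a memory-to-memory copy and uses the hypothetical name copy_two_loops; unlike Duff's device, it also handles count == 0 without special care.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the two-loop alternative (function name is hypothetical):
   an unrolled loop copies 8 items per iteration, then a plain loop
   copies the count % 8 leftovers. count == 0 copies nothing. */
void copy_two_loops(short *to, const short *from, size_t count)
{
    assert(count == 0 || (to != NULL && from != NULL));

    for (size_t i = 0; i < count / 8; i++) {   /* unrolled main loop */
        to[0] = from[0]; to[1] = from[1];
        to[2] = from[2]; to[3] = from[3];
        to[4] = from[4]; to[5] = from[5];
        to[6] = from[6]; to[7] = from[7];
        to += 8;
        from += 8;
    }
    for (size_t i = 0; i < count % 8; i++)     /* leftover items */
        *to++ = *from++;
}
```

Whether this beats Duff's device depends on the compiler and target, which is why benchmarking both is the safer course.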

Stroustrup's Version

The original Duff's device only covers copying data to a (memory-mapped) register. To copy data serially between memory locations, the destination pointer to must also be incremented each time it is dereferenced, as follows:

    *to++ = *from++;

This version of the code comes from an exercise titled "What does this code do?" in Bjarne Stroustrup's book The C++ Programming Language. He probably modified the code this way because novice programmers are generally unfamiliar with memory-mapped output registers. Note that for serial copying, the standard C library provides the memcpy function, which is no less efficient than the Stroustrup version of Duff's device and may contain architecture-specific optimizations that make it significantly faster [3][4].
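
A minimal ANSI C sketch of this memory-to-memory variant follows. It is not Stroustrup's text verbatim; the function name duff_copy and the element type short are assumptions carried over from Duff's original, and like the original it requires a positive count.

```c
#include <assert.h>

/* Memory-to-memory variant of Duff's device: both pointers advance.
   Function name is hypothetical; count must be > 0. */
void duff_copy(short *to, const short *from, int count)
{
    assert(count > 0);
    int n = (count + 7) / 8;      /* number of passes through the loop */
    switch (count % 8) {          /* first pass copies count % 8 items (8 if 0) */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

In production code, memcpy(dst, src, count * sizeof *src) from <string.h> does the same job and, as noted above, is usually at least as fast.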

Notes
    1. ^ This code copies into a single register rather than from memory to memory, so it does not need *to++.
    2. ^ The unrolling factor in this code can be increased or decreased as needed; it need not be 8.
References

Part or all of this article is derived from the Free On-line Dictionary of Computing (FOLDOC) entry "Duff's device", used under the terms of the GFDL.

    1. ^ Ralston, James. USENIX 2003 Journal.
    2. ^ Ts'o, Ted. Re: [PATCH] Re: Move of input drivers, some word needed from. Linux Kernel Mailing List archive.
    3. ^ Wall, Mike. Using Block Prefetch for Optimized Memory Performance. mit.edu. 2002-03-19 [2012-09-22].
    4. ^ Fog, Agner. Optimizing subroutines in assembly language. Copenhagen University College of Engineering. ff. 2012-02-29 [2012-09-22].
Further Reading
    • Bjarne Stroustrup, The C++ Programming Language, Third Edition. Addison-Wesley, ISBN 0-201-88954-4
    • Brian Kernighan and Dennis Ritchie, The C Programming Language.
External Links
    • Tom Duff's explanation of the mechanism of Duff's device (English)
    • Duff's device, c-faq.com (English)
    • A Reusable Duff Device, Dr. Dobb's Journal (English)
    • Tom Duff's original discussion on Usenet (English)
