I. Background:
I have recently been doing DCM-related programming. My previous project used C++, so it relied on the open source DCMTK library; my current team works mostly in C#, so I need to switch to the fo-dicom library. While debugging for my previous post in this column, "DICOM medical image processing: using fo-dicom to send a C-FIND query for the worklist", I had to save the DIMSE messages manually, and I happened to notice that DCMTK and fo-dicom save DCM files in different ways. I decided to study both and compare them to see which is better.
II. Source code analysis of the file-saving functions in DCMTK and fo-dicom:
1) saveFile of DcmFileFormat in DCMTK
saveFile divides the write state of a file into four: ERW_init, ERW_ready, ERW_inWork, and ERW_notInitialized. Each state is handled differently, so DCMTK can be thought of as using a "state machine" to save files. This is somewhat like a self-parsing program I once wrote in C++, which also set different states according to the current context and jumped to different operations accordingly. State machines are common in digital circuits and in compiler theory (http://blog.chinaunix.net/uid-14880649-id-3011358.html), but the file-writing code in DCMTK is a particularly intuitive use of one (http://blog.csdn.net/xgbing/article/details/2784127).
A state machine can be loosely described as a set of specific states, with transitions between them driven by input characters, and no other behavior. The state machine in DCMTK may not quite fit this original definition, because its transitions are not determined by the current input but by how far the write of the DCM file has progressed (which could perhaps be regarded as an event). The four ERW_xxx states are set accordingly, and the different phases carry out format checking, write preparation, writing, and write completion.
There are two ways to code a state machine. One is "vertical writing", which organizes the code by state and handles the events that can occur in each state; the other organizes the code by event and handles the state changes each event can cause (see http://blog.csdn.net/tomsen00/article/details/4932789 for details). DCMTK's file-saving function uses "vertical writing", because the states follow a fixed sequence rather than jumping in arbitrary order: the preamble must be written first, then the file meta information (MetaInfo), and finally the actual data body (dataset). In addition, because a DCM file is self-contained and its MetaInfo and dataset parts are themselves nested structures (see my earlier post http://blog.csdn.net/zssureqh/article/details/9275271), each part of the DCM file uses its own "state machine" to control how it is written, so that the whole write proceeds smoothly.
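The "vertical writing" style described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not DCMTK's code: `Phase`, `ToyFileWriter`, `step`, and `writeAll` are hypothetical names, and a vector of strings stands in for the output stream.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical phases of a write; these are NOT DCMTK's ERW_xxx values.
enum class Phase { Init, Preamble, MetaInfo, Dataset, Done };

// Toy writer organized "vertically": the code is grouped by state, and
// each state decides what to do and which state follows it.
struct ToyFileWriter {
    Phase phase = Phase::Init;
    std::vector<std::string> out;  // stands in for the output stream

    // Perform one step; returns false once the write is complete.
    bool step() {
        switch (phase) {
        case Phase::Init:      // checks would go here before any output
            phase = Phase::Preamble;
            return true;
        case Phase::Preamble:
            out.push_back("preamble");
            phase = Phase::MetaInfo;
            return true;
        case Phase::MetaInfo:
            out.push_back("metainfo");
            phase = Phase::Dataset;
            return true;
        case Phase::Dataset:
            out.push_back("dataset");
            phase = Phase::Done;
            return true;
        case Phase::Done:
            return false;
        }
        return false;
    }

    // Drive the machine until it reaches Done.
    void writeAll() {
        while (step()) {
        }
    }
};
```

Calling `writeAll()` on a fresh `ToyFileWriter` leaves the three parts in `out` in the fixed order preamble, metainfo, dataset, which is the ordering guarantee the state sequence is there to provide.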
2) Save of DicomFile in fo-dicom
In the DicomFileWriter class, the synchronous Write function simply issues the asynchronous call directly, as the following code shows:
public void Write(IByteTarget target, DicomFileMetaInformation fileMetaInfo, DicomDataset dataset) {
    EndWrite(BeginWrite(target, fileMetaInfo, dataset, null, null));
}
This is APM, one of the earlier asynchronous programming models in C#. The three parts of the DCM file (the file header, the file meta information, and the file data body) are written through asynchronous callbacks. The outcome is the same as with the "state machine" in DCMTK: each stage is carried out in a controlled order. So what is the difference between the two? fo-dicom is the more recent of the two libraries; does that mean asynchronous callbacks are better than the state machine approach?
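The shape of that Begin/End pairing can be imitated in C++ with `std::async` and `std::future`. The sketch below is an analogy only, not a port of fo-dicom: `PendingWrite`, `beginWrite`, `endWrite`, and `writeSync` are hypothetical names, the future plays the role of the IAsyncResult, and a vector of strings stands in for the byte target.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for an IAsyncResult: a handle to the pending write.
using PendingWrite = std::future<std::size_t>;

// "Begin" half: start the write on another thread and return at once.
// onDone plays the role of the callback and may be empty (like null).
PendingWrite beginWrite(std::vector<std::string>& target,
                        const std::string& metaInfo,
                        const std::string& dataset,
                        std::function<void()> onDone = nullptr) {
    return std::async(std::launch::async,
                      [&target, metaInfo, dataset, onDone]() -> std::size_t {
                          // The parts are still written in a fixed order.
                          target.push_back("preamble");
                          target.push_back(metaInfo);
                          target.push_back(dataset);
                          if (onDone) onDone();
                          return target.size();
                      });
}

// "End" half: block until the write finishes and return its result.
// Any exception thrown by the worker is rethrown here.
std::size_t endWrite(PendingWrite pending) {
    return pending.get();
}

// Synchronous wrapper with the same shape as the Write function above.
std::size_t writeSync(std::vector<std::string>& target,
                      const std::string& metaInfo,
                      const std::string& dataset) {
    return endWrite(beginWrite(target, metaInfo, dataset));
}
```

From the caller's point of view `writeSync` behaves like an ordinary blocking call; the only visible difference is that the work itself runs on a second thread, which is what a process monitor would show.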
III. Measuring file saving in DCMTK and fo-dicom in practice:
Since I have not yet found a good explanation in principle, or have not fully understood the essential difference between the two approaches, we can instead observe the difference directly from how the two behave at run time.
1) Process Explorer
I built a simple C++ console program and a simple C# console program that call DCMTK's saveFile and fo-dicom's Save respectively. The code is as follows:
// dcmtk-save-test.cpp: Defines the entry point of the console application.
#include "stdafx.h"
#include "dcmtk/config/osconfig.h"
#include "dcmtk/dcmdata/dctk.h"
#include "dcmtk/dcmdata/dcpxitem.h"
#include "dcmtk/dcmjpeg/djdecode.h"
#include "dcmtk/dcmjpeg/djencode.h"
#include "dcmtk/dcmjpeg/djcodece.h"
#include "dcmtk/dcmjpeg/djrplol.h"

int _tmain(int argc, _TCHAR* argv[])
{
    Sleep(15000);  // wait 15 seconds before loading and saving
    DcmFileFormat mdcm;
    mdcm.loadFile("D:\\DCMDATA\\TEST2.DCM");
    mdcm.saveFile("D:\\DCMDATA\\DCMTK-TEST2.DCM");
    return 0;
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Dicom;
using Dicom.Network;
using Dicom.Log;
using System.Threading;

namespace FindSCU1
{
    class Program
    {
        static void Main(string[] args)
        {
            Thread.Sleep(1000);
            DicomFile mdcm = DicomFile.Open(@"D:\DCMDATA\TEST2.DCM");
            mdcm.Save(@"D:\DCMDATA\FO-TEST2.DCM");
            Console.Read();
        }
    }
}
I used Process Explorer to observe how the two libraries actually behave when saving a file.
The first figure shows the performance of the saveFile function of the open source DCMTK library.
The second figure shows the performance of fo-dicom's Save function, including the creation, suspension, and termination of multiple threads.
It is clear that fo-dicom's Save function writes the DCM file using multithreaded asynchronous programming. The performance data suggests that the multithreaded fo-dicom version gains somewhat in time, at the cost of higher consumption of other resources. (This analysis is not rigorous, of course: the runtime environment underneath C# is far more complex than that of C++, so the difference cannot simply be attributed to fo-dicom's use of APM.)
2) An aside on vshost.exe:
The first time I used Process Explorer to watch the two test programs, the multiple threads of the fo-dicom version showed up as expected (as in the previous figure). During the second debugging session, however, I could not find my own project in Process Explorer at all.
It turned out that the vshost.exe debugging host had been started and my program was running inside it. Right-click the project, open its properties, select the "Debug" tab, and clear the "Enable the Visual Studio hosting process" option. After recompiling and starting the program, the FindSCU1.exe process appears in Process Explorer as expected.
IV. Asynchronous programming patterns in C#:
To be honest, I am new to C#. The many delegates and event bindings inside it, and the complicated control flow they produce, still leave me without a clue; above all, I do not yet have a clear understanding of the internal mechanisms and their performance impact. Learning to use delegates, events, and the various asynchronous programming patterns is not difficult in itself. What is harder, and more important, is understanding why Microsoft keeps adding so many new features to C#. There must be a reason behind each one, and only by understanding that reason can the feature be put to good use. Below is a brief introduction to several asynchronous programming approaches in C#. Because I have not really worked out the nature of each pattern or the performance differences between them, I do not analyze them in detail; interested readers can look at the questions I raise in the next section, which may offer some inspiration.
To quote someone else: a so-called pattern is really just a method. Like a design pattern, it is a customary way of solving a similar or specific class of problems, distilled from engineering practice. The common asynchronous patterns include the following:
1) APM pattern: BeginXxx/EndXxx, IAsyncResult
This is the pattern used by fo-dicom's Save function as described above. It is an early asynchronous model in C# and is straightforward to understand.
2) EAP pattern: Event-based Asynchronous Pattern, common in Windows Forms; MethodNameAsync, events
This pattern derives mainly from the "event" concept introduced in C#. Unlike a real-life event, which happens at a particular time and place, a C# event is in effect a re-encapsulation of messages in the Windows operating system (the blog post http://blog.csdn.net/fan158/article/details/6178392 analyzes C# events in detail). A message can be understood as an abstraction of something that actually happens while the operating system is running, and it can be passed between applications. By wrapping messages as events, C# bundles a message together with its handler function, which is more convenient for developers. The nature of the mechanism is unchanged, though: when an event occurs, it triggers the handler bound to it (a mechanism that is also common at the lowest levels of a computer system, as in an interrupt vector table).
3) TAP pattern: Task-based Asynchronous Pattern; MethodNameAsync, Task, Task<TResult>
4) The asynchronous programming pattern introduced by async and await in C# 5.0
I have had little contact with the last two patterns, but in essence the principle is the same: all of them rely on the thread pool (or, put another way, multithreading) and on events (messages) to complete asynchronous operations. I will study them further later.
Finally, a few words on my own understanding of asynchronous programming. Why did asynchronous programming appear? Why do we need it? There is only one reason: efficiency, meaning the efficient use of computer resources. Suppose you go to a restaurant. The person who takes your order is certainly not the person who cooks (extreme cases aside). Why separate the two jobs? Because they demand completely different amounts of time and energy. Taking an order is simple and fast, and what matters is understanding the customer's request accurately; cooking is complex and slow, and what matters is making the food taste good. If the cook also had to take orders, there would be only two options. In the first, a customer has to wait until the current dishes are finished before ordering, which wastes the customer's time and spoils the experience. In the second, the cook steps out in the middle of cooking to take the order, which breaks the rhythm of cooking and ruins the food. Either way, the restaurant is not far from bankruptcy. The same situation arises constantly in computing. Users perform all kinds of operations with different requirements: some take a long time to run (garbage collection, defragmentation), and some need a quick response (the UI, for example). So how have the experts in the field solved this?
The experts at Intel, for instance, work on improving the performance of the machine itself, hoping to provide enough resources to meet any demand. This is like a restaurant hiring more chefs: once the number of chefs always exceeds the number of customers, the problem solves itself (though that is not how the real world works). True parallel programming and parallel computing grew out of this line of thinking. The experts at Microsoft, by contrast, work on new usage patterns, hoping to use limited resources more efficiently and more sensibly. This is like first having the chef divide up the time: a short period devoted to taking orders, then a longer period for cooking. As long as the time is divided well, most customers will be satisfied. A further step is to pay the small cost of hiring a waiter whose only job is to take orders, so that the chef can stay in the kitchen and cook in peace. The many asynchronous programming patterns mentioned above come from this second line of thinking. The completion port is very much like this restaurant. After a completion port is created, the system starts some threads in advance. Their number is related to the number of CPU cores, and the basic rule is: number of threads = 2 * number of CPU cores + 1 (this is like having several chefs). A message queue is also assigned to the completion port. The system then uses a lightweight thread to accept user requests quickly and add them to the message queue (this is like the waiter taking orders). Whenever the message queue is not empty, threads from the thread pool take messages out and process them (this is like the chefs working through the order slips in chronological order).
This efficiently meets the needs of most users. Of course, the two cases above are only two extremes among the many situations found in the real world, where there are more ways to improve efficiency: chefs can specialize by type of dish, waiters can seat and serve customers according to table capacity, and so on.
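The waiter-and-chefs arrangement can be sketched with a small worker pool in C++. To be clear about the assumptions: this is a plain thread pool fed by a queue, not a real Windows I/O completion port; `ToyThreadPool`, `post`, `shutdown`, and `suggestedWorkerCount` are hypothetical names, and the last one merely applies the 2 * cores + 1 rule quoted above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Applies the rule quoted above: threads = 2 * CPU cores + 1.
std::size_t suggestedWorkerCount() {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;  // the value may be unknown on some systems
    return 2 * static_cast<std::size_t>(cores) + 1;
}

class ToyThreadPool {
public:
    explicit ToyThreadPool(std::size_t workerCount) {
        for (std::size_t i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~ToyThreadPool() { shutdown(); }

    // The "waiter": cheap and fast, it only puts the order on the queue.
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(job));
        }
        wake_.notify_one();
    }

    // Let the workers finish everything already queued, then join them.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopping_) return;  // already shut down
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

private:
    // The "chefs": each worker takes jobs from the queue in arrival order.
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A caller would construct the pool with `suggestedWorkerCount()` workers, hand it jobs through `post`, and call `shutdown` (or let the destructor do so) once no more jobs will arrive. Jobs that touch shared data still need their own locking.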
However, as the numbers of chefs and waiters grow rapidly, cooperation among them becomes more and more important. This corresponds to the concept of synchronization in asynchronous programming: threads that run asynchronously must coordinate with one another, especially when they access shared resources or when the core steps of a task must happen in order. An example is fo-dicom's Save function, which writes the preamble, MetaInfo, and dataset of a DCM file in sequence. The various asynchronous programming patterns in C#, such as APM, EAP, Task, and async + await, exist to make developers more productive by letting them apply these proven, efficient ways of handling work quickly and easily.
V. Reflections
I moved from C++ to C# only recently, and I still tend to relate everything back to C++: whether the completion port in C++ resembles the thread pool in C#, and whether C++ file streams and I/O streams differ greatly from their C# counterparts. These questions need further analysis.
1) Thread pool and completion port
"CLR via C#" (third edition) points out that the thread pool inside the CLR is implemented using completion port technology. I have previously analyzed the good performance of completion ports in the asynchronous transfer of DCM files (see the blog posts "Completion port study notes (1): file copying with a completion port and a console program" and "Completion port study notes (2): simulating the implementation mechanism of the completion port").
The completion port is a programming approach that is strongly recommended in Windows programming. By combining a thread pool with a message queue, it handles both compute-intensive and I/O-intensive operations efficiently. The two posts above dissect the completion port in detail; the restaurant example in the previous section gives a simpler view.
2) Asynchronous programming and state machines
Here are a few good blog posts about asynchronous programming and state machines:
http://msdn.microsoft.com/zh-cn/magazine/hh456403.aspx
http://blog.chinaunix.net/uid-14880649-id-3011358.html
http://blog.csdn.net/tomsen00/article/details/4932789
The async and await pattern introduced in C# 5.0 is implemented by the compiler, which uses a state machine to carry out the asynchronous operation. As mentioned above, a so-called asynchronous programming pattern is just a distillation of experience, an idiom. To put it somewhat more boldly, C#'s progression from APM to today's async and await is really a matter of summarizing experience and improving on earlier patterns. Of course, it takes support built into the operating system and into the C# language itself to make this work and to make developers' work easier.
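The idea of turning a method with await points into a state machine can be shown by hand in C++. The sketch is deliberately simplified and rests on several assumptions: `SaveStateMachine`, `moveNext`, `pendingContinuations`, and `runPending` are hypothetical names, everything runs on one thread, and a queue of continuations stands in for whatever would really resume the method. It is not what the C# compiler emits, only the same general shape.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Toy single-threaded scheduler: "awaiting" just queues a continuation.
std::queue<std::function<void()>> pendingContinuations;

// Hand-written equivalent of a method with two await points:
//   write meta info; await; write dataset; await; completed.
struct SaveStateMachine {
    int state = 0;                  // which await point to resume from
    std::vector<std::string>& log;  // records what has happened so far

    explicit SaveStateMachine(std::vector<std::string>& l) : log(l) {}

    // Each call runs the method up to its next await point, then returns.
    void moveNext() {
        switch (state) {
        case 0:
            log.push_back("write meta info");
            state = 1;
            pendingContinuations.push([this] { moveNext(); });  // suspend
            return;
        case 1:
            log.push_back("write dataset");
            state = 2;
            pendingContinuations.push([this] { moveNext(); });  // suspend
            return;
        case 2:
            log.push_back("completed");
            state = -1;  // finished; further calls do nothing
            return;
        default:
            return;
        }
    }
};

// Run queued continuations until none remain.
void runPending() {
    while (!pendingContinuations.empty()) {
        std::function<void()> next = std::move(pendingContinuations.front());
        pendingContinuations.pop();
        next();
    }
}
```

The first call to `moveNext()` returns after one step, just as an async method returns to its caller at the first await; the remaining steps run only when `runPending()` later resumes the stored continuations. The state machine object must outlive those continuations, since they hold a pointer to it.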
To summarize what I have learned so far: DCMTK's single-threaded saveFile function uses a state machine, which makes the program flow clear and concise and eases later maintenance. fo-dicom uses the APM asynchronous programming pattern and gains some efficiency by drawing on the thread pool. In addition, it splits the DCM file into separate but related parts so that exceptions can be detected and caught in each, which also helps the program run safely. One could say that fo-dicom uses the APM pattern to simulate a synchronous write of the DCM file, taking advantage of the asynchronous capabilities of the C# FileStream class. The FileStream class is covered in detail in section 27.8.8 of "CLR via C#".
VI. References:
The analysis above did not really find the answer I was looking for, so I will keep reading the relevant books and materials and try to find it as soon as possible. The main books I referred to while writing this post are listed below. I recommend reading the English edition first and then the Chinese edition for a more thorough understanding, since a translation is not first-hand material.
"CLR via C #" (Chinese third edition)
"C # 5.0 in a Nutshell" (English version)
Upcoming posts in this column:
1) The problem of setting the overall AE Title for multiple network segments in DICOM
2) Introduction to the MPPS service in DICOM
3) Asynchronous programming patterns in C#
Date: 2014-09-22
DICOM medical image processing: "synchronous vs. asynchronous" and "single-threaded vs. multithreaded", the different design patterns DCMTK and fo-dicom use to save files