Just now I went online and found another beautiful article with all the images. That's great...
What COM theory must you master to do COM programming in VC?
I have seen many people study COM this way: after reading a book, they feel they understand the principles of COM quite well, yet they still don't know how to write a program. I went through this stage myself. To learn the basic principles of COM, I recommend the book "Inside COM". But reading such a book alone is far from enough. Our ultimate goal is to learn how to write programs with COM, not to study the mechanisms of COM itself. So I personally think there is no need to spend a lot of time digging into the principles of COM; it is thankless work. In fact, we only need to master a few key concepts. Below I list the key concepts that I think are necessary for COM programming in VC. (The discussion here covers COM programming in C++ only.)
(1) A COM component is really a C++ class, and an interface is a pure virtual class. The component derives from the interface. We can describe COM in plain C++ syntax:
    class IObject
    {
    public:
        virtual Function1(...) = 0;
        virtual Function2(...) = 0;
        ....
    };

    class MyObject : public IObject
    {
    public:
        virtual Function1(...) {...}
        virtual Function2(...) {...}
        ....
    };
Is that clear? IObject is what we usually call an interface, and MyObject is the so-called COM component. Remember that every interface is a pure virtual class: all the functions it contains are pure virtual, and it has no member variables. A COM component is a class derived from these pure virtual classes that implements their virtual functions. As the code above shows, COM is built on C++, and in particular on virtual functions and polymorphism. Every function in COM is a virtual function and must be called through the virtual function table (vtable). This is extremely important and must be kept in mind at all times. To show exactly what a virtual function table looks like, here is an example copied from "Inside COM+":
[Figure: layout of the virtual function table, from "Inside COM+"]
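As a rough illustration of point (1), the pseudocode can be filled in so that it compiles. This is only a sketch in plain, portable C++: the parameter and return types, the function bodies, and the helper names CallThroughInterface and SelfCheck are assumptions made for the example, and none of the real COM machinery (IUnknown, HRESULT, GUIDs) appears yet.

```cpp
#include <cassert>
#include <string>

// IObject plays the role of a COM interface: pure virtual functions only, no data members.
class IObject {
public:
    virtual int Function1(int x) = 0;
    virtual std::string Function2() = 0;
    // Plain C++ convenience; a real COM interface has no destructor and relies on Release().
    virtual ~IObject() = default;
};

// MyObject plays the role of the component: it derives from the interface and implements it.
class MyObject : public IObject {
public:
    int Function1(int x) override { return x * 2; }
    std::string Function2() override { return "MyObject::Function2"; }
};

// The client code knows only the interface type. Each call is resolved at run time
// through the object's virtual function table.
int CallThroughInterface(IObject& object, int x) {
    return object.Function1(x);
}

// Quick self-check: a call made through the base-class pointer lands in MyObject's code.
void SelfCheck() {
    MyObject component;
    IObject* pObject = &component;
    assert(pObject->Function1(21) == 42);
}
```

The point of the sketch is that CallThroughInterface never mentions MyObject; everything it needs comes from the interface and the vtable behind it.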
(2) COM components have three basic interface classes: IUnknown, IClassFactory, and IDispatch.
The COM specification requires every component and every interface to derive from IUnknown. IUnknown contains three functions: QueryInterface, AddRef, and Release. These three functions are extremely important, and their order must not be changed. QueryInterface queries for other interfaces the component implements; put bluntly, it checks which interface classes are among the component's base classes. AddRef increments the reference count, and Release decrements it.
Reference counting is another very important concept in COM. Roughly speaking, a COM component is a DLL that has to be loaded into memory when a client program uses it. But a component does not serve you alone; many clients may be using it at the same time, while the DLL is loaded only once, so there is only one copy of the component in memory. Who, then, releases the component? The client program? No: if you released the component, how could anyone else go on using it? So only the component itself can take on this job, and that is where the reference count comes in. The component keeps a count of how many users it currently has. Each new user adds one to the count, and each user that finishes subtracts one. When the last user releases it, the component knows that nobody is using it any more, its job is done, and it frees itself.
Reference counting is one of the most error-prone parts of COM programming. Fortunately, VC's class libraries call AddRef for us implicitly in almost every case; as far as I can remember, I have never called AddRef myself. All we have to do is call Release at the right time. There are at least two situations in which you must remember to call Release: first, after calling QueryInterface; second, after calling any function that returns an interface pointer. Check MSDN to see whether the function calls AddRef internally; if it does, calling Release is your responsibility. The implementation of IUnknown's three functions is quite standard but tedious and error-prone. Fortunately, we may never need to implement them ourselves.
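To make the counting rules concrete, here is a minimal sketch of the three functions in portable C++. It is not the real IUnknown: the names IUnknownLike, IObjectLike, InterfaceId, and g_liveObjects are invented for the example, bool stands in for HRESULT, a small enum stands in for interface GUIDs, and thread safety is ignored.

```cpp
#include <cassert>

// Hypothetical interface identifiers standing in for IIDs.
enum InterfaceId { IID_Unknown, IID_Object };

struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;  // clients call Release(), never delete
};

struct IObjectLike : IUnknownLike {
    virtual int Func() = 0;
};

static int g_liveObjects = 0;  // lets the example observe when the object frees itself

class CObject final : public IObjectLike {
public:
    CObject() : refCount_(1) { ++g_liveObjects; }  // the creator holds the first reference

    bool QueryInterface(InterfaceId iid, void** out) override {
        if (iid == IID_Unknown)     *out = static_cast<IUnknownLike*>(this);
        else if (iid == IID_Object) *out = static_cast<IObjectLike*>(this);
        else { *out = nullptr; return false; }
        AddRef();  // every pointer handed out counts as one more user
        return true;
    }
    unsigned long AddRef() override { return ++refCount_; }
    unsigned long Release() override {
        assert(refCount_ > 0);
        unsigned long remaining = --refCount_;
        if (remaining == 0) delete this;  // the last user is gone, so the object frees itself
        return remaining;
    }
    int Func() override { return 7; }

private:
    ~CObject() { --g_liveObjects; }
    unsigned long refCount_;
};

// Client side: one Release for the creation reference, one for the QueryInterface result.
int UseObject() {
    IUnknownLike* pUnk = new CObject;            // count == 1
    void* raw = nullptr;
    if (!pUnk->QueryInterface(IID_Object, &raw)) {
        pUnk->Release();
        return -1;
    }
    IObjectLike* pObject = static_cast<IObjectLike*>(raw);  // count == 2
    pUnk->Release();                             // count == 1
    int result = pObject->Func();
    pObject->Release();                          // count == 0, object deletes itself
    return result;
}
```

Note how UseObject never calls AddRef itself, yet still owes two Release calls, which matches the two situations listed above.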
IClassFactory is used to create COM components. We already know that a COM component is really a class. How do we instantiate a class? With new! It is that simple, and the same is true of COM components. But who does the new? It cannot be the client program, because the client cannot know the component's class name. If the client had to know the class name, the component's reusability would suffer badly. In fact the client knows only a 128-bit number that identifies the component, which will be introduced later. So the client cannot create the component itself. Besides, if the component lives on a remote machine, could you new it directly? The responsibility for creating components is therefore given to a separate object, the class factory. Every component must have a corresponding class factory that knows how to create it. When the client asks for an instance of a component, the request actually goes to the class factory, which creates the instance and hands the instance pointer back to the client. This is especially useful when components are created across processes or remotely, because creation is then no longer a simple new: the result has to be marshaled, and these complicated operations are left to the class factory. The most important function of IClassFactory is CreateInstance which, as the name suggests, creates a component instance. We usually do not call it directly, because API functions wrap it for us; only in special circumstances do we call it ourselves. This is also an advantage of writing COM components in VC: it gives us more opportunities for control, whereas VB gives us far too few.
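The division of labour between client and class factory might be sketched as follows. This is a portable C++ stand-in, not the Windows API: IShape, CCircle, the string "CLSID_Circle", and the functions GetClassObjectSketch and CreateInstanceSketch are all hypothetical, and std::unique_ptr replaces reference counting to keep the example short.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Interface the client programs against.
struct IShape {
    virtual std::string Name() const = 0;
    virtual ~IShape() = default;
};

// Stand-in for IClassFactory: its one job is to create instances.
struct IFactory {
    virtual std::unique_ptr<IShape> CreateInstance() const = 0;
    virtual ~IFactory() = default;
};

namespace {
// The concrete class is hidden from clients; only its factory can name it.
class CCircle final : public IShape {
public:
    std::string Name() const override { return "circle"; }
};

class CCircleFactory final : public IFactory {
public:
    std::unique_ptr<IShape> CreateInstance() const override {
        return std::unique_ptr<IShape>(new CCircle);
    }
};
}  // namespace

// Plays the part of CoGetClassObject: maps an identifier to a class factory.
std::unique_ptr<IFactory> GetClassObjectSketch(const std::string& clsid) {
    if (clsid == "CLSID_Circle") return std::unique_ptr<IFactory>(new CCircleFactory);
    return nullptr;
}

// Plays the part of CoCreateInstance: find the factory, then let it do the "new".
std::unique_ptr<IShape> CreateInstanceSketch(const std::string& clsid) {
    std::unique_ptr<IFactory> factory = GetClassObjectSketch(clsid);
    if (!factory) return nullptr;
    std::unique_ptr<IShape> instance = factory->CreateInstance();
    assert(instance != nullptr);  // a factory is expected to return a usable object
    return instance;
}
```

A client would write something like `auto shape = CreateInstanceSketch("CLSID_Circle");` and never see the name CCircle.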
IDispatch is called the dispatch interface. What is it for? The world has many languages besides C++, such as VB, VJ, VBScript, and JavaScript. You could say that without all these messy languages there would be no IDispatch. :-) We know that a COM component is a C++ class whose functions are called through the virtual function table. For VC this is no problem; it was designed for C++ in the first place. VB could not do this in the past, but now VB can use pointers and call functions through the vtable, and so can VJ. Some languages still cannot, though: the scripting languages, typically VBScript and JavaScript. The reason is that they do not support pointers, and without pointers there is no way to use polymorphism or call these virtual functions. Yet we cannot ignore the scripting languages. They are used on web pages everywhere, and distributed applications are a major market for COM components, so components have to be callable from scripts. Since the vtable approach does not work, another way had to be found, and in the hour of need IDispatch was born. :-) The dispatch interface assigns a number to every function and every property. When a client wants to call one of them, it simply passes the number to the IDispatch interface, and IDispatch calls the corresponding function based on that number. The real process is of course far more complicated. Expecting someone to know how to call a function from a number alone would be a fantasy. The caller also has to find out which parameters the function takes, what their types are, and what it returns, and handling all of this in a uniform way is a real headache.
The main function of the IDispatch interface is Invoke. The client calls Invoke, and Invoke then calls the corresponding function. If you look at the code that implements Invoke in Microsoft's class libraries, you will be amazed at how complex it is, because it has to allow for every kind of parameter type. Fortunately, we do not need to write it ourselves, and we may never have the chance. :-)
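The numbering idea can be shown with a toy dispatcher. This sketch is heavily simplified and is not the real IDispatch: the Calculator class, its numbering, and the helper CallByName are made up for the example, and a std::vector<int> stands in for the typed argument list that the real Invoke has to decode.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using DispId = int;  // the number assigned to each callable member

class Calculator {
public:
    int Add(int a, int b) { return a + b; }
    int Negate(int a) { return -a; }

    // A script knows only names, so it first asks which number a name corresponds to.
    bool GetIdOfName(const std::string& name, DispId* id) const {
        static const std::map<std::string, DispId> ids = {{"Add", 1}, {"Negate", 2}};
        std::map<std::string, DispId>::const_iterator it = ids.find(name);
        if (it == ids.end()) return false;
        *id = it->second;
        return true;
    }

    // Invoke receives the number plus generic arguments and forwards to the real function,
    // checking the argument count because the caller had no compile-time signature.
    bool Invoke(DispId id, const std::vector<int>& args, int* result) {
        assert(result != nullptr);
        switch (id) {
        case 1:
            if (args.size() != 2) return false;
            *result = Add(args[0], args[1]);
            return true;
        case 2:
            if (args.size() != 1) return false;
            *result = Negate(args[0]);
            return true;
        default:
            return false;
        }
    }
};

// What a scripting engine does for a late-bound call: name -> number -> Invoke.
bool CallByName(Calculator& object, const std::string& name,
                const std::vector<int>& args, int* result) {
    DispId id = 0;
    return object.GetIdOfName(name, &id) && object.Invoke(id, args, result);
}
```

Even in this toy version, Invoke has to validate the arguments at run time, which hints at why a production implementation that handles many parameter types becomes so complex.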
(3) dispinterface, dual interface, and custom interface
This section may seem out of place here, because these are terms from ATL programming. What I really want to discuss are the advantages and disadvantages of automation interfaces, and these three terms make that easier; you will meet them sooner or later anyway. I will explain them in plain language, which may not be entirely precise, much like describing an algorithm in pseudocode. :-)
A so-called automation interface is an interface implemented with IDispatch. We have already explained what IDispatch is for. Its advantage is that scripting languages such as VBScript and JavaScript can use the COM component too, so the component becomes essentially language-independent. It has two disadvantages. The first is that it is slow and inefficient. This is obvious: through the virtual function table a function is called in a single step, whereas a call through Invoke takes a detour, and converting the function parameters into a standard format is especially costly. So we generally prefer to call functions through the vtable for efficiency. The second disadvantage is that only the designated automation data types can be used. Without IDispatch we can use whatever data types we like, and VC generates the corresponding marshaling code for us automatically. With an automation interface this does not work, because the implementation of Invoke is written by VC in advance. It cannot foresee every type we might use, so it can only handle a set of common data types, and it also has to deal with converting data types between languages. The marshaling code VC supplies for automation interfaces therefore applies only to the data types it specifies. These types are rich enough for most purposes, but they cannot cover custom data structures. You can write your own marshaling code to handle a custom data structure, but that is no easy task. Given these shortcomings of IDispatch (it has one more: it is cumbersome to use :-)), we generally recommend writing components with dual interfaces. A dual interface is simply an interface derived from IDispatch. We know that every interface must derive from IUnknown, and IDispatch is no exception.
An interface derived from IDispatch therefore really has two base classes, IUnknown and IDispatch, so its component can be called in two ways: through IUnknown using the virtual function table, or through IDispatch::Invoke using automation dispatch. This gives great flexibility. The component can be used from C++ and from scripting languages alike, and meets the needs of both.
By contrast, a dispinterface is a pure automation interface. It can be thought of simply as the IDispatch interface (although strictly speaking it is not), and it can be called only through automation. COM component events generally use this kind of interface.
A custom interface is an interface derived directly from IUnknown. Obviously, it can be called only through the virtual function table.
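The two calling routes of a dual interface can be put side by side in a small sketch. As before this is portable C++ with invented names (IDispatchLike, ICounter, CCounter, dispatch number 1), the IUnknown part is left out, and the argument list is reduced to a vector of int.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Automation route: a single generic entry point.
struct IDispatchLike {
    virtual bool Invoke(int dispId, const std::vector<int>& args, int* result) = 0;
    virtual ~IDispatchLike() = default;
};

// A dual interface derives from the dispatch interface and adds ordinary vtable methods.
struct ICounter : IDispatchLike {
    virtual int Increment(int by) = 0;
};

class CCounter final : public ICounter {
public:
    // Fast route: a direct virtual call with typed parameters.
    int Increment(int by) override {
        value_ += by;
        return value_;
    }

    // Slow route: decode the number and the generic arguments, then forward
    // to the very same method.
    bool Invoke(int dispId, const std::vector<int>& args, int* result) override {
        assert(result != nullptr);
        if (dispId == 1 && args.size() == 1) {
            *result = Increment(args[0]);
            return true;
        }
        return false;
    }

private:
    int value_ = 0;
};
```

A C++ client would hold an `ICounter*` and call Increment directly, while a script host would hold only the dispatch part and go through Invoke; both end up in the same code. A custom interface corresponds to dropping the IDispatchLike base, and a dispinterface to exposing Invoke alone.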
(4) COM components come in three kinds: in-process, local, and remote. In the latter two cases, interface pointers and function parameters must be marshaled.
A COM component is a DLL, and it can run in three ways. It can be in-process, that is, in the same process as its caller; it can be on the same machine as the caller but in a different process; or it can be on a different machine from the caller. One fundamental point to remember is that a COM component is just a DLL and cannot run by itself. Some process has to look after it like a father, which means a COM component must always live inside a process. So who looks after it? Let me leave that question open for a moment and talk about marshaling first. Marshaling is a complicated subject, and I do not know enough to explain it in depth, so I will cover only the most basic ideas. For Win32 programs, each process has its own 4 GB virtual address space and does its own addressing. The same block of data may have different addresses in different processes, so addresses have to be translated between processes. This is the marshaling problem. For local and remote components, the DLL and the client program are in different address spaces, so passing an interface pointer to the client requires marshaling. Windows already provides ready-made marshaling functions, so we do not have to do this complicated work ourselves. For remote components, passing function parameters is another kind of marshaling. DCOM is based on RPC, and data sent across a network must follow a standard network data transfer protocol: it is packed before transmission and unpacked once it reaches the destination. That process is marshaling. It is very complicated, but Windows does it all for us. In general, we do not need to write the marshaling DLL ourselves.
We just said that a COM component must live inside a process. Components in local mode usually take the form of an EXE, so they are already a process. For a remote DLL, we must find a process to host it, and that process must contain the marshaling code needed for basic marshaling. This process is dllhost.exe, the default DLL surrogate for COM. In real distributed applications, MTS should be used as the DLL surrogate instead, because MTS is powerful and is a tool designed specifically for managing distributed DLL components.
Marshaling is at once very close to us and very far away; we rarely have to think about it while programming. This is one of COM's advantages, being platform-independent: whether a component is remote, local, or in-process, you program against it in the same way, and COM itself takes care of all the details. We do not need to dig into this subject; having the concept in mind is enough. Of course, if you have special marshaling requirements, you will need a thorough understanding of the whole process. For that I recommend "Inside COM+", which is an excellent book on marshaling.
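The pack, send, unpack cycle described in this section can be imitated inside a single program. The sketch below is only an analogy for what COM's proxy and stub code do: Buffer, PutInt32, GetInt32, ProxyAdd, StubHandleAdd, and ServerAdd are hypothetical names, the "transport" is an ordinary function call, and the byte layout is simply the host's own rather than a real wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Pack one 32-bit integer into the buffer (host byte order; a real wire format would fix it).
void PutInt32(Buffer& buffer, std::int32_t value) {
    std::uint8_t bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    buffer.insert(buffer.end(), bytes, bytes + sizeof value);
}

// Unpack one 32-bit integer and advance the read position.
std::int32_t GetInt32(const Buffer& buffer, std::size_t& pos) {
    assert(pos + sizeof(std::int32_t) <= buffer.size());
    std::int32_t value = 0;
    std::memcpy(&value, buffer.data() + pos, sizeof value);
    pos += sizeof value;
    return value;
}

// The real function, imagined to live in another process or on another machine.
std::int32_t ServerAdd(std::int32_t a, std::int32_t b) { return a + b; }

// Stub (server side): unpack the request, call the real function, pack the reply.
Buffer StubHandleAdd(const Buffer& request) {
    std::size_t pos = 0;
    std::int32_t a = GetInt32(request, pos);
    std::int32_t b = GetInt32(request, pos);
    Buffer reply;
    PutInt32(reply, ServerAdd(a, b));
    return reply;
}

// Proxy (client side): looks like the real function, but only packs and forwards.
std::int32_t ProxyAdd(std::int32_t a, std::int32_t b) {
    Buffer request;
    PutInt32(request, a);
    PutInt32(request, b);
    Buffer reply = StubHandleAdd(request);  // stands in for the cross-process or network hop
    std::size_t pos = 0;
    return GetInt32(reply, pos);
}
```

The client calls ProxyAdd exactly as it would call ServerAdd, which is the sense in which programming stays the same wherever the component runs. Only plain values cross the boundary here; passing pointers or interface pointers is where real marshaling gets hard.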
(5) The core of the COM component is IDL.
We want software to be assembled piece by piece like building blocks, but blocks cannot be fitted together without some rules. There must always be a standard to follow: how the modules cooperate and how they interact has to be laid down in advance. That specification is the interface. We know that an interface is really a pure virtual class that declares a set of pure virtual functions and waits for a component to implement them. The interface is the key that lets two completely unrelated modules be combined. Imagine that we are an application software vendor and need a certain module for our product. We have no time to develop it ourselves, so we hope to find it on the market. Perhaps an industry standard already exists for the kind of module we need: a standard interface has been defined, and many component vendors have already implemented it in their own components. What we are looking for, then, is a component that implements that interface. We do not care where the component comes from or what other features it has; we care only whether it implements our interface well. The interface may be an industry standard or a private agreement between you and a few vendors, but either way it is a standard. It is the basis on which your software and other modules fit together, and it is the communication standard for COM components.
COM is language-independent: a component can be written in any language and called from any language platform. So far we have discussed COM only in a C++ environment, so where does this language independence come from? In other words, how do we define interfaces in a language-independent way? Earlier we defined them directly as pure virtual classes, but obviously that will not do, because who besides C++ would understand them? For this reason Microsoft decided to use IDL to define interfaces. Put plainly, IDL is a language that everyone understands. It is used to define interfaces, and every language platform can read it. We can picture the ideal way of building standard components. We always start from IDL: first define the interfaces in IDL, then hand the tasks of implementing them to different people. Some are good at VC and use VC; some are good at VB and use VB. It doesn't matter. As project lead I do not care; I care only that you hand me the finished DLL. This is a very good way to develop: you can develop in any language and enjoy the results from any language.
(6) How the COM component runs.
In this part we build a minimal skeleton for creating a COM component, and then look at how it is processed internally.
    IUnknown* pUnk = NULL;
    IObject* pObject = NULL;
    CoInitialize(NULL);
    // Create the component and ask for its IUnknown pointer
    CoCreateInstance(CLSID_Object, NULL, CLSCTX_INPROC_SERVER, IID_IUnknown, (void**)&pUnk);
    // Ask the component for the interface we actually want to use
    pUnk->QueryInterface(IID_IObject, (void**)&pObject);
    pUnk->Release();
    pObject->Func();
    pObject->Release();
    CoUninitialize();
This is the typical skeleton for creating a COM component. The part that interests me is CoCreateInstance, so let us look at what it does internally. Here is pseudocode for its internal implementation:
    CoCreateInstance(....)
    {
        .......
        IClassFactory* pClassFactory = NULL;
        CoGetClassObject(CLSID_Object, CLSCTX_INPROC_SERVER, NULL, IID_IClassFactory, (void**)&pClassFactory);
        pClassFactory->CreateInstance(NULL, IID_IUnknown, (void**)&pUnk);
        pClassFactory->Release();
        ........
    }
This code first obtains the class factory object and then has the class factory create the component, which yields the IUnknown pointer. Going one step further, here is pseudocode for the internals of CoGetClassObject:
    CoGetClassObject(.....)
    {
        // Look up CLSID_Object in the registry to find the location and file name of the component DLL
        // Load the DLL
        // Use GetProcAddress(...) to get a pointer to the DllGetClassObject function in the DLL
        // Call DllGetClassObject
    }
What is DllGetClassObject for? It obtains the class factory object, since only the class factory can create the component. Here is pseudocode for DllGetClassObject:
    DllGetClassObject(...)
    {
        ......
        CFactory* pFactory = new CFactory;  // class factory object
        pFactory->QueryInterface(IID_IClassFactory, (void**)&pClassFactory);  // query for the IClassFactory pointer
        pFactory->Release();
        ......
    }
That completes CoGetClassObject. Now return to CoCreateInstance and look at pseudocode for CreateInstance:
    CFactory::CreateInstance(.....)
    {
        ...........
        CObject* pObject = new CObject;  // component object
        pObject->QueryInterface(IID_IUnknown, (void**)&pUnk);
        pObject->Release();
        ...........
    }
The following figure, copied from "Inside COM+", shows the entire CoCreateInstance process clearly.
[Figure: the complete flow of CoCreateInstance]
(7) The four functions a typical self-registering COM DLL must export
DllGetClassObject: obtains the class factory pointer.
DllRegisterServer: writes the necessary information to the registry.
DllUnregisterServer: removes the registration information.
DllCanUnloadNow: called when the system is idle to determine whether the DLL can be unloaded.
A DLL also has a function named DllMain. COM does not require it to be implemented, but the components VC generates include it automatically. It is mainly used to obtain a global instance object.
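Of the four functions, DllCanUnloadNow is probably the least obvious, so here is a sketch of the bookkeeping it typically relies on. This is an assumption about a common pattern, not code from the article: the counters, the CComponent class, and the functions LockServerSketch and CanUnloadNowSketch are invented names, the real function returns an HRESULT rather than bool, and the lock count mirrors what IClassFactory::LockServer is used for.

```cpp
#include <cassert>

// Module-wide bookkeeping (hypothetical names).
static long g_objectCount = 0;  // component instances currently alive
static long g_lockCount = 0;    // explicit "keep the server loaded" requests

// Every component instance registers itself with the module while it exists.
class CComponent {
public:
    CComponent() { ++g_objectCount; }
    ~CComponent() { --g_objectCount; }
    CComponent(const CComponent&) = delete;
    CComponent& operator=(const CComponent&) = delete;
};

// A client may pin the module in memory even while it holds no objects.
void LockServerSketch(bool lock) {
    if (lock) {
        ++g_lockCount;
    } else {
        assert(g_lockCount > 0);
        --g_lockCount;
    }
}

// Plays the part of DllCanUnloadNow: unloading is safe only when nothing is in use.
bool CanUnloadNowSketch() {
    return g_objectCount == 0 && g_lockCount == 0;
}
```

The design choice worth noting is that the DLL never unloads itself; it only answers the question, and the caller decides what to do with the answer.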
(8) The important role of the registry in COM
First you need to know the concept of a GUID. Every class, interface, and type library in COM is uniquely identified by a GUID. A GUID is a 128-bit value generated by a special algorithm that guarantees it is unique in the whole world. Creating COM components and querying for interfaces both go through the registry. Thanks to the registry, an application does not need to know the file name or location of the component's DLL; it only needs to look the component up by its CLSID. When a component is upgraded to a new version, changing the registry information is enough to redirect clients to the new DLL.
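The indirection the registry provides can be modelled with an ordinary map. The sketch below is a toy, not the Windows registry API: MockRegistry and its methods are hypothetical, and the CLSID string and DLL names used with it are made-up sample values. Register and Unregister correspond loosely to what DllRegisterServer and DllUnregisterServer record and remove.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy registry: CLSID text -> path of the DLL that implements the class.
class MockRegistry {
public:
    // Adds an entry, or overwrites it when the component is upgraded.
    void Register(const std::string& clsid, const std::string& dllPath) {
        entries_[clsid] = dllPath;
    }

    // Removes the entry; returns false if the CLSID was not registered.
    bool Unregister(const std::string& clsid) {
        return entries_.erase(clsid) > 0;
    }

    // What a client-side lookup does: the CLSID goes in, the DLL location comes out.
    bool Lookup(const std::string& clsid, std::string* dllPath) const {
        assert(dllPath != nullptr);
        std::map<std::string, std::string>::const_iterator it = entries_.find(clsid);
        if (it == entries_.end()) return false;
        *dllPath = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> entries_;
};
```

Because clients hold only the CLSID, registering the same CLSID again with a new path redirects every later lookup to the new DLL without touching any client code.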
This article is a hasty sketch written on the spur of the moment. It is far from comprehensive, and a lot of useful experience has been left out. If the mood takes me in the future, I will write more. I hope it is of some practical use to everyone, so that my evening of hard work was not wasted. :-)