Dreaming in Code: A Programmer's Confession (4)

Source: Internet
Author: User

This is an original article, not a repost.


Over the past weekend I also wrote up my solution to the problem of storing strings in the runtime. O was very dissatisfied with it: he felt it was something he had already considered, and that it did not solve all the problems he was worried about. This cooperation was, frankly, unpleasant, and I bear a large share of the responsibility for the poor communication. Over the past week, apart from my asking questions and his answering them, there was no real two-way communication. One of his concerns was memory fragmentation, which I did not care about at all. He believed that a data object must be stored in contiguous memory; in my opinion that was completely unnecessary, and I underestimated how much it mattered to him. As I saw it, this layer sits close to the IO layer, and compared with IO time the overhead of memory operations is negligible. Besides, a custom memory manager, such as an allocator, can take charge of memory allocation. The program then does not care directly where its memory comes from, and the allocator is free to exploit whatever operating system mechanisms are available for optimization. At the upper layers, ease of use also matters more than a small performance loss. And the simpler the runtime's object management is, the better the performance will be in the long run. Sadly, I had misjudged what the main problem was in the other party's mind.


At that time my attention was on three issues. First, the runtime must not depend on any physical IO interface; the set of IO operations has to be abstracted first. We cannot assume that IO goes to a file, nor that the output is XML. The reflex answer is the adaptor pattern, with composition supported among adaptors. With that, the backbone of the IO layer is complete. We can easily extend it by writing dedicated adaptors for input and output on the local file system, ZIP files, HTTP, and FTP, which is work a freshly graduated programmer can do. Using composition, we can easily add encryption and signing, and support data in text, XML, and binary formats. More importantly, the work is fully broken down, each piece demands less of the people doing it, and more people can work in parallel. It is also easier to test and to assure quality. Another reason is that ADP is only a library, and its end users (the products) have all kinds of reasons to adjust and extend it. Therefore ADP must not be welded to any specific IO adaptor; it has to offer a number of small components that users can pick from and assemble. Of course ADP can ship some preassembled defaults for convenience, but the final decision must be left to the product. In other words, extensibility, configurability (assembly), and simplicity all have to be taken care of. Later events proved that this foresight was necessary; without it, things would have been a nightmare.
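
The adaptor-plus-composition idea can be sketched roughly as follows. This is only an illustration, not ADP's actual design: the names ByteSink, MemorySink, XorSink, and saveGreeting are invented here, and the XOR step is a toy stand-in for real encryption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical abstract output interface; the runtime sees only this.
class ByteSink {
public:
    virtual ~ByteSink() = default;
    virtual void write(const char* data, std::size_t size) = 0;
};

// Leaf adaptor that collects bytes in memory. A file, ZIP, or HTTP
// adaptor would implement the same interface.
class MemorySink : public ByteSink {
public:
    void write(const char* data, std::size_t size) override {
        assert(data != nullptr || size == 0);
        if (size != 0) buffer_.append(data, size);
    }
    const std::string& buffer() const { return buffer_; }

private:
    std::string buffer_;
};

// Composed adaptor: transforms the bytes, then forwards them to the
// sink it wraps. XOR is a toy placeholder for encryption or signing.
class XorSink : public ByteSink {
public:
    XorSink(ByteSink& next, char key) : next_(next), key_(key) {}

    void write(const char* data, std::size_t size) override {
        assert(data != nullptr || size == 0);
        if (size == 0) return;
        std::string scrambled(data, size);
        for (char& c : scrambled) c = static_cast<char>(c ^ key_);
        next_.write(scrambled.data(), scrambled.size());
    }

private:
    ByteSink& next_;
    char key_;
};

// Runtime-side code depends only on the abstract interface, so the
// product decides which adaptors to assemble.
void saveGreeting(ByteSink& out) {
    const std::string text = "hello";
    out.write(text.data(), text.size());
}
```

Calling saveGreeting with a MemorySink stores the plain bytes; wrapping the same MemorySink in an XorSink stores the scrambled bytes without any change to saveGreeting. That is the sense in which the product, not the library, makes the final assembly decision.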


Of course, the IO adaptor design adds some risk of its own, but that risk is practically negligible. In C++ the possible abstraction penalty is almost nonexistent, as long as redundant data copying is avoided in design and implementation. After all, we are dealing with a slow IO system. To be fair, the adaptor pattern is clearly not suited to large-scale, heavy IO scenarios, such as handling thousands of files and network connections at once. In that case you need to be careful. ADP is obviously not that kind of project.


My second concern was how user code accesses the data held in this memory. O provided very primitive and unsafe interfaces. For example, to read an int you have to prepare an int variable and call the interface to copy the data into it. This copy is inefficient and unsafe, because the user must also tell the interface how many bytes the int variable occupies. The interface is essentially memcpy, except that it already knows where to copy from. I therefore thought we needed to give users an efficient and safe data access interface. That is not hard to achieve: if you want to access an int, the API returns a reference to that int. The same goes for strings. Of course, this requires that a string object can live inside ADP's object management, which is why my solution used a string type.
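
To make the contrast concrete, here is a minimal sketch of the two access styles side by side. Record, addInt, intAt, and copyOut are hypothetical names chosen for this illustration and are not ADP's interfaces. The copy-style call here at least checks the byte count, which a raw memcpy-like interface would not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical record that owns a small block of raw storage.
class Record {
public:
    Record() = default;
    Record(const Record&) = delete;             // stored pointers refer to bytes_
    Record& operator=(const Record&) = delete;

    // Creates an int inside the record's storage and returns its field index.
    std::size_t addInt(int value) {
        const std::size_t offset =
            (used_ + alignof(int) - 1) / alignof(int) * alignof(int);
        if (offset + sizeof(int) > sizeof(bytes_)) throw std::bad_alloc();
        ints_.push_back(new (bytes_ + offset) int(value));
        used_ = offset + sizeof(int);
        return ints_.size() - 1;
    }

    // Reference-style access: no copy and no byte count to get wrong.
    // Throws std::out_of_range for an unknown field index.
    int& intAt(std::size_t field) { return *ints_.at(field); }

    // Copy-style access, shown for contrast: the caller supplies a
    // destination buffer and must state its size correctly.
    bool copyOut(std::size_t field, void* out, std::size_t size) const {
        assert(out != nullptr);
        if (field >= ints_.size() || size != sizeof(int)) return false;
        std::memcpy(out, ints_[field], size);
        return true;
    }

private:
    alignas(std::max_align_t) unsigned char bytes_[64];
    std::size_t used_ = 0;
    std::vector<int*> ints_;
};
```

With the reference-style call, `record.intAt(field) += 1` updates the stored value in place. The copy-style call needs a local variable, a size argument, and a second call to write the value back.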


The third is the real core issue, at least in my view. Apart from the fundamental types of C++, the data types O designed included only a string type that was not yet implemented, plus a few fixed-length types such as the two-dimensional vector vecotor2 and the 2 × 2 matrix e2x2matrix. This leads to a serious problem: the data types are not extensible, and the runtime implementation cannot support variable-length data types such as arrays. This is one reason O's colleagues thought strings were very difficult. Arrays could not be supported either, and supporting arrays is harder than supporting strings. The difficulty is that, to support arrays properly, the element type must be allowed to vary. That means an array by itself is not a concrete type; only an array with a specified element type is. It also means a simple type composition mechanism is required. On the other hand, users actually work with many data types, such as points and vectors. Points alone come as point2f, point2d, point3f, point3d, or even point4d, where the extra dimension is time. Another example is a light: its data members can be completely different across products and across scenarios within one product. If ADP cannot provide a mechanism for defining data types that match the ones in the product, the product has to make do with the basic data types. In practice that makes it impossible for the product to model directly on ADP's runtime data; instead it treats ADP as a destination or source for serialization and translates the data into its own existing data model again. This conclusion is inevitable, and it is what actually happened. No rational programmer would build a data model directly on the ADP runtime.


In this way, unifying the company's data model, one of ADP's major objectives, was bound to fail. However, that does not mean ADP would succeed merely by supporting custom types and type composition. The mechanism must be very efficient in both space and time: on a par with C++, or at least not far off. It must also have robust, easy-to-use APIs before the products will accept it. Bear in mind that the products are what make money for the company, so they hold a very strong position. Fortunately, these requirements can be met. Unfortunately, ADP met none of them.


I later looked at how two other library projects handled type extension and composition. Both took a crude approach, for example hard-coding support for arrays of int, double, float, and string. You cannot extend them yourself or compose new types; the only way is to modify the library code to support the new type. And because modifying the library is inconvenient, you end up modifying the product instead.


I do not know what O made of these three issues afterwards, and I guess he had not thought that far at the time. So although O had many objections to my solution, I was not particularly worried; I thought I could make things clear to him. Later events showed that I was naive again.


Earlier, in the DwF project, profiling had raised the efficiency of the string type, and some serious implementation problems had been pointed out along with some interface design problems. The performance findings were taken seriously; the interface problems were ignored entirely. This made me doubt the level and taste of my American colleagues in this area, taste in the sense that Linus talks about. Given the performance of the DwF string, shouldn't we avoid those problems and implement a higher quality string? I also wrote to G saying that I planned to implement a string better suited to ADP, since O's solution had left it unfinished. G's answer first surprised me and then left me bewildered. G and colleagues decided that we should not use a string data type at all. Because the string type performs poorly, we should use char *. Oh, and because localization must be supported, we should use wchar_t * across the board! I could not follow the logic from the reason to the decision. What kind of thinking is this? As for how I had designed the string, G was naturally not interested.


This made me doubt myself deeply, because it was not the first time. Why do I always end up with a worse result when I try to do better? Where did I go wrong? What would have been right? Am I really unsuited to teamwork in a subordinate position? A series of unexpected results in a short period, from ut and DBC to this string, made me doubt my ability to communicate with others for the first time. I had already lowered my expectations to the point of not expecting real communication with other people, yet reality told me that I had failed even so. I did not know why. I began to feel a strong loss of confidence and to doubt my own abilities.


This string story is not over yet. wchar_t * is obviously hard to use, yet it stuck around in ADP for a year or two. In the end nobody could stand the performance problems, which were in fact an inevitable consequence. To avoid scanning and copying strings over and over, reference counting was needed. I believe it was M, a colleague in the United States, who wrote something called stringpointer. You can tell from the name that it would have no decent API, and indeed it did not. ADP used stringpointer for a long time. Later, in order to integrate ADP into another project called protein, yet more string types were produced. The result this time was even more unbearable, but it was destined to live surprisingly long. I will leave that for a later discussion.


Back to the third issue. Type composition and user-defined types are certainly possible: C++, the language our project uses, is one ready-made example, and JSON is another. C++ can define arrays of any type, which is even more explicit in std::vector, where you simply change the template parameter. Doing the same at runtime still takes a little skill, of course. The struct of C is the model for custom data types. If ADP could do these two things, it would have the same type modeling capabilities as C, although that is still some way from abstract data types (ADT).


Two or three months later, I spent a weekend building a prototype to find out whether the idea was feasible. For basic data types, we only need to care about two attributes: object size and alignment. I used a table to describe a custom composite data type, much as a struct does. From it I could compute the size of each member and its offset within the object, and finally the size and alignment requirement of the composite type itself. These are the same attributes that matter for basic types, so composite types can be handled just like basic ones. Complex basic types such as string also need construction and destruction, which is required for interoperating with C++ objects; other languages can ignore this issue. Constructors and destructors for composite types can be generated automatically by traversing the members in the appropriate order and constructing or destroying each one.
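
The layout calculation can be sketched as below. It is a reconstruction of the general technique under the usual "natural alignment" rule, not the prototype's code; TypeInfo, MemberInfo, StructInfo, Field, and describeStruct are names invented for this example, and construction and destruction are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical runtime descriptor: what the runtime needs to know about
// any type, basic or composite.
struct TypeInfo {
    std::string name;
    std::size_t size;
    std::size_t alignment;
};

struct MemberInfo {
    std::string name;
    TypeInfo type;
    std::size_t offset;   // computed by describeStruct
};

struct StructInfo {
    TypeInfo type;                     // size and alignment of the whole struct
    std::vector<MemberInfo> members;   // members with their computed offsets
};

// Rounds value up to the next multiple of alignment.
std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

using Field = std::pair<std::string, TypeInfo>;

// Lays out the fields in declaration order, padding each one to its own
// alignment, then pads the total size to the strictest alignment seen.
StructInfo describeStruct(const std::string& name,
                          const std::vector<Field>& fields) {
    assert(!fields.empty());           // this sketch ignores empty structs
    StructInfo result{TypeInfo{name, 0, 1}, {}};
    std::size_t offset = 0;
    for (const Field& field : fields) {
        const TypeInfo& type = field.second;
        offset = alignUp(offset, type.alignment);
        result.members.push_back(MemberInfo{field.first, type, offset});
        offset += type.size;
        if (type.alignment > result.type.alignment) {
            result.type.alignment = type.alignment;
        }
    }
    result.type.size = alignUp(offset, result.type.alignment);
    return result;
}
```

Because the result carries its own TypeInfo, a described struct can be passed as a member of another struct. That is the sense in which composite types are processed like basic ones.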


Supporting arrays is slightly more complex. I needed to implement a special Vector class whose element type is not a template parameter but descriptive data, a type, handed to the vector at construction. Internally, Vector computes the offset of a given element from the size and alignment information provided by the type. It can also use the construction method supplied by the type to initialize elements.
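
A vector of this kind might look roughly like the sketch below. The names ElementType, describe, and DynamicVector are assumptions made for illustration, the container has a fixed length, and it assumes the element constructor does not throw. The describe template is only a convenience for building a descriptor from a C++ type; a descriptor could equally come from a table.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>   // only needed by callers that store std::string elements

// Hypothetical element descriptor handed to the vector at construction.
struct ElementType {
    std::size_t size;
    std::size_t alignment;
    void (*construct)(void* where);    // default-constructs an element in place
    void (*destroy)(void* object);     // destroys an element in place
};

// Builds a descriptor from a C++ type.
template <typename T>
ElementType describe() {
    return ElementType{
        sizeof(T), alignof(T),
        [](void* where) { new (where) T(); },
        [](void* object) { static_cast<T*>(object)->~T(); }};
}

// Fixed-length vector whose element type is runtime data, not a template
// parameter.
class DynamicVector {
public:
    DynamicVector(const ElementType& type, std::size_t count)
        : type_(type),
          stride_((type.size + type.alignment - 1) / type.alignment * type.alignment),
          count_(count),
          data_(static_cast<unsigned char*>(::operator new(stride_ * count_))) {
        // Plain operator new only guarantees fundamental alignment.
        assert(type.alignment <= alignof(std::max_align_t));
        for (std::size_t i = 0; i < count_; ++i) type_.construct(data_ + i * stride_);
    }

    ~DynamicVector() {
        // Destroy in reverse order of construction.
        for (std::size_t i = count_; i > 0; --i) type_.destroy(data_ + (i - 1) * stride_);
        ::operator delete(data_);
    }

    DynamicVector(const DynamicVector&) = delete;
    DynamicVector& operator=(const DynamicVector&) = delete;

    std::size_t size() const { return count_; }

    // Address of element index, computed from the descriptor's size and alignment.
    void* at(std::size_t index) {
        assert(index < count_);
        return data_ + index * stride_;
    }

private:
    ElementType type_;
    std::size_t stride_;
    std::size_t count_;
    unsigned char* data_;
};
```

A caller who knows the element type can write `*static_cast<int*>(v.at(1)) = 7` on a vector built from `describe<int>()`. The same class also holds std::string elements, because construction and destruction go through the descriptor.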


In this way, a custom data type can have exactly the same memory layout as the corresponding struct in C++ (as long as you do not put anything virtual in the C++ struct). Because the layouts match, users can cast the data to their own preferred data objects and operate on them directly. A direct cast is efficient but dangerous, of course, so we can also provide a set of checked cast mechanisms as the regular way to do it.
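
One possible shape for such a checked cast is sketched here. It assumes each runtime object carries a pointer to a type tag and that product code registers a matching tag for each of its structs; TypeTag, RuntimeObject, TagOf, and checkedCast are hypothetical names, and the two point structs merely borrow names mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tag describing a runtime type.
struct TypeTag {
    const char* name;
    std::size_t size;
    std::size_t alignment;
};

// Hypothetical handle to an object managed by the runtime.
struct RuntimeObject {
    const TypeTag* type;
    void* data;
};

// Product-side structs (no virtual members, so the layout is plain data).
struct Point2f { float x, y; };
struct Point3d { double x, y, z; };

// The product registers a tag for each struct it wants to cast to.
template <typename T> struct TagOf;

template <> struct TagOf<Point2f> {
    static const TypeTag& get() {
        static const TypeTag tag{"point2f", sizeof(Point2f), alignof(Point2f)};
        return tag;
    }
};

template <> struct TagOf<Point3d> {
    static const TypeTag& get() {
        static const TypeTag tag{"point3d", sizeof(Point3d), alignof(Point3d)};
        return tag;
    }
};

// Returns a typed pointer only when the object's tag matches the tag
// registered for T; otherwise returns nullptr instead of a bad pointer.
template <typename T>
T* checkedCast(const RuntimeObject& object) {
    assert(object.type != nullptr);
    const TypeTag& wanted = TagOf<T>::get();
    const bool matches = std::strcmp(object.type->name, wanted.name) == 0 &&
                         object.type->size == wanted.size &&
                         object.type->alignment == wanted.alignment;
    return matches ? static_cast<T*>(object.data) : nullptr;
}
```

The check costs a string comparison and two integer comparisons per cast. Code on a hot path could still use the unchecked cast once the type has been verified.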


But I made a small mistake while doing this. For simplicity, the descriptive table for composite data types in that prototype was a data structure rather than text, which avoided writing a parser. However, a description given as a data structure can still contain errors and contradictions. Detecting them is easy. But how should I report them and locate them as precisely as possible? I did not want to, and should not, simply return a failure when parsing a faulty text in the future, without telling the user where the error is. My users are not webmasters of small websites, and I cannot play the Ministry of Truth, can I? Oh, sorry, translation team, I am not really taking a shot at you: you do not tell the user what is wrong either, you just reject the thing outright.

 

I had confused the essential difference between input as a data structure and input as text. As a result I spent a lot of time trying to implement a parser that was exception-safe and could report errors precisely. Either the result was ugly or it could not meet all the goals. Eventually I realized where my understanding had gone wrong. This is probably the price one pays for writing code alone: nobody points out the obviously stupid thing you are doing. I will go into this stupidity in more detail later. The solution did get completed, of course, and that is the positive lesson: never give up or compromise easily.
