On software engineering
Yesterday a classmate asked me what I had learned about software engineering. At the time I didn't know how to answer. The words were on the tip of my tongue, but I couldn't put them together. The topic is a weighty one and hard to sum up briefly; you could talk about it all day. Perhaps my powers of expression are limited too, since I never did well at Chinese (I scored under 80 on the Chinese paper of the college entrance exam). Afterward I kept going back over my development experience, trying to distill something from it. I first heard of software engineering in my junior year, in a course of the same name, and I vaguely remember several key words: high cohesion, low coupling, scalability, feasibility, maintainability, and the software development process. Below I combine these with my own experience to describe software engineering as I see it; this is also a summary for myself.
High cohesion, low coupling
These words have stuck with me since that software engineering class, and they have shaped how I design and develop ever since. What does high cohesion mean? In a phrase: birds of a feather flock together. That is, things with the same attributes are grouped together, which makes them simpler to control and easier to manage. What does low coupling mean? If high cohesion gathers things with the same attributes into groups, low coupling reduces the relationships between those groups, so the structure stays clear and flexible to manage. That is the literal meaning, with nothing yet about software. So how do these two ideas show up in software design?
They show up at two levels, macro and micro. At the macro level, the answer in a word is modularization and layering. Modularization and layering are how high cohesion and low coupling are embodied in the macro design of software. This can be split into vertical division by business and horizontal division by architecture (also called a two-dimensional architecture; I once heard Teacher Cai talk about a four-dimensional architecture, which is more abstract). Everything belonging to the same business is pulled into one module, and the whole business line is divided into multiple layers, forming a dependency graph in which upper layers may depend only on lower layers and lower layers may never depend on upper layers. The whole system then has a structured hierarchy and clear business lines. For example, suppose I want to build an e-commerce site. The first thing to do is not to ask whether the technology is hard, but to explore the site's business direction, so as to settle its business model, then model it and analyze which domain models it contains. Leaving complex e-commerce aside, take a simple consumer-to-consumer site: its business direction is a trading platform for individuals, where a user can act as either a buyer or a seller. Once that is settled, we model it and decide which modules it divides into. This site divides easily into a product module, an order module, a user module, and so on; this module division is high cohesion in the software architecture. Then how is low coupling expressed? After the modules are divided, we must determine each module's position in the whole system, which fixes the dependencies between them and therefore the layer each one sits in. The following diagram describes the hierarchy of these three modules:
The user module depends on the order and product modules, and the order module depends on the product module. So the product module is the bottom module of the whole system and the user module is an upper module; the order module must not depend back on the user module, and the product module must not depend back on the order or user modules. This is layering in architecture design. Its benefits are that the system's structure is clear, decomposing functionality layer by layer simplifies complex business, and a module can be optimized locally without affecting the others, as long as the interfaces it currently exposes stay unchanged. So how do modules relate to each other after layering? Each lower module exposes interfaces to the modules above it. For example, the product module exposes a product query interface, so the user module can query the products a user has and the order module can show which products an order contains; the order module in turn exposes its own interfaces, so the user module can query a user's orders. How do modules communicate? In the early stages of a project these modules may all live in one project, so each module can call the others' interfaces directly. If the project later grows to the point where the modules must be separated and deployed independently, a remote communication scheme is needed, and there are many mature options: the more general WebService, or higher-performance RPC (there are many tools here, such as ICE (Internet Communications Engine), which I recently heard supports multiple languages, Alibaba's Dubbo and HSF, and Facebook's Thrift). This is low coupling in software engineering.
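A minimal Java sketch of the layering just described, assuming three hypothetical interfaces (ProductService, OrderService, UserService) and in-memory implementations with made-up data. Each module reaches only the interfaces of modules below it, so the product module knows nothing about orders or users; in this sketch the user module happens to need only the order module. If the modules were later deployed separately, the interfaces could stay the same while the implementations became remote (WebService or RPC) clients.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Bottom layer: the product module depends on nothing above it.
interface ProductService {
    String findProductName(long productId);
}

// Middle layer: the order module depends only on the product module's interface.
interface OrderService {
    List<String> findProductNamesInOrder(long orderId);
    List<Long> findOrderIdsByUser(long userId);
}

// Top layer: the user module may depend on the layers below it.
interface UserService {
    List<String> findPurchasedProductNames(long userId);
}

class InMemoryProductService implements ProductService {
    private final Map<Long, String> names = new HashMap<>();

    InMemoryProductService() {
        names.put(1L, "Keyboard");   // hypothetical sample data
        names.put(2L, "Mouse");
    }

    @Override
    public String findProductName(long productId) {
        return names.getOrDefault(productId, "unknown");
    }
}

class InMemoryOrderService implements OrderService {
    private final ProductService products;   // downward dependency only
    private final Map<Long, List<Long>> orderItems = new HashMap<>();
    private final Map<Long, List<Long>> ordersByUser = new HashMap<>();

    InMemoryOrderService(ProductService products) {
        this.products = products;
        orderItems.put(100L, List.of(1L, 2L));   // order 100 contains products 1 and 2
        ordersByUser.put(7L, List.of(100L));     // user 7 placed order 100
    }

    @Override
    public List<String> findProductNamesInOrder(long orderId) {
        List<String> result = new ArrayList<>();
        for (long productId : orderItems.getOrDefault(orderId, List.of())) {
            result.add(products.findProductName(productId));
        }
        return result;
    }

    @Override
    public List<Long> findOrderIdsByUser(long userId) {
        return ordersByUser.getOrDefault(userId, List.of());
    }
}

class DefaultUserService implements UserService {
    private final OrderService orders;   // downward dependency only

    DefaultUserService(OrderService orders) {
        this.orders = orders;
    }

    @Override
    public List<String> findPurchasedProductNames(long userId) {
        List<String> result = new ArrayList<>();
        for (long orderId : orders.findOrderIdsByUser(userId)) {
            result.addAll(orders.findProductNamesInOrder(orderId));
        }
        return result;
    }
}
```

Because no lower-layer class holds a reference to an upper-layer type, the compiler itself keeps the dependencies pointing downward.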
That is high cohesion and low coupling at the macro level of software design. Refined down to concrete code, they mean looking at your code through the same lens whenever you implement some logic; this is high cohesion and low coupling at the micro level. In development we always advocate improving code reusability, which is high cohesion, while keeping the code structure clear, which is low coupling. When I write code I have a kind of neatness compulsion: if similar logic appears more than three times in a piece of business logic, my obsessive side kicks in and I extract it, exposing parameters so that each caller can meet its specific business needs. Java provides interfaces and abstract classes, and using abstraction appropriately is another embodiment of high cohesion. We generally put common, template-like business logic in an abstract class, so its subclasses are spared that complexity and only need to implement their own business needs, while the abstract parent provides the general implementation. This is high cohesion in terms of code reuse. It also appears in other details. For example, we like to put one kind of operation in a single class or package; file operations, say, can all go in one FileUtil, which manages all of the project's file operations instead of scattering them here and there. If one day the business requires us to add something to every file operation, having all of them in FileUtil makes the change easy; otherwise you would have to hunt through file after file for the file-handling code! This too is high cohesion. So how does low coupling show? For example, we often use the MVC pattern when writing code, separating the presentation layer, the business layer, and the data storage layer. This is low coupling: each layer is clearly divided, so if I optimize the data storage layer, the business layer above barely notices. That shows your coupling is low and things aren't all mixed together.
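Putting common logic in an abstract parent is essentially the template method pattern. A minimal sketch, assuming a hypothetical AbstractBusinessTemplate whose final execute method owns the shared steps (input validation and an audit log, both chosen only for illustration), while each subclass supplies just doBusiness:

```java
import java.util.ArrayList;
import java.util.List;

// Template method: the abstract parent owns the common, fixed flow;
// subclasses only fill in the step that differs.
abstract class AbstractBusinessTemplate {
    private final List<String> auditLog = new ArrayList<>();

    // final, so subclasses cannot change the overall flow
    public final String execute(String request) {
        if (request == null || request.isBlank()) {
            auditLog.add("rejected");
            return "ERROR: empty request";
        }
        String normalized = request.trim();
        String result = doBusiness(normalized);
        auditLog.add("handled:" + normalized);
        return result;
    }

    // The only step a subclass must supply.
    protected abstract String doBusiness(String request);

    // Read-only copy of what the shared flow has recorded so far.
    public List<String> auditLog() {
        return List.copyOf(auditLog);
    }
}

// Hypothetical business handlers: each focuses only on its own logic.
class PlaceOrderHandler extends AbstractBusinessTemplate {
    @Override
    protected String doBusiness(String request) {
        return "order placed for " + request;
    }
}

class CancelOrderHandler extends AbstractBusinessTemplate {
    @Override
    protected String doBusiness(String request) {
        return "order cancelled for " + request;
    }
}
```

Adding a new business operation means writing one small subclass; changing the shared steps means editing one class, which is the same "one place to change" benefit the FileUtil example describes.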
Scalability
If software design consistently adheres to high cohesion and low coupling, its scalability should in theory be fine. Scalability here means that throughout the software's life cycle it can adapt to and meet the needs of business growth: new business can be added to the original architecture with a smooth transition, or a business can be plugged directly into the current architecture. If a system is truly plug-in based, its extensibility is certainly good. Take the e-commerce site above as an example. At first it only needed to be a simple consumer-to-consumer trading platform, but as the business grows it may need to launch a flash-sale business to attract more customers. How well this flash sale fits into the existing system depends on the architect's skill. If supporting it requires refactoring the whole project, the architecture was unreasonable; if a small adjustment to the existing architecture is enough to bring the flash sale online, the original design was sound. Of course, besides analyzing the flash-sale business process, this also requires analyzing the current system architecture and finding the best implementation by optimizing the business and analyzing the key elements of the architecture. Given the module dependencies described above, there are roughly two ways to implement this flash-sale business.
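One way to read "plug-in" in Java is an extension-point interface plus a registry in the core. The sketch below is an illustration under that assumption, not a prescribed design; BusinessPlugin, PluginRegistry, and FlashSalePlugin are hypothetical names. The flash sale is added by registering a new implementation, without editing existing modules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical extension point: the core system only knows this interface.
interface BusinessPlugin {
    String name();
    String handle(String productId);
}

// The core keeps a registry; adding a business means registering a plugin,
// not modifying existing modules.
class PluginRegistry {
    private final Map<String, BusinessPlugin> plugins = new LinkedHashMap<>();

    void register(BusinessPlugin plugin) {
        if (plugins.putIfAbsent(plugin.name(), plugin) != null) {
            throw new IllegalStateException("duplicate plugin: " + plugin.name());
        }
    }

    // Empty result if no plugin with that name has been registered.
    Optional<String> dispatch(String pluginName, String productId) {
        BusinessPlugin plugin = plugins.get(pluginName);
        return plugin == null ? Optional.empty() : Optional.of(plugin.handle(productId));
    }
}

// The new business, written as a plug-in.
class FlashSalePlugin implements BusinessPlugin {
    @Override
    public String name() {
        return "flash-sale";
    }

    @Override
    public String handle(String productId) {
        return "flash sale started for " + productId;
    }
}
```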
First, if the site's current traffic is large and a flash sale would stir up user participation, you may need to build a separate flash-sale module. Then, when the flash sale gets hot, or even if its traffic is so heavy that it brings the flash-sale business down entirely, the normal business can keep running, because you have made the whole flash-sale business independent: it simply calls the interfaces of the user, product, and order modules to execute its business logic. This module can also be optimized on its own. For example, analysis shows that a flash sale is short-lived, its data is concentrated, and updates hit a small amount of data intensively, so you can keep the data being operated on in a cache (which is how most e-commerce sites handle flash sales). This reduces database I/O pressure and raises the flash-sale module's throughput. This approach lowers the system's coupling and makes both the flash sale and the overall system more scalable, but it adds maintenance work and development effort, since after all a new module has been created from scratch.
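The cache-based stock deduction mentioned above can be sketched as follows. This is a minimal single-process illustration: a ConcurrentHashMap of AtomicInteger stands in for a shared cache such as Redis (where an atomic decrement command would play the same role), and FlashSaleStockCache is a hypothetical name. The compare-and-set loop keeps stock from going below zero even when many requests race; writing successful purchases back to the database is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical flash-sale module: stock lives in a cache, so the hot path
// never touches the database.
class FlashSaleStockCache {
    private final Map<String, AtomicInteger> stock = new ConcurrentHashMap<>();

    // Load stock into the cache before the sale opens (cache warm-up).
    void preload(String productId, int quantity) {
        stock.put(productId, new AtomicInteger(quantity));
    }

    // Atomically takes one unit; returns false when sold out or unknown.
    boolean tryDeduct(String productId) {
        AtomicInteger counter = stock.get(productId);
        if (counter == null) {
            return false;
        }
        while (true) {
            int current = counter.get();
            if (current <= 0) {
                return false;                                // sold out
            }
            if (counter.compareAndSet(current, current - 1)) {
                return true;                                 // we won this unit
            }
            // another request changed the count first; re-read and retry
        }
    }

    int remaining(String productId) {
        AtomicInteger counter = stock.get(productId);
        return counter == null ? 0 : counter.get();
    }
}
```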
Second, if the site's current traffic is not very large and you just want a flash sale to attract more users, the flash-sale business can be embedded in an existing module, with no need for a new one. A project I worked on took this approach. The e-commerce site I was responsible for didn't have much traffic, and we needed to launch a flash sale to increase user participation and acquire more users without adding complexity to the system. So in the product module we added a flash-sale attribute to products and set each product's flash-sale time window, which embedded the flash-sale business into the whole system simply. Of course, the premise is that your business volume is not very large; otherwise launching a flash sale may bring down your entire system, and then you will end up splitting the flash sale out after all, through countless nights of overtime.
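A sketch of this second approach, assuming a hypothetical Product class in the existing product module that gains a flash-sale flag, a sale price, and a sale time window. The field names and the half-open [start, end) window are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;

// Hypothetical product entity in the existing product module.
class Product {
    private final String id;
    private final BigDecimal regularPrice;
    private final boolean flashSale;          // the added flash-sale attribute
    private final BigDecimal flashSalePrice;
    private final LocalDateTime saleStart;    // the added sale time window
    private final LocalDateTime saleEnd;

    Product(String id, BigDecimal regularPrice, boolean flashSale,
            BigDecimal flashSalePrice, LocalDateTime saleStart, LocalDateTime saleEnd) {
        this.id = id;
        this.regularPrice = regularPrice;
        this.flashSale = flashSale;
        this.flashSalePrice = flashSalePrice;
        this.saleStart = saleStart;
        this.saleEnd = saleEnd;
    }

    String id() {
        return id;
    }

    // Window is [saleStart, saleEnd): open at the start time, closed at the end time.
    // Short-circuits on the flag, so non-sale products may leave the window null.
    boolean inFlashSale(LocalDateTime now) {
        return flashSale && !now.isBefore(saleStart) && now.isBefore(saleEnd);
    }

    // Existing order code keeps calling priceAt; only this method knows about flash sales.
    BigDecimal priceAt(LocalDateTime now) {
        return inFlashSale(now) ? flashSalePrice : regularPrice;
    }
}
```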
Feasibility
This is where the value of all the work above is realized. If your architecture is good, fully highly cohesive and loosely coupled, and also scalable, but completely fanciful and infeasible, it can only be called a daydream architecture. I think feasibility breaks down into three points: while satisfying high cohesion, low coupling, and scalability, the business can run on the architecture; the whole system executes efficiently; and the system is easy to scale out at deployment to support later business growth. Below I discuss feasibility in terms of these three points.
Running the business
This should be the most basic requirement of system design, because it is the original purpose of the design. No architecture design should drift away from the business into fantasy; it should revolve around the business and design the system from the business's point of view. This of course requires designers to understand the business deeply and to reach agreement on that understanding, so that the resulting software passes UAT. It is like writing an essay: you must stick to the title. If you stray a world away from the topic, like a train that has left its track, such an essay is useless even if you bring in Li Bai and Du Fu.
Efficiency
This too can be viewed at the macro and micro levels. At the macro level there is the communication protocol between modules: WebService, for example, works across many languages, but its performance really is unimpressive, and RPC may be more efficient. The efficiency of the whole system can also be improved by sorting out and simplifying the business process, which is like finding a shortest path. Cutting steps from the business process improves execution efficiency more than any other method, because what it optimizes is not which protocol a call uses or which algorithm a piece of code implements, but the abstraction of the entire business line. Micro-level optimization concerns the code itself: when implementing business logic, avoid unnecessary loops, use more efficient algorithms, define fewer variables, reduce memory usage, and so on. These are the details that make a system run more efficiently.
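As a small illustration of avoiding unnecessary loops and choosing a more efficient algorithm, the hypothetical helper below finds which product IDs in an order are currently on sale. The nested loop costs O(n·m) comparisons; building a HashSet once reduces the work to O(n + m) on average with the same result.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LookupExample {
    // O(n * m): for each product in the order, scan the whole on-sale list.
    static List<String> slowMatch(List<String> orderProductIds, List<String> onSaleIds) {
        List<String> result = new ArrayList<>();
        for (String id : orderProductIds) {
            for (String saleId : onSaleIds) {
                if (id.equals(saleId)) {
                    result.add(id);
                    break;
                }
            }
        }
        return result;
    }

    // O(n + m) on average: build a hash set once, then each lookup is constant time.
    static List<String> fastMatch(List<String> orderProductIds, List<String> onSaleIds) {
        Set<String> onSale = new HashSet<>(onSaleIds);
        List<String> result = new ArrayList<>();
        for (String id : orderProductIds) {
            if (onSale.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Both methods preserve the order of the order's products, so they are drop-in replacements for each other.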
Horizontally scalable deployment
This concerns distributed cluster deployment, a topic you could talk about for a whole day; every detail of the development process around it has a lot of content. Here I only describe it briefly. For a system to be deployable as a cluster, it must be stateless. What does stateless mean? It means the execution of a business operation cannot rely on state held by a single service instance. Caching a user's information in server memory, or keeping pending tasks in local memory, are both barriers to cluster deployment.
A simple example: in standard Java EE or standard web development, every user's access has the concept of a Session. If your Session relies on a single service instance, synchronizing user information becomes a problem once the system is deployed as a cluster. In a cluster, which server a user's request lands on is decided entirely by the front-end load balancer (Nginx, Apache, or F5). If the user's login executes on server A and you store the login state in server A's session, the user's next request may land on server B; because the session lives on server A, server B has no login information, and the user may be unable to complete the request. This is incomprehensible to the user, who simply made a request to your system and has no idea which server you routed it to. The industry has many solutions to this problem: container-backed session replication, session sharing through a third-party cache service, or a CAS single sign-on scheme. Personally I prefer session sharing through a third-party cache service, because container-based session replication limits the system's portability: it can only run in that specific container. Briefly, session sharing works like this: the user's login information is not stored on any particular server but in a third-party cache that every server can access (Redis or Memcached; a database also works, but its performance is less acceptable). Every server gets the user's login information from this cache, so a login is visible to all servers regardless of which one handled it, and whichever server your later requests land on, it can read your login state. The user's login state is the first problem to solve when deploying a system as a cluster.
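A minimal sketch of session sharing, assuming a hypothetical SessionStore interface. SharedSessionStore is an in-process stand-in for a cache such as Redis or Memcached that every server can reach, and AppServer keeps no session state of its own, so a login handled by one instance is visible to any other instance sharing the same store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Abstraction over wherever session data lives.
interface SessionStore {
    void put(String token, String userId);
    Optional<String> get(String token);
}

// Stand-in for a shared cache reachable by every server (Redis, Memcached, ...).
class SharedSessionStore implements SessionStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    public void put(String token, String userId) {
        data.put(token, userId);
    }

    public Optional<String> get(String token) {
        return Optional.ofNullable(data.get(token));
    }
}

// A stateless application server: all login state goes to the injected store.
class AppServer {
    private final String name;
    private final SessionStore sessions;

    AppServer(String name, SessionStore sessions) {
        this.name = name;
        this.sessions = sessions;
    }

    // Creates a session token and records who it belongs to.
    String login(String userId) {
        String token = UUID.randomUUID().toString();
        sessions.put(token, userId);
        return token;
    }

    // Any server sharing the store can resolve the token.
    String whoAmI(String token) {
        return sessions.get(token)
                .map(user -> name + " sees " + user)
                .orElse(name + ": not logged in");
    }
}
```

Giving each server its own store instead reproduces the problem described above: the token issued by server A means nothing to server B.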
The above describes a system's feasibility in terms of three points, but the feasibility of a large system needs far more analysis and optimization than these three points. Different areas call for different optimizations:
- For faster user access, use a CDN and reverse proxies to place data closer to users
- For faster page rendering, compress JS, CSS, and HTML, or increase the number of resources the page loads concurrently
- To improve system throughput, make some operations asynchronous via MQ
- To reduce database pressure, optimize data queries with a two-level cache (local cache plus distributed cache)
- To improve processing capacity, process work concurrently, which makes the system execute more efficiently
- To raise the utilization of each server, use load balancing so that server resources are fully used
- To improve disaster tolerance, the system needs some redundancy, including a backup environment. There is a lot to say here; the simplest point is cache warm-up. Without it, when traffic is switched to a new environment whose cache is empty, the whole load falls on the database at once and can overwhelm it instantly.
- To improve database performance, deploy the database in a distributed way, and optimize it with one-master-multiple-slave or multi-master replication
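The two-level cache from the list can be sketched like this, assuming a hypothetical TwoLevelCache: each node has its own local map, all nodes share a map standing in for the distributed cache, and a Function stands in for the database query. A read checks the local level, then the distributed level, and only then the database, filling the levels on the way back. Preloading the shared map is the cache warm-up mentioned above; expiry and invalidation, which a real deployment needs, are omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class TwoLevelCache {
    private final Map<String, String> local = new ConcurrentHashMap<>(); // level 1: per node
    private final Map<String, String> distributed;                        // level 2: shared by all nodes
    private final Function<String, String> database;                      // slow source of truth

    TwoLevelCache(Map<String, String> distributed, Function<String, String> database) {
        this.distributed = distributed;
        this.database = database;
    }

    // Returns the value for key, or null if the database has none.
    String get(String key) {
        String value = local.get(key);
        if (value != null) {
            return value;                      // level 1 hit
        }
        value = distributed.get(key);
        if (value == null) {                   // level 2 miss: go to the database
            value = database.apply(key);
            if (value == null) {
                return null;                   // nothing to cache
            }
            distributed.put(key, value);
        }
        local.put(key, value);
        return value;
    }
}
```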
This discussion of feasibility is just a summary of what I know; I have not practiced all of it. Knowing it is still worthwhile: given the opportunity at a big company (the BAT companies, for example, offer plenty of opportunities, though not everyone gets to be involved), you can put all of it into practice.
Maintainability
Most of the discussion above concerns software architecture and the development and design process. But a system cannot be developed and maintained by one person or one team forever. For it to work well as it passes through different hands, development standards, processes, and documentation must be established, organized, and collected from the early stages of the system. Then the software can transition smoothly during later handovers, without a long "getting up to speed" period. Maintainability is something everyone must pay attention to, from an ordinary developer to the development lead up to the chief architect; only when it gets that attention can the quality of the whole system stand the test of time. This is maintainability as I understand it.
Software development process
In fact, this is where the ideas of software engineering are most visibly embodied, and much of it revolves around design and architecture. A software development process typically goes through the following steps:
- Product defines the business requirements and outputs a PRD (product requirements document)
- Product, development, testing, and interaction design review the requirements together
- Development, testing, and interaction design then produce design documents, test cases, and UX designs, respectively
- These three outputs are reviewed next: design review, test case review, and UX interaction review
- After that comes the development phase
- After development and self-testing are complete, the system enters smoke testing
- System functional testing
- System regression testing
- System UAT
- System deployment and go-live
Some companies split the final step, deployment (step 10), into two parts, pre-release and production, to improve the overall quality of releases. The pre-release environment uses the same data and environment as production but runs the latest branch of the code. After deployment to pre-release, the release is verified there; if verification fails, it goes back to development for fixes and then through system functional testing (step 7) again.
Some people think software engineering is really just process. If you think that, your understanding of the concept of "engineering" is a little too narrow. In fact, the goal of software engineering is to make software development work like today's hardware development: thanks to unified interfaces, you only need to assemble the components you like to get hardware that meets your needs. Making software development work that way inevitably involves every detail of development as well as the overall process: not only process, but also system architecture patterns, code style, and deployment and maintenance after go-live. So software engineering is a large topic, and people with different work experience or in different positions may understand it differently. This is how I understand software engineering today; in a year or so I may think what I know now is wrong. My current understanding is that everything the development process touches is part of software engineering, and all the outputs of the development process are what software engineering aims at.
Finally, an example: a question a netizen asked me a few days ago. He has a system that needs to be reworked. Its main job is to receive notifications from a third party and then forward the results to another third-party system. A simple topology diagram is outlined below:
Third-party system A → Forwarding system → Third-party system B

In the current flow, Third-party system A notifies the forwarding system of a processing result; the forwarding system puts the notification task into a local in-memory queue, and a dedicated thread processes that queue and notifies Third-party system B of the result.
What flaws does this architecture have? Here are a few:
- There is a single point of failure: if traffic grows, a single forwarding system instance may not be able to handle the whole load.
- There is a barrier to cluster deployment: the forwarding system keeps tasks in local memory, so if it is deployed as a cluster to share the load, tasks held on one server are invisible to the other servers, and the load cannot actually be shared.
- There is no sound module division: the code that receives from system A and the code that forwards to system B are all mixed together, which makes the system hard to extend.
- There is a data-loss scenario: when a new version goes online and the service is redeployed, if many tasks have piled up in memory without being processed, the restart will inevitably lose them, and Third-party system B will never be notified.
To address these problems I proposed the following solution. First, the improved topology diagram:
Split the original system into two modules, a result-receiving module and a result-notification module, which communicate asynchronously through MQ. The flow is now: Third-party system A sends the processing result to the forwarding system's front-end Nginx, and Nginx load-balances the request to a specific backend server. That server, running the result-receiving module, accepts the result, executes the related business logic, and then puts a notification task into the MQ message queue. At this point it can immediately respond to Third-party system A that the processing result has been received. Finally, the result-notification module, which listens on the MQ message queue, picks up each incoming task message and sends the result to Third-party system B.

Now the throughput and processing capacity of the result-receiving module increase: because tasks are no longer cached locally but placed in MQ, the module is stateless and can be scaled horizontally at will to handle highly concurrent traffic. The result-notification module listens on the MQ message queue, so spikes in business traffic do not put pressure on it; it keeps processing tasks at its own pace, and if you want it to go faster, you can deploy more instances in the cluster. This greatly improves the horizontal scalability of the whole system and optimizes both its processing efficiency and its structure.
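The improved flow can be sketched in a single process, assuming a java.util.concurrent.BlockingQueue as a stand-in for the MQ and hypothetical ResultReceiver, ResultNotifier, and NotifyTask types. The receiver only enqueues and acknowledges, so any number of receiver instances can share the queue; the notifier drains it at its own pace. Note that an in-memory queue does not survive a restart: the durability that fixes the data-loss flaw has to come from a real, persistent message broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// A notification task passed from the receiving module to the notifying module.
final class NotifyTask {
    private final String requestId;
    private final String result;

    NotifyTask(String requestId, String result) {
        this.requestId = requestId;
        this.result = result;
    }

    String requestId() { return requestId; }
    String result() { return result; }
}

// Result-receiving module: holds no tasks itself, so instances can be added freely.
class ResultReceiver {
    private final BlockingQueue<NotifyTask> mq;

    ResultReceiver(BlockingQueue<NotifyTask> mq) {
        this.mq = mq;
    }

    // Called when Third-party system A reports a result; acks without waiting for B.
    String onResultFromSystemA(String requestId, String result) {
        mq.offer(new NotifyTask(requestId, result));
        return "ACK " + requestId;
    }
}

// Result-notification module: consumes queued tasks at its own pace.
class ResultNotifier {
    private final BlockingQueue<NotifyTask> mq;
    private final Consumer<NotifyTask> systemB; // stand-in for the call to Third-party system B

    ResultNotifier(BlockingQueue<NotifyTask> mq, Consumer<NotifyTask> systemB) {
        this.mq = mq;
        this.systemB = systemB;
    }

    // Delivers every task currently queued and returns how many were sent.
    int processAvailable() {
        List<NotifyTask> batch = new ArrayList<>();
        mq.drainTo(batch);
        batch.forEach(systemB);
        return batch.size();
    }
}
```

In a real deployment the notifier would run a listener loop rather than being polled; processAvailable keeps the sketch easy to test.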
Some may say I have split a simple system into several modules and complicated the project: there are more modules to maintain and manage, plus an MQ to run. Of course, whether to split depends on your business volume, or on preparing for the future. If your system currently handles fewer than 10,000 requests a day, I would also say there is no need. But for a long-lived project, good planning early on can avoid large-scale refactoring later. This scheme is not well suited to systems with strict real-time requirements: MQ is asynchronous, so it introduces delay. For businesses that tolerate some delay, MQ can speed up the response of the whole business and raise its throughput. The delay meant here is not hours, of course; it is generally no more than a minute.
Real life has many examples of this pattern. When we swipe a bank card, our phone almost immediately gets an SMS from the bank about the purchase; it is usually not long from swiping the card to receiving the message, and asynchronous processing is involved throughout. The process generally goes like this. Card processing usually goes through UnionPay (today's leader in offline clearing in China; online, you know who). UnionPay notifies the corresponding bank that a card is to be debited (I don't know the exact internal operations; the general practice is that UnionPay asks the bank to transfer money to a UnionPay account, and the merchant then settles with UnionPay). The bank sends the debit result back to UnionPay and at the same time stores a message in something like a message center, where a dedicated system processes these messages and notifies users. UnionPay returns the result to the POS terminal, the POS terminal prints the receipt, and around the same time the phone receives the bank's SMS. In this process, the bank does not send the SMS immediately; it puts it into the message center (which plays a role much like an MQ) and lets another system handle it asynchronously. Why? Because sending SMS is not part of the main line of business. If the debit and the SMS were done synchronously, the whole business would respond slowly, and a failed SMS could even fail the whole transaction; after all, who can guarantee that SMS delivery is 100% successful and always fast? Executing the business that hangs off the main line asynchronously, to raise the success rate and processing efficiency of the whole business, is therefore a common technique in enterprise system development.
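The bank SMS example can be sketched as follows, assuming a hypothetical BankAccountService in which a single-thread ExecutorService plays the message center and a Consumer&lt;String&gt; stands in for the SMS gateway. The debit completes synchronously; the SMS is handed off, and a gateway failure is recorded for a later retry instead of failing the debit. A real message center would also persist its messages.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BankAccountService {
    private final Map<String, Long> balances = new ConcurrentHashMap<>();
    // A single background thread standing in for the bank's message center.
    private final ExecutorService messageCenter = Executors.newSingleThreadExecutor();
    private final Consumer<String> smsGateway;
    private final List<String> failedSms = new CopyOnWriteArrayList<>();

    BankAccountService(Consumer<String> smsGateway) {
        this.smsGateway = smsGateway;
    }

    void open(String card, long balance) {
        balances.put(card, balance);
    }

    // Main line of business: the debit happens synchronously and atomically.
    boolean debit(String card, long amount) {
        boolean[] debited = {false};
        balances.computeIfPresent(card, (k, bal) -> {
            if (bal < amount) {
                return bal;          // insufficient funds: balance unchanged
            }
            debited[0] = true;
            return bal - amount;
        });
        if (debited[0]) {
            String text = "Card " + card + " debited " + amount;
            // Side business: hand the SMS to the message center and return at once.
            messageCenter.submit(() -> {
                try {
                    smsGateway.accept(text);
                } catch (RuntimeException e) {
                    failedSms.add(text); // keep for a later retry; the debit is unaffected
                }
            });
        }
        return debited[0];
    }

    long balance(String card) {
        return balances.getOrDefault(card, 0L);
    }

    List<String> failedSms() {
        return List.copyOf(failedSms);
    }

    // Stops accepting new messages and waits for queued ones to finish.
    boolean awaitPendingSms() {
        messageCenter.shutdown();
        try {
            return messageCenter.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```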
That is software engineering as I know it. I'm not sure how well I've expressed my ideas in the end; it is simply a summary of what I have learned about software development so far. It is also my subjective understanding and not meant as an authoritative reference; I hope it doesn't mislead anyone.