This article began as a small internal document for our development team. Our company had no prior SaaS experience, so we set up a team to explore the relevant technology in advance, and I am one of its members. The document tries to survey, at a macro level, the technical points involved in developing a SaaS application and to guide the technical work that follows. Because our own research and development experience in this area is limited, there are probably blind spots we have not considered, and I would welcome guidance from the many experts on this blog site.
I. Focusing on the "Three-Headed Monster"
Microsoft's official documentation (which divides SaaS applications into four maturity levels) identifies three major challenges in building a sufficiently mature SaaS application: configurability, scalability, and multi-tenant data architecture, together called the "three-headed monster". Both SaaS demos Microsoft provides (LitwareHR and CRAB) also address these three issues. Our technical exploration should therefore follow these three main threads.
II. Configurability and related technologies
1. Concept
SaaS is a typical "single-instance, multi-tenant" kind of software, which requires a single instance to be highly configurable for different application environments. Specifically, the following four areas need to be configurable:
(1) The appearance of the program.
(2) Workflow and business rules.
(3) Data model.
(4) Access rights for users and end users.
This customization capability must balance ease of configuration against the depth of customization it allows. The ideal is to support the most complex customization through the most convenient configuration, but that is hard to achieve, so we may also need to give advanced users the ability to do secondary development on top of the existing system (through scripting and similar means). The following subsections discuss the various aspects of configurability.
2. The basis for implementing configuration: metadata (MetaData)
The system presents different appearances and behaviors according to its configuration data; for the system as a whole, this configuration data is its metadata. I have not studied metadata specifically and have only used XML files or dedicated database tables to hold configuration information, but for a mature SaaS application such a simple approach may not be enough. How to design a flexible, powerful, and compatible metadata structure, how to provide the most efficient metadata services in code, how to define the metadata of metadata, and how to ensure that changes to the metadata structure do not disrupt the running program: we can only answer these questions after investigating the semantics of metadata itself and the technical support available for it. In addition, while writing this article I happened to find a few interesting diagrams on the internet about the design of a metadata service layer:
The upper part of the figure above gives a conceptual model of the SaaS reference architecture, and the lower part lists the existing technologies that could implement each part of it. Note that metadata services and the definition of the metadata data structure are the only blank areas.
The following figure shows a possible design by Matias Woloski for providing metadata services to each of the main configuration modules (user interface, business rules, multi-tenant data structures, and access control); it may be a useful lead for deeper investigation.
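To make the idea of a metadata service more concrete, here is a minimal Python sketch, assuming metadata is held as nested dictionaries: shared defaults for all tenants plus per-tenant overrides, merged when a tenant's configuration is looked up. `MetadataService`, `deep_merge`, and the sample keys are hypothetical names for illustration, not part of Microsoft's or Matias Woloski's design.

```python
from copy import deepcopy

# Hypothetical default metadata shared by every tenant (illustrative keys only).
DEFAULT_METADATA = {
    "ui": {"theme": "default", "modules": ["orders", "customers"]},
    "rules": {"max_discount": 0.10},
    "fields": {"customer": ["name", "email"]},
}

def deep_merge(base, override):
    """Return a new dict where values from `override` replace or extend `base`."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)  # merge nested sections
        else:
            result[key] = deepcopy(value)                 # leaf value: override wins
    return result

class MetadataService:
    """Resolves a tenant's effective configuration: defaults plus that tenant's overrides."""

    def __init__(self, defaults):
        self._defaults = defaults
        self._overrides = {}  # tenant_id -> partial metadata

    def set_override(self, tenant_id, partial):
        self._overrides[tenant_id] = partial

    def resolve(self, tenant_id):
        return deep_merge(self._defaults, self._overrides.get(tenant_id, {}))

service = MetadataService(DEFAULT_METADATA)
service.set_override("acme", {"ui": {"theme": "dark"}, "rules": {"max_discount": 0.25}})
acme = service.resolve("acme")
other = service.resolve("globex")
print(acme["ui"])      # {'theme': 'dark', 'modules': ['orders', 'customers']}
print(other["rules"])  # {'max_discount': 0.1}
```

Storing only the differences per tenant keeps the shared defaults in one place, so a change to the defaults reaches every tenant that has not overridden that key.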
3. The requirements basis for configuration: extracting business commonality and analyzing business individuality
Why can all tenants share one set of code? Because they share common software requirements and overlapping business processes, and all of their applications operate within a fairly well-established business domain. We need to analyze the business needs of that domain and then build software that serves it. If the customers all belong to one business domain and a single set of software can serve them, why provide such strong configuration capabilities? Because even customers in the same domain differ considerably in the details of their business rules; no two enterprises have exactly the same business processes. Moreover, every business keeps evolving, and its structure is constantly being adjusted, reorganized, and upgraded, so our software should flexibly support such changes within a certain range.
Therefore, the first and hardest step in implementing SaaS software is to study the characteristics of the business domain it addresses: to identify how enterprises currently differ across the breadth of the domain, and what business upgrades they are likely to require in the foreseeable future. This step is not primarily technical; the most technology it involves is domain modeling, including UML. Within SaaS research, it is the part that combines business and technology, with business carrying the larger share.
4. Main configuration content 1: configuring the program's appearance
Appearance configuration covers a lot, but I believe its core is modularizing interface elements at multiple levels of granularity. Only then can users decide, at several levels of granularity, what to display and in what interface style. I have always thought .NET has an advantage here, because Microsoft's products have long been strong at componentization. ASP.NET 2.0 alone offers mature interface technologies such as custom controls, user controls, Web Parts, themes, skins, and master pages; .NET 3.0 adds WPF (whose browser-oriented subset, WPF/E, has been renamed Silverlight), which I believe offers many excellent modular interface technologies, though I have not studied them yet. All of this needs further study.
In addition, if we continue with a browser-based system, giving the web application a desktop-like user experience is essential. Ajax technology is increasingly mature in this respect, and Microsoft has released the official version of ASP.NET AJAX 1.0.
When learning these interface technologies, we should not only learn how to use them but also understand their underlying compilation and execution models, so that each display module instance can deliver as rich a display and interaction as possible at minimal runtime cost.
5. Main configuration content 2: configuring business processes
If I understand it correctly, business process configuration is essentially the classic workflow problem. Although this area is important and complex, the focus of study is clear and comes down to two things:
(1) Workflow methodology. I have never worked on a workflow project, but I think that before developing with workflow tools and libraries, we should have a reasonably solid grasp of workflow methodology.
(2) Existing workflow tools. Workflow is a focus of enterprise development and there are many related products; the main ones I know of are:
A. WF (Windows Workflow Foundation), one of the new class libraries in .NET 3.0, designed specifically to support workflow technology.
B. BPEL (Business Process Execution Language), a standard led by IBM that uses XML to describe process behavior and is now widely supported and used in the industry. BPEL is not exactly the same as workflow: it does not explicitly define people, roles, work items, and so on, and it focuses on describing business processes built from Web services.
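Whatever tool is chosen, the configurable part of a business process can be pictured as data that each tenant may replace. The Python sketch below assumes the simplest possible model, a per-tenant state-transition table, and is not how WF or BPEL represent processes; `WORKFLOWS`, `advance`, `run`, and the states are hypothetical.

```python
# Hypothetical per-tenant workflow definitions: state -> {action: next_state}.
WORKFLOWS = {
    "default": {
        "draft": {"submit": "pending"},
        "pending": {"approve": "approved", "reject": "draft"},
        "approved": {},
    },
    # This tenant inserts an extra finance review step before final approval.
    "acme": {
        "draft": {"submit": "pending"},
        "pending": {"approve": "finance_review", "reject": "draft"},
        "finance_review": {"approve": "approved", "reject": "draft"},
        "approved": {},
    },
}

class WorkflowError(Exception):
    pass

def advance(tenant_id, state, action):
    """Return the next state, using the tenant's workflow if it has one, else the default."""
    definition = WORKFLOWS.get(tenant_id, WORKFLOWS["default"])
    transitions = definition.get(state, {})
    if action not in transitions:
        raise WorkflowError(f"{action!r} is not allowed in state {state!r}")
    return transitions[action]

def run(tenant_id, actions, state="draft"):
    """Apply a sequence of actions starting from `state` and return the final state."""
    for action in actions:
        state = advance(tenant_id, state, action)
    return state

print(run("globex", ["submit", "approve"]))           # approved
print(run("acme", ["submit", "approve"]))             # finance_review
print(run("acme", ["submit", "approve", "approve"]))  # approved
```

Because the engine code is shared and only the tables differ, adding a step for one tenant does not touch any other tenant's process.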
6. Ease of use of the configuration tools
A widely held view is that ease of use can be key to the success of SaaS software. A customer adopting a SaaS application cares a great deal about whether every employee can use the system easily, because getting employees up to speed is one of the main costs of deploying SaaS in an enterprise. Within the whole system, customizing appearance and business processes is likely to be the most complex operation, so its ease of use directly affects the usability of the whole product.
To make the configuration tools easier to use, we should learn from mature SaaS applications such as Salesforce.
III. Scalability and related technologies
1. Concept
Application scalability involves two requirements: (1) using application resources efficiently to maximize parallelism; and (2) when the existing servers can no longer handle the growing number of users, increasing the capacity of the whole system by scaling up (more powerful hardware) or scaling out (more machines). Scalability is not unique to SaaS; every large Internet-based web application faces this challenge, so there is plenty of material online, and most discussions of architecture for large enterprise applications cover it. I have not developed a large-scale system, so more detailed knowledge will require further study. For now I can only briefly list some topics I have encountered as possible research points.
2. Scalability support technology 1: stateless application design
We need to study how to design the application to run in stateless mode. That is, all required user and session data is stored on the client or in distributed storage that any application instance can access. Stateless means that any instance can handle any transaction, and a user can interact with many different instances within a single session without noticing. For example, in ASP.NET development, storing state in Session variables ties that state to a specific server and consumes its resources, which becomes a problem when we add machines to scale out. Here we might look at EJB's stateless session beans and how the container manages and pools them. In fact, the Internet's growth to its current scale has a lot to do with HTTP being a stateless protocol, and the Internet's scalability has been proven in practice, so understanding the architecture of the Internet may deepen our understanding of how to build large, highly scalable systems. From an SOA perspective, too, services are generally expected to be stateless: a given input always produces a corresponding, determined output. Once a service holds state, it becomes hard to pool service objects to reduce their number, which limits the load the system can handle.
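The stateless pattern can be sketched in a few lines of Python, assuming a shared store that every server can reach; in ASP.NET terms this corresponds roughly to moving session state out of process (the StateServer or SQLServer session modes). `SharedSessionStore` and `AppInstance` are hypothetical, and the store is just a dict standing in for a database or cache cluster.

```python
import uuid

class SharedSessionStore:
    """Stand-in for distributed storage (a database or cache cluster); a dict here."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)

class AppInstance:
    """An application server that keeps no per-user state in its own memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.load(session_id)          # fetch state at the start of the request
        cart = list(state.get("cart", [])) + [item]
        state["cart"] = cart
        self.store.save(session_id, state)           # persist it before responding
        return cart

store = SharedSessionStore()
server_a = AppInstance("web1", store)
server_b = AppInstance("web2", store)
session_id = str(uuid.uuid4())
server_a.add_to_cart(session_id, "book")          # first request lands on web1
print(server_b.add_to_cart(session_id, "pen"))    # ['book', 'pen'], served by web2
```

Since neither instance remembers anything between requests, a load balancer can send each request to any server, and adding servers does not strand existing sessions.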
3. Scalability support technology 2: decomposing and restructuring large databases
MySpace is a huge website with hundreds of millions of registered users. Over three years it evolved from a single-server setup into a scalable architecture, and I think that evolution is highly instructive for studying how to build scalable systems:
(1) The beginning: two web servers and one database server, exactly like our current setup. In the early stages MySpace improved performance by adding web servers, until its database server became overloaded.
(2) 500,000 users. Adding databases raised the problem of keeping data consistent. In the second-generation architecture they ran three SQL Server database servers: one primary server accepted all writes and replicated its content to the other two, which served reads of the full data set to users. In this architecture, adding secondary database servers effectively handled growing traffic.
(3) 1 to 2 million users. The database servers became limited by I/O capacity. Putting all the business on one database was no longer feasible, so they split the database architecture vertically: different database servers served different functions, such as login or blogs. In addition, when users reached 2 million, they introduced a storage area network (SAN). According to MySpace, this greatly improved the system's performance, uptime, and reliability.
(4) 3 million users. Vertical partitioning was no longer enough: some information (such as the user table) had to be shared across all databases, and keeping it consistent was expensive. In addition, some features grew so fast that their dedicated databases were overloaded. MySpace had to choose between scaling up (upgrading servers) and scaling out (strengthening distribution and spreading the load over many inexpensive servers). To minimize changes to existing code, they first considered scaling up and investigated running on 32-CPU servers, but in the end they chose to scale out and began building a distributed computing architecture (the problem any scale-out architecture must face; large companies such as Google have even developed their own distributed file systems). At this point the previously split applications were logically reintegrated. Users were also partitioned in blocks of one million, with each block stored in a different database. A dedicated database still handled all accounts and passwords, but since that was its only function, its load was easier to control.
(5) 10 million users. A new SAN was introduced, and the way databases were bound to the SAN changed. A data caching layer was added between the web servers and the database servers; they said they should have done this first, but they had grown too fast to realize it.
(6) 26 million users. They moved to SQL Server 2005, because its 64-bit support makes more memory available to the database.
MySpace's evolution shows that as the number of users grows and user data accumulates, the database server increasingly becomes the bottleneck, so the related technical groundwork must be in place before that happens.
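The step in stage (4) where users were split into blocks of one million per database amounts to range-based partitioning. Here is a minimal Python sketch of the routing logic, assuming numeric user ids; the shard directory, server names, and connection strings are hypothetical and would in practice live in the central account database.

```python
ACCOUNTS_PER_DB = 1_000_000  # block size described for MySpace in stage (4)

# Hypothetical directory of user databases, keyed by shard index.
SHARD_DIRECTORY = {
    0: "Server=userdb00;Database=Users",
    1: "Server=userdb01;Database=Users",
    2: "Server=userdb02;Database=Users",
}

def shard_index(user_id):
    """Users 0..999,999 map to shard 0, 1,000,000..1,999,999 to shard 1, and so on."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id // ACCOUNTS_PER_DB

def connection_string_for(user_id):
    """Look up which database holds this user's data."""
    index = shard_index(user_id)
    try:
        return SHARD_DIRECTORY[index]
    except KeyError:
        raise LookupError(f"no database provisioned for shard {index}") from None

print(shard_index(999_999), shard_index(1_000_000))  # 0 1
print(connection_string_for(2_500_000))              # Server=userdb02;Database=Users
```

Range partitioning keeps routing trivial and lets new databases be added as the user count grows, at the cost of newer (often more active) users concentrating on the newest shard.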
4. Scalability support technology 3: pooling threads, network connections, and other resources (load balancing)
Pooling resources such as threads, network connections, and database connections helps make the most of computing resources and makes resource usage more predictable. Basic infrastructure such as IIS and ADO.NET already takes much of this into account, and our own resource scheduling should build on it; we can study .NET's thread pool management and ADO.NET's connection pool management. In addition, resource scheduling and management should fully consider the topology of the whole network and draw on the implementation patterns of distributed systems, especially their resource scheduling algorithms.
The overall emphasis here, then, is load balancing. Besides the software techniques mentioned above, load balancing may also involve understanding and purchasing hardware, in particular becoming familiar with the performance and features of various load balancers.
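As a small illustration of the scheduling side of load balancing, here is a Python sketch of round-robin distribution that skips servers marked as unhealthy. Round-robin is only one of many possible algorithms and is not what any particular hardware load balancer necessarily uses; `RoundRobinBalancer` and the server names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Hands out backend servers in rotation, skipping ones marked unhealthy."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._cycle = itertools.cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try at most one full rotation before concluding nothing is available.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["web1", "web2", "web3"])
print([lb.next_server() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
lb.mark_down("web2")
print([lb.next_server() for _ in range(3)])  # ['web3', 'web1', 'web3']
```

Plain round-robin assumes servers of similar capacity; weighted or least-connections strategies are common refinements when they differ.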
5. Scalability support technology 4: other
(1) Multi-level caching. Caching essentially trades space for time. The more concurrent the system, the more carefully we need to consider caching: every time we find data that can be reused across users and add it to the cache, we may speed up access to that data by thousands of times (if thousands of users share it). Raising the cache hit rate also becomes more important as the site grows. We may also need to study how to integrate a memcached-like caching solution into our project.
(2) Study database locking carefully and lock as small a set of data as possible during database operations. To maximize concurrency, write database operations so that exclusive locks are held as little as possible; for example, do not lock a record during a read-only operation. These details look minor but have a significant effect on the concurrency of the whole system.
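The caching point in item (1) can be illustrated with a cache-aside lookup backed by a fixed-size LRU cache that also tracks its hit rate. This Python sketch is a single-process stand-in for a shared cache such as memcached; `LRUCache`, `get_or_load`, and `load_product` are hypothetical names.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size cache that evicts the least recently used entry and tracks its hit rate."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        """Cache-aside: return the cached value, or call loader(key) and cache the result."""
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

calls = []

def load_product(product_id):
    calls.append(product_id)  # stands in for an expensive database query
    return {"id": product_id}

cache = LRUCache(capacity=2)
for pid in [1, 2, 1, 3, 1, 2]:
    cache.get_or_load(pid, load_product)
print(calls)           # [1, 2, 3, 2]
print(cache.hit_rate)  # 0.3333333333333333
```

The hit rate is the number to watch: with capacity 2, item 2 is evicted before it is requested again, so a slightly larger cache would turn that miss into a hit.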
IV. Multi-tenant data architecture and related technologies
When one company's users access customer information through our SaaS service, the application instance they connect to may also be serving dozens or even hundreds of other companies, none of which know about each other. The architecture must therefore maximize resource sharing among tenants while still keeping each customer's data separate. When designing the storage structure, we should consider both the scalability of the structure itself and the security of data access.
1. Data access control
Whether their data is safe from theft or tampering by people without access is users' biggest concern, and therefore ours. Security covers a great deal; I will try to outline it briefly from several angles:
(1) Filtering: adding an intermediate layer between users and the data source so that each user sees the database as if it held only their own data.
A typical approach is to use views.
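Here is a minimal sketch of view-based filtering using Python's built-in `sqlite3`, assuming a shared `customers` table with a `tenant_id` column. A production database would more likely tie the filter to the connection's identity than create one view per tenant; the table, view naming, and `create_tenant_view` helper are hypothetical.

```python
import re
import sqlite3

# Hypothetical rule for tenant ids; validation matters because the id is inlined below.
TENANT_ID = re.compile(r"[a-z0-9_]+")

def create_tenant_view(conn, tenant_id):
    """Create (once) a view exposing only this tenant's rows, without the tenant_id column."""
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    view_name = f"customers_{tenant_id}"
    # SQLite does not accept bound parameters in CREATE VIEW, so the validated id is inlined.
    conn.execute(
        f"CREATE VIEW IF NOT EXISTS {view_name} AS "
        f"SELECT id, name FROM customers WHERE tenant_id = '{tenant_id}'"
    )
    return view_name

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        tenant_id TEXT NOT NULL,
        id        INTEGER NOT NULL,
        name      TEXT NOT NULL,
        PRIMARY KEY (tenant_id, id)
    );
    INSERT INTO customers VALUES
        ('acme', 1, 'Alice'), ('acme', 2, 'Bob'), ('globex', 1, 'Carol');
""")

acme_view = create_tenant_view(conn, "acme")
print(conn.execute(f"SELECT id, name FROM {acme_view} ORDER BY id").fetchall())
# [(1, 'Alice'), (2, 'Bob')]
```

Code working on behalf of a tenant queries only its view, so a forgotten `WHERE tenant_id = ...` clause can no longer leak another tenant's rows.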
(2) Permissions: Use ACLs to limit who can access what data, and what can be done with the data.
A crucial question here is how to establish a trusted connection. There are two common approaches: the server impersonates the client user's identity to access the data (impersonation), or the server always accesses the data under its own process identity, with additional security enforced inside the application (the trusted subsystem approach). The former is more complex but more secure; the latter needs little configuration but is less secure, and some resources cannot be accessed by the server process itself. We may need to combine both approaches in development.
In addition, a server may sometimes need to access other resources on the user's behalf (delegation).
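Under the trusted subsystem approach, the application itself decides who may do what. The following Python sketch shows a deny-by-default ACL check of the kind item (2) describes; it assumes principals and resources are plain strings prefixed with a tenant name, and `AccessControlList` and `check` are hypothetical, not a Windows or .NET ACL API.

```python
from dataclasses import dataclass, field

@dataclass
class AccessControlList:
    """Maps (principal, resource) pairs to the set of operations that are allowed."""
    entries: dict = field(default_factory=dict)

    def grant(self, principal, resource, *operations):
        self.entries.setdefault((principal, resource), set()).update(operations)

    def revoke(self, principal, resource, operation):
        self.entries.get((principal, resource), set()).discard(operation)

    def is_allowed(self, principal, resource, operation):
        return operation in self.entries.get((principal, resource), set())

def check(acl, principal, resource, operation):
    """Raise PermissionError unless the ACL explicitly allows the operation (deny by default)."""
    if not acl.is_allowed(principal, resource, operation):
        raise PermissionError(f"{principal} may not {operation} {resource}")

acl = AccessControlList()
acl.grant("acme/alice", "acme/orders", "read", "update")
acl.grant("acme/bob", "acme/orders", "read")
print(acl.is_allowed("acme/alice", "acme/orders", "update"))  # True
print(acl.is_allowed("acme/bob", "acme/orders", "update"))    # False
```

Anything not explicitly granted is refused, so a principal from one tenant gets no access to another tenant's resources unless someone deliberately adds an entry.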
(3) Authentication
The previous points concern how the system controls a particular user's access rights (authorization). But how does the system know that a client request really comes from that user? That is the job of authentication. This topic alone is large. Taking ASP.NET 2.0 as an example, there are Windows authentication, Forms authentication, and others, and Windows authentication in turn uses protocols such as Kerberos and NTLM, which differ in proxy server support and efficiency; we need to learn them. In addition, .NET 3.0 introduces a new concept and implementation technology, CardSpace, which appears to embody Microsoft's latest thinking on network security. I think we should pay close attention to it (it is closely related to Active Directory, WCF, Windows Live ID, and other technologies).
From a macro perspective, authentication systems are either centralized or decentralized. We need to study the differences and connections between the two and use them together.
(4) Encryption
Encrypting sensitive user data ensures that unauthorized parties who obtain it still cannot read it. Symmetric or asymmetric algorithms can be used. Asymmetric algorithms offer strong confidentiality, but their processing overhead is much higher. In practice the two are usually combined: the data is encrypted with a symmetric algorithm, and the symmetric key is encrypted with an asymmetric algorithm.
There are many other security issues, such as SQL injection and code security, which I will not cover here.
2. Extensible data architecture
Microsoft's white paper "Multi-Tenant Data Architecture" already explains this area in detail in its second chapter, so I will not dwell on it here. The possible data architectures are: (1) separate databases; (2) a shared database with a separate schema per tenant; (3) a shared database with a shared schema. Resource utilization increases across these three approaches, but so does design complexity. A higher degree of sharing is therefore not automatically better; the following diagram shows how to choose between isolation and sharing:
For the most complex of the three, "shared database, shared schema", Microsoft suggests two approaches: predefined extension fields and a name-value table. We can implement these solutions and also consider designs of our own. Whichever approach we use for storage, we should leave room for possible horizontal or vertical partitioning of the data tables. At the scale of the database as a whole, the main techniques are replication and partitioning; how to apply them needs further research.
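To show what the name-value approach might look like, here is a minimal sketch using Python's built-in `sqlite3`: one table records which custom fields each tenant has defined, and another holds one row per custom value. The table names, columns, and helper functions are hypothetical and are not the schema from Microsoft's white paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (            -- fixed columns shared by every tenant
        tenant_id TEXT NOT NULL,
        id        INTEGER NOT NULL,
        name      TEXT NOT NULL,
        PRIMARY KEY (tenant_id, id)
    );
    CREATE TABLE extension_fields (     -- custom fields each tenant has defined
        tenant_id  TEXT NOT NULL,
        field_name TEXT NOT NULL,
        data_type  TEXT NOT NULL,
        PRIMARY KEY (tenant_id, field_name)
    );
    CREATE TABLE extension_values (     -- one row per (record, custom field) value
        tenant_id  TEXT NOT NULL,
        record_id  INTEGER NOT NULL,
        field_name TEXT NOT NULL,
        value      TEXT,
        PRIMARY KEY (tenant_id, record_id, field_name)
    );
""")

CASTS = {"int": int, "text": str}  # converts stored text back according to data_type

def define_field(tenant_id, field_name, data_type):
    if data_type not in CASTS:
        raise ValueError(f"unsupported type: {data_type}")
    conn.execute("INSERT INTO extension_fields VALUES (?, ?, ?)",
                 (tenant_id, field_name, data_type))

def add_customer(tenant_id, record_id, name, **extra):
    defined = {row[0] for row in conn.execute(
        "SELECT field_name FROM extension_fields WHERE tenant_id = ?", (tenant_id,))}
    unknown = set(extra) - defined
    if unknown:  # reject fields this tenant has not defined
        raise KeyError(f"fields not defined for {tenant_id}: {sorted(unknown)}")
    conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (tenant_id, record_id, name))
    conn.executemany("INSERT INTO extension_values VALUES (?, ?, ?, ?)",
                     [(tenant_id, record_id, k, str(v)) for k, v in extra.items()])

def get_customer(tenant_id, record_id):
    row = conn.execute("SELECT id, name FROM customers WHERE tenant_id = ? AND id = ?",
                       (tenant_id, record_id)).fetchone()
    if row is None:
        return None
    record = {"id": row[0], "name": row[1]}
    for field_name, value, data_type in conn.execute(
            "SELECT v.field_name, v.value, f.data_type FROM extension_values AS v "
            "JOIN extension_fields AS f "
            "ON f.tenant_id = v.tenant_id AND f.field_name = v.field_name "
            "WHERE v.tenant_id = ? AND v.record_id = ?", (tenant_id, record_id)):
        record[field_name] = CASTS[data_type](value)
    return record

define_field("acme", "loyalty_points", "int")
add_customer("acme", 1, "Alice", loyalty_points=120)
add_customer("globex", 1, "Carol")
print(get_customer("acme", 1))    # {'id': 1, 'name': 'Alice', 'loyalty_points': 120}
print(get_customer("globex", 1))  # {'id': 1, 'name': 'Carol'}
```

The trade-off is visible in `get_customer`: tenants gain custom fields without any schema change, but every read needs an extra join and a type conversion.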
V. SaaS and SOA
SaaS is a software business model (especially a go-to-market model), while SOA is a way of implementing systems (including integrating existing ones), so the two belong to different conceptual levels. Still, I want to highlight their relationship, especially the importance of SOA technology for SaaS. Many SOA characteristics, such as business integration and business agility, are exactly what SaaS needs. Business agility emphasizes how software responds to changes in one enterprise's business over time, whereas SaaS emphasizes accommodating the business differences among many companies at the same point in time, but their technical requirements are not essentially different.
In addition, SOA has become a standard approach to enterprise application development, and besides our SaaS software our customers are likely to use many other applications. To make our system more open and let it interoperate better with surrounding systems, SOA thinking and techniques will be essential in our development.
SOA is itself a broad collection of technologies, which this article will not expand on.
VI. Other technologies
1. Single Sign-on
Single sign-on is a basic usability requirement for modern web applications: at least within our own system, a user who logs in once should be able to access every subsystem he or she is authorized to use.
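The core idea of single sign-on (authenticate once, then present a token that every subsystem can verify) can be sketched with an HMAC-signed token using only Python's standard library. This is a simplified single-domain illustration, assuming all subsystems share one secret key; real deployments would rely on an established protocol such as Kerberos. `issue_token`, `verify_token`, and `SECRET_KEY` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical shared key; a real deployment would load it from secure storage.
SECRET_KEY = b"replace-with-a-long-random-key"

def _b64(data):
    return base64.urlsafe_b64encode(data).decode("ascii")

def issue_token(user_id, tenant_id, ttl_seconds=3600, now=None):
    """Run once by the login service after the user has authenticated."""
    now = time.time() if now is None else now
    claims = {"sub": user_id, "tenant": tenant_id, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)

def verify_token(token, now=None):
    """Run by each subsystem; returns the claims or raises ValueError."""
    now = time.time() if now is None else now
    try:
        payload_part, signature_part = token.split(".")
        payload = base64.urlsafe_b64decode(payload_part)
        signature = base64.urlsafe_b64decode(signature_part)
    except ValueError as exc:  # wrong number of parts or invalid base64
        raise ValueError("malformed token") from exc
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        raise ValueError("invalid signature")
    claims = json.loads(payload)
    if claims["exp"] < now:
        raise ValueError("token expired")
    return claims

token = issue_token("alice", "acme", now=1_000)
print(verify_token(token, now=1_500)["sub"])  # alice
```

Each subsystem only needs the key and `verify_token`, so it can trust the login without contacting the login service on every request; the price is that revoking a token before it expires needs extra machinery.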
2. Further understanding of the SaaS tree-like architecture
SaaS itself has a tree-shaped management structure: from the service provider to each enterprise, from the enterprise to its departments, and from each department to individual end users. This structure has several technical properties we can exploit:
(1) Inheritance of security permissions
System authorization is generally based on users, groups, and roles. For managing authorization, Microsoft proposes the concept of a "configuration domain", through which access control is managed. Each configuration domain inherits the roles, permissions, and business rules of its ancestor domains according to the application's policy, and can modify, add, or delete them as appropriate. Conceptually this is very reasonable: a department within an enterprise is a configuration domain whose ancestor domain is the enterprise, and whose child domains may be individual end users (or teams within the department). How to implement it in detail, however, requires further study and a concrete plan.
(2) Different levels of users
According to Microsoft's documentation, users are divided into "users" and "end users". A "user" here means an organization, and the data of different users is at least logically isolated. A user can authorize multiple end users within its own enterprise, and end users can ultimately access a subset of that user's data under the user's control.
Dividing users into tiers is useful. For example, when establishing trusted connections, the two approaches can be combined: impersonation for users and the non-impersonating approach for end users. Likewise, in storage design, a separate database can be set up for each user.
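The configuration-domain inheritance described in (1) can be sketched as a tree of nodes whose settings are looked up nearest-domain-first, using Python's `collections.ChainMap`. This is only an illustration under stated assumptions: settings are plain key-value pairs, and a child "deletes" an inherited setting with a sentinel value. `ConfigDomain` and the sample settings are hypothetical, not Microsoft's implementation.

```python
from collections import ChainMap

_DELETED = object()  # sentinel a child domain uses to remove an inherited setting

class ConfigDomain:
    """A node in the provider -> enterprise -> department -> end user tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}  # settings defined or overridden at this level

    def set(self, key, value):
        self.local[key] = value

    def remove(self, key):
        self.local[key] = _DELETED

    def effective(self):
        """Merge this domain's settings over its ancestors'; the nearest domain wins."""
        chain = []
        domain = self
        while domain is not None:
            chain.append(domain.local)
            domain = domain.parent
        merged = ChainMap(*chain)  # lookups search self first, then parent, and so on
        return {k: v for k, v in merged.items() if v is not _DELETED}

provider = ConfigDomain("provider")
provider.set("roles", ("admin", "member"))
provider.set("max_upload_mb", 10)
acme = ConfigDomain("acme", parent=provider)
acme.set("max_upload_mb", 50)                 # the enterprise overrides a provider default
sales = ConfigDomain("acme/sales", parent=acme)
sales.set("approval_required", True)          # the department adds a rule
sales.remove("max_upload_mb")                 # and drops an inherited one
print(acme.effective()["max_upload_mb"])      # 50
print(sorted(sales.effective()))              # ['approval_required', 'roles']
```

Because effective settings are computed on lookup rather than copied, a change at the enterprise level reaches every department that has not overridden it.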
3. Active Directory
If all of our data comes from a relational database server, we may never need Active Directory. However, some resources are stored in the file system, and Active Directory is needed to provide access services for them. We need to study Active Directory itself as well as newer technologies such as ADAM (Active Directory Application Mode).
SaaS-Related Technology essentials