This article is a small document used within our development team. Our company has no prior experience with SaaS, so we set up a group to explore the technology in this area, and I am one of its members. From a macro perspective, this document sorts out the technical points involved in developing a SaaS application, to support our later technical preparation. Because my own R&D experience is limited, there are probably technical blind spots that I have not thought of, and I hope readers will offer guidance.
I. Focus on the Three Main Challenges
Microsoft's official documents describe the three major challenges of building a sufficiently mature SaaS application (Microsoft also briefly outlines four maturity levels for SaaS applications): configurability, scalability, and multi-user (multi-tenant) storage structure design, which it calls the "three-headed monster". The two SaaS demos provided by Microsoft (LitwareHR and crab) are also built around solving these three problems. Therefore, our exploration of the technology must closely follow these three main lines.
II. Configurability and Related Technologies
1. Concept
SaaS is a typical "one instance, many applications" type of software, which requires the instance to have strong configuration capabilities across different application environments. Specifically, configuration capabilities are needed in the following four areas:
(1) Program appearance.
(2) Workflow and business rules.
(3) Data model.
(4) Access permissions of users and end users.
This customization capability involves a trade-off between ease of configuration and configuration power. The ideal is to implement the most complex customizations through the most convenient configuration mechanism, but this is hard to achieve, so we may need to give some advanced users secondary development capabilities on top of the existing system (using scripts, for example). Next, let's talk about configuration in more detail.
2. Implementation Basis of Configuration: Metadata
The system presents different appearances and behaviors through configuration data. For the system as a whole, this configuration data is its metadata. I have not studied metadata in depth; in past development I only used XML files or specific database tables to store some configuration information. For a mature SaaS application, however, this simple approach may not be enough. We need to investigate the semantic characteristics of metadata and the technical means available to support it: how to design a metadata structure that is flexible, powerful, and compatible; how to provide the most efficient metadata service in code; how to define the metadata itself (the metadata of the metadata); and how to ensure that the running program is not affected when the metadata structure changes. In addition, while writing this article I happened to find several diagrams of metadata service layer design on the Internet, which are very interesting:
The upper part of the figure above gives a conceptual model of the SaaS reference architecture, and the lower part lists, for each part of the architecture, the existing technologies that might be adopted. Note that the metadata service and the definition of the metadata data structure are the only blind spots in it.
The following figure shows a possible solution for metadata services, given by Matias Woloski, covering the main configuration modules (interfaces, business rules, multi-user data structures, and access control). It may serve as a clue for further exploration.
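As a minimal sketch of the idea that configuration data acts as the system's metadata, the following Python snippet layers per-tenant overrides on top of shared defaults. Every name in it (`DEFAULTS`, `TENANT_OVERRIDES`, `resolve_config`, the tenant ids, and the configuration keys) is a hypothetical illustration and does not come from any Microsoft sample or from the figures mentioned above.

```python
from copy import deepcopy

# Hypothetical shared defaults: the metadata every tenant starts from.
DEFAULTS = {
    "ui": {"theme": "blue", "logo": "default.png"},
    "workflow": {"approval_steps": 1},
}

# Hypothetical per-tenant overrides, e.g. loaded from XML files or a table.
TENANT_OVERRIDES = {
    "tenant_a": {"ui": {"theme": "green"}},
    "tenant_b": {"workflow": {"approval_steps": 3}},
}


def merge(base, override):
    """Recursively overlay `override` on a copy of `base`."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def resolve_config(tenant_id):
    """Return the effective configuration for one tenant."""
    return merge(DEFAULTS, TENANT_OVERRIDES.get(tenant_id, {}))
```

With this layout a tenant stores only what differs from the defaults, so adding a new default key does not require touching existing tenant data. That is one possible answer to the compatibility concern raised above, not the only one.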
3. Basic Configuration Requirements: Extracting Business Commonality and Analyzing Business Individuality
Why can a single set of code be shared by all tenants? Because all of these users have common needs for software and business processes: their applications all fall within a relatively well-defined business domain. We need to sort out the business needs of this domain and then provide software to serve it. Since the customers all belong to one business domain, one set of software can serve them all. Why, then, do we need such powerful configuration capabilities? Because customers in the same business domain still differ in many details of their business rules, and no two enterprises have identical business processes. In addition, every enterprise is constantly developing and changing, and its business structure is constantly being adjusted, reorganized, and upgraded. Our software should be flexible enough to support such changes.
Therefore, the first and most difficult step in implementing SaaS software is to study the characteristics of the business domain it targets, and to identify both the differences that currently exist between businesses across that domain and the business upgrade requirements that can be foreseen. This step may not have much to do with technology, however. What it involves most is domain modeling techniques, including UML. It belongs to the part of SaaS research that combines business and technology, with business taking the larger share.
4. Main Configuration Content 1: Program Appearance
Appearance configuration covers a lot of ground, but its core is the multi-granularity modularization of interface elements. Only with this in place can users define, at several levels of granularity, what they want displayed and in what interface style. In my opinion this is also where .NET has an advantage, because Microsoft products have always done well at componentization. ASP.NET 2.0 alone offers mature interface technologies such as custom controls, user controls, Web Parts, themes, skins, and master pages. As for WPF in .NET 3.0 (whose web subset, WPF/E, has since been renamed Silverlight), I believe it contains many more excellent modular interface technologies that I have not yet studied. All of these are things we need to learn and explore.
In addition, if we continue to build browser-based software, giving web programs a desktop-like ("localized") experience is indispensable. In this regard, Ajax technology has become increasingly mature, and Microsoft has released the official 1.0 version of its Ajax framework.
When learning these interface technologies, we should not only learn how to use them but also understand their underlying compilation and runtime models. Only then can we fully grasp how each display module instance can deliver as much interface display and interactive response as possible at the lowest runtime cost.
5. Main Configuration Content 2: Business Process Configuration
The so-called business process, if I understand it correctly, is really the classic workflow problem. Therefore, although this area is very important and complex, its research focus is also the clearest. There are only two points:
(1) Research on workflow methodology. I have not done any workflow development projects and am in no position to speak on this. However, I believe that before developing with workflow tools and class libraries, we should have a good methodological grasp of workflow.
(2) Research on existing workflow tools. Workflow is a focus of enterprise development, so there should be many related products. The main tools I know of are as follows:
A. WF (Windows Workflow Foundation), one of the new class libraries added in .NET 3.0, which is dedicated to supporting workflow technology.
B. BPEL (Business Process Execution Language), a standard led by IBM for describing business process behavior in XML. It is widely supported and applied in the industry. It is not exactly the same as workflow, of course: it does not clearly define people, roles, work items, and so on, and it focuses on describing business processes based on web services.
6. Ease of Use of Configuration Methods
A widely accepted view is that usability may decide the success or failure of a SaaS product. Customers of a SaaS application will certainly care whether every employee of the company can use the system easily; this is in fact one of the main costs of deploying a SaaS application. In the operation of the whole system, user-defined configuration of appearance and business processes may be the most complex part, so the ease of use of these operations directly affects the ease of use of the entire software.
To enhance the ease of use of configuration methods, we need to learn from mature SaaS applications such as Salesforce.
III. Scalability and Related Technologies
1. Concept
Application scalability involves two requirements: (1) using application resources efficiently to maximize concurrency; (2) when the original server resources can no longer cope with the growing number of users, being able to easily scale up (improve hardware performance) or scale out (increase the number of machines) to raise the concurrent processing capacity of the whole system. Scalability is not a challenge unique to SaaS: every team building a large Internet-based application faces it. There is therefore a great deal of material online, and most discussions of architecture design for large enterprise applications touch on scalability. I have not developed a large application system myself, so I can only provide more detailed knowledge points after further study. The sections below briefly list some topics I have come across before as possible points for technical research.
2. Scalability Support Technology 1: Stateless Application Mode
We need to study how to design applications so that they run in stateless mode. That is, all necessary user and session data is stored on the client or in distributed storage that any application instance can access. Stateless means that each transaction can be handled by any instance, and within one session a user's transactions may be handled by several instances without the user noticing. For example, in ASP.NET development, if we use session variables to store state, that state occupies server resources and ties the session to a specific machine, which gets in the way when we want to add machines to increase capacity. Here we can refer to the stateless session bean in EJB and the implementation of its manager. In fact, the Internet's ability to grow to its present size is probably closely related to HTTP being a stateless protocol. The scalability of the Internet has been proven in the real world, so if we try to understand the structural characteristics of the Internet as a whole, we may gain a deeper understanding of how to build a highly scalable large system. In addition, from the SOA perspective, the services it advocates are generally stateless and return a definite output for a given input. Once a service holds state, it becomes hard to use an object pool to reduce the number of objects and increase the load it can carry.
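To make the stateless mode described above concrete, here is a small Python sketch under stated assumptions: `SharedSessionStore` is a hypothetical in-memory stand-in for the distributed storage, and `AppInstance` is a hypothetical application instance that keeps no per-user state of its own. It is an illustration of the principle, not how ASP.NET or EJB implement it.

```python
class SharedSessionStore:
    """Stand-in for a distributed store (a database or cache cluster)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)


class AppInstance:
    """A stateless application instance: it keeps no per-user state itself."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Load the state, change it, and write it back within one request.
        state = self.store.load(session_id)
        cart = list(state.get("cart", []))
        cart.append(item)
        state["cart"] = cart
        self.store.save(session_id, state)
        return cart


store = SharedSessionStore()
instances = [AppInstance("web1", store), AppInstance("web2", store)]

# Requests of the same session land on different instances, as a load
# balancer without sticky sessions would route them.
instances[0].add_to_cart("session-42", "book")
cart = instances[1].add_to_cart("session-42", "pen")
```

Because the second request finds the first request's data in the shared store, adding a third instance needs no session migration. The cost is one store round trip per request.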
3. Scalability Support Technology 2: Decomposition and Restructuring of Large Databases
MySpace is a super-large website with a huge number of registered users. Over its three years of development it moved from a single server to a scalable architecture, and I think the changes it went through are a valuable reference for us in studying how to build a scalable system:
(1) The earliest stage. Two web servers and one database server, exactly the same as our current structure. In its early growth, MySpace increased capacity by adding web servers until its database server was overloaded.
(2) 500,000 users. Once the number of databases increases, the problem of keeping data consistent arises. In the second-generation architecture, they ran three SQL Server database servers: one acted as the primary, accepted all data submissions, and replicated the data to the other two, which served data reads. In this architecture, adding secondary database servers was an effective way to cope with growing traffic.
(3)-million users. The database servers began to be limited by I/O capacity. At this point keeping all services in one database no longer worked well, so they split the database architecture vertically: different database servers served different kinds of functions, some handling logon, some handling blogs, and so on. In addition, when the user count reached 2 million, they adopted a storage area network (SAN). According to MySpace, this greatly improved system performance, uptime, and reliability.
(4) 3 million. Vertical partitioning was no longer enough. Some information (such as the user table) had to be shared among all databases, which made maintaining data consistency expensive. In addition, some applications grew so fast that the load on their dedicated databases became too high. At this point they had to choose between scaling up (upgrading servers) and scaling out (strengthening distributed capability and using a large number of cheap servers to share the load). To minimize changes to existing code, they first considered scaling up and looked into using a server with 32 CPUs. In the end, though, they took the scale-out road and began to refine a distributed computing architecture (a problem any scale-out architecture must face; major vendors such as Google have even developed their own distributed file systems). At this stage the previously split applications were logically reunified, and users were stored in different databases in groups of a million. There was of course a dedicated database holding all accounts and passwords, but its single function made its load easier to control.
(5) 10 million. A new SAN was introduced, changing how the SAN and the databases were bound together. A data cache layer was added between the web servers and the database servers; they said this should have been done from the beginning, but they had grown so fast that they never had time to implement it.
(6) 26 million. SQL Server 2005 was adopted because the 64-bit database can use more memory.
From the changes at MySpace we can see that, as the number of users grows and user data accumulates, the database servers increasingly become the bottleneck. The relevant technologies need to be fully prepared before that happens.
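The horizontal split in stage (4), where users are stored in different databases in groups of a million, can be sketched as a simple range mapping from user id to database. This is only an illustration: `USERS_PER_SHARD`, `shard_for_user`, and the `userdb_NNN` naming are assumptions made here, not MySpace's actual scheme.

```python
USERS_PER_SHARD = 1_000_000  # assumed group size, following the text


def shard_for_user(user_id):
    """Map a numeric user id to the index of the database that holds it."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id // USERS_PER_SHARD


def connection_name(user_id):
    """Build a hypothetical connection name such as 'userdb_002'."""
    return f"userdb_{shard_for_user(user_id):03d}"
```

Range-based mapping keeps the lookup trivial and lets new databases be added as ids grow, but it assumes ids are assigned sequentially. The dedicated account database mentioned in the text would still be needed to find a user's id at logon.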
4. Scalability Support Technology 3: Pooling of Resources such as Threads and Network Connections (Load Balancing)
Pooling threads, network connections, database connections, and other resources helps us make the most of computing resources and improves our ability to predict resource usage. Of course, basic resource services such as IIS and ADO.NET have already put a lot of thought into these aspects, and our resource scheduling should build on them. We can look at how .NET manages its thread pool and how ADO.NET manages its connection pool. In addition, resource scheduling and management should take the topology of the whole network into account, and should draw on the considerations and characteristics of distributed systems, especially their resource scheduling algorithms.
Therefore, we emphasize the concept of load balancing here. Besides the software aspects mentioned above, load balancing may also involve understanding and purchasing hardware, in particular studying the performance and features of the various load-balancing devices.
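As a rough illustration of the resource pooling discussed in this section, the sketch below implements a tiny fixed-size connection pool in Python. It shows the general technique only and makes no claim about how ADO.NET's pool works internally; `ConnectionPool` and `fake_connect` are hypothetical names, and the "connections" are plain objects.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool that hands out reusable connection objects."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it to the pool


created = []


def fake_connect():
    """Hypothetical factory; a real one would open a database connection."""
    conn = object()
    created.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
with pool.connection() as first:
    with pool.connection() as second:
        pass
with pool.connection() as third:
    pass
```

Only two connections are ever created, however many times the pool is used, which is what makes resource usage predictable. When both are busy, a third caller waits or times out instead of opening a new connection.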
5. Scalability Support Technology 4: Others
(1) Multi-level caching. Put bluntly, caching trades space for time. The more concurrent the system, the more carefully caching must be considered. Every time we find a piece of data that can be reused across many users and add it to the cache, the efficiency of producing that data may improve by thousands of times (if the data can be shared among thousands of users). How to raise the cache hit rate also becomes an increasingly important issue as our website grows. In addition, we may need to study how to integrate ready-made cache solutions such as memcache into our project.
(2) Study the database's locking mechanisms carefully, and try to lock as small a range of data as possible when operating on the database. Database access code can be written to maximize concurrency and minimize exclusive locks; for example, do not lock records when performing read-only operations. These steps seem small, but they have a significant impact on the concurrency of the whole system.
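The multi-level caching idea in item (1) above can be sketched as follows. This is a simplified illustration under stated assumptions: `TwoLevelCache` and `load_product` are hypothetical names, a plain dict stands in for a shared cache such as memcache, and expiry and invalidation are deliberately left out.

```python
class TwoLevelCache:
    """L1: per-process dict. L2: stand-in for a shared cache such as memcache."""

    def __init__(self, shared, loader):
        self.local = {}
        self.shared = shared      # any dict-like object
        self.loader = loader      # the expensive call (e.g. a database query)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.local:
            self.hits += 1
            return self.local[key]
        if key in self.shared:
            self.hits += 1
            value = self.shared[key]
        else:
            self.misses += 1
            value = self.loader(key)
            self.shared[key] = value
        self.local[key] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


calls = []


def load_product(key):
    """Hypothetical slow lookup; records each call so reuse is visible."""
    calls.append(key)
    return {"id": key, "name": f"product-{key}"}


shared = {}
web1 = TwoLevelCache(shared, load_product)
web2 = TwoLevelCache(shared, load_product)
web1.get(7)   # miss: loaded once and published to the shared level
web1.get(7)   # hit in web1's local level
web2.get(7)   # hit in the shared level, no second load
```

The expensive loader runs once even though two simulated web servers request the same data. A real deployment would also need a policy for expiring or invalidating entries when the underlying data changes.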
IV. Multi-User Storage Structure Design and Related Technologies
When a user at one company uses our SaaS application service to access customer information, the application instance that user connects to may be serving users from dozens or even hundreds of other companies, and none of these users is aware of the others. This requires an application architecture that maximizes resource sharing among different users while still keeping the data of different customers separate. Therefore, when designing the storage structure, we must consider both the scalability of the structure and the security of data access.
1. Data Access Control
Whether their data is secure, and whether it could be stolen or tampered with by unauthorized users, is what users care about most, so it is what we must care about most too. Security covers a great deal of ground; here I will try to outline several aspects:
(1) Filtering: adding an intermediate layer between the user and the data source, so that it appears to each user as if only their own data were stored in the database.
A typical technique here is to use views.
(2) Permissions: ACLs are used to restrict who can access the data and what operations they can perform on it.
A key issue here is how to establish a trusted connection. Common approaches are to access the data under the client user's identity (impersonation), or to have the server always access the data under its own process identity (the trusted subsystem model), with any additional security implemented inside the application. The former is more complex to configure but more secure. The latter needs little configuration but is less secure, and some resources may not be accessible at the server process's access level. We may need to combine the two connection methods in development.
In addition, the server may sometimes need to access other resources as a proxy.
(3) Authentication
The previous points concern how the system controls the access permissions of a specific user (that is, they are all authorization issues). But how does the system know that a client request really comes from a particular user? That is the job of authentication. This area covers a considerable body of knowledge. Taking ASP.NET 2.0 as an example, both Windows authentication and forms authentication are available. Windows authentication in turn supports different protocols such as Kerberos and NTLM, which differ in their support for proxy servers and in efficiency, and we need to learn and master them. In addition, in .NET 3.0 Microsoft introduced a new concept and implementation technology, CardSpace. It represents Microsoft's latest thinking on network security, and I think we should pay attention to it (it is closely related to Active Directory, WCF, Windows Live ID, and other technologies).
Also, from a macro perspective, authentication systems fall into two types: centralized and decentralized. We need to study the differences and relationships between them and how to use them in combination.
(4) Encryption
Encrypting users' sensitive data makes it useless to unauthorized users even if they manage to obtain it. Either symmetric or asymmetric encryption algorithms can be used. Asymmetric algorithms offer good confidentiality but carry a large processing overhead. In practice the two are generally combined: the data is encrypted with a symmetric algorithm, and the symmetric key is encrypted with an asymmetric algorithm.
In fact, there are still many security-related issues, such as SQL injection and code security. We will not detail them here.
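As an illustration of the view-based filtering in item (1) of this section, the sketch below uses Python's built-in `sqlite3` module. The table, the column names, and the helper `create_tenant_view` are hypothetical. It creates one view per tenant because SQLite does not allow bound parameters inside a view definition; a server database could instead filter on the identity of the connected login.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (tenant_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (1, 'Bob'), (2, 'Carol');
    """
)


def create_tenant_view(conn, tenant_id):
    """Create a view that exposes only one tenant's rows."""
    # The id is interpolated into SQL text, so accept nothing but a
    # non-negative integer (a guard against SQL injection).
    tenant_id = int(tenant_id)
    if tenant_id < 0:
        raise ValueError("tenant_id must be non-negative")
    view = f"customers_t{tenant_id}"
    conn.execute(
        f"CREATE VIEW IF NOT EXISTS {view} AS "
        f"SELECT name FROM customers WHERE tenant_id = {tenant_id}"
    )
    return view


view = create_tenant_view(conn, 1)
names = [row[0] for row in conn.execute(f"SELECT name FROM {view} ORDER BY name")]
```

Code that queries only the view sees Alice and Bob and has no way to reach Carol's row, which gives each user the impression that the database holds only their own data.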
2. Scalable Data Structure
Microsoft describes this topic in detail in Chapter 2 of its white paper, on multi-user data architecture. There are three possible data structures: (1) separate databases; (2) a shared database in which different users have separate schemas; (3) a shared database with a shared schema. Resource utilization rises from the first approach to the third, but so does design complexity, so a higher degree of sharing is not always better. The following figure shows how to make the trade-off between database isolation and sharing:
For "shared database, shared schema", the most complex of the three modes, Microsoft provides two solutions for extending the data model: predefined extension fields and name-value tables. On the one hand we can implement these solutions as described, and on the other hand we can think about solutions of our own. In addition, whichever solution is used for data storage, it should leave room for further horizontal or vertical splitting of the data tables. For scaling out databases, the main techniques are replication and partitioning, and how to apply them needs further research.
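The name-value table approach mentioned above can be sketched with Python's built-in `sqlite3` module. The schema and the helpers `save_customer` and `load_customer` are assumptions made for illustration; they are not taken from the Microsoft white paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, tenant_id INTEGER, name TEXT);
    -- One row per custom field value; tenants define their own field names.
    CREATE TABLE customer_ext (
        customer_id INTEGER, field TEXT, value TEXT,
        PRIMARY KEY (customer_id, field)
    );
    """
)


def save_customer(conn, tenant_id, name, **custom_fields):
    """Insert the fixed columns, then one extension row per custom field."""
    cur = conn.execute(
        "INSERT INTO customers (tenant_id, name) VALUES (?, ?)", (tenant_id, name)
    )
    customer_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO customer_ext VALUES (?, ?, ?)",
        [(customer_id, field, str(value)) for field, value in custom_fields.items()],
    )
    return customer_id


def load_customer(conn, customer_id):
    """Rebuild one record from the base row plus its extension rows."""
    row = conn.execute(
        "SELECT tenant_id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    record = {"tenant_id": row[0], "name": row[1]}
    ext = conn.execute(
        "SELECT field, value FROM customer_ext WHERE customer_id = ?", (customer_id,)
    )
    record.update(dict(ext.fetchall()))
    return record


cid = save_customer(conn, 1, "Alice", region="North", credit_limit=5000)
record = load_customer(conn, cid)
```

Each tenant can add any number of custom fields without a schema change, but every value comes back as text (`credit_limit` is returned as `"5000"`) and reads need an extra query or join. Predefined extension fields make the opposite trade-off.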
V. SaaS and SOA
SaaS refers to a software business model (especially a marketing model), while SOA concerns system implementation (including the integration of existing systems), so the two are not on the same conceptual level. Still, I would like to stress the relationship between them, and especially the importance of SOA-related technologies to SaaS, because characteristics that SOA embodies, such as business integration and business agility, are exactly what SaaS needs. Take business agility in particular: SOA emphasizes how an enterprise's business changes over time and how software responds, while SaaS emphasizes the business differences between enterprises at a single point in time, but there is no essential difference between the two in their technical requirements.
In addition, SOA has gradually become the standard for enterprise application development. Besides our SaaS software, customers may also use many other application systems. For our system to be more open and to interact better with those surrounding systems, SOA ideas and techniques will be indispensable in our development.
SOA is itself a set of technologies covering a wide range of areas, so this article does not discuss it further.
VI. Other Technologies
1. Single Sign-on
This is one of the basic ease-of-use requirements of modern network applications. At least within our own system, a user should need to log on only once to access every subsystem that he or she is entitled to use.
2. Further Understanding of the SaaS Tree Structure
SaaS itself has a tree-like management structure: from the service provider to enterprises, from enterprises to departments, and from departments to individual end users. Such a structure has some technical characteristics that we can take advantage of:
(1) Inheritance of security permissions
Generally, system authorization is based on user groups and roles. For authorization management, Microsoft puts forward the concept of the "configuration domain". Access control is managed per configuration domain: according to the relationship policy of the application, each configuration domain inherits the roles, permissions, and business rules of its parent domain and can modify, add, or delete them as appropriate. Conceptually this is very reasonable. For example, a department in an enterprise is a configuration domain whose parent domain is the enterprise, and whose child domains may be individual end users (or groups within the department). However, the implementation requires further research and design.
(2) Different user levels
The Microsoft documents divide users into "users" and "end users". A "user" here actually means a unit or organization (a tenant), and the data of different "users" is logically isolated. A "user" can authorize multiple "end users" within the enterprise to access subsets of the data under that "user's" control.
Dividing users into levels in this way makes sense. For example, when building trusted connections, impersonation and non-impersonation can be combined: impersonate at the user level but not at the end-user level. Likewise, when designing the storage structure, a separate database can be created for each user.
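The permission inheritance between configuration domains described in item (1) of this section can be sketched as a small tree in Python. This is one possible reading of the concept, not Microsoft's implementation: the class `ConfigDomain`, its methods, and the role and permission names are all hypothetical.

```python
class ConfigDomain:
    """A node in the provider -> enterprise -> department tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.granted = {}   # role -> permissions added at this level
        self.revoked = {}   # role -> permissions removed at this level

    def grant(self, role, *permissions):
        self.granted.setdefault(role, set()).update(permissions)

    def revoke(self, role, *permissions):
        self.revoked.setdefault(role, set()).update(permissions)

    def permissions(self, role):
        """Inherited permissions, adjusted by this domain's own changes."""
        inherited = self.parent.permissions(role) if self.parent else set()
        effective = inherited | self.granted.get(role, set())
        return effective - self.revoked.get(role, set())


enterprise = ConfigDomain("enterprise")
sales = ConfigDomain("sales department", parent=enterprise)
enterprise.grant("clerk", "read_orders", "edit_orders")
sales.grant("clerk", "read_customers")
sales.revoke("clerk", "edit_orders")
```

Here the department inherits the enterprise's "clerk" role, adds one permission, and removes another, while the enterprise-level definition stays unchanged. Resolving permissions by walking up the tree on each call keeps the sketch simple; a real system would probably cache the result.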
3. Active Directory
If all of our data comes from relational database servers, Active Directory may never be needed. But some resources are stored in the file system, which requires Active Directory to provide access services for us. We need to study both the original Active Directory technology and the newer ADAM (Active Directory Application Mode) technology.