Case study: distributed storage and load balancing for a large-scale picture server cluster

Source: Internet
Author: User

For Web servers, user access to picture information consumes server resources. When a Web page is browsed, the Web server establishes connections with the browser, and each connection counts as one concurrent request. When a page contains several pictures, the Web server and the browser open multiple connections so that text and pictures can be sent in parallel to speed up browsing. As a result, the more pictures on the page, the greater the pressure on the Web server.

A typical small Web site puts all of its pages and pictures under a single home directory; such a site places very simple demands on system architecture and performance. Here is the schematic diagram


Somewhat larger sites store a large number of picture resources. When users visit pages on these sites, the pictures account for most of the page's data traffic. Browsers limit the number of simultaneous connections to a single server, so the pictures on a page cannot all be downloaded from one server at the same time; even if the server has high bandwidth, the user's access speed suffers noticeably. In addition, the pictures are stored on physical hard disks and every access requires I/O, so as the number of concurrent users grows, I/O operations become the bottleneck of overall system performance. At that point it is worth considering distributed storage of the picture data.


What follows is an approach to distributed dynamic storage of picture data and load balancing for a medium-sized business Web site. For a small additional hardware cost it increases the site's access speed, and it allows the number of picture servers and the picture storage directories to be adjusted dynamically, which keeps the system extensible and scalable. Large Web sites may have better technology for implementing distributed data storage.

When picture servers are added, the change should be transparent to the whole Web site and have no impact on users. The back-end system, however, needs to address the following 4 issues:

(1) How to deploy pictures in a distributed way, and how to decide dynamically which picture server stores a picture when it is uploaded;

(2) How to balance load across the picture servers, that is, how to ensure that all picture servers have an equal chance of storing a picture;

(3) How to spread the pictures on a picture server over multiple subdirectories, both to get around the operating system's limit on the number of files in one directory and to make the pictures easier to manage and maintain;

(4) How to add picture servers dynamically as performance requirements and the number of pictures grow.


Here is the schematic diagram


The Web server hosts the site's Web pages and responds to requests from client users. When a user browses a page, the Web server accesses the database server, gets the URL paths of all pictures on the page, generates the page, and returns it to the client. The client receives the page and automatically downloads and displays each picture from the appropriate picture server, based on the image URL paths in the page.

When a user uploads a picture, the Web server first obtains the current state of all picture servers from the database server and uses a selection algorithm to choose a picture server and a storage directory. It then calls that picture server's Web service method to save the picture there, and finally records the picture number, URL path, and other information in the database server.

The database server records the number and location of every picture, as well as the configuration and current status of all picture servers. The picture server cluster stores all picture data for the site, and the number of servers in the cluster can be increased dynamically as needed.


Picture Server Information Table


The Web server needs up-to-date status information for every picture server in order to decide dynamically which one should store a picture, so every picture server's state is recorded in a status information table on the database server. In this table, the ServerID field is an auto-increment primary key that uniquely identifies a picture server record. The ServerName field records the name of the server, which makes it easy for an administrator to identify which server the record represents. The ServerURL field holds the root URL of the picture home directory on the picture server. The Picrootpath field holds the physical home directory where pictures are saved. The Maxpicamount field is the maximum number of pictures the picture server may hold; it can be adjusted dynamically according to the server's hardware configuration and performance and the actual needs of users. The Curpicamount field is the number of pictures currently saved; once Curpicamount ≥ Maxpicamount, the system no longer uploads pictures to that server. The Flgusable field indicates whether the picture server is available.
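As a rough illustration of the status table described above, the sketch below creates it in an in-memory SQLite database from Python. The table name tb_serverstatus and the availability filter are taken from the upload section of the paper; the column types, defaults, and sample rows (including the example.com URLs and the /data/pics path) are assumptions made for illustration, and the original system may well have used a different database.

```python
import sqlite3

# Column names follow the fields described in the text. The SQL types,
# defaults, and sample rows are assumptions made for illustration only.
SCHEMA = """
CREATE TABLE tb_serverstatus (
    ServerID     INTEGER PRIMARY KEY AUTOINCREMENT,
    ServerName   TEXT    NOT NULL,
    ServerURL    TEXT    NOT NULL,
    Picrootpath  TEXT    NOT NULL,
    Maxpicamount INTEGER NOT NULL,
    Curpicamount INTEGER NOT NULL DEFAULT 0,
    Flgusable    INTEGER NOT NULL DEFAULT 1
)
"""


def open_status_db():
    """Create an in-memory database containing an empty status table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def available_servers(conn):
    """Return (ServerID, ServerName) for servers that are usable and not full."""
    rows = conn.execute(
        "SELECT ServerID, ServerName FROM tb_serverstatus "
        "WHERE Flgusable = 1 AND Maxpicamount > Curpicamount "
        "ORDER BY ServerID"
    )
    return rows.fetchall()


conn = open_status_db()
conn.executemany(
    "INSERT INTO tb_serverstatus "
    "(ServerName, ServerURL, Picrootpath, Maxpicamount, Curpicamount, Flgusable) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("pic01", "http://pic01.example.com/pics", "/data/pics", 1000, 10, 1),
        ("pic02", "http://pic02.example.com/pics", "/data/pics", 1000, 1000, 1),  # full
        ("pic03", "http://pic03.example.com/pics", "/data/pics", 1000, 0, 0),  # unavailable
    ],
)
print(available_servers(conn))
```

Only pic01 passes the filter here: pic02 has reached Maxpicamount and pic03 is flagged as unavailable.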


Save the picture to the picture server

To save pictures, deploy a suitable service on each picture server, for example a Web service, WCF, the WebClient class, or shared files.


A random algorithm for obtaining picture servers

Filter the set C of available picture servers out of the status table and let N be the number of records in the set. Generate a random number R1 with a random function and take the remainder of R1 divided by N, i = R1 % N. C[i] is then the picture server on which the picture is saved.
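The selection step above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name pick_server, the server names, and the range used for R1 are all assumptions.

```python
import random


def pick_server(available, rng=random):
    """Return C[i] with i = R1 % N, or None if no server is available."""
    n = len(available)
    if n == 0:
        return None
    r1 = rng.randrange(2**31)  # R1: non-negative random integer (range is an assumption)
    return available[r1 % n]


# Usage: over many uploads each server is chosen about equally often.
rng = random.Random(42)
servers = ["pic01", "pic02", "pic03"]  # hypothetical server names
counts = {name: 0 for name in servers}
for _ in range(30000):
    counts[pick_server(servers, rng)] += 1
print(counts)
```

Because R1 is drawn uniformly from a range much larger than N, the remainder is close to uniform over 0..N-1, which is what gives each server an equal chance.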


To detect whether a picture server is running correctly

You can use the heartbeat mechanism
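The text gives no details of the heartbeat mechanism, so the following is only one possible sketch. It assumes each picture server periodically reports a timestamp, and that a monitor treats a server as unusable once its last report is older than a timeout. The function name, the 30-second timeout, and the data layout are all hypothetical.

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds; an assumed value, not taken from the text


def refresh_usable_flags(last_seen, now=None, timeout=HEARTBEAT_TIMEOUT):
    """Map each server name to True if its last heartbeat is recent enough.

    last_seen maps a server name to the time of its last heartbeat,
    expressed in seconds like the values returned by time.time().
    """
    if now is None:
        now = time.time()
    return {name: (now - seen) <= timeout for name, seen in last_seen.items()}


# Usage: pic02 last reported 50 seconds ago and is treated as unusable.
flags = refresh_usable_flags({"pic01": 100.0, "pic02": 60.0}, now=110.0)
print(flags)
```

The resulting flags could then be written to the Flgusable field of the status table so that the Web server stops selecting a server that has gone silent.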

When a client user requests a page through the browser, the Web server gets the URLs of all pictures on the page from the database server and looks up each URL in the picture server status information table to check the Flgusable field of the picture server the URL points to. If Flgusable == False, the picture server is currently unavailable for some reason, and the picture's URL is replaced with the URL of a default picture stored on the Web server; otherwise the URL is returned to the client unchanged. The client then automatically downloads and displays each picture from the corresponding picture server according to its URL path. Because each image URL points directly at a specific picture server, a Web site must be set up on the picture home directory of every picture server. Because the pictures come directly from several image servers, the browser can download from multiple servers simultaneously, which shortens picture download time, reduces the I/O requests and performance pressure on the Web server, and so speeds up access to the site.



Research on distributed storage and load balancing of massive images


Objective

This paper presents a distributed storage and load balancing technique for massive images, aimed at the reduced access speed, increased performance pressure, and I/O bottleneck that massive images cause for a Web site. Picture data and page data are separated by deploying pictures and Web site content on separate servers and by recording and maintaining picture server state information in the database. The experimental results show that this technique can improve the access speed and running efficiency of the Web site, and that the number of image servers can be increased dynamically to meet growing performance requirements.


Keywords: mass images; distributed storage; load balancing

"Abstract" Aiming at the problems that mass images can cause for a Web site, such as lower access speed, more performance pressure, and an I/O performance bottleneck, a technology of distributed storage and load balancing for mass images is proposed. By deploying Web site pages and images separately and recording the status of image servers in a database, it solves the problem of separating image data from page data. Experimental results show that this solution can improve the access speed and running efficiency of the Web site, and that additional image servers can be added to meet increasing performance demands.

"Key words" mass images; distributed storage; load balancing

I. Overview

With the development and spread of computer network technology, there are more and more large portals and e-commerce Web sites such as "Sina" and "Taobao" [1]. Sites of this kind store a large number of image resources. When users visit pages on these sites, the pictures account for most of the page's data traffic. Browsers limit the number of simultaneous connections to a single server, so the pictures on a page cannot all be downloaded from one server at the same time; even if the server has high bandwidth, the user's access speed suffers noticeably. The pictures are stored on physical hard disks and every access requires I/O, so as the number of concurrent users grows, I/O operations become the performance bottleneck of the entire system [2]. In addition, the operating system limits the number of files that can be stored in one directory, so as picture resources keep growing, managing and maintaining them properly and effectively also becomes a difficult problem.

A few large Web sites, with strong financial and human resources of their own, can use NFS [3], CDN [4], Lighttpd, reverse proxies, and load balancing technology to improve user access speed. These technologies require substantial funding, however, and medium-sized business sites in their start-up phase lack that funding and so cannot use them to speed up their sites. This paper proposes a solution for distributed dynamic storage and load balancing of massive picture data for medium-sized business Web sites. For very little additional hardware cost, the scheme increases the access speed of the Web site, and it allows the number of picture servers and the picture storage directories to be adjusted dynamically, which keeps the system extensible and scalable [5].

II. System architecture design

For Web servers, user access to picture information consumes server resources. When a Web page is browsed, the Web server establishes connections with the browser, and each connection counts as one concurrent request. When a page contains several pictures, the Web server and the browser open multiple connections so that text and pictures can be sent in parallel to speed up browsing. As a result, the more pictures on the page, the greater the pressure on the Web server. At the same time, the browser itself limits the number of concurrent connections (2 to 6), so when a page has more pictures than that limit, they cannot all be downloaded and displayed in parallel. For small sites with little data, all pages and pictures can be placed under a single home directory; such a site places very simple demands on system architecture and performance. Large and medium-sized sites, however, hold a massive number of image files, use a wider range of technology, and place higher demands on everything from hardware to software, programming languages, databases, Web servers, and firewalls. It is therefore necessary to set up separate picture servers to store pictures, so that image traffic is separated from the Web server. Such an architecture effectively relieves the Web server's I/O bottleneck and improves user access speed.

The system architecture design needs to meet the following 4 requirements: (1) pictures can be stored in a distributed way; (2) load balancing can be achieved; (3) picture server nodes can be added dynamically as user traffic and the volume of picture data grow; (4) dynamic adjustment of picture server nodes is transparent to Web site users and does not disrupt normal operation. The overall architecture of the system is shown in Figure 1 and consists of 4 parts: client, Web server, database server, and picture server cluster.


3.2 Image Browsing

The client user sends a request through the browser to the Web server to browse a page. The Web server gets the URLs of all pictures on the page from the database server and looks up each URL in the status information table (Table 1) to check the Flgusable field of the picture server the URL points to. If Flgusable == False, the picture server is currently unavailable for some reason, and the picture's URL is replaced with the URL of a default picture stored on the Web server; otherwise the URL is returned to the client unchanged. The client then automatically downloads and displays each picture from the corresponding picture server according to its URL path. Because each image URL points directly at a specific picture server, a Web site must be set up on the picture home directory of every picture server. Because the pictures come directly from several image servers, the browser can download from multiple servers simultaneously, which shortens picture download time, reduces the I/O requests and performance pressure on the Web server, and so speeds up access to the site. The picture browsing algorithm is shown in Figure 2.
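The URL check in the browsing flow can be sketched as follows. This is a minimal Python illustration under stated assumptions: the default picture path, the example.com URLs, and the choice to pass through URLs that match no server record are hypothetical and not taken from the paper.

```python
DEFAULT_PICTURE_URL = "/images/default.png"  # hypothetical default picture on the Web server


def resolve_picture_url(url, usable_by_root):
    """Return url unchanged if its picture server is usable, else the default URL.

    usable_by_root maps each ServerURL root path to its Flgusable value.
    """
    for root, usable in usable_by_root.items():
        if url.startswith(root.rstrip("/") + "/"):
            return url if usable else DEFAULT_PICTURE_URL
    return url  # no matching server record: returned unchanged (an assumption)


status = {
    "http://pic01.example.com/pics": True,
    "http://pic02.example.com/pics": False,
}
print(resolve_picture_url("http://pic01.example.com/pics/7/a.jpg", status))
print(resolve_picture_url("http://pic02.example.com/pics/7/b.jpg", status))
```

The first URL is returned as-is; the second is replaced by the default picture because its server is flagged as unavailable.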


3.3 Image Upload

Because of technical limitations of the B/S architecture itself, the Web server cannot upload images directly to the different image servers, so a Web service [6] should be deployed on every picture server, allowing the Web server to call the Web service on each picture server to save or delete pictures.

The image upload process is more complex. First, the Web server receives the client's request and queries the database with the statement "SELECT * FROM tb_serverstatus WHERE Flgusable = 1 AND Maxpicamount > Curpicamount" (tb_serverstatus is the picture server status information table listed in Table 1), which filters the set C of available picture servers out of the status table and gives N, the number of records in the set. A random function then generates a random number R1, and the remainder i = R1 % N is taken; C[i] is the picture server on which the picture will be saved. The Subfoldamount value in record C[i] is taken as K, the number of picture subdirectories on picture server C[i]. To simplify the algorithm, all subdirectory names are numbers from "0" to "K-1". For example, if Subfoldamount is 1 000, the picture directories on that server are named "0", "1", "2", ..., "999". A second random number R2 is then generated and S = R2 % K is computed; S is the name of the subdirectory in which the picture is saved. To ensure that uploaded picture names do not repeat, each name is formed from the current time plus a random number. In summary, the randomness of the random function combined with the remainder operation gives every picture server, and every picture subdirectory on the same server, an equal chance of storing a picture. All pictures are thus saved at random to different subdirectories on different image servers, which achieves distributed deployment and load balancing of the images. Site administrators can also limit the maximum number of pictures and the number of subdirectories by setting the "Maxpicamount" and "Subfoldamount" fields in the server status information table. These two values can be chosen for each server according to its hardware configuration, performance, and other factors, which further improves the load balancing of the whole picture server cluster. To add a picture server, it is enough to add a new record for it in the status information table; this has no effect on the rest of the Web site system, so picture servers can be added dynamically. The picture upload algorithm is shown in Figure 3.
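Once a picture server has been chosen, the subdirectory and file name steps of the upload algorithm might look like the Python sketch below. The paper only says that the name combines the current time with a random number; the exact name format, the function name, and the range of the random numbers are assumptions.

```python
import random
from datetime import datetime


def subdir_and_name(subfold_amount, rng=random, now=None):
    """Return (S, picture name) for a server that has K = subfold_amount subdirectories."""
    k = subfold_amount
    if k <= 0:
        raise ValueError("Subfoldamount must be positive")
    s = rng.randrange(2**31) % k  # S = R2 % K, a subdirectory named "0" .. "K-1"
    if now is None:
        now = datetime.now()
    # Current time (to the microsecond) plus a 6-digit random number; this
    # particular layout is an assumption, the text only names the two parts.
    name = f"{now:%Y%m%d%H%M%S%f}_{rng.randrange(10**6):06d}"
    return str(s), name


subdir, name = subdir_and_name(1000)
print(subdir, name)
```

A time-plus-random name makes collisions unlikely but not impossible, so a real system would probably still check for an existing file before saving.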


3.4 Picture Deletion

The client sends a request to the Web server to delete a picture. The Web server receives the request and looks up the URL of the picture to be deleted in the picture database. String operations split the URL into the URL root path R of the picture server, the subdirectory D that holds the picture, and the picture name N. The Web server then searches the picture server status information table for the record C whose root URL matches R; C is the picture server from which the picture is to be deleted. It then calls the Web service [7] on picture server C, passing the picture name N and the subdirectory D so that the method can delete the picture, and finally removes the picture's record from the database server. The picture deletion algorithm is shown in Figure 4.
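The string-splitting step of the deletion algorithm can be illustrated with a short Python sketch. It assumes every picture URL has the layout root/subdirectory/name, which is consistent with the upload description but not stated explicitly; the function names and sample data are hypothetical.

```python
def split_picture_url(url):
    """Split a picture URL into (R, D, N): server root path, subdirectory, picture name."""
    root, subdir, name = url.rsplit("/", 2)
    return root, subdir, name


def find_server(root, servers):
    """Return the status record whose ServerURL matches R, or None if there is none."""
    for record in servers:
        if record["ServerURL"].rstrip("/") == root:
            return record
    return None


servers = [
    {"ServerName": "pic01", "ServerURL": "http://pic01.example.com/pics"},
    {"ServerName": "pic02", "ServerURL": "http://pic02.example.com/pics"},
]
r, d, n = split_picture_url("http://pic02.example.com/pics/42/20200102_000123.jpg")
print(r, d, n, find_server(r, servers)["ServerName"])
```

With R, D, and N in hand, the Web server would call the delete method on the matching picture server and then remove the picture's database record.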


3.5 Picture Modifications

The algorithm for modifying a picture combines the delete and upload functions. The client sends a modification request and uploads the new picture to the Web server. The Web server accesses the database to get the URL of the old picture, calls the delete function to remove the old picture, and then calls the upload function to store the new one. Finally, it updates the image database to record the new image's URL path. The algorithm flow is shown in Figure 5.
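The composition of delete and upload can be sketched in Python as below. The delete and upload operations are passed in as callables because their real implementations (Web service calls to a picture server) are outside this sketch; all names and the in-memory dictionary standing in for the image database are hypothetical.

```python
def modify_picture(picture_id, new_picture, url_by_id, delete_picture, upload_picture):
    """Replace a stored picture: delete the old file, upload the new one, record its URL."""
    old_url = url_by_id[picture_id]        # URL of the old picture from the database
    delete_picture(old_url)                # remove the old picture from its server
    new_url = upload_picture(new_picture)  # store the new picture, possibly elsewhere
    url_by_id[picture_id] = new_url        # record the new URL path
    return new_url


# Usage with stand-in functions that only record what they were asked to do.
deleted = []
db = {7: "http://pic01.example.com/pics/3/old.jpg"}
new_url = modify_picture(
    7,
    b"new image bytes",
    db,
    delete_picture=deleted.append,
    upload_picture=lambda data: "http://pic02.example.com/pics/9/new.jpg",
)
print(new_url, deleted)
```

Note that the new picture may land on a different server or subdirectory than the old one, since the upload step chooses its location at random.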


IV. System performance analysis

Performance tests were run in a LAN environment for 2 configurations: with picture servers and without picture servers. The hardware was configured as follows. One Web server and one database server, each with CPU: Intel Xeon quad-core 2.2 GHz, memory 4 GB, network bandwidth MB/s. 5 client machines, each with CPU: Pentium 3.0 GHz, memory 2 GB, network bandwidth MB/s. 3 picture servers, ordinary PCs with CPU: Intel dual-core P2.0 GHz, memory 1 GB, network bandwidth 100mb/s. The test data contained 3 million images, evenly distributed over the 3 picture servers, with 1 000 subdirectories on each picture server. Stress test software was run simultaneously on the 5 clients to simulate requests from 200 ~ 000 concurrent users; the test results are shown in Figure 6.


As Figure 6 shows, using 3 PCs as picture servers greatly reduces the response time of the whole system and clearly improves performance. The greater the concurrent traffic, the larger the performance gain, while the additional hardware cost for the whole system remains very limited.

V. Concluding remarks

Faced with the ever-growing picture data of Web sites, this paper designs and implements a distributed picture storage and load balancing solution for medium-sized Web sites. It discusses key technologies such as distributed image storage, database structure design, and the related query, modification, and deletion algorithms. The performance analysis data show that this solution can greatly increase the access speed and running efficiency of a Web site for very little additional hardware cost.

