Massive Image Distributed Storage and Load Balancing Research

Source: Internet
Author: User

Research on Distributed Storage and Load Balancing of Massive Images

For a Web server, serving images consumes a large share of server resources. When a page is browsed, the Web server establishes connections with the browser, and each connection counts as one concurrent request. When a page contains multiple images, the browser opens several connections to the Web server so that text and images are transferred in parallel, which speeds up browsing. The more images a page contains, the greater the load on the Web server. In addition, browsers limit the number of concurrent connections (2 to 6), so when a page contains more images than this limit, they cannot all be downloaded and displayed in parallel. A small website has little data, so all of its pages and images can be stored under the same home directory, and its demands on system architecture and performance are modest. Large and medium-sized websites, however, store massive numbers of image files and place high demands on everything from hardware to software, programming languages, databases, Web servers, and firewalls. It is therefore necessary to set up separate image servers to store images and to separate image traffic from the Web server. This architecture effectively relieves the I/O bottleneck of the Web server and improves user access speed.

The system architecture design must meet the following four requirements:

(1) Distributed storage of images;

(2) Load balancing;

(3) Image server nodes can be added dynamically as user traffic and the volume of website image data grow;

(4) Dynamic adjustment of image server nodes is transparent to website users and does not interrupt the normal operation of the system.

The overall system architecture, shown in Figure 1, consists of four parts: the client, the Web server, the database server, and the image server cluster.

Figure 1 System Architecture

The client is a common browser such as IE or Firefox. Users browse the images of a website through the client and can also upload images through it.

The Web server hosts the Web pages of the website and responds to client requests. When a user browses a page, the Web server queries the database server for the URL paths of all images on that page, generates the page, and returns it to the client. The client then downloads each image directly from the image server named in its URL path and displays it. When a user uploads an image, the Web server first obtains the current status of all image servers from the database server and uses the selection algorithm described in Section 3.3 to choose an image server and a target directory. It then calls the Web Service method of that image server to save the image, and finally records the image number, URL path, and other information in the database server.

The database server records the number and storage location of every image. It also records the configuration and current status of all image servers.

The image server cluster stores all the images of the website. The number of servers in the cluster can be increased dynamically as needed.

III. System Implementation and Key Technologies

After image servers are added, the operation of the whole website should remain transparent to the client and must not affect users. The back-end system, however, has to solve the following four problems:

(1) How to deploy images in a distributed manner, and how to decide dynamically at upload time which image server an image is stored on;

(2) How to balance the load across image servers, so that all image servers have an equal opportunity to store images while differences in hardware configuration and performance between servers are still taken into account;

(3) How to spread the images on one image server evenly across multiple subdirectories, which works around the operating system's limit on the number of files in a single directory and makes the images easier to manage and maintain;

(4) How to expand the set of image servers dynamically as performance requirements and the number of images grow.

3.1 Status Information Table

The Web server needs up-to-date status information about all image servers in order to decide dynamically which image server an image should be stored on. The status of every image server is therefore recorded in the database server. Table 1 shows the image server status information table. The ServerId field is the auto-incrementing primary key and uniquely identifies an image server record. The ServerName field records the server name so that the administrator can tell which server a record represents. The ServerUrl field holds the URL root path of the image server's main directory. The PicRootPath field holds the physical root directory in which images are saved. The MaxPicAmount field gives the maximum number of images the image server may hold; this value can be adjusted according to the server's hardware configuration and performance and to actual needs. The CurPicAmount field gives the number of images currently saved. When CurPicAmount is greater than or equal to MaxPicAmount, the system no longer uploads images to that server. The SubFoldAmount field gives the number of subdirectories under the image root directory specified in PicRootPath. Images are distributed evenly across these subdirectories so that no single directory holds too many images, which makes the images easier to maintain and manage. The FlgUsable field indicates whether the image server is available.
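As a rough illustration, one row of the status information table could be modelled as follows. This is only a sketch: the field names come from the description above, but the Python types, the `accepts_uploads` helper, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ImageServerStatus:
    """One record of the status information table (types are assumed)."""
    ServerId: int        # auto-incrementing primary key
    ServerName: str      # name shown to the administrator
    ServerUrl: str       # URL root path of the image server
    PicRootPath: str     # physical root directory for saved images
    MaxPicAmount: int    # maximum number of images the server may hold
    CurPicAmount: int    # number of images currently saved
    SubFoldAmount: int   # number of subdirectories under PicRootPath
    FlgUsable: bool      # whether the server is available

    def accepts_uploads(self) -> bool:
        # Hypothetical helper: a server takes new images only while it is
        # usable and CurPicAmount is still below MaxPicAmount.
        return self.FlgUsable and self.CurPicAmount < self.MaxPicAmount


# Sample records with made-up values.
full = ImageServerStatus(ServerId=1, ServerName="img01",
                         ServerUrl="http://img01.example.com/",
                         PicRootPath="/data/pics", MaxPicAmount=1000,
                         CurPicAmount=1000, SubFoldAmount=100, FlgUsable=True)
free = ImageServerStatus(ServerId=2, ServerName="img02",
                         ServerUrl="http://img02.example.com/",
                         PicRootPath="/data/pics", MaxPicAmount=1000,
                         CurPicAmount=10, SubFoldAmount=100, FlgUsable=True)
print(full.accepts_uploads(), free.accepts_uploads())
```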

3.2 Image Browsing

A client sends a request to the Web server to browse a page. The Web server obtains the URLs of all images on the page from the database server, then looks up the status information table (Table 1) to check the FlgUsable field of the image server each URL points to. If FlgUsable = false, the image server is unavailable for some reason, and the image URL is replaced with the URL of a default image stored on the Web server; otherwise, the URL is returned to the client unchanged. The client then downloads each image from the image server named in its URL path and displays it. Because each image URL points directly at a specific image server, a Web site must be created on the root directory of every image server. Because the images are downloaded directly from several image servers, the browser can fetch them concurrently from multiple servers, which shortens the image download time. It also reduces the I/O requests and load on the Web server, which improves the access speed of the website. The image browsing algorithm is shown in Figure 2.
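The fallback step can be sketched roughly as below. The function name, the default image URL, and the dictionary standing in for the status information table are all hypothetical; the sketch also assumes that a URL matching no known server should fall back to the default image.

```python
# Hypothetical URL of the default image stored on the Web server.
DEFAULT_IMAGE_URL = "http://www.example.com/images/default.png"


def resolve_image_urls(image_urls, server_usable):
    """Return the URLs to embed in the page.

    server_usable maps each ServerUrl to its FlgUsable value and stands in
    for a lookup in the status information table.
    """
    resolved = []
    for url in image_urls:
        # An image is served from its own server only if that server's
        # record exists and is flagged as usable.
        usable = any(flag for root, flag in server_usable.items()
                     if url.startswith(root))
        resolved.append(url if usable else DEFAULT_IMAGE_URL)
    return resolved


status = {
    "http://img01.example.com/": True,
    "http://img02.example.com/": False,   # this server is unavailable
}
page_images = [
    "http://img01.example.com/12/a.jpg",
    "http://img02.example.com/7/b.jpg",
]
print(resolve_image_urls(page_images, status))
```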

3.3 Uploading Images

Due to the technical limitations of the B/S architecture, images cannot be uploaded directly from the Web server to the different image servers. A Web Service [6] must therefore be deployed on every image server so that the Web server can save or delete images by calling the Web Service on the appropriate image server.

The process of uploading an image is more involved. First, the Web server receives the client's request and queries the database with the statement "select * from tb_ServerStatus where FlgUsable = 1 and MaxPicAmount > CurPicAmount" (tb_ServerStatus is the image server status information table shown in Table 1). The result is the set C of available image servers, and N is the number of records in that set. A random number R1 is then generated with the random function, and the remainder I = R1 % N is computed. C[I] is the image server on which the image will be stored. Let K be the value of SubFoldAmount in the C[I] record, that is, the number of image subdirectories on server C[I]. To simplify the algorithm, all subdirectories are named from "0" to "K-1". For example, if SubFoldAmount is 1,000, the subdirectories on the image server are named "0", "1", "2", ..., "999". A second random number R2 is generated, and S = R2 % K is the name of the subdirectory in which the image is saved. To ensure that the names of uploaded images do not collide, each image name is composed of the current time plus a random number. In short, the randomness of the random function combined with the remainder operation gives every image server, and every subdirectory on a given server, an equal opportunity to receive an image. All images are therefore stored at random across the subdirectories of the different image servers, which achieves distributed deployment and load balancing of images.

In addition, the website administrator can set the MaxPicAmount and SubFoldAmount fields in the status information table to limit the maximum number of images a server may hold and the number of its subdirectories. These values can be chosen according to each server's hardware configuration and performance, which further improves the load balancing of the image server cluster as a whole. To add an image server, it is enough to add a new record to the status information table. Adding a server does not affect the operation of the website, so image servers can be added dynamically. The algorithm for uploading images is shown in Figure 3.
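The selection steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the list of dictionaries stands in for the rows of tb_ServerStatus, and the function names, the timestamp format, and the six-digit random suffix are assumptions.

```python
import random
import time


def choose_target(servers, rng=random):
    """Pick an image server C[I] and a subdirectory name S for one upload."""
    # Equivalent of: where FlgUsable = 1 and MaxPicAmount > CurPicAmount
    candidates = [s for s in servers
                  if s["FlgUsable"] and s["MaxPicAmount"] > s["CurPicAmount"]]
    if not candidates:
        raise RuntimeError("no image server can accept uploads")
    n = len(candidates)
    r1 = rng.randrange(2**31)          # random number R1
    server = candidates[r1 % n]        # I = R1 % N selects C[I]
    k = server["SubFoldAmount"]        # K subdirectories named "0".."K-1"
    r2 = rng.randrange(2**31)          # random number R2
    subfolder = str(r2 % k)            # S = R2 % K
    return server, subfolder


def make_image_name(extension, rng=random):
    """Build a name from the current time plus a random number (assumed format)."""
    timestamp = time.strftime("%Y%m%d%H%M%S")
    return f"{timestamp}{rng.randrange(10**6):06d}{extension}"


servers = [
    {"ServerName": "img01", "FlgUsable": True,
     "MaxPicAmount": 1000000, "CurPicAmount": 5, "SubFoldAmount": 1000},
    {"ServerName": "img02", "FlgUsable": True,       # full, so it is skipped
     "MaxPicAmount": 10, "CurPicAmount": 10, "SubFoldAmount": 1000},
    {"ServerName": "img03", "FlgUsable": False,      # unavailable, skipped
     "MaxPicAmount": 1000000, "CurPicAmount": 0, "SubFoldAmount": 1000},
]
server, subfolder = choose_target(servers)
print(server["ServerName"], subfolder, make_image_name(".jpg"))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which can be convenient when testing.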

3.4 Deleting Images

The client sends a request to the Web server to delete an image. The Web server receives the request and queries the image database for the URL of the image to be deleted. Using string operations, the URL is split into the URL root path R of the image server, the subdirectory D of the image, and the image name N. The status information table is then searched for the record whose URL root path matches R; this record, C, identifies the image server that holds the image. Next, the WebService [7] method on image server C is called with the image name N and the subdirectory D as parameters to delete the image file. Finally, the image record is deleted from the database server. The algorithm for deleting images is shown in Figure 4.
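The string-splitting step might look like the sketch below. It assumes, as an illustration only, that image URLs have the layout `<ServerUrl><subdirectory>/<image name>` and that ServerUrl ends with a slash; the function name and sample URL are hypothetical.

```python
def split_image_url(url):
    """Split an image URL into (R, D, N).

    R is the URL root path of the image server, D the subdirectory name,
    and N the image name. Assumes the layout <ServerUrl><D>/<N>.
    """
    root_and_dir, name = url.rsplit("/", 1)      # peel off the image name N
    root, subdir = root_and_dir.rsplit("/", 1)   # peel off the subdirectory D
    return root + "/", subdir, name


r, d, n = split_image_url(
    "http://img01.example.com/pics/42/20240101120000123456.jpg")
print(r, d, n)
```

The returned root path R can then be compared with the ServerUrl column to find the matching server record.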

3.5 Modifying Images

Modifying an image combines the deletion and upload algorithms. The client sends a request to modify an image and uploads the new image to the Web server. The Web server queries the database for the URL of the old image and calls the image deletion function to delete the old image. It then calls the image upload function to upload the new image. Finally, the image database is updated to record the URL path of the new image. The algorithm flow is shown in Figure 5.
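The composition of these steps can be illustrated with a small in-memory sketch. Everything here is a stand-in: the dictionaries replace the image database and the image servers, and `fake_delete` and `fake_upload` are hypothetical placeholders for the deletion and upload functions described in Sections 3.4 and 3.3.

```python
def modify_image(image_id, new_data, db, delete_image, upload_image):
    """Replace the stored image for image_id and record its new URL."""
    old_url = db[image_id]              # URL of the old image from the database
    delete_image(old_url)               # remove the old image
    new_url = upload_image(new_data)    # store the new image
    db[image_id] = new_url              # record the new URL path
    return new_url


# In-memory stand-ins for the image servers and the image database.
storage = {"http://img01.example.com/0/old.jpg": b"old"}
db = {7: "http://img01.example.com/0/old.jpg"}


def fake_delete(url):
    del storage[url]


def fake_upload(data):
    url = "http://img02.example.com/3/new.jpg"   # made-up target location
    storage[url] = data
    return url


new_url = modify_image(7, b"new", db, fake_delete, fake_upload)
print(new_url)
```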

IV. System Performance Analysis

Performance tests were conducted in a LAN environment for two cases: with image servers and without image servers. The hardware configuration was as follows. There was one Web server and one database server, each with an Intel Xeon quad-core 2.2 GHz CPU, 4 GB of memory, and 100 Mb/s network bandwidth. There were five client machines, each with a Pentium 3.0 GHz CPU, 2 GB of memory, and 100 Mb/s network bandwidth. The three image servers were ordinary PCs, each with an Intel dual-core P2.0 GHz CPU, 1 GB of memory, and 100 Mb/s network bandwidth. The test data comprised 3 million images, evenly distributed across the three image servers, each of which had 1,000 subdirectories. Stress-testing software was run on the five clients simultaneously to simulate 200 to 1,000 concurrent user requests. The test results are shown in Figure 6.

As shown in Figure 6, once three ordinary PCs are used as image servers, the response time of the whole system drops sharply and performance improves significantly. The higher the concurrency, the more pronounced the improvement, while the additional hardware cost for the whole system remains limited.

V. Conclusion

Faced with the ever-growing volume of image data on websites, this paper designs and implements a solution for distributed image deployment and load balancing for medium-sized websites. It discusses key technologies such as distributed image storage, the database structure design, and the related query, modification, and deletion algorithms. The performance analysis data show that the solution greatly improves website access speed and operating efficiency at the price of a limited increase in hardware cost.
