Basic Methods to Improve the Data Transmission Efficiency of Web Services


Background

Web services are now one of the standard ways to implement SOA, and many companies have implemented and deployed web service projects or are in the process of doing so. The advantage of web services is that they loosely couple communication and data exchange between heterogeneous systems, so they can flexibly handle integration between an enterprise's various systems. However, web services use XML for data transmission between systems, which increases the amount of data transmitted. This lowers transmission efficiency, especially for strictly structured data. How to improve the transmission efficiency of web services has therefore become a concern for many companies when deploying projects.

Purpose

This article introduces some methods for improving system efficiency during web service implementation and development. In practice, these methods have proved very effective and easy to implement. Each method has its own areas of application, advantages, and disadvantages, which we discuss separately. The main purpose of the article is to give readers a range of basic solutions, so that they have more options when deploying web service projects.

Cause Analysis

Web services transmit data as XML. XML carries a great deal of information in the form of tags, and in some cases these tags account for more than half of the transmitted volume. For example, consider transmitting the following table (the names in it are fictitious):

Table 1. Example table to be transmitted

Name	City	Apartment
Air Wang	Beijing	Some place

When the data in Table 1 is transmitted, XML such as the following may be generated:
Listing 1. XML example corresponding to the data in Table 1

<Heading>
  <column>name</column>
  <column>city</column>
  <column>apartment</column>
</Heading>
...

If the table also contained formatting information (such as fonts and background colors), the corresponding XML would be even more complex. The XML above shows that, in addition to the data, XML attaches a lot of tag information, which increases the amount of data transmitted. When a large amount of data must be transmitted, these tags cause serious efficiency problems.

Solution 1: Compression and decompression

Web services transmit XML-based request and response messages over the network, and large data transfers may create network bottlenecks. The most direct solution is to compress the transmitted messages. Different data scales call for different compression approaches, described below:

1. Compression of the entire XML file
Data compression has been developed for many years, and there are many mature algorithms and toolkits. A commonly used compression API is gzip. Compressing the file is straightforward: compress the XML before sending it, and decompress it on the receiving end.
Advantages:
Mature compression and decompression technologies can greatly improve transmission efficiency when the data volume is large. For plain-text XML, compression can reduce the volume by more than 80%.
Disadvantages:
Although compression and decompression greatly reduce the XML volume, the process consumes system resources. Compression and decompression usually have high CPU and memory usage, which can put considerable pressure on low-end clients or even servers.
Application scenarios:
This technique suits scenarios with severe network bottlenecks or well-provisioned hosts.
Example:
As mentioned at the beginning of this section, many mature compression and decompression APIs are available to developers; we use the most common one, gzip, as an example. In general, request XML is relatively small, so there is no need to compress it. Response XML, by contrast, usually contains a large amount of data and is large enough to warrant compression. The process for compressing the response XML is as follows:
Server-side data model --> serialization --> gzip compression of the serialized XML --> return to client --> gzip decompression --> deserialization --> client-side data model
Note that the data model classes on both the client and the server must implement the Serializable interface.
Listing 2. gzip compression code example (Java)

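import java.io.*;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

public class Compress {
    // Gzip the serialized XML and Base64-encode it so the binary result can travel as a String
    public static String gzip(ByteArrayOutputStream source) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            source.writeTo(gz);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }
}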

In the program, after the object model is serialized to XML, the serialized stream can be compressed with the preceding method before it is returned. Part of the code is as follows:
Listing 3. Example of serializing the object model and then compressing it

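import java.io.ByteArrayOutputStream;

public class XmlSerializerHelper {
    // m_serializer, IXmlSerializable and ConverException belong to the application (not shown)
    public static String saveObjToString(Object inputObj) throws ConverException {
        try {
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            // Serialize the object model to XML in memory, then compress it (Listing 2)
            m_serializer.save((IXmlSerializable) inputObj, outputStream);
            return Compress.gzip(outputStream);
        } catch (Exception e) {
            throw new ConverException(e); // assumes a constructor that accepts a cause
        }
    }
}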

The decompression process mirrors the code above. In our tests, gzip compression reduced network overhead by more than 60%.
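The client-side half of the flow can be sketched as follows. This is a minimal sketch rather than the article's code: the class name GzipCodec is hypothetical, and it assumes the compressed payload travels as a Base64 string. It uses only the JDK's java.util.zip and java.util.Base64 (Java 9 or later for readAllBytes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper pairing gzip compression with its inverse.
class GzipCodec {

    // Server side: gzip the serialized XML, then Base64-encode it for transport as text.
    static String compress(String xml) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Client side: Base64-decode, gunzip, and return the original XML text,
    // which can then be deserialized into the client data model.
    static String decompress(String payload) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(payload);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Because response XML is highly repetitive, the Base64 text is still far shorter than the original XML even after Base64's roughly one-third size overhead.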

2. Special processing of specific data
In everyday data transmission, much of the data shares common characteristics, and there are often many similarities or duplicates between data items. For example, in a web service-based report processing system, a report usually contains many empty cells or cells with the same attributes and value ranges. In such cases, the code can treat these special situations specifically. Again, we use the transmission of a table as an example:

Table 2. Example Table containing multiple null values to be transmitted

Software sold	Hardware sold	System sold	Others
120	-	-	-
-	-	90	-
-	110	-	-

As the table shows, many of its values are null, so the null values can be handled uniformly in the XML, which greatly reduces the volume of data transmitted. The corresponding part of the XML is as follows:
Listing 4. Simplified XML example after processing null values

...... <NullValue>...</NullValue> ......

Advantages:
For repetitive data, this method can reduce the amount of data transmitted by a factor of tens or even hundreds (depending on how much the data repeats). Unlike the first method, it processes only fixed patterns of data, so it does not consume much CPU or memory.
Disadvantages:
The characteristics of the data are not always easy to identify, and only fairly simple patterns can be handled. The method is practical mainly for null values or heavily duplicated data.
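One way to realize uniform null handling is a sparse encoding: write only the non-null cells, each with its column index, and let the receiver treat every missing cell as null. The sketch below uses a hypothetical format (the Rows/row/cell tags and the i attribute are illustrative, not the article's schema) and applies it to the data in Table 2.

```java
import java.util.List;

class SparseTableEncoder {

    // Writes one <row> per table row but only the non-null cells inside it,
    // tagged with their zero-based column index. Any index absent from a row
    // is understood by the receiver as the agreed null value ("-" in Table 2).
    static String encode(List<String[]> rows) {
        StringBuilder xml = new StringBuilder("<Rows>");
        for (String[] row : rows) {
            xml.append("<row>");
            for (int col = 0; col < row.length; col++) {
                if (row[col] != null) {
                    xml.append("<cell i='").append(col).append("'>")
                       .append(row[col]).append("</cell>");
                }
            }
            xml.append("</row>");
        }
        return xml.append("</Rows>").toString();
    }
}
```

For Table 2, each row then carries one cell instead of four, so nine of the twelve cells never reach the network.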

Solution 2: Reduce multiple calls and use a single call where possible

Traditional RPC calls are inefficient when made repeatedly, because the user must wait for the network round trip on every remote call. Many users still call web services the way they call traditional RPC, invoking the provider every time a function is needed, which wastes a great deal of time. A major feature of web services is that the settings can be made locally in one pass and the resulting XML then sent to the other end of the service all at once. Because all settings are performed locally, the user does not feel the network bottleneck. The single final transmission may take longer, but that wait is worthwhile.
Application scenarios:
When interacting with users, use this approach as much as possible: reduce the number of remote calls and try to have the program make only one. A simple example illustrates the problem. Suppose a user interface requires many settings and each step requires a remote call; the user then waits for network transmission at every step. With web services, all the client has to do is generate the request XML: all settings can be combined into a single XML file and then transmitted to the server. The user waits only for that one transmission (which may take relatively long, but happens only once, so it is still worthwhile), and all other work is handled on the server side.
Advantages:
All data operations are performed locally, so the user does not experience network-induced pauses while working with the data. All operation requests are sent to the server together, so remote operations that previously required several waits need only one data transmission.
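The batching idea can be sketched as a request object that records each step locally and contacts the server only once. This is a minimal illustration under stated assumptions: RemoteService and BatchedRequest are hypothetical names, and plain string concatenation stands in for the DOM-based request building the article uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote endpoint: every call costs one network round trip.
interface RemoteService {
    String call(String requestXml);
}

// Collects the settings of each wizard step locally and sends them in one request.
class BatchedRequest {
    private final List<String> fragments = new ArrayList<>();

    // A wizard step only records its settings; no network traffic happens here.
    void addStep(String xmlFragment) {
        fragments.add(xmlFragment);
    }

    // After the last step, all settings travel to the server in a single call.
    String submit(RemoteService service) {
        String requestXml = "<Request>" + String.join("", fragments) + "</Request>";
        return service.call(requestXml);
    }
}
```

However many steps the wizard has, the server sees exactly one call, which is the property the per-step RPC style lacks.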
Example:
Figures 1 and 2 below show two steps of a data import wizard, in which you set the database and data table information needed to retrieve the target data.

Figure 1. Data Import setting wizard Example 1

Figure 2. Data Import setting wizard Example 2

A common mistake among beginners is to connect to the database as soon as it is selected, using the user name and password the user supplied, and then make a remote function call in every subsequent step (such as setting the data table conditions). As a result, the network bottleneck is felt every time the user clicks "Next" in the wizard, as a pause or, worse, a period during which the program does not respond. With web services, each step can instead update the request XML locally, and the XML is sent to the server for processing only after all steps are complete. Operations between steps are then purely local, with no network wait, so the whole wizard can be completed quickly, and the user's single wait for the network after the last step is well worth it. In this example, the request XML is structured as follows:
Listing 5. Request XML for the data import settings on the client

......
<DataRetrieving>
  <Database>
    <type>toolbox</type>
    <pattern>TCP/IP</pattern>
    <username>Wang Yun</username>
    <password></password>
  </Database>
  <Table>
    <condition method='greater than' value='table'>2500</condition>
    <condition method='equal' value='table'>Wang Yun</condition>
    <condition method='less than' value='table'>10 days</condition>
  </Table>
</DataRetrieving>
......

Part of the client code that builds this XML is as follows (only the database node is shown in the object model serialization):
Listing 6. Example of serializing the object model into XML

import org.w3c.dom.Element;

public class DatabaseLoginObj {
    // DataObject is the application's implementation of org.w3c.dom.Document
    public Element persistToXml(DataObject pObj) {
        Element dbElement = pObj.createElement("Database");
        Element typeElement = pObj.createElement("type");
        Element patternElement = pObj.createElement("pattern");
        Element userElement = pObj.createElement("username");
        Element pwdElement = pObj.createElement("password");
        // ... set the text content of each child from the user's settings
        dbElement.appendChild(typeElement);
        dbElement.appendChild(patternElement);
        dbElement.appendChild(userElement);
        dbElement.appendChild(pwdElement);
        return dbElement;
    }
}

DataObject implements the org.w3c.dom.Document interface, and Element is the org.w3c.dom.Element interface. Once the XML for the entire request has been built, it is transmitted to the server for processing.
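The final step, turning the locally built DOM tree into the single XML string that is sent, can be sketched with the JDK's standard DOM and javax.xml.transform APIs. The class and method names here are hypothetical, and only the database node from Listing 5 is built, using the listing's sample values.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RequestXmlWriter {

    // Adds <name>text</name> under parent; purely local DOM work, no network.
    static void addChild(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    // Builds the request tree in memory, then serializes it once for sending.
    static String buildRequest(String type, String pattern, String user) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("DataRetrieving");
        doc.appendChild(root);
        Element db = doc.createElement("Database");
        root.appendChild(db);
        addChild(doc, db, "type", type);
        addChild(doc, db, "pattern", pattern);
        addChild(doc, db, "username", user);

        // Identity transform writes the DOM tree out as text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```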

Solution 3: Select and optimize the XML parser

Many XML parsers are available, built on two basic approaches: DOM and SAX. Different parsers suit XML documents of different sizes. For details on XML parsers, see the relevant documentation. Their advantages, disadvantages, and typical scenarios are listed below:

Table 3. Basic comparison of the DOM and SAX parsing methods

Type	Advantages	Disadvantages	Application scenarios
DOM	1. The XML tree is held entirely in memory, so its data and structure can be modified directly. 2. Any node in the tree can be accessed at any time. 3. DOM parser APIs are relatively simple to use.	For large XML files, reading the whole file into memory consumes a lot of system resources.	DOM is the official W3C standard for representing XML documents independently of platform and language. A DOM is a hierarchy of nodes that lets developers search the tree for specific information. Analyzing it usually requires loading the entire document and building the hierarchy before any work can begin. DOM is based on an object hierarchy.
SAX	SAX has low memory requirements because developers decide which tags to process. This advantage is most evident when only part of the document's data is needed.	Accessing different parts of the same document is difficult because SAX parses the XML sequentially. Code written against SAX is also relatively complex.	Very effective for large data volumes when the document does not need to be traversed or analyzed in full. The entire document is never read into memory; the program handles only the tags it needs.

Choosing the right type of XML parser can greatly improve the efficiency of a web service system. Many vendors now provide XML parsers based on these two approaches; read their documentation carefully before choosing one. The two approaches can also be combined in a single system for the best parsing efficiency. Typically, request XML can be parsed with DOM and response XML with SAX.
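The difference between the two styles shows up clearly in code. The sketch below uses the JDK's standard JAXP entry points (DocumentBuilderFactory and SAXParserFactory) to count occurrences of one tag both ways; the class and method names are hypothetical. DOM builds the whole tree before answering, while the SAX handler keeps only a running counter as the parser streams past.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ParserComparison {

    // DOM: the entire document is loaded into an in-memory tree, then queried.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: the parser streams through the document once and calls back on each
    // start tag; only the tag of interest is acted on, and only a counter is kept.
    static int countWithSax(String xml, String tag) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }
}
```

Both return the same answer; the difference is that the DOM version's memory use grows with the document, while the SAX version's does not.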

Solution 4: Simplify tags

Web service solutions transmit XML with a complex structure over the network. In addition to the necessary data, the redundant XML structure attaches extra information that must also be sent and received. Example:
The response XML that the client receives from the server has the following structure:
Listing 7. XML for table data transmitted over the network

<Cells>

The XML above shows that each returned value is identified by at least the tags <column>, </column>, <row>, and </row>; that is, when a table is transmitted, every data item is sent together with these four tags. If each data item is 8 bytes, the four tags add 28 bytes, nearly four times the data itself. When the volume of data is huge, efficiency naturally drops. A very simple way to cut the extra bytes that XML adds is to simplify the tags. The XML above can be rewritten in the following format:
Listing 8. Simplified version of the XML in Listing 7

<Cells>

Advantages:
Tests showed that this improvement can nearly double the efficiency of the entire system, or even more. The method is also simple to apply: only the tag names need to change.
Disadvantages:
The XML becomes less self-explanatory. The usual remedy is to provide a comparison table describing the meaning of the simplified tags.
Example:
The method is simple and intuitive. An example comparison table follows:

Table 4. Comparison of original and simplified XML tags

Original tag	Simplified tag	Description
Row	R	Indicates the rows in the table.
Column	C	Indicates the columns in the table.
...	...	...

A mapping table like this should be provided to users as documentation.
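Applying a mapping table like Table 4 can be sketched as a tag-renaming pass. This is an illustrative sketch, not the article's implementation: TagSimplifier is a hypothetical name, it handles only attribute-free tags, and the sample input below assumes a row/column nesting because the full Listing 7 is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagSimplifier {

    // Rewrites every <long> and </long> tag to its short form from the mapping.
    // Only attribute-free tags are matched, so <rows> or <row id='1'> are untouched.
    static String simplify(String xml, Map<String, String> mapping) {
        String result = xml;
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            Pattern tag = Pattern.compile("<(/?)" + Pattern.quote(e.getKey()) + ">");
            result = tag.matcher(result)
                    .replaceAll("<$1" + Matcher.quoteReplacement(e.getValue()) + ">");
        }
        return result;
    }

    // Mapping following Table 4; Cells is deliberately left out and kept as-is.
    static Map<String, String> table4() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("row", "R");
        m.put("column", "C");
        return m;
    }
}
```

For example, `<Cells><row><column>12345678</column></row></Cells>` (51 characters) becomes `<Cells><R><C>12345678</C></R></Cells>` (37 characters): the four tags around the 8-byte value shrink from 28 bytes to 14.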

Application scenarios:
Note that XML can express the entire data structure clearly, and many developers like to inspect data through the XML; meaningful tags help them do so. Tags should therefore stay as meaningful as possible. We shortened column and row to C and R because the abbreviations still clearly indicate column and row, and these two tags repeat the most during transmission (about 80% of all tags transmitted). A tag such as Cells repeats much less often, and its literal meaning is useful, so we left it unchanged.

Solution 5: Cache mechanism

The four methods above all improve efficiency at the level of XML transmission. The fifth mechanism applies to any software solution and is icing on the cake for web service solutions. The basic idea of caching is to keep frequently used data in a cache and, when the same request arrives again, return the cached data, which saves the time spent on data processing logic. Using a user's data refresh operation as an example, a common caching approach works as follows:

Figure 3. Cache process for a data refresh operation

As shown in the figure, when the user performs a refresh, the server first reads the request XML to obtain the CUID of the data source to be refreshed. Using the CUID, the system checks whether the corresponding data object exists in the cache. If it does, the system retrieves its refresh status and compares it with the current refresh status of the data source identified by that CUID. If they match, the cached data is returned directly to the client. If they differ, the data source must be opened and refreshed again, and the objects of the refreshed result set are saved in the cache. If the cache is full, some data objects are evicted according to the least recently used (LRU) principle. The cache size can be set in a properties file, which the program reads when it initializes the cache. The benefit of caching is that repeated refreshes do not require reopening the data source (connecting to and opening a data source can take a long time), which improves system efficiency.
Advantages:
For repeated identical data operations, data is returned directly from the cache, which can greatly improve system efficiency.
Disadvantages:
The programming effort is significant, and extensive testing is required.
Application scenarios:
Most web service deployments can adopt a cache mechanism.
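The refresh flow in Figure 3 can be sketched with an access-ordered LinkedHashMap, whose removeEldestEntry hook gives LRU eviction. This is a minimal sketch under assumptions: RefreshCache and CachedResult are hypothetical names, refresh status is modeled as a String, reopening the data source is passed in as a Supplier, and the properties key cache.size (default 100) is invented for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// A cached result set together with the refresh status it was produced under.
class CachedResult {
    final String refreshStatus;
    final Object data;

    CachedResult(String refreshStatus, Object data) {
        this.refreshStatus = refreshStatus;
        this.data = data;
    }
}

class RefreshCache {
    private final Map<String, CachedResult> entries;

    // accessOrder = true keeps iteration order least-recently-used first,
    // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
    RefreshCache(int capacity) {
        this.entries = new LinkedHashMap<String, CachedResult>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedResult> eldest) {
                return size() > capacity;
            }
        };
    }

    // Reads the cache size from a properties stream ("cache.size" is an assumed key).
    static RefreshCache fromProperties(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return new RefreshCache(Integer.parseInt(props.getProperty("cache.size", "100")));
    }

    // Returns the cached data when its refresh status matches the data source's
    // current status; otherwise refreshes the source and caches the new result.
    Object getOrRefresh(String cuid, String currentStatus, Supplier<Object> refresh) {
        CachedResult hit = entries.get(cuid);
        if (hit != null && hit.refreshStatus.equals(currentStatus)) {
            return hit.data;
        }
        Object fresh = refresh.get();
        entries.put(cuid, new CachedResult(currentStatus, fresh));
        return fresh;
    }

    // Membership check that does not count as an access for LRU ordering.
    boolean contains(String cuid) {
        return entries.containsKey(cuid);
    }
}
```

A real deployment would also need thread safety (for example, synchronizing getOrRefresh), which this sketch omits.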
