Reducing the cost of cloud storage: data abstraction and distributed query access


For all the attention cloud computing receives, public cloud services today represent only a fraction of total IT spending. That will not change unless mission-critical applications, which account for the bulk of corporate IT budgets, move to the public cloud.

The biggest bottleneck in public cloud adoption appears to be the high cost of cloud storage. A typical web application may store hundreds of megabytes of data, while a mission-critical application may store terabytes, which at current prices is more than most users can afford to keep in the cloud. Fortunately, there are two strategies for managing this cloud storage cost: data abstraction and distributed query access.

Applying data abstraction to business intelligence and analytics applications

Business intelligence (BI) and analytics are two of the most promising applications of cloud computing. These applications cluster around important IT decisions and are spread among planners and decision makers, which makes them ideal candidates for the cloud. But businesses typically estimate that even a pilot BI application costs at least $30,000 to run, a substantial price.

Building a realistic but not overly large data set is the first of the two cost-management methods: data abstraction. Data abstraction is a mechanism for generating one or more summary databases from raw company information, sized so that they can be stored economically in the cloud.

One of our clients in the healthcare industry reported that summarizing patient information by diagnosis code, treatment code, and age/gender reduced the data volume more than 300-fold, meaning the cost of storing and accessing the summary in the cloud was well under 1% of what the raw data would have cost.

For data abstraction to be an effective cost-management method, you must study carefully what is being analyzed and how. Most BI runs are not aimed at uncovering individual details; they look for patterns or trends. In most industries, a handful of variables matter most, such as diagnosis and treatment in healthcare. By building a summary database over those variables, you reduce cost and speed up access without affecting the analysis itself. Once an interesting combination of variables has been identified, it is also easy to pull the corresponding detail records from the raw, never-abstracted data. As a result, analysis over the abstracted data becomes a cloud application, while detailed drill-down operations stay in the data center.
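The summarization step described above can be sketched as a `GROUP BY` over the analysis variables. This is a minimal illustration using an in-memory SQLite database; the table, column names, and codes are hypothetical placeholders, not the client's actual schema.

```python
# Sketch of data abstraction: build a summary table from raw patient records,
# grouping on the variables the analysis actually needs (diagnosis code,
# treatment code, age band, gender). All names and values are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE patient_records (
    diagnosis_code TEXT, treatment_code TEXT, age_band TEXT,
    gender TEXT, cost REAL)""")
raw_rows = [
    ("E11", "T100", "40-49", "F", 1200.0),
    ("E11", "T100", "40-49", "F", 900.0),
    ("E11", "T200", "50-59", "M", 2100.0),
    ("I10", "T100", "40-49", "F", 300.0),
]
conn.executemany("INSERT INTO patient_records VALUES (?,?,?,?,?)", raw_rows)

# The summary keeps only counts and aggregates per variable combination;
# this much smaller table is what would move to cloud storage.
summary = conn.execute("""
    SELECT diagnosis_code, treatment_code, age_band, gender,
           COUNT(*) AS n, AVG(cost) AS avg_cost
    FROM patient_records
    GROUP BY diagnosis_code, treatment_code, age_band, gender
""").fetchall()

print(len(raw_rows), "raw rows ->", len(summary), "summary rows")
```

On real data the reduction is far larger than in this toy example, since many raw records collapse into each variable combination.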

Using distributed query access for unstructured data

The data abstraction method works well for analysis of structured transaction data with a small number of important variables. It does not apply to today's large volumes of unstructured data, because abstracting unstructured data is hard to implement. Some companies have succeeded in building databases of the frequency of specific words or word combinations in email, but only when those keywords can be known in advance. Most applications need a more general approach. That approach is the second cost-management strategy: distributed query access.

Typically, a data processing task can be divided into three parts: the actual processing of the data, the database management access that locates the data, and the storage access that retrieves it from mass storage devices. If a large body of information cannot be moved to the cloud for cost reasons, record-by-record access from the cloud is impractical. The best solution is to host the data and query logic somewhere other than the cloud and have the cloud application send database management system (DBMS) queries that extract just the subset of data it needs to process. Keeping the DBMS engine inside the enterprise and moving only queries and results into and out of the cloud can significantly reduce data storage and access costs.
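The partitioning described above can be sketched as follows. The on-premises DBMS is simulated here with in-memory SQLite, and the function names are illustrative; in practice the boundary between the two functions would be a network API or database driver connection.

```python
# Sketch of the distributed-query pattern: the DBMS engine stays on
# premises, and the cloud side sends a parameterized query and receives
# only the (small) result subset, never the bulk data.
import sqlite3

def on_prem_dbms():
    """Simulated on-premises database holding the bulk data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("east", 10.0), ("west", 25.0), ("east", 5.0)])
    return conn

def cloud_side_query(conn, region):
    """What the cloud application sends: a query, not a data transfer.
    Only the matching subset crosses the cloud boundary."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE region = ? GROUP BY region",
        (region,))
    return cur.fetchall()

subset = cloud_side_query(on_prem_dbms(), "east")
```

The design point is that the filter and aggregation run where the data lives, so only the already-reduced result pays cloud transfer and storage costs.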

Structuring an application for this kind of functional partitioning is relatively straightforward; in fact, a growing number of vendors offer DBMS engines or appliances that combine storage and query capabilities. It is, however, necessary to build a review step into the application to prevent a badly formed query from returning the entire data set. A pilot test alone is not enough; the query logic should check the size of a result before delivering it.

Recognizing the problems of distributed query processing

A special condition of big data is that the information may not be stored in one place. Email, instant messaging, and collaboration data are usually kept where they are generated, so a business may have dozens or even hundreds of sites. This creates a distributed query processing problem, which is commonly addressed with the MapReduce architecture or its most widely used open-source implementation, Hadoop.
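The MapReduce pattern mentioned above can be shown in miniature with a word count, the canonical example. This is plain single-process Python standing in for what Hadoop would distribute across the sites where the data already lives; the document contents are made up.

```python
# Minimal MapReduce sketch: map each document to (word, 1) pairs,
# shuffle (group) by key, then reduce by summing. Hadoop runs the map
# phase where each site's data resides, so only the small intermediate
# pairs move between machines.
from collections import defaultdict

def map_phase(doc):
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

# Each "site" maps its own documents locally.
site_docs = ["cloud storage cost", "cloud query cost cost"]
pairs = [p for doc in site_docs for p in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```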

Structured data can also be queried in a distributed fashion; a financial company reported that its client-lending analysis drew on data extracted from more than 30 databases in major metropolitan areas. With a structured DBMS, SQL commands can "synthesize" results from multiple sites even when the query is sent to each site to run independently. The problem, then, is to make sure the query can run to completion at each site on its own; otherwise, each run must pull data from the other sites, and the cost becomes quite high.
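Synthesizing one result from independently run per-site queries works when the aggregate decomposes into partials, as sums and counts do. This sketch simulates the sites with in-memory SQLite databases; the table, cities, and amounts are invented for illustration.

```python
# Sketch of combining per-site query results: each site runs the same
# GROUP BY locally and returns partial sums, which combine into a global
# total without any raw rows leaving their site.
import sqlite3

def site_db(rows):
    """One simulated metropolitan-area database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loans (city TEXT, amount REAL)")
    conn.executemany("INSERT INTO loans VALUES (?, ?)", rows)
    return conn

sites = [
    site_db([("NYC", 100.0), ("Boston", 50.0)]),
    site_db([("NYC", 200.0)]),
]

totals = {}
for conn in sites:
    # The same query runs independently at each site; only the small
    # partial aggregates travel to the coordinator.
    for city, partial in conn.execute(
            "SELECT city, SUM(amount) FROM loans GROUP BY city"):
        totals[city] = totals.get(city, 0.0) + partial
```

Aggregates that do not decompose this way (a median, for example) are exactly the queries that force cross-site data access and drive up cost.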

While many people are focused on creating hybrid clouds, creating "hybrid data" will be the more important task for mission-critical applications in the cloud's future. Without a way to combine inexpensive local storage with highly flexible cloud computing, users may find that their big data forces them to maintain a traditional IT architecture. That would not only cost the cloud its relevance to mission-critical applications, it would stall cloud adoption where it matters most.
