How to streamline the management of data stored in the cloud

Source: Internet
Author: User

Year after year, the cost of disk space has dropped significantly. About 50 dollars now buys a terabyte of disk, so disk cost seems almost negligible. In an enterprise's physical environment you may not give disk space much thought, but in a cloud environment you must, or you will pay the price.

The cloud is a different story. If your cloud space holds too much low-value data or too many copies of files, you incur two unnecessary costs. The first is the monthly storage charge, and the second is an unavoidable performance impact, especially on searching, viewing, reporting, and system updates. In a cloud environment it really is necessary to manage data, which includes pruning, deduplication, and compression.

The first step is to evaluate the problem: is it document data or table data? These data types usually have different storage limits, and the strategies and tools used to manage them often differ significantly.

Documents are often stored as attachments to records (for example, a signed contract in PDF format), and users may not be able to find them easily. The same document may be attached to three or four different records, and there are other cases to consider, such as users attaching every version of a rapidly changing document. The first thing to do is to create an inventory of the documents in the system (including the IDs of the records they are attached to, the last update date, and so on) and use spreadsheet filters to identify duplicates. Many handy duplicate-file removal tools are available (they work by examining file contents), but it is not clear whether they can be used directly against a cloud application. Unless you are willing to download all file content to your own server for in-depth analysis, you are limited to metadata analysis. In addition, because optical disc storage is cheap, you can archive all the files you delete from the cloud to optical media, in case someone needs the data later.
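The metadata-only approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor tool: the inventory rows and their field names (`file_id`, `record_id`, `name`, `size_bytes`, `last_modified`) are assumptions standing in for whatever your cloud application exports. Files that share a name and size are treated as likely duplicates, which is only a heuristic because the file contents are never compared.

```python
from collections import defaultdict

def find_duplicate_attachments(inventory):
    """Group attachments that share the same name and size.

    Returns {(name, size): [rows...]} for groups with more than one row.
    Each group is sorted newest first, so the first row is the copy to keep.
    """
    groups = defaultdict(list)
    for row in inventory:
        # Hypothetical matching key: normalized file name plus size in bytes.
        key = (row["name"].strip().lower(), row["size_bytes"])
        groups[key].append(row)
    return {
        key: sorted(rows, key=lambda r: r["last_modified"], reverse=True)
        for key, rows in groups.items()
        if len(rows) > 1
    }

# Made-up inventory, as it might be exported from the cloud system.
inventory = [
    {"file_id": "F1", "record_id": "R100", "name": "contract.pdf",
     "size_bytes": 48213, "last_modified": "2023-01-10"},
    {"file_id": "F2", "record_id": "R205", "name": "Contract.pdf",
     "size_bytes": 48213, "last_modified": "2023-03-02"},
    {"file_id": "F3", "record_id": "R100", "name": "invoice.pdf",
     "size_bytes": 9120, "last_modified": "2023-02-14"},
]

duplicates = find_duplicate_attachments(inventory)
for (name, size), rows in duplicates.items():
    keep, extras = rows[0], rows[1:]
    print(name, size, "keep:", keep["file_id"],
          "review:", [r["file_id"] for r in extras])
```

The output is a review list rather than a delete list, since two different files can share a name and size.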

Table data is another matter. There are many system-specific tips and techniques for the different types of cloud services, but the following steps are common to managing table data:

Determine whether your cloud system really has a storage problem. Some systems, such as financial systems, are subject to audit, and all of their details must be retained for a long time, so they cannot be pruned. Other systems, such as marketing automation or log analysis, tend to collect a great deal of detail, and the unnecessary information will inevitably slow the system down.

Determine which tables consume more than 20% of the total storage, and focus your effort there.
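As a rough sketch of this step, the snippet below computes each table's share of total storage and keeps those above the 20% mark. The table names and sizes are invented for illustration; in practice the figures would come from your cloud system's storage usage report.

```python
def tables_over_threshold(table_sizes, threshold=0.20):
    """Return (table, share) pairs whose share of total storage exceeds threshold."""
    total = sum(table_sizes.values())
    if total == 0:
        return []
    shares = [(name, size / total) for name, size in table_sizes.items()]
    heavy = [(name, share) for name, share in shares if share > threshold]
    # Largest consumers first.
    return sorted(heavy, key=lambda pair: pair[1], reverse=True)

# Hypothetical storage report, in megabytes per table.
table_sizes_mb = {
    "activities": 5200,
    "contacts": 1800,
    "accounts": 600,
    "web_visits": 2400,
}

heavy_tables = tables_over_threshold(table_sizes_mb)
for name, share in heavy_tables:
    print(f"{name}: {share:.0%} of total storage")
```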

For each table, work out the value of a single record. Some tables (especially accounts or contacts) should not be touched, because they may contain private information and removing it can cause unnecessary trouble (especially when the table is linked to an external system). Other table data, such as anonymous information in a marketing automation system, can be deleted outright.

Before proceeding to the next step, make a complete backup of all your cloud data to disk or optical media. This step is important.

For the tables you are free to prune, consider using a "signal-to-noise ratio" (SNR) method. Do you need to keep information that has been completely irrelevant for some time? For example, in a marketing automation or network monitoring cloud, do we really care about anonymous visitors from six months ago? Can records with a signal-to-noise score below 0 be deleted? Before using this method, make sure you have the consent of all the relevant user groups. SNR-based pruning can delete millions of unnecessary records in a short time.
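One way to picture SNR-based pruning is as a filter that only selects candidates for deletion. The sketch below is an assumption-laden illustration: the record fields (`anonymous`, `last_seen`, `score`) and the way the score is produced are hypothetical, and the rule simply combines the two questions above (anonymous visitors older than about six months, or a score below 0).

```python
from datetime import date, timedelta

def select_prunable(records, today, max_age_days=180, min_score=0):
    """Return IDs of records that are candidates for deletion.

    A record qualifies if its score is below min_score, or if it belongs
    to an anonymous visitor last seen more than max_age_days ago.
    """
    cutoff = today - timedelta(days=max_age_days)
    prunable = []
    for rec in records:
        stale_anonymous = rec["anonymous"] and rec["last_seen"] < cutoff
        low_signal = rec["score"] < min_score
        if stale_anonymous or low_signal:
            prunable.append(rec["id"])
    return prunable

# Made-up visitor records with a precomputed (hypothetical) score.
records = [
    {"id": 1, "anonymous": True,  "last_seen": date(2023, 11, 15), "score": 2},
    {"id": 2, "anonymous": False, "last_seen": date(2023, 10, 1),  "score": 5},
    {"id": 3, "anonymous": True,  "last_seen": date(2024, 6, 20),  "score": -1},
    {"id": 4, "anonymous": True,  "last_seen": date(2024, 6, 25),  "score": 3},
]

candidates = select_prunable(records, today=date(2024, 7, 1))
print("candidates for deletion:", candidates)
```

Keeping selection separate from deletion leaves room to show the candidate list to the affected user groups before anything is removed.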

Some tables hold many records with good signal-to-noise scores, but over time the stored detail stops being worth keeping. For example, many marketing automation and e-mail blasting systems use activity tables to record important email and web interactions. These activity tables may occupy half of the storage in the system. But how much does it matter that a user watched video A or video B a year ago? Use this as a touchstone: if a particular detail does not actually change anyone's decision or behavior, it is no longer "information." In this case, we recommend a compression method: summarize the information, then delete most of the detail after about six months. The summarized history is usually stored in a custom table, as a string of token characters or a bitmap, with small storage requirements. This strategy requires careful thought, but it can cut a great deal of unnecessary detail based on the value of the information.
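The bitmap form of this compression can be sketched as follows. The activity types and their bit positions are assumptions chosen for illustration; the idea is only that many detail rows per contact collapse into one small integer that records which kinds of activity ever occurred, after which the detail rows can be deleted.

```python
# Hypothetical mapping from activity type to bit position.
ACTIVITY_BITS = {
    "email_open": 0,
    "email_click": 1,
    "video_view": 2,
    "form_submit": 3,
}

def summarize_activities(activities):
    """Collapse (contact_id, activity_type) rows into one bitmap per contact."""
    summary = {}
    for contact_id, activity_type in activities:
        bit = ACTIVITY_BITS.get(activity_type)
        if bit is None:
            continue  # unknown activity types are not summarized
        summary[contact_id] = summary.get(contact_id, 0) | (1 << bit)
    return summary

def had_activity(bitmap, activity_type):
    """Check whether a summarized bitmap records the given activity type."""
    return bool(bitmap & (1 << ACTIVITY_BITS[activity_type]))

# Made-up detail rows older than the retention window.
old_activities = [
    ("C1", "email_open"),
    ("C1", "video_view"),
    ("C1", "video_view"),
    ("C2", "email_click"),
]

summary = summarize_activities(old_activities)
print(summary)
print(had_activity(summary["C1"], "video_view"))
```

The summary loses counts and timestamps by design; if a decision still depends on those, that detail has not yet stopped being information and should not be compressed this way.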

Some tables, especially contact tables, can accumulate a lot of duplicate records in a short time, particularly if your company has problems with how information is gathered and processed. If de-duplication tools are available for your cloud system (from the main vendor or a third party), buy the best one you can and study it carefully. The best tools use fuzzy logic algorithms that let you find and merge duplicates without moving the data out of the cloud environment. The merge process works for most data, but if you have many data conflicts (for example, two different phone numbers for the same user), you may want to create shadow fields and pre-populate them with the conflicting values before merging. For a variety of reasons, the merge has to be done in several stages: 100,000 duplicate records clearly require a lot of CPU time, and a lot of your thinking time too. Do not rush the merge, because once records are merged, the merge cannot be undone.
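To give a feel for what fuzzy matching does, here is a small sketch using Python's standard `difflib.SequenceMatcher`. It is not how any particular vendor tool works, and unlike the tools described above it runs on an exported sample rather than inside the cloud environment. The contact fields, the 0.85 threshold, and the choice of `phone` as the field checked for conflicts are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(contact):
    """Build a comparable string from the fields used for matching."""
    return f'{contact["name"]} {contact["email"]}'.lower().strip()

def find_likely_duplicates(contacts, threshold=0.85, conflict_fields=("phone",)):
    """Return (id_a, id_b, similarity, conflicts) for likely duplicate pairs.

    Pairwise comparison is quadratic, so this suits small samples only.
    """
    pairs = []
    for a, b in combinations(contacts, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            # Fields that differ need a shadow field before the merge.
            conflicts = [f for f in conflict_fields if a.get(f) != b.get(f)]
            pairs.append((a["id"], b["id"], ratio, conflicts))
    return pairs

# Made-up contact sample.
contacts = [
    {"id": 1, "name": "Jon Smith", "email": "jon.smith@example.com",
     "phone": "555-0100"},
    {"id": 2, "name": "John Smith", "email": "jon.smith@example.com",
     "phone": "555-0199"},
    {"id": 3, "name": "Maria Garcia", "email": "maria.garcia@example.org",
     "phone": "555-0123"},
]

likely = find_likely_duplicates(contacts)
for id_a, id_b, ratio, conflicts in likely:
    print(f"{id_a} ~ {id_b} (similarity {ratio:.2f}), conflicts: {conflicts}")
```

Listing the conflicting fields for each pair before merging matches the staged approach above: conflicts can be copied into shadow fields first, and the irreversible merge happens only after review.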

Most of these steps are one-time fixes, not process changes. If you are not willing to spend money on improving your data management process, you may need to repeat this cleanup from time to time.

(Responsible editor: The good of the Legacy)
