10 great lessons from big data pioneers


There is no doubt that the age of big data has arrived. So how should we handle it? Let's hear what experienced practitioners have to say.

How to make the most of hundreds of terabytes of data depends on each organization's needs and priorities. Interclick Advertising Services has found a way to work more efficiently while delivering near-real-time data analysis. Harvard Medical School has learned that data volumes can grow significantly even when the number of patients and the time span of the records stay unchanged for years. ComScore, an Internet traffic measurement company, has more than 12 years of experience compressing the data in its storage databases; it uses sorting techniques to optimize compression and reduce processing requirements.

Hadoop has become the preferred low-cost platform for processing unstructured data at companies including Yahoo, Facebook, Twitter, Netflix, AOL, and the online dating site eHarmony. It meets the needs not only of Internet giants but also of JPMorgan Chase and other mainstream traditional enterprises. Data provider Infochimps also expects Hadoop deployments to grow quickly as more add-on and complementary applications become available.
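Hadoop processes unstructured data with the MapReduce model: a map step emits key/value pairs, the framework groups (shuffles) them by key, and a reduce step aggregates each group. The sketch below is a single-process illustration of that model using a word count; it is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit (word, 1) for every word, as a mapper would."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Group values by key; Hadoop performs this step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))

print(word_count(["big data big", "data pioneers"]))
# {'big': 2, 'data': 2, 'pioneers': 1}
```

On a real cluster the map and reduce functions run on many machines in parallel over blocks of a distributed file system, which is where the low cost per terabyte comes from.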



Applications in the age of big data

Of course, not every big data deployment is defined by total size. LinkShare, for example, holds only a few months of data, but it must load and quickly analyze up to dozens of gigabytes of data per day, which makes it a large deployment when measured by daily volume. We also need to consider all six dimensions of data warehouse scalability. Only then can we design tests that reflect the most demanding requirements and make technology investments that will meet future needs.
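Daily volume drives sizing as much as total size does. The arithmetic below is a minimal sketch with hypothetical figures (50 GB per day, a 2-hour nightly load window, 90 days retained); none of these numbers come from LinkShare.

```python
def required_load_rate_mb_s(gb_per_day, load_window_hours):
    """Sustained MB/s needed to load one day's data inside the load window (1 GB = 1000 MB)."""
    return gb_per_day * 1000 / (load_window_hours * 3600)

def retained_tb(gb_per_day, days_retained):
    """Total data held when only a rolling window of days is kept (1 TB = 1000 GB)."""
    return gb_per_day * days_retained / 1000

# Hypothetical figures: 50 GB/day, 2-hour load window, 90 days retained.
print(f"{required_load_rate_mb_s(50, 2):.2f} MB/s")  # 6.94 MB/s
print(f"{retained_tb(50, 90):.1f} TB")               # 4.5 TB
```

A few terabytes in total is modest, but the load rate and the time left for analysis after each load are what the platform has to be tested against.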

First, fast queries ensure efficiency and timeliness

Massively parallel processing (MPP) platforms, column-store databases, in-database processing, and in-memory technology can cut query times from days to hours, minutes, or even seconds. And the benefits go beyond raw speed. New York advertising services firm Interclick found that the most important benefit of fast analysis is efficiency: quick responses leave more time for deeper investigation. The second benefit is near-real-time results, which improve both the speed and the accuracy of decision-making.
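The core idea behind MPP query execution is scatter/gather: split the data across nodes, let each node aggregate its own slice, then combine the partial results. The sketch below simulates nodes with a thread pool; this is an assumption made for illustration only (a real MPP database distributes work across machines, and CPython threads will not speed up CPU-bound work). Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split rows into at most n roughly equal slices, one per simulated node."""
    size = -(-len(rows) // n)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def local_aggregate(rows):
    """Each node computes a partial (sum, count) over its own slice."""
    return sum(rows), len(rows)

def mpp_average(rows, nodes=4):
    """Scatter partitions, aggregate locally, then combine the partials."""
    if not rows:
        raise ValueError("no rows to aggregate")
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(local_aggregate, partition(rows, nodes)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(mpp_average(list(range(10)), nodes=3))  # 4.5
```

Nodes return sums and counts rather than local averages because averaging unequal-sized partitions' averages gives the wrong answer; combinable partials are what let the work split cleanly.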


Big data analysis technology used at Interclick

Thanks to fast response times, Interclick can segment the online behavior of web users within hours or even minutes. It can pass behavioral information, such as which users have visited travel sites or hotel-booking sites, to the relevant airlines, hotel chains, and car rental companies. Interclick runs its deployment on the ParAccel column-store database, whose in-memory cluster can hold 3.2 TB of data.
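The segmentation step described above can be sketched as grouping recent visitors by the kinds of advertisers who would want to reach them. This is a minimal illustration, not Interclick's system: the category names, the `BUYERS_BY_CATEGORY` mapping, and the time-window rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from site category to the advertiser types interested in that audience.
BUYERS_BY_CATEGORY = {
    "travel": ("airline", "hotel_chain", "car_rental"),
    "hotel_booking": ("hotel_chain", "car_rental"),
}

def build_segments(visits, now, window_seconds=3600):
    """Group recent visitors by advertiser type.

    visits: iterable of (user_id, site_category, timestamp) tuples,
    with timestamps in seconds on the same clock as `now`.
    """
    segments = defaultdict(set)
    for user_id, category, ts in visits:
        if now - ts > window_seconds:
            continue  # too old for the near-real-time window
        for buyer in BUYERS_BY_CATEGORY.get(category, ()):
            segments[buyer].add(user_id)
    return dict(segments)

visits = [("u1", "travel", 950), ("u2", "hotel_booking", 990),
          ("u3", "news", 995), ("u4", "travel", 0)]
print(build_segments(visits, now=1000, window_seconds=100))
```

Here u4 is dropped as stale and u3 matches no advertiser; the shorter the window a query can cover quickly, the fresher the audiences that can be sold.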


Second, measure data growth and know where the costs lie

Drawing on 20 years of medical records and research into drug efficacy and risk, Harvard Medical School has learned that planning a data warehouse involves more than simple questions such as the number of customers, records, and transactions. Although the number of patients and the time span covered have remained stable, each patient's records have grown richer as many new health-monitoring indicators have emerged. It is therefore essential to understand in advance all the ways requirements can change.
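The Harvard example is easy to miss if volume is estimated from entity counts alone. A minimal sketch, with hypothetical figures (1,000,000 patients, 200 bytes per measurement, and 20 versus 120 measurements per patient per year), shows how volume multiplies while the patient count does not move:

```python
def annual_bytes(patients, measurements_per_patient, bytes_per_measurement):
    """Yearly data volume: patients x measurements each x size of one measurement."""
    return patients * measurements_per_patient * bytes_per_measurement

PATIENTS = 1_000_000  # held constant, as in the Harvard example

early = annual_bytes(PATIENTS, 20, 200)   # hypothetical: few indicators recorded
late = annual_bytes(PATIENTS, 120, 200)   # hypothetical: many new monitoring indicators

print(PATIENTS / PATIENTS)  # 1.0  (growth a patient count alone would predict)
print(late / early)         # 6.0  (actual growth in stored data)
```

Planning should therefore track the per-record factors, not just the number of customers or records.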



Harvard Medical School

Third, compress data to cut storage costs

Better compression lowers hardware costs per terabyte of data. Column-store databases such as HP Vertica, Infobright, ParAccel, and Sybase IQ can achieve compression ratios of 30:1 or even 40:1, while row-store databases such as EMC Greenplum, IBM Netezza, and Teradata average about 4:1. The reason is that values within a single column tend to be similar, as with zip codes, purchase order numbers, and many other fields. Row-oriented data, which combines a customer's various attributes (name, address, zip code, purchase order number, and so on), lacks this advantage. Aster Data and Oracle databases offer hybrid row/column storage, and Oracle's Hybrid Columnar Compression can provide a 10:1 compression ratio.
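To see what these ratios mean for hardware, the arithmetic below applies the quoted ratios to a hypothetical 100 TB of raw data. Actual ratios depend heavily on the data, so treat this as a back-of-the-envelope sketch.

```python
def stored_tb(raw_tb, ratio):
    """Physical storage needed for raw_tb terabytes at a ratio:1 compression ratio."""
    return raw_tb / ratio

RAW_TB = 100  # hypothetical raw data volume in TB

# Ratios quoted in the text; real results vary with the data.
for label, ratio in [("column store, 40:1", 40),
                     ("hybrid columnar, 10:1", 10),
                     ("row store, 4:1", 4)]:
    print(f"{label}: {stored_tb(RAW_TB, ratio):.1f} TB")
# column store, 40:1: 2.5 TB
# hybrid columnar, 10:1: 10.0 TB
# row store, 4:1: 25.0 TB
```

At these ratios the column store needs one tenth of the disk the row store does for the same raw data.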


Data compression

Compression ratios vary widely depending on the data itself, and column storage is not always the best choice. If a query needs to retrieve many attributes of each record, a row store may perform better. In practice, enterprises often use row-store databases for data warehouses that handle mixed query workloads, while column-store databases focus on queries over massive data sets.
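Why a row store can win when a query pulls many attributes can be shown with a toy I/O model that counts pages read. Assumptions, all hypothetical: fixed-width columns, 8 KB pages, each column stored in its own contiguous file, and no indexes, caching, or compression.

```python
PAGE_SIZE = 8192  # assumed bytes per I/O page

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def pages_row_store(rows, col_widths):
    """Records are contiguous, so reading a range of rows reads every column."""
    row_width = sum(col_widths.values())
    return ceil_div(rows * row_width, PAGE_SIZE)

def pages_column_store(rows, col_widths, selected):
    """Each column is its own file: only selected columns are read,
    but every column touched costs at least one page."""
    return sum(ceil_div(rows * col_widths[c], PAGE_SIZE) for c in selected)

# Hypothetical table: 20 columns of 8 bytes each (160-byte rows).
widths = {f"c{i}": 8 for i in range(20)}

# Fetch one record with all its attributes.
print(pages_row_store(1, widths), pages_column_store(1, widths, list(widths)))  # 1 20
# Scan 1,000,000 rows but only two columns.
print(pages_row_store(1_000_000, widths),
      pages_column_store(1_000_000, widths, ["c0", "c1"]))  # 19532 1954
```

Under this model the row store wins twentyfold on the wide single-record fetch, while the column store wins about tenfold on the narrow scan, which matches the split between mixed workloads and large analytical queries.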

Fourth, sort data before compressing to cut processing time

Just as storing similar column values together improves compression, sorting data before loading can raise the compression ratio further. When loading data into Sybase IQ, ComScore first sorts it with Syncsort DMExpress software. Michael Brown, the company's CTO, says that 10 bytes of data ordinarily compress to 3 or 4 bytes, but after sorting, 10 bytes can be compressed to 1 byte. "This gives us another way to store massive amounts of data."
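The effect of sorting on compression is easy to reproduce with Python's standard `zlib` module. This sketch is not ComScore's pipeline and uses no Syncsort or Sybase IQ API; the data are hypothetical zip codes drawn from 50 values with a fixed random seed.

```python
import random
import zlib

def compressed_size(values):
    """Serialize values one per line and return the zlib-compressed size in bytes."""
    return len(zlib.compress("\n".join(values).encode("ascii"), 6))

rng = random.Random(42)  # fixed seed so runs are repeatable
zip_codes = [str(rng.randint(10000, 10049)) for _ in range(100_000)]

unsorted_size = compressed_size(zip_codes)
sorted_size = compressed_size(sorted(zip_codes))
print(unsorted_size, sorted_size)
```

Sorting turns scattered repeats into long runs of identical values, which a compressor encodes almost for free. In a real warehouse whole rows are sorted by a chosen key so that columns stay aligned; only the leading sort columns get the full benefit.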

(Responsible editor: Duqing first)
