Recently, we've heard a great deal about "big data": lots of new applications, Hadoop, NoSQL, and a variety of new analytics software. I have spent a lot of time lately talking to people and thinking about these trends, and I have become convinced that we are seeing huge changes, both in the data we collect and in how we deal with that data as individuals, companies, and societies.
We are only in the early stages of thinking through what organizations should do with this data and how to translate raw data into information that can be used to make decisions. However, I also believe that the term "big data" may be more confusing than useful. Jeff Bedell, chief technology officer of the data analytics company MicroStrategy, told me that "big data" is just a buzzword: "the whole game is the introduction of confusing terminology."
For example, Gartner's description of big data covers not just the volume of data, but also its variety, velocity, and complexity. Mark Beyer, a Gartner analyst, said at the Extreme Information Management Symposium last fall that companies need to build a modern information management system that includes logical data warehouses.
Rather than talking about "big data" as a single thing, it may be more useful to consider the various changes in how organizations handle data.
Of course, in some cases there really is an enormous amount of data. The Large Hadron Collider produces about 15 petabytes (15,000 TB) of data each year, while the upcoming Spherical radio telescope project is expected to produce several exabytes of data per day (1 EB is 1 million TB). However, these projects are relatively rare and have more to do with high-performance computing than with typical business cases.
In contrast, the databases that most typical organizations work with are significantly smaller, though they can still be measured in terabytes or even petabytes, which is still a large amount of data. This data can come from a variety of sources: tracking what people do on one website or across multiple sites, analyzing social networks, or processing data generated by sensors.
Before discussing the recent changes and what they mean for data, it may be helpful to review some of the big trends in this area so far.
Databases, meaning organized collections of data, have a history almost as long as that of the digital computer, going back to products such as IBM's IMS running on mainframe systems. Early databases were hierarchical systems, but the model that changed the field and became the standard is the relational model. It dates back to a 1970 paper by Edgar F. Codd entitled "A Relational Model of Data for Large Shared Data Banks."
Today, every large organization still uses one or more relational database products, such as Oracle Database, IBM DB2, Microsoft SQL Server, and the open source MySQL (now owned by Oracle), to store its transaction data. A wide variety of applications have been built on top of relational databases, including inventory, accounting, enterprise resource planning (ERP), customer relationship management (CRM), human resources applications, and thousands of custom applications tailored to large organizations.
In particular, as transactions have grown in number and complexity and are often distributed across multiple machines, many companies have implemented online transaction processing (OLTP) systems, also known as transaction-oriented processing systems.
Over the past few decades, one big change has been the emergence of business intelligence platforms and data warehouses, which are often, but not always, used together.
Data warehouses typically store copies of data from business systems, but the warehouses themselves are not used in day-to-day business operations. Instead, they are used to preserve the history of the data and to integrate data from multiple systems, often as a starting point for analytical applications. Teradata's data warehouse products may be the best known, but in recent years Oracle's Exadata product line (drawing on its acquisition of Sun) and IBM (including the assets it acquired with Netezza) have received more attention, as have pure software vendors such as Greenplum (now part of EMC).