What is big data? Answers vary. The most commonly cited definition is Gartner's "3V" description: big data is high-volume, high-velocity, and high-variety information that requires new tools to manage. Still, the definition of big data remains vague. With its latest "Big Data Consumption Guide", the Open Data Center Alliance (ODCA) is trying to help companies understand what big data is, why it matters, and how to benefit from it.
ODCA was founded in 2010 with the main goal of developing open standards for cloud computing. In a telephone interview, ODCA said the alliance's "Big Data Consumption Guide" is a logical extension of its cloud computing work. "There are many cloud computing paradigms whose advantages big data environments want to take advantage of," says John Pereira, chief technical advisor at ODCA.
Pereira pointed out that big data is by nature well suited to the cloud, especially because big data volumes can grow dramatically in a very short period of time.
He added: "Because of the nature of big data, you might consider a distributed environment, and the cloud computing paradigm will help you move in that direction."
The guide summarizes how big data platforms can help businesses. For example, a bank can correlate data from multiple unrelated sources to detect potential credit card fraud. The guide also provides standard definitions and terminology that enterprises can use when working with big data service providers.
It also cites a striking statistic from IDC: unstructured data accounts for more than 90% of the information in enterprises today, most of it stored in documents, e-mail, text, and Web content.
Unstructured data that falls under "big data" includes machine-generated data from sensors, machine logs, and mobile GPS signals, as well as data from social networking sites and online transactions.
The guide calls Apache Hadoop "the leading big data technology" but notes that there are many other open source big data projects to choose from, including Riak, MongoDB, CouchDB, Redis, Hypertable, Storm, Spark, and High-Performance Computing Cluster (HPCC).
"We are trying to take a vendor-neutral approach in our recommendations and guidance," Pereira said. "We try to avoid favoring any particular supplier."
The ODCA chief says companies need to plan their big data strategy carefully in advance to avoid wasting resources and money on bad practices.
"You want to write data in the most efficient way rather than copying the same dataset over and over again, and how you record prior information is very important," said Marvin Wheeler, executive director of ODCA. "This is mainly about how you write data, so that it doesn't sprawl the way it does with traditional approaches."
Dealing with data sprawl is a key issue facing businesses. According to the McKinsey Global Institute, in 15 of 17 U.S. business sectors, companies have more data stored than the Library of Congress. Some researchers estimate that 90% of today's data was generated in the past two years. The growing use of video analytics is one example.
"If you went back five years, who would have thought of saving videos and analyzing them to make better business and shopping decisions? Now that's what everyone is thinking about," Pereira says. "That goes back to the very core of big data, and it's one of the things that make big data such an interesting set of new technologies and paradigms." (Compiled by 邹铮)