Data is growing exponentially. Unstructured data from social media (messaging and microblogging platforms) and sensor devices is attracting increasing attention alongside structured data from traditional enterprise transaction systems, and together they could drive a new round of industrial change. Terms such as machine learning, natural language processing, and sentiment analysis appear in the media almost daily, yet few organizations are actually deploying them at scale.
Today almost every corporate CIO is talking about big data, and many assume that big data simply means a Hadoop cluster: store everything there and analyze it through various API calls. The reality is not so simple. Big data touches every layer of the stack, from infrastructure through the middle tier to front-end applications. Hadoop is not everything; it is only one part of the picture, and we need to consider much more to truly harness big data.
What are the problems with our data?
In fact, the fundamental big data problems we need to solve are storage and reporting. The key is storing rapidly growing data so that business users can access it quickly when they need it: running ad hoc queries, generating reports, forecasting the business, and extracting the hidden value in large data streams.
What types of data do you have? Relational data, unstructured data, audio and video? How do you store these different types and then let people across your organization access them? The answer lies in the cloud. Cloud storage can meet most big data storage requirements: it can hold any type of data and scale out easily. High-end SAN storage is outdated in the big data age, and its cost is beyond what most enterprises can afford for this purpose. SAN storage suits critical business data, where every record matters to the enterprise; a sales order, for example, can never be lost. Much big data is different: a microblog post or a log file does not carry such strict requirements. Cloud storage services, including those from Huawei and Amazon, use inexpensive hardware to give businesses big data storage that is reliable, scalable, and cost-efficient.
Of course, just as the advent of television did not doom radio stations, the big data age does not doom the SAN. SANs still have their value, but not all data belongs on them; we need cloud storage because different types of data have different storage requirements. For example, read-intensive data generally suits a relational database, log files fit naturally in HDFS, write-heavy data calls for a NoSQL database, and a system with heavy mixed read and write traffic requires a powerful big data architecture to support it. Your system may require low latency, strong consistency, high reliability, or tight control of storage costs, and each of these implies a different storage solution. Low latency can mean SSDs or in-memory storage, strong consistency means building a transactional system, and high reliability means using database replication. Big data has thus said goodbye to the one-size-fits-all relational database era: the Oracle + minicomputer + high-end storage ("IOE") combination can no longer handle every data problem.
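As a toy illustration of this matching of workload characteristics to storage types, the logic might be sketched as follows. The categories, field names, and thresholds here are illustrative assumptions, not any real product's selection rules:

```python
# Toy sketch: route a simplified workload profile to a storage category.
# Field names and thresholds are made up for illustration only.

def pick_storage(profile: dict) -> str:
    """Return a storage category for a simplified workload profile."""
    if profile.get("latency_ms", 100) < 1:
        return "in-memory store"
    if profile.get("consistency") == "strong" and profile.get("transactional"):
        return "relational database"
    if profile.get("write_heavy"):
        return "NoSQL database"
    if profile.get("append_only_logs"):
        return "HDFS / object storage"
    # Default for ordinary read-intensive data.
    return "relational database"

# A transactional order system versus a write-heavy clickstream:
print(pick_storage({"consistency": "strong", "transactional": True}))
print(pick_storage({"write_heavy": True}))
```

A real decision would weigh far more factors (cost, compliance, existing skills), but the sketch captures the point that no single store fits every profile.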
So what does the enterprise need? The answer is a flexible, scalable cloud storage solution delivered as a service that can meet these different requirements, such as Amazon's RDS and DynamoDB or Huawei's object cloud storage. Of course, no single cloud store meets every demand. Enterprises need more flexibility, and for reasons such as latency and compliance they also need the ability to migrate data quickly and easily between systems: from an internal system to the public cloud, or from one cloud provider to another.
The future of business intelligence and ETL
Beyond storage solutions, we should also look at front-end applications, because traditional ETL will change as well. Business users do not want to ask the IT department to change a schema just to import a little extra data; that costs too much time and money. The ideal is a simple self-service tool, such as Tableau, that lets business users run ad hoc queries themselves. As data volumes grow to terabytes or even petabytes, however, the cost of such software becomes a factor.
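The kind of self-service ad hoc query described above can be illustrated with a few lines of pandas; the columns and figures below are invented for the example, and no schema change or IT ticket is involved:

```python
# Minimal sketch of a self-service ad hoc query: a business user
# aggregates order data directly, with no schema change required.
# The columns and figures are made up for illustration.
import pandas as pd

orders = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "revenue": [120.0, 80.0, 200.0, 50.0],
})

# Ad hoc report: total revenue per region, highest first.
report = (orders.groupby("region", as_index=False)["revenue"]
                .sum()
                .sort_values("revenue", ascending=False))
print(report)
```

Tools like Tableau wrap exactly this style of group-and-aggregate behind a drag-and-drop interface.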
Future ETL and BI tools will run in the browser, where any business user can generate reports. A dynamic HTML5 interface will support drag-and-drop query building and report generation; if business users still need training before they can use these tools, the tools have failed.
The next generation of BI tools will handle real-time, graphical, large-object, and unstructured data while storing it in the cloud. Each type of data may be hosted on a different cloud service, yet all of it remains accessible through a single API, so the business does not have to worry about where each kind of data is stored.
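The single-API idea can be sketched as a thin facade in front of heterogeneous stores. The backend classes below are in-memory stand-ins, not a real cloud SDK; a real system would wrap provider clients behind the same interface:

```python
# Sketch of one query API fronting different storage backends.
# Backends are in-memory stand-ins, not real cloud SDK clients.

class RelationalBackend:
    def fetch(self, key):
        return {"source": "relational", "key": key}

class ObjectStoreBackend:
    def fetch(self, key):
        return {"source": "object-store", "key": key}

class DataService:
    """One API for callers; routing by data type stays hidden."""
    def __init__(self):
        self._backends = {
            "orders": RelationalBackend(),   # transactional data
            "logs":   ObjectStoreBackend(),  # bulk unstructured data
        }

    def get(self, data_type, key):
        return self._backends[data_type].fetch(key)

svc = DataService()
print(svc.get("orders", "order-42"))
print(svc.get("logs", "2014-01-01.log"))
```

The caller asks for "orders" or "logs" and never learns which cloud service answered, which is the point the article makes about next-generation BI tools.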
Finally, and importantly: more and more enterprises are realizing that data analysis has become a strategic weapon. The next generation of business giants may well be born from companies that know how to collect data and transform it into valuable insight. But to get started, before building big data analysis models, investing in machine learning, or recruiting data scientists, the enterprise must first solve how to store big data, and the answer is in the cloud.
Original link: http://www.searchbi.com.cn/showcontent_82331.htm