Big Data Learning: Big Data Development Trends and an Introduction to Spark

Source: Internet
Author: User

Big data is a phenomenon that has emerged alongside advances in computer technology, communication technology, and the Internet.
In the past, we did not appreciate how connected people are, and far less data was produced than today. Much of what was produced was never recorded, and even when it was, we lacked good tools to process, analyze, and mine it. With the development of big data technology, we are beginning to gain the ability to explore the value of data.
Before 2012, big data technology was dominated by batch processing, represented by MapReduce; from 2013 onward, the representative processing engine has been Spark. Looking ahead, there is growing interest in combining artificial intelligence with big data, in the hope that AI will unlock more value from the data. The recent surge in artificial intelligence also owes much to the rapid progress of big data technology in storage, computation, and algorithms, so the two are inseparable: without big data, artificial intelligence is like water without a source or a tree without roots. To use an analogy, if artificial intelligence is a rocket, then big data technology is the fuel that propels it.
Having looked at the development trend of big data technology from a macro perspective, let us now take a technician's view of the system architecture of the big data platform used in most enterprises today.
First, the enterprise collects data from various channels and passes it through a message subscription system. Part of the data goes through stream computing and processing to support online, real-time analysis. The other part flows into a relatively static data lake, passing through cleaning, filtering, reprocessing, and similar operations along the way. The data layout can also be optimized for the business, for example by merging large numbers of small files. Data in the lake can then support business analysis reports, data mining, artificial intelligence, and other applications.
Spark is among the most widely used big data computing engines today, and it is a core component for data processing and analysis in the business systems of many large enterprises. Simply put, raw data often has to go through a series of Spark processing steps before it can be used by applications such as artificial intelligence, and Spark has become a de facto standard in the big data processing field. In the current era of big data plus AI, it is big data technology like Spark that lets enterprises build business systems faster and better, serve the applications they need, and fully combine the capabilities of big data and AI to further explore the value of their data.
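The data flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how any particular platform implements it: the event fields (`user`, `action`), the function names (`clean`, `route`, `compact`), and the in-memory lists standing in for the message subscription system and the data lake are all hypothetical. For simplicity, every valid event here feeds both the real-time path and the data lake path.

```python
from collections import Counter


def clean(event):
    """Return a normalized copy of the event, or None if it is unusable."""
    user = str(event.get("user", "")).strip().lower()
    action = str(event.get("action", "")).strip().lower()
    if not user or not action:
        return None  # filter out records with missing fields
    return {"user": user, "action": action}


def route(micro_batches):
    """Clean each incoming batch, then feed the real-time view and the lake."""
    realtime_counts = Counter()  # stands in for online, real-time analysis
    lake_batches = []            # stands in for small files landing in the lake
    for batch in micro_batches:
        cleaned = [c for c in (clean(e) for e in batch) if c is not None]
        for record in cleaned:
            realtime_counts[record["action"]] += 1
        if cleaned:
            lake_batches.append(cleaned)
    return realtime_counts, lake_batches


def compact(batches, target_size):
    """Merge many small batches into fewer batches of up to target_size records."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    merged, current = [], []
    for batch in batches:
        for record in batch:
            current.append(record)
            if len(current) == target_size:
                merged.append(current)
                current = []
    if current:
        merged.append(current)  # keep the final, partially filled batch
    return merged


if __name__ == "__main__":
    incoming = [
        [{"user": " Ann ", "action": "Click"}, {"user": "", "action": "view"}],
        [{"user": "bob", "action": "view"}],
        [{"user": "cy", "action": "click"}],
    ]
    counts, lake = route(incoming)
    print(dict(counts))
    print(len(lake), "small batches before compaction")
    print(len(compact(lake, target_size=2)), "batches after compaction")
```

In a real platform, an engine such as Spark would typically perform these steps at scale; the sketch only shows how cleaning, filtering, real-time aggregation, and small-file merging relate to one another.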
Let's take a look at Spark next. A star among big data technologies, Spark is a general-purpose, high-performance cluster computing system. It originated as a research project at the UC Berkeley AMP Lab, was open-sourced in 2010, and joined the Apache Foundation in 2013. Today Spark has 500,000 meetup members worldwide, its open source community has more than 1,300 developers, and it is widely used in businesses and universities.
So what exactly makes Spark so popular? The first reason is performance: it was reported to be up to 100 times faster than traditional MapReduce, which made the project very compelling from the start. The second is versatility: Spark lets you write SQL, streaming, ML, and graph applications in a single pipeline, something no system before Spark could do. The third is ease of use: Spark offers APIs in Java, Scala, Python, R, and SQL, all designed to be simple to work with. Beyond that, Spark has built a rich ecosystem around itself. It can handle a variety of data sources such as HBase, Kafka, and MySQL, as well as a variety of data formats such as Parquet, ORC, CSV, and JSON. It also supports multiple deployment modes, including YARN, Mesos, and Kubernetes (also known as K8s), and provides its own standalone deployment mode.
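The deployment modes listed above are selected through the master URL given to Spark. The sketch below shows how the URL forms documented by Spark map to those modes. The helper name `master_url` and the example hostnames are hypothetical, and the ports are supplied by the caller rather than assumed.

```python
def master_url(mode, host=None, port=None):
    """Build the Spark master string for one of the deployment modes."""
    mode = mode.lower()
    if mode == "yarn":
        # YARN finds the cluster through the Hadoop configuration,
        # so the master string carries no host or port.
        return "yarn"
    prefixes = {
        "standalone": "spark://",
        "mesos": "mesos://",
        "kubernetes": "k8s://https://",
    }
    if mode not in prefixes:
        raise ValueError(f"unknown deployment mode: {mode}")
    if host is None or port is None:
        raise ValueError(f"{mode} mode needs both host and port")
    return f"{prefixes[mode]}{host}:{port}"


if __name__ == "__main__":
    print(master_url("yarn"))
    print(master_url("standalone", "master.example.com", 7077))
    print(master_url("kubernetes", "k8s.example.com", 6443))
```

The resulting string is what would be passed to `spark-submit --master` or to `SparkSession.builder.master(...)`; the application code itself stays the same across deployment modes.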
By now we have a rough understanding of big data trends and the characteristics of Spark. If you want to learn more about big data and Spark, log in to Huawei Cloud Academy (https://edu.huaweicloud.com/) and take related courses such as "Huawei Cloud Data Lake Exploration Service" and "Big Data Introduction and Application". Many more courses are available there.
