A Note on the Six Major Technological Changes in China's Big Data

Source: Internet
Author: User
Keywords: big data, China

Combining the essence of the "Hadoop in China Cloud Computing Conference" and the "CSDN Big Data Technology Conference", the China Big Data Technology Conference (BDTC) has grown into the de facto top technology event of the domestic industry. From a 60-person Hadoop salon in 2008 to today's technical feast of thousands, as a professional exchange platform of real value to the industry, each session of the conference has faithfully portrayed the big data technology field, accumulated industry experience, and witnessed the technological development and evolution of the entire big data ecosystem.

From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014), sponsored by the China Computer Federation (CCF), organized by the CCF Big Data Experts Committee, and jointly hosted by the Chinese Academy of Sciences and CSDN, will open at the Crowne Plaza Hotel (New Yunnan), Beijing. The conference will last three days and aims to promote the development of big data technology in industry applications, with theme forums and industry summits on "big data infrastructure", "big data ecosystem", "big data technology", "big data applications", "big data Internet finance technology", "intelligent information processing", and more. The 2nd CCF Big Data Academic Conference, sponsored by the China Computer Federation and the CCF Big Data Experts Committee and co-organized by Nanjing University, will also convene at the same time and share keynote reports with the technology conference.

The conference will invite nearly one hundred top experts and front-line practitioners in big data technology, at home and abroad, to discuss the latest progress of open-source projects such as YARN, Spark, Tez, HBase, Kafka, and OceanBase; development trends in NoSQL/NewSQL, in-memory computing, stream computing, and graph computing; how the OpenStack ecosystem is responding to big data computing needs; and the latest industry applications in big data visualization, machine learning/deep learning, business intelligence, and data analysis, sharing the technical characteristics and practical experience of real production systems.

In the run-up to the conference, we have combed through the highlights of past sessions to record the development of big data technology in China, and, based on the current state of the ecosystem, offer an outlook on the upcoming BDTC 2014:

Looking back: six major technological changes

Along with the growth of the Big Data Technology Conference, we have witnessed the arrival of the era of big data technology and applications in China, and seen the development and evolution of the entire big data ecosystem:

1. Distribution of computing resources: from grid computing to cloud computing. Looking back over previous BDTC conferences, it is easy to see that since 2009 the organization and scheduling of resources has gradually shifted from cross-domain grid computing to locally distributed cloud computing systems. Today, cloud computing has become the standard platform for provisioning big data resources.

2. Changes in data storage: the rise of HDFS and NoSQL. As data formats became increasingly diverse, traditional relational storage could no longer meet the needs of new-era applications. New technologies such as HDFS and NoSQL emerged and became indispensable parts of many large application architectures; they also drove the development of customized computers and servers, and became some of the hottest technologies in the big data ecosystem.

3. Changes in the computing model: the Hadoop computing framework becomes mainstream. To support its search service better and more cheaply, Google created MapReduce and GFS (the Google File System). Inspired by Google's papers, former Yahoo! engineer Doug Cutting launched the Hadoop software ecosystem, a way of computing over data that differs greatly from the high-performance computing model. Born with this noble pedigree, Hadoop has since become the Apache Foundation's hottest open-source project and is widely recognized as the de facto standard for big data processing. Hadoop provides the ability to process massive amounts of data at low cost in a distributed environment, so Hadoop research and practice sharing has always been one of the brightest features of the China Big Data Technology Conference.
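The MapReduce model described above can be sketched in a few lines of plain Python. This is a toy, single-machine illustration of the word-count example from Google's paper; real Hadoop distributes the map, shuffle, and reduce phases across a cluster, and the sample documents here are made up:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "data data everywhere"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

The key property is that map and reduce are independent per key, which is what lets the framework run them in parallel on many machines.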

4. Stream computing emerges to meet the demand for low-latency data processing. As business requirements expanded, big data gradually moved beyond the scope of offline batch processing. Stream processing frameworks such as Storm, together with messaging systems such as Kafka, brought real-time processing, scalability, fault tolerance, and flexibility into full play, gave older message-middleware technology a new lease on life, and became a highlight of previous BDTC conferences.
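A core primitive behind these stream frameworks is windowed aggregation: computing results over fixed slices of an unbounded event stream instead of waiting for a batch to finish. The sketch below is a minimal, single-process illustration of tumbling-window counting (the timestamps are invented, and it assumes events arrive in order; Storm and similar systems add distribution, fault tolerance, and out-of-order handling):

```python
class TumblingWindowCounter:
    """Count events per fixed-size (tumbling) time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.current_start = None   # start time of the open window
        self.count = 0              # events seen in the open window
        self.results = []           # (window_start, count) for closed windows

    def on_event(self, timestamp):
        if self.current_start is None:
            self.current_start = timestamp
        # Close every window that the new event has moved past.
        while timestamp >= self.current_start + self.window:
            self.results.append((self.current_start, self.count))
            self.current_start += self.window
            self.count = 0
        self.count += 1

events = [0.5, 1.2, 1.9, 3.1, 3.4]     # event timestamps in seconds
counter = TumblingWindowCounter(window_seconds=2)
for t in events:
    counter.on_event(t)
# One window has closed with 3 events; 2 events sit in the open window.
```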

5. The dawn of in-memory computing: upstart Spark dares to challenge the veterans. Spark originated in the AMPLab cluster computing platform at the University of California, Berkeley. Built on in-memory computation, it started from multi-pass iterative batch processing and grew to encompass data warehousing, stream processing, graph computing, and other computational paradigms, making it a rare all-rounder. In just four years, Spark has grown into a top-level project of the Apache Software Foundation with 30 committers; its users include IBM, Amazon, Yahoo!, Sohu, Baidu, Alibaba, Tencent, and other well-known companies, and its family includes Spark SQL, Spark Streaming, MLlib, GraphX, and many other related projects. There is no doubt that Spark has gained a firm footing.
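Two ideas make Spark suit iterative workloads: transformations are lazy (they build a lineage of computations rather than running immediately), and a dataset can be cached in memory so repeated passes avoid recomputation. The `ToyRDD` class below is a purely illustrative stand-in written for this article, not Spark's actual API:

```python
class ToyRDD:
    """A minimal illustration of Spark's RDD idea: lazy transformations
    plus an in-memory cache to skip recomputing the lineage each pass."""

    def __init__(self, compute):
        self._compute = compute      # zero-arg function producing the data
        self._cached = None

    def map(self, fn):
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

    def cache(self):
        self._cached = self._compute()   # materialize once, keep in memory
        return self

    def collect(self):
        return self._cached if self._cached is not None else self._compute()

# Squares of 0..9, cached so later passes reuse the in-memory result.
data = ToyRDD(lambda: list(range(10))).map(lambda x: x * x).cache()
evens = data.filter(lambda x: x % 2 == 0).collect()
```

In real Spark the same shape appears as `sc.parallelize(range(10)).map(...).cache()`, with the lineage also serving as the recipe for fault recovery.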

6. Evolution of relational database technology: NewSQL rewrites database history. The development of relational database systems has not stopped; it continues to progress in horizontal scaling, high availability, and high performance. In practice, the demand for MPP (Massively Parallel Processing) databases oriented to online analytical processing (OLAP) is the most urgent, and these databases are absorbing new techniques from the big data field, such as multi-replica and columnar storage. Meanwhile, databases oriented to online transaction processing (OLTP) are evolving toward high throughput and low latency, with technology trends including fully in-memory and lock-free designs.
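Columnar storage, mentioned above, pays off for OLAP because an analytical query typically touches only a few attributes of every record. The sketch below (with made-up order data) shows the basic idea of transposing rows into columns so an aggregate reads only what it needs:

```python
# Row-oriented storage keeps whole records together; column-oriented
# storage keeps each attribute contiguous, so an OLAP aggregate scans
# only the columns it actually uses.
rows = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 50.0},
]

# Transpose into columns, roughly as a columnar/MPP engine stores data.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# SELECT SUM(amount) WHERE region = 'north' touches just two columns;
# the order_id column is never read.
total_north = sum(
    amt for region, amt in zip(columns["region"], columns["amount"])
    if region == "north"
)
```

Real columnar engines add compression per column (values of one type compress well) and vectorized execution on top of this layout.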

Setting sail: a look at the 2014 big data ecosystem

The 2014 China Big Data Technology Conference will be held as scheduled. With technology changing so fast, what can we expect of BDTC 2014? Here we focus on the current technology trends:

1. MapReduce is in decline; can YARN/Tez create greater brilliance? For Hadoop, 2014 was a year of euphoria: EMC, Microsoft, Intel, Teradata, Cisco, and many other giants increased their investment in Hadoop. For many organizations, however, the year was not easy: given MapReduce's real-time shortcomings and the need for a more general-purpose big data processing platform, the transition to Hadoop 2.0 is imperative. So what challenges will organizations face in the transition? How can they make better use of the new features brought by YARN? What major changes lie ahead in Hadoop's development? To address these questions, BDTC 2014 has invited top international Hadoop experts, including Apache Hadoop committer and Project Management Committee (PMC) member Uma Maheswara Rao G, Apache Hadoop committer Yi Liu, and Bikas Saha (PMC member of Apache Hadoop and Tez); we might as well discuss them face to face.

2. The uncertain future of Storm, Kafka, and other stream computing frameworks. If MapReduce's slowness gave rise to a host of stream computing frameworks, what awaits those frameworks now that the Hadoop ecosystem is maturing and Spark is becoming more usable? Here we can get a first-hand view from the nearly one hundred practice-sharing sessions at BDTC 2014, or communicate with the experts face to face.

3. Spark: subversion or complement? Compatible with the Hadoop ecosystem, Spark is developing rapidly. According to the recently published Sort Benchmark rankings, in sorting massive (100 TB) offline data, Spark used less than one-tenth of the machines of last year's champion Hadoop and completed sorting the same volume of data in only one-third of the time. There is no doubt that Spark has not stopped at real-time computing; its goal is a general-purpose big data processing platform, and the termination of Shark and the opening of Spark SQL may be only the beginning. So, as Spark matures further and supports offline computing more natively, who will take home the honor of the standard open-source big data processing platform? We shall see.

4. At the infrastructure layer, what awaits us in networking? Today the network has become the optimization target of many big data processing platforms. For example, to overcome the network bottleneck, Spark replaced its original NIO network module with a new Netty-based module, improving the utilization of network bandwidth. How, then, do we overcome bottlenecks at the infrastructure layer? How much of a performance improvement can more efficient network hardware, such as InfiniBand, deliver directly? Can we build a more intelligent network that adapts data transfer to the split/merge needs of each computation phase, increasing not only speed but also utilization? At BDTC 2014, we can learn from lectures on InfiniBand/RDMA technology and applications, as well as from several SDN practice reports.

5. The soul of data mining: machine learning. In recent years the competition for machine learning talent has turned white-hot; companies such as Google, IBM, Microsoft, Baidu, Alibaba, and Tencent are investing ever more heavily in the field, covering chip design, system architecture (heterogeneous computing), software systems, model algorithms, and in-depth applications. Big data signals the arrival of a new era, and petabytes of data leave people sitting on a mountain of gold; yet without intelligent algorithms, the machine learning that is its soul, extracting that value remains a mirage. At this session we have also prepared a number of machine learning talks, and we await your participation.
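The "intelligent algorithms" in question are typically iterative: a model is refined by repeated passes over the data, which is exactly the workload that in-memory platforms like Spark (via MLlib) accelerate. As a minimal sketch, here is gradient descent fitting a one-parameter linear model y = w·x; the data and learning rate are invented for illustration:

```python
# Toy training data following y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0          # initial weight
lr = 0.02        # learning rate (chosen small enough to converge here)

for _ in range(200):
    # Gradient of mean squared error with respect to w:
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad   # step against the gradient
```

Each loop iteration rescans the full dataset, which is why caching training data in memory, rather than rereading it from disk as classic MapReduce would, matters so much for machine learning workloads.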

In addition to the technology sharing, the 2nd CCF Big Data Academic Conference will be held concurrently in 2014, sharing keynotes with the technology conference. There we can also harvest many of the latest research results from academia.

(Responsible editor: Mengyishan)
