On April 19, 2014, the China Spark Technology Summit (Spark Summit China 2014) will be held in Beijing, bringing Apache Spark community members and business users from China and abroad together in Beijing for the first time. Spark contributors and front-line developers from AMPLab, Databricks, Intel, Taobao, and NetEase will share their Spark project experience and best practices for production environments.
As a general-purpose parallel computing framework, Spark has become one of the most popular open source projects since Hadoop and has won support from many enterprises. On the eve of the summit, the reporter had the opportunity to interview one of its speakers, Dr. Wang Jianzong, a senior researcher at NetEase and a member of the big data committee of the China Computer Federation. He previously served as an advanced cloud computing solutions expert at HP and as a researcher in electrical and computer engineering at the University of California. At NetEase he is responsible for researching and deploying big data frameworks for NetEase Games, and he led the effort that brought Spark into stable use in their production environment.
Wang Jianzong
CSDN's interview, as edited, follows:
-What drew you to delve into Spark?
Mainly, it was Berkeley's AMPLab. I firmly believe the saying "if it comes out of AMP, it will be good." Almost every week I check the lab's homepage, read their technical reports and papers, and closely follow their cutting-edge research. The projects they are currently incubating in areas such as biological computing, multi-core, and machine learning may also shape the entire field of computing in the future.
Five or six years ago, when cloud computing was at its peak, AMPLab's "Above the Clouds: A Berkeley View of Cloud Computing" was the most cited paper in the field. Looking back at it today, you will find that the direction and strategy of cloud computing still have not moved beyond what that paper defined and laid out at the time.
Here is another example. I used to work in storage research, and the RISC and RAID work from the same Berkeley group has had a profound and far-reaching impact on the computer industry. When I was studying in the United States, I had a face-to-face conversation with AMPLab founder David Patterson. I was deeply moved by the professionalism of this outstanding scientist, who still keeps his passion for technology and still meets with students late into the night every day. With a group of people like that, how could you doubt what they produce?
I have followed Spark closely since it first appeared. It is Berkeley AMPLab's killer app for the big data era, and a weapon that may one day unify the big data field. To borrow what was said when they invented RAID about 30 years ago, I think "Spark will bring tens of billions in market value."
-What advantages does Spark offer for solving problems?
Spark's advantage is unique: it is a complete big data processing ecosystem. Apart from the underlying storage, where it still relies on HDFS from the Hadoop ecosystem, it can completely replace Hadoop in every other respect. I won't repeat Hadoop's own shortcomings in usability, reliability, and real-time processing here; Spark is the only revolutionary alternative to Hadoop.
-What is currently the biggest difficulty for enterprises adopting Spark?
The biggest difficulty is still the human factor: too few people understand Spark. When I talk with CEOs of companies that have big data needs, I find that some organizations have very few people using even Hadoop, let alone Spark.
Spark is still at an early stage of enterprise adoption. It is used mainly by some large companies, and it is indeed still immature in many respects. Cultivating a group of Spark experts to drive enterprise adoption is therefore urgent, and the shortage of such experts is the biggest problem for enterprise applications.
-In your view, what is the current state of Spark's development?
Spark now spans the whole big data processing ecosystem, with its own components for stream processing, graph processing, machine learning, NoSQL-style queries, and so on. It has just become a top-level Apache project and integrates well with Hadoop 2.0. In addition, a group of AMPLab people founded a company to promote it full-time: some professors gave up their university posts, and some PhD students interrupted their studies. With people this resolute, willing to make such sacrifices, I believe Spark will certainly flourish. Still, Hadoop took more than five years to go from its early promotion to today's large-scale adoption, and Spark has a relatively long way to go.
-Please tell us about the topic you will present at this summit.
I will mainly share my own experience with Spark, both the gains and the losses. Spark is new to all of us and we are all still learning; I have taken only a small step. So I have summarized some of my experience, hoping it will serve as a reference and help for everyone putting Spark into production.
-Which attendees should pay the most attention to these topics, and what problems can they help the audience solve?
I think any company or individual interested in big data processing, or facing real-time data processing problems, should come to this summit. I believe that after the summit, attendees will go back with the firm determination and confidence to use Spark to move beyond Hadoop. We can discuss more at the summit itself.
Closing note:
At the end of the interview, Wang Jianzong offered his best wishes to the China Spark Technology Summit, hoping it would be a success and that technology would create more value for society.