From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014), hosted by the China Computer Federation (CCF), organized by the CCF Big Data Expert Committee, and co-organized by the Chinese Academy of Sciences and CSDN, will open at the New Yunnan Crowne Plaza Hotel in Beijing. The three-day conference aims to promote the development of big data technology in industry applications. It will feature theme forums and industry summits including "Big Data Infrastructure", "Big Data Ecosystem", "Big Data Technology", "Big Data Application", "Big Data Internet Finance Technology", and "Intelligent Information Processing". The "2014 Second CCF Big Data Academic Conference", hosted by the China Computer Federation, organized by the CCF Big Data Expert Committee, and co-organized by Nanjing University, will be held at the same time and will share keynote reports with the technology conference.
The conference will invite nearly 100 top experts and front-line practitioners in big data technology from China and abroad. They will discuss the latest developments in open-source software (OSS) such as YARN, Spark, Tez, HBase, Kafka, and OceanBase; trends in NoSQL/NewSQL, in-memory computing, stream computing, and graph computing; how the OpenStack ecosystem addresses big data computing needs; and the latest industry applications of big data visualization, machine learning/deep learning, business intelligence, and data analysis. Speakers will also share the technical characteristics of real production systems and their practical experience with them.
Ahead of the conference, CSDN spoke briefly with Qi, a data architect at AutoNavi (Gaode) Software Co., Ltd. and a speaker in the conference's "Big Data Application" forum, about his own big data practice, his view of big data technology trends, and the application of big data to traffic information.
AutoNavi is one of the first companies in China to work on map navigation and traffic information. It currently has the most comprehensive traffic information collection and distribution system, receiving billions of location records reported by industry vehicles such as taxis and by the public, covering hundreds of millions of kilometers of accumulated mileage. The company has built up extensive experience in storing, processing, and applying this data. On December 14, at the 2014 China Big Data Technology Conference, Qi will share more practical insights on traffic big data.
Qi, data architect, AutoNavi Software Co., Ltd.
Qi graduated from Beijing Polytechnic University in 2008 and has worked at Baidu and then AutoNavi. At Baidu, he took part in projects including the basic KV storage system, the OLAP data warehouse, and the user profile warehouse, which laid the data foundation for core businesses such as Phoenix Nest, Baidu Union, and search, and he gained extensive big data experience through this development work. After joining AutoNavi, he took part in networked navigation services and in traffic information processing and application projects, with main responsibility for the design and development of the big data architecture and for traffic data mining and analysis. He led the team that built AutoNavi's traffic information data warehouse and data development platform, along with production projects that apply traffic data, such as new road identification and road attribute modification. In 2014 he produced the AutoNavi traffic reports, which were widely covered by the media. After joining Alibaba, he began leading the team in migrating its computing to Alibaba's cloud platform.
The interview transcript follows:
CSDN: Which big data technologies has your company used? What are you satisfied with, and what are you dissatisfied with?
Qi: We have used Hadoop, Hive, HBase, Flume, Kafka, Storm, and other technologies. I won't go over their advantages at length: they solve the problems of storing and computing over massive data, and streaming-style MapReduce is easy for programmers working in different languages to pick up. My main dissatisfactions are these:
1 Authentication and permission management is either too complex or too simple. Authentication and authorization were never really put into practice, which causes many security problems.
2 Flume log collection is unstable. Under load or when exceptions occur, it often loses or duplicates data.
3 Hive still has bugs, and some of the bad data it produces at massive data volumes cannot even be detected.
We have now started using Aliyun infrastructure such as ODPS, OTS, and TimeTunnel, making full use of Alibaba's distributed storage and computing resources. The advantages are:
1 Permission management is more complete.
2 Data collection is very convenient and connects easily with Alibaba's other facilities.
3 The stream computing framework simplifies a lot of business statistics.
But there are some drawbacks:
1 For security reasons, ODPS imposes more restrictions and does not offer Hadoop's freedom.
2 ODPS support for data structures and syntax is weaker than Hive's.
CSDN: As far as you know, what are the biggest data difficulties that similar companies currently face?
Qi: I think the main big data difficulties are as follows:
1 Operating a big data platform is very complex work. Although current distributed systems have put a lot of effort into disaster recovery and failover, they are still prone to failure, and keeping the system stable remains a hard problem.
2 Resource isolation and resource sharing are always in tension. Meeting and controlling ever-expanding computing and storage demands tests the ability of managers and developers, and weighing cost, output, and efficiency runs through the whole process of product design and development.
CSDN: Which big data technologies are you watching and studying, and why are you bullish on them?
Qi: We are mainly focused on the following technologies:
1 The evolution and direction of NoSQL storage. Last year Google released its Spanner paper, which pointed out a direction for NoSQL storage in which the boundary with SQL is blurring. Between functionality and performance, existing NoSQL systems chose performance, but that does not mean developers have no need for features such as distributed transactions. To get those features, developers often end up repeatedly building their own solutions to similar needs.
2 The evolution of real-time query systems such as Impala. In many cases response time determines productivity, and OLAP analysis cannot always rely on pre-built models.
3 Machine learning is always worth studying; it is the future of big data.
CSDN: Please talk about the topic you are about to share at this conference.
Qi: As the number of motor vehicles in cities grows, more and more cities are entering the "congestion" era. How can people travel easily without the support of navigation maps and real-time traffic information? I will mainly share how traffic big data is applied in production and daily life. AutoNavi currently receives GPS data reported by industry and public users nationwide, and I will cover how we use this data, what value it produces, how the AutoNavi traffic report is published, and so on.
CSDN: Which listeners will benefit most from these topics? What problems can your talk help the audience solve?
Qi: I hope to help listeners who are interested in China's urban traffic and geographic information data. The talk will show the audience how traffic big data can help solve travel problems, and it will explain the patterns and causes of road congestion.
The 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014) will be held at the New Yunnan Crowne Plaza Hotel from December 12 to 14, 2014. Running since 2008 and now in its seventh edition, the China Big Data Technology Conference is currently the most influential and largest technical event in the big data field. At this year's conference, attendees will hear about the latest achievements and trends in general-purpose big data open-source projects from Apache Hadoop committers Uma Maheswara Rao G (also a Project Management Committee member) and Yi Liu, and from Bikas Saha, a member of the Apache Hadoop and Tez Project Management Committees, among others. They will also hear dozens of practical talks from Tencent, Alibaba, Cloudera, LinkedIn, NetEase, and other organizations. A limited number of discounted tickets are still available.