From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014) will open at the Crowne Plaza Hotel New Yunnan in Beijing. It is hosted by the China Computer Federation (CCF) and organized by the CCF Big Data Expert Committee, with the Chinese Academy of Sciences and CSDN as co-organizers. The three-day conference aims to promote the development of big data technology in industry applications, and will feature theme forums and industry summits including "Big Data Infrastructure", "Big Data Ecosystem", "Big Data Technology", "Big Data Applications", "Big Data Internet Finance Technology", and "Intelligent Information Processing". The "2014 Second CCF Big Data Academic Conference", hosted by the China Computer Federation, organized by the CCF Big Data Expert Committee, and co-organized by Nanjing University, will be held at the same time and will share keynote reports with the technology conference.
The conference will invite nearly 100 top experts and front-line practitioners in big data technology from home and abroad. They will discuss the latest progress in open source software such as YARN, Spark, Tez, HBase, Kafka, and OceanBase; development trends in NoSQL/NewSQL, in-memory computing, stream computing, and graph computing; how the OpenStack ecosystem is addressing big data computing needs; and the latest industry applications of big data visualization, machine learning/deep learning, business intelligence, and data analysis. Speakers will also share the technical characteristics of real production systems and the practical experience gained from them.
Ahead of the conference, CSDN spoke briefly with Lu Yilei, technical vice president of AdMaster and a speaker at the conference's "Big Data Technology Forum". He said he will present "Hadoop in Advertising Monitoring: Technology in Practice" at the conference.
Lu Yilei, technical vice president of AdMaster
He has worked at the Lenovo Research Institute, Baidu's Infrastructure Department, and Carbonite. He focuses on highly reliable, highly available, highly scalable, high-performance system services, and on offline, streaming, and real-time distributed computing technologies such as Hadoop, HBase, Storm, and Spark.
He has deep understanding and practical experience in distributed storage and distributed computing, very large clusters, and big data analysis, and knows Lustre, HDFS, HBase, Map/Reduce, Storm, and Spark in depth. Since receiving his master's degree in 2006 he has worked on cloud storage and cloud computing development and architecture, and has many years of experience with Hadoop. He holds several invention patents, including "A distributed file system and its data access method" and "A version management method and device for data backup". He has repeatedly been invited by 51CTO, CSDN, IT168, InfoQ, and Alibaba Technology to speak about Hadoop and big data applications in the Internet industry.
The following is the original interview:
CSDN: Which big data technologies does your company use? Where are you satisfied with these technologies, and where are you dissatisfied?
Lu Yilei: AdMaster is a marketing data technology company. It integrates multi-source data through a software-as-a-service (SaaS) platform to help brands mine the business value of their data. The main big data technologies currently in use include:
Data collection: Nginx, LVS
Data storage: HDFS, HBase, Elasticsearch, MySQL, Aerospike, Redis
Data analysis: Map/Reduce, Storm, Spark
Virtualization technology: OpenStack, Docker, etc.
I am very satisfied with the horizontal scalability of big data technology, especially for the web cluster that handles data collection and for the Hadoop storage and compute clusters. I am also satisfied with how quickly big data technology is updated, which lets it keep up with the growth of the business.
The difficult part of big data technology is its relatively high learning cost: you have to keep pace with the technology as it develops. You also need to choose the technology that best suits your own company's development.
CSDN: As you understand it, what is the biggest data-related difficulty that similar companies face today? (You may discuss software, hardware, and the developer's point of view separately.)
Lu Yilei: On the software side, technology selection is very important right now. From the developer's point of view, people need to keep updating their knowledge to keep up with the technology. In addition, data analysis and mining skills are very scarce. Many companies hold large amounts of data, but some of it sits in data silos and some of it is invalid, so modeling quickly and accurately is the biggest difficulty at present. In particular, there are too few data analysts who understand both the industry and the technology.
CSDN: Which technologies are you following and studying in the big data field?
Lu Yilei: I focus on highly reliable, highly available, highly scalable, high-performance system services, and I follow offline, streaming, and real-time distributed computing technologies such as Hadoop, HBase, Storm, and Spark, especially streaming and real-time computing. I first worked with real-time databases around 2007. Now, with user demand growing and hardware costs falling, I believe the combination of SSD and memory will greatly accelerate the development of the industry.
CSDN: Please talk about the topic you are about to share at this conference.
Lu Yilei: As advertising formats diversify, advertising data also exists in many different forms. These include passively received requests, such as impressions and clicks, and actively crawled content from microblogs, news sites, blogs, forums, and industry websites.
This presentation will introduce the processes of cleaning (ETL), storage (Data Storage), and mining (Data Mining). It will describe how performance is optimized for nearly 10 billion requests per day, how nearly 100 billion records are analyzed each day, and how minute-level computation is achieved from collection at multiple IDCs through synchronization to the central data center. The main focus will be the optimization practice based on SSD and Redis.
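The cleaning and minute-level computation steps mentioned above can be illustrated with a minimal sketch. It is not AdMaster's implementation: the record fields (`campaign`, `event`, `ts`) and all function names are hypothetical, and a production system handling billions of requests would run this logic on a distributed engine rather than in a single process.

```python
from collections import Counter

# Hypothetical event types for this sketch; real ad logs carry many more fields.
VALID_EVENTS = {"impression", "click"}


def clean(record):
    """ETL step: return (campaign, event, epoch_seconds), or None if the record is invalid."""
    try:
        campaign = record["campaign"].strip()
        event = record["event"].strip().lower()
        ts = int(record["ts"])
    except (KeyError, AttributeError, TypeError, ValueError):
        return None
    if not campaign or event not in VALID_EVENTS or ts < 0:
        return None
    return campaign, event, ts


def minute_bucket(ts):
    """Round an epoch timestamp (seconds) down to the start of its minute."""
    return ts - ts % 60


def aggregate_by_minute(records):
    """Count valid events per (minute, campaign, event); invalid records are skipped."""
    counts = Counter()
    for record in records:
        cleaned = clean(record)
        if cleaned is None:
            continue
        campaign, event, ts = cleaned
        counts[(minute_bucket(ts), campaign, event)] += 1
    return counts
```

Calling `aggregate_by_minute` on a batch of raw log dictionaries yields one counter entry per minute, campaign, and event type, which is the kind of small per-minute result that could then be kept in a fast key-value store.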
Finally, the presentation will focus on the development and characteristics of ADH (Advertising Distribution Hadoop), which AdMaster developed on the basis of more than seven years of hands-on advertising and brand marketing experience across thousands of real cases. ADH includes built-in advertising algorithms, application scheduler optimization, and integration of online data (HBase), offline data (MapReduce), real-time data (Spark), and streaming data (Storm).
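One common way to serve offline and streaming results together is to add the counts from the last completed batch run to the counts the stream has seen since that run. The sketch below shows only that general idea. How ADH actually integrates these data sources is not described in the interview, and the names `batch_view`, `stream_delta`, and `merged_view` are hypothetical.

```python
def merged_count(batch_view, stream_delta, key):
    """Total for key: count from the last offline run plus count seen by the stream since then."""
    return batch_view.get(key, 0) + stream_delta.get(key, 0)


def merged_view(batch_view, stream_delta):
    """Combine both views over the union of their keys."""
    keys = set(batch_view) | set(stream_delta)
    return {key: merged_count(batch_view, stream_delta, key) for key in keys}
```

A key that appears in only one of the two views still shows up in the merged result, so campaigns that started after the last batch run are not lost.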
CSDN: Which listeners will benefit most from these topics? What can your talk help the audience solve?
Lu Yilei: The talk is for people interested in the advertising industry, people interested in putting big data technology into practice, and people interested in Hadoop. It will give the audience an idea of how the advertising industry applies big data technology, particularly in the many areas of the advertising business that are not widely known. It will also cover some application-specific optimizations of Hadoop, including some features of ADH, the Hadoop distribution developed by AdMaster.
The 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014) will be held at the Crowne Plaza Hotel New Yunnan from December 12 to 14, 2014. Held since 2008 and now in its seventh edition, the China Big Data Technology Conference is currently the most influential and largest technology event in the big data field in China. At this edition, attendees will hear about the latest achievements and development trends of major big data open source projects from Apache Hadoop committers Uma Maheswara Rao G (also a member of the Project Management Committee) and Yi Liu, and from Bikas Saha, a member of the Apache Hadoop and Tez Project Management Committees. There will also be dozens of substantive technical talks from Tencent, Alibaba, Cloudera, LinkedIn, NetEase, and other organizations.