"Csdn Live Report" December 2014 12-14th, sponsored by the China Computer Society (CCF), CCF large data expert committee contractor, the Chinese Academy of Sciences and CSDN jointly co-organized to promote large data research, application and industrial development as the main theme of the 2014 China Data Technology Conference (big Data Marvell Conference 2014,BDTC 2014) and the second session of the CCF Grand Symposium was opened at Crowne Plaza Hotel, New Yunnan, Beijing.
Bai Xiaoyong, senior engineer at UFIDA Software
On the morning of the second day of the 2014 Big Data Technology Conference, the Big Data Technology Forum was chaired by Bai Xiaoyong, senior engineer at UFIDA Software. Six experts delivered talks on big data technology topics: Long, former chief strategy officer of Communications Data; Dompo, senior product line manager at VMware; Gao Wei, product director of the Data Asset Management department at AsiaInfo; Xu Dong, ODPS technical expert in Alibaba's Data Platform Division; Lio Xiaoge, senior software engineer at Ctrip; and Lio Hairen, data architect at iPinYou.
Long, former chief strategy officer of Communications Data
Long, former chief strategy officer of Communications Data, delivered a keynote entitled "Financial Big Data Practice Sharing." He shared financial big data practice from five aspects: the differences between financial big data and traditional big data, the production process of financial data, the storage of financial big data, the analysis and mining of financial big data, and an online interactive platform for financial programming analysis and research.
The differences between financial big data and consumer-internet big data are embodied in the following aspects:
- Research object: consumer-internet big data emphasizes the study of individual behavior, while financial big data emphasizes the study of group behavior and trends.
- Data correlation: on the consumer internet, data strongly correlated with individuals is relatively easy to obtain (for example, browser cookies) and carries little noise; in finance, data strongly correlated with group behavior is harder to obtain and the noise is high.
- Algorithm complexity: because consumer-internet data quality is good, the algorithms can be relatively simple; because financial data is noisy, the demands on the algorithms are very high.
- Data volume: financial big data is larger than consumer-internet big data, comprising internet data plus finance-specific data (such as market data, industry data, and analyst reports).
- Data type: consumer-internet big data covers a variety of structured and unstructured data, while financial big data adds finance-specific types on top of the internet data types, for example time-series data.
- Data velocity: consumer-internet big data has modest processing-speed requirements, while financial big data often demands high speed, for example in quantitative trading, dynamic risk pricing, credit-card anti-fraud, and real-time news analysis.
Among these, he argued that time-series data will be the most important data type in the future, so mastering the storage, processing, and key algorithms for time-series data is essential. For example, KDB is standard equipment at traditional financial institutions, and Cassandra has been successfully applied abroad in the Internet of Things and energy fields. Long then described how to turn structured, standardized data into meaningful financial industry data, and noted that the domestic grasp and application of these technologies still lag considerably behind other countries.
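As a rough sketch of the time-series storage pattern Long alludes to, the snippet below models market ticks in Cassandra through the DataStax Python driver. The keyspace, table layout, and tick fields are illustrative assumptions, not details from the talk; the key idea is partitioning by (symbol, day) so each partition stays bounded while intraday range scans remain sequential.

```python
# A minimal time-series layout in Cassandra (pip install cassandra-driver).
# Keyspace, table, and fields are illustrative assumptions.
from datetime import datetime, timezone

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS marketdata
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# Partition by (symbol, day) to bound partition size; cluster by timestamp
# so reading one trading day of one symbol is a single sequential scan.
session.execute("""
    CREATE TABLE IF NOT EXISTS marketdata.ticks (
        symbol text,
        day    date,
        ts     timestamp,
        price  double,
        volume bigint,
        PRIMARY KEY ((symbol, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

now = datetime.now(timezone.utc)
session.execute(
    "INSERT INTO marketdata.ticks (symbol, day, ts, price, volume) "
    "VALUES (%s, %s, %s, %s, %s)",
    ("600000.SS", now.date(), now, 12.34, 1000),
)

# Latest ticks for one symbol on one day: a single-partition query.
rows = session.execute(
    "SELECT ts, price, volume FROM marketdata.ticks "
    "WHERE symbol = %s AND day = %s LIMIT 10",
    ("600000.SS", now.date()),
)
for row in rows:
    print(row.ts, row.price, row.volume)
```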
Dompo, senior product line manager at VMware
Dompo, senior product line manager at VMware, shared the keynote "VMware Paves the Way for Big Data Applications." He noted that big data applications in the enterprise usually go through three stages: the proof-of-concept stage, which validates the value of big data technology quickly and at low cost; the production stage, which must satisfy application SLAs and system-scaling demands; and the Hadoop-as-a-Service stage, which meets the differentiated needs of each business quickly, agilely, and efficiently.
vSphere Big Data Extensions (BDE) helps users deploy Hadoop quickly and easily, letting them focus on their business, and BDE integrates seamlessly with third-party management tools. vRealize Operations Manager provides overall system monitoring, intelligent automated analysis and management, and prediction-based proactive operations. vSphere vMotion eliminates planned and unplanned downtime and automatically recovers from detected failures.
Dompo also mentioned that a Hadoop cluster management platform can balance the demands that different departments place on an enterprise cluster. When multiple departments want Hadoop cluster services with differing requirements, he said, a self-service platform needs to be built to solve the problem. VMware's vCAC (vCloud Automation Center) product flexibly handles cluster-usage requirements across departments and reduces the burden on IT administrators.
Gao Wei, product director of the Data Asset Management department at AsiaInfo
Gao Wei, product director of the Data Asset Management department at AsiaInfo, shared with attendees the theme "Data Asset Management: Prospecting for Gold in the Big Data Age." Data asset management comprises the management activities an enterprise or organization undertakes to keep its data assets safe and complete, allocate them reasonably, and use them effectively, thereby improving the economic benefits they bring and supporting the development of the business. Gao said that while the idea that "data is an asset" is widely accepted, "how to manage data assets" still lacks mature theory and tools, leaving a market gap.
Traditional data management methods cannot meet the requirements of data asset management. AsiaInfo advocates building an integrated, process-oriented data asset management system with the following key features: complete data management and control, efficient data asset application, and innovative data asset operation. Finally, Gao Wei concluded that data asset management has risen to the same level of importance as CRM, and called on data asset management practitioners to think about how the technology can be combined with business.
Xu Dong, ODPS technical expert in Alibaba's Data Platform Division
Xu Dong, an ODPS technical expert in Alibaba's Data Platform Division, shared the theme "ODPS MapReduce Opening-Up in Practice." In the talk, Xu Dong mainly covered how ODPS (Open Data Processing Service) is used inside Alibaba, building the LOT model on ODPS, the MapReduce execution process, an introduction to the MapReduce API, and user practice with the open MapReduce API. As the underlying platform for Alibaba's large-scale data processing, ODPS receives hundreds of thousands of job submissions daily; underneath it sits an ultra-large-scale cluster spanning data centers, and it supports multiple programming models and paradigms.
Among other points, Xu Dong mentioned that the MapReduce API has been adjusted in two ways: support for custom types in MapReduce was dropped, and the API is being aligned with Hadoop's MapReduce API. Finally, he said MapReduce will be opened to outside users as a service early next year.
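Since the talk says the ODPS API is being aligned with Hadoop's, a minimal word count illustrates the MapReduce programming model being opened up. This sketch follows the Hadoop Streaming stdin/stdout convention in Python purely for illustration; the actual ODPS MapReduce API described in the talk is Java-based.

```python
#!/usr/bin/env python
# Minimal word count in the MapReduce style, written against the
# Hadoop Streaming stdin/stdout contract for illustration only.
import sys
from itertools import groupby


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Reduce phase: pairs arrive sorted by key; sum counts per word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Simulate the shuffle locally: map, sort by key, then reduce.
    mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```

Running `echo "a b a" | python wordcount.py` prints each word with its count; on a real cluster the framework performs the sort-and-shuffle step between the two phases.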
Lio Xiaoge, senior software engineer at Ctrip
Lio Xiaoge, senior software engineer at Ctrip, shared the theme "Making Big Data More Real-Time and Visual." He mainly introduced the architecture of Ctrip's big data platform, HBase applications at Ctrip, the Ctrip product ecosystem, and future challenges.
Ctrip produces 40 TB of logs per day, including 30 TB of daily user-behavior data, and its business data is growing rapidly; all of this data needs to be fed back to users, applications, or monitoring systems in a timely fashion. Ctrip's big data platform architecture resembles the Hadoop ecosystem: HDFS at the bottom, a scheduling system above it, offline data analysis via MapReduce and Spark, and online data analysis via Storm and HBase. The HBase clusters are split by business line, with an HBase access-control system built underneath. Ctrip has also built a mobile monitoring system and a UBT (User Behavior Tracking) system that tracks user behavior and traffic and renders them as intuitive visualizations.
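As a rough sketch of the online write/read path that Storm-plus-HBase implies, the snippet below stores and scans user-behavior rows through happybase, a common Python HBase client. The table name, column family, and row-key scheme are illustrative assumptions, not Ctrip's actual design.

```python
# A minimal HBase write/read path for user-behavior events, using
# happybase (pip install happybase; requires the HBase Thrift server).
# Table name, column family, and row-key layout are assumptions.
import time

import happybase

connection = happybase.Connection("localhost")  # HBase Thrift endpoint

# One-time setup: a 'ubt' table with a single column family for events.
if b"ubt" not in connection.tables():
    connection.create_table("ubt", {"event": dict(max_versions=1)})

table = connection.table("ubt")

# Row key: user id plus reversed timestamp, so a prefix scan returns a
# user's most recent events first.
user_id = "u10001"
ts = int(time.time() * 1000)
row_key = f"{user_id}:{2**63 - ts}".encode()

table.put(row_key, {
    b"event:page": b"/hotel/detail",
    b"event:action": b"click",
})

# Read back this user's latest events with a prefix scan.
for key, data in table.scan(row_prefix=f"{user_id}:".encode(), limit=10):
    print(key, data)
```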
Lio Hairen, data architect at iPinYou
Lio Hairen, data architect at iPinYou, brought a keynote entitled "Data-Driven Real-Time Bidding Optimization for DSPs." He first introduced five characteristics of DSP optimization:
- Unlike an ad network or search advertising, a DSP is not a closed system and must constantly compete with other DSPs.
- Advertisers can have very diverse KPIs, and settlement methods are equally diverse; the main advertising KPIs include CTR, CPC, CPA, ROI/CPNC, and so on.
- During delivery, control over budget consumption is strict, requiring accurate prediction.
- CTR/CVR must be estimated accurately.
- Clicks and impressions are heavily imbalanced.
Subsequently, Lio Hairen focused on the key problems in DSP optimization: first, ranking, where unlike in search advertising each ad may carry a different KPI; second, sample selection bias, which must be solved before CTR estimation; and third, mobile optimization, which differs greatly from the PC side and needs to be handled separately.
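As a rough illustration of how those diverse KPIs feed the ranking problem, the sketch below reduces CPC and CPA goals to a common expected value per impression and scales it by a budget-pacing factor. The formulas and names are generic textbook RTB assumptions, not iPinYou's actual model.

```python
# A generic expected-value bidding sketch for a DSP: different advertiser
# KPIs (CPC vs. CPA) reduce to one expected value per impression, which a
# pacing factor then throttles to control budget consumption. These are
# textbook RTB formulas, not the model described in the talk.
from dataclasses import dataclass


@dataclass
class Campaign:
    kpi: str            # "CPC" or "CPA"
    goal_price: float   # advertiser's target cost per click / per action


def expected_value(campaign: Campaign, ctr: float, cvr: float) -> float:
    """Expected value of one impression under the campaign's KPI."""
    if campaign.kpi == "CPC":
        return ctr * campaign.goal_price        # value realized on a click
    if campaign.kpi == "CPA":
        return ctr * cvr * campaign.goal_price  # value realized on an action
    raise ValueError(f"unsupported KPI: {campaign.kpi}")


def bid_price(campaign: Campaign, ctr: float, cvr: float,
              pacing: float = 1.0) -> float:
    """Linear bid: expected impression value scaled by a pacing factor in
    [0, 1] that slows spend when the budget is being consumed too fast."""
    return expected_value(campaign, ctr, cvr) * pacing


cpc_campaign = Campaign(kpi="CPC", goal_price=0.50)   # 0.50 per click
cpa_campaign = Campaign(kpi="CPA", goal_price=20.0)   # 20.00 per action

# Predicted CTR/CVR would come from the (bias-corrected) models the talk
# mentions; here they are fixed illustrative numbers.
print(bid_price(cpc_campaign, ctr=0.002, cvr=0.0))            # 0.001
print(bid_price(cpa_campaign, ctr=0.002, cvr=0.05, pacing=0.8))
```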
For more highlights, please follow the live coverage of the 2014 Big Data Technology Conference (BDTC), the Sina Weibo account @CSDN Cloud Computing, and subscribe to the CSDN Big Data WeChat account.