"IT168 Database Conference Report" April 2013 18-20th, the Third China Database Technology Congress (DTCC 2013) kicked off at four points by Sheraton Beijing Hotel. During the three-day meeting, the Conference will explore a wide range of technology areas such as large data applications, data architecture, data management (data governance), traditional database software, and will invite a group of top technical experts to share. On the basis of retaining the traditional theme of database software application practice, this session will expand to large data, data structure, data management and analysis, business intelligence and so on, to meet the urgent needs of the general practitioners and industry users.
▲ 2013 China Database Conference
Since 2010, IT168, a leading domestic IT professional website, together with the ITPUB and ChinaUnix technical communities, has held the database technology conference for three consecutive years. Each conference has drawn well over a thousand attendees, bringing together the country's most skilled data architects, DBAs, database development engineers, R&D directors, and IT managers, making it currently the most influential database technology event in China.
While big data is still being labeled a new technology and its trends are still being debated, Renren Games has already been at the forefront of practice, bringing hands-on experience in combining the mobile internet with big data. Renren Games chief data scientist Chen Jidong said at the event: relational database technology grew out of enterprises' need to manage structured data, and it remains a very successful IT technology. Today, however, data is no longer deleted but must be retained; beyond querying, its value must be mined, and the workload of processing and analyzing data now far exceeds that of simply querying it. Many problems remain: data volumes keep growing, PB-scale data is difficult even for relational databases to handle, unstructured data is hard to process, and good analytical support is still lacking. All of these call for continuous improvement.
▲ Renren Games chief data scientist Chen Jidong
More and more information can be obtained and recorded from data related to everyday life. Infrastructure is cheap and can store this data well, and the underlying management architecture scales well; cloud computing provides a solid architecture and foundational management for it.
▲ Development trend of data management
▲ Big data overall framework
Key technologies and tools of big data
The key technologies and tools of big data can be organized along the information lifecycle. After collection, data can be stored in a distributed file system, in a parallel database, in a NoSQL store, or even loaded directly into a database. Stream processing and main-memory processing are popular trends, driven by the need for low latency.
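The low-latency trend mentioned above can be illustrated with a minimal incremental aggregator (a hypothetical sketch, not code from the conference talks): instead of re-scanning stored data in batches, each arriving record updates an in-memory running count immediately.

```python
from collections import defaultdict

class StreamingCounter:
    """Toy streaming aggregator: counts live in main memory, so each
    incoming record is reflected immediately, with no batch re-scan."""
    def __init__(self):
        self.counts = defaultdict(int)

    def ingest(self, key):
        # O(1) in-memory update per record keeps per-record latency small
        self.counts[key] += 1
        return self.counts[key]

# Simulated event stream
counter = StreamingCounter()
for event in ["click", "view", "click", "click"]:
    counter.ingest(event)
```

After the loop, the running totals are available instantly, which is the property stream and main-memory systems optimize for.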
Key technologies include:
MapReduce vs. parallel databases
MapReduce first emerged from the need to process the huge volumes of unstructured web pages gathered by crawlers. Compared with MapReduce, the disadvantage of databases is that their schema is inflexible.
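The MapReduce model itself is easy to sketch. The following toy word count (an illustrative sketch in plain Python, not Hadoop code) shows the map, shuffle, and reduce phases being contrasted with parallel databases here:

```python
from collections import defaultdict

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in docs:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate each key's list of values
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data big", "data tools"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

Note that no schema is declared anywhere, which is exactly the flexibility the paragraph above credits to MapReduce over databases.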
• Current problems of parallel databases
– Scalability: need to scale to thousands of nodes
– Fault tolerance: need better fault tolerance, with finer granularity and higher efficiency
– Flexibility: need schema-free processing to better accommodate unstructured data
– Cost: need more open-source implementations and external open-source tools
• Current problems of MapReduce-style systems (e.g., Hive)
– Performance: implementations are suboptimal; indexes and query optimization need to be added
– Real-time processing: real-time loading and real-time complex query capabilities are needed
– Standard SQL interface: needed for application migration and transparent access to the data warehouse
– Richer peripheral tools: OLAP tools and self-service analysis tools
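The index and query-optimization gap called out in the list above can be demonstrated with standard SQL (a sketch using Python's built-in SQLite as a stand-in, not Hive itself): adding an index changes the query plan from a full table scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (user_id INTEGER, action TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?)",
                 [(i % 100, "click") for i in range(1000)])

# Without an index, the optimizer must scan the whole table
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM logs WHERE user_id = 7").fetchall()

conn.execute("CREATE INDEX idx_user ON logs(user_id)")

# With the index, the same query becomes an index search
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM logs WHERE user_id = 7").fetchall()
```

The plan text changes from a `SCAN` of the table to a `SEARCH ... USING INDEX`, which is the kind of optimization early MapReduce SQL layers lacked.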
NoSQL vs. SQL
For customized queries, NoSQL offers a good solution, performing much better than a MySQL cluster. The trend for databases is toward deep analysis: how to parallelize data mining and machine learning. A trend in big data: linear models with large numbers of simple features tend to outperform complex models with few features, and this has become a characteristic of engineering practice.
• Characteristics of NoSQL
– High scalability and elasticity, flexible data models, strong fault tolerance, high availability (at the cost of ACID: single-record transactions and eventual consistency)
– Suited to customized data storage and high-throughput queries (such as full-attribute-selection applications)
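A minimal in-memory key-value store (a hypothetical sketch, not any particular NoSQL product) illustrates the flexible data model and single-record access pattern listed above: records with different attribute sets live side by side under their keys, with no schema enforced.

```python
class KeyValueStore:
    """Toy document store: no fixed schema, single-record operations only."""
    def __init__(self):
        self._data = {}

    def put(self, key, record):
        # Each record is an arbitrary dict; no schema is enforced
        self._data[key] = record

    def get(self, key):
        # Single-record lookup by key, the core NoSQL access pattern
        return self._data.get(key)

store = KeyValueStore()
# Heterogeneous records coexist: no ALTER TABLE needed to add attributes
store.put("user:1", {"name": "Alice", "age": 30})
store.put("user:2", {"name": "Bob", "tags": ["dba", "beijing"]})
```

The trade-off from the list applies here too: operations touch one record at a time, so multi-record transactions and rich queries are given up in exchange for scalability.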
• Characteristics of SQL
– Powerful semantic and relational expressiveness, query processing and optimization, ACID, strong consistency
– Poor scalability and flexibility for high-performance queries and complex analysis
▲ NoSQL vs. SQL
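The ACID and strong-consistency strengths in the SQL column can be shown with a small transaction sketch (using Python's built-in sqlite3 as a stand-in for a relational database): a failure midway through a multi-row update rolls back atomically, which is exactly what most NoSQL stores trade away with single-record transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    # Atomic transfer: both updates must succeed, or neither takes effect
    conn.execute("UPDATE accounts SET balance = balance - 80 WHERE id = 1")
    raise RuntimeError("simulated crash between the two updates")
    conn.execute("UPDATE accounts SET balance = balance + 80 WHERE id = 2")
    conn.commit()
except RuntimeError:
    conn.rollback()  # ACID: the partial debit is undone

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the rollback, both balances are unchanged; an eventually consistent store could have exposed the intermediate state where the money had left one account but not arrived in the other.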