Big data improvements for MySQL: Support for NoSQL and Hadoop

Source: Internet
Author: User

Any discussion of big data has to mention Alibaba. As one of the world's leading e-commerce companies, it processes a volume of data every day that is hard for any other company to match, and it is transforming itself into a true data company. MySQL is an important weapon in that transformation. A database architect at Alibaba, speaking in an interview, said he believes that Alibaba runs the best-performing open source MySQL, ahead of any other relational database or NoSQL system.

In 2009, Oracle moved to acquire Sun and with it the rights to MySQL (the deal closed in January 2010). The industry began to question Oracle's intentions and to worry about the future of MySQL. During the acquisition, Oracle stated that it would devote more effort to developing MySQL than Sun had. So far, development of the MySQL Community Edition and of third-party versions does not appear to have been affected by the acquisition, and the commercial edition of MySQL also continues to be improved and updated. This article takes stock of the new features in Oracle's official MySQL 5.6 release and of its improvements for the big data era.

First, a feature inventory of the official MySQL 5.6 release

In early 2013, Oracle released the official (GA) version of MySQL 5.6. It improves query execution times and diagnostics through a better optimizer and optimizer diagnostics, improves performance and application availability through an enhanced InnoDB storage engine, and improves scalability and high availability through new MySQL replication features. It also brings many other enhancements, including geographic information system (GIS) support with precise spatial operations, improved IPv6 compliance, and optimized server defaults.

With better performance, scalability, reliability, and manageability, MySQL 5.6 helps users meet the most demanding web, cloud, and embedded application requirements. With subquery optimizations, online data definition language (DDL) operations, NoSQL access to InnoDB, new Performance Schema instrumentation, and better condition handling, MySQL 5.6 gives developers considerably more flexibility.

Four highlights:

1. An improved MySQL optimizer and optimizer diagnostics, for better query execution times and diagnostic capabilities

• Subquery optimizations: Simplify query development by optimizing subqueries before execution. New efficiencies in how result sets are selected, sorted, and returned substantially improve query execution times.

• Index Condition Pushdown (ICP) and Batched Key Access (BKA): These new optimizations can improve the throughput of selected queries by up to 280 times.

• Enhanced optimizer diagnostics: EXPLAIN now works for INSERT, UPDATE, and DELETE operations. EXPLAIN output in JSON format provides more precise optimizer metrics and better readability, and optimizer traces make it possible to follow the optimizer's decision-making process.
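
To make the idea behind Index Condition Pushdown more concrete, here is a toy Python model, not MySQL code. It assumes a hypothetical table with a secondary index on (zipcode, lastname) and a query that filters on both columns; the table, data, and function names are invented for illustration. With pushdown, the part of the condition that can be checked from the index entry alone is evaluated before the full row is fetched, so fewer full rows are read.

```python
from typing import Callable, Dict, List, Tuple

# Toy table: primary key -> full row. All names and data are illustrative.
ROWS: Dict[int, Dict[str, str]] = {
    1: {"zipcode": "95054", "lastname": "Anderson", "address": "1 Main St"},
    2: {"zipcode": "95054", "lastname": "Baker", "address": "2 Oak Ave"},
    3: {"zipcode": "95054", "lastname": "Johnson", "address": "3 Elm Rd"},
    4: {"zipcode": "10001", "lastname": "Wilson", "address": "4 Pine Ln"},
}

# Secondary index on (zipcode, lastname): sorted entries that point at the primary key.
INDEX: List[Tuple[str, str, int]] = sorted(
    (row["zipcode"], row["lastname"], pk) for pk, row in ROWS.items()
)


def lookup(
    zipcode: str,
    lastname_pred: Callable[[str], bool],
    pushdown: bool,
) -> Tuple[List[Dict[str, str]], int]:
    """Return (matching rows, number of full-row fetches)."""
    fetches = 0
    result = []
    for idx_zip, idx_last, pk in INDEX:
        if idx_zip != zipcode:
            continue  # outside the index range (a real engine would seek, not scan)
        if pushdown and not lastname_pred(idx_last):
            continue  # rejected from the index entry alone, no full-row fetch
        row = ROWS[pk]  # full-row fetch, the expensive step in a real engine
        fetches += 1
        if lastname_pred(row["lastname"]):  # server-side check of the condition
            result.append(row)
    return result, fetches


if __name__ == "__main__":
    contains_son = lambda name: "son" in name.lower()
    print(lookup("95054", contains_son, pushdown=False)[1])
    print(lookup("95054", contains_son, pushdown=True)[1])
```

Both calls return the same rows; only the number of full-row fetches differs. In a real server the saving depends on how selective the pushed-down condition is.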

2. Better performance and application availability through an enhanced InnoDB storage engine

• Up to 230% higher transactional and read-only throughput: InnoDB has been refactored to minimize legacy threading, flushing, and purge mutex contention and bottlenecks. This enables better concurrency on heavily loaded OLTP systems and significantly improves throughput for both transactional and read-only workloads.

• Increased availability: Online DDL operations let database administrators add indexes and alter tables while the application remains available for updates.

• InnoDB full-text search: Allows developers to create FULLTEXT indexes on InnoDB tables to represent text-based content and to speed up application searches for words and phrases.

• Simple key-value lookup: Flexible NoSQL access to InnoDB provides simple key-value lookups of InnoDB data through the familiar Memcached API. Users get the best of both worlds, combining key-value operations and complex SQL queries in the same database.
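
As an illustration of the online DDL point above, the following Python sketch only builds the SQL text for adding a secondary index without blocking writes. The ALGORITHM=INPLACE and LOCK=NONE clauses are part of the MySQL 5.6 ALTER TABLE syntax; the helper functions and the table, index, and column names are hypothetical, and nothing here connects to a server.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def online_add_index_sql(table: str, index: str, columns: List[str]) -> str:
    """Build an ALTER TABLE statement that asks for an in-place, non-locking index build."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"ADD INDEX {quote_ident(index)} ({cols}), "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(online_add_index_sql("orders", "idx_customer", ["customer_id", "created_at"]))
```

Stating the algorithm and lock level explicitly is a cautious choice: if the server cannot perform the change that way, the statement fails with an error instead of silently falling back to a blocking table copy.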

3. New features for MySQL replication to improve scalability and high availability

• Self-healing replication clusters: New Global Transaction Identifiers (GTIDs) and utilities make it easier to detect failures and recover from them automatically. Crash-safe replication lets the binary log and the slaves automatically recover the correct position in the replication stream after a crash and resume replication without administrator intervention. Checksums maintain data integrity across the cluster by automatically detecting errors and raising alerts.

• High-performance replication clusters: Up to 5 times faster replication through multi-threaded slaves, binlog group commit, and optimized row-based replication lets users maximize replication performance and efficiency as they scale out their workloads across commodity systems.

• Time-delayed replication: Protects against operational errors made on the master, such as accidentally dropping a table.
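
The following Python sketch shows why global transaction identifiers make failure detection and recovery easier: if every transaction carries an identifier of the form source_uuid:transaction_number, then comparing the set executed on the master with the set executed on a slave tells you exactly which transactions the slave is missing. The server provides this comparison itself (for example through the GTID_SUBTRACT() function); the helper names and the sample UUID below are only for illustration.

```python
from typing import Dict, List, Set


def parse_gtid_set(text: str) -> Dict[str, Set[int]]:
    """Parse a GTID set such as 'uuid:1-5:8,uuid2:3' into {uuid: {transaction numbers}}."""
    result: Dict[str, Set[int]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        txns = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '8' is an interval of length one.
            txns.update(range(int(start), int(end or start) + 1))
    return result


def missing_on_slave(master_executed: str, slave_executed: str) -> Dict[str, List[int]]:
    """Return the transactions executed on the master but not yet on the slave."""
    master = parse_gtid_set(master_executed)
    slave = parse_gtid_set(slave_executed)
    missing: Dict[str, List[int]] = {}
    for uuid, txns in master.items():
        diff = txns - slave.get(uuid, set())
        if diff:
            missing[uuid] = sorted(diff)
    return missing


if __name__ == "__main__":
    uuid = "3e11fa47-71ca-11e1-9e33-c80aa9429562"  # sample value, illustration only
    print(missing_on_slave(f"{uuid}:1-10", f"{uuid}:1-7"))
```

Expanding intervals into Python sets keeps the sketch short but would be wasteful for very large transaction ranges; a production tool would compare the intervals directly.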

4. Enhanced Performance Schema (PERFORMANCE_SCHEMA): New instrumentation lets users better monitor the most resource-intensive queries, objects, users, and applications. New summary tables provide aggregated statistics grouped by query, thread, user, host, and object. The enhancements allow for an easier default configuration with an overhead of less than 5%.
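
A small Python sketch of how the per-query summaries might be consumed. It assumes the rows come from the performance_schema.events_statements_summary_by_digest table introduced in MySQL 5.6, whose timer columns are reported in picoseconds; the sample row in the demo and the summarize helper are invented for illustration, and no database connection is made.

```python
from typing import Dict, Iterable, List, Tuple, Union

# Query one might run against a MySQL 5.6 server to find the most expensive statements.
TOP_STATEMENTS_SQL = (
    "SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT "
    "FROM performance_schema.events_statements_summary_by_digest "
    "ORDER BY SUM_TIMER_WAIT DESC LIMIT 10"
)

PICOSECONDS_PER_MILLISECOND = 1_000_000_000

Row = Tuple[str, int, int]  # (DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT)


def summarize(rows: Iterable[Row]) -> List[Dict[str, Union[str, int, float]]]:
    """Convert raw digest rows into total and average latency in milliseconds."""
    report = []
    for digest_text, count_star, sum_timer_wait in rows:
        total_ms = sum_timer_wait / PICOSECONDS_PER_MILLISECOND
        avg_ms = total_ms / count_star if count_star else 0.0
        report.append(
            {"statement": digest_text, "calls": count_star,
             "total_ms": total_ms, "avg_ms": avg_ms}
        )
    return report


if __name__ == "__main__":
    # Invented sample row standing in for a real query result.
    sample = [("SELECT * FROM t WHERE id = ?", 4, 8_000_000_000)]
    for line in summarize(sample):
        print(line)
```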

Second, MySQL improvements for big data

1. NoSQL features

Oracle's latest release, MySQL 5.6, adds NoSQL features: flexible NoSQL access to InnoDB through the Memcached API, which provides simple key-value lookups of InnoDB data. This shows that NoSQL has had a significant impact on relational databases, and MySQL's move makes it easier for developers to use NoSQL and a relational database together.
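
To show what key-value access through the Memcached API looks like on the wire, here is a minimal Python sketch that builds and parses messages of the memcached text protocol. It opens no connection; against a real server these bytes would be sent to the port of the InnoDB memcached plugin (11211 by default). The key, value, and function names are made up for the example.

```python
from typing import Optional


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' command for one key."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' command for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data: bytes) -> Optional[bytes]:
    """Return the value from a single-key 'get' response, or None on a miss."""
    if data == b"END\r\n":
        return None  # key not found
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected response header: {header!r}")
    length = int(parts[3])  # header is: VALUE <key> <flags> <bytes>
    return rest[:length]


if __name__ == "__main__":
    print(build_set("user:42", b"alice"))
    print(build_get("user:42"))
    print(parse_get_response(b"VALUE user:42 0 5\r\nalice\r\nEND\r\n"))
```

The same row remains reachable through ordinary SQL, which is the point of the feature: key-value operations and complex queries work on the same InnoDB data.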

However, many engineers think that this feature is of marginal value and cannot really play the role of a NoSQL store. Take scalability: a major advantage of NoSQL is horizontal scaling (scale-out). Cassandra, for example, can easily be scaled out across many machines, and the cluster can be built from inexpensive hardware with no need for expensive servers or SAN storage. MySQL 5.6 does not offer this.

2. Hadoop support

The MySQL team recently released the MySQL Applier for Hadoop (hereinafter Hadoop Applier), which aims to solve the problem of replicating data from MySQL to a non-MySQL server.

For example, the slave in a replication setup might be a data warehouse system such as Apache Hive, which uses the Hadoop Distributed File System (HDFS) as its data store. If you have a Hive metastore associated with HDFS, Hadoop Applier can populate Hive tables in real time. The data is exported from MySQL to HDFS as text files and from there populates Hive.

The operation is simple: run a HiveQL CREATE TABLE statement in Hive to create a table whose structure is similar to the MySQL table, then run Hadoop Applier to start replicating data in real time.

Before Hadoop Applier, no tool was available for real-time transfer. The previous solution was to export data to HDFS with Apache Sqoop. Sqoop can transfer data in bulk, but the import often has to be repeated to keep the data up to date, and other queries can slow down while a large transfer is running. With a large database, Sqoop may take a long time to load even if only a small change has been made.

Hadoop Applier reads the binary log and applies only the events that occur on the MySQL server, inserting the corresponding data. No bulk transfer is needed, so it works faster and does not affect the execution speed of other queries.
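
The difference between the two approaches can be sketched with a toy Python model. It is not the Hadoop Applier or Sqoop API: the classes, the in-memory "binary log", and the text-file layout are all assumptions made for illustration. The incremental applier remembers its position in the log and transfers only new insert events, while the bulk export rewrites everything on every sync.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, tuple]  # (table name, inserted row)


def to_line(row: tuple) -> str:
    """Render a row as one comma-separated line of a text file."""
    return ",".join(str(value) for value in row)


class ToyApplier:
    """Toy stand-in for a binlog-driven applier; not the real Hadoop Applier API."""

    def __init__(self) -> None:
        self.position = 0                      # index of the next unread log event
        self.files: Dict[str, List[str]] = {}  # table -> lines of its text file
        self.rows_transferred = 0

    def sync(self, binlog: List[Event]) -> int:
        new_events = binlog[self.position:]    # only events not yet applied
        for table, row in new_events:
            self.files.setdefault(table, []).append(to_line(row))
        self.position = len(binlog)
        self.rows_transferred += len(new_events)
        return len(new_events)


class ToyBulkExport:
    """Toy stand-in for a repeated full export; not the real Sqoop API."""

    def __init__(self) -> None:
        self.files: Dict[str, List[str]] = {}
        self.rows_transferred = 0

    def sync(self, all_rows: List[Event]) -> int:
        self.files = {}                        # start over on every sync
        for table, row in all_rows:
            self.files.setdefault(table, []).append(to_line(row))
        self.rows_transferred += len(all_rows)
        return len(all_rows)


if __name__ == "__main__":
    log: List[Event] = [("users", (1, "ann")), ("users", (2, "bob")), ("orders", (10, 1))]
    applier, bulk = ToyApplier(), ToyBulkExport()
    applier.sync(log)
    bulk.sync(log)
    log.append(("users", (3, "cy")))           # one small change on the MySQL side
    print(applier.sync(log), bulk.sync(log))
```

After the single extra insert, both targets hold the same data, but the incremental applier moved one row while the bulk export moved all of them again. This is the effect the paragraph above describes for large databases with small changes.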

Summary

MySQL is the industry's best open source relational database, with a large following of users who not only use MySQL but also contribute to the MySQL community, forming a healthy ecosystem. For MySQL, support for NoSQL and Hadoop can so far only be regarded as a response to the big data era, and technical staff remain noncommittal about how much it will actually deliver.
