MemSQL replaces HDFS as Spark's data store, dramatically improving performance

Source: Internet
Author: User
Tags memsql

Apache Spark is currently one of the most powerful distributed computing frameworks, and its simple programming model makes it easy to learn. Although Spark excels at processing large data sets, it still needs a persistent data store. HDFS is the most common choice and is often used together with Spark, but because HDFS is disk-based, it can hurt performance in real-time applications (for example, computations in Spark Streaming). In addition, Spark itself has no built-in support for transactional commits.

MemSQL, the database described in this article, bills itself as the world's fastest distributed in-memory database. It is an in-memory distributed relational database created by Eric Frenkiel (a former Facebook employee) and Nikita Shamgunov (a former senior engineer on Microsoft SQL Server). It stores data in memory and compiles SQL statements into C++ for fast execution. It is MySQL-compatible, is claimed to be 30 times faster than MySQL, and is reported to reach 1.5 million transactions per second.

MemSQL recently released the MemSQL Spark Connector, which integrates well with Spark and lets Spark users read and write data in the database quickly. MemSQL is a natural fit for Spark because it handles large volumes of reads and writes efficiently, which Spark often requires, and it provides ample space for the new data that Spark creates.

The MemSQL Spark Connector provides interfaces for all interactions between Spark and MemSQL. It also applies several optimizations: for example, it reads data from MemSQL in parallel, and when MemSQL and Spark run on the same physical node, Spark writes data directly to MemSQL on that node. The connector's two most important components are MemSQLRDD and saveToMemsql.

MemSQLRDD holds the result set of a query against MemSQL, and saveToMemsql writes the data of a Spark RDD to a MemSQL table. Both interfaces resemble Spark's built-in JDBC interface and are used in a similar way (see "Spark and MySQL (JdbcRDD) integrated development"). The following example shows how to use MemSQLRDD by reading table data from MemSQL into a MemSQLRDD:

import java.sql.ResultSet
import com.memsql.spark.connector.rdd.MemSQLRDD

...

// sc is an existing SparkContext; the db* values are the MemSQL connection settings
val rdd = new MemSQLRDD(
    sc,
    dbHost,
    dbPort,
    dbUser,
    dbPassword,
    dbName,
    "SELECT * FROM iteblog",
    (r: ResultSet) => { r.getString("test_column") })  // maps each result row to one value
rdd.first()  // Contains the value of "test_column" for the first row

To write an RDD to MemSQL, use the saveToMemsql function:

import com.memsql.spark.connector._

...

// Each inner array is one row to insert into the output table
val rdd = sc.parallelize(Array(Array("www", "iteblog"), Array("com", "qux")))
rdd.saveToMemsql(dbHost, dbPort, dbUser, dbPassword,
    dbName, outputTableName, insertBatchSize=1000)
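
The insertBatchSize=1000 argument in the call above suggests that rows are written in groups rather than one at a time. The following is a minimal sketch of that general idea in plain Scala, not the connector's actual implementation: the object BatchInsertSketch, the helpers quote and batchedInserts, the table name output_table, and the multi-row INSERT format are all assumptions made for illustration.

```scala
// Hypothetical sketch, NOT part of the MemSQL Spark Connector:
// groups rows into batches and renders one multi-row INSERT per batch.
object BatchInsertSketch {
  // Quote a value as a SQL string literal by doubling single quotes.
  // Real code should prefer prepared statements over string building.
  def quote(value: String): String = "'" + value.replace("'", "''") + "'"

  // Split `rows` into groups of at most `batchSize` rows and build
  // one INSERT statement per group.
  def batchedInserts(table: String,
                     rows: Seq[Seq[String]],
                     batchSize: Int): List[String] = {
    require(batchSize > 0, "batchSize must be positive")
    rows.grouped(batchSize).map { batch =>
      val tuples = batch.map(row => row.map(quote).mkString("(", ", ", ")"))
      s"INSERT INTO $table VALUES ${tuples.mkString(", ")}"
    }.toList
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq("www", "iteblog"), Seq("com", "qux"), Seq("a", "b"))
    // With a batch size of 2, three rows become two statements.
    batchedInserts("output_table", rows, 2).foreach(println)
  }
}
```

With a batch size of 2, the three sample rows produce two statements: one carrying the first two rows and one carrying the last row. A larger batch size means fewer round trips to the database at the cost of larger individual statements.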

As the examples above show, combining MemSQL and Spark is straightforward.

This article was translated from: http://blog.memsql.com/memsql-spark–connector/

Reprinted from Past memory (http://www.iteblog.com/)
Link to this article: "Running real-time applications using Spark and the MemSQL Spark Connector" (http://www.iteblog.com/archives/1327)
