MemSQL replaces HDFS as Spark's storage layer, dramatically improving performance

Last Update: 2016-06-08 Source: Internet

Author: User

Tags memsql


Apache Spark is currently one of the most powerful distributed computing frameworks, and its simple computational model is easy to understand. Although Spark excels at processing big data sets, it still needs a persistent storage layer. HDFS is the most common choice to pair with Spark, but because HDFS is disk-based, it can limit performance in real-time applications such as Spark Streaming computations. In addition, Spark has no built-in support for transactional commits.

The MemSQL database covered in this article bills itself as "the Earth's fastest in-memory database". It is a distributed, memory-based relational database created by Eric Frenkiel (a former Facebook employee) and Nikita Shamgunov (formerly a senior engineer on Microsoft SQL Server). It stores data in memory and precompiles SQL statements into C++ for fast execution. It is compatible with MySQL, is 30 times faster than MySQL, and can achieve 1.5 million transactions per second.

The MemSQL Spark Connector, recently released officially, integrates well with Spark and lets Spark users read and write data in the database quickly. MemSQL is a natural fit for Spark: it handles large volumes of reads and writes efficiently, which Spark workloads often require, and it provides ample space for persisting the new data that Spark creates.

The MemSQL Spark Connector provides interfaces for all interactions between Spark and MemSQL, and it includes a number of optimizations, such as reading data from MemSQL in parallel and, when MemSQL and Spark run on the same physical node, having Spark write data directly to the local MemSQL instance. The connector's two most important components are MemSQLRDD and saveToMemsql.
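To make the co-location optimization concrete, here is a minimal, hypothetical sketch of how a writer might pick its target: prefer a MemSQL node on the same host as the Spark executor, otherwise fall back to a default node. The `Node` type, the `chooseTarget` function and the fallback rule are illustrative assumptions, not the connector's actual API or logic.

```scala
// Hypothetical sketch: choose which MemSQL node a Spark task should write to.
// Node, chooseTarget and the fallback rule are illustrative assumptions only.
object WriteTargetChooser {
  final case class Node(host: String, port: Int)

  /** Returns a MemSQL node running on the executor's own host if one exists
    * (so the write stays on the local machine); otherwise returns `fallback`. */
  def chooseTarget(executorHost: String, memsqlNodes: Seq[Node], fallback: Node): Node =
    memsqlNodes.find(_.host == executorHost).getOrElse(fallback)
}
```

For example, a task running on `host-b` with MemSQL nodes on `host-a` and `host-b` would write to the `host-b` node, while a task on a host with no MemSQL node would use the fallback.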

MemSQLRDD holds data sets queried from MemSQL, and saveToMemsql writes RDD data from Spark into a MemSQL table. The two interfaces resemble Spark's built-in JDBC interface and are used in a similar way (see "Spark and MySQL (JDBCRDD) integrated development"). First, let's look at how to use MemSQLRDD. The following example reads table data from MemSQL into a MemSQLRDD:

```scala
import java.sql.ResultSet
import com.memsql.spark.connector.rdd.MemSQLRDD

// ... (set up the SparkContext `sc` and the connection parameters used below)

// Run the query against MemSQL and map each result row to one RDD element
val rdd = new MemSQLRDD(
  sc,
  dbHost,
  dbPort,
  dbUser,
  dbPassword,
  dbName,
  "SELECT * FROM iteblog",
  (r: ResultSet) => { r.getString("test_column") })

rdd.first() // Contains the value of "test_column" for the first row
```
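The query above is a single `SELECT`, yet the connector is described as reading from MemSQL in parallel. The connector's exact mechanism is not covered here, so as an assumption-labeled illustration of the general idea, the sketch below splits a numeric key range into inclusive sub-ranges the same way Spark's built-in `JdbcRDD` computes its partitions, then builds one query per partition. The key column `id` and the `RangeSplit` helper names are hypothetical.

```scala
// Sketch of range-partitioned parallel reads, modeled on how Spark's JdbcRDD
// splits [lowerBound, upperBound] into numPartitions inclusive sub-ranges.
// Not the MemSQL connector's implementation; helper names are hypothetical.
object RangeSplit {
  final case class KeyRange(start: Long, end: Long)

  def split(lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[KeyRange] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(upperBound >= lowerBound, "upperBound must be >= lowerBound")
    // BigInt keeps i * length from overflowing for very wide key ranges
    val length = BigInt(upperBound) - BigInt(lowerBound) + BigInt(1)
    val n = BigInt(numPartitions)
    val lo = BigInt(lowerBound)
    (0 until numPartitions).map { i =>
      val start = lo + BigInt(i) * length / n
      val end = lo + BigInt(i + 1) * length / n - BigInt(1)
      KeyRange(start.toLong, end.toLong)
    }
  }

  // One query per partition; each could be run by a separate Spark task
  def partitionQueries(table: String, keyColumn: String, ranges: Seq[KeyRange]): Seq[String] =
    ranges.map(r => s"SELECT * FROM $table WHERE $keyColumn >= ${r.start} AND $keyColumn <= ${r.end}")
}
```

For keys 1 to 100 and four partitions, `split(1L, 100L, 4)` yields the ranges 1-25, 26-50, 51-75 and 76-100, and each resulting query can be fetched concurrently by a different task.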

To write an RDD to MemSQL, use the saveToMemsql function:

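```scala
import com.memsql.spark.connector._

// ... (set up the SparkContext `sc`, the connection parameters and outputTableName)

// Each inner Array is one row to insert
val rdd = sc.parallelize(Array(Array("www", "iteblog"), Array("com", "qux")))
// Write the rows into outputTableName; insertBatchSize sets the insert batch size
rdd.saveToMemsql(dbHost, dbPort, dbUser, dbPassword,
  dbName, outputTableName, insertBatchSize = 1000)
```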
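The `insertBatchSize = 1000` argument indicates that rows are written in batches rather than one `INSERT` per row. As a hedged sketch of that idea only (not the connector's code), the following groups rows into batches of at most `insertBatchSize` and builds one parameterized multi-row `INSERT` per batch, suitable for a JDBC `PreparedStatement`. The table name `output_table` and the `Batch`/`buildBatches` names are hypothetical.

```scala
// Hypothetical sketch of batched inserts (not the connector's implementation):
// rows are grouped into batches of at most insertBatchSize, and each batch becomes
// one multi-row INSERT with "?" placeholders for a JDBC PreparedStatement.
object BatchedInsert {
  /** SQL text plus the values to bind to its placeholders, in order. */
  final case class Batch(sql: String, params: Seq[String])

  def buildBatches(table: String, numColumns: Int, rows: Seq[Seq[String]],
                   insertBatchSize: Int): Seq[Batch] = {
    require(numColumns > 0, "numColumns must be positive")
    require(insertBatchSize > 0, "insertBatchSize must be positive")
    require(rows.forall(_.size == numColumns), "every row must have numColumns values")
    // Placeholder group for one row, e.g. "(?, ?)" when numColumns == 2
    val rowPlaceholder = Seq.fill(numColumns)("?").mkString("(", ", ", ")")
    rows.grouped(insertBatchSize).map { batch =>
      val sql = s"INSERT INTO $table VALUES " + Seq.fill(batch.size)(rowPlaceholder).mkString(", ")
      Batch(sql, batch.flatten) // row-major order matches the placeholder order
    }.toList
  }
}
```

For the two rows above, `buildBatches("output_table", 2, rows, 1000)` yields a single statement, `INSERT INTO output_table VALUES (?, ?), (?, ?)`, with the parameters `www`, `iteblog`, `com`, `qux` in order. Using `?` placeholders instead of string concatenation leaves value escaping to the JDBC driver.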

As the examples above show, combining MemSQL and Spark is straightforward.

This article is translated from: http://blog.memsql.com/memsql-spark–connector/

Reprinted from Past Memory (http://www.iteblog.com/)
Original article: "Running real-time applications using Spark and the MemSQL Spark Connector" (http://www.iteblog.com/archives/1327)



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
