MemSQL replaces HDFS alongside Spark, dramatically improving performance
Apache Spark is currently one of the most powerful distributed computing frameworks, and its simple computational model makes it easy to understand. Although Spark excels at manipulating large data sets, it still needs somewhere to persist data. HDFS is the most common choice and is typically used together with Spark, but because HDFS is disk-based, it can hurt performance in real-time applications (for example, computations in Spark Streaming). In addition, Spark has no built-in support for transactional commits.
The MemSQL database described in this article bills itself as the world's fastest distributed in-memory database ("the Earth's fastest in-memory database"). It is an in-memory distributed relational database created by Eric Frenkiel (a former Facebook employee) and Nikita Shamgunov (a former senior engineer on Microsoft SQL Server). It stores data in memory and compiles SQL statements into C++ ahead of execution so that they run quickly. It is compatible with MySQL, is said to be 30 times faster than MySQL, and is said to reach 1.5 million transactions per second.
MemSQL recently released an official MemSQL Spark Connector that works well with Spark, allowing Spark users to quickly read data from and write data to the database. MemSQL is a natural fit for Spark because it handles large volumes of reads and writes efficiently, which Spark often needs, and it provides plenty of space for the new data that Spark creates.
The MemSQL Spark Connector provides interfaces for all interactions between Spark and MemSQL and includes a number of optimizations. For example, it reads data from MemSQL in parallel, and when MemSQL and Spark run on the same physical node, Spark writes data directly to it. The connector offers two main components: MemSQLRDD and saveToMemsql.
MemSQLRDD holds data sets queried from MemSQL, and saveToMemsql writes an RDD from Spark to a MemSQL table. The two interfaces look like Spark's built-in JDBC interface and are used in a similar way (see "Spark and MySQL (JdbcRDD) integrated development"). First, here is how to use MemSQLRDD. The following reads table data from MemSQL into a MemSQLRDD:
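    import java.sql.ResultSet
    import com.memsql.spark.connector.rdd.MemSQLRDD

    // ...

    val rdd = new MemSQLRDD(
      // ... (leading constructor arguments are omitted in the source listing)
      "SELECT * FROM iteblog",
      // Map each row of the result set to the value of its "test_column" field
      (r: ResultSet) => { r.getString("test_column") })
    rdd.first() // Contains the value of "test_column" for the first row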
To write an RDD to MemSQL, use the saveToMemsql function:
    import com.memsql.spark.connector._

    // ...

    val rdd = sc.parallelize(Array(Array("www", "iteblog"), Array("com", "qux")))
    rdd.saveToMemsql(dbHost, dbPort, dbUser, dbPassword,
                     dbName, outputTableName, insertBatchSize = 1000)
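The insertBatchSize = 1000 argument suggests that rows are written in batches rather than one at a time. The article does not describe how the connector does this internally, so the following is only a minimal sketch of the general idea: grouping rows into multi-row INSERT statements. The names InsertBatching and batchedInserts are hypothetical and are not part of the MemSQL Spark Connector.

```scala
// Hypothetical helper for illustration only; NOT the connector's implementation.
object InsertBatching {

  /** Builds one parameterized multi-row INSERT per batch of at most `batchSize` rows.
    * Returns each SQL string paired with its flattened parameter values.
    */
  def batchedInserts(table: String,
                     rows: Seq[Array[String]],
                     batchSize: Int): List[(String, Seq[String])] = {
    require(batchSize > 0, "batchSize must be positive")
    rows.grouped(batchSize).map { batch =>
      // One "(?, ?, ...)" placeholder group per row in the batch.
      val placeholders = batch
        .map(row => row.map(_ => "?").mkString("(", ", ", ")"))
        .mkString(", ")
      val sql = s"INSERT INTO $table VALUES $placeholders"
      // Flatten the row values in the same order as the placeholders.
      (sql, batch.flatMap(_.toSeq))
    }.toList
  }
}
```

With a batch size of 2, three two-column rows would produce two statements: one carrying two rows and one carrying the remaining row. Larger batches mean fewer round trips to the database at the cost of larger statements.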
As the examples above show, combining MemSQL with Spark is straightforward.
This article is translated from: http://blog.memsql.com/memsql-spark–connector/
Reprinted from Past memory (http://www.iteblog.com/)
Permalink to this article: "Running real-time applications using Spark and the MemSQL Spark Connector" (http://www.iteblog.com/archives/1327)