We are excited to announce that, starting today, the preview of Apache Spark 1.5.0 is available in Databricks. Our users can now choose to provision clusters with Spark 1.5 or with previous Spark versions in just a few clicks.
Officially, Spark 1.5 is expected to be released within a few weeks, and the community has already cut a version for QA testing. Given the fast-paced development of Spark, we feel it is important to enable our users to develop against and exploit new features as quickly as possible. With traditional on-premises software deployment, it can take months, even years, to receive software updates from vendors. With the Databricks cloud model, we can roll out updates within a few hours and let users try the Spark version of their choice.
What's New?
The last few releases of Spark focused on making data science more accessible through high-level programming APIs such as DataFrames, machine learning pipelines, and R language support. Spark 1.5, by contrast, focuses largely on under-the-hood changes to improve Spark's performance, usability, and operational stability.
Spark 1.5 delivers the first phase of Project Tungsten, a new execution backend for DataFrames/SQL. Through code generation and cache-aware algorithms, Project Tungsten improves runtime performance with out-of-the-box configurations. Through explicit memory management and external operations, the new backend also mitigates the inefficiency of JVM garbage collection and improves robustness for large-scale workloads.
Over the next few weeks, we will be writing about Project Tungsten. To give a sneak peek, the chart above compares the out-of-the-box (i.e. no configuration changes) performance of an aggregation query (millions of records and 1 million composite keys) using Spark 1.4 and Spark 1.5 on my laptop.
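For readers unfamiliar with the term, an aggregation over composite keys groups records by a combination of columns and reduces each group to a single value. The post does not give the benchmark query itself, so the following is only a minimal plain-Python sketch of the shape of such a query; the function name, column layout, and sample data are all hypothetical, and it says nothing about how Spark executes it.

```python
from collections import defaultdict


def aggregate_by_composite_key(records):
    """Sum `value` per (key_a, key_b) pair.

    Roughly the plain-Python equivalent of
    SELECT key_a, key_b, SUM(value) ... GROUP BY key_a, key_b.
    The column names are hypothetical, not taken from the benchmark.
    """
    totals = defaultdict(int)
    for key_a, key_b, value in records:
        # The tuple (key_a, key_b) is the composite key.
        totals[(key_a, key_b)] += value
    return dict(totals)


# Hypothetical sample data: (key_a, key_b, value)
records = [("us", 1, 10), ("us", 1, 5), ("eu", 2, 7), ("us", 2, 1)]
result = aggregate_by_composite_key(records)
```

With many distinct composite keys, the hash table of running totals becomes large, which is why memory layout and garbage collection matter for this kind of query.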
Streaming workloads typically run 24/7 and have stringent stability requirements. In this release, Typesafe has introduced backpressure in Spark Streaming. With this feature, Spark Streaming can dynamically control data ingest rates to adapt to unpredictable variations in processing load. This allows streaming applications to be more robust against bursty workloads and downstream delays.
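The idea behind backpressure can be illustrated with a toy feedback loop: after each batch, measure how fast the batch was actually processed and use that as the ingest rate for the next one. This is a simplified sketch, not Spark's actual rate-control algorithm; the function names and the numbers are hypothetical. In Spark 1.5 itself, the feature is turned on through the `spark.streaming.backpressure.enabled` configuration property.

```python
def next_ingest_rate(batch_records, processing_seconds, min_rate=1.0):
    """Choose the next ingest rate (records/sec) from the last batch.

    If `batch_records` records took `processing_seconds` to process, the
    system can sustain about batch_records / processing_seconds records/sec.
    Hypothetical helper, not part of any Spark API.
    """
    if batch_records <= 0 or processing_seconds <= 0:
        return min_rate
    return max(min_rate, batch_records / processing_seconds)


def simulate(capacities, initial_rate, batch_interval=1.0):
    """Return the ingest rate used for each batch.

    `capacities` holds the processing capacity (records/sec) available
    during each batch; it stands in for unpredictable load variations.
    """
    rate = initial_rate
    used_rates = []
    for capacity in capacities:
        used_rates.append(rate)
        records = rate * batch_interval
        processing_seconds = records / capacity
        # Feed the measurement back into the rate for the next batch.
        rate = next_ingest_rate(records, processing_seconds)
    return used_rates


# Capacity drops from 1000 to 200 records/sec for two batches, then recovers.
rates = simulate([1000, 1000, 200, 200, 1000], initial_rate=1000)
```

In this toy run the ingest rate stays at the initial value until one slow batch has been observed, then falls to match the reduced capacity. A real controller would smooth these adjustments rather than jump directly to the last measurement.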
Of course, Spark 1.5 is the work of a large number of open source contributors from many organizations, and includes a lot more than the above. Some examples include:
- New machine learning algorithms: multilayer perceptron classifier, PrefixSpan for sequential pattern mining, association rule generation, etc.
- Improved R language support and GLMs with R formula.
- Better instrumentation and reporting of memory usage in the web UI.
Stay tuned for future blog posts covering the release as well as deep dives into specific improvements.
How do I use it?
Launching a Spark 1.5 cluster is as easy as selecting the Spark 1.5 experimental version in the cluster creation interface in Databricks.
Once you hit confirm, you will get a cluster running Spark 1.5.0 and can start testing the new release. Multiple Spark version support in Databricks also enables users to run Spark 1.5 canary clusters side-by-side with existing production Spark clusters.
You can find the work-in-progress documentation for Spark 1.5.0 here. Please be aware that, just like any other preview software, Spark 1.5.0 support is experimental. There will be bugs and quirks that we find and fix in the next couple of weeks. The good news is that you don't have to worry about following the development or upgrading yourself. As we discover and fix bugs in the open source project, the Spark 1.5 option in Databricks will be updated automatically. If you encounter a bug, please report it by filing a JIRA ticket.
To try Databricks, sign up for a free 30-day trial.
At the last Beijing Spark Meetup technology sharing session, a Spark committer said they were busy with Spark 1.5, whose core work is said to be Tungsten, a new execution backend for DataFrames/SQL. Through code generation and cache-aware algorithms, the project improves runtime performance with out-of-the-box configurations. With explicit memory management and external operations, the new backend also mitigates the inefficiency of JVM garbage collection, improving robustness for large-scale workloads.
The first phase of Spark 1.5 is now complete. Later stages will probably bring many more optimizations and bug fixes, but you can already get a taste of the benefits. If you want to study the 1.5 code, see the Spark 1.5 branch on GitHub. My personal feeling is that this release is mainly an upgrade to Spark SQL. Because most companies run Spark on YARN, most of the performance gains will hopefully come through Spark SQL.
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
Spark 1.5 preview available in Databricks