Spark 1.5 preview available in Databricks

We are excited to announce that, starting today, a preview of Apache Spark 1.5.0 is available in Databricks. Our users can now choose to provision clusters with Spark 1.5 or previous Spark versions, ready to go in a few clicks.

The official Spark 1.5 release is expected within a few weeks, and the community has begun QA testing it. Given the fast pace of Spark development, we feel it is important to enable our users to develop with and exploit new features as quickly as possible. With traditional on-premises software deployment, it can take months, even years, to receive software updates from vendors. With the Databricks cloud model, we can ship updates within hours and let users try the Spark version of their choice.


What's New?

The last few releases of Spark focused on making data science more accessible through high-level programming APIs such as DataFrames, Machine Learning Pipelines, and R language support. A large part of Spark 1.5, on the other hand, focuses on under-the-hood changes to improve Spark's performance, usability, and operational stability.

Spark 1.5 delivers the first phase of Project Tungsten, a new execution backend for DataFrames/SQL. Through code generation and cache-aware algorithms, Project Tungsten improves runtime performance with out-of-the-box configurations. Through explicit memory management and external operations, the new backend also mitigates the inefficiency of JVM garbage collection and improves robustness in large-scale workloads.
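As a rough analogy for why explicit memory management helps, the Python sketch below stores rows back to back in a single `bytearray` with a fixed binary layout instead of keeping one object per row. A scan then walks contiguous memory, and the stored data is one buffer rather than many small objects for a garbage collector to track. The class name `PackedRows` and the (int64, int64, float64) layout are assumptions made for illustration; this is not Tungsten's actual row format or code, which lives in the JVM.

```python
import struct

# Hypothetical fixed-width row layout: int64 key1, int64 key2, float64 value (24 bytes)
ROW = struct.Struct("<qqd")


class PackedRows:
    """Rows stored back to back in one bytearray instead of one object per row."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def append(self, key1: int, key2: int, value: float) -> None:
        # Serialize the row into the shared buffer; no per-row object is retained
        self.buf += ROW.pack(key1, key2, value)

    def __len__(self) -> int:
        return len(self.buf) // ROW.size

    def get(self, i: int) -> tuple:
        # Random access by offset arithmetic, as with manually managed memory
        return ROW.unpack_from(self.buf, i * ROW.size)

    def sum_values(self) -> float:
        # Sequential scan over contiguous memory: a cache-friendly access pattern
        return sum(value for _, _, value in ROW.iter_unpack(self.buf))


rows = PackedRows()
rows.append(1, 2, 3.5)
rows.append(4, 5, 1.5)
print(len(rows), rows.get(1), rows.sum_values())  # 2 (4, 5, 1.5) 5.0
```

Python still creates temporary tuples while unpacking, so this only illustrates the storage layout, not the full performance benefit.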

Over the next few weeks, we'll be writing about Project Tungsten. To give a sneak peek, the above chart compares the out-of-the-box (i.e. no configuration changes) performance of an aggregation query (… million records and 1 million composite keys) using Spark 1.4 and Spark 1.5 on my laptop.
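The benchmarked query groups rows by a composite key and aggregates a value per group. The exact query behind the chart is not given here, so the sketch below is a hypothetical stand-in: a plain Python hash aggregation computing the equivalent of `SELECT key1, key2, SUM(value) ... GROUP BY key1, key2` over synthetic rows. The function names and the data generator are assumptions; the sketch shows only what the workload computes, not how Spark 1.4 or 1.5 executes it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[int, int, float]


def synthetic_rows(n_records: int, n_keys: int, width: int = 1000) -> Iterator[Row]:
    """Hypothetical workload: n_records rows spread round-robin over n_keys composite keys."""
    for i in range(n_records):
        key = i % n_keys
        # Split one integer into a two-column composite key (key1, key2)
        yield key // width, key % width, 1.0


def hash_aggregate(rows: Iterable[Row]) -> Dict[Tuple[int, int], float]:
    """Equivalent of SUM(value) GROUP BY key1, key2, using a hash table."""
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for key1, key2, value in rows:
        totals[(key1, key2)] += value
    return dict(totals)


result = hash_aggregate(synthetic_rows(n_records=20_000, n_keys=2_000))
print(len(result), result[(1, 999)])  # 2000 10.0
```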

Streaming workloads typically run 24/7 and have stringent stability requirements. In this release, Typesafe has introduced backpressure in Spark Streaming. With this feature, Spark Streaming can dynamically control data ingest rates to adapt to unpredictable variations in processing load. This makes streaming applications more robust against bursty workloads and downstream delays.
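To make the backpressure idea concrete, here is a simplified rate controller in Python. After each batch it compares the ingest rate it allowed with the rate at which the batch was actually processed, and it also penalizes any backlog (scheduling delay), in the style of a PID controller. Spark Streaming's built-in estimator is also PID-based, but the class `RateController`, its gains, and the exact update rule below are assumptions for illustration, not Spark's code. In Spark itself the feature is turned on with the `spark.streaming.backpressure.enabled` configuration property.

```python
from dataclasses import dataclass


@dataclass
class RateController:
    """Hypothetical PID-style ingest-rate controller (illustrative, not Spark's code)."""

    batch_interval_s: float
    proportional: float = 1.0   # gain on the current rate error
    integral: float = 0.2       # gain on the accumulated backlog
    derivative: float = 0.0     # gain on the change in error
    min_rate: float = 100.0     # never throttle below this many records/s
    latest_rate: float = -1.0   # -1 means "no batch seen yet"
    latest_error: float = 0.0

    def update(self, num_records: int, processing_s: float, scheduling_delay_s: float) -> float:
        """Return the new maximum ingest rate (records/s) after a completed batch."""
        processing_rate = num_records / processing_s
        if self.latest_rate < 0:
            # First batch: start from the rate we actually observed
            self.latest_rate = processing_rate
            return self.latest_rate
        # Positive error: data was let in faster than it could be processed
        error = self.latest_rate - processing_rate
        # Backlog waiting to be scheduled, expressed as a rate
        historical_error = scheduling_delay_s * processing_rate / self.batch_interval_s
        d_error = error - self.latest_error
        new_rate = (self.latest_rate
                    - self.proportional * error
                    - self.integral * historical_error
                    - self.derivative * d_error)
        self.latest_rate = max(new_rate, self.min_rate)
        self.latest_error = error
        return self.latest_rate


ctrl = RateController(batch_interval_s=1.0)
print(ctrl.update(10_000, processing_s=1.0, scheduling_delay_s=0.0))  # 10000.0
# Load spikes: a batch of the same size now takes 2 s and a 1 s backlog builds up
print(ctrl.update(10_000, processing_s=2.0, scheduling_delay_s=1.0))  # 4000.0
```

The rate drops below the observed processing rate (5000 records/s) so that the backlog can drain; clamping to `min_rate` keeps the stream from being throttled to zero.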

Of course, Spark 1.5 is the work of more than … open-source contributors from over … organizations, and it includes a lot more than the above. Some examples include:

    • New machine learning algorithms: multilayer perceptron classifier, PrefixSpan for sequential pattern mining, association rule generation, etc.

    • Improved R language support and GLMs with R formula.

    • Better instrumentation and reporting of memory usage in Web UI.

Stay tuned for future blog posts covering the release as well as deep dives into specific improvements.

How do I use it?

Launching a Spark 1.5 cluster is as easy as selecting Spark 1.5 (experimental) as the version in the cluster creation interface in Databricks.

Once you hit confirm, you'll get a Spark cluster ready to go with Spark 1.5.0 and can start testing the new release. Multiple Spark version support in Databricks also enables users to run Spark 1.5 canary clusters side by side with existing production Spark clusters.

You can find the work-in-progress documentation for Spark 1.5.0 here. Please be aware that, just like any other preview software, Spark 1.5.0 support is experimental. There will be bugs and quirks that we find and fix over the next couple of weeks. The good news is that you don't have to worry about following the development or upgrading yourself. As we discover and fix bugs in the open source project, the Spark 1.5 option in Databricks will also be updated automatically. If you encounter a bug, please report it by filing a JIRA ticket.

To try Databricks, sign up for a free 30-day trial.

At the last Beijing Spark Meetup, a Spark committer said they were busy with Spark 1.5, whose core work is Project Tungsten, a new execution backend for DataFrames/SQL. Tungsten improves runtime performance out of the box through code generation and cache-aware algorithms. With explicit memory management and external operations, the new backend also reduces the inefficiency of JVM garbage collection, improving robustness under large workloads.

At present, the first phase of Spark 1.5 is complete. Many more optimizations and bug fixes are expected later, but you can already get a taste of the benefits. If you want to understand the 1.5 code, see the Spark 1.5 branch on GitHub. My personal impression is that this release is mainly an upgrade to Spark SQL: since most companies run Spark on YARN, most of the hoped-for job speedups should come through Spark SQL.