Spark release 1.2.0: support for Netty NIO and SQL enhancements

Source: Internet
Author: User
Tags: shuffle

Spark version 1.2.0
Spark 1.2.0 is the third release on the 1.X line. This release brings performance and usability improvements to Spark's core engine, a major new API for MLlib, expanded ML support in Python, a fully highly available (H/A) mode in Spark Streaming, and more. GraphX has seen major performance and API improvements and graduates from an alpha component. Spark 1.2 represents the work of 172 contributors from more than 60 organizations.

To download Spark 1.2, visit the download page.

Spark Core
In 1.2, Spark Core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is the communication manager that Spark uses during bulk transfers, which is upgraded to a Netty-based implementation. The second is Spark's shuffle mechanism, which is upgraded to the "sort-based" shuffle first released in Spark 1.1. Spark also adds an elastic scaling mechanism designed to improve cluster utilization during long-running ETL-style jobs. This is currently supported only on YARN and will reach other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11, see the build documentation.
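
The idea behind a sort-based shuffle can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Spark's implementation: the function names, the in-memory lists standing in for shuffle files, and the use of integer keys with a modulo partitioner are all assumptions made for the example. Each map task sorts its records by target partition and writes one contiguous output plus an index of offsets, so a reducer fetches a single slice per map output instead of one file per (map, reduce) pair.

```python
from collections import defaultdict


def map_side_write(records, num_partitions):
    """Sort one map task's records by target partition.

    Returns (data, index), where data is a single contiguous list and
    index[p]:index[p + 1] is the slice of data owned by partition p.
    Keys are assumed to be non-negative integers (hypothetical partitioner).
    """
    tagged = sorted(
        ((key % num_partitions, key, value) for key, value in records),
        key=lambda t: t[0],  # sort by partition id only; the sort is stable
    )
    data = [(key, value) for _, key, value in tagged]

    # Count records per partition, then turn counts into prefix offsets.
    index = [0] * (num_partitions + 1)
    for pid, _, _ in tagged:
        index[pid + 1] += 1
    for p in range(num_partitions):
        index[p + 1] += index[p]
    return data, index


def reduce_side_read(map_outputs, partition):
    """Read one partition's slice from every map output and group by key."""
    grouped = defaultdict(list)
    for data, index in map_outputs:
        for key, value in data[index[partition]:index[partition + 1]]:
            grouped[key].append(value)
    return dict(grouped)
```

With two map outputs and two partitions, reduce_side_read(outputs, 0) returns only the even keys and reduce_side_read(outputs, 1) only the odd ones, each gathered from one slice per map output.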


Spark Streaming

This release includes two major feature additions to the Spark Streaming library: a Python API and a write ahead log for full driver H/A. The Python API covers almost all of the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark Streaming now features H/A driver support through a write ahead log (WAL). In Spark 1.1 and earlier, some buffered data (received but not yet processed) can be lost during driver restarts. To prevent this, Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (for example, HDFS).
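
The write ahead log pattern described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not Spark Streaming's implementation: the class names are invented, a local temporary file stands in for a fault-tolerant file system such as HDFS, and cleanup of already-processed records is left out. The point is the ordering: a record is made durable before it enters the in-memory buffer, so a restarted process can rebuild the buffer from the log.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Append-only log of received records, one JSON document per line."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        # Make the record durable before it is buffered in memory.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Return every logged record in arrival order.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


class Receiver:
    """Buffers incoming records in memory, logging each one first."""

    def __init__(self, wal):
        self.wal = wal
        # On (re)start, rebuild the buffer from whatever the log holds.
        self.buffer = wal.replay()

    def receive(self, record):
        self.wal.append(record)
        self.buffer.append(record)


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "received.log")
    first = Receiver(WriteAheadLog(log_path))
    first.receive({"word": "spark"})
    first.receive({"word": "streaming"})
    # Simulate a driver restart: the old in-memory buffer is discarded,
    # and a fresh receiver recovers the records from the log.
    restarted = Receiver(WriteAheadLog(log_path))
    return restarted.buffer
```

Calling demo() returns both records even though the first receiver's buffer was never handed over, which is the property the WAL is meant to provide.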

Spark SQL

In this release, Spark SQL adds a new API for external data sources. This API supports mounting external data sources as temporary tables, with support for optimizations such as predicate push-down. Spark's Parquet and JSON bindings have been rewritten to use this API, and we expect a variety of community projects to integrate with other systems and formats during the 1.2 lifecycle.
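
To show why predicate push-down matters for an external source, here is a small sketch in plain Python. It does not use the Spark SQL data sources API; the InMemorySource class, its scan method, and the (column, operator, value) filter triples are assumptions made for illustration. With push-down the source evaluates the filter itself and returns only matching rows; without it, every row crosses the source boundary and is filtered afterwards.

```python
import operator

# Comparison operators supported by this hypothetical filter format.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


class InMemorySource:
    """A hypothetical external source that can filter and project rows itself."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_returned = 0  # how many rows left the source

    def scan(self, columns, filters=()):
        # filters are (column, op, value) triples evaluated inside the source.
        for row in self.rows:
            if all(OPS[op](row[col], value) for col, op, value in filters):
                self.rows_returned += 1
                yield {c: row[c] for c in columns}


def query(source, columns, filters, push_down):
    """Run the same query with or without pushing filters into the source."""
    if push_down:
        return list(source.scan(columns, filters))
    # Without push-down: fetch all rows (including filter columns), then filter.
    needed = list(dict.fromkeys(list(columns) + [col for col, _, _ in filters]))
    fetched = source.scan(needed)
    kept = [r for r in fetched
            if all(OPS[op](r[col], value) for col, op, value in filters)]
    return [{c: r[c] for c in columns} for r in kept]
```

Both modes return the same result; the difference shows up in rows_returned, which counts only the matching rows when the filter is pushed down and every row otherwise.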

Hive integration has been improved with support for the fixed-precision decimal type and Hive 0.13. Spark SQL also adds dynamically partitioned inserts, a popular Hive feature. An internal re-architecture around caching improves the performance and semantics of caching SchemaRDD instances and adds support for statistics-based partition pruning for cached data.
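
Statistics-based partition pruning can be illustrated with a short sketch. This is plain Python and only a conceptual model, not Spark SQL's cache: the CachedPartition class and the single "greater than" predicate are assumptions for the example, and each partition is assumed to be non-empty. Every cached partition keeps the minimum and maximum of its column, so a scan can skip any partition whose range cannot satisfy the predicate.

```python
class CachedPartition:
    """One cached batch of a numeric column with min/max statistics."""

    def __init__(self, values):
        self.values = list(values)  # assumed non-empty
        self.min = min(self.values)
        self.max = max(self.values)


def scan_greater_than(partitions, threshold):
    """Return the values > threshold and the number of partitions read."""
    result = []
    partitions_read = 0
    for part in partitions:
        # If even the largest value fails the predicate, skip the partition.
        if part.max <= threshold:
            continue
        partitions_read += 1
        result.extend(v for v in part.values if v > threshold)
    return result, partitions_read
```

For three partitions holding [1, 5, 9], [10, 12, 20] and [21, 30], a scan for values greater than 15 reads only the last two partitions.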

GraphX

In 1.2, GraphX graduates from an alpha component and adds a stable API. This means applications written against GraphX are guaranteed to work with future Spark versions without code changes. A new core API, aggregateMessages, is introduced to replace the now deprecated mapReduceTriplets API. The new aggregateMessages API features a more imperative programming model and improves performance. Some early test users found that switching to the new API improved performance by as much as 1X.
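
The programming model of aggregateMessages can be sketched in plain Python. This is a single-machine illustration, not GraphX code: the snake_case names and the dictionary-based graph are assumptions, loosely modeled on the Scala API, in which a send function receives an edge context and may send messages to the source or destination vertex, and a merge function combines messages arriving at the same vertex.

```python
class EdgeContext:
    """View of one edge handed to the user's send function."""

    def __init__(self, src_id, dst_id, src_attr, dst_attr, attr, merge, inbox):
        self.src_id, self.dst_id = src_id, dst_id
        self.src_attr, self.dst_attr, self.attr = src_attr, dst_attr, attr
        self._merge = merge
        self._inbox = inbox

    def send_to_src(self, msg):
        self._deliver(self.src_id, msg)

    def send_to_dst(self, msg):
        self._deliver(self.dst_id, msg)

    def _deliver(self, vertex_id, msg):
        # Merge eagerly so each vertex holds a single aggregated message.
        if vertex_id in self._inbox:
            self._inbox[vertex_id] = self._merge(self._inbox[vertex_id], msg)
        else:
            self._inbox[vertex_id] = msg


def aggregate_messages(vertices, edges, send_msg, merge_msg):
    """vertices: {id: attr}; edges: iterable of (src_id, dst_id, attr)."""
    inbox = {}
    for src_id, dst_id, attr in edges:
        ctx = EdgeContext(src_id, dst_id, vertices[src_id], vertices[dst_id],
                          attr, merge_msg, inbox)
        send_msg(ctx)
    return inbox


# Example: for each user, count followers and sum their ages.
ages = {1: 30, 2: 25, 3: 40}
follows = [(1, 2, "follows"), (3, 2, "follows"), (2, 1, "follows")]
follower_stats = aggregate_messages(
    ages,
    follows,
    send_msg=lambda ctx: ctx.send_to_dst((1, ctx.src_attr)),
    merge_msg=lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
```

Vertices that receive no messages (vertex 3 here) do not appear in the result, which matches the behavior of the GraphX API as far as I am aware.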

In addition, Spark now supports graph checkpointing and lineage truncation, which are necessary to support large numbers of iterations in production jobs. Finally, a handful of performance improvements have been added for PageRank and graph loading.

Known Issues

A few smaller bugs did not make the release window. They will be fixed in Spark 1.2.1:

Netty shuffle does not respect the secured port configuration. Workaround: revert to NIO shuffle. SPARK-4837
A java.io.FileNotFound exception occurs when creating EXTERNAL Hive tables. Workaround: set hive.stats.autogather=false. SPARK-4892
Exceptions in the PySpark zip function on text file inputs: SPARK-4841
MetricsServlet is not properly initialized: SPARK-4595
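
The two workarounds above could be applied roughly as follows. This is a hedged sketch: the helper function is hypothetical, and the property name spark.shuffle.blockTransferService with the value nio is the setting I recall from the Spark 1.2 configuration documentation, so it should be checked against the docs for your version. The Hive setting is shown only as a SET statement string that could be issued through a Hive-enabled SQL context.

```python
def spark_submit_conf_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# SPARK-4837 workaround: use the NIO block transfer service instead of Netty.
# (Property name assumed from the Spark 1.2 configuration docs.)
NIO_SHUFFLE_WORKAROUND = {"spark.shuffle.blockTransferService": "nio"}

# SPARK-4892 workaround, expressed as a SQL statement for a Hive-enabled context.
HIVE_STATS_WORKAROUND_SQL = "SET hive.stats.autogather=false"

print(" ".join(spark_submit_conf_args(NIO_SHUFFLE_WORKAROUND)))
```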


