The Spark kernel is developed in Scala, so it is natural to develop Spark applications in Scala as well. If you are unfamiliar with Scala, you can read the web tutorial "A Scala Tutorial for Java Programmers" or related Scala books to learn it.
This article will introduce 3 Scala
Scala Beginner-to-Advanced Classics, Lesson 66: A First Experience with Scala Concurrent Programming and Its Application in the Spark Source Code (content introduction and video link, 2015-07-24, DT Big Data Dream Factory).
Java, Scala, Python, and R: four programming languages. Spark Streaming has the ability to handle real-time streaming data. Spark SQL enables users to query structured data in the language they are best at; the DataFrame is at the heart of Spark SQL, and a DataFrame represents data as a collection of rows, where each column of a row is named
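To make that row-and-named-column model concrete, here is a minimal DataFrame sketch; the data and column names are invented for the example, and the Spark 2.x shell is assumed:

```scala
// spark-shell (2.x) style: `spark` is the SparkSession provided by the shell.
import spark.implicits._

// A DataFrame is a collection of rows whose columns are named.
val people = Seq(("Alice", 34), ("Bob", 29)).toDF("name", "age")
people.createOrReplaceTempView("people")

// The same structured data can be queried in SQL or with the DataFrame API.
spark.sql("SELECT name FROM people WHERE age > 30").show()
people.filter($"age" > 30).select("name").show()
```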
Flink may give us a glimpse of the future of distributed data processing.
In a later article, I'll write up my first impressions of Flink from the perspective of a Spark developer. I have been working with Spark for more than two years but have only been in contact with Flink for 2-3 weeks, so there is bound to be some bias; please read that article with a skeptical and critical eye.
Original address: http://blog.jobbole.com/?p=89446

I first heard of Spark at the end of 2013, when I became interested in Scala, the language Spark is written in. Some time later I did an interesting data science project that tried to predict survival on the Titanic. This proved to be a good way to learn more about Spark content and
script directly to get the result of the operation.
During the run, a bug was found: org.apache.spark.deploy.yarn.Client provides a parameter "--name" to specify the application name:
However, in actual use this parameter blocked the application. Reading the source code showed this to be a bug, which has been submitted to the Spark JIRA:

// Location: new-yarn/src/main/scala/org/apache/
You are welcome to reprint this article; please indicate the source.

Summary
The SQL module was added in the newly released Spark 1.0. Even more interesting, it also provides good support for Hive's HiveQL. As an exercise in source-code analysis, it is very interesting to see how Spark supports HQL.

Introduction to Hive
The following part is taken from the Hive chapter of Hadoop: The Definitive Guide.
"Hive was designed by Facebook to all
You are welcome to reprint this article; please indicate the source, huichiro.

Prologue
Hive is an open-source data warehouse tool built on Hadoop. It provides HiveQL, a SQL-like language that lets upper-layer data analysts analyze massive data stored in HDFS without having to know much about MapReduce. This feature has been widely welcomed.
An important module in the overall Hive framework is the execution module, which is implemented using the MapReduce computing framework.
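Coming back to the question raised above of how Spark supports HQL: a minimal Spark 1.0-era sketch is shown below. The table name and query are invented for illustration; hql(...) returns a SchemaRDD in that release and is folded into sql(...) in later versions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hql-sketch"))
val hiveContext = new HiveContext(sc)

// hql(...) parses HiveQL against the Hive metastore and returns a SchemaRDD.
hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.hql("SELECT key, value FROM src WHERE key < 10")
  .collect()
  .foreach(println)
```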
lot above. To put it bluntly, writing a YARN application mainly means implementing a Client and an ApplicationMaster. For more information, see simple-yarn-app.
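As a sketch of what that Client side looks like, here is a condensed Scala rendering of the simple-yarn-app client against the Hadoop 2.x API; the ApplicationMaster class name, memory figure, and queue are placeholders, and error handling is omitted:

```scala
import java.util.Collections

import org.apache.hadoop.yarn.api.records.{ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.Records

object MinimalYarnClient {
  def main(args: Array[String]): Unit = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Ask the ResourceManager for a new application.
    val app = yarnClient.createApplication()

    // Describe the container that will run our ApplicationMaster
    // ("MyApplicationMaster" is a placeholder class).
    val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
    amContainer.setCommands(
      Collections.singletonList("$JAVA_HOME/bin/java MyApplicationMaster"))

    val resource = Records.newRecord(classOf[Resource])
    resource.setMemory(512)      // MB for the AM container (example value)
    resource.setVirtualCores(1)

    val ctx = app.getApplicationSubmissionContext
    ctx.setApplicationName("minimal-yarn-app")
    ctx.setAMContainerSpec(amContainer)
    ctx.setResource(resource)
    ctx.setQueue("default")

    val appId = yarnClient.submitApplication(ctx)
    println(s"Submitted application $appId")
  }
}
```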
Spark on YARN

Combining Spark standalone's deployment model with the requirements of the YARN programming model, a table is provided below to compare Spark standalone and
-1.1.0-hadoop2.2.0.jar file; once added, the interface looks as follows:

2.2 Example 1: Run directly

"Spark Programming Model (Part 1) – Concepts and Shell Tests" used Spark-shell to search the Sogou logs; here we use IDEA to re-implement the session-count leaderboard query, and you will find that a professional development tool makes the work much more convenient and faster.
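For reference, a minimal sketch of such a session-count leaderboard over the Sogou query log; it assumes the common tab-separated layout with the user/session id in the second field, and the input path is hypothetical:

```scala
// spark-shell style: `sc` is the SparkContext provided by the shell.
val top10 = sc.textFile("hdfs:///data/SogouQ.txt")
  .map(_.split("\t"))
  .filter(_.length >= 2)
  .map(fields => (fields(1), 1))             // key by session/user id
  .reduceByKey(_ + _)                        // queries per session
  .map { case (id, cnt) => (cnt, id) }
  .sortByKey(ascending = false)              // leaderboard order
  .take(10)

top10.foreach { case (cnt, id) => println(s"$id\t$cnt") }
```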
In the source reading, we need to focus on the following two main lines.
Static view: RDDs and their transformations and actions.
Dynamic view: the life cycle of a job. Each job is divided into multiple stages; each stage can contain more than one RDD and its transformations; these stages are mapped into tasks that are distributed across the cluster.
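The two views can be seen in a few lines of spark-shell-style code: transformations only build the static RDD lineage, while the final action submits a job that the scheduler cuts into stages (the input path is hypothetical):

```scala
// spark-shell style: `sc` is the SparkContext provided by the shell.
// Static view: transformations only build the RDD lineage; nothing runs yet.
val counts = sc.textFile("/tmp/input.txt")   // hypothetical path
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)                        // introduces a shuffle boundary

// Dynamic view: this action submits a job; the scheduler cuts it into two
// stages at the shuffle boundary and ships the stages to the cluster as tasks.
counts.take(5).foreach(println)
```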
References
Introduction to Spark Internals http://files.meetup.com/3138542/dev-meetup-dec-
Apache Spark Memory Management in Detail

As a memory-based distributed computing engine, Spark's memory management module plays a very important role in the whole system. Understanding the fundamentals of Spark memory management helps you develop better Spark applications and tune their performance. The purpose of thi
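As a pointer to what such tuning touches, here is a sketch of the main knobs of the unified memory manager introduced in Spark 1.6; the fraction values shown are simply the documented defaults, and the heap size is an example, not a recommendation from the article:

```scala
import org.apache.spark.SparkConf

// Knobs of the unified memory manager (Spark 1.6+).
val conf = new SparkConf()
  .setAppName("memory-tuning-sketch")
  .set("spark.executor.memory", "4g")          // executor heap (example size)
  .set("spark.memory.fraction", "0.6")         // share of (heap - 300MB) for execution + storage
  .set("spark.memory.storageFraction", "0.5")  // part of that region protected from eviction
```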
another set of external systems, or provide the calculated results to the user. One of the big advantages of the Storm ecosystem is that it has a rich enough variety of stream sources to fetch data from almost any kind of system. While it is possible to write custom spouts for highly specific applications, you can almost always find a ready-made solution among the vast number of existing source types, from the Twitter streaming API to Apache Kafka to JMS brokers, all covered.
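When a custom source is needed, the shape of a Storm spout is simple. Below is a minimal sketch in Scala against the Storm 2.x API (org.apache.storm), with an invented in-memory source standing in for a real external system:

```scala
import java.util.{Map => JMap}

import org.apache.storm.spout.SpoutOutputCollector
import org.apache.storm.task.TopologyContext
import org.apache.storm.topology.OutputFieldsDeclarer
import org.apache.storm.topology.base.BaseRichSpout
import org.apache.storm.tuple.{Fields, Values}

// A minimal custom spout emitting sentences from an in-memory array.
class SentenceSpout extends BaseRichSpout {
  private var collector: SpoutOutputCollector = _
  private val sentences =
    Array("the cow jumped over the moon", "an apple a day keeps the doctor away")
  private var index = 0

  override def open(conf: JMap[String, AnyRef], context: TopologyContext,
                    collector: SpoutOutputCollector): Unit = {
    this.collector = collector
  }

  // Called in a loop by Storm; emit one tuple per call.
  override def nextTuple(): Unit = {
    collector.emit(new Values(sentences(index % sentences.length)))
    index += 1
    Thread.sleep(100) // avoid busy-spinning in this toy example
  }

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("sentence"))
}
```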
, including transformations and actions.

When do you use the RDD? Typical scenarios for using RDDs (a short sketch follows this list):
You need low-level transformations and actions to control your data set;
Your data is unstructured, such as media streams or streams of text;
You want to manipulate your data with functional programming constructs rather than expressing the computation in a domain-specific language (DSL);
You don't
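A spark-shell-style sketch of those scenarios, using low-level, functional RDD operations on unstructured text; the path and the crude hour-prefix parsing are illustrative only:

```scala
// spark-shell style: `sc` is the SparkContext provided by the shell.
// Unstructured text with no schema, handled with low-level functional ops.
val lines = sc.textFile("/tmp/raw_logs")      // hypothetical path

val errorsByHour = lines
  .filter(_.contains("ERROR"))                // record-level control
  .map(line => (line.take(13), 1))            // crude "yyyy-MM-dd HH" prefix as key
  .reduceByKey(_ + _)

errorsByHour.collect().foreach(println)
```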
monitoring of computing resources, restarting failed tasks based on the monitoring results, or redistributing tasks once a new node joins the cluster. This part of the content needs to refer to YARN's documentation.
This article is published by NetEase Cloud. It is the sequel to "A Comparative Analysis of the Apache Streaming Frameworks Flink, Spark Streaming, and Storm (Part I)".

2. Spark Streaming Architecture and Feature Analysis

2.1 Basic Architecture

Spark Streaming's architecture is based on Spark
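To make the basic architecture concrete, here is a minimal, self-contained Spark Streaming word count; the host, port, and 2-second batch interval are illustrative choices, not values from the original article:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    // Micro-batches every 2 seconds: each batch becomes an RDD in a DStream.
    val ssc = new StreamingContext(conf, Seconds(2))

    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```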
Learn Spark 2.0 (new features, real projects, pure Scala development, CDH 5.7)
Network disk download: https://pan.baidu.com/s/1c2f9zo0 password: pzx9
Spark has entered the 2.0 era, introducing many excellent features, better performance, and more user-friendly APIs. The "unified programming" is particularly impressive, implementing offline computation
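A small sketch of what that unification looks like in Spark 2.0: the same SparkSession and DataFrame operations drive both a batch read and a streaming read. The paths are hypothetical, and a streaming file source needs an explicit schema, here borrowed from the batch DataFrame:

```scala
// spark-shell (2.x) style: `spark` is the SparkSession provided by the shell.
// Batch: a static DataFrame from JSON files (hypothetical path).
val batch = spark.read.json("/data/events")
batch.groupBy("type").count().show()

// Streaming: the very same operations on an unbounded DataFrame.
val stream = spark.readStream.schema(batch.schema).json("/data/events-incoming")
val query = stream.groupBy("type").count()
  .writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```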