CCA Spark and Hadoop Developer Certification Skill Points "2016 for Hadoop Peak"


Required Skills

Data Ingest

The skills to transfer data between external systems and your cluster. This includes the following:

  • Import data from a MySQL database into HDFS using Sqoop
  • Export data to a MySQL database from HDFS using Sqoop
  • Change the delimiter and file format of data during import using Sqoop
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
  • Load data into and out of HDFS using the Hadoop File System (FS) commands
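
As a rough illustration of the Sqoop and Hadoop FS items above, the sketch below assembles the corresponding command lines as Python argument lists, which could be passed to `subprocess.run` on a machine where Sqoop and Hadoop are installed. The JDBC URL, user name, password file, table names, and paths are all hypothetical placeholders, not values from the certification material.

```python
import shlex

# Hypothetical connection details; replace with your own cluster's values.
JDBC_URL = "jdbc:mysql://db.example.com/retail_db"
DB_USER = "retail_user"
PASSWORD_FILE = "/user/cert/.mysql-password"


def _connection_args():
    """Arguments shared by Sqoop import and export."""
    return ["--connect", JDBC_URL,
            "--username", DB_USER,
            "--password-file", PASSWORD_FILE]


def sqoop_import(table, target_dir, delimiter=None, as_avro=False):
    """Build a `sqoop import` command that copies a MySQL table into HDFS."""
    cmd = ["sqoop", "import"] + _connection_args()
    cmd += ["--table", table, "--target-dir", target_dir]
    if as_avro:
        # Write Avro data files instead of the default delimited text.
        cmd.append("--as-avrodatafile")
    elif delimiter is not None:
        # Change the field delimiter of the text files written to HDFS.
        cmd += ["--fields-terminated-by", delimiter]
    return cmd


def sqoop_export(table, export_dir, delimiter=","):
    """Build a `sqoop export` command that loads HDFS files into MySQL."""
    cmd = ["sqoop", "export"] + _connection_args()
    cmd += ["--table", table, "--export-dir", export_dir,
            "--input-fields-terminated-by", delimiter]
    return cmd


def hdfs_put(local_path, hdfs_dir):
    """Copy a local file into HDFS with the Hadoop FS shell."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def hdfs_get(hdfs_path, local_dir):
    """Copy a file out of HDFS to the local file system."""
    return ["hadoop", "fs", "-get", hdfs_path, local_dir]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    print(as_shell(sqoop_import("orders", "/user/cert/orders", delimiter=r"\t")))
    print(as_shell(sqoop_import("orders", "/user/cert/orders_avro", as_avro=True)))
    print(as_shell(sqoop_export("order_totals", "/user/cert/order_totals")))
    print(as_shell(hdfs_put("orders.csv", "/user/cert/staging")))
    print(as_shell(hdfs_get("/user/cert/order_totals/part-m-00000", ".")))
```

Building the commands as lists keeps quoting problems (for example a tab delimiter) out of the shell; the Flume item is configured through an agent properties file rather than a command line and is not covered by this sketch.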

Transform, Stage, Store

Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format, and write them into HDFS. This includes writing Spark applications in both Scala and Python:

    • Load data from HDFS and store results back to HDFS using Spark
    • Join disparate datasets together using Spark
    • Calculate aggregate statistics (e.g., average or sum) using Spark
    • Filter data into a smaller dataset using Spark
    • Write a query that produces ranked or sorted data using Spark
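
The Spark items above can be sketched with the PySpark DataFrame API roughly as follows. This is a minimal sketch under stated assumptions: the `orders` and `customers` datasets, their column names (`customer_id`, `name`, `amount`), and all paths are hypothetical, and the functions take an existing `SparkSession` or DataFrame as arguments instead of creating one.

```python
def load_csv(spark, path):
    """Load a CSV dataset (with a header row) from HDFS into a DataFrame."""
    return spark.read.csv(path, header=True, inferSchema=True)


def top_customers(orders, customers, min_amount=100, limit=10):
    """Filter, join, aggregate, and sort two DataFrames.

    Assumes `orders` has columns customer_id and amount, and `customers`
    has columns customer_id and name (hypothetical schemas).
    """
    # Filter: keep only the larger orders, giving a smaller dataset.
    large = orders.filter("amount >= {}".format(min_amount))
    # Join: attach customer attributes using the shared key.
    joined = large.join(customers, on="customer_id", how="inner")
    # Aggregate: total order amount per customer. The dict form of agg()
    # names its output column "sum(amount)", so rename it.
    totals = (joined.groupBy("customer_id", "name")
                    .agg({"amount": "sum"})
                    .withColumnRenamed("sum(amount)", "total"))
    # Sort: largest totals first, then keep the top rows.
    return totals.orderBy("total", ascending=False).limit(limit)


def save_parquet(df, path):
    """Write a result DataFrame back to HDFS, replacing earlier output."""
    df.write.mode("overwrite").parquet(path)


# Usage on a cluster with PySpark available (paths are hypothetical):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("cca-sketch").getOrCreate()
#   orders = load_csv(spark, "hdfs:///user/cert/orders")
#   customers = load_csv(spark, "hdfs:///user/cert/customers")
#   save_parquet(top_customers(orders, customers), "hdfs:///user/cert/top_customers")
```

Swapping `{"amount": "sum"}` for `{"amount": "avg"}` (and renaming `avg(amount)`) gives the average instead of the sum.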

Data Analysis

Use DDL (Data Definition Language) to create tables in the Hive metastore for use by Hive and Impala.

  • Read and/or create a table in the Hive metastore in a given schema
  • Extract an Avro schema from a set of data files using avro-tools
  • Create a table in the Hive metastore using the Avro file format and an external schema file
  • Improve query performance by creating partitioned tables in the Hive metastore
  • Evolve an Avro schema by changing JSON files
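
To make the Avro items above more concrete, here is a small sketch in Python. It assumes a schema file has already been extracted from a data file (for example with `avro-tools getschema part-m-00000.avro > orders.avsc`), evolves it by adding a nullable field with a default, and renders a Hive DDL statement for an external, partitioned Avro table that points at the schema file. The `Order` schema, field names, table name, and paths are hypothetical.

```python
import json

# Hypothetical schema, as it might look after extraction with avro-tools.
ORDERS_V1 = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "int"},
        {"name": "amount", "type": "double"},
    ],
}


def add_optional_field(schema, name, avro_type):
    """Return a copy of an Avro record schema with a new nullable field."""
    evolved = json.loads(json.dumps(schema))  # deep copy via JSON round trip
    if any(field["name"] == name for field in evolved["fields"]):
        raise ValueError("field already exists: " + name)
    # A union with "null" first plus a null default lets a reader that uses
    # the new schema still decode files written with the old one.
    evolved["fields"].append(
        {"name": name, "type": ["null", avro_type], "default": None}
    )
    return evolved


def avro_table_ddl(table, location, schema_url, partition_col=None):
    """Render Hive DDL for an external Avro table with an external schema."""
    lines = ["CREATE EXTERNAL TABLE {}".format(table)]
    if partition_col:
        # Partitioning lets queries that filter on this column skip data.
        lines.append("PARTITIONED BY ({} STRING)".format(partition_col))
    lines += [
        "STORED AS AVRO",
        "LOCATION '{}'".format(location),
        "TBLPROPERTIES ('avro.schema.url'='{}');".format(schema_url),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    orders_v2 = add_optional_field(ORDERS_V1, "coupon_code", "string")
    print(json.dumps(orders_v2, indent=2))  # contents for the new .avsc file
    print(avro_table_ddl("orders", "/user/cert/orders",
                         "hdfs:///user/cert/schemas/orders.avsc",
                         partition_col="order_month"))
```

The evolved JSON would replace the schema file that `avro.schema.url` points to, so the table picks up the new field without being recreated.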


Charles, 2016-1-1, Phnom Penh


Copyright notice: This is an original article by Charles Dong. I support open source and free, beneficial sharing, and oppose use for commercial profit. CSDN blog: http://blog.csdn.net/mrcharles Personal site: http://blog.xingbod.cn Email: [email protected]
