The Data Lake: Past and Present (Part 2)

Source: Internet
Author: User
Tags: isilon, gemfire

EMC acquired Isilon Systems, a scale-out NAS vendor originally focused on video storage, to shore up its weakness in distributed scale-out NAS technology. In recent years Isilon has been consistently well received by customers in media, big data, and HPC scenarios. Last year, together with another product line, Pivotal Hadoop, EMC launched a data lake solution that addresses customers' processing and storage needs for semi-structured and unstructured data in the Internet era.

Before discussing the data lake, let us first look at the database and the data warehouse together. A database here refers to an online transactional system, generally meaning OLTP transaction processing; the data in a database is already classified when written. A data warehouse generally refers to offline data extracted and classified by ETL tools, used mainly for subsequent analysis, or further partitioned into data marts.
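The division of labor between the two can be sketched with a toy ETL step: transactional rows from a hypothetical OLTP orders table are extracted and aggregated into a warehouse-style fact table. All table names and fields here are illustrative, not taken from any specific product.

```python
from collections import defaultdict

# Hypothetical OLTP rows: one record per transaction.
oltp_orders = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 15.5},
    {"order_id": 3, "customer": "alice", "amount": 12.0},
]

def etl_sales_by_customer(rows):
    """Toy extract-transform-load step: aggregate raw transactions
    into a warehouse-style fact table keyed by customer."""
    facts = defaultdict(float)
    for row in rows:                             # extract
        facts[row["customer"]] += row["amount"]  # transform (aggregate)
    return dict(facts)                           # load (in-memory here)

warehouse_fact = etl_sales_by_customer(oltp_orders)
print(warehouse_fact)  # {'alice': 42.0, 'bob': 15.5}
```

A real pipeline would read from the OLTP database and write to warehouse tables instead of in-memory dicts, but the extract/transform/load shape is the same.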

Data Lake

A data lake is a big data system that holds both structured and unstructured data. Because such data is hard to define, it is difficult to classify effectively before use, but it can be analyzed, computed, and stored in place. The data lake also changes the way users consume data: by integrating the analysis and storage of structured and unstructured data, it spares users from building many separate databases and data warehouses, since the data lake can take over or implement the functions of different data warehouses. In the future, a data lake delivered as a cloud service can meet different data analysis, processing, and storage requirements on demand; the data lake itself can be deployed on virtual machines, in physical environments, or in the cloud.


Isilon's Data Lake Foundation

Scale-out capability is the key building block of a data lake, meeting massive data storage needs, while Isilon deploys storage (HDFS) separately from compute (Hadoop) so that compute can scale out on demand.

Through the OneFS system engine, Isilon provides rich software features such as SmartPools, SmartDedupe, and multi-copy/erasure-coding protection for data tiering, space-efficient utilization, and data reliability, and it integrates seamlessly with the VMware virtualization platform (VAAI, VASA, and SRM) so that data lake data can flow efficiently between virtual and physical environments.

It supports a wide variety of access protocol interfaces, such as CIFS, NFS, NDMP, and Swift, eliminating data silos and enabling different kinds of data to be stored and shared on a single storage system.


Through its HDFS implementation, Isilon docks with different data service platforms and currently supports multiple Hadoop computing platforms, such as Pivotal, Cloudera, Hortonworks, and Apache Hadoop.
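Because the storage speaks the standard HDFS protocol, clients address it the same way they would an Apache name node. A minimal sketch, assuming a WebHDFS-style REST endpoint (the hostname, port, and user below are placeholders, not a documented Isilon configuration):

```python
from urllib.parse import urlencode

def webhdfs_open_url(host, port, path, user):
    """Build a standard WebHDFS OPEN request URL. Host, port, and user
    are illustrative placeholders; any HDFS-compatible endpoint
    follows the same /webhdfs/v1 URL scheme."""
    query = urlencode({"op": "OPEN", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Hypothetical endpoint name for an Isilon access zone.
url = webhdfs_open_url("isilon.example.com", 8082, "/data/events.log", "hdfs")
print(url)
```

The point is that no Isilon-specific client code is needed: the same URL scheme (and the same native HDFS RPC clients) work against either back end.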


Isilon and Pivotal Data Lake Scenarios

Greenplum is a dedicated database company acquired by EMC. Its portfolio mainly includes the shared-nothing MPP database Greenplum; Greenplum Hadoop, which docks Greenplum with HDFS and OneFS; the DCA appliance, which integrates virtual machines to enable multi-tenant data warehousing; Greenplum Chorus; and Greenplum Analytics consulting services. EMC's traditional, standalone big data computing solution consisted of GemFire/SQLFire real-time computing plus Greenplum DB.
The Pivotal product line offers a big data solution that integrates Greenplum (HAWQ) with Hadoop to deliver greater processing power and meet the demands of unstructured big data. By grafting a DBMS onto Hadoop, it gives Hadoop structured data capabilities; the GNet parallel data flow engine improves parallelism and pipelining and coordinates the workflow between related nodes (moving data, collecting results, and so on) when executing queries and other operations.
The Pivotal HD big data solution consists of GemFire XD (evolved from GemFire/SQLFire), HAWQ (from Greenplum DB), the Pivotal HD engine, and Spring XD (distributed data import, batch processing, data export, and stream processing).



Built on an optimized Apache Hadoop, Pivotal HD provides the data processing capability for data lake scenarios. The job tracker dispatches parallel tasks, and the task trackers complete the computation and return results; intermediate values are saved locally, and final results are stored in HDFS. Pivotal HD also offers structured processing and real-time data processing through HAWQ and GemFire XD.
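The job-tracker/task-tracker flow above follows the classic map, shuffle, reduce phases, which can be sketched as a single-process toy (a word count, not the distributed implementation):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Each map task emits (key, 1) pairs; in a real cluster these
    # intermediate values are saved locally on the task node.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # The framework groups intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reduce task aggregates one key's values; final results
    # would be written back to HDFS.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data lake", "data lake"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
result = reduce_phase(shuffle(pairs))
print(result)  # {'big': 1, 'data': 2, 'lake': 2}
```

In the distributed case the job tracker assigns map and reduce tasks to nodes and the shuffle moves data across the network, but the three-phase logic is the same.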


Isilon provides an HDFS storage interface implementation that docks with Pivotal HD. Data is stored and read efficiently through name nodes and data nodes, Isilon OneFS absorbs massive data growth, and its rich software features and multi-copy/erasure-coding protection improve the reliability of the data lake.


Tip:

For more content, search for the WeChat public account "Ict_architect".

This article is from the "ICT Architects Technical Exchange" blog; reprinting is declined.
