Big Data Engineer (Development) Interview Series (3)

Source: Internet
Author: User
1. Into how many categories do you think big data processing technologies can be divided?

Me: Roughly three classes: batch processing, represented by Hadoop; interactive processing over historical data, represented by Impala and HBase; and stream processing, represented by Storm, Spark, and Flink.

2. Which Linux system commands are you familiar with?

Me: cat, tree, etc.

3. Tell me what kind of job data development is, in your view.

Me: I had just finished watching video series on ETL and Storm, so I gave a shallow view of data development based on the concepts and functions covered in those two series: ① ETL: using ELK or another ETL stack to extract, transform, and clean data and load it into the warehouse. ② Taking Storm as an example, implementing bolt logic and scheduling topologies is also a kind of data development.
Supplement:
① The following is Baidu Encyclopedia's entry on data development. Personally, I feel it leans toward data development in the traditional sense, before the big data era; of course, that tradition is still a necessary foundation for us:
Data development, Baidu Encyclopedia

② [Big Data Engineer Skill Map]
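
The ETL steps named in the answer to question 3 (extract, transform, clean, load into the warehouse) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the sample CSV, the field names, the `orders` table, and the use of an in-memory SQLite database as a stand-in for a real warehouse. It is not how ELK or any particular ETL stack works internally.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the field names are illustrative only.
RAW_CSV = """user_id,city,amount
1, Beijing ,10.5
2,shanghai,
3,Hangzhou,7
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform and clean: drop rows with no amount, normalize the rest."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning step: skip incomplete records
        cleaned.append(
            (int(row["user_id"]), row["city"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write cleaned rows into a table (SQLite stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (user_id INTEGER, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run extract -> transform -> load, then return what was stored."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT user_id, city, amount FROM orders ORDER BY user_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_CSV, sqlite3.connect(":memory:")))
```

Keeping the three stages as separate functions mirrors how the stages are usually discussed, and makes it easy to swap the source or the target without touching the cleaning logic.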

4. Tell me what roles there are in the Hadoop framework.

Me: Let me answer for Hadoop 2.0+. HDFS: NameNode, DataNode, JournalNode, and DFSZKFailoverController. YARN framework: ResourceManager and NodeManager.

5. Suppose a scenario: we have a customer with whom we have agreed on a data model based on four fields, but new fields may need to be added later for other reasons. What would you do?

Me: HBase, a NoSQL column-oriented database. When new fields are added, there is none of the schema-change trouble of a relational database.
Interviewer: Thinking of NoSQL is very good. The solution we adopted is Elasticsearch + Hive (data warehouse).

A flexible data model
With NoSQL you can store data in a custom format at any time, without first defining a field for the data you want to store. In a relational database, adding and deleting fields is very troublesome, and on a table holding a very large amount of data, adding a field is a nightmare. This is particularly evident in the Web 2.0 era of large data volumes.
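
The contrast described above can be sketched in a few lines. This is only an illustration under stated assumptions: the customer fields (`id`, `name`, `phone`, `city`, `email`) are hypothetical, SQLite stands in for a relational database, and plain Python dictionaries stand in for a schema-less store. It does not use the HBase or Elasticsearch APIs.

```python
import sqlite3

# Relational side: the schema is fixed at four fields, so a fifth field
# requires an explicit schema change before it can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, phone TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Alice', '555-0100', 'Hangzhou')")
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")  # schema change
conn.execute(
    "INSERT INTO customer VALUES (2, 'Bob', '555-0101', 'Beijing', 'bob@example.com')"
)
relational_rows = conn.execute("SELECT * FROM customer ORDER BY id").fetchall()

# Schema-less side: each record carries its own fields, so a record with a
# new field is simply written as-is, with no schema change.
documents = [
    {"id": 1, "name": "Alice", "phone": "555-0100", "city": "Hangzhou"},
    {"id": 2, "name": "Bob", "phone": "555-0101", "city": "Beijing",
     "email": "bob@example.com"},
]


def field_values(docs, field):
    """Read one field across all documents; missing fields come back as None."""
    return [doc.get(field) for doc in docs]


if __name__ == "__main__":
    print(relational_rows)
    print(field_values(documents, "email"))
```

On a small table the `ALTER TABLE` is cheap; the pain the text refers to shows up when such schema changes must be coordinated across very large tables and the applications that depend on them.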
