1. How do you think big data processing technologies are categorized?
Me: Roughly three classes: Hadoop is representative of batch processing; Impala and HBase are representative of interactive processing over historical data; and Storm, Spark, and Flink are representative of stream processing.
2. Which Linux system commands are you familiar with?
Me: cat, tree, etc.
3. Tell me what data development work looks like in your eyes.
Me: I have only just finished watching the ETL and Storm video series, so based on those two series I can offer a shallow view of what data development involves: ① ETL: using the ELK stack or other ETL tooling to extract, transform, and clean data and load it into the warehouse. ② Taking Storm as an example, implementing the bolt logic and wiring/scheduling the topology is also a kind of data development (see the sketch after this answer).
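To make ② concrete, here is a minimal sketch of what "bolt logic plus topology wiring" looks like in Apache Storm's Java API (assuming Storm 2.x on the classpath). The spout, the bolt, and the "clean" step are my own hypothetical illustration of an ETL-style cleaning stage, not the actual project's code.

```java
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class EtlTopologyExample {

    // Spout: emits raw "log lines" (a fixed string here, purely for illustration).
    public static class LineSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map<String, Object> conf, TopologyContext context,
                         SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values("  Some RAW log line  "));
            Utils.sleep(100); // throttle the toy spout
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

    // Bolt logic: a toy "clean" step that trims and lower-cases each incoming line.
    public static class CleanBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String raw = input.getStringByField("line");
            collector.emit(new Values(raw.trim().toLowerCase()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("cleaned"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Topology wiring: spout -> clean bolt, with two parallel executors for the bolt.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("line-spout", new LineSpout());
        builder.setBolt("clean-bolt", new CleanBolt(), 2).shuffleGrouping("line-spout");

        // Run briefly on an in-process cluster (good enough for a local test).
        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("etl-demo", new Config(), builder.createTopology());
            Thread.sleep(10_000);
        }
    }
}
```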
Supplement:
① Below is the Baidu Encyclopedia annotation for data development. Personally, I feel it leans toward the traditional, pre-big-data sense of data development, but of course that tradition is also a necessary foundation for us:
Data Development (Baidu Encyclopedia)
② [Big Data Engineer Skill Map]
4. Tell me about the roles in the Hadoop framework.
Me: Let me answer in terms of Hadoop 2.0+: NameNode, DataNode, JournalNode; in the YARN framework: ResourceManager, NodeManager; plus the DFSZKFailoverController.
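To make those roles a little more concrete, here is a minimal client-side sketch (my own illustration, not part of the interview): when writing a file, the client asks the NameNode for metadata via fs.defaultFS, and the block data itself lands on DataNodes; in an HA setup the JournalNodes and the DFSZKFailoverController keep the active/standby NameNodes consistent behind the nameservice URI. The nameservice name and path below are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // In an HA cluster this URI names a nameservice rather than a single host;
        // the active NameNode behind it answers the metadata requests.
        conf.set("fs.defaultFS", "hdfs://mycluster"); // hypothetical nameservice

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/demo.txt"))) {
            out.writeUTF("hello hdfs"); // the file's block data is stored on DataNodes
        }
    }
}
```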
5. Suppose a scenario: we have a customer, and the agreed data model is based on four fields, but later, for other reasons, new fields may need to be added. How would you handle that?
Me: HBase, a NoSQL column-oriented database; when a new field is added there is none of the hassle of a relational schema change.
Interviewer: Thinking of NoSQL is very good; the solution we actually use is Elasticsearch + Hive (data warehouse).
A flexible data model
With NoSQL you can store data in whatever format you need at any time, without defining fields in advance for the data you want to store. In a relational database, adding and deleting fields is very troublesome; with a very large amount of data, adding a field can be a nightmare. This is especially evident in the web 2.0 era of large data volumes.
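As a concrete illustration of that flexibility (my own sketch, tied to the HBase answer above rather than the interviewer's Elasticsearch + Hive setup): in HBase a new "field" is just a new column qualifier written with a Put, so no ALTER TABLE or schema migration is needed. The table name, column family, row key, and field names below are hypothetical; note that the column family itself must already exist, only the qualifiers are free-form.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseNewFieldExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("customer"))) { // hypothetical table

            Put put = new Put(Bytes.toBytes("cust-0001"));
            // One of the originally agreed fields, stored in the "info" column family...
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            // ...and a field added later is just another qualifier; no schema change required.
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("loyalty_tier"), Bytes.toBytes("gold"));
            table.put(put);
        }
    }
}
```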