Big Data Engineer (ETL) interview series (1)

Source: Internet
Author: User
1. What do you think is the difference between Spark and Hadoop? Please answer briefly.

Me: Hadoop is suited to offline analysis and batch processing; Spark is suited to real-time analysis, near-real-time streaming, and micro-batch processing.

2. What do you think is the difference between Python and Java in use?

Me: In everyday use I do not draw a sharp line between the two. Both are result-oriented tools, so whether it is Python's indentation or Java's semicolons, either one gets the job done in the end.
Addendum: Looking back, there are differences you do need to watch for in practice. For example, the remove method of a Python list and the remove method of a Java List share a name but do not behave the same way: Python's list.remove(x) deletes the first element equal to x, while Java's List overloads remove(int index) and remove(Object o). It is also worth knowing how to implement in Java the ready-made tools ("wheels") that Python provides.

3. You are given two tables, A and B. Table A has 3 rows and table B has 5 rows. How many rows does A LEFT JOIN B return?

Me: Less than or equal to the number of rows in A, that is, at most 3.
Addendum: Looking back, I had fallen into a habit carried over from work. The interviewer never said that the join field was a primary key, so it need not be unique. If it is not unique, duplicate matches can push the result above 3 rows. Where table B has no matching record, the B columns are filled with NULL and the row from A is still kept, so the result never drops below 3 rows. See for yourself:
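Example 1:
Every record of table A has a match in table B, and the IDs in table B are unique. The result has exactly 3 rows.

Example 2:
Some records of table A have no match in table C, but the IDs in table C are unique. The result still has 3 rows; the unmatched rows carry NULL in the C columns.

Example 3:
Every record of table A has a match in table D, but the IDs in table D are not unique. The result has more than 3 rows.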

So the correct answer is: greater than or equal to the number of rows in table A.
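
The three cases can be checked with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table names a and b, the single id column, the helper left_join_count, and the sample IDs are all assumptions made for illustration and are not taken from the original interview.

```python
import sqlite3


def left_join_count(a_ids, b_ids):
    """Count the rows returned by: a LEFT JOIN b ON a.id = b.id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER)")
    conn.execute("CREATE TABLE b (id INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in a_ids])
    conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in b_ids])
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM a LEFT JOIN b ON a.id = b.id"
    ).fetchone()
    conn.close()
    return count


a = [1, 2, 3]  # table A: 3 rows (hypothetical IDs)

# Example 1: every A row matches, join key unique in the right table.
all_match_unique = left_join_count(a, [1, 2, 3, 4, 5])

# Example 2: A row 3 has no match, join key still unique.
some_missing_unique = left_join_count(a, [1, 2, 7, 8, 9])

# Example 3: every A row matches, but IDs 1 and 3 appear twice.
all_match_duplicates = left_join_count(a, [1, 1, 2, 3, 3])

# Upper bound: every A row matches every right-hand row (3 * 5).
upper_bound = left_join_count([1, 1, 1], [1, 1, 1, 1, 1])

print(all_match_unique, some_missing_unique, all_match_duplicates, upper_bound)
```

With unique keys the count stays at 3 whether or not every row matches; duplicates on the right side raise it, and for these table sizes it cannot exceed 3 * 5 = 15.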
