1. What do you think is the difference between Spark and Hadoop? Please answer briefly.
Me: Hadoop suits offline analysis and batch processing; Spark suits real-time analysis, near-real-time streaming, and micro-batch processing.
2. What do you think is the difference between Python and Java in practice?
Me: Honestly, in day-to-day use I never drew a sharp line between the two. Both are a means to an end, so whether it is Python's indentation or Java's semicolons, either one gets the job done.
Addendum: Looking back at this question now, there are differences in practice that deserve attention. For example, Python's list.remove and Java's List.remove share a name but not behavior: Python's list.remove(x) deletes the first element equal to x, whereas Java's List has two overloads, remove(int index) and remove(Object o), so calling list.remove(1) on a List<Integer> removes the element at index 1, not the value 1. It is also worth thinking about how the ready-made "wheels" (libraries and built-ins) Python provides would be implemented in Java.
3. Given two tables, A with 3 rows and B with 5 rows: how many rows does A LEFT JOIN B return?
Me: At most the number of rows in A, i.e. less than or equal to 3.
Addendum: Looking back, I fell into a common misconception at the time. The interviewer never said the join column was a primary key (unique) in B. If it is not unique, one row of A can match several rows of B, so the result can exceed 3. And where a row of A has no match in B, it is not dropped: B's columns are simply filled with NULL, so the result can never be smaller than 3. If you don't believe it, see the examples:
Example 1:
Every record of table A has a match in table B, and B's id is unique (result: 3 rows, one per row of A):
Example 2:
Some records of table A have no match in table C, but C's id is unique (result: still 3 rows; the unmatched rows get NULL in C's columns):
Example 3:
Every record of table A has a match in table D, but D's ids are not unique (result: more than 3 rows, since a row of A that matches k rows of D appears k times):
So the correct answer is: greater than or equal to the number of rows in table A, i.e. at least 3.
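To reproduce the three examples, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The single-column schema, the ids, and the helper names left_join_rows and expected_row_count are hypothetical, chosen only to mirror Examples 1 to 3 (A has 3 rows; the right-hand table, B, C or D in the examples, has 5). It also checks the counting rule behind the answer: each row of A contributes max(1, k) rows, where k is its number of matches on the right, so the total can never fall below the row count of A.

```python
import sqlite3


def left_join_rows(a_ids, b_ids):
    """Run A LEFT JOIN B ON a.id = b.id on a hypothetical one-column schema
    and return the joined (a.id, b.id) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE a (id INTEGER)")
        conn.execute("CREATE TABLE b (id INTEGER)")
        conn.executemany("INSERT INTO a (id) VALUES (?)", [(i,) for i in a_ids])
        conn.executemany("INSERT INTO b (id) VALUES (?)", [(i,) for i in b_ids])
        # Unmatched rows of A are kept; sqlite3 returns SQL NULL as None.
        return conn.execute(
            "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id "
            "ORDER BY a.id, b.id"
        ).fetchall()
    finally:
        conn.close()


def expected_row_count(a_ids, b_ids):
    """Counting rule: each A row yields max(1, k) rows, k = its matches in B."""
    return sum(max(1, b_ids.count(a)) for a in a_ids)


if __name__ == "__main__":
    A = [1, 2, 3]  # hypothetical ids for table A (3 rows)
    cases = {
        "Example 1 (all matched, unique ids)": [1, 2, 3, 4, 5],
        "Example 2 (some unmatched, unique ids)": [1, 4, 5, 6, 7],
        "Example 3 (all matched, duplicate ids)": [1, 1, 2, 2, 3],
    }
    for name, right in cases.items():
        rows = left_join_rows(A, right)
        print(name, len(rows), rows)
```

With A = [1, 2, 3], the three cases return 3, 3 and 5 rows. In the second case, ids 2 and 3 come back as (2, None) and (3, None): the rows survive with NULLs on the right instead of disappearing, which is why "less than 3" is impossible.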