Ctrip Machine Learning Internship Interview

Source: Internet
Author: User

2018/3/10

2018.02.06 Ctrip interview questions:

(1) Node partitioning criteria for decision trees:

1. Information gain: the change in information entropy before and after the split.
2. Information gain ratio: when a feature takes too many distinct values, its information gain comes out large even though the feature does not help classification. The information gain ratio is used to overcome this problem.
3. Gini index: reflects the purity of the sample set. The smaller the Gini index, the higher the purity of the sample set.

Papers point out that the choice among the three criteria does not have much effect on the result.
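To make the three criteria concrete, here is a minimal sketch in plain Python. The toy data (`labels`, `row_id`, `weather`) and the function names are my own assumptions for illustration, not something from the interview. It shows an ID-like feature winning on information gain but losing on gain ratio.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gini(labels):
    """Gini index: 1 - sum(p_k ** 2). A value of 0 means the set is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_by(feature, labels):
    """Group the labels by the value the feature takes on each row."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())

def information_gain(feature, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    after = sum(len(g) / n * entropy(g) for g in split_by(feature, labels))
    return entropy(labels) - after

def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: row_id is unique per row, so it carries no real signal.
labels = ["yes", "yes", "no", "no", "yes", "no"]
row_id = [1, 2, 3, 4, 5, 6]
weather = ["sun", "sun", "rain", "rain", "sun", "sun"]

for name, feature in [("row_id", row_id), ("weather", weather)]:
    print(name, information_gain(feature, labels), gain_ratio(feature, labels))
print("gini of the full set:", gini(labels))
```

On this data `row_id` has the larger information gain, because every group it produces is trivially pure, while `weather` has the larger gain ratio.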

(2) Learning rate problem:

If the learning rate is set too large, the iteration skips over the optimum; if it is set too small, the iterative process takes longer. The improved method is a variable step size: use a larger step at the beginning, then reduce the step size.
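The three cases can be seen on a one-dimensional toy problem. This is only a sketch: the objective f(x) = x**2, the inverse-time decay schedule, and all constants are assumptions chosen for illustration, and other schedules would fit the same description.

```python
def descend(x0, lr_at, steps):
    """Run gradient descent on f(x) = x**2 and return the final x."""
    x = x0
    for t in range(steps):
        grad = 2.0 * x              # derivative of x**2
        x -= lr_at(t) * grad        # lr_at(t) is the step size at iteration t
    return x

def decayed(t, lr0=0.4, decay=0.1):
    """Inverse-time decay: larger steps first, smaller steps later."""
    return lr0 / (1.0 + decay * t)

too_large = descend(5.0, lambda t: 1.1, 50)    # overshoots further on every step
too_small = descend(5.0, lambda t: 0.001, 50)  # barely moves in 50 steps
variable = descend(5.0, decayed, 50)           # ends close to the minimum at 0

print(too_large, too_small, variable)
```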

(3) Understanding of XGBoost and text mining:

I said that I want to learn them. I usually use random forest and have not done anything related to text mining. As for internships, the first was pure SQL and Python: module development, plus writing my own crawler to crawl and clean data. The second is at Cisco, where I usually build machine learning demos. A blog post on the Cisco interview will follow; stay tuned.

(4) Random forest parameter control:

In general, experiments are used to find the numbers of trees that satisfy the performance requirement, and the minimum such value is taken to reduce the computational cost.
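The selection rule itself is simple enough to sketch. Everything below is hypothetical: `smallest_sufficient` is a helper name I made up, and `mock_scores` stands in for validation scores that real experiments would produce (for example by training a forest with each tree count and scoring it on held-out data).

```python
def smallest_sufficient(candidates, evaluate, target):
    """Return the smallest candidate whose score reaches target, else None."""
    for n_trees in sorted(candidates):
        if evaluate(n_trees) >= target:
            return n_trees
    return None

# Hypothetical validation accuracies keyed by number of trees (not real results).
mock_scores = {10: 0.81, 50: 0.88, 100: 0.90, 200: 0.905, 500: 0.906}

best = smallest_sufficient(mock_scores, mock_scores.get, target=0.90)
print(best)
```

With these made-up numbers the rule picks 100 trees: 200 and 500 also meet the target, but they cost more to train and predict with.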

(5) Difference between SQL left and right joins:

A left join takes the left table as the main table: every row of the left table is kept, and the rows of the right table that match it are mapped onto it. Left-table rows without a match get NULL in the right table's columns. A right join is the reverse.
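A small runnable illustration using Python's built-in `sqlite3` module. The `users` and `orders` tables and their contents are invented for this example. Because older SQLite versions do not support `RIGHT JOIN`, the right join is written here as a `LEFT JOIN` with the two tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, item TEXT);
    INSERT INTO users  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 'ticket'), (3, 'hotel');
""")

# Left join: every user is kept; Bob has no order, so item is NULL (None).
left_rows = conn.execute("""
    SELECT u.name, o.item
    FROM users u LEFT JOIN orders o ON u.id = o.user_id
    ORDER BY u.id
""").fetchall()

# Equivalent of "users RIGHT JOIN orders": every order is kept;
# the order for user 3 has no matching user, so name is NULL (None).
right_rows = conn.execute("""
    SELECT u.name, o.item
    FROM orders o LEFT JOIN users u ON u.id = o.user_id
    ORDER BY o.user_id
""").fetchall()

print(left_rows)   # [('Ann', 'ticket'), ('Bob', None)]
print(right_rows)  # [('Ann', 'ticket'), (None, 'hotel')]
```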

(6) Whether I have a machine learning project:

I did not answer this well; I should have prepared fully. I should have walked through everything I did in the Kaggle competition on my resume in concrete detail, and described my internship projects at other companies more carefully, along with my own papers and the demos from my company internship. Instead I passed over them in a few words, which left the second technical interviewer with the impression that I lack practical experience. Something to reflect on.

(7) Methods for handling imbalanced sample data:

My second paper is on this topic, so I answered this one better. When a model is trained on an imbalanced dataset, precision and accuracy on the minority class come out low, yet the minority class is often the important one, as in SMS fraud. At the data level, SMOTE, oversampling, and undersampling are generally used to reduce the skew of the dataset. At the algorithm level, cascading, cost-sensitive modification, and ensemble learning (AdaBoost) can be used.
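The data-level methods can be sketched in a few lines of plain Python. This is a simplified illustration under my own assumptions: the function names and toy points are hypothetical, and `smote_like` interpolates toward the single nearest minority neighbour only, whereas SMOTE proper picks among the k nearest neighbours. It needs at least two minority points.

```python
import random

def random_oversample(majority, minority, rng):
    """Duplicate minority samples (with replacement) until the classes are equal."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra

def random_undersample(majority, minority, rng):
    """Keep a random subset of the majority, as large as the minority."""
    return rng.sample(majority, len(minority)), minority

def smote_like(minority, n_new, rng):
    """Create synthetic minority points on the segment between a random
    minority point and its nearest minority neighbour (simplified SMOTE)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        a = minority[i]
        others = minority[:i] + minority[i + 1:]
        b = min(others, key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        t = rng.random()  # position along the segment from a to b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical toy data: 8 majority points, 2 minority points.
rng = random.Random(0)
majority = [(float(i), 0.0) for i in range(8)]
minority = [(0.5, 5.0), (1.5, 5.5)]

over_major, over_minor = random_oversample(majority, minority, rng)
under_major, under_minor = random_undersample(majority, minority, rng)
new_points = smote_like(minority, len(majority) - len(minority), rng)
print(len(over_minor), len(under_major), len(new_points))
```

Oversampling and undersampling only repeat or drop existing rows, while the SMOTE-style step creates new points between existing minority samples.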

Summary:

The interview questions were not difficult, but I feel I was not prepared enough and still lack interview experience, so I did not perform at my best. Lesson learned; I will work harder next time. The result is still unknown, but I think every interview is a learning process. I am sharing this in the hope that it helps you.

Finally, the interviewer asks whether you have any questions. I only asked what the company mainly does, and did not know what I should have asked. Afterwards it occurred to me that you can ask the interviewer where you fell short in the interview, to help you improve next time.

Please credit the source when reprinting. Thank you.
