Data mining and machine learning interview questions

Source: Internet
Author: User
Tags: machine learning, data analysis, convolution kernel, data standardization, neural network

In the past few months, I have interviewed with many companies for data science and machine learning internships. To introduce my background: my graduate work was in machine learning and computer vision, and I spent most of my time on academic research, but I also had eight months of startup experience early on (unrelated to ML). My interviews covered data science, traditional machine learning, natural language processing, and computer vision, at large companies such as Amazon, Tesla, Samsung, Uber, and Huawei, as well as at many startups.

Today I will share all the questions I encountered during these interviews and how to answer them. Some of them are fairly standard and have a solid theoretical background, while others are quite creative. For the common interview questions I will simply list them, since answers are easy to find online; I will mainly explain the less common ones in depth. I hope that after reading this article you will perform better in machine learning interviews and land the job you have always dreamed of.

Let's start:

How do you trade off bias and variance?

What is gradient descent?

Explain overfitting and underfitting. How do you solve these two problems?

How do you deal with the curse of dimensionality?

What is a regularization term? Why use regularization, and what are some common regularization methods?

Explain the principle of PCA.

Why is the ReLU activation function used more often in neural networks than the sigmoid activation function?

What is data standardization, and why do we need it?

I think this question deserves attention. Data standardization is a preprocessing step that rescales the data to a specific range to ensure better convergence during backpropagation. In general, each value has the feature mean subtracted and is then divided by the standard deviation. Without standardization, features with large values will have a much greater impact on the loss function (even if such a feature changes by only 1%, its effect on the loss is still large, which makes features with smaller values appear less important). Standardization therefore makes the importance of each feature more balanced.
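A minimal sketch of this zero-mean, unit-variance standardization, assuming NumPy is available (the array here is purely illustrative):

```python
import numpy as np

# Toy feature matrix: column 0 is on a much larger scale than column 1.
X = np.array([[1000.0, 0.5],
              [2000.0, 0.3],
              [1500.0, 0.9]])

# Standardize each feature: subtract the column mean, divide by the column std.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std

print(X_std.mean(axis=0))  # approximately 0 for every feature
print(X_std.std(axis=0))   # approximately 1 for every feature
```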

Explain what dimensionality reduction is, where it is used, and what its benefits are.

Dimensionality reduction removes redundant features and reduces the dimensionality of the data by keeping only the more important ones. How important a feature is depends on how much of the dataset's variation it can express, and also on which dimensionality reduction method is used. Choosing a method comes down to repeated experiments comparing how each method performs on the dataset; in general, a linear method is tried first and then a nonlinear one to determine which is more appropriate (a small PCA sketch follows the list below). The benefits of dimensionality reduction are:

(1) Saving storage space;

(2) Speeding up computation (for example in machine learning algorithms): fewer dimensions mean less computation, and they also make it possible to use algorithms that are unsuitable for high-dimensional data;

(3) Removing redundant features, for example not keeping two features that both represent the size of a piece of terrain, one in square meters and one in square miles;

(4) Reducing the data to 2D or 3D so that it can be visualized, which makes it easy to observe and mine information;

(5) Too many or overly complex features can cause the model to overfit.
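As a small illustration of the linear case mentioned above, here is a hedged sketch using scikit-learn's PCA on its built-in Iris dataset; the dataset choice and the two-component setting are just assumptions for demonstration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)          # 4 original features
X = StandardScaler().fit_transform(X)      # standardize before PCA

pca = PCA(n_components=2)                  # keep the 2 most important directions
X_2d = pca.fit_transform(X)

print(X_2d.shape)                          # (150, 2): now easy to plot in 2D
print(pca.explained_variance_ratio_)       # how much variance each component keeps
```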

How do you deal with missing values in data?

Data may contain missing values, and there are two ways to deal with them: either delete the affected rows or columns, or fill the missing values with other values. The Pandas library provides useful functions for this: isnull() helps us find missing values, and dropna() deletes them. If you want to fill the missing values with other values instead, you can use the fillna() function.
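A minimal sketch of these three Pandas calls on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40],
                   "income": [3000, 4200, np.nan, 5100]})

print(df.isnull().sum())          # count missing values per column

dropped = df.dropna()             # option 1: drop rows containing missing values
filled = df.fillna(df.median())   # option 2: fill missing values, e.g. with the median

print(dropped)
print(filled)
```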

Explain clustering algorithms.

Please refer to this article, which details various clustering algorithms: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68

How would you conduct exploratory data analysis (EDA)?

The purpose of EDA is to mine important information from the data. EDA generally proceeds from coarse to fine. At the beginning we can explore some global information: check for imbalanced data and compute the variance and mean of each class. Look at the first few rows of the data to see which features it contains and other basic information. Use df.info() in Pandas to find out which features are continuous, which are discrete, and what their types are (int, float, string). Next, drop the columns that are not needed, i.e., those that are useless for analysis and prediction.

For example, columns in which all values are identical, or columns with many missing values. Of course, you can also fill those missing values with the median or similar statistics. Then we can do some visualization. A bar chart works well for categorical features or features with few distinct values, for example a bar chart of class labels against the number of samples. Identify the most general features, and visualize the relationship between individual features and the class labels to get some basic information. You can then visualize the relationship between two or three features and explore the connections between them.

You can also use PCA to find out which features are more important. Combine features to explore their relationship, for example: when A=0 and B=0, what is the class, and what about A=1 and B=0? Compare different values of a feature; for a gender feature with the two values male and female, we can check whether the class distribution differs between them (see the sketch below).
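A minimal sketch of the kind of plots and cross-tabulations described above, assuming Pandas and Matplotlib and using a tiny made-up DataFrame with hypothetical "gender" and "label" columns:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataset for illustration: a categorical feature and a class label.
df = pd.DataFrame({"gender": ["M", "F", "F", "M", "F", "M", "M"],
                   "label":  [0,   1,   1,   0,   1,   0,   1]})

# Bar chart of class label vs. number of samples (check for imbalance).
df["label"].value_counts().plot(kind="bar", title="samples per class")
plt.show()

# Relationship between a categorical feature and the class label.
print(pd.crosstab(df["gender"], df["label"]))
```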

Besides basic plots such as bar charts and scatter plots, PDF/CDF plots and overlays can also be used. Look at statistics such as the data distribution, p-values, and so on. After these analyses, you can finally start modeling.

Simple models such as Bayesian models and logistic regression can be used at the outset. If you find that your data is highly nonlinear, you can use polynomial regression, decision trees, or SVMs. Feature selection can be based on the importance of the features found during EDA. If you have a large amount of data, you can also use a neural network. Then look at the ROC curve, recall, and precision (a small sketch follows).
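A minimal sketch of this "start with a simple model, then check precision, recall, and the ROC curve" workflow, assuming scikit-learn and using its built-in breast cancer dataset purely for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A simple baseline model first.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]

print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("ROC AUC:  ", roc_auc_score(y_test, prob))
```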

How do you decide which models to use?

There is actually a fairly standard playbook for this. I wrote an article on how to choose the right regression model; the link is here (https://towardsdatascience.com/selecting-the-best-machine-learning-algorithm-for-your-regression-problem-20c330bad4ef).

Why use a convolutional neural network instead of a fully connected network in image processing?

I encountered this question when interviewing with several computer vision companies. The answer has two parts. First, the convolution operation takes the local structure of the image into account and can extract spatial features more effectively; a fully connected network would take a lot of irrelevant information into account. Second, CNNs are translation invariant: because the weights are shared, the convolution kernel can still recognize an object after the image is shifted, whereas a fully connected network cannot.

What makes CNNs translation invariant?

As explained above, each convolution kernel acts as a feature detector, so, just as when we are looking for something, it does not matter where the object appears in the image. Because the kernel slides over the entire image during convolution, CNNs are translation invariant.

Why do you need max-pooling in a CNN used for classification?

Max-pooling reduces the feature dimension, which cuts computation time without losing too much important information, because we keep the maximum value, which can be understood as the most important information within the window. Max-pooling also contributes to the translation invariance of CNNs. For details, see Andrew Ng's lecture on the benefits of max-pooling (https://www.coursera.org/learn/convolutional-neural-networks/lecture/hELHk/pooling-layers).
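A minimal NumPy sketch of 2×2 max-pooling with stride 2, showing that only the largest value in each window survives (the feature map is made up):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on a 2D feature map (H and W assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feat = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 2, 3, 4]])

print(max_pool_2x2(feat))
# [[6 2]
#  [2 7]]
```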

Why do CNNs applied to image segmentation generally have an encoder-decoder architecture?

The encoder CNN performs feature extraction, while the decoder takes the extracted feature information, decodes it, and upsamples the result back to the original image size to produce the segmentation.
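A purely illustrative encoder-decoder sketch in PyTorch (the layer sizes and the TinySegNet name are assumptions, not a real segmentation architecture): the encoder downsamples while extracting features, and the decoder upsamples back to the input resolution to produce per-pixel class scores.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Minimal encoder-decoder sketch for segmentation (illustrative only)."""
    def __init__(self, num_classes=2):
        super().__init__()
        # Encoder: extract features while reducing spatial resolution.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # H/2 x W/2
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # H/4 x W/4
        )
        # Decoder: upsample back to the input size and predict a class per pixel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),  # H/2 x W/2
            nn.ConvTranspose2d(16, num_classes, kernel_size=2, stride=2),    # H x W
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

out = TinySegNet()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 2, 64, 64]): one score map per class at full resolution
```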

What is batch normalization, and what is the principle behind it?

Batch normalization adds a standardization step to each layer's input during training.

One reason deep neural networks are hard to train is that the input distribution of each layer keeps changing during training as the parameters of the previous layers are updated, so batch normalization standardizes the input of each layer: subtract the mean of the mini-batch and divide by its standard deviation. However, if only the normalized result were passed to the next layer, it could distort what the layer has learned, because the feature distribution learned by the layer is not necessarily a standard normal distribution, and forcing it into one would have an impact. The normalized values are therefore scaled by γ and shifted by β, two parameters learned during training, so that the learned features can be retained.

A neural network is a series of layers where the output of one layer is the input of the next, which means we can regard each layer as the first layer of a small sub-network. We therefore normalize a layer's output before applying the activation function and then feed it to the next layer, which mitigates the problem of constantly shifting inputs.
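A minimal NumPy sketch of the batch normalization computation described above, with γ and β passed in as if they were the learned parameters:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (batch_size, num_features)."""
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)    # standardize to zero mean, unit variance
    return gamma * x_hat + beta                # learned scale and shift keep expressiveness

x = np.random.randn(8, 4) * 10 + 3             # a batch whose inputs are far from N(0, 1)
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.var(axis=0))           # roughly 0 and 1 per feature
```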

Why are convolution kernels generally 3×3 rather than larger?

This question is well explained by the VGGNet model. There are two main reasons. First, compared with a single larger convolution kernel, stacking several smaller kernels achieves the same receptive field and extracts more feature information, while using fewer parameters and less computation (see the quick calculation below). Second, more activation functions are applied between the stacked kernels, adding nonlinearity and making the decision function of the CNN more discriminative.
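A quick back-of-the-envelope comparison of the weight counts (bias ignored), assuming the number of input and output channels C is the same for every layer:

```python
def conv_params(kernel_size, channels):
    """Weights of one conv layer with `channels` input and output channels (no bias)."""
    return kernel_size * kernel_size * channels * channels

C = 64
# Two stacked 3x3 convs have the same 5x5 receptive field as one 5x5 conv,
# and three stacked 3x3 convs match one 7x7 conv -- but with fewer parameters.
print(2 * conv_params(3, C), "vs", conv_params(5, C))   # 73728 vs 102400
print(3 * conv_params(3, C), "vs", conv_params(7, C))   # 110592 vs 200704
```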

Do you have any projects related to machine learning?

For this question, you can talk about the connection between the research you have done and the company's business. Are some of the skills you have learned related to their business or to the position you are applying for? It does not need to be a 100% match, as long as it is related in some way; this helps them believe you can add more value in the position.

Explain your research during your postgraduate studies. What kind of work did you do? What is the future direction?

The answers to these questions follow the same approach as the previous question about machine learning projects.

To sum up:

These are all the interview questions I encountered while interviewing for data science and machine learning positions. I hope you enjoyed this article and learned something new and useful!
