It is not always clear what level of mathematics is needed to get started in machine learning, especially for those who did not study mathematics or statistics in school.

In this article, my goal is to present the mathematical background needed to build machine learning products or conduct academic research in machine learning. These recommendations stem from conversations with machine learning engineers, researchers, and educators, as well as my own experience in machine learning research and industry roles.

First, I will propose mindsets and strategies for approaching mathematics education outside the traditional classroom. Then, I will outline the specific background required for different kinds of machine learning work, ranging from high school statistics and calculus to probabilistic graphical models (PGMs).

A note on math anxiety

It turns out that many people, including engineers, are afraid of mathematics. First of all, I want to address the myth of "being good at mathematics."

The truth is that people who are good at mathematics have spent a lot of time practicing it. They are not innately good at it; they simply appear at ease when working through it. To be clear, reaching that level of comfort takes time and effort, but it is certainly not something you are born with. The rest of this article will help you determine the level of mathematical foundation you need and outline strategies for building it.

Getting Started

As a prerequisite, I assume that you have basic knowledge of linear algebra/matrix operations and probability calculations. I also hope that you have some basic programming skills, which can serve as a tool for learning math in context. After that, you can adjust your focus according to the type of work you are interested in.

How do you learn mathematics outside of school? This question troubles many of us. I believe the best time to concentrate on learning math is as a student. Outside that environment, you may not have the atmosphere, peers, and resources available in an academic classroom.

To study mathematics outside of school, I suggest forming a study group and sharing resources with each other promptly. Mutual motivation plays an important role here: this “extra” study should be encouraged and supported so that learning stays motivating.

Mathematics and code

Mathematics and code are highly intertwined in machine learning workflows. Code is usually built from mathematical models, and it even shares mathematical notation. In fact, modern data science frameworks (such as NumPy) make it intuitive and efficient to translate mathematical operations (such as matrix/vector products) into readable code.
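As a small illustration of that correspondence, the affine map y = Wx + b can be written almost symbol for symbol in NumPy. The values of W, x, and b below are made up for the example and do not come from any particular model.

```python
import numpy as np

# Hypothetical example values: a 2x3 weight matrix, a 3-vector input, a 2-vector bias.
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])
b = np.array([0.5, -0.5])

# The formula y = W x + b maps directly onto code:
# `@` is the matrix/vector product, `+` is elementwise addition.
y = W @ x + b
```

Reading the last line aloud gives back the formula, which is the sense in which the code shares the notation.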

I encourage you to write code as a way to consolidate your learning. Both mathematics and code depend on precise reasoning, and writing code is often a process of working out what a formula actually means. For example, manually implementing a loss function or an optimization algorithm can be a good way to truly understand the underlying concepts.

An example of learning mathematics through code: implementing backpropagation for the ReLU activation in a neural network. As a brief primer, backpropagation is a technique that relies on the chain rule from calculus to compute gradients efficiently.
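To make this concrete, here is a minimal sketch of a hand-written loss function and optimizer: mean squared error for a linear model, minimized with plain gradient descent. The function names, learning rate, step count, and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def mse_loss(w, X, y):
    """Mean squared error of the linear model X @ w against targets y."""
    residual = X @ w - y
    return np.mean(residual ** 2)

def mse_grad(w, X, y):
    """Gradient of the loss: d/dw mean((Xw - y)^2) = (2/n) X^T (Xw - y)."""
    n = X.shape[0]
    return (2.0 / n) * X.T @ (X @ w - y)

def gradient_descent(X, y, lr=0.1, steps=500):
    """Repeatedly step against the gradient, starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - lr * mse_grad(w, X, y)
    return w

# Synthetic, noise-free data generated from known (made-up) weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

w_hat = gradient_descent(X, y)
final_loss = mse_loss(w_hat, X, y)
```

Deriving `mse_grad` by hand before writing it is the part of the exercise that builds understanding; the loop itself is only a few lines.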

First, we visualize the ReLU activation, defined as follows:

ReLU(x) = max(x, 0)

To calculate the gradient (intuitively, the slope), you can picture a piecewise function, written with the indicator function as follows:

ReLU'(x) = 1[x > 0], that is, 1 where x > 0 and 0 where x < 0

NumPy provides us with a useful and intuitive syntax. Our activation function (blue curve) can be expressed in code, where x is our input and relu is our output:

relu = np.maximum(x, 0)

Next is the gradient (red curve), where grad holds the upstream gradient:

grad[x < 0] = 0

Without first deriving the gradient yourself, this line of code may not be self-explanatory. It sets every value in the upstream gradient (grad) to 0 wherever the corresponding element satisfies the condition [x < 0]. Mathematically, this is equivalent to the piecewise representation of the ReLU gradient, which, when multiplied by the upstream gradient, zeroes out every entry whose input is less than 0!

As we have seen here, a basic understanding of calculus lets us reason clearly about the code. A complete example of this neural network implementation can be found here.
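The two lines above can be assembled into a small runnable sketch. The function names relu_forward and relu_backward and the sample arrays are hypothetical, chosen only for illustration; the last line checks that the masking form agrees with multiplying the upstream gradient by the indicator.

```python
import numpy as np

def relu_forward(x):
    """ReLU activation: elementwise max(x, 0)."""
    return np.maximum(x, 0)

def relu_backward(upstream_grad, x):
    """Chain rule for ReLU: pass the upstream gradient through where x >= 0,
    and zero it where x < 0 (x == 0 keeps the gradient under this convention)."""
    grad = upstream_grad.copy()  # copy so the caller's array is not modified
    grad[x < 0] = 0
    return grad

# Made-up input and upstream gradient for illustration.
x = np.array([-2.0, -0.5, 1.0, 3.0])
upstream = np.array([0.1, 0.2, 0.3, 0.4])

out = relu_forward(x)
grad = relu_backward(upstream, x)

# Equivalent formulation: multiply the upstream gradient by the indicator 1[x > 0].
# (The two forms agree here because no input is exactly 0.)
grad_via_indicator = upstream * (x > 0)
```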

Math for building machine learning products

To write this part, I talked to a machine learning engineer to determine where mathematics is most helpful when debugging a system. Below are questions about mathematics in machine learning that the engineer found himself answering. I hope some of them prove useful to you.

Q: What clustering method should I use to visualize high-dimensional customer data?

Method: PCA and t-SNE

Q: How should I calibrate the threshold for “blocking” fraudulent user transactions?

Method: Probability Calibration
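To make the first answer more concrete, here is a minimal sketch of PCA implemented with NumPy's singular value decomposition, projecting high-dimensional data onto two components for plotting. The function name pca_project and the random stand-in for customer data are assumptions for illustration; in practice a library implementation such as the one in scikit-learn would normally be used.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Centers the data, takes the SVD, and uses the leading right singular
    vectors (rows of Vt) as the principal directions.
    """
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Random data standing in for 200 customers with 10 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

Z = pca_project(X)  # shape (200, 2), ready for a 2-D scatter plot
```

The linear algebra here is the point of the example: the projected columns are orthogonal and ordered by the variance they capture, which is what makes the first two a reasonable choice for a plot.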

In general, statistics and linear algebra can be applied in some way to each of these problems. However, arriving at a satisfactory answer usually requires a domain-specific approach. If so, how do you narrow down the kinds of math you need to learn?

Define your system

There are many resources available (for example, scikit-learn for data analysis and Keras for deep learning) that let you jump straight into writing code to model your system. If you plan to do so, first try to answer the following questions about the pipeline you need to build:

1. What are the inputs and outputs of your system?

2. How should you prepare the data to suit your system?

3. How do you construct features or curate data to help your model generalize?

4. How do you define a reasonable goal for your problem?

You may be surprised that defining a system raises so many questions! Beyond that, the engineering required to build the pipeline is also substantial. In other words, building machine learning products requires a great deal of heavy lifting that does not call for a deep mathematical background.