After four years of college, the postgraduate entrance exam, and the first year of my master's program, I find new confusions emerging. My first year of graduate study was very rewarding. Beyond my coursework, my mentor suggested several directions to explore: (1) databases and data warehouses, with Microsoft SQL Server 2008 as the main line; (2) distributed storage and computing, with the Hadoop ecosystem as the main line; (3) data mining, with WEKA as the main line; (4) recommender systems, with Taste as the main line; (5) social networks, with UCINET as the main line; (6) machine learning, with scikit-learn and Mahout as the main line; (7) search engines, with Lucene, Nutch, and Solr as the main line; (8) crowdsourcing theory, focusing on AMT (Amazon Mechanical Turk); and (9) topic models, with LSA, pLSA, and LDA as the main line. During that year I also read many papers on recommender systems and social networks, especially on topic models, temporal models, and trust models.
In short, in my first year I took classes, read papers, and explored the directions I liked. Along the way I learned how to search for material and how to solve a problem, which I consider my biggest achievement. For example, to understand topic models, I had to touch on data mining, machine learning, search engines, natural language processing, pattern recognition, artificial intelligence, graphics and imaging, machine vision, and parallel computing, as well as mathematical analysis, advanced algebra, probability theory, statistics, information theory, stochastic processes, discrete mathematics, concrete mathematics, and time series analysis.
A year has passed. After the summer vacation I will enter the second year of my master's program. This means that (1) I can no longer freely explore the directions I like; (2) I have to write my thesis, because I cannot graduate without one; (3) if I want to find a job, I need to study in depth; and (4) I will soon start an internship, so I need to prepare both my thinking and my technical skills for entering the workforce.
Now the biggest problem I face is to find a direction I like and then study it in depth. This is a matter of both direction and depth. After thinking about it for a long time in light of my own situation, I decided to focus on distributed machine learning. The specific plan is as follows:
I. Preparations
1. Focus on Mahout.
Note:
Machine learning is a very complex subject. It certainly cannot be mastered with just a few tools, because machine learning is built on mathematics, and without a solid mathematical background one cannot do it well. However, given my actual situation, I can only learn distributed machine learning while building that foundation at the same time.
2. Learn the Hadoop ecosystem and big data processing tools such as Spark.
3. Become proficient in C, Java, and Python programming.
4. Learn data structures and algorithms.
5. Become familiar with MySQL, especially stored procedures.
Note:
All learning should be grounded in code, and I must become skilled in everything listed above.
II. Professional Theory
1. Data mining and machine learning
Note: scikit-learn is the main line.
2. Statistics
Note: SPSS is the main line.
3. Parallel Computing
Note: MPICH2 and MapReduce are the main line.
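Since MapReduce is named above as a main line for parallel computing, here is a minimal single-process sketch in Python of its three stages (map, shuffle, reduce) applied to the classic word-count example. It only illustrates the programming model and assumes all data fits in memory on one machine; it does not use the actual Hadoop or MPICH2 APIs, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical in-memory driver)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce", "Spark and Hadoop"]
    print(word_count(docs))
    # → {'hadoop': 2, 'runs': 1, 'mapreduce': 1, 'spark': 1, 'and': 1}
```

On a real Hadoop cluster the framework itself performs the shuffle and distributes map and reduce tasks across machines; the user typically writes only the map and reduce logic.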
III. Basic Theory
1. Discrete Mathematics
2. Probability Theory
3. Stochastic Processes
4. Time Series Analysis
Note: MATLAB and R are used as the main line.
Although I am focusing on distributed machine learning, I am not sure what kind of work I will do in the future. Whatever job I end up in, I should be proficient in at least one programming language (Java or Python), one database (MySQL), one platform (Hadoop, etc.), and several common tools (SPSS, MATLAB, and R), and I should have a solid grasp of data structures and algorithms (in C). Given my actual situation, I also need to master parallel computing (MPICH2 and MapReduce).
In short, my learning strategy for the second year of my master's program is depth-oriented: it emphasizes learning each subject deeply and reaching a proficient level of mastery.