I have been working for more than two years and have never written a summary. I think it's time to write one and look back at my recent gains and lessons.
I started working on big data development in 2015, when my exposure to the technology was still limited. I began with MapReduce, moved on to HDFS and Hadoop shell commands, and then to the basics of Spark, Hive, HBase, and Sqoop. I also deployed a Hadoop cluster during that period (it was only a test deployment, so the cluster was later taken down). I built a big data project with Sqoop, MapReduce, and Spark, which officially launched at the end of 2015 and is still running normally.
In 2016 I worked on several projects: a recommendation platform, user profiling, and a real-time recommendation system. Along the way I used many technologies that were new to me, including Hadoop, Spark, Hive, Sqoop, HBase, Flume, Kafka, Redis, Memcached, Parquet, and Avro.
On the platform side, I looked into Hue as a platform management tool and Oozie as a task scheduling tool.
2017 Plan
0. Review basic Java syntax, concurrent programming, and so on. Books: "Java Programming Ideas", "Java and Design Patterns", "In-Depth Understanding of the Java Virtual Machine", "Java Concurrent Programming in Practice"
1. Go through the Scala syntax several times: "Learn Scala Fast"
2. Summarize what I know about Spark and write a few summary articles
3. Read the Hadoop and Spark source code
4. Keep studying the application of big data technologies: Hadoop, HBase, Hive, Kafka, Flume, etc.
5. Review "Calculus", "Probability Theory and Mathematical Statistics", and "Linear Algebra"
6. Master the ideas, derivations, and code of several major machine learning algorithms
2016 Summary and 2017 Plan