Big Data: current major trends (my own understanding): file systems, deployment, various streaming and open-source tools -> ETL development (BI projects) -> statistical data analysis -> data mining and machine learning.
Image: from the analysis.
I. Kafka: Kafka related
Kafka is a distributed messaging system developed by LinkedIn. It is written in Scala and is widely used for its horizontal scalability and high throughput. A growing number of open-source distributed processing systems, such as Storm, Spark, and Flink, support integration with Kafka. Our real-time data processing platform also uses Kafka. Many different kinds of companies now use it as a data pipeline and messaging system.
II. Spark: Spark and Spark Streaming core principles and practices
III. Sqoop: Sqoop study notes. Sqoop translates an import or export command into a MapReduce program; in the generated MapReduce job, it is mainly the InputFormat and OutputFormat that are customized.
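As a rough illustration of what a customized InputFormat does for an import, the sketch below divides the value range of a numeric split column into one WHERE clause per map task, so that each mapper reads its own slice of the table. This is a simplified model written in Python, not Sqoop's actual Java implementation; the function name `split_ranges`, the column name `id`, and the exact boundary handling are assumptions made for the example.

```python
from typing import List


def split_ranges(column: str, lo: int, hi: int, num_mappers: int) -> List[str]:
    """Divide the inclusive range [lo, hi] of a numeric split column into
    at most num_mappers contiguous WHERE clauses, one per map task.

    Hypothetical helper: it only models the idea of range-based splits.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if hi < lo:
        return []  # empty table: nothing to split

    total = hi - lo + 1
    n = min(num_mappers, total)      # never create more splits than values
    base, extra = divmod(total, n)   # the first `extra` splits get one more value

    clauses = []
    start = lo
    for i in range(n):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        clauses.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return clauses


if __name__ == "__main__":
    # In this model, each clause would be appended to one mapper's SELECT.
    for clause in split_ranges("id", 1, 10, 4):
        print(clause)
```

In this model the minimum and maximum of the split column would come from a boundary query against the source table, and the number of splits corresponds to the number of map tasks requested for the import.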
IV. Flume (blog)
V. The difference between Hive and HBase
Big Data Warehouse Collection