Compared with student days, time at work seems to move much faster. Almost every day is filled with technology, constantly running into the bottleneck where requirements outstrip existing skills. At the very beginning there was no bottleneck to speak of, because there was nothing at all, almost a blank slate. So requirements were hard to realize relative to the technology at hand, and the work undoubtedly benefited from Java open source: after appropriate research (into the most popular, classic open-source solutions for the same requirements), "that solution" would be introduced. For example, for real-time search over big data, Solr was introduced; and since HBase had already been settled on, Solr could only serve as a secondary index on top of HBase. In fact, like other SQL-and-index layers over HDFS, this forms the familiar "..." pattern: HBase + Solr.
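To make the secondary-index pattern concrete, here is a minimal toy sketch of it in plain Java. The post does not give any code, so this only mirrors the shape of the idea: a "Solr" store maps an indexed field value to rowkeys, an "HBase" store maps rowkeys to full rows, and a query hits the index first and then fetches the rows. Both stores are simulated with in-memory maps; all names are illustrative, not real SolrJ or HBase API calls.

```java
import java.util.*;

// Toy sketch of the HBase + Solr secondary-index pattern. Solr normally holds
// (indexed field -> rowkey) documents; HBase holds the full rows by rowkey.
// Here both are stand-in HashMaps purely to show the read/write paths.
public class SecondaryIndexSketch {
    // stand-in for HBase: rowkey -> full row
    static Map<String, Map<String, String>> hbase = new HashMap<>();
    // stand-in for the Solr index: indexed field value -> matching rowkeys
    static Map<String, List<String>> solrIndex = new HashMap<>();

    // write path: store the row, and index one chosen field
    static void put(String rowkey, Map<String, String> row, String indexedField) {
        hbase.put(rowkey, row);
        solrIndex.computeIfAbsent(row.get(indexedField), k -> new ArrayList<>()).add(rowkey);
    }

    // read path: "Solr" answers which rowkeys match, "HBase" returns the rows
    static List<Map<String, String>> query(String fieldValue) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String rk : solrIndex.getOrDefault(fieldValue, Collections.emptyList())) {
            result.add(hbase.get(rk));
        }
        return result;
    }

    public static void main(String[] args) {
        put("row1", Map.of("user", "alice", "msg", "hello"), "user");
        put("row2", Map.of("user", "bob", "msg", "hi"), "user");
        put("row3", Map.of("user", "alice", "msg", "again"), "user");
        System.out.println(query("alice").size()); // prints 2
    }
}
```

In the real deployment the index lookup is a Solr query and the fetch is an HBase `Get`; the point of the pattern is that Solr only stores what is needed to locate rows, while HBase remains the system of record.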
Enough rambling; you know what I mean. After half a year in a big-data processing role, my own work feels miscellaneous, scattered, and piecemeal: whatever the data processing task, it is just MapReduce, with encapsulation only at MapReduce granularity and no real algorithms. It is simple field-extraction wrapping, a primitive relational processing strategy. I do not think that is wrong in itself; it is a return to the most primitive form, data association done at MapReduce granularity. But there is a metaphor for it: one person climbs to the ninth heaven and then routinely comes back down to work in the first heaven, while I never climbed up at all and have always worked in the first. The day-to-day activity looks identical, but its nature is fundamentally different. Only when you analyze the actual correlation algorithm and then settle on the simplest implementation is it a real data-analysis process; otherwise it is just blind, simple data processing.

This analysis-and-processing step, important as it is, is only one part of the whole "data flow". Its prerequisites are data-flow design and data-structure design. For the flow, for example: Avro -> Hadoop -> HBase -> Solr -> Web. For the data structures: the especially important Solr schema, plus HBase's rowkey and table schema, which together can be called the data-structure design. Beyond that there is cluster architecture design, in two aspects: first, design at the granularity of a single cluster, such as a Hadoop cluster's NameNode {1+} and DataNodes; second, the underlying infrastructure, meaning the integration of and relationships between the clusters. For now this simply imitates the Apache big-data ecosystem and the data-processing solutions of the commercial vendors.
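Of the data-structure decisions mentioned above, the HBase rowkey is the one that most directly shapes performance. As a hedged illustration (the post names rowkey design but gives no scheme), here is one widely used approach: salting a monotonically increasing natural key with a short hash-derived prefix so writes spread across regions instead of hot-spotting one server. The bucket count and key layout below are my own illustrative choices, not taken from the post.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of a salted HBase rowkey: prefix the natural key (often a timestamp
// plus an id) with a deterministic hash bucket so sequential keys do not all
// land on the same region server. BUCKETS = 16 is an arbitrary example value.
public class SaltedRowkey {
    static final int BUCKETS = 16;

    static String saltedKey(String naturalKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            int bucket = (d[0] & 0xFF) % BUCKETS; // same key always -> same bucket
            return String.format("%02d_%s", bucket, naturalKey);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        // two sequential timestamps may now land in different buckets
        System.out.println(saltedKey("20140101120000_user42"));
        System.out.println(saltedKey("20140101120001_user42"));
    }
}
```

The trade-off is that range scans over the natural key must now fan out over all buckets, so this suits write-heavy tables read mostly by point lookups.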
To make cluster tuning easier, I built a Ganglia + Nagios cluster monitoring and management system, but helplessly it never got taken up by the company, and then ... It remained just a personal technical exercise: no commercial application, no business value created. Depressing!!! One reason is the machine ("1.7 GHz, 4 GB") used for Solr: its indexing speed has never been satisfactory, and the CPU chart in Ganglia was off the table. I dug up a lot of material and found no solution. In the end, one idea was to buffer the data through an MQ to disk (???), and another was to move the entire Solr indexing job into a MapReduce implementation ("no, ..."), but ... neither reached a conclusion.
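The "buffer through an MQ" idea above can be shown in miniature without any real message queue: producers drop documents into a bounded in-memory queue, and a consumer drains them in batches, which stands in for batched Solr adds and commits (committing per-document is a common cause of the CPU saturation described). This is a sketch under my own assumptions; the queue size, batch size, and all names are illustrative.

```java
import java.util.*;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Miniature of buffering index writes: a bounded queue absorbs bursts from
// producers, and the consumer drains it in batches. In a real pipeline the
// "commit" would be a batched Solr add + commit instead of a list append.
public class BufferedIndexer {
    static BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
    static List<List<String>> committedBatches = new ArrayList<>();
    static final int BATCH_SIZE = 100;

    static void drainAndCommit() {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        queue.drainTo(batch, BATCH_SIZE);       // take up to one batch, non-blocking
        if (!batch.isEmpty()) {
            committedBatches.add(batch);        // stand-in for solr.add(batch); solr.commit()
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 250; i++) queue.put("doc-" + i); // burst of 250 documents
        while (!queue.isEmpty()) drainAndCommit();
        System.out.println(committedBatches.size()); // prints 3 (100 + 100 + 50)
    }
}
```

The design point is that the queue decouples ingest speed from commit speed: if Solr falls behind, producers block on `put` instead of overwhelming the indexer.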
All in all, what I use most are the Solr and HBase official reference guides, and thanks to Itbook I have formed the habit of reading a roughly 700-page English book every half month. This is a good thing, and I hope I can always stick with it and carry the habit through the rest of my life. With the tools themselves I work mostly at the level of usage: Hadoop, Solr, HBase, ZooKeeper, Java, Ganglia, Nagios, and so on; I have read two or three books on Spark; my picture of Hadoop is comparatively clearer (including at the coding level), but none of it is at a commercial-grade level. From the language point of view: Java MapReduce for data analysis and processing, and shell for cluster-related operations.
Next, then, there are three parts, in order of (purely personal) importance: first, encapsulation of Java MapReduce algorithms and deeper research into them; second, learning data retrieval at the Hadoop level (not at the SolrCloud level); third, shell and cluster-related work. The most important thing of all is to maintain and follow up on what already exists.
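To make the first goal, "Java MapReduce algorithm encapsulation", concrete, here is a toy in-memory skeleton of the map -> shuffle -> reduce flow with the algorithm supplied as pluggable mapper and reducer functions. Hadoop's real API differs (keys, writables, contexts); this only mirrors the shape of the computation, and every name in it is my own illustration.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Toy map -> shuffle -> reduce skeleton. The framework (run) is fixed; the
// algorithm is encapsulated in the Mapper and Reducer passed in, which is the
// separation the "algorithm encapsulation" goal is after.
public class MiniMapReduce {
    interface Mapper<K> { void map(String record, BiConsumer<K, Integer> emit); }
    interface Reducer<K> { int reduce(K key, List<Integer> values); }

    static <K> Map<K, Integer> run(List<String> input, Mapper<K> m, Reducer<K> r) {
        Map<K, List<Integer>> shuffled = new TreeMap<>();          // shuffle: group values by key
        for (String rec : input)
            m.map(rec, (k, v) -> shuffled.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        Map<K, Integer> out = new LinkedHashMap<>();
        shuffled.forEach((k, vs) -> out.put(k, r.reduce(k, vs)));  // reduce once per key
        return out;
    }

    public static void main(String[] args) {
        // classic word count expressed against the skeleton
        Map<String, Integer> counts = run(List.of("a b a", "b c"),
            (line, emit) -> { for (String w : line.split(" ")) emit.accept(w, 1); },
            (key, vals) -> vals.stream().mapToInt(Integer::intValue).sum());
        System.out.println(counts); // prints {a=2, b=2, c=1}
    }
}
```

Swapping in a different mapper/reducer pair (for example, emitting join keys and merging records) reuses the same skeleton, which is exactly the kind of reuse that encapsulating at the algorithm level, rather than per-job, buys.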
A summary of the learning so far, and the plan for what comes next.