A Technical Scheme for Extracting Value from Big Data
In 5 steps:
1. Collect files via FTP.
2. Load the files into HDFS.
3. Use Hive to select data from HDFS.
4. Use DataStage or Informatica to move the selected data toward the warehouse.
5. Warehouse the results into a Sybase IQ database.
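The first three steps above can be sketched as a small driver script. This is a minimal sketch, assuming hypothetical local and HDFS paths and a hypothetical `searchlog` table; the actual commands depend on your cluster setup. Steps 4 and 5 (DataStage/Informatica loading into Sybase IQ) are driven from the ETL tool itself, not from a script like this.

```python
import subprocess

# Hypothetical locations; adjust for your environment.
LOCAL_DIR = "/data/incoming"           # files already collected via FTP (step 1)
HDFS_DIR = "/warehouse/raw/searchlog"  # target directory in HDFS

def hdfs_put_cmd(local_path, hdfs_path):
    """Build the command that loads local files into HDFS (step 2)."""
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_path]

def hive_select_cmd(query):
    """Build the command that runs a Hive query over the HDFS data (step 3)."""
    return ["hive", "-e", query]

def run(cmd):
    """Execute one pipeline step; raises if the step fails."""
    subprocess.run(cmd, check=True)

# Usage (on a machine with the Hadoop and Hive clients installed):
#   run(hdfs_put_cmd(LOCAL_DIR, HDFS_DIR))
#   run(hive_select_cmd("SELECT keyword, COUNT(*) FROM searchlog GROUP BY keyword"))
```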
Notes:
1. You do not have to use FTP to collect the files; any method that can collect massive numbers of files will do.
2. The collected source data must be massive: either a huge number of files, or files with a huge amount of content. Otherwise it is not really Big Data.
3. This scheme mainly uses Hadoop's HDFS; it does not use MapReduce directly.
4. MapReduce is actually handled for you by Hive.
5. Hive is used because anyone who knows SQL can use it, so the learning cost is low. Most enterprises, especially older ones, have many developers who know SQL.
6. DataStage is an IBM product and did not feel good to use, so it has now been replaced with Informatica.
7. IBM's products are sold cheaply, but the maintenance fees are very expensive. Since they are not open source, you have to pay IBM for maintenance, which is why I have always disliked them.
8. IBM's products are not only expensive to maintain; expanding with additional nodes is not cheap either. Some companies have now switched their main machines to HP.
9. You do not have to choose Sybase IQ, but choosing it is no big problem: query speed is very fast, updates and inserts do not feel slow so far, it is column-oriented storage, and it is much cheaper than Oracle.
Application scenario:
For example, suppose your site logs a large volume of user search queries. You can put these log files into HDFS, use Hive to count the number of searches for each keyword, and finally load each keyword and its count into IQ. Then, by querying IQ directly, you can see which keywords have attracted the most attention recently.
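The keyword counting in this scenario is one GROUP BY in Hive. Below is a minimal sketch, assuming a hypothetical `searchlog` table with a `keyword` column, together with a pure-Python equivalent of the aggregation Hive would perform:

```python
from collections import Counter

# The Hive side (step 3): one GROUP BY over the raw logs in HDFS.
# Table and column names are hypothetical.
HIVE_QUERY = """
SELECT keyword, COUNT(*) AS times
FROM searchlog
GROUP BY keyword
ORDER BY times DESC
"""

def count_keywords(searches):
    """Pure-Python equivalent of the GROUP BY above: map each keyword
    to how many times it was searched, most searched first."""
    return Counter(searches).most_common()

# Example: the most-searched keyword comes first.
print(count_keywords(["hadoop", "hive", "hadoop", "iq", "hadoop"]))
# → [('hadoop', 3), ('hive', 1), ('iq', 1)]
```

The (keyword, times) rows produced by the query are what steps 4 and 5 would move into Sybase IQ for fast querying.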
This article originates from: Ouyida3's CSDN blog
2015.3.18