Big data refers to the analysis and processing of massive datasets, potentially at the exabyte (EB) scale. As mentioned previously, big data has four "V" characteristics: Volume (large), Velocity (high speed), Variety (many types), and Value. Analyzing and processing big data requires specialized hardware and software tools. The data moves from collection through cleaning, computation, and mining to presentation and use, and the tools available differ at each step, so let's take stock of some practical tools in the big data field. Free tools similar to the Bazhuayu collector are recommended here as well. To borrow a popular saying: if you find it useful, I will stay free to the end!
1. Collection Tools
Big data datasets are very large, up to the EB level, and the data comes from many sources, including public information, sensors, and climate records. On the Internet alone, sources include public website information, transaction records, reviews, blogs, video, event monitoring, large e-commerce sites, portals, and vertical media. The Bazhuayu collector, for example, is a big data collection tool built on next-generation acquisition technology. Common tools for collecting data from these sources include:
ScraperWiki (pulls data from multiple data sources and generates custom views)
Needlebase (automates crawling of public websites with written code)
Bazhuayu (said to crawl more than 98% of web data and 60% of mobile data)
2. Data Processing
Hadoop (a software framework for distributed processing of large amounts of data, offering high reliability, scalability, efficiency, and fault tolerance)
Storm (an open-source distributed real-time computation system that supports multiple programming languages and complements Hadoop's batch processing)
Apache Drill (enables Hadoop users to query massive datasets faster)
RapidMiner (distributed data mining; free data mining techniques and libraries; Java code support)
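To give a feel for the kind of distributed processing Hadoop is known for, here is a minimal sketch of the map, shuffle, and reduce steps behind a word count. It is an illustration only: it runs in a single Python process, it does not use any Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data tools", "big data storage"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run on many machines in parallel and the framework would handle the shuffle, but the division of work is the same idea.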
3. Data Storage (Computation)
Apache Hadoop
NoSQL databases: MongoDB, Cassandra, HBase
SQL databases: MySQL (Oracle), MariaDB, PostgreSQL, TokuDB
4. Data Cleaning
DataWrangler (a web-based visual tool from the Stanford Visualization Group for cleaning and rearranging data; text editing is simple, but it is not suitable for commercial or sensitive information)
Google Refine (handles data in multiple formats and supports clustering; has no spreadsheet-style calculation)
OpenRefine (an interactive data transformation tool that encodes new row data)
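The clustering these cleaning tools offer can be sketched roughly as follows. This is a simplified version of the "fingerprint" key-collision idea, written from scratch for illustration: it is not the code of any tool listed above, real implementations normalize more (for example accented characters), and the names fingerprint and cluster are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: trim, lowercase, drop punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; return only the groups
    with more than one member, since those are the likely duplicates."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    cities = ["New York", "new york ", "York, New", "Boston"]
    for group in cluster(cities):
        print(group)
```

Each group returned is a set of spellings that probably mean the same thing, which a person can then review and merge into one canonical value.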
5. Data Analysis
Jaspersoft (reporting and analysis server)
Pentaho (data integration and business analytics)
Splunk (IT analytics platform)
Talend (Big Data integration, data management and application integration)
6. Presentation Tools
Visual Presentation Tools
Excel / CSV / JSON (data analysis tools and formats for business users)
Google Chart API (a dynamic charting tool; requires a device that supports JavaScript)
Flot (a line-chart plotting library that works in all canvas-enabled browsers)
D3 (another JavaScript library, which renders with SVG and offers many line graphs and complex chart styles beyond bar charts)
Processing (a flagship tool for visualizing data; you write simple code that is compiled to Java)
Fusion Tables (can visualize data based on geographic location)
Gephi (a common tool for visualizing social graph data)
SPSS and SAS
R (a tool for big data predictive analytics)
Modest Maps (the map library itself is small, but it becomes very powerful with Wax and other extension libraries)
OpenLayers (a highly reliable map library)
In addition to the tools described above, many others are available in each category, and interested readers can explore them further.