Big Data Technology

Source: Internet
Author: User

Big data refers to the analysis and processing of massive amounts of data, possibly at the exabyte (EB) scale. As mentioned previously, big data has four "V" characteristics: Volume (large scale), Velocity (high speed), Variety (many types), and Value. Analyzing and processing big data requires specialized hardware and software tools. Big data moves from collection through cleaning, computation, and mining to presentation and use, and the tools available at each step are different. Let's look at some practical tools in the big data field. Free tools such as the Bazhuayu ("eight-claw fish") collector are recommended as well.

1. Collection Tools

Big data datasets can be very large, up to the EB level, and the data comes from many sources: public information, sensors, climate data, and other channels. Internet channels alone include public website information, transaction records, reviews, blogs, videos, event monitoring, large e-commerce sites, portals, and vertical media. The Bazhuayu ("eight-claw fish") collector is a big data collection tool built on next-generation acquisition technology. Commonly used tools for collecting from data sources include:

ScraperWiki (can get data from multiple data sources and generate custom views)

Needlebase (lets you write code to automate crawling of public websites)

Bazhuayu (said to be able to crawl more than 98% of web data and 60% of mobile data)
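As a rough illustration of what collection tools like these automate, here is a minimal sketch that pulls the links out of an HTML page using only the Python standard library. It is not the code of any tool listed above; the names `LinkCollector`, `extract_links`, and `sample_page` and the `example.com` URLs are made up for this example, and a real collector would also fetch pages over the network, respect site policies, and handle errors.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we are interested in.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's base URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Sample page invented for illustration; no network access is needed.
sample_page = """
<html><body>
  <a href="/products/1">Product 1</a>
  <a href="https://example.com/reviews">Reviews</a>
  <a>no link here</a>
</body></html>
"""

links = extract_links(sample_page, "https://example.com/")
print(links)
```

A crawler would typically feed each extracted link back into a queue of pages to visit, which is the part the dedicated tools manage at scale.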

2. Data processing

Hadoop (a software framework that enables distributed processing of large amounts of data with high reliability, scalability, efficiency, and high fault tolerance)

Storm (an open-source distributed real-time computation system that supports multiple programming languages; it handles streaming data, complementing Hadoop's batch processing)

Apache Drill (enables Hadoop users to query massive datasets faster)

RapidMiner (distributed data mining; free data mining technology and libraries, with Java support)
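To make the idea of distributed processing a little more concrete, the following is a single-machine sketch of the MapReduce programming model that Hadoop implements, applied to word counting. It does not use Hadoop or any of its APIs; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are chosen for this example only, and in a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data tools", "big data storage"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines, which is where the scalability and fault tolerance of a framework like Hadoop come in.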

3. Data storage (computation)

Apache Hadoop

NoSQL databases: MongoDB, Cassandra, HBase

SQL databases: MySQL (Oracle), MariaDB, PostgreSQL, TokuDB

4. Data cleaning

DataWrangler (a web-based data cleaning and rearrangement tool from the Stanford Visualization Group; text editing is simple, but it is not suitable for commercial or sensitive information)

Google Refine (handles multiple data formats and supports clustering; it does not do spreadsheet-style calculation)

OpenRefine (an interactive data transformation tool, the open-source successor to Google Refine, which can also encode new rows of data)
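The clustering mentioned above usually means grouping values that are probably the same thing written in different ways. Below is a minimal sketch of one common approach, key-collision clustering with a "fingerprint" key, similar in spirit to what refine-style tools offer. It is an illustration under simplifying assumptions (ASCII punctuation only, no accent folding), not the code of any tool listed here, and the names `fingerprint`, `cluster`, and `names` are invented for the example.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Sample values invented for illustration.
names = ["Apache Hadoop", "hadoop, apache", " Apache  Hadoop ", "MongoDB"]
clusters = cluster(names)
print(clusters)
```

Each resulting cluster is a candidate for merging into one canonical spelling; a person normally reviews the clusters before the merge is applied.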

5. Data analysis

Jaspersoft (Reporting and Analysis Server)

Pentaho (data integration and Business Analytics)

Splunk (IT analytics platform)

Talend (Big Data integration, data management and application integration)

6. Presentation Tools

Visual Presentation Tools

Excel / CSV / JSON (data analysis tools for business people)

Google Chart API (dynamic chart tool; must be used on devices that support JavaScript)

Flot (a JavaScript line-chart library that works in all canvas-enabled browsers)

D3 (another JavaScript library; it supports SVG rendering and provides many line graphs and complex chart styles beyond bar charts)

Processing (a well-known tool for visualizing data; its simple code is compiled into Java)

Fusion Tables (can visualize data based on geographic location)

Gephi (a common tool for visualizing social graph data)

SPSS and SAS

R (a predictive analytics tool for big data)

Modest Maps (the map library itself is small, but it becomes very powerful with Wax and other extension libraries)

OpenLayers (a highly reliable map library)

Besides the tools described above, many others are available in each category, and interested readers can explore them further.
