As big data and predictive analytics mature, open source has become the dominant force behind the underlying technology stack, and its advantages are increasingly obvious.
Today, vendors of every size, from small start-ups to industry giants, use open source to handle big data and run predictive analytics. With the help of open source and cloud computing, start-ups can even compete with the big vendors in many respects.
Here are some of the top open source tools for big data, grouped into four areas: data storage, development platforms, development tools, and integration, analysis, and reporting tools.
Data storage:
Apache Hadoop – Cloud Foundry (VMware), Hortonworks, Hadapt
NoSQL databases – MongoDB, Cassandra, HBase
SQL databases – MySQL (Oracle), MariaDB, PostgreSQL, TokuDB
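To make the SQL-storage category concrete, here is a minimal sketch of storing and aggregating records with SQL. It uses Python's built-in sqlite3 module purely as a stand-in for the servers listed above (MySQL, MariaDB, PostgreSQL); the table and data are invented for illustration.

```python
import sqlite3

# In-memory database stands in for a real MySQL/PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, action TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("alice", "click", 1), ("bob", "view", 2), ("alice", "view", 3)],
)

# Aggregate query: count events per user.
rows = conn.execute(
    "SELECT user, COUNT(*) FROM events GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # [('alice', 2), ('bob', 1)]
```

The same `CREATE TABLE` / `INSERT` / `GROUP BY` pattern carries over to the client libraries for the production databases above.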
Development platform:
Apache Hadoop platform – Impala (open source big data analysis engine); Lingual (ANSI SQL); Pattern (analytics); Cascading (open source big data application development framework)
Apache Lucene and Solr platforms
OpenStack (for building private and public clouds)
Red Hat (standard Linux distribution with Hadoop server)
REEF (Microsoft's Hadoop developer platform)
Storm (integrates with various queuing systems and database systems)
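Most of the platforms above build on the MapReduce model that Hadoop popularized: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase combines each group. This is a conceptual sketch of that model in pure Python; no Hadoop cluster is involved, and the function names are just illustrative.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) for every word in every line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a single count.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big tools", "open source big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

On a real Hadoop cluster the map and reduce functions run in parallel across machines, and the framework handles the shuffle; the logic per phase stays this simple.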
Development tools and integrations:
Apache Mahout (machine learning library)
Python and R (predictive analytics programming languages)
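As a taste of predictive analysis in Python, here is a minimal ordinary least squares fit of a line y = a·x + b, written with the standard library only. The data points are made up for illustration; real workloads would reach for NumPy, scikit-learn, or R instead of hand-rolled formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.0, 8.1]
a, b = fit_line(xs, ys)
# Predict the next point in the series.
print(round(a * 5 + b, 2))  # 10.05
```

The same fit is one line in R (`lm(y ~ x)`) or scikit-learn (`LinearRegression().fit(...)`); the point here is only to show the kind of model these languages make routine.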
Analysis and reporting tools:
Jaspersoft (reporting and analysis server)
Pentaho (data integration and business analytics)
Splunk (IT analytics platform)
Talend (big data integration, data management, and application integration)
That wraps up our summary of useful big data tools; we hope it helps.
You might also like:
1. Hadoop deployment script sharing
2. Hadoop fully distributed environment setup
3. Hadoop YARN FAQ and solutions