Big data is one of the most active topics in IT today, and there was no better place to learn about the latest developments than the Hadoop Summit 2013, held recently in San Jose.
More than 60 big data companies took part, including well-known vendors such as Intel and Salesforce.com and startups such as Sqrrl and Platfora. Here are 13 new or enhanced big data products presented at the summit.
1. Continuuity developer suite now supports batch processing
Continuuity released Continuuity Developer Suite 1.7, which adds batch processing support by integrating MapReduce into the platform, giving developers a broader range of workloads to work with.
Continuuity helps Java developers build applications that run on Hadoop and the HBase database. These applications support real-time uses such as operational analytics, but Continuuity chief executive Jon Gray says some applications still require MapReduce's batch processing architecture.
Continuuity Developer Suite 1.7 also provides application templates for streaming real-time analytics, positioning and personalization, and anomaly detection.
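For readers unfamiliar with the batch model mentioned above, the following is a minimal sketch of the map, shuffle, and reduce phases in plain Python. It does not use the Continuuity or Hadoop APIs; the function names and the sample log records are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (event_type, 1) pair for every input record."""
    for record in records:
        yield record["event"], 1


def shuffle_phase(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values to get a total count per event type."""
    return {key: sum(values) for key, values in grouped.items()}


def count_events(records):
    """Run the three phases in order over a complete batch of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


# Hypothetical log records, invented for illustration only.
SAMPLE_LOG = [
    {"event": "login"},
    {"event": "purchase"},
    {"event": "login"},
]

if __name__ == "__main__":
    print(count_events(SAMPLE_LOG))  # {'login': 2, 'purchase': 1}
```

The whole batch is read before any result is produced, which is the main contrast with the real-time processing described above, where results are updated as each record arrives.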
2. Datameer debuts big data analytics software
Datameer introduced Datameer 3.0, data integration and analytics software for business users. This version adds an "intelligent analytics" feature that automatically identifies patterns and relationships in large volumes of complex data stored in Hadoop.
Datameer 3.0 uses four machine learning techniques: clustering, decision trees, column dependencies, and recommendations. Although these are usually the domain of data scientists, they are built into the Datameer software so that business users can apply them on a self-service basis.
Datameer 3.0 will be available for beta testing in the next few months.
3. Hortonworks community preview of HDP 2.0 supports YARN
Hortonworks will release a community preview of the next generation of the Hortonworks Data Platform, which supports YARN (the next-generation Hadoop data processing framework).
As part of the Apache Software Foundation (ASF) Hadoop project, YARN is designed to support multiple use cases on a single dataset rather than just one. Supporting YARN in the HDP 2.0 Community Preview release will let Hortonworks partners and customers use the new technology and take part in shaping the final specification, said Dave McJannet, Hortonworks vice president of marketing.
4. Kognitio launches eighth-generation analytical platform
Kognitio introduced a new generation of the Kognitio Analytical Platform, which improves connectivity with multiple programming languages and boosts performance. The new version provides NoSQL processing capabilities, as well as massively parallel processing of any script or binary code, such as R, Python, or Java.
Benchmarks of this version show that it returns answers to complex queries twice as fast as the previous version.
The new version also provides high-speed data export for fast backups and offers in-memory compression as an optional feature.
5. MapR and Fusion-io improve HBase performance
MapR and Fusion-io have combined the Hadoop-based MapR M7 big data platform with the Fusion-io ioMemory system to achieve significant performance gains when running read-intensive HBase applications.
According to MapR, HBase application performance is often limited by disk storage bottlenecks. With Fusion-io ioMemory, the MapR system delivers a 25-fold performance improvement.
I/O performance limits can hold back the use of the open source HBase database for high-performance computing tasks.
6. Pentaho adds big data platform integration capabilities
Pentaho, a business analytics software developer, has introduced what it calls an "adaptive big data layer" in its software, which provides integration with big data platforms.
The new technology connects Pentaho to Hadoop distributions from Cloudera, Hortonworks, MapR Technologies, and Intel, as well as to the NoSQL databases Cassandra and MongoDB.
7. RainStor upgrades database security and search capabilities
RainStor introduced a major update to its database software with enhanced security features, which it said would increase the use of Hadoop among security-sensitive customers such as government agencies, banks, and telecom companies.
New security features in the RainStor database, which itself runs on Hadoop, include data encryption, data masking, audit trails, tamper-proofing, configurable data disposal, and support for Kerberos, LDAP, Active Directory, and PAM (Linux Pluggable Authentication Modules).
According to RainStor, the new search function improves database query performance by 10 to 100 times and enables faster text search. The database can now search billions of records and several petabytes of data.
8. Splunk releases data analytics tool for Hadoop
Splunk, well known for its real-time operational intelligence software, has launched a beta version of Hunk: Splunk Analytics for Hadoop.
Hunk integrates tools for exploring, analyzing, and visualizing Hadoop data. It uses Splunk virtual index technology for data analysis and provides tools for building tables, charts, custom dashboards, and reports.
The software supports mainstream Hadoop distributions from Cloudera, Hortonworks, and MapR.
9. Sqrrl releases secure big data platform
The startup Sqrrl is about to launch Sqrrl Enterprise 1.1, a secure, scalable platform for developing real-time analytics applications. With this release, Sqrrl moves from limited release to general availability.
The 1.1 release also provides more advanced security tools based on Apache Accumulo, enhanced analytics, and features such as JSON support. New analytics features include full-text search using Apache Lucene, SQL, statistics, and graph search.
Accumulo was originally developed by the U.S. National Security Agency and was spun off as an open source project in 2011.
10. Teradata releases a portfolio of Hadoop products
Teradata has launched Teradata Portfolio for Hadoop, a portfolio of hardware platforms, software, consulting services, training, and customer support for developing and managing Apache Hadoop.
This includes a choice of "premium platforms": the Teradata Appliance for Hadoop and the Teradata Aster Big Analytics Appliance. The former comes loaded with the Hortonworks Hadoop distribution, Mellanox InfiniBand hardware, and Teradata BYNET V5 software. The latter includes the Aster database, SQL-MapReduce, and Apache Hadoop.
Teradata also provides the Teradata Commodity Offering for Hadoop for customers that want to deploy Hadoop on standard Dell servers. Teradata Software Only for Hadoop is a software bundle for businesses that want to use and configure their own hardware.
11. VMware supports Hadoop and big data workloads
VMware unveiled a public beta of VMware vSphere Big Data Extensions, a new feature that extends the VMware virtualization platform to support Apache Hadoop and big data workloads.
Enterprise customers can use the new software to deploy, run, and manage Apache Hadoop clusters alongside other applications on a common virtual infrastructure. This brings the benefits of virtualization to Hadoop systems, including scalability, performance, and resiliency, said Fausto Ibarra, senior director of product management at VMware.
VMware vSphere Big Data Extensions originated in VMware's Serengeti open source project and is expected to be generally available to customers by the end of this year.
12. WANdisco releases new Hadoop distribution and HA software
WANdisco will launch Non-Stop NameNode WAN Edition, a new replication technology that enables 100% uptime for globally distributed big data systems based on the Hadoop platform. The company already offers a LAN version of the software.
WANdisco also showed a new version of WANdisco Distro (WDD 3.6), based on Apache Hadoop 2.0, which it says supports migrations from Amazon Web Services to private clouds. WANdisco has also open-sourced S3HDFS, an S3 API on Hadoop that lets businesses use their custom applications with Hadoop. In the future, WANdisco will also offer support for the Shark real-time analytics and Spark in-memory data processing technologies as an add-on option for WANdisco Distro 3.6.
13. Zettaset shows support for the latest Cloudera and Hortonworks platforms
Zettaset's Orchestrator Hadoop cluster management software now supports the Hadoop distributions from Cloudera and Hortonworks. Cloudera CDH and Hortonworks HDP users can now use Orchestrator to automatically secure and manage their Hadoop infrastructure.
Zettaset chief technology officer Brian Christian says the complexity of securing and managing Hadoop clusters is hampering the adoption of Hadoop. Orchestrator eliminates manual configuration processes, reduces the complexity of Hadoop, and brings enterprise manageability, security, and availability to Hadoop.