Open source tools for big data


Open source platforms for big data are growing in popularity. Over the past few months, almost everyone seems to have felt their impact.

Low cost, flexibility, and the availability of trained personnel are the main reasons open source is thriving. Hadoop, R, and NoSQL databases now form the backbone of many enterprises' big data strategies, whether they are used to manage unstructured data or to perform complex statistical analyses.

The pace is almost impossible to keep up with. SAP AG recently released a new product, SAP BusinessObjects Predictive Analytics, software that integrates algorithms from the open source R language, which is widely used in academia for advanced statistical modeling.

A few weeks earlier, Teradata announced that its new integrated analytics portfolio would include R functionality and a connector to GeoServer, a Java-based open source geospatial platform. Countless other companies are rushing to build connectors to Hadoop.

Widespread adoption, frenetic innovation

James Kobielus, an analyst at Forrester Research (now senior program director of product marketing for big data analytics solutions at IBM), wrote in an e-mail that "the open source approach has the broadest adoption momentum and the most frenetic innovation."

But what's the hurry?

First, Kobielus explained, just as open source products ranging from Mozilla to Android won broad acceptance in the IT community after their growing pains, open source data storage and analytics software has now matured. It is "no longer the risky bet it was two years ago," as he put it.

Second, Kobielus wrote, platforms such as Hadoop, R, and NoSQL have an advantage over proprietary software because they evolve faster, and they are being developed and improved by a diverse range of communities. He predicts that open source will soon dominate the big data market.

"With the footprint of closed source software shrinking in many data/analytics environments, many incumbent vendors will evolve their business models toward open source," he wrote, "and will also expand their professional services and systems integration offerings to help customers migrate to open source, cloud-based analytics, most of it focused on Hadoop and R.

"For example, Forrester sees Hadoop as the core of the next-generation enterprise data warehouse (EDW) in the cloud, and R as the core codebase for the coming wave of big data development tools. We also expect a wide range of open source NoSQL databases and tools to become a rich alternative to closed-source content analytics products."

The Red Hat model

Enterprises approach open source integration in different ways. Some vendors, such as SAP, chose to build Hadoop or R functionality into their products using their own in-house experts, while others, such as Teradata, have handed much of the work to Revolution Analytics Inc., a company that is something like the Red Hat of big data: it offers a commercial version of R for enterprises, as Red Hat does for Linux.

"In particular, we make it run on really big data sets," says David Smith, vice president of marketing and community at Revolution Analytics, a small company standing among big data giants that specializes in adapting R to different business processes.

Using open source in a product is one way for companies to differentiate themselves in the marketplace, Smith said. "By definition, it means you're not doing what your competitors are doing."

"Open source technology is a natural choice for companies that take an advanced, scientific approach to big data analytics," says Smith. "Those companies have a bit of a data science culture, and their spirit of exploration and curiosity about data really draws them to open source technology, because it is so flexible and gives them different ways to think about the data and explore different things."

"Big companies will benefit the most from commercially packaged open source software, because it lets them stay focused on their own line of business," said Scott Gnau, president of Teradata Labs. Teradata is a Revolution Analytics partner.

"A lot of value is being created by new technologies developed in Hadoop and MapReduce environments, but they need to be adopted as enterprise-class software, with reliable versioning, reliable scalability and availability, and support."

"It has to be packaged and reliably brought into the mainstream, because most companies don't want to focus on software development," he said.

Will Davis, product marketing manager at EMC Greenplum, agreed. Larger companies, he says, need more stable and reliable versions of open source big data platforms, whether they add their own improvements or rely on others to help them.

"Many enterprises ... EMC's traditional customers, Fortune 500 companies and the like, really need this technology deployed in a way that meets stringent service-level agreements (SLAs) and is always available online," he said.

Some early open source adopters developed the expertise to go it alone, but the "second wave" of companies is eager to get up and running quickly, and they may not have their own staff to do development work.

Bringing in data scientists

There is real demand for big data skills right now, and companies are realizing that running open source platforms is the best way to attract trained talent, since open source technology, especially R, is widely used in academia.

Data scientists can also make better use of open source platforms. Imran Ahmad, a data scientist who built Bileg, his own grid computing algorithm and a Hadoop competitor, on top of the open source Globus Toolkit (GT4), says the fundamental advantage of open source platforms is that people like him can see the underlying mathematics. Ahmad is managing director of Cloudanum, a Toronto company that develops data analytics technologies for cloud environments.

"If you're on open source, you can dig down and see why I get these results and why these results are optimal," Ahmad said.

Proprietary data analysis software works well most of the time, he adds. But when an "unusual scenario" arises, you can no longer trust your results. "They're going to deviate from what you're looking for," he said. "It's a terrible situation."

Unsurprisingly, smart people with statistical modeling backgrounds are in short supply, especially since other sectors, such as financial institutions, are on the lookout for them and hire them heavily.

"They've hired a lot of people straight out of school into their data science, research and development, and modeling departments," Smith said, "and they found that these people had studied R, not SAS."


"We offer a Greenplum consulting business," Davis said. "This is our data science team. These people are Ph.D.s and experts in various industries and related fields. We have intelligent, diligent people, and frankly, they are working with clients to make their data work."

"Companies that need to perform complex tasks such as predictive analytics are undoubtedly searching for talent from universities," said Jason Kuo, group marketing manager at SAP. He said SAP's new product combines a friendly user interface with drag-and-drop functionality, which will make it easier for data scientists to make the transition into the corporate world.

"These people bring their R expertise and R background with them, and they look for tools related to R," he said. "What's interesting is that in an academic environment, for whatever reason, whether cost or familiarity, they're more likely to use R without a GUI than with a graphical interface. Now they're going into the corporate world, where the demands are greater, projects change more quickly, and they may have to track ROI and so on.

"Companies can say ... What do you need to be more successful? How can we make you more efficient? And in the past they have set aside budgets for these statisticians."

If you can't beat them, join them

Paul Kent, vice president of platform development at SAS Institute, works for a company that is often seen as the antithesis of open source in the big data space: it has developed proprietary data analysis algorithms as an alternative to open source languages such as R.

In a way, Kent says, SAS treats the open source community as a competitor. New technologies can be developed very quickly in open source environments, and his company may need more time to study them before turning them into market-ready product features.

"We need a little more time to respond to a technology, test all the different corner cases, and work out the ways you might use it. So our response may be a bit slower," he said.

However, he said, SAS has an advantage in large-scale technical support and the expertise to apply technology to different kinds of organizations, whether a retailer, a bank, or a healthcare provider. SAS's strength lies in "applying mathematics to specialized domains," Kent said.

At the same time, he says, SAS is following the trend and giving its customers the same open source options. Kent said SAS has "built a bridge to R," as it did with Hadoop, and that whenever the open source community comes up with a good idea, SAS pays attention.

"In the long run, it's more useful to build bridges or interfaces to such ideas than to pretend they don't exist."

Original link: http://www.chinabi.net/Article/binews/201209/2227.html
