At the heart of much of today's analytics progress is big data, generally understood as collections of structured and unstructured data drawn largely from Web applications, server logs, and social media sites. While big data applications are often associated with fast-moving organizations that feed real-time data back into their operations, big data is not necessarily synonymous with real time.
Industry experts point out that big data in motion differs from big data at rest, and that outside help is needed to push it toward real time.
While MapReduce and Hadoop are modern, distributed, and parallel, the two open source technologies most closely linked to big data are batch-oriented. That surprises some people, but they typically deal with big data at rest unless they are paired with fairly advanced middleware. In-memory data grids and databases, complex event processing (CEP) engines, and low-latency messaging middleware are among the types of application infrastructure software that architects are using to take on the challenge of moving big data faster. The sketch below illustrates the batch-oriented side of that picture.
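To make the "data at rest" point concrete, here is a minimal, illustrative sketch of the MapReduce pattern in plain Python: the entire input is read from disk before any result appears, which is what makes the approach batch-oriented. The log file name and field layout are hypothetical, and this is not the Hadoop API itself.

```python
from collections import defaultdict

def map_phase(lines):
    # Emit (key, 1) pairs, e.g. one per requested URL in a web server log.
    for line in lines:
        parts = line.split()
        if len(parts) >= 7:
            yield parts[6], 1  # assume the 7th field holds the URL

def reduce_phase(pairs):
    # Aggregate the emitted pairs into per-key counts.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts

if __name__ == "__main__":
    with open("access.log") as log:  # batch input, already sitting on disk
        hit_counts = reduce_phase(map_phase(log))
    for url, hits in sorted(hit_counts.items(), key=lambda kv: -kv[1])[:10]:
        print(url, hits)
```

Nothing is reported until the whole file has been mapped and reduced, which is why this style of processing, however scalable, is not real time.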
Fast data is less a single technology than a collection of approaches, according to Tony Baer, an analyst at UK-based Ovum. Fast data encompasses high-performance, low-latency CEP applications in which data streams are processed in memory to detect complex and hard-to-spot patterns, Baer wrote in a blog post earlier this year.
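By contrast with the batch sketch above, the following is a minimal sketch of the in-memory, streaming style Baer describes. It assumes a stream of (timestamp, value) readings and an illustrative threshold and window size; it is not any particular CEP engine's API.

```python
from collections import deque

WINDOW_SECONDS = 60   # sliding time window (assumed)
THRESHOLD = 100.0     # value that counts as a "match" (assumed)
MIN_MATCHES = 3       # matches needed within the window to raise an alert

def detect_spikes(events):
    """Yield an alert whenever MIN_MATCHES readings above THRESHOLD
    occur within a sliding WINDOW_SECONDS window, entirely in memory."""
    window = deque()  # recent (timestamp, value) pairs
    for ts, value in events:
        window.append((ts, value))
        # Drop readings that have fallen out of the time window.
        while window and ts - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        matches = [v for _, v in window if v > THRESHOLD]
        if len(matches) >= MIN_MATCHES:
            yield {"at": ts, "matches": len(matches), "peak": max(matches)}

# Example usage with a small synthetic stream:
stream = [(1, 50.0), (5, 120.0), (20, 130.0), (42, 150.0), (90, 40.0)]
for alert in detect_spikes(stream):
    print("pattern detected:", alert)
```

The key difference from the batch example is that each event is evaluated the moment it arrives, so a pattern can be flagged while the stream is still flowing.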
As users become more familiar with big data, the need for more advanced types of messaging middleware to go along with these large pools of information will grow, according to Gartner analyst Roy Schulte. Gartner's view is that CEP matters for big data because it can process incoming data quickly by holding information temporarily in a computer's main memory.
Measuring system scalability
Big data represents a classic computing I/O problem, in which the sheer volume of input and output becomes the key performance bottleneck. The common instinct is to throw more hardware at the problem, which does not necessarily produce better results. The Hadoop framework is a case in point.
"People talk about extensibility, but they don't talk about Hadoop performance," said Michael Kopp, a technical strategist at Detroit Compuware's performance management team. "Another thing I remember the most about is people's assumptions, because it's big data, so it's fast, big data." If you look at Hadoop, you see it as a batch-oriented process. It's fast, but it's never real. ”
And just because it's open source doesn't mean it saves a company money.
"People are very tangled. Hadoop is really inexpensive and difficult to manage, and many jobs run at different rates. The more hardware you throw away, the more difficult it will be to manage, "he said, hinting that some nosql and other systems in the big data market might look like CEP-they're heavy on speed.
"The CEP system will play an important role throughout the discussion," he said. While he sees the Hadoop and NoSQL development teams are working to improve query performance and optimize the database, he believes they are rarely optimized to be efficient to accommodate the way applications actually use data.
Accessing high-performance messaging
Low-latency messaging is emerging as another middleware approach to speeding up big data. While Wall Street financial applications remain the primary use case, high-performance messaging is positioned for wider use. Vendors offering such products include IBM, Informatica, PrismTech, RTI, Red Hat, Software AG, Solace Systems, Tervela, TIBCO, and others.
Big data applications that use sensors, or the so-called Internet of Things, represent use cases beyond Wall Street that call for low-latency middleware. Such software has been used in analytics applications covering aviation, defense, power utilities, and even parking systems, according to Angelo Corsaro, PrismTech's chief technology officer. Corsaro oversees work on OpenSplice DDS, which implements the Object Management Group's Data Distribution Service for Real-Time Systems (DDS) standard.
"Applications use Opensplice to distribute and cache high-volume, fast-changing data," he told Searchsoa.com in an e-mail. "The line between some technologies is becoming blurred. ”
"In a sense, Opensplice provides some CEP functionality," he said, noting its content-based subscriptions, which can be queried as continuously as in the CEP domain.
"Regardless of the overlap, technology will continue to be specialized and integrated," he added.
Of course, there are elements of CEP that distinguish it from big data as typically practiced. CEP tends to work with smaller data sets, said Merv Adrian, a Gartner analyst. Still, he sees the various technologies, each in its own way, speeding up big data as we now know it.
"So far, big data has not yet become a real-time shopping mall. New approaches have emerged, but as they say, some combinations are needed, "Adrian said. "In spite of hindsight, Hadoop is now a toolset. Look back, it's your business intelligence. ”
Real-time capability is something people expect from big data, Adrian said. "It will happen soon, but there is some pressure there," he said.
Big data work already represents a wholly new architecture, and much depends on how such projects turn out compared with what is in place today. People are not looking to invite trouble by layering yet another new architecture on top of what they have built over the past year, Adrian said.