The biggest benefit of cloud computing is that companies no longer need to buy large server clusters or rent servers to handle big data, which cuts the cost of large-scale processing. As a heavyweight open-source framework for distributed processing, Hadoop is already well known in big-data circles, and many companies are building their future data-processing plans around it. From Oracle and EMC to Microsoft, almost every major technology vendor has announced a Hadoop strategy in the past few months. Today, Hadoop has become one of the biggest draws for customers in the IT marketplace.
Hadoop's growth has been supported by individual developers, start-ups, and large businesses alike, which gives users confidence that it will be around for a long time. However, because different vendors keep modifying the code independently, their products cannot interoperate; in this respect, Hadoop's current state closely resembles Android's fragmentation.
Most companies don't really understand big data
The advantage of "big data" is not just scale but speed: the ability to analyze a data set quickly, no matter how many dimensions it has. That matters for immediate analysis, such as evaluating a customer's behavior on a site to better understand what support they need or what products they are looking for, or how current weather and other conditions affect delivery routes and schedules. This is where server clusters, high-performance file systems, and parallel processing come in. In the past, these technologies were so expensive that only large enterprises could use them. Today, virtualization and commodity hardware have cut their cost dramatically, putting "big data" within reach of small and medium-sized enterprises.
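To make the parallel-processing idea concrete, here is a minimal Python sketch: a record set is split into chunks, each chunk is analyzed by a separate worker, and the partial results are merged. A real cluster (Hadoop, HPCC) fans the same pattern out across machines rather than local processes; the clickstream data and the page-view metric below are invented for illustration.

```python
# Minimal sketch of the parallel-processing idea behind "big data" speed:
# split a large record set into chunks, analyze each chunk on a separate
# worker, then merge the partial results. A real cluster does the same
# fan-out across machines instead of local processes. The records and
# the page-view metric are invented for illustration.
from multiprocessing import Pool

def analyze_chunk(records):
    """Count page views per customer within one chunk of log records."""
    counts = {}
    for customer_id, page in records:
        counts[customer_id] = counts.get(customer_id, 0) + 1
    return counts

def merge(partials):
    """Combine the per-chunk counts into one result (the 'reduce' step)."""
    total = {}
    for part in partials:
        for customer_id, n in part.items():
            total[customer_id] = total.get(customer_id, 0) + n
    return total

if __name__ == "__main__":
    # Fake clickstream: (customer_id, page) pairs.
    log = [("c1", "/home"), ("c2", "/pricing"), ("c1", "/docs"),
           ("c2", "/home"), ("c1", "/pricing"), ("c3", "/home")]
    chunks = [log[i::4] for i in range(4)]          # naive 4-way split
    with Pool(processes=4) as pool:
        partials = pool.map(analyze_chunk, chunks)  # parallel "map"
    print(merge(partials))                          # {'c1': 3, 'c2': 2, 'c3': 1}
```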
Smaller companies also have another route to big-data analysis: the cloud. Big-data cloud services are emerging that provide the platform and tools to run analyses quickly and efficiently.
Capgemini CTO Joe Coyle says big data will be a major trend, but many companies do not understand what it means. Cloud computing and big data are the concepts customers ask about most often, and amid the current Hadoop fever the industry is starting to sound a different note. Some vendors argue that companies are overhyping Hadoop: building and maintaining a Hadoop cluster is complex, requires specialized expertise, and hiring such staff is expensive. JPMorgan Chase managing director Larry Feinsmith has said the bank is not only willing to hire qualified professionals but also pays them roughly 10% above the industry rate.
Not all industries should deploy Hadoop
In manufacturing, day-to-day operations and product lifecycle management typically generate large amounts of relational and non-relational data in ERP and inventory systems. Companies want a complete big-data collection and analysis solution, but not every business needs to switch to Hadoop right away.
GE's Intelligent Platforms division builds software that collects data from complex manufacturing and testing, work that has also accelerated the development of its Proficy Historian 4.5 software. GE claims Proficy Historian's approach is more reliable than using Hadoop. Brian Courtney, who heads enterprise data management at GE Intelligent Platforms, says the out-of-the-box product provides an environment comparable to Hadoop while being cheaper and easier to manage.
GE holds a large amount of historical data, most of it from the production and testing stages. Proficy Historian processes the relational and non-relational data generated during manufacturing and testing, such as waveforms, and can be used to anticipate potential problems.
For example, when a turbine engine starts up, Proficy Historian can capture and inspect its electronic signature. What does an anomaly look like compared with a normal startup and load test? Has a similar situation occurred before? When a similar failure is found in the historical record, you can also see how long it took to resolve, which helps the manufacturer prioritize troubleshooting. Proficy Historian can also compare current readings against past data to find similar historical problems and generate reports that warn of other possible anomalies, Brian Courtney said.
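The article does not describe Proficy Historian's internals, but the comparison Courtney outlines can be sketched in a few lines of Python: match a new startup signature against stored historical signatures and surface the closest past incidents along with how long each took to resolve. Every name and value below is hypothetical, not GE's API.

```python
# Hypothetical sketch of the comparison Courtney describes: match a new
# turbine-startup waveform against stored historical signatures and
# report the closest past incidents and their resolution times.
# This is NOT Proficy Historian's API; every name and value is invented.
import math

def distance(a, b):
    """Euclidean distance between two equal-length signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_incidents(new_signature, history, top_n=3):
    """Rank past startups by similarity to the new electronic signature."""
    scored = [(distance(new_signature, rec["signature"]), rec) for rec in history]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top_n]

# Invented history: each record is a past startup with its outcome.
history = [
    {"signature": [0.1, 0.9, 1.2, 1.0], "outcome": "normal",        "hours_to_fix": 0},
    {"signature": [0.1, 1.4, 2.1, 1.9], "outcome": "bearing fault", "hours_to_fix": 16},
    {"signature": [0.2, 1.3, 2.0, 2.2], "outcome": "sensor drift",  "hours_to_fix": 3},
]

new_startup = [0.1, 1.35, 2.05, 2.0]  # today's anomalous startup signature
for score, rec in similar_incidents(new_startup, history):
    print(f"distance={score:.2f}  outcome={rec['outcome']}  "
          f"past resolution: {rec['hours_to_fix']}h")
```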
The new version of the Proficy software is designed to handle even more data. Earlier versions of Proficy supported 2 million tags; today it supports up to 15 million.
Amazon deploys HPCC on its cloud computing platform
Amazon has tuned HPCC to run on its cloud computing platform. HPCC is an open-source data-processing platform launched by LexisNexis, and the move further positions HPCC as a replacement for today's popular Hadoop.
Armando Escalante, CTO of HPCC Systems, said in September that although HPCC cannot yet attract large companies and government agencies the way Hadoop does, this has spurred HPCC's developers to build out an ecosystem the way Hadoop's community did.
Some analysts are bullish on HPCC as well, but the HPCC community has a long way to go before it is as vibrant as Hadoop's. Amazon has now set a good example by bringing HPCC to AWS and the cloud: HPCC supports AWS's Elastic MapReduce, and Amazon says more is on the way.
From a technical point of view, Amazon Web Services currently runs only one part of HPCC's big-data processing: the Thor Data Refinery Cluster. The platform also includes a second data-processing component, the Roxie Rapid Data Delivery Cluster. Roxie serves as a data warehouse and query layer, playing a role similar to Apache Hive and HBase. The sketch below illustrates this division of labor.
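The Thor/Roxie split can be illustrated with a toy Python sketch that assumes nothing about HPCC's real interfaces: a batch "refinery" pass condenses raw records into an indexed form, and a separate query layer answers fast lookups against the refined data.

```python
# Toy sketch of the division of labor described above: a batch "refinery"
# pass (Thor's role) crunches raw records into an indexed form, and a
# separate query layer (Roxie's role, like Hive/HBase) answers fast
# lookups against the refined data. Names and data are illustrative only,
# not HPCC's actual interfaces.

def refine(raw_records):
    """Batch step: aggregate raw events into a per-key index."""
    index = {}
    for key, value in raw_records:
        index.setdefault(key, []).append(value)
    return index

class QueryLayer:
    """Serving step: low-latency reads against the refined index."""
    def __init__(self, index):
        self.index = index

    def lookup(self, key):
        return self.index.get(key, [])

raw = [("acct-1", 40.0), ("acct-2", 12.5), ("acct-1", 7.25)]
roxie_like = QueryLayer(refine(raw))      # refinery output feeds the query tier
print(roxie_like.lookup("acct-1"))        # [40.0, 7.25]
```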
HBase and Hive in the Hadoop project each have their own query languages; the HPCC platform uses a language called ECL (Enterprise Control Language).