Cracking the code of big unstructured data processing and analysis

Source: Internet
Author: User

The Ufida UAP data platform provides big data processing and analysis, relying mainly on its unstructured data processing platform UDH (UAP Distribute for Hadoop). UDH includes a distributed file system, a storage database, distributed analysis and computing frameworks for batch processing, real-time analytic queries, stream processing, and in-memory distributed batch processing, as well as distributed data mining.

In today's big data era, companies should not follow the trend blindly; they should understand why big data is so hot and why it deserves attention. One important reason is that big data differs from ordinary data: it adds a large amount of semi-structured and unstructured data, and both its order of magnitude and its value are different.

An IDC report showed that unstructured data accounts for 80% to 90% of the current 1.8 trillion GB of big data and would grow 44-fold by 2020. If structured data records the transactions of production and business in detail, then unstructured data is the key to grasping an enterprise's lifeblood: the information it reflects contains many opportunities for the enterprise to improve efficiency. Therefore, only by solving the problem of unstructured data analysis can enterprises effectively mine the value behind this data and increase enterprise value.

When it comes to big data technologies, many people in the industry think of Hadoop. Hadoop is a framework for distributed data storage and computing, and it is good at storing large volumes of semi-structured datasets. It is suited to processing massive data on large-scale clusters, and it makes it easy for programmers to write distributed parallel programs and run them on a computer cluster to complete large-scale data computation.
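The programming model that makes such parallel programs easy to write is MapReduce. The following is a minimal single-process sketch of its three phases in Python, using a word count as the example. It does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster, the map and reduce calls would run on different nodes and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in one process (illustration only, no cluster involved)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["error disk full", "error network down"])` returns a dictionary in which `error` maps to 2 and every other word maps to 1. The programmer writes only the map and reduce functions; distribution, grouping, and fault handling are left to the framework.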

Building on the Hadoop open source product ecosystem, Ufida released UDH and, around it, developed a series of management tools and integration, development, and presentation components that address enterprises' big data application requirements. These enable enterprises to centralize large-scale structured and unstructured data and to analyze and process it in an integrated way.

Ufida UAP is a computing platform dedicated to large enterprises and organizations. It includes a development platform, integration platform, dynamic modeling platform, business analysis platform, data platform, lightweight platform, cloud management platform, mobile application platform, web platform, RIA platform, social platform, and products in other areas. Among these, the data platform provides big data processing and analysis, relying mainly on the unstructured data processing platform UDH (UAP Distribute for Hadoop). UDH includes a distributed file system, a storage database, distributed analysis and computing frameworks for batch processing, real-time analytic queries, stream processing, and in-memory distributed batch processing, as well as distributed data mining.

UDH Product Architecture

UDH in the Ufida UAP data platform is an integrated solution designed to save enterprises most of the cost of cluster management, service monitoring, and deployment. With UDH, the staffing for the whole system can be reduced to one or two engineers operating a fairly large cluster. Ordinarily, even an enterprise with a large technical team and strong technical reserves needs several months or even more than a year to bring such a system into operation; with UDH, this can be shortened to about one month. This matters to businesses because it means lower cost and a faster return on investment.

The UDH platform is optimized on top of open source Hadoop, Hive, Storm, Spark, and related projects, and covers large-scale unstructured data integration, storage, and analytic computation. Its integrated YARN supports a wide range of distributed computing frameworks (MapReduce, Spark, Storm, etc.) while providing a more efficient storage structure. A single cluster can reach more than 100 machines and manage PB-level data.

UDH in Ufida UAP has distinctive technical advantages in real-time computation and analysis of unstructured data.

First, multi-cluster real-time computation. UDH achieves second-level latency, so exceptions can be detected within seconds. It handles large data volumes, with a total flow rate of at least 100 GB/s. UDH can provide 5 clusters with 900 nodes and 2 to 4 slots per node. Idle cloud storage resources can be put to reasonable use. UDH can be used for real-time log statistics, web analytics, image processing, face recognition, and so on. It processes about 120 TB of data, or 200 billion records, per day.
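The daily figures above can be turned into average rates with simple arithmetic. The sketch below makes two assumptions that the text does not state: that "120 TB" means decimal terabytes, and that the 200 billion figure counts individual records. The constant and function names are hypothetical. The results are daily averages, so they say nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the text.
DAILY_BYTES = 120 * 10**12    # 120 TB per day (assumption: decimal terabytes)
DAILY_RECORDS = 200 * 10**9   # 200 billion per day (assumption: these are records)
NODES = 900
SLOTS_PER_NODE = (2, 4)       # 2 to 4 slots per node


def average_rates():
    """Return average bytes/second, records/second, and bytes/record."""
    bytes_per_second = DAILY_BYTES / SECONDS_PER_DAY
    records_per_second = DAILY_RECORDS / SECONDS_PER_DAY
    bytes_per_record = DAILY_BYTES / DAILY_RECORDS
    return bytes_per_second, records_per_second, bytes_per_record


def slot_range():
    """Return the minimum and maximum total number of slots across all nodes."""
    low, high = SLOTS_PER_NODE
    return NODES * low, NODES * high
```

Under these assumptions the averages come to roughly 1.39 GB per second and about 2.3 million records per second, with a mean record size of 600 bytes, spread over 1,800 to 3,600 slots.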

Second, real-time analysis. UDH uses stream processing, so there is no need to store large amounts of data for batch processing. Data stays in memory and does not pass through disk. UDH is flexible: it uses a DAG computation model, and bolts can be added or removed to compose the computation flow according to business needs. UDH supports high throughput: a single cluster running one topology with 10 concurrent instances per bolt processes 10 GB/s. Abnormal access can be detected within 10 seconds.
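The idea of composing bolts into a computation flow can be sketched with Python generators. This is not the Storm API or UDH code; it is a single-process stand-in in which each "bolt" is a generator that consumes the previous stage's stream, and the chain shown is the simplest case of a DAG (a straight line). The input format (`"timestamp ip"`), the 10-second window, and the threshold of 3 requests are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def parse_bolt(lines):
    """Bolt 1: turn raw 'timestamp ip' lines into (timestamp, ip) tuples."""
    for line in lines:
        ts, ip = line.split()
        yield float(ts), ip


def sliding_count_bolt(tuples, window_seconds=10.0):
    """Bolt 2: count each IP's requests seen within the last window_seconds."""
    recent = defaultdict(deque)
    for ts, ip in tuples:
        timestamps = recent[ip]
        timestamps.append(ts)
        # Drop requests that have fallen out of the window.
        while ts - timestamps[0] > window_seconds:
            timestamps.popleft()
        yield ts, ip, len(timestamps)


def alert_bolt(counts, threshold=3):
    """Bolt 3: emit an alert whenever an IP reaches the threshold in the window."""
    for ts, ip, count in counts:
        if count >= threshold:
            yield ts, ip, count


def run_topology(spout, bolts):
    """Wire the spout (any iterable) through the bolts in order and collect the output."""
    stream = spout
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)
```

A call such as `run_topology(log_lines, [parse_bolt, sliding_count_bolt, alert_bolt])` processes each record as it arrives and keeps only the recent timestamps in memory. Adding or removing a stage means editing the list of bolts, which mirrors the flexibility the text describes.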

Third, ad hoc analysis of unstructured data. UDH supports most SQL-92 features of the Hive Query Language (HiveQL), including queries, joins, and aggregation, and supports file formats such as text, SequenceFile, RCFile, Avro, and Parquet. It supports compression algorithms such as Snappy, GZIP, Deflate, and BZIP. It uses common query interfaces: ODBC/JDBC, Hue Beeswax, and the Cloudera Impala query UI. It provides a command-line interface and Kerberos security authentication.
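An ad hoc query of the kind described here typically combines a join with an aggregation. The sketch below runs such a SQL-92 style query from Python. It uses the standard library's in-memory SQLite database purely as a stand-in, because no UDH, Hive, or Impala endpoint is assumed to be available; the tables `access_log` and `users`, their columns, and the function `traffic_by_region` are hypothetical.

```python
import sqlite3


def traffic_by_region(log_rows, user_rows):
    """Join access logs with user regions and aggregate hits and bytes per region."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive or Impala
    conn.execute("CREATE TABLE access_log (user_id INTEGER, page TEXT, bytes INTEGER)")
    conn.execute("CREATE TABLE users (user_id INTEGER, region TEXT)")
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", log_rows)
    conn.executemany("INSERT INTO users VALUES (?, ?)", user_rows)

    # Plain join + GROUP BY + aggregate functions.
    query = """
        SELECT u.region, COUNT(*) AS hits, SUM(a.bytes) AS total_bytes
        FROM access_log a
        JOIN users u ON a.user_id = u.user_id
        GROUP BY u.region
        ORDER BY total_bytes DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Against a real cluster, a query like this would be submitted through one of the interfaces listed above, such as ODBC/JDBC or the command-line shell, rather than through SQLite.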

Fourth, in-memory analytic computation. UDH is 100 times faster than Hadoop MapReduce when the computation runs entirely in memory, and more than 10 times faster when disk data is included. It supports cyclic data flow and in-memory computation. Applications can be written in Java, Scala, or Python, and the more than 80 built-in functions help develop distributed applications quickly. It provides shell interfaces for Scala and Python.
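Why in-memory computation helps with cyclic (iterative) data flow can be shown with a toy model. The sketch below is not Spark or UDH code: `MiniDataset` is a hypothetical class that only counts how many times the data is re-read from a simulated disk, and `iterative_mean` is a made-up iterative job that scans the data once per pass. It illustrates the mechanism and is not evidence for the speedup figures quoted above.

```python
class MiniDataset:
    """Toy stand-in for a distributed dataset (hypothetical, not a real Spark API)."""

    def __init__(self, loader):
        self._loader = loader   # callable that simulates reading the data from disk
        self._cached = None
        self.disk_reads = 0     # how many times the simulated disk was read

    def _read(self):
        self.disk_reads += 1
        return list(self._loader())

    def cache(self):
        """Read the data once and keep it in memory for all later passes."""
        self._cached = self._read()
        return self

    def records(self):
        """Return the data, from memory if cached, otherwise by re-reading it."""
        return self._cached if self._cached is not None else self._read()


def iterative_mean(dataset, iterations=5):
    """Refine an estimate of the mean, scanning the whole dataset on every pass."""
    estimate = 0.0
    for _ in range(iterations):
        data = dataset.records()
        # Move the estimate halfway toward the true mean on each pass.
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate
```

With five iterations, an uncached `MiniDataset` reads its source five times, while `MiniDataset(loader).cache()` reads it once and serves the remaining passes from memory. Iterative workloads such as data mining algorithms repeat this pattern many times, which is where keeping data in memory pays off.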

In addition, unstructured data platforms carry security and stability risks. The HA (high availability) part is particularly troublesome: enterprise users who are not familiar with the development, evolution, and technical architecture of the whole Hadoop product ecosystem find it hard to deploy HA. The security part, such as how to implement data authentication, is also quite complex, and it is very easy to make mistakes in some settings and in the order of steps. With Ufida UAP UDH, all of this is automated, and users will hardly notice any problems.
