Uncovering the business intelligence value of cloud data analysis
Source: Internet
Author: User
Keywords: big data, cloud data analysis, business intelligence
The public cloud not only changes the price structure of computing and storage, it also expands the scope of what enterprise IT can do. This is especially true when working with large datasets, which are impractical to handle without elastic access to computing and storage.
A loose definition of "big data" is a dataset that is too large to be processed with traditional data management techniques and infrastructure. Detailed server logs, clickstream data, social network data, and mobile device data complement the transactional data held in data warehouses and business intelligence systems. In addition, public cloud data repositories and third-party aggregators offer large datasets on topics ranging from Twitter streams and Meetup posts to economic and census data.
Merging these data sources allows for more detailed and sophisticated analysis. You gain more insight into customer preferences, not just by tracking product purchases, but also by seeing how customers navigate your site and how long they spend browsing different products.
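As a minimal sketch of the kind of merge described above, the following Python combines clickstream records with purchase records to show how long each customer browsed each product and whether they bought it. The record layouts, customer IDs, and the function name browsing_profile are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical clickstream events: (customer_id, product_id, seconds_on_page)
clicks = [
    ("c1", "p1", 40),
    ("c1", "p2", 15),
    ("c2", "p1", 90),
    ("c1", "p1", 20),
]

# Hypothetical transactional records: (customer_id, product_id)
purchases = [("c1", "p1"), ("c2", "p2")]


def browsing_profile(clicks, purchases):
    """Return browsing seconds and a purchase flag per (customer, product)."""
    seconds = defaultdict(int)
    for customer, product, secs in clicks:
        seconds[(customer, product)] += secs

    bought = set(purchases)
    # Include pairs that appear in either source, so purchases made
    # without any recorded browsing are still visible.
    keys = set(seconds) | bought
    return {
        key: {"seconds": seconds.get(key, 0), "purchased": key in bought}
        for key in keys
    }


profile = browsing_profile(clicks, purchases)
for key in sorted(profile):
    print(key, profile[key])
```

Pairs with long browsing time and no purchase, or a purchase with no browsing, are the sort of detail that transactional data alone would not reveal.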
Finding big data: three sources
Before you can work with big data, it is important to determine which type of data you are dealing with. Big data sources fall into three broad categories: internally generated data, dataset markets, and third-party data providers.
Internally generated big data is often a by-product of IT operations. It includes network traffic, clickstream data, and application logs. In the past, companies captured limited information about important events, such as a customer making a purchase. Now they can capture far more and analyze low-level details of how customers interact with business applications. By combining these details with data mining algorithms, you can uncover further insight, such as interface usability problems, patterns associated with low-margin transactions, or unexpected clusters of customer types.
Dataset markets, such as Infochimps, Amazon Web Services (AWS), and Windows Azure Marketplace, provide access to a wide range of datasets that supplement your internal data. Whether you are interested in prescription drug use, retail data, trading data, or a wide range of other topics, you can find data in these markets. Many data markets host their data in the cloud, so you can analyze it directly from virtual machines in the same cloud.
Third-party providers are organizations that focus on collecting data and supplying it to customers or for public use. Both the U.S. federal government and the European Union generate large volumes of demographic, economic, and public health data. Private companies, such as Hoover's, also offer value-added services, such as providing market and risk management data to customers.
Enterprise tools for tapping big data's potential
It is difficult to fit large volumes of unstructured and semi-structured data into relational databases. Cloud data analysis tools give enterprises of all sizes the means to analyze this data.
If your data is well structured, you may want to stay with a relational database such as Oracle or Microsoft SQL Server, both of which are available on AWS, Microsoft Windows Azure, and other cloud providers.
When you start processing billions of rows of data, it is time to consider Hadoop or Google BigQuery. AWS has a Hadoop service called Elastic MapReduce, which saves you the time of installing and configuring a Hadoop cluster. Hadoop is well suited to batch-oriented analysis, whereas BigQuery is better suited to interactive analysis. BigQuery uses a SQL-like query language and works with Tableau Software's visualization tools, two important considerations for professional analysts.
Data integration and management
As in data warehousing, many of the tasks in big data analysis involve extract, transform, and load (ETL) operations. Linking entities across multiple datasets is a challenge when each dataset uses its own identifiers and data formats need to be converted.
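The linking problem can be sketched in a few lines of Python. This example assumes a crosswalk table that maps an external dataset's identifiers to internal customer IDs, and it converts a date column to ISO format during the merge. The identifiers, field names, and the crosswalk itself are hypothetical; in practice, building and maintaining that mapping is usually the hard part.

```python
from datetime import datetime

# Hypothetical crosswalk: external dataset ID -> internal customer ID
ID_CROSSWALK = {"EXT-001": "c1", "EXT-002": "c2"}

# Hypothetical internal records, dates already in ISO format
internal = [
    {"customer_id": "c1", "signup": "2013-01-15"},
    {"customer_id": "c2", "signup": "2013-02-03"},
]

# Hypothetical third-party records with a different ID scheme and date format
external = [
    {"ext_id": "EXT-001", "last_seen": "01/20/2013"},
    {"ext_id": "EXT-999", "last_seen": "02/11/2013"},
]


def normalize_date(text, fmt):
    """Convert a date string in the given format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


def link(internal, external, crosswalk):
    """Join external rows to internal rows; report IDs that cannot be matched."""
    by_id = {row["customer_id"]: row for row in internal}
    linked, unmatched = [], []
    for row in external:
        customer_id = crosswalk.get(row["ext_id"])
        if customer_id is None or customer_id not in by_id:
            unmatched.append(row["ext_id"])
            continue
        merged = dict(by_id[customer_id])  # copy so the input is not mutated
        merged["last_seen"] = normalize_date(row["last_seen"], "%m/%d/%Y")
        linked.append(merged)
    return linked, unmatched


linked, unmatched = link(internal, external, ID_CROSSWALK)
print(linked)
print(unmatched)
```

Keeping the unmatched identifiers rather than silently dropping them makes it easier to judge how much of the external dataset is actually usable.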
Watch for differences in aggregation level. For example, some data is aggregated at the daily level, while other data is available only at a finer level, such as individual tracked events.
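One common way to reconcile aggregation levels is to roll the finer-grained data up to the coarser level before joining. The sketch below assumes event records carry ISO 8601 timestamps and that sales are already summarized per day; both datasets and the function name daily_views are hypothetical.

```python
from collections import Counter

# Hypothetical event-level data: (ISO 8601 timestamp, product_id)
events = [
    ("2013-03-01T09:15:00", "p1"),
    ("2013-03-01T17:40:00", "p1"),
    ("2013-03-02T08:05:00", "p1"),
]

# Hypothetical data already aggregated per day: (date, product_id) -> units sold
daily_sales = {
    ("2013-03-01", "p1"): 3,
    ("2013-03-02", "p1"): 1,
}


def daily_views(events):
    """Roll event-level records up to one count per (date, product)."""
    # The first 10 characters of an ISO 8601 timestamp are the date.
    return Counter((timestamp[:10], product) for timestamp, product in events)


views = daily_views(events)

# Join at the daily level, the coarsest level both datasets share.
combined = {
    key: {"views": views.get(key, 0), "sales": sales}
    for key, sales in daily_sales.items()
}
print(combined)
```

Rolling up loses detail, so the roll-up is best done at analysis time while the event-level data stays in storage.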
The most important thing to know is that big data usually brings data transfer costs with it. If possible, run your virtual machines in the same cloud where you store the data. When working with Google BigQuery, remember that you pay for the amount of data each query processes, so query only the rows and columns you need.
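To see why narrowing a query matters when billing is based on data processed, here is a rough back-of-the-envelope estimate in Python. It makes the simplifying assumption that the data processed is the row count multiplied by the combined size of the columns a query reads. The column names, byte sizes, and row count are invented for illustration, and no actual prices are assumed.

```python
def estimated_bytes(row_count, column_bytes, selected):
    """Estimate bytes processed as rows times the summed width of the columns read."""
    return row_count * sum(column_bytes[name] for name in selected)


# Hypothetical average size per row of each column, in bytes
COLUMN_BYTES = {"user_id": 8, "url": 120, "user_agent": 200, "timestamp": 8}
ROWS = 1_000_000_000  # hypothetical table of one billion rows

# Reading every column versus only the two the analysis needs
full = estimated_bytes(ROWS, COLUMN_BYTES, COLUMN_BYTES)
narrow = estimated_bytes(ROWS, COLUMN_BYTES, ["user_id", "timestamp"])

print(f"all columns: {full / 1e9:.0f} GB")
print(f"two columns: {narrow / 1e9:.0f} GB")
print(f"reduction:   {full / narrow:.0f}x")
```

Under these assumed sizes, dropping the two wide text columns cuts the estimated data processed from 336 GB to 16 GB per query, which is where the savings come from.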