Cloud services play an important role in big data applications, especially for short-term tasks or for applications whose data already resides in the cloud.
Cloud services are attractive to everyone. When someone tells you that their big data strategy is to "store all the data in the cloud," you can't tell whether they are visionaries or are simply repeating what the experts suggested to them at the latest industry conference.
There is no doubt that big data and the cloud paradigm overlap enormously. The overlap is so extensive that you could justifiably claim to be doing cloud-based big data with your existing on-premises Hadoop, NoSQL, or enterprise data warehouse environments. Keep in mind that "cloud" is widely understood to include private deployments in addition to the public cloud, SaaS (software as a service), and multi-tenant hosting environments.
If you limit the definition of "cloud" to public subscription services, you get to the heart of the matter: identifying the big data applications that are better suited to a public cloud or SaaS deployment pattern than to on-premises deployments, such as those built on pre-optimized hardware appliances or on virtualized server clusters.
Put another way, when does handing management over to an external service provider improve the scalability, flexibility, performance, cost-effectiveness, reliability, and manageability of your big data? Here are a few typical cases in which big data belongs on public cloud services.
Business applications already hosted in the cloud: If, like many organizations (especially small businesses), you use cloud-based applications provided by external service providers, much of your source transaction data is already in the public cloud. If you keep a long history on those platforms, it may already have accumulated to big data scale. For value-added analytics services offered by the provider or its partners (such as customer churn analysis, marketing optimization, off-site backup, or customer data archiving), keeping the data in the cloud can make more sense than storing it locally.
Massive external data sources that require heavy preprocessing: If you use social media data feeds for customer sentiment monitoring, local servers, storage, and bandwidth are unlikely to keep up with the correlation analysis. This is a typical case for using the social media filtering services offered by public cloud big data providers, as in the sketch below.
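To make the filtering idea concrete, here is a minimal sketch of the kind of preprocessing such a service performs before the data ever reaches your own environment. The feed format, brand terms, and sentiment cues are illustrative assumptions, not any provider's actual API.

```python
# Sketch: keyword-based filtering of a newline-delimited JSON social feed.
# Everything here (field names, keyword lists) is an assumed, illustrative format.
import json

BRAND_TERMS = {"acme", "acmecorp"}                # assumed brand keywords
NEGATIVE_TERMS = {"refund", "broken", "cancel"}   # assumed negative-sentiment cues

def relevant(post: dict) -> bool:
    """Keep only posts that mention the brand at all."""
    text = post.get("text", "").lower()
    return any(term in text for term in BRAND_TERMS)

def flag_negative(post: dict) -> dict:
    """Attach a crude negative-sentiment flag for downstream analysis."""
    text = post.get("text", "").lower()
    post["negative"] = any(term in text for term in NEGATIVE_TERMS)
    return post

def filter_feed(lines):
    """Yield only the relevant, flagged posts from a raw feed."""
    for line in lines:
        post = json.loads(line)
        if relevant(post):
            yield flag_negative(post)

if __name__ == "__main__":
    sample = [
        '{"user": "a", "text": "AcmeCorp support is great"}',
        '{"user": "b", "text": "my acme order arrived broken"}',
        '{"user": "c", "text": "unrelated post"}',
    ]
    for post in filter_feed(sample):
        print(post)
```

The value of running this in the provider's cloud is that only the small, relevant fraction of the feed ever has to cross your network boundary.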
Tactical applications beyond your local big data capabilities: If you have deployed a local big data platform for a specific application, such as a Hadoop cluster dedicated to bulk ETL (extract/transform/load) of unstructured data sources, a public cloud may better handle new applications such as multi-channel marketing, social media analytics, geospatial analysis, queryable archiving, and elastic data-research sandboxes, because the existing platform was not built for them and on-demand public cloud services can be more capable and more cost-effective. In fact, if you need to stand up petabyte-scale, streaming, and unstructured data processing as quickly as possible, a public cloud solution may be the only viable option. A minimal ETL sketch follows.
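For the dedicated-ETL case, the following sketch shows the sort of extract/transform step such a cluster might run as a Hadoop Streaming mapper. The log format and the fields emitted are illustrative assumptions, not the article's own pipeline.

```python
#!/usr/bin/env python3
# Sketch: a Hadoop Streaming mapper that turns unstructured log lines
# into tab-separated key/value records ready for loading and aggregation.
# The assumed line format is "<ISO timestamp> <level> <free text>".
import re
import sys

LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+(\w+)\s+(.*)$")

def main():
    for raw in sys.stdin:
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip lines that don't parse instead of failing the job
        timestamp, level, _message = match.groups()
        # Emit "date<TAB>level" so a reducer can count events per day and severity.
        print(f"{timestamp[:10]}\t{level}")

if __name__ == "__main__":
    main()
```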
Flexible provisioning of a large, short-term analysis sandbox: If you have a short-term data-research project that needs an exploratory data mart (that is, a sandbox) well above your normal capacity, the cloud may be the only option you can afford. You can quickly acquire cloud-based storage and processing power when the project starts and just as quickly release them when it ends. I call this the "bubble mart" deployment pattern, and it is especially well suited to cloud services, as the sketch below illustrates.
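As one possible realization of the bubble-mart pattern, this sketch provisions and later releases a short-lived analysis cluster on AWS EMR through the boto3 SDK. The instance types, counts, IAM role names, and release label are assumptions chosen for illustration, not recommendations.

```python
# Sketch: the bubble-mart lifecycle on one possible provider (AWS EMR via boto3).
# All sizing and naming below is an illustrative assumption.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

def open_sandbox() -> str:
    """Provision a short-lived analysis cluster at project start."""
    response = emr.run_job_flow(
        Name="bubble-mart-sandbox",
        ReleaseLabel="emr-6.15.0",               # assumed EMR release
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 10,                  # sized well above normal capacity
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return response["JobFlowId"]

def close_sandbox(cluster_id: str) -> None:
    """Release the capacity as soon as the research project ends."""
    emr.terminate_job_flows(JobFlowIds=[cluster_id])

if __name__ == "__main__":
    cluster = open_sandbox()
    print(f"Sandbox cluster {cluster} is provisioning.")
    # close_sandbox(cluster)  # call when the project wraps up
```

The point is the lifecycle rather than the provider: capacity exists, and is paid for, only between open_sandbox and close_sandbox.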
If you are already doing any of these, the strategic question for cloud-based big data is not where to start. As cloud-based data services mature and their cost-performance, scalability, flexibility, and manageability keep improving, the question becomes where you will stop. By 2020, with ever more applications and data migrating to the public cloud, building and operating your own big data deployment may look as impractical as designing your own servers does today.