E-commerce, social media, mobile communications, and machine-to-machine data exchange all produce terabytes, even petabytes, of data that enterprise IT departments must store and process. Mastering sharding best practices is an important step in the cloud planning process for users dealing with big data in cloud databases.
Sharding is the process of splitting a table into partitions of manageable size, stored as separate disk files. Some highly scalable key-value data stores (such as Amazon SimpleDB, the Google App Engine datastore, or Windows Azure Tables) and document databases (such as CouchDB, MongoDB, or RavenDB) handle very large amounts of table data this way. MongoDB has a built-in automated sharding feature, and RavenDB will add one in the near future. Automated sharding balances the shards automatically and eliminates the need for DevOps teams to monitor the process. Even so, automated sharding of a MongoDB database is not as simple as it might seem, as Todd Hoff observed in a blog post about sharding problems.
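To give a sense of what MongoDB's automated sharding looks like in practice, here is a minimal sketch using the pymongo driver. It presumes a sharded cluster is already running behind a mongos router; the database name shop, the collection orders, and the shard key customer_id are assumptions for illustration only.

```python
# Minimal sketch of enabling MongoDB's built-in automated sharding.
# Assumes a sharded cluster is running behind a mongos router at
# localhost:27017; database/collection/key names are hypothetical.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)  # connect to the mongos router

# Mark the "shop" database as sharded.
client.admin.command("enableSharding", "shop")

# Shard the "orders" collection on customer_id; MongoDB then splits the
# collection into chunks and balances them across shards automatically.
client.admin.command(
    "shardCollection", "shop.orders", key={"customer_id": 1}
)
```

Once the collection is sharded, the cluster's balancer migrates chunks between shards on its own, which is exactly the monitoring burden that automated sharding is meant to remove.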
However, these key-value and document (so-called NoSQL) databases lack the transactional data consistency features that traditional relational database management systems (RDBMSs) provide. You can scale an RDBMS up (vertically) by throwing money at memory, processors, or both. You can configure a high-end commodity server with 256GB of RAM, but adding ever more CPU cores is essentially impractical. If your database is in the cloud, you are limited to the memory and processor configurations on the cloud vendor's price list.
Scaling an RDBMS out (horizontally) raises substantial technical challenges. In August 2009, Morgan Tocker wrote a detailed blog post about why you would, or would not, want to shard a MySQL database. Tocker suggests you may need to shard a database table when you encounter the following problems (a minimal routing sketch follows the list):
Working set too large: Your working set, made up of frequently accessed and updated data and indexes, no longer fits in the RAM installed in your on-premises server, or in the RAM your hardware budget or your cloud service provider's instance sizes allow. The solution is sharding.
Write frequency too high: Your database I/O system cannot keep up with the number of writes per second that local or cloud-based servers request. The solution is to offload read operations to read replicas, which may in turn require sharding to spread the I/O load across multiple database servers.
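At the application level, sharding ultimately comes down to routing each query to the server that owns the relevant key. A minimal sketch in Python, assuming four hypothetical MySQL shard hosts and a numeric customer ID as the shard key:

```python
# Minimal sketch of hash-based shard routing at the application level.
# The four shard hosts are hypothetical; any deterministic key-to-shard
# mapping (hash, range, or lookup table) works the same way.
import hashlib

SHARD_HOSTS = [
    "mysql-shard-0.example.com",
    "mysql-shard-1.example.com",
    "mysql-shard-2.example.com",
    "mysql-shard-3.example.com",
]

def shard_for(customer_id: int) -> str:
    """Map a shard key to one host, deterministically."""
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return SHARD_HOSTS[int(digest, 16) % len(SHARD_HOSTS)]

# All reads and writes for customer 42 always land on the same shard,
# so the working set and write load are divided across four servers.
print(shard_for(42))
```

Because every row for a given key lives on exactly one shard, both the oversized working set and the excessive write load are divided roughly evenly across the servers.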
The largest instance that AWS's Relational Database Service (RDS) for MySQL offers is the High-Memory Quadruple Extra Large DB instance, with 68GB of memory and 26 ECUs. Those ECUs are spread across 8 virtual cores, or 3.25 ECUs per core. It is priced at $2.60 per hour ($1,872 per month). According to AWS, one ECU delivers performance equivalent to a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. An RDS DB instance can be provisioned with 5GB to 1TB of associated storage at $0.10 per GB per month. Data transfer costs $0.12 per GB out, plus $0.10 per million I/O requests. Amazon eliminated charges for data transfer in as of July 1 and lowered the cost of data transfer out.
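Some back-of-the-envelope arithmetic from those list prices makes the monthly bill concrete. This is a sketch only: the 720-hour month and the transfer and I/O volumes are assumptions chosen for illustration.

```python
# Rough monthly cost of the largest RDS MySQL instance at the quoted
# 2011 list prices; the 720-hour month and usage volumes are assumptions.
HOURS_PER_MONTH = 720

instance = 2.60 * HOURS_PER_MONTH     # $2.60/hour -> $1,872.00/month
storage = 0.10 * 1024                 # 1TB at $0.10 per GB-month
transfer_out = 0.12 * 500             # e.g. 500GB out at $0.12/GB
io_requests = 0.10 * 200              # e.g. 200M requests at $0.10/million

total = instance + storage + transfer_out + io_requests
print(f"instance  ${instance:,.2f}")  # $1,872.00
print(f"storage   ${storage:,.2f}")   # $102.40
print(f"total     ${total:,.2f}")
```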
You may also incur additional instance and storage costs to implement read replicas for high availability on commodity servers. Fortunately, ScaleBase provides a third-party database load-balancing application that automates sharding and read/write splitting for MySQL running on Amazon EC2 or Amazon RDS.
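ScaleBase does this transparently, but the idea behind read/write splitting is simple. A minimal sketch, assuming one hypothetical primary and two read replicas; a real load balancer would also handle connection pooling, failover, and replication lag:

```python
# Minimal sketch of read/write splitting: writes go to the primary,
# reads round-robin across replicas. Hostnames are hypothetical.
import itertools

PRIMARY = "mysql-primary.example.com"
REPLICAS = itertools.cycle([
    "mysql-replica-1.example.com",
    "mysql-replica-2.example.com",
])

def route(sql: str) -> str:
    """Pick a host: SELECTs rotate across replicas; everything else
    (INSERT/UPDATE/DELETE/DDL) always hits the primary."""
    if sql.lstrip().upper().startswith("SELECT"):
        return next(REPLICAS)
    return PRIMARY

print(route("SELECT * FROM orders WHERE id = 42"))  # a replica
print(route("UPDATE orders SET status = 'shipped'"))  # the primary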
In an August 2009 blog post on the mess that sharding can become, Simon Munro gave a general introduction to sharding relational databases, specifically discussing a custom cloud implementation for SQL Azure, which is built on Microsoft SQL Server 2008. At the time, SQL Azure had a maximum database size of 10GB; the limit is now 50GB.
Scott Guthrie, the newly appointed corporate vice president of Microsoft's Azure Application Platform team, said at the Norwegian Developers Conference (NDC) 2011 on June 9 this year:
"... We also have automated sharding built in as part of SQL Azure, which means that from a scaling perspective we can handle very high workloads and do any type of load balancing and scaling for our users."
Today, SQL Azure supports up to 50GB of relational storage per database, but you can have any number of databases.
Automated sharding via SQL Azure Federations is currently in Community Technology Preview (CTP), and judging from Guthrie's remarks, sharded databases are expected to "support gigabytes or terabytes" of data. SQL Azure Federations also promises to handle schema migration gracefully. SQL Azure maintains one primary and two secondary replicas for high availability; the service is pay-as-you-go, at $9.99 per GB per month for 1GB to 5GB databases (Web Edition) and $99.99 per 10GB per month for 10GB to 50GB databases (Business Edition). These fixed monthly costs do not include data transfer, which runs $0.15 per GB out from the North American and European data centers and $0.20 per GB out from the Asian data centers. Microsoft eliminated charges for data transfer in as of July 1. Unlike Amazon RDS, SQL Azure does not charge for I/O requests.
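The Federations CTP exposes its sharding operations as T-SQL statements. Here is a sketch of how an application might drive them from Python via pyodbc, under the assumption that the CTP syntax ships as publicly documented; the DSN, credentials, federation name, and key values are all hypothetical.

```python
# Sketch of driving SQL Azure Federations from Python via pyodbc.
# DSN, credentials, federation name, and key values are hypothetical;
# the T-SQL follows the Federations CTP syntax as publicly documented.
import pyodbc

conn = pyodbc.connect("DSN=sqlazure;UID=admin;PWD=placeholder")
conn.autocommit = True  # federation DDL runs outside explicit transactions
cur = conn.cursor()

# Create a federation keyed on a BIGINT customer id (run against the
# server's root database).
cur.execute("CREATE FEDERATION Orders_Fed (cust_id BIGINT RANGE)")

# Split the single initial member at cust_id = 1000; SQL Azure
# repartitions the data online -- the "automated sharding" Guthrie
# described.
cur.execute("ALTER FEDERATION Orders_Fed SPLIT AT (cust_id = 1000)")

# Route this connection to the federation member that holds
# cust_id = 100; USE FEDERATION must be the only statement in its batch.
cur.execute("USE FEDERATION Orders_Fed (cust_id = 100) "
            "WITH RESET, FILTERING = OFF")
```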
Microsoft has yet to disclose CPU and memory specifications for SQL Azure, but the company says resources are sized in proportion to the database. A timetable for the commercial release of SQL Azure Federations, due at the end of this year, is available on Cihan Biyikoglu's blog.
At its Google I/O 2011 conference in May, Google announced that an RDBMS will be bundled with the commercial release of Google App Engine (GAE), which is currently in beta, but the company did not disclose details about performance or pricing.
Unless Google has magically achieved high scalability for its RDBMS, be prepared to shard your relational database to handle big data in the cloud.