A database is, in essence, a warehouse that organizes, stores, and manages data according to a defined structure. Databases emerged more than 60 years ago, and as information technology and the market developed, especially after the 1990s, data management grew beyond simple storage into the full range of data management capabilities that users need. Databases now come in many forms, from simple tables holding assorted data to large systems that store massive amounts of data, and they are widely used in virtually every field.
The trend in big data applications is toward converged hardware appliances that combine large storage capacity with analysis software. These applications are typically not used to process operational data; instead, users query the data to analyze past product sales, forecast trends, and identify future customer purchasing patterns. Big data applications are usually not positioned as critical business systems: although they support sales and marketing decisions, they do not directly affect core operations such as customer management, orders, inventory, and distribution.
Let's look at how to approach disaster recovery for big data, starting with the common objection that the data is simply too large to back up.
Disaster recovery best practice requires the ability to restore important data to a consistent state, as of a specified point in time, within a defined window. That window is called the recovery time objective (RTO), and for the operational data a business relies on it is typically no more than a few hours. Yet most companies regard backup and recovery of big data as unimportant, for reasons that include the following.
Operational systems come first. In the aftermath of a disaster, the highest priority is to recover the data that supports operational systems: accounting, order entry, payment processing, payroll, and the like, all necessary to keep the company running. Only once that data is recovered does everything else, including big data applications, become a priority.
Big data is not a critical business system. Prediction and trend analysis may be important marketing tools, but these analyses and their associated queries and user reports are based on historical data rather than real-time data.
Big data volumes are huge; a big data application can store dozens of times as much data as all operational data combined. This is because big data applications work on historical snapshots of the data: ten years of history contains thousands of daily snapshots. What media should it be backed up to, how long will the backup take, and how much backup storage is required?
The backup and recovery process also consumes I/O channel capacity: moving large volumes of data in a short time demands high throughput. If backup and recovery would saturate the existing I/O channels, the only viable option is to install enough additional capacity to handle these tasks.
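To make the scale of the problem concrete, here is a minimal back-of-the-envelope sketch in Python; the data volume, channel throughput, and RTO figures below are illustrative assumptions, not measurements from any particular system.

```python
# Back-of-the-envelope sizing for big data backup and recovery.
# All input figures below are illustrative assumptions.

def transfer_hours(volume_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move volume_tb terabytes at throughput_mb_s MB/s."""
    seconds = (volume_tb * 1024 * 1024) / throughput_mb_s  # TB -> MB
    return seconds / 3600

operational_tb = 10                  # assumed size of all operational data
big_data_tb = 40 * operational_tb    # "dozens of times" the operational data
channel_mb_s = 500                   # assumed usable I/O channel throughput
rto_hours = 4                        # assumed recovery time objective

restore_time = transfer_hours(big_data_tb, channel_mb_s)
print(f"Full restore of {big_data_tb} TB: {restore_time:.1f} h")

# Throughput needed to restore the full volume within the RTO:
needed_mb_s = (big_data_tb * 1024 * 1024) / (rto_hours * 3600)
print(f"Throughput needed to meet a {rto_hours} h RTO: {needed_mb_s:,.0f} MB/s")
```

With these assumed numbers, a full restore over a single 500 MB/s channel would take over 230 hours, and meeting a 4-hour RTO would require roughly 29,000 MB/s of aggregate throughput, which is why channel capacity, not just storage, dominates the planning.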
Backup methods for big data: if your disaster recovery plan calls for recovering all or part of a big data application, consider the following backup methods. The most important thing to remember is that big data consists primarily of historical, static data. Snapshots of operational data are extracted into a staging area, cleansed and transformed, and then loaded into the enterprise data warehouse and big data applications. After that, they are never updated. This means each snapshot needs to be backed up exactly once.
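Because each loaded snapshot is immutable, a backup job only needs to track which snapshots it has already copied. The Python sketch below illustrates that idea; the directory layout, the manifest file, and the paths are all hypothetical.

```python
import json
import shutil
from pathlib import Path

# Hypothetical layout: each daily snapshot is an immutable directory
# under SNAPSHOT_ROOT; MANIFEST records snapshots already backed up.
SNAPSHOT_ROOT = Path("/warehouse/snapshots")
BACKUP_ROOT = Path("/backup/snapshots")
MANIFEST = BACKUP_ROOT / "manifest.json"

def load_manifest() -> set[str]:
    if MANIFEST.exists():
        return set(json.loads(MANIFEST.read_text()))
    return set()

def backup_new_snapshots() -> None:
    if not SNAPSHOT_ROOT.exists():
        return
    BACKUP_ROOT.mkdir(parents=True, exist_ok=True)
    done = load_manifest()
    for snap in sorted(SNAPSHOT_ROOT.iterdir()):
        if snap.name in done:
            continue  # immutable snapshot was already backed up once
        shutil.copytree(snap, BACKUP_ROOT / snap.name)
        done.add(snap.name)
        # Persist after each snapshot so a crash never repeats work.
        MANIFEST.write_text(json.dumps(sorted(done)))

if __name__ == "__main__":
    backup_new_snapshots()
```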
The most commonly used method is data replication. When data is loaded into the data warehouse or big data application, it is simultaneously forwarded to a backup process, which loads it into a backup copy of the big data application. This backup process typically runs at a disaster recovery site, so an up-to-date copy of the data is available in the event of a disaster.
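One simple way to realize this pattern is to load each batch into the primary and the disaster recovery copy in the same step. The sketch below only shows the control flow; the Warehouse class and its load_batch method are hypothetical stand-ins for whatever loader your warehouse actually provides.

```python
# Dual-load replication sketch: each batch is written to the primary
# warehouse and to the disaster recovery (DR) copy in the same step.

class Warehouse:
    def __init__(self, name: str):
        self.name = name
        self.batches: list[list[dict]] = []

    def load_batch(self, rows: list[dict]) -> None:
        self.batches.append(rows)
        print(f"{self.name}: loaded {len(rows)} rows")

def replicated_load(primary: Warehouse, dr_site: Warehouse,
                    rows: list[dict], retry_queue: list) -> None:
    primary.load_batch(rows)
    try:
        dr_site.load_batch(rows)
    except OSError:
        # If the DR site is unreachable, queue the batch for re-delivery
        # so the replica can catch up instead of silently diverging.
        retry_queue.append(rows)

retry_queue: list = []
replicated_load(Warehouse("primary"), Warehouse("dr-site"),
                [{"sku": "A1", "qty": 3}], retry_queue)
```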
Virtual snapshots. This is a hardware-level approach in which a virtual backup of the entire system is created on the storage media. Database writes are suspended briefly while the hardware managing the storage subsystem performs an internal copy of all files. This copy can be very fast, sometimes completing in seconds, after which the database management system allows writes to resume. Snapshots offer ultra-fast recovery, but only to the point in time at which the snapshot was created; recovering to any other point requires a way to apply all subsequent database changes (captured from logs) on top of the snapshot. Another issue is storage capacity: a snapshot may require doubling the storage currently in use, and once a disaster has been handled by making a snapshot the current data, another snapshot area must be allocated to protect against the next disaster event.
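The suspend / copy / resume sequence can be expressed as a small orchestration routine. In the sketch below, suspend_writes, resume_writes, and create_snapshot are hypothetical placeholders for whatever commands your DBMS and storage vendor actually provide.

```python
import time
from contextlib import contextmanager

# Hypothetical hooks; substitute your DBMS and storage array commands.
def suspend_writes() -> None:
    print("DB: writes suspended")

def resume_writes() -> None:
    print("DB: writes resumed")

def create_snapshot(label: str) -> str:
    print(f"storage: internal copy for snapshot '{label}'")
    return label

@contextmanager
def quiesced_database():
    """Keep the write outage short: suspend writes only for the
    duration of the storage-level copy, then always resume."""
    suspend_writes()
    try:
        yield
    finally:
        resume_writes()

def take_virtual_snapshot() -> str:
    label = time.strftime("snap-%Y%m%d-%H%M%S")
    with quiesced_database():
        snap_id = create_snapshot(label)
    return snap_id

print("created:", take_virtual_snapshot())
```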
Local and remote replicas. This is the classic method: backup-to-disk and backup arrays consisting of physical disk drives or databases. DBAs use vendor tools to access the data, which is typically stored in a compressed, proprietary format. These backups are fast to create and to restore because they use the database's internal data format.
Deploying and using big data takes time, money, and resources. Many companies are eager to see a return on these large investments, and the resulting queries and reports provide valuable insights that help drive decisions, respond to change, and generate revenue. Big data applications will eventually become critical business systems; before that happens, make sure your IT infrastructure is able to back up and recover the data.