What is a snapshot
A snapshot is a set of metadata that lets an administrator revert a table to an earlier state. A snapshot is not a copy of the table; it is a list of file names, so no data is copied when it is taken.
Fully restoring a snapshot reverts the table to its earlier "table structure" and to the data it held at that time; data written after the snapshot was taken is not recovered.
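The idea that a snapshot is only a list of file names can be sketched with a toy model. This is an illustration under simplifying assumptions, not how HBase is implemented: the ToyTable class, the take_snapshot and restore_snapshot functions, and the file names are all hypothetical.

```python
# Toy model only: real HBase tracks HFiles in HDFS, not Python objects.
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    schema: tuple                               # column family names
    files: list = field(default_factory=list)   # names of immutable data files

    def flush(self, file_name):
        """Simulate a write that ends up in a new immutable file."""
        self.files.append(file_name)


def take_snapshot(table):
    """Record the schema and the current file names; no file contents are copied."""
    return {"schema": table.schema, "files": tuple(table.files)}


def restore_snapshot(table, snapshot):
    """Point the table back at exactly the files the snapshot listed."""
    table.schema = snapshot["schema"]
    table.files = list(snapshot["files"])


table = ToyTable(schema=("cf1",))
table.flush("hfile-001")
table.flush("hfile-002")
snap = take_snapshot(table)

table.flush("hfile-003")          # data written after the snapshot
table.schema = ("cf1", "cf2")     # schema change after the snapshot

restore_snapshot(table, snap)
print(table.schema, table.files)  # ('cf1',) ['hfile-001', 'hfile-002']
```

After the restore, both the later schema change and the file written after the snapshot are gone, which matches the behavior described above.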
The role of snapshots
Without snapshots, the only ways to back up or clone an HBase table are to use the copy/export table tools, or to disable the table and then copy all of its HFiles in HDFS.
The copy and export tools run MapReduce jobs that scan and copy the table, which puts load directly on the region servers. Disabling the table stops all reads and writes, which is usually unacceptable in production.
In contrast, HBase snapshots let an administrator clone a table without copying data, with minimal impact on the region servers. Exporting a snapshot to another cluster does not directly affect any region server; the export is essentially a file copy between clusters with some additional logic.
Snapshot Benefits
Besides offering better consistency than copy/export table, the main difference is that a snapshot export is done at the HDFS level. This means the HMaster and the region servers are not involved in the operation. As a result, no cache space is filled with unnecessary data, no scan takes place, and there are no GC pauses caused by creating a large number of objects during a scan. The main performance impact on HBase is the additional network and disk load on the DataNodes.
Application Scenarios
1. Recover from user or application errors.
2. Restore to a known safe state.
3. View earlier snapshots and selectively merge the differences into production.
4. Save a snapshot when a major application upgrade or change is made.
5. Audit and/or report on data as of a specific time.
6. Capture monthly data for compliance purposes.
7. Generate end-of-day/month/quarter reports.
8. Application testing.
9. Use a snapshot to simulate schema or application changes against production data; the test table can be discarded when testing is complete.
For example: take a snapshot, create a new table from the snapshot contents (original schema plus data), then modify the new table's schema by adding or removing columns, and so on. (The original table, the snapshot, and the new table remain independent of one another.)
10. Offload work.
11. Take a snapshot, export it to another cluster, and run MapReduce jobs there. Because the snapshot export works at the HDFS level, it does not slow down the main HBase cluster the way copying the table would.
Snapshot Operations
To generate a snapshot:
This operation tries to take a snapshot of the specified table. It may fail if the cluster is balancing, splitting, or merging regions at the time.
To clone a snapshot:
This operation creates a new table with the same schema and data as the specified snapshot. The result is a fully functional table, and changes made to it do not affect the original table or the snapshot.
To restore a snapshot:
This operation returns the table's schema and data to the state they were in when the snapshot was taken. (Note: any changes made after the snapshot was taken are discarded.)
To delete a snapshot:
This operation removes the snapshot from the system and frees the disk space that is not shared with anything else; other clones and snapshots are not affected.
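Why deleting a snapshot frees only unshared space can be shown with a small reference-tracking sketch. It is a toy model under assumed data, not HBase's actual cleanup logic: the files_freed_by_delete function and all table, snapshot, and file names are hypothetical.

```python
# Toy model: which files can be freed when a snapshot is deleted?
def files_freed_by_delete(snapshot_name, snapshots, tables):
    """Return the files referenced only by the snapshot being deleted.

    snapshots: dict mapping snapshot name -> set of file names
    tables:    dict mapping table name -> set of file names
    """
    still_referenced = set()
    for name, files in snapshots.items():
        if name != snapshot_name:
            still_referenced |= files
    for files in tables.values():
        still_referenced |= files
    return snapshots[snapshot_name] - still_referenced


snapshots = {
    "snap_monday": {"hfile-001", "hfile-002"},
    "snap_tuesday": {"hfile-002", "hfile-003"},
}
tables = {
    "orders": {"hfile-003", "hfile-004"},   # live table
    "orders_clone": {"hfile-002"},          # clone created from a snapshot
}

print(sorted(files_freed_by_delete("snap_monday", snapshots, tables)))
# ['hfile-001']
```

Only hfile-001 can be freed, because hfile-002 is still referenced by another snapshot and by the clone.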
To export a snapshot:
This operation copies the snapshot's data and metadata to another cluster. It involves only HDFS and never contacts the HMaster or the region servers, so the HBase cluster can even be shut down while it runs.
Demo
Make sure snapshot support is turned on by checking that, in hbase-site.xml, the property
hbase.snapshot.enabled
is set to true.
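The check can also be scripted. The sketch below assumes the usual Hadoop-style layout of hbase-site.xml (a configuration element containing property entries with name and value children); the snapshots_enabled function and the sample file content are illustrative, not part of HBase.

```python
import xml.etree.ElementTree as ET

# Sample content for illustration; in practice, read your own hbase-site.xml.
SAMPLE_HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""


def snapshots_enabled(hbase_site_xml):
    """Return True or False if the property is present, or None if it is absent."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "hbase.snapshot.enabled":
            return prop.findtext("value", default="").strip().lower() == "true"
    return None  # not set in this file; the default of your HBase version applies


print(snapshots_enabled(SAMPLE_HBASE_SITE))  # True
```

A result of None means only that the file does not set the property, so the answer then depends on the default of the HBase version in use.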
1. Take a snapshot of the specified table with the snapshot command (no files are copied).
hbase> snapshot 'tableName', 'snapshotName'
2. List all snapshots with the list_snapshots command. It shows each snapshot's name, its source table, and its creation date and time.
hbase> list_snapshots
3. Delete a snapshot with the delete_snapshot command. Deleting a snapshot does not affect cloned tables or snapshots taken later.
hbase> delete_snapshot 'snapshotName'
4. Use the clone_snapshot command to create a new table (a clone) from the specified snapshot. Because no data is copied, the clone does not double the amount of storage used.
hbase> clone_snapshot 'snapshotName', 'newTableName'
5. Use the restore_snapshot command to replace the current table's schema and data with the contents of the specified snapshot. The table must be disabled before it is restored.
hbase> restore_snapshot 'snapshotName'
6. Use the ExportSnapshot tool to export an existing snapshot to another cluster. The export does not add load to the region servers; it works only at the HDFS level, so you need to specify an HDFS path (the HBase root directory of the other cluster).
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotName -copy-to hdfs://srv2:8082/hbase
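When exports are scripted, it can help to build the command as an argument list instead of a hand-typed string. The sketch below only assembles the command and does not run it. It assumes the tool's -snapshot and -copy-to options, plus the optional -mappers option, which is not mentioned in this article; the export_snapshot_command helper is hypothetical.

```python
import shlex


def export_snapshot_command(snapshot_name, destination_root, mappers=None):
    """Build the ExportSnapshot command line as a list of arguments."""
    args = [
        "hbase",
        "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot_name,
        "-copy-to", destination_root,   # HBase root directory on the target cluster
    ]
    if mappers is not None:
        # Assumption: -mappers sets the number of map tasks used for the copy.
        args += ["-mappers", str(mappers)]
    return args


cmd = export_snapshot_command("snapshotName", "hdfs://srv2:8082/hbase", mappers=16)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a host where the hbase launcher is on the PATH.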
Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
HBase Snapshot technology