Exadata FireWire Rescue: 10TB-level Data Repair Classic case Detailed!

Source: Internet
Author: User

1:30 A.M., the dim phone ringing, a exadata serious failure ....

From previous post (5-hour data evaporation | | 24-hour service downgrade, Salesforce's encounter is only a case? Not far, we met again and again data rescue work. Coincidence with Salesforce, everyone is running on Exadata, unfortunately Salesforce lost 4 hours of data (follow-up did not see the press release, and whether to recover part of the) business stalled, I encountered more trouble today.

Recent Exadata fault more, the more important is the hardware life cycle, X2 from September 2010 began to release the line, until now nearly 6 years, even if the traditional "high-end" minicomputer also to the time of the downline. Remind friends who use Exadata to make a backup, or you may have to experience an unforgettable rescue experience. The problem happened incredibly, and it was taken for granted that the details were not said. In short, worse:

The diskgroup that holds the data file cannot be loaded (mount), the Celldisk state is unknown, the header of some asmdisk is invalid, even the block that it automatically backs up is invalid, there is physical damage to the disk, The mirror of the physically damaged disk has also failed. Close to 10TB of data, think also headache it. After the specific data rescue work, or remind the use of asm/exadata friends, at least build a dataguard bar, just build Rong also did this share, hurriedly to read.

Since the problem is very tricky, we have made the following scenario:

    1. Extracting the database files
    2. Try Open
    3. If you fail again Dul

To extract a database file (control file, data file, log file) from a disk group that is not loaded, you need to use AMDU.

Amdu:oracle is a source data dump tool developed for ASM, all known as ASM Metadata dump Utility

Specific steps to extract:
    • Find the startup parameters (including control files) from the alert log and edit them into a new parameter file/tmp/pfile
    • Find the location of the control file from the Pfile and extract it with AMDU
    • Use the extracted control file to mount the database.
    • Find all data files from the Mount library, there are 2 possible formats
    • OMF format (data file with Oracle Auto-generated numbers)
    • Custom format (hand-cheap), deal with some trouble
    • Log file ditto processing

Extracting Data files

First step: Draw the control file to find the control file location from the alert log first:

Control_files string +data/exdb/controlfile/curren t.266.278946847955,

11g start AMDU does not require compilation to be used directly. To the/data file system, start operation

Amdu-diskstring '/o/*/* '-extract data.266

In the current directory will generate a DATA_266.F file and a report.txt file, DATA_266.F is the control file.

Step two: Find data files and log files

If you have a backup of the pfile the best, if not, from the alert log to find the start of the initialization parameters, really no, manually edit a line, including sga_max_size,db_name,control_file these parameters.

Then boot the database into the Mount state to find the data files and log files:

Select name from V$datafile;

Select member from V$logfile;

Good luck, that's all. (OMF format):

+data/exdb/datafile/system.256.278946847955 +data/exdb/datafile/sysaux.257.278946847955 +DATA/exdb/datafile/ undotbs1.258.39804295139 +data/exdb/datafile/users.259.48049295141

Bad luck, maybe there is such a (custom format):

+DATA/EXDB/DATAFILE/USERS_2013084.DBF +data/exdb/datafile/tbs_jifen_cx_0123.dbf

For the OMF format, follow the extraction of the control file, pumping:

Amdu-diskstring '/o/*/* '-extract data.256

For a custom format, extract the metadata from the <diskgroup>.6 and then find its corresponding number

    1. Amdu-extract data.6-diskstring ' o/* /data ', generating data_6.f file

for ((I=1; i<15; i++)

Do

kfed Read DATA_6.F blknum= $i |egrep ' Name|fnum ' >>aa.out

Done

All data files are extracted in the OMF format.

It is worth saying that we encountered a 3T bigfile,extract consumed for nearly 24 hours =. --nfs file system is very slow to hang over = =

Finally to all the files with dBV do a check, there is no physical bad block.

Try Open Database

When it comes to this step, it's actually similar to the usual database recovery. We also encountered ORA-1555 and ORA-704 errors in open.

Record the parameters and events we use.

Event

Implied parameters:

Here is more annoying is rollback segments not easy to determine, because you are mounted state database, even v$rollname can not query.

There are two ways to solve this:

Method one, use strings to go to the system file to catch.

Method two, with Dul/aul/odu/gdul and other similar tools. Relatively speaking, this method gets some accurate

The resulting sys_undo.dmp is imported into the normal user, and the rollback segments (restore segments) with status 1 and 2 are removed and placed into the empty 2 parameters above.

Open may also be reported ORA-1555, need to push the SCN to upgrade mode open.

The method of promoting SCN many netizens also have shared, Degree Niang or gu elder brother can. Here we need to focus on the follow-up needs of the small partners, did not get up in two and do not lose heart. This single push the SCN this piece, we also toss for a long time, even once twice to give up preparation dul.

First look at the description of Oradebug poke:

The first is to find the memory address of the SCN:

The value after the equal sign is the SCN that is currently displayed, but is displayed as 0 because it is the Mount state. Add the current SCN (from v$datafile_header#) with 1 million, turn hex, and push a look:

Check again to see the value of the SCN:

Then "ALTER DATABASE open Uprade", repeatedly try ....

It also uses bbed to modify the block and delete data dictionary records ....

Finally, the database finally open, the data back.

For more detailed details, please follow the dba+ technology salon theme.

Dul and ahttp://mp.weixin.qq.com/s?__biz=mzi4nta1mdewng==&mid=2650756087&idx=2&sn= 126b19493ff2a87130bc9c80c2dd8112&scene=21#wechat_redirectmdu

Fortunately, did not go to the last step, did not use Dul to draw data, otherwise, with this turtle speed, less said is one weeks of things.

Dul and Amdu are all life-saving straws, we have the ability to use, does not mean that we must go to use. And we never talk to customers at this time charges, as a service provider we insist on emergency such as fire! And these lifesaving tools are like a nuclear weapon in a cave, and we want every customer to be able to do pre-planning, maintenance, backup and disaster recovery, and let them lie quietly as a deterrent.

About the maintenance of Exadata

The good thing, you do not care about it, there will always be problems, Exadata is no exception.

Excerpt from a few tools in the Exadata Expert Toolkit, for reference only:

Sundiag

Exawatcher

    • DiskInfo
    • Ibcardino
    • Iostat
    • Netstat
    • Ps
    • Top
    • Vmstat
Exachk

Checkhwnfwprofile

It is necessary for these commands to be checked two weeks.

About database Operations Management tools

Problems happen to others, we sound incredible, think others are not silly ah, or lazy ah, in fact, is not, sometimes really too busy too busy, busy, this time need a set of tools to help everyone.

Yes, that's what you're talking about. Remember our chat yesterday, you say, they are not stupid ah, do not monitor it, usually do not go to see it? I say that if you manage thousands of databases and you don't have the right management tools, it's inevitable that an edge system is going to happen.

So what kind of database operations management tools are appropriate?
    • Database Multi-Dimension monitoring
    • Daily operation and maintenance of scene
    • Real-time Database performance analysis
    • Application Performance Traceability

These aspects complement each other and gradually make operations more easy.

1, the database is a very professional subdivision field, the traditional Itom tools integrated monitoring functions are often too extensive, so the need for professional database multi-dimensional monitoring, monitoring indicators data need to be collected and stored in real time, according to the trend of alarm.

In this case, if there is monitoring of the survival of the Exadata service, the problem can be alerted at least one week before the failure occurs and processed in a timely manner.

2. Scene of daily operation and maintenance

Too many databases mean that the maintenance of any one point requires a lot of time consumption, so you need to integrate and encapsulate some operations scenarios. Like what:

    • Automated Daily Database Inspection
    • Compression and archiving of alarm logs, trace logs
    • such as the maintenance of scheduled work
    • Capacity trend alert and semi-automatic expansion
    • And some custom scenarios (some customers have hundreds of sets of data guard log fixes)
    • Automatic archive of historical data
    • .......

With these features, you can save a lot of time to delve into new technologies and contribute your energy to the renewal of your enterprise's core skills, without having to think about escaping from the abyss of misery all day long.

3. Real-time performance analysis of database

This function is very significant, see the following two scenarios:

    • For example, a phone call, Xiao Zhang, just now Xiao Wang said yesterday 2 o'clock in the afternoon 22 to 2 30 between the database is very slow, they restarted the machine to solve, you analyze the reasons. This time you usually can only hope in Dba_hist_sqlstat, but this size is too coarse, the result is often no results;
    • Time not so long, the database has a large number of TX lock resources, to help see who the source is. You go to see the source process is 3456, but someone is the idle process, is a SELECT statement, obviously it is not locked.

If there is a tool that can help you to record the database in real-time information, and do not query the database, but directly read the SGA, then some of these problems can be solved in minutes, is not very cool?

4. Application Performance Traceability

There are some problems that are obviously applied, but if you don't tell him exactly which application module it is and which user is doing it, you can hardly say that it is an application problem.

If the OPS management tool can not only help you find out which SQL statement is causing the program, but also tell you which path is crawling from, which jar package originated, is it all obvious? Let's go to hell with the pot-back day.

So, is there such a database operations management tool?

The answer is yes.

Author Introduction Yang Zhihong
    • Co-sponsor of the "Dbaplus community", chief evangelist of the New Torch network. Oracle ACE, OCM, Oracle Core technology translator.
    • Data management experts, with more than 10 years of telecommunications, banking, insurance and other large industry core system Oracle Database Operations support experience, master ITIL operation and maintenance system, good at end-to-end performance optimization, complex problem processing. is mainly engaged in data architecture, high availability and disaster recovery consulting services

Exadata FireWire Rescue: 10TB-level Data Repair Classic case Detailed!

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.