tl;dr: Don't ever set fsync=off, don't kill -9 the postmaster then delete postmaster.pid, and don't run PostgreSQL on network file systems.
Reports of database corruption on the PostgreSQL mailing lists are uncommon, but hardly rare. While a few data corruption issues have been found in PostgreSQL over the years, the vast majority of issues are caused by:
- administrator action;
- misconfiguration; or
- bad hardware.
A recent mailing list post asked what can be done to reduce the chance of corruption and keep data safe.
If you think you have a corrupt PostgreSQL database, stop the database server and take a complete copy of the data directory now. See the Corruption page on the wiki. Then ask for help on the pgsql-general mailing list, or for critical/urgent issues contact a professional support provider.
Do not attempt to fix the problem before taking a complete copy of the entire data directory. You might make the problem much worse, turning a recoverable problem into unrecoverable data loss.
Here's my advice for avoiding DB corruption issues, with some general PostgreSQL administration advice thrown in for good measure.
Read the Manual
Read the Reliability section of the manual in detail.
Many of the issues people ask about on the mailing lists could be avoided if they'd only read the manual. The PostgreSQL manual is well written, comprehensive, and will teach you about SQL and databases in general, not just PostgreSQL. Reading it cover-to-cover taught me more about database systems than any other single thing I've ever done. Read it.
PostgreSQL is easy enough to use that you don't have to read the manual, but you'll get a lot more out of the system if you do. You'll be able to tune it for better performance, know how to avoid a few pitfalls, and know what tools and features are available. You'd know to avoid using SIGKILL (kill -9) to terminate the postmaster, for example.
Updates
Bug fixes are no good if you don't apply them. Keep up to date with the latest PostgreSQL patch releases. Don't be one of those people still running 9.0.0 when 9.0.10 is out. The team releases updates for a reason. Minor point releases require minimal downtime as there's no need to dump and reload; just stop the DB, install the new binaries, and start the DB again. Even this downtime can be avoided with failover and failback using something like repmgr.
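On a typical Linux install that can be as little as the following sketch (the data directory path and the package step are placeholders; use whatever your platform's packaging provides):

    pg_ctl -D /var/lib/pgsql/9.0/data stop -m fast    # cleanly stop the running cluster
    # ...install the updated 9.0.x binaries with your package manager...
    pg_ctl -D /var/lib/pgsql/9.0/data start           # start the same data directory again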
As per the versioning policy, the team is extremely conservative about what gets back-patched to released major versions, so upgrades are very safe. Read the versioning policy. Just read the release notes for each version between yours and the new one, then install it.
On this topic, read the FAQ entry "A bug I'm encountering was fixed in a newer minor release of PostgreSQL, but I don't want to upgrade. Can I get a patch for just this issue?"
Keep an eye on the end-of-life dates. Plan to upgrade to a new major version well before your version goes EOL.
Administrator Action
Don't kill -9 (SIGKILL) any PostgreSQL processes. It should be fine, but it's still not smart. kill -9 should be your last resort; first use pg_cancel_backend, then pg_terminate_backend (which is effectively a kill -15). If you kill -9 a backend, the whole DB will terminate and restart, breaking all connections and forcing crash recovery.
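As a rough sketch, where the PID 12345 is just a placeholder for the backend you're trying to stop:

    psql -c "SELECT pg_cancel_backend(12345);"      # first ask the backend to cancel its current query
    psql -c "SELECT pg_terminate_backend(12345);"   # if that isn't enough, terminate the backend (like kill -15)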
Never delete postmaster.pid.
(Corruption should not actually occur unless you kill -9 the postmaster, leave other postgres processes running, then delete postmaster.pid and start a new postmaster. If you insist on unlocking the gun cabinet, loading the shotgun, aiming it at your foot and pulling the trigger, we can't stop you. Don't do that.)
Don't delete anything in the data directory, with the sole exception of the contents of pg_log if your logs are within your data directory. Everything else is vital and should never be messed with directly by the administrator. Do not delete anything from pg_clog or pg_xlog; these are vital components of the database system.
Do not run your database with fsync = off if it contains any data you care about. Turning off the fsync configuration parameter effectively gives PostgreSQL permission to avoid slow syncs, at the risk of severe and unrecoverable data corruption if the server crashes or loses power. If it's possible to recover such a database at all, it'll take many hours of time from a serious (and expensive) expert. These days there's rarely a reason to turn fsync = off anyway; you can achieve many of the same benefits without the corruption risks using asynchronous commit and a commit delay.
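A rough postgresql.conf sketch of the safer alternative (the values are illustrative, not tuned recommendations):

    fsync = on                   # never turn this off for data you care about
    synchronous_commit = off     # asynchronous commit: a crash can lose the last few
                                 # transactions, but cannot corrupt the database
    commit_delay = 0             # or raise this (in microseconds) to group WAL flushes
                                 # under heavy concurrent commit load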
The system catalogs (pg_catalog.*) are there for looking, not for touching. Do not ever INSERT, UPDATE or DELETE in the system catalog tables unless you are absolutely certain you know what you are doing. Only a superuser or (for some catalogs) the database owner can mess directly with the catalogs, and you shouldn't be running as either day-to-day.
If you need pg_resetxlog then you already have a corruption problem. pg_resetxlog is a last resort for getting a damaged database going again, not a semi-routine operation like MySQL's myisamchk. If you had to pg_resetxlog a DB, you need to work out what went wrong and why, then dump the DB(s), re-initdb, and restore the dumps. Don't just keep on using the damaged database.
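A hypothetical sketch of that recovery sequence, with placeholder paths:

    pg_dumpall > /backup/everything.sql                  # dump everything from the damaged cluster
    pg_ctl -D /var/lib/pgsql/data stop -m fast           # stop the damaged cluster
    mv /var/lib/pgsql/data /var/lib/pgsql/data.damaged   # keep the old directory for forensics
    initdb -D /var/lib/pgsql/data                        # create a fresh, clean cluster
    pg_ctl -D /var/lib/pgsql/data start
    psql -f /backup/everything.sql postgres              # restore into the new cluster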
Backups, Replication and PITR
The backups section of the manual is good. Read it. You should also look at the Barman tool, which helps automate many of these tasks.
Maintain rolling backup dumps with proper ageing. For example, keep one a day for the last 7 days, then one a week for the last 4 weeks, then one a month for the rest of the year, then one a year after that. Backups won't stop corruption or damage happening, but they'll help you recover with a minimum of pain. In my view good old pg_dump is the tool of choice here; I tend to produce one -Fc dump per database and a pg_dumpall --globals-only for users/roles/etc.
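A minimal sketch of such a dump job (database name and backup path are assumptions):

    pg_dumpall --globals-only > /backup/$(date +%F)-globals.sql    # roles, tablespaces, etc.
    pg_dump -Fc -f /backup/$(date +%F)-mydb.dump mydb               # one custom-format (-Fc) dump per database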
Use warm standby with log shipping and/or replication to maintain a live copy of the DB, so you don't have that painful window between your last backup and when the failure occurred. Do not rely solely on replication; take dumps too.
If you want point-in-time recovery, keep a few days' or weeks' worth of WAL archives and a base backup around. That'll help you recover from those "oops, I meant DROP TABLE unimportant, not DROP TABLE vital_financial_records" issues with a minimum of data loss.
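A bare-bones illustration of the archiving side in postgresql.conf (the archive directory is a placeholder; see the continuous archiving chapter of the manual for the details):

    wal_level = archive                          # 'hot_standby' also works; 'replica' on newer versions
    archive_mode = on
    archive_command = 'cp %p /archive/wal/%f'    # copy each completed WAL segment somewhere safe

You still need a base backup to replay those archives against.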
Barman can help with warm standby and PITR.
If taking filesystem-level backups of the data directory, you don't have to stop the DB if you use pg_basebackup, or use pg_start_backup and pg_stop_backup correctly. See the manual. You do have to include the whole data directory. Do not omit folders you think are unimportant from the backup. People who left pg_xlog or pg_clog out of their backup as "unimportant" periodically turn up asking for help on the mailing list. Don't be one of them.
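For instance, a minimal pg_basebackup run might look like this (host, user and target directory are placeholders):

    pg_basebackup -h db.example.com -U replicator -D /backup/base -X fetch -P
    # -X fetch includes the WAL needed to make the copy consistent; -P shows progress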
Remember to test your backups; don't just assume they're fine.
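A quick way to test a dump, sketched here with placeholder names, is to restore it into a scratch database and spot-check it:

    createdb restore_test
    pg_restore -d restore_test /backup/mydb.dump                             # restore the -Fc dump
    psql -d restore_test -c "SELECT count(*) FROM some_important_table;"     # spot-check the data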
Hardware
Plug-pull test your system when you're testing it before going live. Put it under load with something like pgbench, then literally pull the power plug out so your server loses power and shuts down uncleanly. If your database doesn't come back up fine you have hardware, OS or configuration problems. Do this repeatedly while under load, until you're confident it'll come up reliably.
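Something like this is enough to generate load for the test (scale factor, client count and duration are arbitrary examples):

    pgbench -i -s 50 testdb              # initialise the pgbench tables
    pgbench -c 16 -j 4 -T 600 testdb     # 16 clients, 4 threads, 10 minutes of write-heavy load
    # pull the plug mid-run, power back on, and confirm the database starts and your data checks pass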
Use good quality hardware with proper cooling and a good quality power supply. If possible, ECC RAM is a nice extra.
Never, ever, ever use cheap SSDs. Use good quality hard drives or (after proper testing) high-end SSDs. Read the SSD reviews periodically posted on the mailing list if you're considering SSDs. Make sure the SSD has power-loss protection for its write cache: a supercapacitor/ultracapacitor, low-power capacitor array, secondary battery, or other write-cache protection. Always do repeated plug-pull testing when using SSDs; don't just trust the specifications.
Avoid RAID 5, mostly because the performance is terrible, but also because I've seen corruption issues with rebuilds from parity on failing disks. If you're using spinning disks, RAID 10 is a better option for a database.
If you have a hardware RAID controller with a battery-backed cache (BBU), test the BBU regularly. A battery backup that doesn't work is no good, and a RAID controller in write-back caching mode without a working BBU is guaranteed to severely corrupt your data if the system loses power.
Use a good quality hardware RAID controller with a battery-backed cache unit if you're using spinning disks in RAID. This is as much for performance as for reliability; a BBU will make an immense difference to database performance. Be aware that if your RAID controller fails you must be able to get hold of a compatible one before you can read your array. If you don't have a battery-backed cache you won't gain much over Linux's md software RAID, which has the advantage that arrays are portable from machine to machine. Do not use cheap on-motherboard/in-BIOS "RAID" systems; they combine the worst features of hardware and software RAID.
If you're going to have a UPS (you shouldn't need one, as your system should be crash-safe), don't waste your money on a cheap one. Get a good online double-conversion unit that does proper power filtering. Cheap UPSs are just a battery with a fast switch; they provide no power filtering, and what little surge protection they offer is done with a component that wears out over time and load, becoming totally ineffective. Since your system should be crash-safe, a cheap UPS will do nothing for corruption protection; it'll only help with uptime.
I shouldn't have to say this, but... don't run your database off a USB flash drive ("USB memory key" or "USB stick").
Software and OS
Use a solid, reliable file system. zfs-on-linux, btrfs, etc. are not the right choices for a database you care about. Never, ever, ever use FAT32.
If you're on Windows, do not run an anti-virus program on your database server. Nobody should be using it for other things or running programs on it anyway. Antivirus software intercepts system calls in all sorts of fun and interesting ways. Many of these are subject to exciting race conditions and bugs when software like PostgreSQL has several processes all reading from and writing to the same files. Antivirus software may also try to "clean" the database files, causing severe corruption. If you absolutely must run antivirus software, add exclusions for the PostgreSQL data directory and the PostgreSQL install directory. If your AV software supports process exclusions, add an exclusion for all programs in the PostgreSQL bin directory, especially postgres.exe.
Make sure your OS is disabling your hard drive's write cache or properly flushing it. These days that's a pretty safe bet unless you're using a cheap SSD that ignores flush requests. Plug-pull testing should catch write-cache issues, as should the tools mentioned in the reliability documentation linked to above.
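On Linux you can sanity-check this with something like the following (the device name is a placeholder; pg_test_fsync comes with PostgreSQL's contrib tools):

    hdparm -W /dev/sda     # report whether the drive's volatile write cache is enabled
    pg_test_fsync          # time fsync methods; implausibly fast results suggest flushes are being ignored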
You may also wish to consider reading these wiki articles:
- PostgreSQL FAQ
- Reliable writes
- Direct Storage vs SAN
- SCSI vs IDE/SATA disks
- Installation and administration best practices
Reference:
http://blog.ringerc.id.au/2012/10/avoiding-postgresql-database-corruption.html
Avoiding PostgreSQL Database corruption