At noon I was woken out of a dead sleep by a phone call: after the server expansion, the database would not start. I knew immediately that TokuMX needs transparent huge pages disabled before it will start normally, and since not everyone is aware of this, let me write the shell command down here: echo never > /sys/kernel/mm/transparent_hugepage/enabled. It generally has to be written into a boot script too, or the database will not start normally after a reboot.
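For reference, this is roughly the snippet I keep in the boot script; the exact file varies by distro, and the defrag line is a common companion recommendation rather than something from my original note:

# /etc/rc.local: disable transparent huge pages so TokuMX can start
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag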
However, when I looked at the server remotely, the database was missing. My recollection was that the service had been installed under the /dev mount point, so I combed through /dev, grep-ing and find-ing for anything TokuMX-related, and turned up nothing but irrelevant files. Panic.
Who would be careless enough to delete a database? Or had I remembered the wrong directory? Panicking, I searched /home, /opt, /etc and so on; the whole server showed no trace of it. Then a thought flashed by: hadn't every server just been rebooted? Maybe the boot script no longer had the database path in it. Sure that I had found my small problem, I opened rc.local and looked: this was a big problem (inwardly I had already collapsed; a herd of alpacas galloped happily through my heart). I refused to believe it, refused to accept the fact, so I went through /dev again with all manner of grep; even the find -name command I no longer trusted. The facts told me that belief was useless: there was nothing.
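For the record, the searching was roughly of this kind (the patterns here are just examples):

find / -name '*toku*' 2>/dev/null
grep -ri tokumx /etc/ 2>/dev/null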
I began to wonder whether someone had deleted the database, but I was still quite calm, because we had a backup machine; at worst I could recover the correct installation path of the database from it. Yet whenever I grow complacent about my own cleverness, reality wakes me up. The backup machine was offline... A phone call told me it too had been shut down for the expansion, so I asked for it to be booted. And then: the backup machine was finished as well. Same directory, nothing... Time to arrange the funeral: the dead must be buried, so the leadership handled the aftercare with the customer, and I started digging the grave. At this point I still believed that someone had seen a strange-looking folder in that directory, judged it out of place, and deleted it, and that this folder just happened to be our database directory.
So I started randomly searching system logs: every kind of search for folder-deletion records, Linux system logs, the various logs under /var/log. Nothing useful came of it.
I even thought of data recovery, and read up online on the various Linux data-recovery methods and tools, trying search after search. But remembering earlier occasions when a careless attempt had been anything but safe, I set data recovery aside as the last resort. Dead end...
Back to reasoning. Because the primary's database sat under the /dev mount point, and the backup machine had been installed at the same mount point for consistency (and to make it easy to remember), I changed the search keywords to: dev subdirectory missing. I think I had found the answer: /dev is a virtual filesystem rebuilt at every boot, so ordinary files stored there do not survive a reboot.
Still unwilling to give up even after seeing that answer, I kept trying: mount /dev/mongodb, all kinds of mounts, fancy mounts. The hint was that /etc/fstab had no such mount point; in fact there was no record of this mount point anywhere. Multiple attempts, all useless.
To prove the claim above, a hands-on experiment: create a test directory under /dev, touch 123.txt inside it, reboot, look again. Gone.
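The experiment as shell commands, roughly:

mkdir /dev/test
touch /dev/test/123.txt
reboot
# after the machine comes back up:
ls /dev/test   # No such file or directory: everything under it is gone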
The meaning was clear: the database could not be found; everything would have to start from scratch. After a while, at the leadership's prompting, I thought of a remedial measure: the database files need not live in the same directory as the database itself, and to my delight I found all of the library's data files, listed there one by one, in another directory (whose name had "toku" misspelled in it: /home/cdastuko/data/).
What to do next? I remembered that at the very beginning, a wrong database path had caused the database to fail to load at this company; but without the original cluster configuration, what use were these files? Then again, software built by such a big company shouldn't be that silly. Back then, out of ignorance of Linux and MongoDB, and out of fear, I had not dug into the cause; this time, no matter what, I wanted this data restored.
Because I had previously tried simply changing the dbpath in a MongoDB configuration file and it had not worked, and I was not even sure the database files would not just be rebuilt under the configured dbpath, I cautiously reinstalled TokuMX. This time the database was installed under the home directory, and the database file directory was likewise specified inside the new TokuMX installation directory: /home/tokumx/data/ (one-click installation really does feel good; it saves a lot of time).
With TokuMX reinstalled on both servers, the cluster showed some small problems, which I handled manually to restore the cluster.
This time, unlike before, the database was running as a cluster. Supposing I restored only the data, would the cluster still function? Never mind; first get one server working, even in standalone mode. So I searched for every MongoDB data-import method, learned of mongoimport, and ran mongoimport --help, which told me that --dbpath specifies the directory to import into. So: ./mongoimport --dbpath /home/cdastuko/data/. The result: mongoimport only supports importing formats such as JSON and CSV. I then searched for how to load database files directly, like SQL Server's detach and attach, and got no useful answers. mongorestore did not apply either.
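For comparison, this is the kind of job mongoimport is actually meant for; the database, collection, and file names here are made up:

# import a dump of documents, not raw database files
./mongoimport --db somedb --collection users --file users.json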
Later I thought: TokuMX is not MongoDB after all; it has its more humane side. Why not try directly modifying the dbpath, on the chance that it would load the files? Even if it failed, it was not as if the backup machine had anything left to lose. I steeled myself, confirmed that the database files on the backup machine were still there, and began.
The plan went roughly like this: to avoid affecting the cluster, first modify the standalone-mode configuration file and start in non-clustered mode; if it loads, make the same change to the cluster configuration file; and if the cluster then fails to start, delete one machine's database and let the cluster synchronize it over automatically.
Stop the database service:
./mongod --shutdown -f tokushard.cfg
Modify the standalone configuration file:
vi tokumx.cfg
Change: dbpath=/home/cdastuko/data/
Restart with the standalone configuration:
./mongod -f tokumx.cfg
Startup complete. I quickly checked the data: it was all there. Success! Very exciting!
The rest of the work was simple: the data was in the database, only the cluster remained. Following the method above, I changed the dbpath inside tokushard.cfg to the old database path, and did the same on the backup machine. With the modifications complete, I started TokuMX on both servers with the cluster configuration files. To my great fortune: success. It was early morning by then, but happiness kept me wide awake, and I excitedly connected to the cluster to look at its state.
What the hell is STARTUP2? Can you eat it??? I searched for the meanings of the various stateStr values of a MongoDB cluster and waited: 1 minute, 2 minutes, 5 minutes, 10 minutes passed, still STARTUP2. This was bad; something had gone wrong. I hurriedly opened Robomongo to check the databases, and sure enough, neither server could be queried: both were in suspended animation, with no operation working. Disheartened once again.
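For anyone checking the same thing, the member states can be pulled out of rs.status() with a one-liner along these lines (a sketch, assuming the mongo shell connects on the default port):

./mongo --eval 'rs.status().members.forEach(function(m){ print(m.name + "  " + m.stateStr) })'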
Looking carefully at the times shown above, I was baffled again, completely at a loss. I used the date command to check the time on the two servers: they differed by about a minute. Could the synchronization be failing because of that? With the date -s command I manually set the two servers' times so that they stayed within 10 seconds of each other (the time-synchronization service may not have been working):
date -s 02:44
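Had the time service been cooperating, the cleaner fix would have been something like this (assuming ntpdate is installed and an NTP server is reachable):

ntpdate pool.ntp.org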
Then I shut the cluster down and restarted it. Nothing changed, so I looked at rs.status() and saw the state:
replSet initial sync pending
The cluster's initial sync was hanging... Searching on that keyword turned up a pile of articles; someone suggested rs.slaveOk(), which I executed on both servers to no effect. Given how particular clusters are, I wondered whether the database path simply could not be changed directly without disturbing normal cluster operation: were there other configuration records in the admin or local databases? So I went through the local and admin collections on both servers. The only thing that felt wrong was that a dbpath field in the cluster's operation-log collection still recorded the installation path /home/tokumx/data/. It was only a log and seemed useless, but having come this far, I replaced every such dbpath with the old database file path (/home/cdastuko/data/).
I then compared against a normal cluster configuration: I opened a local virtual machine, looked at its local database, and found no difference.
Perhaps my head was still fogged with excitement; it took me a long time to remember the logs. For a clean view I emptied the log folder, restarted the cluster, and five minutes later hurriedly opened the log file: cat /home/tokumx/log/tokumx.log
I saw a few very interesting lines:
It could not find the primary, and it could not elect itself (this was a two-member master-slave setup, so neither side could win an election on its own).
I found the following suggested solution for resetting the cluster configuration:
Executing it on the two servers separately had no effect; even logging in as admin was completely useless. Later I found more material: the cluster's primary election respects a priority, and after changing the priority and restarting, the cluster will elect a primary. So connect to the cluster, fetch the configuration, and here raise one member's priority to 3:
config = rs.config()
config.members[0].priority = 3 (the larger the number, the higher the priority; the default is 1)
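One note from hindsight: editing the config object in the shell changes nothing by itself; as far as I recall, the standard sequence writes it back for the change to take effect:

rs.reconfig(config)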
But it still failed. There was one alternative: delete all of the backup machine's data, reset the cluster, and let it synchronize automatically from the primary. But that scheme was too heavy-handed to be foolproof, and with this much data there was no telling how long synchronization would take; I would not use it unless forced to.
So back to the original problem: the cluster stuck in the STARTUP2 state. Every pattern of search turned up nothing useful; it seemed I would have to ask the foreigners, so I changed the search terms to: replicaset stuck at startup2. And sure enough it worked: the mighty Stack Overflow, that site famous throughout the programmer world, had an answer.
Reading the exchange between the asker and the answerer, I found a situation exactly like mine.
The exchange between asker and answerer contains several solutions, with the steps written out in detail; rather than copy it all here, I will just give the address:
http://stackoverflow.com/questions/21642396/mongodb-all-replset-stuck-at-startup2
I only used the first step: forcibly resetting the cluster state. The moment I hit Enter I knew it had worked. Don't ask how I knew; I saw it hang there rather than immediately return an error message.
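I won't reproduce the whole answer here, but from memory a forced reset of the replica set configuration looks roughly like this; force: true is what lets a member that is not primary accept the new configuration:

./mongo --eval 'var cfg = rs.config(); rs.reconfig(cfg, {force: true})'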
I dozed off, and roughly half an hour later came back to the server to find an error reported, but right after it the cluster status read PRIMARY. All this long effort had finally been rewarded.
Great grief, then great joy: a night of emotional ups and downs, thrilling and then exhilarating. With the big problem solved, who could sleep? While the details were still fresh, I knocked out this article, as a record against future suffering.
A record of one TokuMX database cluster recovery.