(1) What the official MongoDB documentation says about locking in replication
How does concurrency affect a replica set primary?
In replication, when MongoDB writes to a collection on the primary,
MongoDB also writes to the primary’s oplog, which is a special collection in the local database.
Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must
lock both databases at the same time to keep the database consistent and ensure that write operations, even with replication, are “all-or-nothing” operations.
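To make the "all-or-nothing" point concrete, here is a minimal sketch. It is not MongoDB code: ToyDatabase, ToyPrimary and runConcurrentInserts are hypothetical names, and each database lock is modeled as a plain std::mutex. A write takes both locks before touching either store, so a reader that takes the same two locks never sees a document without its oplog entry.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy stand-in for one database: a lock plus the documents it holds.
struct ToyDatabase {
    std::mutex lock;
    std::vector<std::string> docs;
};

// Hypothetical primary: "app" holds user data, "local" holds the oplog.
struct ToyPrimary {
    ToyDatabase app;
    ToyDatabase local;

    // Acquire both locks (deadlock-free via std::lock) before writing either store.
    void insert(const std::string& doc) {
        std::lock(app.lock, local.lock);
        std::lock_guard<std::mutex> a(app.lock, std::adopt_lock);
        std::lock_guard<std::mutex> b(local.lock, std::adopt_lock);
        app.docs.push_back(doc);           // the data write
        local.docs.push_back("i:" + doc);  // the matching oplog entry
    }

    // An observer holding the same two locks always sees matching sizes.
    bool consistent() {
        std::lock(app.lock, local.lock);
        std::lock_guard<std::mutex> a(app.lock, std::adopt_lock);
        std::lock_guard<std::mutex> b(local.lock, std::adopt_lock);
        return app.docs.size() == local.docs.size();
    }
};

// Run several writer threads and check the invariant while they work.
inline bool runConcurrentInserts(ToyPrimary& p, int writers, int perWriter) {
    std::vector<std::thread> pool;
    for (int w = 0; w < writers; ++w) {
        pool.emplace_back([&p, w, perWriter] {
            for (int i = 0; i < perWriter; ++i)
                p.insert("w" + std::to_string(w) + "-" + std::to_string(i));
        });
    }
    bool ok = true;
    for (int i = 0; i < 100; ++i) ok = p.consistent() && ok;
    for (auto& t : pool) t.join();
    return p.consistent() && ok;
}
```

Calling runConcurrentInserts(p, 4, 50) on a fresh ToyPrimary should leave 200 entries in both stores, and the invariant check should never fail along the way.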
How does concurrency affect secondaries?
In replication, MongoDB does not apply writes serially to secondaries.
Secondaries collect oplog entries in batches and then apply those batches in parallel. Secondaries do not allow reads while applying the write operations, and apply write operations in the order that they appear in the oplog.
MongoDB can apply several writes in parallel on replica set secondaries, in two phases:
- During the first prefetch phase, under a read lock, the mongod ensures
that all documents affected by the operations are in memory. During this phase, other clients may execute queries against this member.
- A thread pool using write locks applies all write operations in the batch as part of a coordinated write phase.
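The two phases can be sketched roughly as follows. This is a toy model under our own assumptions, not the server's implementation: ToySecondary, OplogEntry, prefetch and applyBatch are hypothetical names, std::shared_mutex (C++17) stands in for the read/write lock, and the data is split into one shard per worker so that the pool can write in parallel while each key still sees its entries in oplog order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

struct OplogEntry { std::string key; int value; };

struct ToySecondary {
    static constexpr std::size_t kWorkers = 4;
    std::shared_mutex lock;  // shared = read lock, exclusive = write lock
    std::array<std::map<std::string, int>, kWorkers> shards;

    static std::size_t shardOf(const std::string& key) {
        return std::hash<std::string>{}(key) % kWorkers;
    }

    // Phase 1: under a read lock, touch every affected document.
    // Other readers can run at the same time. Returns how many entries
    // refer to documents that already exist.
    std::size_t prefetch(const std::vector<OplogEntry>& batch) {
        std::shared_lock<std::shared_mutex> rl(lock);
        std::size_t found = 0;
        for (const auto& e : batch) found += shards[shardOf(e.key)].count(e.key);
        return found;
    }

    // Phase 2: the coordinator holds the write lock while a pool of
    // workers applies the batch. Each worker owns exactly one shard.
    void applyBatch(const std::vector<OplogEntry>& batch) {
        std::unique_lock<std::shared_mutex> wl(lock);
        std::array<std::vector<OplogEntry>, kWorkers> work;
        for (const auto& e : batch) work[shardOf(e.key)].push_back(e);
        std::vector<std::thread> pool;
        for (std::size_t w = 0; w < kWorkers; ++w) {
            pool.emplace_back([this, &work, w] {
                for (const auto& e : work[w]) shards[w][e.key] = e.value;
            });
        }
        for (auto& t : pool) t.join();
    }

    // Client read: blocked only while a batch is being applied.
    bool read(const std::string& key, int& out) {
        std::shared_lock<std::shared_mutex> rl(lock);
        auto& shard = shards[shardOf(key)];
        auto it = shard.find(key);
        if (it == shard.end()) return false;
        out = it->second;
        return true;
    }
};
```

Partitioning by key is our simplification; the point is only that reads coexist with phase 1 and are excluded during phase 2.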
(2) Reading the source code in light of the official documentation
1. MongoDB startup code path
-------db.cpp-----
main
--mongoDbMain
----initAndListen
------_initAndListen
---------Listen
------------createServer(options, new MyMessageHandler() );
------------startReplication
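Of the calls above, we mainly care about the last two.
(1) MyMessageHandler is the class that handles a message once mongod has received a request. Its process function handles the request; the call flow of process is as follows: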
process
--assembleResponse
if ( op == dbQuery ) {                      // query
}
else if ( op == dbGetMore ) {               // query (fetch more results)
}
else if ( op == dbMsg ) {
}
else {
    try {
        if ( op == dbKillCursors ) {
        }
        else if ( !nsString.isValid() ) {
        }
        else if ( op == dbInsert ) {
            receivedInsert(m, currentOp);   // insert data
        }
        else if ( op == dbUpdate ) {
            receivedUpdate(m, currentOp);   // update data
        }
        else if ( op == dbDelete ) {
            receivedDelete(m, currentOp);   // delete data
        }
        else {
        }
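The branch order above can be sketched as a small, testable function. This is our own illustration, not code from db.cpp: dispatch and the returned labels are hypothetical, only receivedInsert, receivedUpdate and receivedDelete are names taken from the excerpt, and the opcode values are the ones listed in the MongoDB wire protocol documentation.

```cpp
#include <cassert>
#include <string>

// Legacy wire-protocol opcodes, using the names from the excerpt above.
enum Operations {
    dbMsg = 1000,
    dbUpdate = 2001,
    dbInsert = 2002,
    dbQuery = 2004,
    dbGetMore = 2005,
    dbDelete = 2006,
    dbKillCursors = 2007
};

// Returns a label for the branch that would handle the message.
// The namespace check comes after dbKillCursors and before the writes,
// in the same order as the if/else chain in the excerpt.
inline std::string dispatch(int op, bool nsValid) {
    if (op == dbQuery) return "query";
    if (op == dbGetMore) return "getMore";
    if (op == dbMsg) return "msg";
    if (op == dbKillCursors) return "killCursors";
    if (!nsValid) return "invalidNamespace";
    if (op == dbInsert) return "receivedInsert";
    if (op == dbUpdate) return "receivedUpdate";
    if (op == dbDelete) return "receivedDelete";
    return "unknown";
}
```

The ordering matters: an insert against an invalid namespace never reaches receivedInsert, while a kill-cursors message is handled before the namespace is checked.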
Here we analyze only the main body of the write path:
receivedInsert(m, currentOp);
while ( true ) {
    try {
        Lock::DBWrite lk(ns);   // acquire the write lock on the target database
        // ...
        if (multi.size() > 1) {
            const bool keepGoing = d.reservedField() & InsertOption_ContinueOnError;
            insertMulti(keepGoing, ns, multi, op);   // write the data and the oplog
        } else {
            checkAndInsert(ns, multi[0]);            // write the data and the oplog
            globalOpCounters.incInsertInWriteLock(1);
            op.debug().ninserted = 1;
        }
        return;
    }
The annotated lines in the code above are the key points. insertMulti itself mainly just calls checkAndInsert:
void checkAndInsert(const char *ns, /*modifies*/BSONObj& js) {
    {
        while ( i.more() ) {
            theDataFileMgr.insertWithObjMod()   // insert the data into the database
            logOp("i", ns, js);                 // write the oplog
        }
    }
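logOp writes the oplog (the oplog collection in the local database). The main body of the function is:
if ( replSettings.master ) {                           // is this the master node in master/slave mode?
    _logOp(opstr, ns, 0, obj, patt, b, fromMigrate);   // main body shown below
}
_logOpOld
----Lock::DBWrite lk("local");   // lock the local database
----write the entry into the oplog collection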
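(2) Summary of a write on the master node
1. Take the write lock on the database being written
2. Write the data
3. Take the write lock on the local database
4. Write the oplog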
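The four steps can be traced with a small RAII sketch. It is a toy under our own assumptions: ToyDBWrite is a hypothetical stand-in for Lock::DBWrite, ToyServer and Trace are invented for the illustration, and the lock is just a std::mutex. What it shows is the nesting: the local lock is taken while the data lock is still held, and released first.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Records the order in which things happen.
struct Trace { std::vector<std::string> steps; };

// Hypothetical stand-in for Lock::DBWrite: locks on construction,
// unlocks on scope exit, and logs both events.
class ToyDBWrite {
public:
    ToyDBWrite(std::mutex& m, const std::string& db, Trace& t)
        : guard_(m), db_(db), trace_(t) {
        trace_.steps.push_back("lock " + db_);
    }
    ~ToyDBWrite() { trace_.steps.push_back("unlock " + db_); }
private:
    std::lock_guard<std::mutex> guard_;
    std::string db_;
    Trace& trace_;
};

struct ToyServer {
    std::mutex dbLock;
    std::mutex localLock;
    Trace trace;

    // Steps 3 and 4: lock "local", then write the oplog entry.
    void logOp(const std::string& doc) {
        ToyDBWrite lk(localLock, "local", trace);
        trace.steps.push_back("write oplog " + doc);
    }

    // Steps 1 and 2: lock the target database, write the data,
    // then call logOp while the first lock is still held.
    void receivedInsert(const std::string& db, const std::string& doc) {
        ToyDBWrite lk(dbLock, db, trace);
        trace.steps.push_back("write data " + doc);
        logOp(doc);
    }
};
```

After one receivedInsert("test", ...) call, the trace reads: lock test, write data, lock local, write oplog, unlock local, unlock test.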
(3) Main flow and purpose of startReplication()
The main body of the function is:
if ( replSettings.slave ) {
    boost::thread repl_thread(replSlaveThread);   // slave node: start a replSlaveThread thread
}
if ( replSettings.master ) {
    replSettings.master = true;
    createOplog();                                // master node: create the oplog collection
    boost::thread t(replMasterThread);            // start a replMasterThread thread
}
1. First, what the master's replMasterThread thread does
static void replMasterThread() {
    int toSleep = 10;
    while( 1 ) {
        sleepsecs( toSleep );   // sleep for 10 seconds
        {
            writelocktry lk(1);
            logKeepalive();     // the key call, analyzed below
        }
    }
}
The main body of logKeepalive is:
void logKeepalive() {
    _logOp("n", "", 0, BSONObj(), 0, 0, false);   // same path as the master write analyzed above: locks the local database
}
Now it is clear: this thread writes one entry into the oplog every 10 seconds, like this:
{ "ts" : Timestamp(1373347524, 1), "op" : "n", "ns" : "", "o" : { } }
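A rough model of this keepalive loop, with the interval passed in so it can be run in milliseconds instead of 10 seconds. Everything here is hypothetical: ToyOplog, tryLogKeepalive and runKeepalive are our names, std::timed_mutex stands in for the write lock, and we assume writelocktry lk(1) means "try to take the write lock, giving up after a short wait". Skipping the round when the lock is not obtained is also our assumption; the excerpt does not show that branch.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy oplog: a lock that supports timed acquisition plus the op types written.
struct ToyOplog {
    std::timed_mutex lock;
    std::vector<std::string> ops;
};

// One keepalive attempt: wait up to `wait` for the lock, then append a no-op.
inline bool tryLogKeepalive(ToyOplog& oplog, std::chrono::milliseconds wait) {
    std::unique_lock<std::timed_mutex> lk(oplog.lock, wait);  // uses try_lock_for
    if (!lk.owns_lock()) return false;  // lock busy: skip this round
    oplog.ops.push_back("n");           // "n" = no-op entry
    return true;
}

// The loop: sleep, then attempt a keepalive. Bounded by `rounds` so it terminates.
inline int runKeepalive(ToyOplog& oplog, std::chrono::milliseconds interval,
                        int rounds) {
    int written = 0;
    for (int i = 0; i < rounds; ++i) {
        std::this_thread::sleep_for(interval);
        if (tryLogKeepalive(oplog, std::chrono::milliseconds(1))) ++written;
    }
    return written;
}
```

With an interval of 10 seconds and an unbounded loop, this would produce the stream of "n" entries shown above.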
2. The slave side: the replSlaveThread thread
void replSlaveThread() {
    sleepsecs(1);
    Client::initThread("replslave");
    while ( 1 ) {
        try {
            replMain();   // the slave thread calls this in a loop; analyzed below
            sleepsecs(5);
        }
    }
}
The main body of replMain is as follows; it calls _replMain:
while ( 1 ) {
s = _replMain(sources, nApplied);
}
The main body of _replMain is:
_replMain
{
    Lock::GlobalWrite lk;             // acquire the global lock
    ReplSource::loadAll(sources);     // a slave node can be configured with multiple masters
}                                     // release the lock
for ( ReplSource::SourceVector::iterator i = sources.begin(); i != sources.end(); i++ ) {
    res = s->sync(nApplied);
sync
----sync_pullOpLog
---------fetch entries from the master's oplog
---------scoped_ptr<Lock::GlobalWrite> lk( justOne ? 0 : new Lock::GlobalWrite() );   // acquire the global write lock
---------sync_pullOpLog_applyOperation(BSONObj& op, bool alreadyLocked)   // write the oplog entry into the slave's own database
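3. Summary of the slave side
The slave runs a thread that loops: read oplog entries from the master, acquire the global write lock, write to the database.
This matches the official documentation: the slave can still serve clients while it fetches entries from the master, but it holds the global write lock while applying the oplog to its own data and cannot serve clients during that time.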
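The slave loop summarized above can be sketched like this. It is a simplified model, not the code in sync_pullOpLog: ToyMaster, ToySlave, Op and syncOnce are hypothetical names, and std::shared_mutex (C++17) plays the role of the global lock. The fetch from the master happens without holding the slave's lock; only the apply step takes it exclusively.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// 'i' = insert, 'd' = delete, 'n' = no-op (keepalive).
struct Op { char type; std::string key; int value; };

struct ToyMaster {
    std::mutex lock;
    std::vector<Op> oplog;

    // Return all oplog entries from position `pos` onwards.
    std::vector<Op> readFrom(std::size_t pos) {
        std::lock_guard<std::mutex> lk(lock);
        if (pos >= oplog.size()) return {};
        return std::vector<Op>(oplog.begin() + pos, oplog.end());
    }
};

struct ToySlave {
    std::shared_mutex globalLock;  // stand-in for the global read/write lock
    std::map<std::string, int> data;
    std::size_t nextPos = 0;       // how far into the master's oplog we are

    // One sync round. Returns the number of data-changing ops applied.
    int syncOnce(ToyMaster& master) {
        // Fetch first: no slave lock is held, so clients can still read.
        std::vector<Op> pulled = master.readFrom(nextPos);
        if (pulled.empty()) return 0;

        // Apply: exclusive lock, so clients are blocked until we finish.
        std::unique_lock<std::shared_mutex> lk(globalLock);
        int applied = 0;
        for (const Op& op : pulled) {
            if (op.type == 'i') { data[op.key] = op.value; ++applied; }
            else if (op.type == 'd') { data.erase(op.key); ++applied; }
            // 'n' entries change nothing
        }
        nextPos += pulled.size();
        return applied;
    }

    // Client read under the shared lock.
    bool find(const std::string& key, int& out) {
        std::shared_lock<std::shared_mutex> lk(globalLock);
        auto it = data.find(key);
        if (it == data.end()) return false;
        out = it->second;
        return true;
    }
};
```

A real slave would call something like syncOnce in a loop with a sleep between rounds, as replSlaveThread does with replMain.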