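In most deployments MongoDB serves as the data storage module; as a database, it generally should not take on additional duties.
From a specialization standpoint, handing text search to a dedicated search engine is often the better choice.
The common search engines usually have ready-made tools for MongoDB, so the two can be combined easily.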
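1. Sphinx and mongodb-sphinx
Sphinx is a full-text search engine written in C++. It integrates very well with MySQL and can import data from MySQL with little effort.
Sphinx does not provide native support for other databases, but it does offer the xmlpipe2 interface: any program that emits data in that format can exchange data with Sphinx.
For MongoDB, mongodb-sphinx (https://github.com/georgepsarakis/mongodb-sphinx) is one implementation of the xmlpipe2 interface.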
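To make the xmlpipe2 idea concrete, here is a minimal Python sketch of what such a stream looks like. It is not the code of mongodb-sphinx; the helper name to_xmlpipe2 and the sample field names are assumptions made for illustration, and the documents are plain dicts rather than records read from MongoDB.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xmlpipe2(docs, text_fields, attributes):
    """Render documents as an xmlpipe2 stream (hypothetical helper).

    docs:        iterable of dicts, each with a positive integer "id"
    text_fields: names of the fields Sphinx should full-text index
    attributes:  dict mapping attribute name -> Sphinx attribute type
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    # The schema declares full-text fields and attributes up front.
    for name in text_fields:
        lines.append("<sphinx:field name=%s/>" % quoteattr(name))
    for name, attr_type in attributes.items():
        lines.append(
            "<sphinx:attr name=%s type=%s/>" % (quoteattr(name), quoteattr(attr_type))
        )
    lines.append("</sphinx:schema>")
    # Each document becomes one <sphinx:document> element.
    for doc in docs:
        lines.append('<sphinx:document id="%d">' % doc["id"])
        for name in list(text_fields) + list(attributes):
            if name in doc:
                lines.append("<%s>%s</%s>" % (name, escape(str(doc[name])), name))
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)
```

Calling to_xmlpipe2([{"id": 1, "title": "a < b", "created": 1366045854}], ["title"], {"created": "timestamp"}) returns a string that an indexer configured with an xmlpipe2 source could read from the program's standard output.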
mongodb-sphinx ships with sample Stack Overflow data and example launch parameters. Import the sample data into MongoDB, then run the following command to load the data into Sphinx:
./mongodb-sphinx.py -d stackoverflow -c posts --text-fields profile_image link --attributes last_activity_date _id --attribute-types timestamp string --timestamp-from=1366045854 --id-field=post_id
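Commonly used parameters include:
-d selects the database, -c the collection, -H the MongoDB host address, -p the MongoDB port
-f the starting timestamp, -u the ending timestamp, -t the fields to build the search index on
-a the attributes that are not indexed; --attribute-types gives the type of each attribute listed in -a, such as string, timestamp, or integer
--id-field the field used as the document ID, --threads the number of threads
One very important caveat: mongodb-sphinx assumes that _id in the MongoDB data is an ObjectID, that is, an ID that carries time information. If you use your own ID scheme, the time-based filtering will not work correctly and you will need to modify the code yourself.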
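The "time information" in an ObjectID is its first 4 bytes, a big-endian count of seconds since the Unix epoch. The sketch below shows how that timestamp can be read from the 24-character hex form, and how a boundary ID for a given time can be built. Both function names are hypothetical, and this is not how mongodb-sphinx itself is written; it only illustrates why a custom _id without this layout breaks time-based filtering.

```python
def objectid_timestamp(oid_hex):
    """Return the Unix timestamp embedded in an ObjectID hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectID is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) hold the creation time in seconds.
    return int(oid_hex[:8], 16)


def objectid_floor(timestamp):
    """Return the smallest ObjectID hex string whose embedded time is `timestamp`.

    Such a value can act as a lower bound when selecting documents by _id
    that were created at or after a given time.
    """
    return "%08x" % timestamp + "0" * 16


print(objectid_timestamp("507f1f77bcf86cd799439011"))  # 1350508407
```

An _id that is an integer or an arbitrary string has no such prefix, so any range selection built this way would return the wrong documents.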
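2. Elasticsearch and Mongo-Connector
In es 2.0 and earlier, the tool most often used to sync data from MongoDB was mongodb-river.
From es5 onward, however, plugins can no longer be installed the way they were in earlier versions, so the mongodb-river tutorials found online no longer work.
In addition, mongodb-river has not been updated for several years, so its es5 support may lag behind other tools.
MongoDB officially provides a similar tool, Mongo-Connector (https://github.com/mongodb-labs/mongo-connector).
Installation is simple: pip install mongo-connector
Mongo-Connector supports several different search engines. For es it supports 1.x, 2.x, and 5.x; you only need to install the matching doc-manager.
Alternatively, install with pip install 'mongo-connector[elastic5]' and it is ready to use.
Before using it, switch MongoDB to replica set mode; only then does MongoDB record the oplog.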
$ mongod --replSet singleNodeRepl
$ mongo
> rs.initiate()
# MongoDB is now running on port 27017
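The oplog matters because a sync tool replays its entries against the search index. The following Python sketch shows the general idea with a plain dict standing in for the index. The entry shapes are simplified assumptions (op of "i", "u", or "d", a namespace ns, the document o, and the update selector o2); real oplog entries carry more fields, newer servers may encode updates differently, and this is not Mongo-Connector's actual implementation.

```python
def apply_oplog_entry(index, entry):
    """Apply one simplified oplog entry to `index`.

    `index` is a dict keyed by (namespace, _id) that stands in for the
    search engine. Only inserts, updates, and deletes are handled.
    """
    op, ns = entry["op"], entry["ns"]
    if op == "i":
        # Insert: "o" holds the full document.
        doc = entry["o"]
        index[(ns, doc["_id"])] = dict(doc)
    elif op == "u":
        # Update: "o2" identifies the document, "o" describes the change.
        key = (ns, entry["o2"]["_id"])
        change = entry["o"]
        if "$set" in change or "$unset" in change:
            doc = index.setdefault(key, {"_id": key[1]})
            doc.update(change.get("$set", {}))
            for field in change.get("$unset", {}):
                doc.pop(field, None)
        else:
            # No operators: treat "o" as a full replacement document.
            index[key] = dict(change, _id=key[1])
    elif op == "d":
        # Delete: "o" holds the _id of the removed document.
        index.pop((ns, entry["o"]["_id"]), None)
    # Other entry types (commands, no-ops) are ignored in this sketch.
    return index
```

A standalone mongod without a replica set keeps no such log, so there would be nothing to replay.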
Next, write a configuration file, for example to set password information:
{"authentication": {"password": "XXX"}}
The project ships with a sample configuration file:
{
  "__comment__": "Configuration options starting with '__' are disabled",
  "__comment__": "To enable them, remove the preceding '__'",
  "mainAddress": "localhost:27017",
  "oplogFile": "/var/log/mongo-connector/oplog.timestamp",
  "noDump": false,
  "batchSize": -1,
  "verbosity": 0,
  "continueOnError": false,
  "logging": {
    "type": "file",
    "filename": "/var/log/mongo-connector/mongo-connector.log",
    "__format": "%(asctime)s [%(levelname)s] %(name)s:%(lineno)d - %(message)s",
    "__rotationWhen": "D",
    "__rotationInterval": 1,
    "__rotationBackups": 10,
    "__type": "syslog",
    "__host": "localhost:514"
  },
  "authentication": {
    "__adminUsername": "username",
    "__password": "password",
    "__passwordFile": "mongo-connector.pwd"
  },
  "__comment__": "For more information about SSL with MongoDB, please see http://docs.mongodb.org/manual/tutorial/configure-ssl-clients/",
  "__ssl": {
    "__sslCertfile": "Path to certificate to identify the local connection against MongoDB",
    "__sslKeyfile": "Path to the private key for sslCertfile. Not necessary if already included in sslCertfile.",
    "__sslCACerts": "Path to concatenated set of certificate authority certificates to validate the other side of the connection",
    "__sslCertificatePolicy": "Policy for validating SSL certificates provided from the other end of the connection. Possible values are 'required' (require and validate certificates), 'optional' (validate but don't require a certificate), and 'ignored' (ignore certificates)."
  },
  "__fields": ["field1", "field2", "field3"],
  "__namespaces": {
    "excluded.collection": false,
    "excluded_wildcard.*": false,
    "*.exclude_collection_from_every_database": false,
    "included.collection1": true,
    "included.collection2": {},
    "included.collection4": {
      "includeFields": ["included_field", "included.nested.field"]
    },
    "included.collection5": {
      "rename": "included.new_collection5_name",
      "includeFields": ["included_field", "included.nested.field"]
    },
    "included.collection6": {
      "excludeFields": ["excluded_field", "excluded.nested.field"]
    },
    "included.collection7": {
      "rename": "included.new_collection7_name",
      "excludeFields": ["excluded_field", "excluded.nested.field"]
    },
    "included_wildcard1.*": true,
    "included_wildcard2.*": true,
    "renamed.collection1": "something.else1",
    "renamed.collection2": {
      "rename": "something.else2"
    },
    "renamed_wildcard.*": {
      "rename": "new_name.*"
    },
    "gridfs.collection": {
      "gridfs": true
    },
    "gridfs_wildcard.*": {
      "gridfs": true
    }
  },
  "docManagers": [
    {
      "docManager": "elastic_doc_manager",
      "targetURL": "localhost:9200",
      "__bulkSize": 1000,
      "__uniqueKey": "_id",
      "__autoCommitInterval": null
    }
  ]
}
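The sample's own comments say that options whose names start with '__' are disabled. As a rough way to see which options in such a file are actually active, the sketch below loads the JSON and drops every '__'-prefixed key at any depth. The function names are hypothetical, and this only mirrors the convention described in the comments; it is not Mongo-Connector's own configuration parser.

```python
import json


def strip_disabled(value):
    """Recursively drop dict keys that start with '__' (disabled options)."""
    if isinstance(value, dict):
        return {
            key: strip_disabled(item)
            for key, item in value.items()
            if not key.startswith("__")
        }
    if isinstance(value, list):
        return [strip_disabled(item) for item in value]
    return value


def load_active_config(text):
    """Parse config JSON text and return only the enabled options."""
    return strip_disabled(json.loads(text))
```

Applied to the sample above, the result keeps mainAddress, oplogFile, the file-based logging settings, and the elastic_doc_manager entry with its targetURL, while the SSL, fields, and namespaces sections disappear because they are still prefixed with '__'.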
Then run mongo-connector -c config.json to start synchronizing data.