I needed a document database for a recent project, so I took the opportunity to read up on NoSQL. Popular document database engines currently include MongoDB, CouchDB, Couchbase, and OrientDB. I recommend MongoDB and RethinkDB. Like MongoDB, RethinkDB is a database engine mainly used to store JSON documents (MongoDB stores BSON). It can easily join multiple nodes into a distributed database, has a very easy-to-use query language, and supports table joins and group by operations.
I tried RethinkDB yesterday, testing it on a virtual machine. Insert performance over 25 million rows was mediocre, nowhere near as fast as MongoDB or Couchbase, but it was stable: the rate held between 1.5K and 2K rows per second. RethinkDB's data sharding is simple and can be done with one click. The installation and tests below were done on Ubuntu 12.04.4 LTS Server.
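To put that rate in perspective, here is a quick back-of-the-envelope calculation. It assumes the insert rate stays constant for the whole load, which a real run will not do exactly; the function name is my own.

```python
def load_time_hours(rows, rows_per_second):
    """Hours needed to insert `rows` records at a constant rate."""
    return rows / float(rows_per_second) / 3600.0


total_rows = 25 * 1000 * 1000                  # 25 million rows, as in the test
slow = load_time_hours(total_rows, 1500)       # lower end of the observed rate
fast = load_time_hours(total_rows, 2000)       # upper end of the observed rate
print("%.1f to %.1f hours" % (fast, slow))
```

At these rates the full load works out to roughly 3.5 to 4.6 hours.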
Add the official RethinkDB repository and install:
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:rethinkdb/ppa
$ sudo apt-get update
$ sudo apt-get install rethinkdb
Copy the sample configuration file and change the bind setting so that other machines can connect:
$ cd /etc/rethinkdb/
$ sudo cp default.conf.sample instances.d/default.conf
$ sudo vi instances.d/default.conf
...
# bind=127.0.0.1
bind=0.0.0.0
...
Start rethinkdb:
$ sudo /etc/init.d/rethinkdb start
rethinkdb: default: Starting instance. (logging to `/var/lib/rethinkdb/default/data/log_file')
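Before opening a browser, it can help to confirm that the instance is actually listening. The following is a minimal sketch using only the Python standard library. It assumes the web interface is on port 8080 and the client driver port is 28015 (the ports used elsewhere in this article) and that it is run on the server itself; port_open is a helper name I made up.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat as not listening.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Ports assumed from this article: 8080 (web UI), 28015 (client drivers).
    for name, port in (("web interface", 8080), ("client driver", 28015)):
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print("%s port %d: %s" % (name, port, state))
```

To check access from another machine, replace 127.0.0.1 with the server's address. If the port is open locally but closed remotely, the bind setting is the first thing to look at.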
Open http://192.168.2.39:8080/ to view the RethinkDB management interface.
If you prefer not to work on the command line, the web interface also provides Data Explorer, an online query tool with syntax highlighting and inline function hints, so no separate help files are needed.
To talk to RethinkDB from a program, you need to install a client driver. The officially supported drivers are JavaScript, Ruby, and Python; community-supported drivers cover almost all mainstream programming languages, including C, Go, C++, Java, PHP, Perl, Clojure, and Erlang. I mostly use Python, so I install the Python client driver here:
$ sudo apt-get install python-pip
$ sudo pip install rethinkdb
Test that the driver works. If import rethinkdb raises no error, the module was installed successfully:
$ python
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import rethinkdb
>>>
gene2go.txt is a text file of genetic data. It contains more than 10 million records in the following format:
$ head -2 gene2go.txt
#Format: tax_id GeneID GO_ID Evidence Qualifier GO_term PubMed Category (tab is used as a separator, pound sign - start of a comment)
3702	814629	GO:0005634	ISM	-	nucleus	-	Component
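The format above maps naturally onto a dictionary with one key per column. Here is a small sketch of that mapping. The names FIELDS and parse_line are my own, and the sample string is the record shown above with tabs assumed between the eight fields.

```python
# Column names taken from the "#Format:" header line of gene2go.txt.
FIELDS = ('tax_id', 'GeneID', 'GO_ID', 'Evidence',
          'Qualifier', 'GO_term', 'PubMed', 'Category')


def parse_line(line):
    """Turn one tab-separated line into a dict; return None for comments and blank lines."""
    line = line.rstrip('\r\n')
    if not line or line.startswith('#'):
        return None
    values = line.split('\t')
    if len(values) != len(FIELDS):
        raise ValueError('expected %d columns, got %d' % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


sample = '3702\t814629\tGO:0005634\tISM\t-\tnucleus\t-\tComponent\n'
record = parse_line(sample)
print("%s %s" % (record['GO_ID'], record['GO_term']))
```

Every value stays a string here, including tax_id and GeneID; converting them to integers is a separate choice that affects how they can be queried later.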
Write a simple program to import the data from gene2go.txt into RethinkDB:
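#!/usr/bin/python
# -*- coding: UTF-8 -*-
# Python 2 script, matching the Python 2.7.3 environment used above.
import csv

import rethinkdb as r


def csv2db():
    # gene2go.txt is tab-separated; skip the "#Format: ..." header line.
    data = csv.reader(open('gene2go.txt', 'rb'), delimiter='\t')
    data.next()

    # repl() makes this the default connection for the run() calls below.
    conn = r.connect('localhost', 28015).repl()
    r.db('test').table_create('gene2go').run()
    gene2go = r.db('test').table('gene2go')

    for row in data:
        # Soft durability and noreply trade write safety for insert speed.
        gene2go.insert({
            'tax_id': row[0],
            'GeneID': row[1],
            'GO_ID': row[2],
            'Evidence': row[3],
            'Qualifier': row[4],
            'GO_term': row[5],
            'PubMed': row[6],
            'Category': row[7]
        }).run(durability="soft", noreply=True)

    # Closing the connection waits for outstanding noreply writes to finish.
    conn.close()


def main():
    csv2db()


if __name__ == "__main__":
    main()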
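The program above sends one insert request per row. RethinkDB's insert also accepts a list of documents, so one common way to reduce per-request overhead is to send rows in batches. The sketch below is a variation of my own and was not part of the test described here. The helper names and the batch size of 200 are assumptions to tune, and I have not measured how much it changes the insert rate.

```python
def batched(items, size):
    """Yield lists of up to `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter, batch.
        yield batch


def insert_in_batches(table, docs, conn, size=200):
    """Insert docs into a ReQL table, `size` documents per request.

    `table` is expected to be something like r.db('test').table('gene2go')
    and `conn` an open connection. Returns the number of documents sent.
    """
    sent = 0
    for batch in batched(docs, size):
        table.insert(batch).run(conn, durability="soft", noreply=True)
        sent += len(batch)
    return sent
```

With the driver installed, it could be called as insert_in_batches(r.db('test').table('gene2go'), docs, conn), where conn comes from r.connect('localhost', 28015) and docs is an iterable of row dictionaries.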
Permanent link to this article: http://www.linuxidc.com/Linux/2015-08/121784.htm