Like MongoDB, RethinkDB is a database engine designed mainly for storing JSON documents (MongoDB stores BSON). It makes it easy to join multiple nodes into a distributed database, offers a very easy-to-use query language, and supports table joins and group-by operations.
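To make "joins and group by" concrete, here is a plain-Python illustration of what those two operations compute over JSON-like documents. It is not the RethinkDB API: in ReQL the rough equivalents would be r.table('gene2go').group('category').count() and r.table('gene2go').eq_join('go_id', r.table('terms')). The documents and the terms table below are made up for illustration.

```python
from collections import Counter

# Made-up documents shaped like the gene2go records imported later.
genes = [
    {"geneid": "g1", "go_id": "GO:0005634", "category": "Component"},
    {"geneid": "g1", "go_id": "GO:0003677", "category": "Function"},
    {"geneid": "g2", "go_id": "GO:0005634", "category": "Component"},
]
# Hypothetical lookup table of GO terms, keyed by GO ID.
terms = [
    {"id": "GO:0005634", "name": "nucleus"},
    {"id": "GO:0003677", "name": "DNA binding"},
]


def group_count(docs, field):
    """Count documents per distinct value of `field` (a group-by plus count)."""
    return dict(Counter(doc[field] for doc in docs))


def eq_join(left, field, right, key="id"):
    """Inner join: pair each left doc with the right doc whose `key` equals left[field]."""
    index = {doc[key]: doc for doc in right}
    return [{"left": doc, "right": index[doc[field]]}
            for doc in left if doc[field] in index]


print(sorted(group_count(genes, "category").items()))
# [('Component', 2), ('Function', 1)]
for pair in eq_join(genes, "go_id", terms):
    print(pair["left"]["geneid"] + " " + pair["right"]["name"])
```

Like ReQL's eq_join, this sketch returns left/right pairs and drops left documents that have no match.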
Yesterday I tried RethinkDB on a virtual machine. While inserting 25 million records, throughput stayed fairly stable at between 1,500 and 2,000 rows per second. Data partitioning (sharding) in RethinkDB is very simple and can be done with one click. The installation and tests below were done on Ubuntu 12.04.4 LTS Server.
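To put that insert rate in perspective, here is a quick back-of-the-envelope calculation, assuming the rate holds constant for the whole 25-million-row run (a simplification; real throughput drifts).

```python
ROWS = 25000000  # rows inserted in the test described above


def hours_to_insert(rows, rows_per_second):
    """Wall-clock hours needed to insert `rows` at a constant `rows_per_second`."""
    return rows / float(rows_per_second) / 3600


for rate in (1500, 2000):
    print("%d rows/s -> %.1f hours" % (rate, hours_to_insert(ROWS, rate)))
# 1500 rows/s -> 4.6 hours
# 2000 rows/s -> 3.5 hours
```

So at the observed rates the full load takes roughly 3.5 to 4.6 hours.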
Add the official RethinkDB repository and install:
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:rethinkdb/ppa
$ sudo apt-get update
$ sudo apt-get install rethinkdb
Copy the sample configuration file and change the bind setting so the server can be reached from other machines:
$ cd /etc/rethinkdb/
$ sudo cp default.conf.sample instances.d/default.conf
$ sudo vi instances.d/default.conf
...
# bind=127.0.0.1
bind=0.0.0.0
...
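If you provision several machines, the same edit can be scripted. This is a hypothetical helper (set_bind is not part of RethinkDB), assuming the key=value line format of the sample file; like the manual edit above, it leaves the commented-out default alone and makes sure an active bind line exists.

```python
import re

# Matches an active (uncommented) bind line such as "bind=127.0.0.1".
ACTIVE_BIND = re.compile(r"^bind=.*$", re.MULTILINE)


def set_bind(conf_text, address="0.0.0.0"):
    """Return conf_text with an active "bind=<address>" line.

    An existing active bind line is replaced in place; commented-out
    lines such as "# bind=127.0.0.1" are left untouched. If there is
    no active bind line, a new one is appended at the end.
    """
    new_line = "bind=" + address
    if ACTIVE_BIND.search(conf_text):
        return ACTIVE_BIND.sub(new_line, conf_text)
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


# A two-line stand-in for the sample config, not the full file.
sample = "# bind=127.0.0.1\n# driver-port=28015\n"
print(set_bind(sample))
```

To apply it for real, read /etc/rethinkdb/instances.d/default.conf, pass its contents through set_bind, and write the result back as root.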
Start RethinkDB:
$ sudo /etc/init.d/rethinkdb start
rethinkdb: default: Starting instance. (logging to '/var/lib/rethinkdb/default/data/log_file')
Open http://192.168.2.39:8080/ in a browser to see the RethinkDB web administration interface.
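If the page does not load, a quick way to tell whether the server is listening at all is to try a plain TCP connection. This is a minimal standard-library sketch; port_open is a hypothetical helper, and the ports to check (8080 for the web interface, 28015 for client drivers) are the defaults used in this article.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timed out
        return False
    sock.close()
    return True
```

For example, port_open("192.168.2.39", 8080) should return True once the instance is up and bound to an external address; False usually means the service is not running, the bind setting was not changed, or a firewall is in the way.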
If you prefer not to work on the command line, the web interface also includes Data Explorer, an online query tool with syntax highlighting and inline function hints, so you rarely need to look anything up in separate documentation.
To work with RethinkDB from a program, you need to install a client driver. The officially supported drivers are for JavaScript, Ruby, and Python; community-supported drivers cover almost every mainstream language, including C, Go, C++, Java, PHP, Perl, Clojure, and Erlang. I mostly use Python, so here I install the Python driver:
$ sudo apt-get install python-pip
$ sudo pip install rethinkdb
Check that the driver works. If import rethinkdb raises no error, the module was installed successfully:
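The same check can be scripted, for example in a provisioning script. This is a small standard-library sketch; driver_version is a hypothetical helper, and because not every driver release necessarily exposes __version__, it falls back to "unknown".

```python
import importlib


def driver_version(module_name="rethinkdb"):
    """Import `module_name`; return its __version__ (or "unknown"), or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


print(driver_version())
```

A result of None means the driver is not importable from the Python interpreter running the script.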
$ python
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import rethinkdb
>>>
gene2go.txt is a text file of gene data containing more than 10 million lines, in the following format:
$ head -2 gene2go.txt
#Format: tax_id GeneID GO_ID Evidence Qualifier GO_term PubMed Category (tab is used as a separator, pound sign - start of a comment)
3702 814629 GO:0005634 ISM - nucleus - Component
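Each data line maps onto the eight header fields. A minimal parsing sketch, assuming tab-separated fields as the header states; parse_line is a hypothetical helper, and the lowercase key names are my own choice (they match the ones used in the import program). A "-" simply marks an empty value and is kept as-is here.

```python
FIELDS = ["tax_id", "geneid", "go_id", "evidence",
          "qualifier", "go_term", "pubmed", "category"]


def parse_line(line):
    """Turn one tab-separated gene2go line into a dict; return None for comment lines."""
    if line.startswith("#"):
        return None
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))


record = parse_line("3702\t814629\tGO:0005634\tISM\t-\tnucleus\t-\tComponent\n")
print(record["go_term"])  # nucleus
```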
Next, a simple program to import the data from gene2go.txt into RethinkDB:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import csv

import rethinkdb as r


def csv2db():
    # Connect to the local server on the default driver port (28015) and
    # register it as the default connection used by .run()
    r.connect('localhost', 28015).repl()
    r.db('test').table_create('gene2go').run()
    gene2go = r.db('test').table('gene2go')

    # gene2go.txt is tab-separated; Python 2's csv module expects binary mode
    with open('gene2go.txt', 'rb') as f:
        data = csv.reader(f, delimiter='\t')
        next(data)  # skip the "#Format: ..." header line
        for row in data:
            # durability="soft": the server acknowledges before flushing to disk
            # noreply=True: the client does not wait for the server's response
            gene2go.insert({
                'tax_id': row[0],
                'geneid': row[1],
                'go_id': row[2],
                'evidence': row[3],
                'qualifier': row[4],
                'go_term': row[5],
                'pubmed': row[6],
                'category': row[7]
            }).run(durability="soft", noreply=True)


def main():
    csv2db()


if __name__ == "__main__":
    main()
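The program above issues one insert per row. ReQL's insert also accepts a list of documents, so one common way to cut round trips is to group rows into batches. The helpers below are a hypothetical sketch, not part of the original test: the batch size of 1000 is an arbitrary assumption, and the throughput figures quoted earlier were measured with the row-by-row version, so any speedup from batching is untested here.

```python
def batches(rows, size=1000):
    """Yield lists of at most `size` items taken in order from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def insert_in_batches(docs, insert_fn, size=1000):
    """Call insert_fn once per batch of documents; return the number of calls made."""
    calls = 0
    for batch in batches(docs, size):
        insert_fn(batch)
        calls += 1
    return calls
```

With the gene2go table from the program above, insert_fn could be lambda batch: gene2go.insert(batch).run(durability="soft", noreply=True), fed with the same dicts the loop builds.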