Tutorial on creating a MongoDB copy based on PostgreSQL, postgresqlmongodb
I have an idea of laziness. How can I start with this idea? Well, this is an appropriate little crazy: Why don't we build our own MongoDB version on the basis of ipvs? It sounds a little far-fetched, but it is simple and practical.
When NoSQL sports are surging, The Postgres community does not sit and play with their thumbs up. They continue to develop and run through the entire Postgres ecosystem. Several outstanding features have attracted my attention: Integrated JSON support and PLV8. PLV8 introduced V8 Javascript Engine into Postgres, making Javascript a first-class language ). The JSON type makes it easier to process JSON (which is very effective ).
Preparations before you start:
- Postgres 9.2 + (as of this blog entry, 9.2 is in beta)-http://www.postgresql.org/ftp/source/
- V8 https://github.com/v8/v8
- PLV8 http://code.google.com/p/plv8js/wiki/PLV8
The minimum level of MongoDB is set. The set can be expressed in tables:
CREATE TABLE some_collection ( some_collection_id SERIAL NOT NULL PRIMARY KEY, data JSON );
JSON of the struct type is saved in the s table, which is easy to use (as shown in the following figure ).
The following describes how to automatically create a set and save it in the set table:
CREATE TABLE collection ( collection_id SERIAL NOT NULL PRIMARY KEY, name VARCHAR ); -- make sure the name is unique CREATE UNIQUE INDEX idx_collection_constraint ON collection (name);
Once the table is created, you can use the stored procedure to automatically create a set. The method is to create a table first and then insert the sequence of table creation.
CREATE OR REPLACE FUNCTION create_collection(collection varchar) RETURNS boolean AS $$ var plan1 = plv8.prepare('INSERT INTO collection (name) VALUES ($1)', [ 'varchar' ]); var plan2 = plv8.prepare('CREATE TABLE col_' + collection + ' (col_' + collection + '_id INT NOT NULL PRIMARY KEY, data JSON)'); var plan3 = plv8.prepare('CREATE SEQUENCE seq_col_' + collection); var ret; try { plv8.subtransaction(function () { plan1.execute([ collection ]); plan2.execute([ ]); plan3.execute([ ]); ret = true; }); } catch (err) { ret = false; } plan1.free(); plan2.free(); plan3.free(); return ret; $$ LANGUAGE plv8 IMMUTABLE STRICT;
With the stored procedure, it is much easier:
SELECT create_collection('my_collection');
To solve the problem of centralized storage, let's take a look at MongoDB data parsing. MongoDB completes this operation through the point annotation method:
CREATE OR REPLACE FUNCTION find_in_obj(data json, key varchar) RETURNS VARCHAR AS $$ var obj = JSON.parse(data); var parts = key.split('.'); var part = parts.shift(); while (part && (obj = obj[part]) !== undefined) { part = parts.shift(); } // this will either be the value, or undefined return obj; $$ LANGUAGE plv8 STRICT;
The above function returns VARCHAR and does not apply to all cases, but it is useful for strings:
SELECT data FROM col_my_collection WHERE find_in_obj(data, 'some.element') = 'something cool'
In addition to string comparison, MongoDB also provides comparison of numeric types and the keyword exists. Below are different implementations of the find_in_obj () method:
CREATE OR REPLACE FUNCTION find_in_obj_int(data json, key varchar) RETURNS INT AS $$ var obj = JSON.parse(data); var parts = key.split('.'); var part = parts.shift(); while (part && (obj = obj[part]) !== undefined) { part = parts.shift(); } return Number(obj); $$ LANGUAGE plv8 STRICT; CREATE OR REPLACE FUNCTION find_in_obj_exists(data json, key varchar) RETURNS BOOLEAN AS $$ var obj = JSON.parse(data); var parts = key.split('.'); var part = parts.shift(); while (part && (obj = obj[part]) !== undefined) { part = parts.shift(); } return (obj === undefined ? 'f' : 't'); $$ LANGUAGE plv8 STRICT;
The next step is data query. Use existing materials to implement the find () method.
It is easy to save data to the collection. First, we need to check the JSON object and find a _ id value. This part of the code is a native assumption. If _ id already exists, this means an update; otherwise, it means an insert. Please note that we have not created objectID yet and only use one sequence for it to happen:
CREATE OR REPLACE FUNCTION save(collection varchar, data json) RETURNS BOOLEAN AS $$ var obj = JSON.parse(data); var id = obj._id; // if there is no id, naively assume an insert if (id === undefined) { // get the next value from the sequence for the ID var seq = plv8.prepare("SELECT nextval('seq_col_" + collection + "') AS id"); var rows = seq.execute([ ]); id = rows[0].id; obj._id = id; seq.free(); var insert = plv8.prepare("INSERT INTO col_" + collection + " (col_" + collection + "_id, data) VALUES ($1, $2)", [ 'int', 'json']); insert.execute([ id, JSON.stringify(obj) ]); insert.free(); } else { var update = plv8.prepare("UPDATE col_" + collection + " SET data = $1 WHERE col_" + collection + "_id = $2", [ 'json', 'int' ]); update.execute([ data, id ]); } return true; $$ LANGUAGE plv8 IMMUTABLE STRICT;
Based on this point of view, we can build some simple inserted documents:
{ "name": "Jane Doe", "address": { "street": "123 Fake Street", "city": "Portland", "state": "OR" }, "age": 33 } { "name": "Sarah Smith", "address": { "street": "456 Real Ave", "city": "Seattle", "state": "WA" } } { "name": "James Jones", "address": { "street": "789 Infinity Way", "city": "Oakland", "state": "CA" }, "age": 23 }
Let's create a set and insert some data:
work=# SELECT create_collection('data'); create_collection ------------------- t (1 row) work=# SELECT save('data', '{ our object }'); save ------ t (1 row)
You can check the content of the "col_data" table to view the object.
Other translation versions (1)
Now we have some data. Let's query it again. Suppose we want to find everyone who lives in Oregon or Washington State older than 30, using a MongoDB-style find ():
{ "$or": [ { "address.state": "OR" }, { "address.state": "WA" } ], "age": { "$gt": 30 } }
Because we have created some in-depth packet inspection last time, it is now easy to create a query and return the Jane Doe:
SELECT data FROM col_data WHERE find_in_obj_int(data, 'age') > 30 AND ( find_in_obj(data, 'address.state') = 'OR' OR find_in_obj(data, 'address.state') = 'WA' )
I used a method to write a recursive call function to create a WHERE clause. It is a little long, so I didn't paste it here, but put it on GitHub. Once the find () stored procedure is created, we can use it in the query. We should be able to see that Jane Doe is returned:
work=# SELECT find('data', '{ "$or": [ { "address.state": "OR" }, { "address.state": "WA" } ], "age": { "$gt": 30 } }');
This works: it is not elegant, but it works. This is a proof of concept, and there is almost no such good possibility. I was asked why I didn't use HSTORE. Although you can store nested HSTORE and array values, it is still not JSON and is not easy to operate through PLV8. This requires a sequencer from HSTORE to JSON, which serializes the request return to the data format accepted by MongoDB at any time, but is still too easy to process in JavaScript. This is a sub-optimal choice. After all, we need to create a MongoDB copy on the basis of ipvs.
You can find the source code on GitHub: fork and try it. Remember to give back.