PipelineDB version: 0.9.7
PostgreSQL version: 9.5.3
Data processing components of PipelineDB:
From a high-level view, the main pieces are pipeline_streams, stream_fdw, continuous views (CVs), and transforms.
In essence, PipelineDB uses PostgreSQL's FDW (foreign data wrapper) mechanism to implement streams.
You can see this FDW from inside the database:
pipeline=# \des
              List of foreign servers
       Name       |      Owner      | Foreign-data wrapper
------------------+-----------------+----------------------
 pipeline_streams | unknown (oid=0) | stream_fdw
(1 row)
How data flows in
As of this version, the data flow is implemented with ZeroMQ (versions up to 0.8.2 used a tuple buffer instead).
When data is inserted into a stream, the FDW's ForeignInsert callback is invoked and the tuples are written into the initialized IPC queue; the corresponding sockets live under a pipeline/zmq directory inside the database directory.
A transform is essentially a continuous query whose output destination is pointed at another stream. The database ships with a pipeline_stream_insert function that serves as the transform's trigger: it throws each output tuple into the target stream.
Alternatively, you can write your own UDF, that is, your own trigger function, and write the data into a table, into another FDW, or into your own message-queue IPC wrapper. There is a lot of room for free play here.
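As a sketch of the built-in case: a continuous transform that forwards rows from one stream into another via pipeline_stream_insert. The stream names here are made up for illustration.

```sql
-- Hypothetical example: forward rows from src_stream into dst_stream
-- using the built-in pipeline_stream_insert trigger function.
CREATE STREAM src_stream (x bigint);
CREATE STREAM dst_stream (x bigint);

CREATE CONTINUOUS TRANSFORM t_forward AS
    SELECT x FROM src_stream
    THEN EXECUTE PROCEDURE pipeline_stream_insert('dst_stream');
```

Any CV reading from dst_stream would then see the tuples emitted by the transform.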
First, let's create a stream and a CV.
pipeline=# CREATE STREAM my_stream (x bigint, y bigint, z bigint);
CREATE STREAM
pipeline=# CREATE CONTINUOUS VIEW v_1 AS SELECT x, y, z FROM my_stream;
CREATE CONTINUOUS VIEW
pipeline=#
Insert a piece of data:
pipeline=# INSERT INTO my_stream (x, y, z) VALUES (1, 2, 3);
INSERT 0 1
pipeline=# SELECT * FROM v_1;
 x | y | z
---+---+---
 1 | 2 | 3
(1 row)

pipeline=#
The data has landed in the CV, so let's look at how PipelineDB gets it there.
As introduced above, a stream is an FDW; let's look at its handler (source: src/backend/pipeline/stream_fdw.c):
/*
 * stream_fdw_handler
 */
Datum
stream_fdw_handler(PG_FUNCTION_ARGS)
{
	FdwRoutine *routine = makeNode(FdwRoutine);

	/* Stream SELECTs (used by continuous query procs) */
	routine->GetForeignRelSize = GetStreamSize;
	routine->GetForeignPaths = GetStreamPaths;
	routine->GetForeignPlan = GetStreamScanPlan;
	routine->BeginForeignScan = BeginStreamScan;
	routine->IterateForeignScan = IterateStreamScan;
	routine->ReScanForeignScan = ReScanStreamScan;
	routine->EndForeignScan = EndStreamScan;

	/* Stream INSERTs */
	routine->PlanForeignModify = PlanStreamModify;
	routine->BeginForeignModify = BeginStreamModify;
	routine->ExecForeignInsert = ExecStreamInsert;
	routine->EndForeignModify = EndStreamModify;

	routine->ExplainForeignScan = NULL;
	routine->ExplainForeignModify = NULL;

	PG_RETURN_POINTER(routine);
}
What we care about here are the stream INSERT functions.
Each worker process initializes a recv_id at startup; this is in fact its ZeroMQ queue ID.
Data is sent to the corresponding queue, and the worker process fetches it from the IPC.
Source: src/backend/pipeline/ipc/microbatch.c
void
microbatch_send_to_worker(microbatch_t *mb, int worker_id)
{
First it obtains worker_id; the worker process is chosen at random, so stream data is distributed randomly across the worker processes.
recv_id is the ID of the initialized IPC queue that the data will be sent to.
Finally it calls

    pzmq_send(recv_id, buf, len, true)

which pushes the data onto the IPC queue.
(gdb) p recv_id
$12 = 1404688165
(gdb)
That covers the data-producer side.
Now for the data consumer, the CV side.
Data is likewise received through the ZeroMQ API.
This is mainly the worker process's job.
Source: src/backend/pipeline/ipc/pzmq.c and reader.c
(gdb) p *zmq_state->me
$8 = {id = 1404688165, type = 7 '\a', sock = 0x1139ba0,
  addr = "ipc:///home/pipeline/db_0.9.7/pipeline/zmq/1404688165.sock", '\000' <repeats 965 times>}
(gdb)
You can see that the data is received on queue ID 1404688165, and the IPC addr is shown as well; that path is inside my database directory.
The worker receives a buf, unpacks it, and extracts the tuples from the message.
Once the tuples are obtained, all CVs targeting this stream are looked up; the worker iterates over them and executes each CV's SQL.
The execution flow is roughly the same as for ordinary SQL: initialize the execution plan, then ExecutePlan, then EndPlan.
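The receive path just described can be summarized in pseudocode. The names below are simplified and hypothetical, not PipelineDB's exact API; they only restate the steps above.

```c
/* Pseudocode sketch of the worker's receive loop; names are illustrative */
for (;;)
{
    /* block on this worker's ipc:// socket (pzmq.c / reader.c) */
    buf = receive_from_ipc_queue();

    /* unpack the microbatch and decode its tuples */
    tuples = unpack_microbatch(buf);

    /* look up every continuous view reading from this stream */
    foreach(cv, views_targeting(stream))
    {
        /* run the CV's query over the batch: init the plan,
         * ExecutePlan, EndPlan, then hand rows to the combiner */
        execute_cv_plan(cv, tuples);
    }
}
```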
The data then goes into the combiner; if the CV contains aggregates, a series of combine operations happens there.
If a row satisfies the CV's SQL logic, it is inserted into the CV's backing physical table.
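For example, with an aggregating CV like the hypothetical one below, the combiner's job is to merge each microbatch's partial aggregates into the matching group rows already stored in the CV's backing table, rather than simply appending rows.

```sql
-- Hypothetical example: an aggregating CV over the my_stream
-- created earlier; count(*) per group is combined incrementally.
CREATE CONTINUOUS VIEW v_counts AS
    SELECT x, count(*) FROM my_stream GROUP BY x;
```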
That, in simple terms, is how streams work.
Thank you
Talking about PipelineDB, part one: how stream data is written to the continuous view