PipelineDB version: 0.9.7
PostgreSQL version: 9.5.3
Data processing components of PipelineDB:
From a high-level view, the main pieces are pipeline_streams, stream_fdw, continuous views (CVs), and transforms.
In essence, PipelineDB uses PostgreSQL's FDW (foreign data wrapper) mechanism to implement streams.
You can see this FDW from inside the database:
pipeline=# \des
              List of foreign servers
       Name       |      Owner      | Foreign-data wrapper
------------------+-----------------+----------------------
 pipeline_streams | unknown (oid=0) | stream_fdw
(1 row)
How data flows in
As of this version, the data flow is implemented with ZeroMQ (versions up to 0.8.2 used a tuple buffer instead).
When data is inserted into a stream, the FDW's ForeignInsert callback is invoked and the tuples are written into the initialized IPC queue; the corresponding sockets live under a pipeline/zmq directory inside the database directory.
A transform is essentially a continuous query whose output destination is pointed at another stream. The database ships with a pipeline_stream_insert function that serves as the transform's trigger: it throws each output tuple into the target stream.
Alternatively, you can write your own UDF, that is, your own trigger function, and write the data into a table, into another FDW, or into your own message-queue IPC wrapper. There is a lot of room for free play here.
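As a sketch of the built-in case: a continuous transform that forwards rows from one stream into another via pipeline_stream_insert. The stream names here are made up for illustration.

```sql
-- Hypothetical example: forward rows from src_stream into dst_stream
-- using the built-in pipeline_stream_insert trigger function.
CREATE STREAM src_stream (x bigint);
CREATE STREAM dst_stream (x bigint);

CREATE CONTINUOUS TRANSFORM t_forward AS
    SELECT x FROM src_stream
    THEN EXECUTE PROCEDURE pipeline_stream_insert('dst_stream');
```

Any CV reading from dst_stream would then see the tuples emitted by the transform.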
First, let's create a stream and a CV.
pipeline=# CREATE STREAM my_stream (x bigint, y bigint, z bigint);
CREATE STREAM
pipeline=# CREATE CONTINUOUS VIEW v_1 AS SELECT x, y, z FROM my_stream;
CREATE CONTINUOUS VIEW
pipeline=#
Insert a piece of data:
pipeline=# INSERT INTO my_stream (x, y, z) VALUES (1, 2, 3);
INSERT 0 1
pipeline=# SELECT * FROM v_1;
 x | y | z
---+---+---
 1 | 2 | 3
(1 row)

pipeline=#
The data has landed in the CV, so let's look at how PipelineDB gets it there.
As introduced above, a stream is an FDW; let's look at its handler (source: src/backend/pipeline/stream_fdw.c):
/*
 * stream_fdw_handler
 */
Datum
stream_fdw_handler(PG_FUNCTION_ARGS)
{
	FdwRoutine *routine = makeNode(FdwRoutine);

	/* Stream SELECTs (used by continuous query procs) */
	routine->GetForeignRelSize = GetStreamSize;
	routine->GetForeignPaths = GetStreamPaths;
	routine->GetForeignPlan = GetStreamScanPlan;
	routine->BeginForeignScan = BeginStreamScan;
	routine->IterateForeignScan = IterateStreamScan;
	routine->ReScanForeignScan = ReScanStreamScan;
	routine->EndForeignScan = EndStreamScan;

	/* Stream INSERTs */
	routine->PlanForeignModify = PlanStreamModify;
	routine->BeginForeignModify = BeginStreamModify;
	routine->ExecForeignInsert = ExecStreamInsert;
	routine->EndForeignModify = EndStreamModify;

	routine->ExplainForeignScan = NULL;
	routine->ExplainForeignModify = NULL;

	PG_RETURN_POINTER(routine);
}
What we care about here are the stream INSERT functions.
Each worker process initializes a recv_id at startup; this is in fact its ZeroMQ queue ID.
Data is sent to the corresponding queue, and the worker process fetches it from the IPC.
Source: src/backend/pipeline/ipc/microbatch.c
void
microbatch_send_to_worker(microbatch_t *mb, int worker_id)
{
First it obtains worker_id; the worker process is chosen at random, so stream data is distributed randomly across the worker processes.
recv_id is the ID of the initialized IPC queue that the data will be sent to.
Finally it calls

    pzmq_send(recv_id, buf, len, true)

which pushes the data onto the IPC queue.
(gdb) p recv_id
$12 = 1404688165
(gdb)
That covers the data-producer side.
Now for the data consumer, the CV side.
Data is likewise received through the ZeroMQ API.
This is mainly the worker process's job.
Source: src/backend/pipeline/ipc/pzmq.c and reader.c
(gdb) p *zmq_state->me
$8 = {id = 1404688165, type = 7 '\a', sock = 0x1139ba0,
  addr = "ipc:///home/pipeline/db_0.9.7/pipeline/zmq/1404688165.sock", '\000' <repeats 965 times>}
(gdb)
You can see that the data is received on queue ID 1404688165, and the IPC addr is shown as well; that path is inside my database directory.
The worker receives a buf, unpacks it, and extracts the tuples from the message.
Once the tuples are obtained, all CVs targeting this stream are looked up; the worker iterates over them and executes each CV's SQL.
The execution flow is roughly the same as for ordinary SQL: initialize the execution plan, then ExecutePlan, then EndPlan.
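The receive path just described can be summarized in pseudocode. The names below are simplified and hypothetical, not PipelineDB's exact API; they only restate the steps above.

```c
/* Pseudocode sketch of the worker's receive loop; names are illustrative */
for (;;)
{
    /* block on this worker's ipc:// socket (pzmq.c / reader.c) */
    buf = receive_from_ipc_queue();

    /* unpack the microbatch and decode its tuples */
    tuples = unpack_microbatch(buf);

    /* look up every continuous view reading from this stream */
    foreach(cv, views_targeting(stream))
    {
        /* run the CV's query over the batch: init the plan,
         * ExecutePlan, EndPlan, then hand rows to the combiner */
        execute_cv_plan(cv, tuples);
    }
}
```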
The data then goes into the combiner; if the CV contains aggregates, a series of combine operations happens there.
If a row satisfies the CV's SQL logic, it is inserted into the CV's backing physical table.
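For example, with an aggregating CV like the hypothetical one below, the combiner's job is to merge each microbatch's partial aggregates into the matching group rows already stored in the CV's backing table, rather than simply appending rows.

```sql
-- Hypothetical example: an aggregating CV over the my_stream
-- created earlier; count(*) per group is combined incrementally.
CREATE CONTINUOUS VIEW v_counts AS
    SELECT x, count(*) FROM my_stream GROUP BY x;
```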
That, in simple terms, is how streams work.
Thank you
Talking about PipelineDB, part one: how stream data is written to the continuous view