Talking about PipelineDB, Series One: How stream data is written to the continuous view


PipelineDB version: 0.9.7

PostgreSQL version: 9.5.3

Data processing components of PipelineDB:

From a high-level point of view, the main pieces are pipeline_streams, stream_fdw, continuous views (CVs), and transforms.

Streams are in fact implemented on top of PostgreSQL's foreign data wrapper (FDW) mechanism.

You can see this FDW from within the database:

pipeline=# \des
              List of foreign servers
       Name       |      Owner      | Foreign-data wrapper
------------------+-----------------+----------------------
 pipeline_streams | unknown (OID=0) | stream_fdw
(1 row)

How data flows in

You can see that the data flow is implemented with ZeroMQ (in earlier versions, around 0.8.2 and before, it was implemented with a TupleBuffer).

When data is inserted into a stream, the FDW's foreign insert routine is called and the tuples are written into the pre-initialized IPC queue; the corresponding sockets live in a pipeline/zmq directory under the database data directory.

A transform is essentially a way to point a data destination at a stream. By default the database provides pipeline_stream_insert, which is actually a trigger function that throws the tuples into the target stream.

Alternatively, you can write your own UDF, that is, your own trigger function, so that the data is written into a regular table, into another FDW, or even into a message-queue IPC of your own packaging; there is quite a lot of room for free play here.
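
Just to make that idea concrete, here is a minimal sketch of such a custom trigger function written in C against the standard PostgreSQL trigger and SPI APIs. This is not PipelineDB source: the function name my_forward_trigger and the target table forwarded_events are invented for illustration, and the sketch simply forwards each incoming (x, y, z) tuple into a regular table.

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
#include "executor/spi.h"
#include "fmgr.h"
#include "utils/rel.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_forward_trigger);

/*
 * Hypothetical trigger function: forward each tuple handed to it (for
 * example by a transform) into a regular table forwarded_events(x, y, z).
 */
Datum
my_forward_trigger(PG_FUNCTION_ARGS)
{
    TriggerData *trigdata = (TriggerData *) fcinfo->context;
    HeapTuple    tuple;
    TupleDesc    tupdesc;
    Datum        values[3];
    Oid          argtypes[3] = {INT8OID, INT8OID, INT8OID};
    char         nulls[3] = {' ', ' ', ' '};
    bool         isnull;

    if (!CALLED_AS_TRIGGER(fcinfo))
        elog(ERROR, "my_forward_trigger: not called by trigger manager");

    tuple = trigdata->tg_trigtuple;
    tupdesc = trigdata->tg_relation->rd_att;

    /* Pull the three bigint columns out of the incoming tuple (assumed non-null) */
    values[0] = heap_getattr(tuple, 1, tupdesc, &isnull);
    values[1] = heap_getattr(tuple, 2, tupdesc, &isnull);
    values[2] = heap_getattr(tuple, 3, tupdesc, &isnull);

    /* Re-insert the values somewhere else via SPI */
    SPI_connect();
    SPI_execute_with_args("INSERT INTO forwarded_events VALUES ($1, $2, $3)",
                          3, argtypes, values, nulls, false, 0);
    SPI_finish();

    return PointerGetDatum(tuple);
}

You would then register it with CREATE FUNCTION my_forward_trigger() RETURNS trigger ... LANGUAGE C and hook it up, for example as the THEN EXECUTE PROCEDURE of a continuous transform, in the same way pipeline_stream_insert is used.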

First, let's create a stream and a CV.

pipeline=# CREATE STREAM my_stream (x bigint, y bigint, z bigint);
CREATE STREAM
pipeline=# CREATE CONTINUOUS VIEW v_1 AS SELECT x, y, z FROM my_stream;
CREATE CONTINUOUS VIEW
pipeline=#

Insert a piece of data:

pipeline=# INSERT INTO my_stream (x, y, z) VALUES (1, 2, 3);
INSERT 0 1
pipeline=# SELECT * FROM v_1;
 x | y | z
---+---+---
 1 | 2 | 3
(1 row)
pipeline=#

The data shows up in the CV, so let's take a look at how PipelineDB gets it there.

A stream, as introduced above, is an FDW, so let's take a look at its handler (source: src/backend/pipeline/stream_fdw.c).

/*
 * stream_fdw_handler
 */
Datum
stream_fdw_handler(PG_FUNCTION_ARGS)
{
    FdwRoutine *routine = makeNode(FdwRoutine);

    /* Stream SELECTs (used by continuous query procs) */
    routine->GetForeignRelSize = GetStreamSize;
    routine->GetForeignPaths = GetStreamPaths;
    routine->GetForeignPlan = GetStreamScanPlan;
    routine->BeginForeignScan = BeginStreamScan;
    routine->IterateForeignScan = IterateStreamScan;
    routine->ReScanForeignScan = ReScanStreamScan;
    routine->EndForeignScan = EndStreamScan;

    /* Stream INSERTs */
    routine->PlanForeignModify = PlanStreamModify;
    routine->BeginForeignModify = BeginStreamModify;
    routine->ExecForeignInsert = ExecStreamInsert;
    routine->EndForeignModify = EndStreamModify;

    routine->ExplainForeignScan = NULL;
    routine->ExplainForeignModify = NULL;

    PG_RETURN_POINTER(routine);
}

The functions we mainly care about here are the stream insert ones: PlanStreamModify, BeginStreamModify, ExecStreamInsert and EndStreamModify.

Each worker process initializes a recv_id at startup; this is in fact its ZeroMQ queue ID.

The data is sent to the corresponding queue, and the worker process fetches it from that IPC queue.

Source: src/backend/pipeline/ipc/microbatch.c

void
microbatch_send_to_worker(microbatch_t *mb, int worker_id)
{
    ...

The first step is to get worker_id; the worker process is picked at random, so stream data is distributed randomly across the worker processes.

recv_id is the ID of that worker's initialized IPC queue, and the data is sent to that queue.

Finally, it calls

pzmq_send(recv_id, buf, len, true)

and the data is pushed onto the IPC queue.
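
To make the transport concrete, here is a small standalone C sketch of the producer side using the plain ZeroMQ API directly, rather than PipelineDB's pzmq wrapper. It picks one of several worker IPC endpoints at random and pushes a buffer to it, which is conceptually what the microbatch send path above does. The endpoint paths and NUM_WORKERS are made up for illustration, and it assumes the /tmp/demo_zmq directory exists and a receiver has bound the endpoint.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <zmq.h>

#define NUM_WORKERS 2

int
main(void)
{
    /* Hypothetical IPC endpoints; PipelineDB names its sockets <recv_id>.sock */
    const char *endpoints[NUM_WORKERS] = {
        "ipc:///tmp/demo_zmq/1111.sock",
        "ipc:///tmp/demo_zmq/2222.sock",
    };
    const char *payload = "serialized microbatch goes here";

    void *ctx = zmq_ctx_new();

    /* Pick a worker queue at random, like the stream insert path does */
    srand((unsigned) time(NULL));
    int worker_id = rand() % NUM_WORKERS;

    /* PUSH socket: the writer side of the IPC queue */
    void *sock = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(sock, endpoints[worker_id]);

    /* Send the buffer; conceptually pzmq_send(recv_id, buf, len, true) */
    zmq_send(sock, payload, strlen(payload), 0);
    printf("sent %zu bytes to %s\n", strlen(payload), endpoints[worker_id]);

    zmq_close(sock);
    zmq_ctx_term(ctx);
    return 0;
}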

(gdb) p recv_id
$12 = 1404688165
(gdb)

That is the data producer side.

Now for the data consumer, the CV side.

Data is received through the ZeroMQ API as well.

This is mainly the worker process's job.

Source: src/backend/pipeline/ipc/pzmq.c and reader.c

(gdb) p *zmq_state->me
$8 = {id = 1404688165, type = 7 '\a', sock = 0x1139ba0,
  addr = "ipc:///home/pipeline/db_0.9.7/pipeline/zmq/1404688165.sock", '\000' <repeats 965 times>}
(gdb)

You can see that the data is received on queue 1404688165, and the IPC addr is shown as well; it sits under my database data directory.
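
Incidentally, type = 7 in the gdb output corresponds to ZMQ_PULL in zmq.h, so the worker's end of the queue is a PULL socket bound to that ipc:// address. As a plain-ZeroMQ illustration (again not PipelineDB source; the endpoint path is made up to match the producer sketch above), the receiving side looks roughly like this:

#include <stdio.h>
#include <zmq.h>

int
main(void)
{
    /* Hypothetical endpoint; PipelineDB uses <datadir>/pipeline/zmq/<recv_id>.sock */
    const char *endpoint = "ipc:///tmp/demo_zmq/1111.sock";
    char        buf[8192];

    void *ctx = zmq_ctx_new();

    /* PULL socket: the reader side of the IPC queue (ZMQ_PULL == 7) */
    void *sock = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(sock, endpoint);

    /* Block until a message arrives, like the worker's read loop */
    int len = zmq_recv(sock, buf, sizeof(buf), 0);
    if (len >= 0)
        printf("received %d bytes; tuples are unpacked from this buffer\n", len);

    zmq_close(sock);
    zmq_ctx_term(ctx);
    return 0;
}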

The worker receives a buf, unpacks it, and gets the corresponding tuples out of the message.

Once the tuples are in hand, all CVs associated with the stream are looked up; the worker traverses them and executes each CV's SQL.

The execution process is roughly the same as for ordinary SQL: initialize the execution plan, then ExecutePlan, then EndPlan.
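
Schematically, the worker's loop over the CVs looks something like the following heavily simplified sketch. The types and functions here (ContQuery, Batch, init_plan, execute_plan, end_plan) are invented stand-ins for the real PipelineDB structures and executor calls; the point is only to show that each CV reading from the stream gets its plan initialized, run over the unpacked batch, and torn down.

#include <stdio.h>

/* Invented, heavily simplified stand-ins for PipelineDB's structures */
typedef struct { const char *name; const char *sql; } ContQuery;
typedef struct { int ntuples; } Batch;

/* Stand-ins for plan initialization / ExecutePlan / EndPlan */
static void init_plan(ContQuery *cv)              { printf("%s: init plan for \"%s\"\n", cv->name, cv->sql); }
static void execute_plan(ContQuery *cv, Batch *b) { printf("%s: execute over %d tuples, results go to the combiner\n", cv->name, b->ntuples); }
static void end_plan(ContQuery *cv)               { printf("%s: end plan\n", cv->name); }

int
main(void)
{
    /* CVs that read from my_stream (in reality looked up from the catalog) */
    ContQuery cvs[] = { { "v_1", "SELECT x, y, z FROM my_stream" } };
    int ncvs = sizeof(cvs) / sizeof(cvs[0]);

    /* A batch of tuples unpacked from the ZeroMQ message */
    Batch batch = { 1 };

    /* Traverse every CV targeting the stream and run its plan over the batch */
    for (int i = 0; i < ncvs; i++)
    {
        init_plan(&cvs[i]);
        execute_plan(&cvs[i], &batch);
        end_plan(&cvs[i]);
    }
    return 0;
}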

The results then go to the combiner, and if the CV contains aggregates there is a further series of combine operations.

If the data satisfies the CV's SQL logic, it is inserted into the CV's corresponding physical table.

That, in simple terms, is how streams work.

Thank you
