Describes the Lateral type in PostgreSQL, postgresqllateral

Source: Internet
Author: User

Describes the Lateral type in PostgreSQL, postgresqllateral

PostgreSQL 9.3 uses a new union type! Lateral is relatively low-key, but it implements a powerful new query that can be obtained only by programming. in this article, I will introduce a channel conversion analysis that cannot be implemented in PostgreSQL 9.2.
What is LATERAL combination?

The best description of this is at the bottom of the optional FROM Statement List in the document:

LATERAL keywords can be prefixed with a select from subitem. this allows the SELECT subitem to reference columns in the FROM item before the FROM item appears. (without LATERAL, each SELECT subitem is independent of each other, so it cannot cross-reference other FROM items .)
...
When a FROM item contains LATERAL cross reference, the query calculation process is as follows: for each row provided by the FROM image to the cross reference column, or multiple FROM items are provided to the row set of the referenced column. LATERAL items use the column values of the row or row set for calculation. the calculated result set is added to the Union query as usual. this process will be repeated in the row or row set of the source table of the column.

This computation is a bit intensive. You can loosely interpret LATERAL as a foreach for an SQL statement. In this loop, PostgreSQL loops every row in a result set, the row is used as a parameter to calculate a subquery.

What can we do with this?

Take a look at the following table structure to record click events:
 

CREATE TABLE event (  user_id BIGINT,  event_id BIGINT,  time BIGINT NOT NULL,  data JSON NOT NULL,  PRIMARY KEY (user_id, event_id))

Each event is associated with a user with an ID, a timestamp, and a JSON blob with event attributes. in the heap, these attributes may contain the DOM level, window title, session reference, and other information for one click.

Join us to optimize our login page to increase registration. The first step is to calculate to see which channel conversion is losing users.

Example: The channel conversion rate between steps of a registration process.


Assume that we have a front-end device to record event logs along this process, and all the data will be stored in the above event data table. [1] The first problem was that we had to calculate how many people had viewed our homepage, and of them had entered verification information within two weeks after the page was viewed. if we use older versions of PostgreSQL, we may need to use PL/pgSQL, a PostgreSQL built-in process language, to compile some customized functions. in 9.3, we can use a lateral combination to calculate the results with only one funny query, without any extension or PL/pgSQL.

 

SELECTuser_id,view_homepage,view_homepage_time,enter_credit_card,enter_credit_card_timeFROM (-- Get the first time each user viewed the homepage.SELECTuser_id,1 AS view_homepage,min(time) AS view_homepage_timeFROM eventWHEREdata->>'type' = 'view_homepage'GROUP BY user_id) e1 LEFT JOIN LATERAL (-- For each row, get the first time the user_id did the enter_credit_card-- event, if one exists within two weeks of view_homepage_time.SELECT1 AS enter_credit_card,time AS enter_credit_card_timeFROM eventWHEREuser_id = e1.user_id ANDdata->>'type' = 'enter_credit_card' ANDtime BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*14)ORDER BY timeLIMIT 1) e2 ON true

No one will like SQL queries over 30 rows, so let's split these SQL statements into segments for analysis. The first part is a common SQL statement:
 

SELECT  user_id,  1 AS view_homepage,  min(time) AS view_homepage_timeFROM eventWHERE  data->>'type' = 'view_homepage'GROUP BY user_id

That is to obtain the time when each user triggers the view_homepage event. then our lateral Union allows us to iterate each row of the result set and execute a parameterized subquery in the next step. this is equivalent to executing the following query for each row of the result set:
 

SELECT  1 AS enter_credit_card,  time AS enter_credit_card_timeFROM eventWHERE  user_id = e1.user_id AND  data->>'type' = 'enter_credit_card' AND  time BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*14)ORDER BY timeLIMIT 1

For example, each user needs to obtain the time when they trigger the enter_credit_card event within two weeks after the view_homepage_time event is triggered. because this is a lateral combination, our subquery can reference the view_homepage_time result set from the previous subquery. otherwise, the subquery can only be executed independently, but cannot access the result set calculated by another subquery.

Then we encapsulate it into a select statement, which returns something like the following:

user_id | view_homepage | view_homepage_time | enter_credit_card | enter_credit_card_time---------+---------------+--------------------+-------------------+------------------------567 | 1 | 5234567890 | 1 | 5839367890234 | 1 | 2234567890 | |345 | 1 | 3234567890 | |456 | 1 | 4234567890 | |678 | 1 | 6234567890 | |123 | 1 | 1234567890 | |... 


Because this is a left union, there will be rows in the query result set that do not match the enter_credit_card event, as long as there is a view_homepage event. if we summarize all the numeric columns, we will get a clear summary of channel conversion:
 

SELECT  sum(view_homepage) AS viewed_homepage,  sum(enter_credit_card) AS entered_credit_cardFROM (  -- Get the first time each user viewed the homepage.  SELECT  user_id,  1 AS view_homepage,  min(time) AS view_homepage_time  FROM event  WHERE  data->>'type' = 'view_homepage'  GROUP BY user_id) e1 LEFT JOIN LATERAL (  -- For each (user_id, view_homepage_time) tuple, get the first time that  -- user did the enter_credit_card event, if one exists within two weeks.  SELECT  1 AS enter_credit_card,  time AS enter_credit_card_time  FROM event  WHERE  user_id = e1.user_id AND  data->>'type' = 'enter_credit_card' AND  time BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*14)  ORDER BY time  LIMIT 1) e2 ON true

... It will output:

 viewed_homepage | entered_credit_card-----------------+---------------------827 | 10


We can enter more intermediate lateral join steps in this channel to get the key improvements in the process. let's add a query of the sample steps between the home page and the input verification information.
 

SELECT  sum(view_homepage) AS viewed_homepage,  sum(use_demo) AS use_demo,  sum(enter_credit_card) AS entered_credit_cardFROM (  -- Get the first time each user viewed the homepage.  SELECT  user_id,  1 AS view_homepage,  min(time) AS view_homepage_time  FROM event  WHERE  data->>'type' = 'view_homepage'  GROUP BY user_id) e1 LEFT JOIN LATERAL (  -- For each row, get the first time the user_id did the use_demo  -- event, if one exists within one week of view_homepage_time.  SELECT  user_id,  1 AS use_demo,  time AS use_demo_time  FROM event  WHERE  user_id = e1.user_id AND  data->>'type' = 'use_demo' AND  time BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*7)  ORDER BY time  LIMIT 1) e2 ON true LEFT JOIN LATERAL (  -- For each row, get the first time the user_id did the enter_credit_card  -- event, if one exists within one week of use_demo_time.  SELECT  1 AS enter_credit_card,  time AS enter_credit_card_time  FROM event  WHERE  user_id = e2.user_id AND  data->>'type' = 'enter_credit_card' AND  time BETWEEN use_demo_time AND (use_demo_time + 1000*60*60*24*7)  ORDER BY time  LIMIT 1) e3 ON true

The output is as follows:

 viewed_homepage | use_demo | entered_credit_card-----------------+----------+---------------------827 | 220 | 86 


From viewing the homepage to using the demo within one week, to entering credit card information to it within one week, this provides us with three steps of channel conversion. since then, the powerful PostgreSQL allows us to deeply analyze these data result sets and analyze the overall performance of our website. then we may have the following problems to solve:

  • Can I use the demo to increase the possibility of registration?
  • Do users who find our homepage through advertisements share the same conversion rate as those from other channels?
  • What changes will happen to the conversion rate following different A/B Test variables?

The answers to these questions directly affect product improvement. They can be found in the PostgreSQL database because it now supports lateral union.


Without lateral association, we can only use PL/pgSQL for these analyses. Or, if our dataset is small, we may not touch these complicated and inefficient queries. in an exploratory data research application scenario, you may just extract data from PostgreSQL and use the script language you selected for analysis. However, there are still more powerful reasons to express these problems using SQL, especially if you want to encapsulate the entire interface into a set of easy-to-understand UIS, and when publishing features to non-technical users.

Note that these queries can be optimized to make them more efficient. in this example, if we create a btree index on (user_id, (data-> 'type'), time, we can use only one index search to calculate every channel step for every user. if you are using an SSD and the search cost is very small, it is enough. If not, you may need to use a slightly different method to visualize your data. I will leave the detailed content to another article for introduction.


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.