Introduction to LATERAL joins in PostgreSQL


PostgreSQL 9.3 has a new join type! Lateral joins arrived without much fanfare, but they enable powerful new queries that previously had to be written in procedural code. In this article, I will walk through a conversion funnel analysis that would not be practical in PostgreSQL 9.2.
What is a lateral join?

The best description is at the bottom of the list of FROM clause options in the documentation:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)
...
When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERAL item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s).

This is a bit dense. Loosely, you can think of a lateral join as a SQL foreach loop, in which PostgreSQL iterates over each row in a result set and evaluates a subquery using that row as a parameter.
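To make the foreach analogy concrete, here is a minimal Python sketch of what a `LEFT JOIN LATERAL (...) ON true` computes. It is only a toy model of the semantics, not how PostgreSQL executes the join, and the names `left_join_lateral`, `views`, `later_events`, and `first_later_event` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def left_join_lateral(
    outer_rows: Iterable[Row],
    subquery: Callable[[Row], List[Row]],
    null_row: Row,
) -> List[Row]:
    """Toy model of `outer LEFT JOIN LATERAL (subquery) ON true`."""
    joined: List[Row] = []
    for outer in outer_rows:          # the "foreach" over the left-hand rows
        inner_rows = subquery(outer)  # the subquery sees the current outer row
        if not inner_rows:
            # LEFT JOIN: keep the outer row and pad the lateral columns with NULLs.
            joined.append({**outer, **null_row})
        for inner in inner_rows:
            joined.append({**outer, **inner})
    return joined


# Hypothetical data: two homepage views and a single later event.
views = [{"user_id": 1, "view_time": 100}, {"user_id": 2, "view_time": 200}]
later_events = [{"user_id": 1, "time": 150}]


def first_later_event(outer: Row) -> List[Row]:
    """Parameterized subquery: ORDER BY time LIMIT 1 for the current user."""
    times = sorted(
        e["time"] for e in later_events
        if e["user_id"] == outer["user_id"] and e["time"] >= outer["view_time"]
    )
    return [{"event_time": t} for t in times[:1]]


rows = left_join_lateral(views, first_later_event, {"event_time": None})
for row in rows:
    print(row)
```

User 1 is paired with the first later event, while user 2 is kept with an empty `event_time`, which is the behavior a left join gives for rows without a match.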

What can we do with this?

Take a look at the following table structure for recording click events:

CREATE TABLE event (
  user_id BIGINT,
  event_id BIGINT,
  time BIGINT NOT NULL,
  data JSON NOT NULL,
  PRIMARY KEY (user_id, event_id)
)

Each event is associated with a user and has an ID, a timestamp, and a JSON blob with the event's properties. At Heap, these properties might include the DOM hierarchy of a click, the window title, the session referrer, and so on.
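The queries in this article filter events with the expression data->>'type', which extracts one field of the JSON blob as text. As a rough illustration, the sketch below builds one hypothetical event row in Python and mimics that extraction. The property values and the helper name `json_text` are made up, and treating the time column as milliseconds is an assumption inferred from the 1000*60*60*24 arithmetic in the queries.

```python
import json

# Hypothetical row from the event table; the JSON attributes are invented.
event_row = {
    "user_id": 567,
    "event_id": 1,
    "time": 5234567890,  # assumed to be milliseconds, stored as BIGINT
    "data": json.dumps({"type": "view_homepage", "title": "Home"}),
}


def json_text(data: str, key: str):
    """Rough analogue of data->>'key': a top-level field returned as text."""
    value = json.loads(data).get(key)
    if value is None:
        return None  # a missing key or JSON null yields NULL
    # Strings come back unquoted; other values come back as their JSON text.
    return value if isinstance(value, str) else json.dumps(value)


print(json_text(event_row["data"], "type"))
```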

Say we want to optimize our landing page to increase signups. The first step is to figure out where in the conversion funnel we are losing users.

Example: conversion rates between the steps of a signup flow.


Let's assume we've instrumented our front end to log events along this flow, and that all of the data is saved into the event table above. [1] As a first question, let's figure out how many people viewed our homepage, and how many of them entered a credit card within two weeks of that first homepage view. With an older version of PostgreSQL, we might have to write custom functions in PL/pgSQL, PostgreSQL's built-in procedural language. In 9.3, we can use a lateral join to compute the result in a single query, with no extensions or PL/pgSQL.


SELECT
  user_id,
  view_homepage,
  view_homepage_time,
  enter_credit_card,
  enter_credit_card_time
FROM (
  -- Get the first time each user viewed the homepage.
  SELECT
    user_id,
    1 AS view_homepage,
    min(time) AS view_homepage_time
  FROM event
  WHERE
    data->>'type' = 'view_homepage'
  GROUP BY user_id
) e1 LEFT JOIN LATERAL (
  -- For each row, get the first time the user_id did the enter_credit_card
  -- event, if one exists within two weeks of view_homepage_time.
  SELECT
    1 AS enter_credit_card,
    time AS enter_credit_card_time
  FROM event
  WHERE
    user_id = e1.user_id AND
    data->>'type' = 'enter_credit_card' AND
    time BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*14)
  ORDER BY time
  LIMIT 1
) e2 ON true

Nobody likes 30-line SQL queries, so let's break this one into pieces. The first chunk is ordinary SQL:

SELECT
  user_id,
  1 AS view_homepage,
  min(time) AS view_homepage_time
FROM event
WHERE
  data->>'type' = 'view_homepage'
GROUP BY user_id

That is, get the first time each user triggered the view_homepage event. Then our lateral join lets us iterate over each row of that result set and evaluate a parameterized version of the next subquery. This is equivalent to running the query below once for each row of the result set:

SELECT
  1 AS enter_credit_card,
  time AS enter_credit_card_time
FROM event
WHERE
  user_id = e1.user_id AND
  data->>'type' = 'enter_credit_card' AND
  time BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*14)
ORDER BY time
LIMIT 1

That is, for each user, get the first time they triggered the enter_credit_card event within two weeks of view_homepage_time. Because this is a lateral join, our subquery can reference view_homepage_time from the result set of the previous subquery. Without LATERAL, the subqueries would be evaluated independently, and neither could access the results computed by the other.
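The window arithmetic in the BETWEEN clause can be checked with a short Python sketch. It assumes the time column holds milliseconds, which is what the factor of 1000 suggests but which the table definition does not state. The helper name `within_window` is hypothetical, and the two sample timestamps are taken from the first row of the result table in this article.

```python
MS_PER_DAY = 1000 * 60 * 60 * 24  # milliseconds in a day (assumed unit)


def within_window(start_ms: int, event_ms: int, days: int) -> bool:
    """Mirror of `time BETWEEN start AND (start + window)`; both ends inclusive."""
    return start_ms <= event_ms <= start_ms + days * MS_PER_DAY


two_weeks_ms = 14 * MS_PER_DAY
print(two_weeks_ms)

# User 567: homepage view at 5234567890, credit card entered at 5839367890.
print(within_window(5234567890, 5839367890, 14))
```

The gap between the two sample timestamps is exactly seven days under this assumption, so the row falls inside the two-week window.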

Then we wrap the whole thing in a SELECT, which returns something like this:

 user_id | view_homepage | view_homepage_time | enter_credit_card | enter_credit_card_time
---------+---------------+--------------------+-------------------+------------------------
     567 |             1 |         5234567890 |                 1 |             5839367890
     234 |             1 |         2234567890 |                   |
     345 |             1 |         3234567890 |                   |
     456 |             1 |         4234567890 |                   |
     678 |             1 |         6234567890 |                   |
     123 |             1 |         1234567890 |                   |
...


Because this is a left join, the result set keeps a row for every user with a view_homepage event, even when there is no matching enter_credit_card event. If we aggregate the numeric columns, we get a tidy summary of the funnel:

SELECT
  sum(view_homepage) AS viewed_homepage,
  sum(enter_credit_card) AS entered_credit_card
FROM (
  -- Get the first time each user viewed the homepage.
  SELECT
    user_id,
    1 AS view_homepage,
    min(time) AS view_homepage_time
  FROM event
  WHERE
    data->>'type' = 'view_homepage'
  GROUP BY user_id
) e1 LEFT JOIN LATERAL (
  -- For each (user_id, view_homepage_time) tuple, get the first time that
  -- user did the enter_credit_card event, if one exists within two weeks.
  SELECT
    1 AS enter_credit_card,
    time AS enter_credit_card_time
  FROM event
  WHERE
    user_id = e1.user_id AND
    data->>'type' = 'enter_credit_card' AND
    time BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*14)
  ORDER BY time
  LIMIT 1
) e2 ON true

... It will output:

 viewed_homepage | entered_credit_card
-----------------+---------------------
             827 |                  10
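The aggregation works because SQL's sum() skips NULLs, so the users without a matching enter_credit_card row simply do not count toward the second column. A small Python sketch of that behavior, applied to the six sample rows shown earlier rather than to the full data set (the helper name `sql_sum` is hypothetical):

```python
from typing import List, Optional


def sql_sum(values: List[Optional[int]]) -> Optional[int]:
    """Like SQL sum(): ignore NULLs, and return NULL if nothing is left."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


# (view_homepage, enter_credit_card) flags from the six sample rows.
sample_rows = [(1, 1), (1, None), (1, None), (1, None), (1, None), (1, None)]

viewed_homepage = sql_sum([row[0] for row in sample_rows])
entered_credit_card = sql_sum([row[1] for row in sample_rows])
print(viewed_homepage, entered_credit_card)
```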


We can add intermediate steps to this funnel with more lateral joins, to find the parts of the flow we need to focus on. Let's add a step for using the demo between viewing the homepage and entering a credit card.

SELECT
  sum(view_homepage) AS viewed_homepage,
  sum(use_demo) AS use_demo,
  sum(enter_credit_card) AS entered_credit_card
FROM (
  -- Get the first time each user viewed the homepage.
  SELECT
    user_id,
    1 AS view_homepage,
    min(time) AS view_homepage_time
  FROM event
  WHERE
    data->>'type' = 'view_homepage'
  GROUP BY user_id
) e1 LEFT JOIN LATERAL (
  -- For each row, get the first time the user_id did the use_demo
  -- event, if one exists within one week of view_homepage_time.
  SELECT
    user_id,
    1 AS use_demo,
    time AS use_demo_time
  FROM event
  WHERE
    user_id = e1.user_id AND
    data->>'type' = 'use_demo' AND
    time BETWEEN view_homepage_time AND (view_homepage_time + 1000*60*60*24*7)
  ORDER BY time
  LIMIT 1
) e2 ON true LEFT JOIN LATERAL (
  -- For each row, get the first time the user_id did the enter_credit_card
  -- event, if one exists within one week of use_demo_time.
  SELECT
    1 AS enter_credit_card,
    time AS enter_credit_card_time
  FROM event
  WHERE
    user_id = e2.user_id AND
    data->>'type' = 'enter_credit_card' AND
    time BETWEEN use_demo_time AND (use_demo_time + 1000*60*60*24*7)
  ORDER BY time
  LIMIT 1
) e3 ON true

This will output:

 viewed_homepage | use_demo | entered_credit_card
-----------------+----------+---------------------
             827 |      220 |                  86
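Turning these counts into conversion rates is simple division. The sketch below uses the three numbers from the output above; the names `funnel` and `step_rates` are hypothetical, and it is only meant to show the arithmetic.

```python
# Step counts copied from the query output above.
funnel = [
    ("viewed_homepage", 827),
    ("use_demo", 220),
    ("entered_credit_card", 86),
]


def step_rates(steps):
    """Conversion rate from each funnel step to the next one."""
    return [
        (prev_name, cur_name, cur_count / prev_count)
        for (prev_name, prev_count), (cur_name, cur_count) in zip(steps, steps[1:])
    ]


for src, dst, rate in step_rates(funnel):
    print(f"{src} -> {dst}: {rate:.1%}")

overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.1%}")
```

Roughly a quarter of homepage viewers go on to use the demo within a week, and a bit more than a third of those enter a credit card within the following week.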


This gives us a three-step conversion funnel: from viewing the homepage, to using the demo within a week, to entering a credit card within a week of that. From here, the expressive power of PostgreSQL lets us drill down into these result sets and analyze the performance of our website as a whole. We might follow up with questions such as:

    • Does using the demo increase the likelihood of a signup?
    • Do users who find our homepage through an ad convert at the same rate as users from other sources?
    • How do the conversion rates change across different A/B test variants?

The answers to these questions directly affect product improvements, and they can now be found in the PostgreSQL database itself because it supports lateral joins.


Without lateral joins, we would have to resort to PL/pgSQL for this analysis. Or, if our data set were small, we might get away with complex, inefficient queries. In an exploratory data analysis scenario, you might just pull the data out of PostgreSQL and analyze it with the scripting language of your choice. But there is still a strong case for expressing these questions in SQL, especially if you want to wrap it all up in an easy-to-understand UI and expose the functionality to non-technical users.

Note that these queries can be tuned to be quite efficient. In this example, if we create a btree index on (user_id, (data->>'type'), time), we can compute each funnel step for each user with a single index lookup. If you're using SSDs, where lookups are cheap, that may be good enough. If not, you may need to organize your data a bit differently, and I'll leave that for another article.
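To see why one ordered lookup is enough, the sketch below models the composite index as a sorted Python list searched with bisect. This is only an analogy for how an ordered index on (user_id, type, time) can serve the `ORDER BY time LIMIT 1` subquery, not a description of PostgreSQL's btree implementation. The index entries and the function name `first_event_in_window` are hypothetical, and timestamps are again assumed to be milliseconds.

```python
from bisect import bisect_left

MS_PER_DAY = 1000 * 60 * 60 * 24  # assumed unit: milliseconds

# Hypothetical index entries, kept sorted like keys on (user_id, type, time).
index = sorted([
    (567, "enter_credit_card", 5839367890),
    (567, "view_homepage", 5234567890),
    (234, "view_homepage", 2234567890),
    (567, "enter_credit_card", 5900000000),
])


def first_event_in_window(user_id, event_type, start_ms, days):
    """Find the first entry at or after (user_id, type, start) in one search."""
    pos = bisect_left(index, (user_id, event_type, start_ms))
    if pos < len(index):
        uid, etype, time_ms = index[pos]
        same_key = (uid, etype) == (user_id, event_type)
        if same_key and time_ms <= start_ms + days * MS_PER_DAY:
            return time_ms
    return None  # no matching event inside the window


print(first_event_in_window(567, "enter_credit_card", 5234567890, 14))
print(first_event_in_window(234, "enter_credit_card", 2234567890, 14))
```

Because entries for the same user and event type sit next to each other in time order, the first entry at or after the window start is the only candidate that needs to be checked.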

