Ad Engine Analysis

Last Update: 2016-07-13 Source: Internet

Author: User


Advertising engine
- Overall design
- Retrieval service
  - Ad retrieval process
  - Ad targeting
    - Targeting dimensions with few, enumerable options, such as gender, age, network, and operating system
    - How our system handles this kind of targeting now
    - Targeting dimensions with a cascade relationship: province, city, district
    - Targeting by radius: within n km of a coordinate
  - The retrieval service is a copy of the database
  - CTR calculation
  - Calculating the second price
  - ADX ads
- Exposure service
- Billing service
  - Billing service master/standby
  - Floating-point problem

Overall design of the advertising engine

In our basic architecture, the client calls the API, and the API sends RPC requests to our services, which are managed through a service registry.
Retrieval service: builds indexes from the ads in the database and subscribes to a Redis channel, so that the in-memory ads stay aligned with the database through a notification mechanism.
Exposure service: receives exposure and click events, aggregates them, and pushes the aggregated results to a message queue.
Billing service: interacts with the payment system. It is mainly responsible for deducting advertisers' money, raising insufficient-balance notifications, and taking ads offline when the budget is used up.

The services are introduced one by one below.

Retrieval service

Ad retrieval process

Parameter parsing: mainly filters out illegal requests.
Start the ADX ad request: first run a simple targeting pass to decide which DSPs should receive a request, then confirm with the flow-control module that the request may be sent, send the HTTP request asynchronously, return immediately, and continue with the logic below.
Ad targeting: screen the ads according to their targeting conditions and collect all eligible ad IDs.
Ad filtering: filter out non-qualifying ad IDs based on business rules.
CTR calculation and eCPM sorting: calculate the click-through rate for every creative of every ad, then sort by eCPM.
Ad selection: select the winning ads and calculate the second price.
ADX auction: the ADX requests were sent asynchronously earlier, so at this point we wait for each request to return its result and let the ADX ads join the auction.
Formatting: format the ad content into the client's protocol according to the type of ad format.
Exposure and click parameter encryption: encrypt certain information, especially the exposure price and click price of the ad.

Ad targeting

The problem to solve

A user has attributes such as gender, age, network environment, device type, and geographic information, and advertisers want to deliver their ads to a particular audience. Ad targeting is the process of retrieving the ads that may be served, based on the user's attributes.

In the example here, ad 2 does not meet the targeting conditions, while ad 1 and ad 3 do.

Targeting dimensions with few, enumerable options, such as gender, age, network, and operating system

Ad targeting table

Ad ID	Gender	Age range	Network	Operating system
Ad 1	Unlimited	Unlimited	Unlimited	Unlimited
Ad 2	Female	0-18	Unlimited	Unlimited
Ad 3	Male	0-18	Unlimited	iOS

The targeting table above looks a lot like a database table. If the traffic of the user in the example were sent to a database, the query would be:

WHERE (gender = male OR gender = unlimited)
AND (age = 0-18 OR age = unlimited)
AND (network = wifi OR network = unlimited)
AND (operating system = ios OR operating system = unlimited)
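
The same query can be mimicked in a few lines of Python. This is only a minimal sketch, not our production code: the ADS dictionary mirrors the targeting table, and UNLIMITED, matches, eligible_ads, and the field names are made up for illustration.

```python
UNLIMITED = "unlimited"

# Mirrors the ad targeting table; the field names are illustrative.
ADS = {
    1: {"gender": UNLIMITED, "age": UNLIMITED, "network": UNLIMITED, "os": UNLIMITED},
    2: {"gender": "female", "age": "0-18", "network": UNLIMITED, "os": UNLIMITED},
    3: {"gender": "male", "age": "0-18", "network": UNLIMITED, "os": "ios"},
}


def matches(targeting, user):
    """True if every dimension equals the user's value or is unlimited."""
    return all(targeting[dim] in (user[dim], UNLIMITED) for dim in targeting)


def eligible_ads(user):
    """Full scan: the same effect as evaluating the WHERE clause on every row."""
    return sorted(ad_id for ad_id, t in ADS.items() if matches(t, user))


user = {"gender": "male", "age": "0-18", "network": "wifi", "os": "ios"}
print(eligible_ads(user))  # [1, 3]
```

A full scan like this is the baseline that the index designs discussed here try to avoid.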

If there were no "unlimited" values, a combined index would seem the best choice. We combine gender-age-network-operating system into a single index key, so the index space is:

2 genders x 5 age ranges x 2 network environments x 2 operating systems = 40 possible combinations, each of which may map to a list of ad IDs. To use the combined index, the OR clauses must be removed. This can be done by writing ads with "unlimited" values redundantly into every index entry they match. The most extreme example is ad 1: it is unlimited in every dimension, so all 40 combinations hold a redundant copy of ad 1.
Our system first retrieved ads this way. The advantage is that a single index lookup returns the targeted ads; the obvious drawback is the added redundancy in the index data.
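
A rough sketch of such a combined index with redundant "unlimited" entries might look like the following. It is an illustration under assumptions: the text only gives the number of options per dimension, so the concrete age ranges, network names, and all function names here are hypothetical.

```python
from itertools import product

UNLIMITED = "unlimited"

# 2 x 5 x 2 x 2 = 40 combinations; values other than those in the
# targeting table are assumed for illustration.
DIMENSIONS = {
    "gender": ["male", "female"],
    "age": ["0-18", "19-25", "26-35", "36-50", "50+"],
    "network": ["wifi", "cellular"],
    "os": ["ios", "android"],
}

ADS = {
    1: {"gender": UNLIMITED, "age": UNLIMITED, "network": UNLIMITED, "os": UNLIMITED},
    2: {"gender": "female", "age": "0-18", "network": UNLIMITED, "os": UNLIMITED},
    3: {"gender": "male", "age": "0-18", "network": UNLIMITED, "os": "ios"},
}


def build_combined_index(ads):
    """Map each (gender, age, network, os) key to the ad IDs that match it."""
    index = {}
    for ad_id, targeting in ads.items():
        # An unlimited dimension expands to every value of that dimension.
        choices = [
            values if targeting[dim] == UNLIMITED else [targeting[dim]]
            for dim, values in DIMENSIONS.items()
        ]
        for key in product(*choices):
            index.setdefault(key, []).append(ad_id)
    return index


index = build_combined_index(ADS)
print(index[("male", "0-18", "wifi", "ios")])  # [1, 3]
```

Ad 1 ends up in all 40 entries, which is exactly the redundancy cost described above.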

Look at only the gender dimension of the WHERE clause. Both sides of the OR are equality queries, so each side can use an index to find a set. The two sets are unioned, and the result is then intersected with the results from the other dimensions.

Our program simulates this union-then-intersect operation. A database organizes its indexes with B-trees, whereas our program uses hash tables: given the key of a query, the table returns the indexed content.

Set operations
A union can be computed the same way two sorted arrays are merged in merge sort.
There are generally three ways to compute an intersection:

General method: use the smaller set as the base and binary-search each of its elements in the larger set.
Large set intersected with a small set: if the small set fits in memory, build a hash index on it in memory, use the large set as the base, and probe the hash index for each of its elements.
Two large sets: sort both, after which the intersection costs O(m+n). Take the smallest element of one set, then skip every element of the other set that is smaller than it.
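
The union and the three intersection strategies can be sketched as follows. This is a minimal illustration on sorted Python lists without duplicates; the function names are made up here, and a real index would work on its own posting-list structures.

```python
from bisect import bisect_left


def union_merge(a, b):
    """Union of two sorted lists, merged like the merge step of merge sort."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def intersect_binary_search(small, large):
    """The small list drives; each element is binary-searched in the large list."""
    out = []
    for x in small:
        pos = bisect_left(large, x)
        if pos < len(large) and large[pos] == x:
            out.append(x)
    return out


def intersect_hash(small, large):
    """Build a hash index on the small set and probe it while scanning the large one."""
    lookup = set(small)
    return [x for x in large if x in lookup]


def intersect_merge(a, b):
    """Two sorted lists in O(m+n): always skip past the smaller head element."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```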

How our system handles this kind of targeting now

An inverted hash index is built in memory for each dimension. Ads that are "unlimited" in a dimension are written redundantly under every value of that one dimension, so no union is needed. The dimension whose result set is smallest serves as the driving table; each of its ad IDs is then probed against the hash results of the other dimensions, and an ad is removed if it is missing from any of them.

Note that targeting makes only one in-memory copy of ad IDs into the context of a given request, and what gets copied is the smallest set. Everything else is constant-time hash probing against the sets of the other dimensions, which narrows the copy down.
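
A minimal sketch of this per-dimension scheme is shown below. It assumes that an ad which is unlimited in a dimension is stored under every value of that dimension; the VALUES table, the function names, and the option values are hypothetical.

```python
UNLIMITED = "unlimited"

# Enumerable options per dimension (illustrative values).
VALUES = {
    "gender": ["male", "female"],
    "age": ["0-18", "19-25"],
    "network": ["wifi", "cellular"],
    "os": ["ios", "android"],
}

ADS = {
    1: {"gender": UNLIMITED, "age": UNLIMITED, "network": UNLIMITED, "os": UNLIMITED},
    2: {"gender": "female", "age": "0-18", "network": UNLIMITED, "os": UNLIMITED},
    3: {"gender": "male", "age": "0-18", "network": UNLIMITED, "os": "ios"},
}


def build_indexes(ads, values):
    """One inverted hash index per dimension: value -> set of ad IDs."""
    indexes = {dim: {v: set() for v in vals} for dim, vals in values.items()}
    for ad_id, targeting in ads.items():
        for dim, vals in values.items():
            # Redundancy stays inside a single dimension.
            wanted = vals if targeting[dim] == UNLIMITED else [targeting[dim]]
            for v in wanted:
                indexes[dim][v].add(ad_id)
    return indexes


def target(indexes, user):
    """Drive with the smallest set and hash-probe the other dimensions."""
    sets = sorted((indexes[dim].get(user[dim], set()) for dim in indexes), key=len)
    driver, others = sets[0], sets[1:]
    # The only per-request copy is built from the smallest set.
    return sorted(ad for ad in driver if all(ad in s for s in others))


indexes = build_indexes(ADS, VALUES)
user = {"gender": "male", "age": "0-18", "network": "wifi", "os": "ios"}
print(target(indexes, user))  # [1, 3]
```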

The targeting method above relies mainly on inverted indexes, plus some techniques for intersection. Inverted indexes suit equality queries and dynamic multi-dimensional combinations very well, but they handle range queries poorly. For example, if we supported targeting on arbitrary age ranges, we would be dealing with range queries, and an ordered structure such as a balanced tree would be a better choice.

Targeting dimensions with a cascade relationship: province, city, district

The regional targeting conditions of the ads are as follows:

Ad region targeting table

Ad ID	Province	City	District
Ad 1	Unlimited	Unlimited	Unlimited
Ad 2	Beijing	Beijing	Chaoyang
Ad 3	Shanghai	Shanghai City	Unlimited

Suppose a user located in Beijing (province), downtown Beijing (city), Chaoyang (district) searches for ads. The SQL is as follows:

WHERE (province = unlimited AND city = unlimited AND district = unlimited)
OR (province = Beijing AND city = Downtown AND district = unlimited)
OR (province = Beijing AND city = Downtown AND district = Chaoyang)

Earlier we said that "unlimited" ads can be written redundantly into every data item to avoid the OR operation. For province, city, and district this redundancy is not workable, for two reasons:

First, it is unclear where to put the redundant copies, because the set of province and city options may not be fixed.
Second, if an ad is unlimited at the province and city level, the amount of redundancy is too large: it would have to be copied once for every district and county in the country.

Observe that this SQL applies AND first and OR second. The earlier analysis showed that multiple ANDs suit a combined index, so if we use province-city-district as the key of a single index, the query amounts to three lookups in the combined index followed by a union. The resulting set still has to be intersected with the sets from the other dimensions, and because the OR makes this set change from request to request, we have to copy the lists of ad IDs into a temporary table.

How can the temporary table be optimized? I can think of two methods, but there is no ...

Cache the temporary table for fixed query conditions, so that later queries can reuse it.
Use the set identity A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). If this dimension is processed last, A is guaranteed to be relatively small, while B and C never change, so hash filtering is enough.
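
The second idea can be sketched as follows. This is a minimal illustration, not our implementation: REGION_INDEX mirrors the region targeting table, region_keys produces the three keys queried by the SQL above, and all names are hypothetical. A level for "province only" would simply add one more key.

```python
UNLIMITED = "unlimited"

# Combined province-city-district index: key -> set of ad IDs (never copied).
REGION_INDEX = {
    (UNLIMITED, UNLIMITED, UNLIMITED): {1},
    ("Beijing", "Beijing", "Chaoyang"): {2},
    ("Shanghai", "Shanghai City", UNLIMITED): {3},
}


def region_keys(province, city, district):
    """The three combined-index keys that the OR query looks up."""
    return [
        (UNLIMITED, UNLIMITED, UNLIMITED),
        (province, city, UNLIMITED),
        (province, city, district),
    ]


def filter_by_region(candidates, province, city, district):
    """Compute A ∩ (B ∪ C ∪ ...) as hash probes, without building the union."""
    region_sets = [
        REGION_INDEX.get(key, set()) for key in region_keys(province, city, district)
    ]
    # candidates is A, already narrowed down by the other dimensions.
    return sorted(ad for ad in candidates if any(ad in s for s in region_sets))


print(filter_by_region({1, 2, 3}, "Beijing", "Beijing", "Chaoyang"))  # [1, 2]
```

Because the region sets are only probed, no temporary table has to be created per request.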

Targeting by radius: within n km of a coordinate

First the coordinates can be converted to a Geohash and used for coarse targeting; the n km condition is then computed exactly with a filtering pass.
There are a few points to note:

Geohash precision: the smallest cell should be able to cover the surrounding n km. A 4-character Geohash covers roughly 20 km and a 3-character Geohash roughly 78 km.
Because a Geohash only gives an approximate location, the ad ID should also be written redundantly into the 8 neighboring cells. A query then only needs to check one cell; otherwise it has to retrieve 9 cells.
In general, this approach simply runs an equality query on the index (WHERE geo = ABCD) as a first sieve and then filters.

Filtering is not necessarily slow. In cases where using an index in the database would be slower, filtering is the most suitable choice.
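
The coarse-then-exact idea could be sketched like this. It is an illustration under assumptions: the Geohash encoder follows the standard public algorithm, neighbors are found by shifting the point by one cell width or height (edge cases at the poles are only clamped), distance uses the haversine formula, and names such as index_ad and query are made up.

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=4):
    """Standard Geohash: interleave longitude and latitude bits, 5 bits per character."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def cell_and_neighbors(lat, lon, precision=4):
    """The cell containing the point plus its 8 neighbors."""
    width = 360.0 / (1 << ((5 * precision + 1) // 2))   # longitude span of a cell
    height = 180.0 / (1 << ((5 * precision) // 2))      # latitude span of a cell
    cells = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nlat = min(max(lat + dy * height, -90.0), 90.0)
            nlon = (lon + dx * width + 180.0) % 360.0 - 180.0
            cells.add(geohash_encode(nlat, nlon, precision))
    return cells


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def index_ad(index, ad_id, lat, lon, radius_km, precision=4):
    """Write the ad into its own cell and the 8 neighbors (the redundancy)."""
    for cell in cell_and_neighbors(lat, lon, precision):
        index.setdefault(cell, []).append((ad_id, lat, lon, radius_km))


def query(index, lat, lon, precision=4):
    """First sieve: one equality lookup. Then filter by exact distance."""
    candidates = index.get(geohash_encode(lat, lon, precision), [])
    return sorted(
        ad_id for ad_id, alat, alon, r in candidates
        if haversine_km(lat, lon, alat, alon) <= r
    )
```

With a 4-character Geohash this sketch suits radii up to roughly the cell size; larger radii would need a shorter Geohash.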

The retrieval service is a copy of the database

When it starts, our retrieval service loads all the ads in the database, builds the forward and inverted index data for them, and keeps its multiple replicas consistent with the database through message notifications.
Messages arrive by subscribing to a Redis channel. They include taking an advertiser's ads offline, syncing campaigns, syncing ads, syncing creatives, and reloading all data at once.
While the data is being reloaded, the online service keeps running, so we use reference swapping. To guarantee enough memory for a full reload, only half of the memory can be used.
The message mechanism may be unreliable, so every hour the retrieval service does a full synchronization with the database again.
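
Reference swapping during a full reload can be sketched as follows. This is a simplified illustration: load_all_ads is a hypothetical loader that returns a mapping from ad ID to gender, and the index is reduced to a single inverted dimension.

```python
class RetrievalService:
    def __init__(self, load_all_ads):
        # load_all_ads is a hypothetical callable returning {ad_id: gender}.
        self._load_all_ads = load_all_ads
        self._index = self._build()

    def _build(self):
        """Build a fresh inverted index from a full load of the database."""
        inverted = {}
        for ad_id, gender in self._load_all_ads().items():
            inverted.setdefault(gender, set()).add(ad_id)
        return inverted

    def reload(self):
        new_index = self._build()  # the old index keeps serving while this runs
        self._index = new_index    # a single reference assignment swaps them

    def search(self, gender):
        index = self._index        # take the reference once per request
        return sorted(index.get(gender, ()))


database = {1: "male", 2: "female"}
service = RetrievalService(lambda: dict(database))
database[3] = "male"
print(service.search("male"))  # [1]  (still the old snapshot)
service.reload()
print(service.search("male"))  # [1, 3]
```

During reload both the old and the new index exist at the same time, which is why only half of the memory can be used for one copy.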

When memory is not enough to load the inverted indexes for all ads, two approaches should be considered. One is compressed storage: load only the necessary information and store the forward data compressed. The other is data partitioning, for example by region, which also brings data-skew problems that need to be considered. We have not hit the memory limit yet.

CTR Calculation

Each creative has its own CTR, so the amount of CTR computation is very large. We compute CTR asynchronously: when the CTR cache is queried for an ad and the value is not there yet, a default CTR is returned, and the creative's CTR is then computed asynchronously to populate the cache.
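
A minimal sketch of such a cache, using a thread pool from the Python standard library, is shown below. The class name, the default CTR value, and the compute_ctr callback are assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncCtrCache:
    def __init__(self, compute_ctr, default_ctr=0.01, workers=2):
        self._compute_ctr = compute_ctr   # hypothetical, possibly slow, CTR model
        self._default_ctr = default_ctr
        self._cache = {}
        self._pending = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def get(self, creative_id):
        """Return the cached CTR, or the default while a refresh runs in the background."""
        with self._lock:
            if creative_id in self._cache:
                return self._cache[creative_id]
            if creative_id not in self._pending:
                self._pending[creative_id] = self._pool.submit(self._fill, creative_id)
        return self._default_ctr

    def _fill(self, creative_id):
        ctr = self._compute_ctr(creative_id)
        with self._lock:
            self._cache[creative_id] = ctr
            self._pending.pop(creative_id, None)

    def wait_idle(self):
        """Block until all background computations have finished (useful in tests)."""
        with self._lock:
            futures = list(self._pending.values())
        for future in futures:
            future.result()

    def close(self):
        self._pool.shutdown(wait=True)
```

The request path never waits for the computation; the first request for a creative simply ranks with the default value.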

Calculate a second price

In both formulas, first and second are the ads ranked first and second by eCPM.

CPC ads

first.clickprice = second.ecpm / first.quality / first.ctr + 0.01 * 1000

CPM ads

first.displayprice = second.ecpm / first.quality + 0.01

ADX ads

Problems we face

ADX ad requests go out to the external network in large volume and must return within 100 ms, and a timeout at one DSP must not affect the overall ad retrieval.

We control these problems in the following ways:

Flow control: each DSP has its own QPS limit.
Send requests asynchronously using NIO.
Long-lived connections: use the keep-alive feature of HTTP/1.1 wherever possible, and do not allow third parties to use short-lived HTTPS connections.
Cap the maximum number of long-lived connections established with each DSP.
When the timeout rate exceeds 40%, halve the traffic repeatedly until the configured minimum QPS is reached; when the request success rate exceeds 40%, double the traffic repeatedly until this DSP's configured QPS value is reached.
Timeout control: a self-encapsulated dspfuture uses a HashedWheelTimer to enforce the timeout.
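
The halving and doubling rule can be sketched as a small controller. This is an illustration only: the class name and parameters are made up, the 40% thresholds are taken literally from the text, and checking the timeout rate before the success rate is an ordering chosen here.

```python
class DspFlowControl:
    def __init__(self, max_qps, min_qps, threshold=0.4):
        self.max_qps = max_qps          # configured QPS for this DSP
        self.min_qps = min_qps          # floor when the DSP keeps timing out
        self.threshold = threshold      # 40% in the text
        self.current_qps = max_qps

    def adjust(self, sent, timeouts):
        """Call once per statistics window with the request and timeout counts."""
        if sent == 0:
            return self.current_qps
        timeout_rate = timeouts / sent
        success_rate = 1.0 - timeout_rate
        if timeout_rate > self.threshold:
            # Halve the traffic, but never go below the minimum.
            self.current_qps = max(self.min_qps, self.current_qps // 2)
        elif success_rate > self.threshold:
            # Double the traffic, but never exceed the configured QPS.
            self.current_qps = min(self.max_qps, self.current_qps * 2)
        return self.current_qps


control = DspFlowControl(max_qps=1000, min_qps=100)
print(control.adjust(sent=100, timeouts=50))  # 500
print(control.adjust(sent=100, timeouts=5))   # 1000
```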

Where the ADX still needs strengthening

Connection warm-up: long-lived connections to each DSP can be established in advance, before traffic is served.
Network link selection: to deal with network problems between us and a DSP, we can try different links to that DSP, or prefer the most economical link.

Exposure service

The exposure service receives exposure and click events, aggregates them in memory, and pushes the aggregated information to the message queue every minute.
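
In-memory aggregation with a periodic flush can be sketched like this. It is a simplified illustration: publish stands in for the message queue client, flush would be driven by a one-minute timer in a real service, and all names are hypothetical.

```python
import threading
from collections import Counter


class ExposureAggregator:
    def __init__(self, publish):
        self._publish = publish        # hypothetical callback that sends to the queue
        self._counts = Counter()
        self._lock = threading.Lock()

    def record(self, ad_id, event):
        """event is 'exposure' or 'click'."""
        with self._lock:
            self._counts[(ad_id, event)] += 1

    def flush(self):
        """Swap out the counters and push one aggregated message."""
        with self._lock:
            batch, self._counts = self._counts, Counter()
        if batch:
            self._publish(dict(batch))


queue = []                             # stand-in for the message queue
aggregator = ExposureAggregator(queue.append)
for _ in range(3):
    aggregator.record(1, "exposure")
aggregator.record(1, "click")
aggregator.flush()
print(queue)  # [{(1, 'exposure'): 3, (1, 'click'): 1}]
```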

Billing service

The billing service consumes from the message queue the exposures and clicks aggregated by the exposure service. Exposures are consumed once per minute; clicks are consumed in real time.
Exposure consumption is a collaboration between two threads: one thread pulls data from the message queue and aggregates what the N exposure service instances sent, and the other thread does the consuming.

When the consumer thread is blocked, the billing service risks losing data, because the aggregated exposures are held in memory. Our billing service therefore limits how fast messages are pulled from the message queue, holding at most a few minutes of data.

Billing service master/standby

The billing service runs as one master with standbys, and the lease is implemented through Redis.
A key named lock in Redis holds as its value the name of the node currently working, and it has an expiration time.
The master calls GET on lock every second and checks whether the value matches its own node name; if so, it renews the lease with EXPIRE.
Each standby tries every second to write the lock key with its own node name and an expiration time. While the key already exists the write must return false, so it has to be a set-if-absent operation such as SETNX or SET with the NX and EX options; plain SETEX would overwrite the master's value.
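
The lease logic can be sketched without a running Redis. In the illustration below, FakeRedis is an in-memory stand-in written for this example; it only mimics GET, EXPIRE, and SET with NX and EX, and it uses a manual clock. The tick function and the 3-second lease are assumptions.

```python
class FakeRedis:
    """In-memory stand-in for the few Redis commands the lease needs."""

    def __init__(self):
        self._data = {}
        self.now = 0.0                 # manual clock, in seconds

    def _alive(self, key):
        item = self._data.get(key)
        if item is not None and item[1] > self.now:
            return item
        self._data.pop(key, None)      # expired keys disappear
        return None

    def get(self, key):
        item = self._alive(key)
        return item[0] if item else None

    def set(self, key, value, ex, nx=False):
        if nx and self._alive(key):
            return False               # set-if-absent fails while the key exists
        self._data[key] = (value, self.now + ex)
        return True

    def expire(self, key, seconds):
        item = self._alive(key)
        if item is None:
            return False
        self._data[key] = (item[0], self.now + seconds)
        return True


def tick(redis, node_name, ttl=3):
    """Run once per second on every node; returns True if this node is the master."""
    if redis.get("lock") == node_name:
        redis.expire("lock", ttl)      # master renews its lease
        return True
    # Standby (or first start): try to take the lease if nobody holds it.
    return redis.set("lock", node_name, ex=ttl, nx=True)
```

Note that GET followed by EXPIRE is not atomic, so this sketch shares that property with the scheme described above.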

Floating-point problem

Billing involves calculations and checks on money, so floating-point precision becomes a problem.
Subtraction uses the BigDecimal.subtract() method.
Multiplication uses the BigDecimal.multiply() method.
Division uses the BigDecimal.divide() method.
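
The precision problem itself is easy to show. The sketch below uses Python's decimal.Decimal, which plays a role similar to Java's BigDecimal; the prices and the rounding mode are example values chosen for illustration, not values from our billing service.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_is_exact = (0.1 + 0.2 == 0.3)                                       # False
decimal_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))    # True

balance = Decimal("100.00")
click_price = Decimal("0.35")
clicks = 3

cost = click_price * clicks            # multiplication
remaining = balance - cost             # subtraction
# Division needs an explicit scale and rounding mode for money.
per_click = (cost / clicks).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(float_is_exact, decimal_is_exact)  # False True
print(cost, remaining, per_click)        # 1.05 98.95 0.35
```

Amounts are built from strings rather than floats so that no binary rounding error enters the calculation in the first place.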


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
