HTTPS practice for large Web sites (iv)--practices outside the protocol layer


1 Preface

Few articles online discuss HTTPS, and practical experience of deploying HTTPS on large Internet sites is rarely shared, so many people still have doubts about deploying it.
This article introduces Baidu's HTTPS practice and some of the trade-offs we made, in the hope that it will be a useful reference.
This article was first published on the official blog of Baidu's Operations and Maintenance Department.

2 Practice outside the protocol layer

2.1 Why HTTPS must cover the whole site

Many people who are new to HTTPS assume it is enough to switch only the site's primary domain to HTTPS. It is not.
The purpose of HTTPS is to secure the transmission process. What happens if only the primary domain is served over HTTPS, while the resources it loads, such as JS, CSS and images, are not?
In effect, the transmission is still not secured: the JS, CSS and images can still be hijacked, and once that content is tampered with or sniffed, HTTPS loses its meaning.
Browsers were designed with this situation in mind and warn the user. The exact behavior depends on the browser: the lock icon in the address bar may change from green to yellow, the request may be blocked, or a dialog may pop up that seriously hurts the user experience (mainly in IE), leaving users annoyed, confused and worried about security.

Many users habitually click "Yes" on this dialog, which blocks the non-HTTPS resources from loading. Many non-IE browsers also block the most dangerous kinds of non-HTTPS resources (such as JS). We have found that mobile browsers are currently somewhat more lenient.
So if the whole site is not moved to HTTPS, many basic functions of the site will not work properly.
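To find which resources would trigger these warnings before switching, a page's HTML can be scanned for subresources still referenced over plain http://. The Python sketch below is one minimal way to do this with the standard-library `html.parser`. The tag/attribute list in `RESOURCE_ATTRS` is an assumption and is not exhaustive (it ignores CSS `url(...)`, resources injected by scripts, and so on), so treat it as a first pass rather than a complete audit.

```python
from html.parser import HTMLParser

# Tags whose URL attribute pulls in a subresource. This mapping is an
# assumption and not exhaustive; <link> also matches non-loading rel values
# such as "canonical".
RESOURCE_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "embed": "src",
    "source": "src",
    "link": "href",
}


class MixedContentFinder(HTMLParser):
    """Collects plain-http:// subresource URLs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.insecure = []  # list of (tag, url) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return  # e.g. <a href>: navigating away is not mixed content
        for name, value in attrs:
            if name == attr_name and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html: str):
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure
```

Running it on a page that loads one script and one image over HTTP flags exactly those two, while an ordinary outbound `<a>` link and an HTTPS stylesheet are left alone.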

2.2 Differences between sites

Many people new to HTTPS think that all it takes is deploying a certificate and making the web server support HTTPS.
In fact, how HTTPS is deployed, and how hard it is, varies greatly from site to site. For a large site, making the web server support HTTPS and optimizing the web server for HTTPS features may account for only 20%-40% of the total migration work.
We consider HTTPS deployment in the following scenarios.

2.2.1 Simple personal site

Definition of simple: resources are loaded only from the site's primary domain or its subdomains.
For example, AXYZ's personal blog uses the domain axyzblog.com, and its JS and images are loaded from that domain.

To deploy HTTPS on such a site, given a certificate and a web server that supports HTTPS, you only need to serve the primary domain over HTTPS and change resource links to https:// or //.
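For the simple case, the link change can be as mechanical as a text substitution. The sketch below is a hypothetical helper (`upgrade_links` is our name, not an existing tool) that rewrites http:// links pointing at the site's own domain or its subdomains and leaves every other host alone. It assumes the links appear literally in the HTML; editing the templates directly is usually more robust than running a regex over generated pages.

```python
import re


def upgrade_links(html: str, site: str, protocol_relative: bool = False) -> str:
    """Rewrite http:// links that point at `site` or one of its subdomains."""
    # Matches http://site and http://sub.site followed by a URL delimiter,
    # so hosts like notsite.com or site.com.evil.net are not touched.
    pattern = re.compile(
        r"http://((?:[A-Za-z0-9-]+\.)*" + re.escape(site) + r")(?=[/:\"'?#\s]|$)",
        re.IGNORECASE,
    )
    prefix = "//" if protocol_relative else "https://"
    return pattern.sub(lambda m: prefix + m.group(1), html)


if __name__ == "__main__":
    page = '<script src="http://axyzblog.com/a.js"></script>'
    print(upgrade_links(page, "axyzblog.com"))
```

A protocol-relative `//` link follows the scheme of the page, so the resource is still fetched over HTTP when the page itself is opened over HTTP; `https://` forces HTTPS regardless.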

2.2.2 Complex personal site

Definition of complex: some resources must be loaded from external domains.

This is more troublesome. Resources on the primary domain are easy to move to HTTPS, but resources loaded from a CDN require the CDN provider to support HTTPS as well. Major CDN providers are gradually adding HTTPS support, so if you need to migrate, check whether your CDN offers it. Some CDNs charge extra for HTTPS traffic.

Common approaches for using HTTPS with a CDN are:
1. The site owner gives the private key to the CDN, and the CDN fetches from the origin over HTTP.
2. The CDN uses its own shared domain name and certificate, so the resource domain cannot be customized. Fetching from the origin uses HTTP.
3. The CDN provides only dynamic acceleration: it acts as a TCP proxy and does not cache content.
4. CloudFlare offers a Keyless SSL service for sites that need a CDN but do not want to hand over their private key or use a shared domain and certificate.

2.2.3 Simple large-scale site

Definition of simple: resources are loaded only from the site's primary domain, its subdomains, or self-built/controllable CDN domains, with few third-party resources. Deploying HTTPS is relatively easy if the site is naturally like this, or if it is willing to be re-engineered into this form. Google and Twitter are very good examples.
Advantages: once a site is in this form, switching to HTTPS is relatively easy.
Disadvantages: re-engineering a site into this form takes a lot of determination, since it can hardly use any third-party resources.

2.2.4 Complex large site where access speed is somewhat less critical

Definition of complex: a large number of resources must be loaded from the site's non-primary domains or from third-party domains, as on many of today's platform sites or sites that display complex content.
Access speed requirements: users stay a long time or have a strong need for the service, so they are relatively tolerant of access speed. Examples include portals, video sites and online transaction sites (such as train ticket booking and online malls).
For such a site, you can push every related domain to support HTTPS. The following example illustrates how the links on such a site change.

The team responsible for traffic access upgraded the controllable access environment to support both HTTP and HTTPS, so relatively little work was left for front-end projects. In most cases, links only need http:// replaced with //: when the primary page is served over HTTPS, these resources are then automatically loaded over HTTPS as well. What about third-party resources? There are generally only two options: migrate them to your own CDN or IDC, or insist that the third party support HTTPS itself.
Take Facebook, which serves the whole site over HTTPS, as an example. A third-party vendor wants to launch a game on Facebook. Facebook: please provide HTTPS access. The vendor thinks: there is money to be made, so let's provide HTTPS. So if your platform is strong and attractive enough, and your partners are able to provide HTTPS, this approach is entirely feasible. It will not work if your platform connects only a few individual developers who do not make much money from it.
Advantages: the front-end changes are relatively simple, and mixed HTTPS/HTTP resource problems are unlikely.
Disadvantages: with this approach users' access speed usually drops, for example from 2.5 seconds to 3 seconds, but for the reasons above users still find it acceptable. It also places high demands on third parties.

2.2.5 Complex large site with strict access-speed requirements

Definition of complex: same as above.
Access speed requirements: users do not stay long, and their expectations of access speed are high.
If users treat the site as a tool and need it to respond quickly, the approach in 2.2.4 is not a good fit. The following sections introduce the optimization choices we made.

2.3 Choice of domain name

The number of domain names affects access speed in two opposing ways: more domains mean more time spent on DNS resolution and connection setup; fewer domains mean insufficient download concurrency.
Re-establishing a connection costs more under HTTPS than under HTTP. For the simple large sites mentioned above, 1-3 domains may be enough. For a search engine like Baidu, whose results come in many rich display styles, a page may show many kinds of resources. Different types of resources are served by different domains (different products or third-party products), so a single search may need to establish new SSL connections for some resources, and users will notice the lag.
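To make the extra HTTPS cost concrete, the back-of-the-envelope model below counts the time before the first request can be sent on a brand-new connection to a new domain. It assumes TLS 1.2 (a full handshake adds 2 RTTs, an abbreviated handshake with session resumption adds 1 RTT), ignores TCP Fast Open, OCSP and certificate transfer time, and treats DNS resolution as one fixed cost. The numbers are illustrative, not measurements from Baidu.

```python
def connection_setup_ms(rtt_ms: float, dns_ms: float, https: bool,
                        resumed: bool = False) -> float:
    """Estimated time before the first request can be sent on a new connection."""
    tcp_ms = rtt_ms  # TCP three-way handshake: 1 RTT before data can be sent
    if not https:
        tls_ms = 0.0
    elif resumed:
        tls_ms = 1 * rtt_ms  # abbreviated TLS 1.2 handshake (session resumption)
    else:
        tls_ms = 2 * rtt_ms  # full TLS 1.2 handshake
    return dns_ms + tcp_ms + tls_ms


if __name__ == "__main__":
    for label, https, resumed in [("HTTP", False, False),
                                  ("HTTPS, full handshake", True, False),
                                  ("HTTPS, resumed session", True, True)]:
        print(label, connection_setup_ms(50, 20, https, resumed))
```

With a 50ms RTT and a 20ms DNS lookup, a new domain costs about 70ms before the first request under HTTP, about 170ms under HTTPS with a full handshake, and about 120ms with session resumption, which is why every extra domain on the critical path hurts more under HTTPS.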

If the domains are limited to a small set (typically about 2-6), connections to them are kept alive, some data is merged, and SPDY/HTTP/2 is used to guarantee concurrency, our needs can be met. Our current situation, however, is that Baidu Search loads various types of resources from hundreds of domains. So the problem becomes: how to serve hundreds of domains through 2-6 domains. This brings us to the next section, proxy access and CDN.

2.4 Proxy Access

When hundreds of domains are reduced to single digits, unified access, traffic forwarding and scheduling become unavoidable. A site's resources are usually loaded from the primary domain plus a CDN, so we can divide the domains into these two categories and replace them accordingly.

The CDN domains being replaced all point to the same CNAME, so users access the site as follows.

In this way the SSL handshake happens only between the user and these two kinds of nodes, connections are relatively easy to keep alive, and there is no need to apply for a certificate and deploy HTTPS access for each domain.
This approach does raise a series of issues such as domain conversion, data transmission and traffic scheduling. It requires an overall architecture design and optimization of many details, and takes considerable investment from both operations and development.
The ideal approach is for users to perform the HTTPS handshake only with CDN nodes, which greatly reduces the RTT of the handshake (CDN nodes are usually widely distributed and very close to users, while primary-domain nodes are relatively few). This deployment places higher demands on CDN operations and development capabilities.

Notice that this kind of access effectively turns a complex site into a simple one.
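Folding hundreds of resource domains into two entry points requires the domain conversion mentioned above: pages must reference the unified domains, and the access layer must work out which original host to fetch from. The article does not describe Baidu's actual scheme, so the sketch below is purely hypothetical: it keeps the original host as the first path segment, and every host name in it (`www.example.com`, `cdn.example.com`, and the origin hosts) is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical unified entry points: one for primary-domain access, one for CDN.
MAIN_ENTRY = "www.example.com"
CDN_ENTRY = "cdn.example.com"

# Hypothetical classification of original hosts into the two categories.
STATIC_HOSTS = {"img1.example.net", "static.example.org"}
DYNAMIC_HOSTS = {"api.example.net"}


def to_unified(url: str) -> str:
    """Page side: move a resource URL onto a unified HTTPS domain,
    keeping the original host as the first path segment."""
    parts = urlsplit(url)
    host = parts.hostname
    if host in STATIC_HOSTS:
        entry = CDN_ENTRY
    elif host in DYNAMIC_HOSTS:
        entry = MAIN_ENTRY
    else:
        return url  # not migrated; leave untouched
    path = "/" + host + (parts.path or "/")
    return urlunsplit(("https", entry, path, parts.query, parts.fragment))


def to_origin(unified_url: str) -> str:
    """Proxy side: recover the origin URL to fetch (over HTTP in this sketch)."""
    parts = urlsplit(unified_url)
    segments = parts.path.split("/", 2)  # ["", original_host, rest_of_path]
    if len(segments) < 3 or segments[1] not in STATIC_HOSTS | DYNAMIC_HOSTS:
        # Only whitelisted hosts may be fetched, so the proxy is not an open relay.
        raise ValueError(f"unknown or malformed path: {parts.path}")
    host, rest = segments[1], segments[2]
    return urlunsplit(("http", host, "/" + rest, parts.query, parts.fragment))
```

The whitelist check in `to_origin` matters: without it, the unified domain would act as an open proxy for any host an attacker puts in the path.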

2.5 Connection Multiplexing

The connection multiplexing (reuse) rate can be broken down by layer, such as TCP and SSL, and each layer needs to be analyzed and measured separately.

2.5.1 Why connection multiplexing matters

The HTTP/1.1 specification (RFC 2616) states that a single-user client should not maintain more than 2 connections with any server. But as the Web has developed, pages contain more and more elements and transfer more and more content, and a limit of 2 connections per domain is far from enough for today's page-load speed requirements.
No browser follows this rule any more. The number of TCP connections each browser opens to a single domain is as follows:

As the table above shows, browsers open basically 6 connections per domain, so the only way to increase concurrency is to add more domains. Under HTTP this approach works fine. Under HTTPS, however, because TLS connection setup is expensive, adding concurrent connections itself introduces significant latency, so the number of domains must be controlled carefully.
In particular, HTTP/2 is about to be deployed at scale, and its defining feature is multiplexing; spreading a page over multiple domains and multiple connections weakens the benefits of multiplexing and header compression.
So how many domains should a web page have under HTTPS? There is no definitive answer; it depends on how many elements the page needs to load.
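The trade-off can be illustrated with a deliberately simple model. Assume a page loads N equally sized resources spread evenly over D domains, HTTP/1.1 allows about 6 connections per domain with one request in flight per connection, and HTTP/2 uses one multiplexed connection per domain. The sketch counts how many connections (and therefore TLS handshakes) the page needs and how many sequential request "waves" remain; it ignores bandwidth, connection reuse across pages and request priorities.

```python
import math
from typing import Tuple


def http1_profile(num_resources: int, num_domains: int,
                  conns_per_domain: int = 6) -> Tuple[int, int]:
    """Return (connections opened, sequential request waves) under HTTP/1.1."""
    per_domain = math.ceil(num_resources / num_domains)
    # A browser opens no more connections to a domain than it has requests for.
    connections = num_domains * min(conns_per_domain, per_domain)
    waves = math.ceil(per_domain / conns_per_domain)
    return connections, waves


def http2_profile(num_resources: int, num_domains: int) -> Tuple[int, int]:
    """Under HTTP/2 all requests to a domain share one multiplexed connection."""
    return num_domains, (1 if num_resources > 0 else 0)


if __name__ == "__main__":
    for d in (1, 4, 10):
        print("HTTP/1.1", d, "domains:", http1_profile(60, d))
    print("HTTP/2 4 domains:", http2_profile(60, 4))
```

For 60 resources, one domain under HTTP/1.1 needs 6 handshakes but 10 waves, 4 domains need 24 handshakes and 3 waves, and 10 domains need 60 handshakes and a single wave, whereas HTTP/2 with 4 domains needs only 4 handshakes. Adding domains, which helps under HTTP, therefore has a steep price under HTTPS.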

2.5.2 Pre-built connection

Since the protocol itself cannot eliminate the handshake's impact on speed, can we establish connections in advance and reduce the handshake delay users perceive? Yes. The idea is to predict the URL the user will visit next and establish a connection in advance, so that when the user actually issues the request, the TCP and TLS handshakes are already complete and only application-layer data needs to be sent over the connection.
The simplest and most effective way is to pre-build connections under the primary domain by requesting some static resources. Even so, this is not easy, because the browser controls which connections are used and how many are opened. For example, after you request an image from a domain, the browser may have established two connections; a further image request will very likely reuse one of them, but if the domain needs to load 10 images, the browser will probably open new connections.
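A minimal sketch of the warm-up idea, assuming the server already knows which hosts the user is likely to need next (the prediction itself is out of scope). It emits two hints per host: a `<link rel="preconnect">` tag, which is a W3C Resource Hints directive not mentioned in the article and honored only by browsers that support it, and a tiny image request, which mirrors the article's approach of fetching a small static resource. `warmup_tags` and the `/warmup.gif` path are hypothetical names.

```python
from html import escape
from typing import Iterable


def warmup_tags(predicted_hosts: Iterable[str], warmup_path: str = "/warmup.gif") -> str:
    """Return HTML hints that encourage early HTTPS connections to each host."""
    tags = []
    for host in predicted_hosts:
        origin = escape("https://" + host)
        # Ask supporting browsers to do DNS + TCP + TLS setup ahead of time.
        tags.append(f'<link rel="preconnect" href="{origin}">')
        # Fallback: actually fetch a tiny resource so a connection is opened.
        # If the browser already has it cached, no request (and no connection)
        # is made, so this file should not be served with long-lived caching.
        tags.append(f'<img src="{origin}{escape(warmup_path)}" width="1" height="1" alt="">')
    return "\n".join(tags)
```

As the text notes, the browser still decides how many connections to open and which to reuse, so these hints raise the odds of a warm connection rather than guarantee one.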

2.5.3 The influence of SPDY

SPDY is very effective at increasing connection reuse: because it supports concurrent requests over a single connection, the browser tries to keep reusing that connection.

2.5.4 Other

You can also try other tricks to get the browser to establish an HTTPS connection before the user visits your site, so that the session can be reused. HSTS can also effectively cut redirect time, but unfortunately, for complex sites, enabling it raises many issues that need careful consideration.
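The HSTS point can be sketched as the response logic for the primary domain. The function below is hypothetical and framework-neutral: it returns a status code and headers. Plain-HTTP requests get the 302 redirect discussed in 2.6; HTTPS responses carry a `Strict-Transport-Security` header so that, on later visits, the browser upgrades http:// to https:// itself and skips that redirect. The one-year `max-age` is an assumed value.

```python
from typing import Dict, Tuple

HSTS_MAX_AGE = 31536000  # one year, in seconds; an assumed value


def respond(scheme: str, host: str, path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request to the primary domain."""
    if scheme == "http":
        # Plain-HTTP entry point: redirect to HTTPS. Browsers ignore
        # Strict-Transport-Security on HTTP responses, so it is not sent here.
        return 302, {"Location": f"https://{host}{path}"}
    headers = {
        # For max-age seconds after this response, the browser rewrites
        # http:// URLs for this host to https:// internally, skipping the 302.
        "Strict-Transport-Security": f"max-age={HSTS_MAX_AGE}",
    }
    return 200, headers
```

The sketch deliberately omits `includeSubDomains`. That directive would force every subdomain onto HTTPS, which is exactly the kind of issue a complex site has to work through before enabling it.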

2.6 Effect of optimization

From Baidu's experience, without HSTS, when a user types the primary domain directly into the browser and is redirected to HTTPS with a 302, the average added latency is 400ms+, with the 302 redirect and the SSL handshake each accounting for about half. For subsequent requests, however, we achieved almost no perceptible slowdown for the vast majority of users.
There is still plenty of room to optimize this 400ms+, and we will keep improving the user experience.

3 Common problems encountered during HTTPS migration

3.1 Passing the Referrer

We can switch our own site to HTTPS, but sites generally link to outside sites, and getting all of those sites onto HTTPS is unrealistic. Many sites determine where their traffic comes from using the Referer, so for a site such as a search engine, passing the Referer matters a great deal. Without any configuration, you will find that clicking an external HTTP link on an HTTPS page does not put a Referer in the header of the resulting HTTP request (http://tools.ietf.org/html/rfc7231#section-5.5.2). Modern browsers can use a meta tag to control how the Referer is passed (http://w3c.github.io/webappsec/specs/referrer-policy), with two options relevant here:
1. Pass the full URL.
2. Pass only the site, without the path and parameters.
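The meta tag for either option could be generated as below. The `content` keywords are the part most likely to trip people up: early drafts of the Referrer Policy used `always` for the full URL, while the current W3C specification uses `unsafe-url`; `origin` sends only the scheme, host and port in both. `referrer_meta` is a hypothetical helper, and which keywords a given browser understands should be checked against that browser's documentation.

```python
from html import escape

# mode -> (keyword from early Referrer Policy drafts, keyword in the current spec)
REFERRER_KEYWORDS = {
    "full_url": ("always", "unsafe-url"),  # send the complete URL
    "origin_only": ("origin", "origin"),   # send only scheme://host:port/
}


def referrer_meta(mode: str, legacy: bool = False) -> str:
    """Build a <meta name="referrer"> tag for one of the two options."""
    legacy_kw, current_kw = REFERRER_KEYWORDS[mode]
    keyword = legacy_kw if legacy else current_kw
    return f'<meta name="referrer" content="{escape(keyword)}">'
```

Sending the full URL means any query parameters in it reach the destination site over plain HTTP, so the origin-only option leaks less.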

What about browsers that do not support passing the Referer via the meta tag, such as IE8?
You can use an extra redirect. Since an HTTPS page cannot pass its Referer to an HTTP page, the HTTPS page can first send the user to a controlled HTTP site, putting the information to be passed into that HTTP site's URL, and then redirect from there to the destination.
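A sketch of the redirect approach, under these assumptions: `jump.example.com` is a hypothetical HTTP host you control, the information to pass travels in its query string, and the jump page redirects with a script rather than an HTTP 302. The script matters because browsers generally keep the original request's Referer across a 302 (here, none), whereas a script-driven navigation reports the jump page's own URL, query string included, as the Referer.

```python
import json
from urllib.parse import urlencode, urlsplit

JUMP_BASE = "http://jump.example.com/link"  # hypothetical controlled HTTP site


def jump_link(target: str, info: str) -> str:
    """Link placed on the HTTPS page instead of the direct outbound link."""
    return JUMP_BASE + "?" + urlencode({"url": target, "from": info})


def jump_page(target: str) -> str:
    """HTML served by the HTTP jump site; navigates to `target` via script."""
    if urlsplit(target).scheme not in ("http", "https"):
        raise ValueError("refusing to redirect to a non-http(s) URL")
    # JSON-encode for a JS string literal, and escape '<' so a value containing
    # "</script>" cannot close the script element early.
    js_target = json.dumps(target).replace("<", "\\u003c")
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
        f"<script>window.location.replace({js_target});</script>"
        "</body></html>"
    )
```

The destination then sees a Referer like `http://jump.example.com/link?url=...&from=...`, which carries the information the HTTPS page wanted to pass.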

3.2 Form Submission

Sometimes a form must be submitted to a third-party site whose address is HTTP, and the browser will show an insecurity warning. The same redirect-based approach used for the Referer can be applied here.
However, this is not a perfect solution for either the Referer or forms, because it adds insecurity (hijacking, privacy leaks, etc.). Ideally, users will upgrade to browsers that follow the latest specifications, and more sites will move to HTTPS.

3.3 Video Playback

Simply put, if video is played over HTTP, the browser will still warn about insecure content. You have two options: 1. have the video source provide HTTPS; 2. use a non-HTTP protocol, such as RTMP.

3.4 User-side anomalies

During the HTTPS migration, many enthusiastic users reported the problems they ran into.
Some common situations:
1. The user's system clock is set wrong, so the certificate appears expired.
2. The user debugs with a proxy such as Fiddler without installing the tool's root certificate, so the certificate is considered invalid.
3. The user uses a public DNS service or a DNS server on a different network, and some requests are intercepted by the carrier as cross-network traffic.
4. Connectivity problems: we found that HTTPS failure rates on one small carrier were very high, and since we could not contact them, we had to leave its users off HTTPS.
5. Slowness: sometimes, due to the network environment, other sites are also slow for the user, with pings to them of 500-2000ms. In that case HTTPS will naturally be slow too.
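For the first case, the diagnosis comes down to comparing the client's clock with the certificate's validity window. The sketch below assumes you have the certificate's `notBefore`/`notAfter` strings (in the format used by Python's `ssl.SSLSocket.getpeercert()`) and a client-reported time, for example one collected with a user's feedback report; `check_validity` is a hypothetical helper built on the standard-library `ssl.cert_time_to_seconds`.

```python
import ssl
import time
from typing import Optional


def check_validity(not_before: str, not_after: str,
                   client_time: Optional[float] = None) -> str:
    """Classify a client timestamp (seconds since the epoch) against a certificate window."""
    if client_time is None:
        client_time = time.time()
    start = ssl.cert_time_to_seconds(not_before)  # e.g. "Jan 15 09:34:43 2018 GMT"
    end = ssl.cert_time_to_seconds(not_after)
    if client_time < start:
        return "not_yet_valid"  # client clock is likely set in the past
    if client_time > end:
        return "expired"  # certificate really expired, or client clock is ahead
    return "valid"
```

Browsers report both the "not yet valid" and "expired" cases as certificate errors, which is why a wrong system clock shows up as an HTTPS problem.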

4 Concluding remarks

For large, complex sites, deploying HTTPS takes a lot of work.
Despite the difficulties and challenges, we had ample motivation to keep going: since HTTPS went live, user reports of broken functionality caused by hijacking and of privacy leaks have dropped sharply.
Enthusiastic users often report the problems they encounter. In the past, even when we had identified hijacking as the cause, our means of solving it were very limited, and at such times we often felt powerless.
Deploying HTTPS across the whole site gives us a way to solve most of these problems. For a technical person, nothing is better than seeing your work solve users' problems.
HTTPS is not as difficult or frightening as it may seem; it just needs optimization. Let us all keep working at it.
