WebKit page cache II - The unload event
Posted by Brady Eidson on Monday, September 21st, 2009
Previously I touched on what exactly the page cache does and outlined some of the improvements we're working on.
This post is geared towards web developers and is therefore even more technical than the last.
In this article I'd like to talk more about unload event handlers, why they prevent pages from going into the page cache, and what can be done to make things better.
Load/unload event handlers
Web developers can make use of the load and unload events to do work at certain points in the lifetime of a Web page.
The purpose of the load event is quite straightforward: to perform initial setup of a new page once it has loaded.
The unload event is comparatively mysterious. Whenever the user leaves a page it is "unloaded" and scripts can do some final cleanup.
The mysterious part is that "leaving the page" can mean one of a few things:
- The user closes the browser tab or window, resulting in the destruction of the visible page.
- The browser navigates from the old page to a new page, resulting in the destruction of the old visible page.
The page cache makes this even more interesting by adding a new navigation possibility:
- The browser navigates from the old page to a new page, but the old visible page is suspended, hidden, and placed in the page cache.
The status quo
Unload event handlers are meant to do some final cleanup when the visible page is about to be destroyed. But if the page goes into the page cache, it becomes suspended, is hidden, and is not immediately torn down. This brings up interesting complications.
If we fire the unload event when going into the page cache, then the handler might be destructive and render the page useless when the user returns.
If we fire the unload event every time a page is left, including each time it goes into the page cache and when it is eventually destroyed, then the handler might repeat important work that it was critical to do only once.
If we don't fire the unload event when going into the page cache, then we face the possibility that the page will be destroyed while it is suspended and hidden, and the unload handler might never be run.
If we don't fire the unload event when going into the page cache but consider firing it whenever the suspended page is eventually destroyed, then we're considering the possibility of doing something that's never been done before: executing scripts that belong to an invisible web page that has had its "Pause" button pressed.
There are all sorts of obstacles to making this work well, including technical hurdles, security concerns, and user-experience considerations.
Since there is no clear solution for handling such pages, the major browser vendors have all come to the same conclusion: Don't cache these pages.
How you can help
Web developers have a few things they can do to help their pages be cacheable.
One is to only install the unload event handler if the code is relevant to the current browser. For example, we've seen unload handlers similar to the following:
function unloadHandler()
{
    if (_scriptSettings.browser.isIE) {
        // Run some unload code for Internet Explorer
        ...
    }
}
In all browsers other than Internet Explorer this code does nothing, but its mere existence potentially slows down their user experience. This developer should've done the browser check *before* installing the unload handler.
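A minimal sketch of doing the check first might look like the following. The `installUnloadHandlerIfNeeded` helper and its `target` and `isIE` parameters are hypothetical names chosen for illustration; in a real page `target` would be `window`, and `isIE` would come from whatever detection the site already performs.

```javascript
// Hypothetical helper (not part of any library): register the unload
// handler only when the current browser actually needs it.
function installUnloadHandlerIfNeeded(target, isIE, handler) {
    if (!isIE) {
        // Nothing is registered, so this code gives the browser no
        // unload handler to worry about.
        return false;
    }
    target.addEventListener("unload", handler, false);
    return true;
}

// In a page this might be called as:
//   installUnloadHandlerIfNeeded(window, _scriptSettings.browser.isIE, unloadHandler);
```

The point of the design is that the decision happens once, at install time, rather than inside a handler that every browser has to account for.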
Another way developers can improve things is to only install the unload event handler when the page has a need to listen for it, then remove it once that reason has passed.
For example, the user might be working on a draft of a document, so the developer installs an unload handler to make sure the draft gets saved before the page is left. But they also start a timer to automatically save it every minute or so. If the timer fires, the document draft is saved, and the user doesn't make any further changes, the unload handler should be removed.
Particularly savvy developers might consider a third option.
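One way to sketch the draft scenario is a small tracker that keeps the unload handler installed only while there are unsaved changes. The names `createDraftGuard`, `markDirty`, `markSaved`, and `saveDraft` are assumptions made for this illustration, and `target` stands in for `window`.

```javascript
// Hypothetical draft tracker: the unload handler exists only while
// the draft has unsaved changes.
function createDraftGuard(target, saveDraft) {
    let installed = false;

    function onUnload() {
        saveDraft();
    }

    return {
        // Call whenever the user edits the draft.
        markDirty() {
            if (!installed) {
                target.addEventListener("unload", onUnload, false);
                installed = true;
            }
        },
        // Call after any successful save, e.g. from the autosave timer.
        markSaved() {
            if (installed) {
                target.removeEventListener("unload", onUnload, false);
                installed = false;
            }
        },
        // Reports whether the unload handler is currently registered.
        isInstalled() {
            return installed;
        }
    };
}
```

In a page, the autosave timer would call `markSaved()` after each successful save, and the editor's change handler would call `markDirty()`.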
A replacement for unload
Some time ago Mozilla approached this problem differently by inventing a replacement for load/unload events.
The load and unload events are meant to be fired exactly once, and this is the underlying cause of the problem. The pageshow/pagehide events, which we've implemented in WebKit as of revision 47824, address this.
Despite their name, the pageshow/pagehide events don't have anything to do with whether or not the page is actually visible on the screen. They won't fire when you minimize the window or switch tabs, for example.
What they do is augment load/unload to work in more situations involving navigation. Consider this example of how load/unload event handlers might be used:
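A minimal sketch of such a page's script, written as an illustration rather than a reproduction of the linked example. The `installLoadUnloadHandlers` function and its `log` callback are hypothetical; in a page, `target` would be `window` and `log` might simply call `alert`.

```javascript
// Illustrative only: report when the load and unload events fire.
function installLoadUnloadHandlers(target, log) {
    function pageLoaded() {
        log("load event handler called.");
    }

    function pageUnloaded() {
        log("unload event handler called.");
    }

    target.addEventListener("load", pageLoaded, false);
    target.addEventListener("unload", pageUnloaded, false);
}

// In a page this might be called as:
//   installLoadUnloadHandlers(window, function (msg) { alert(msg); });
```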
Click here to view this example in a new window, in case you can't guess what it does.
Try clicking the link to leave the page then press the back button. Pretty straightforward.
The pageshow/pagehide events fire when load/unload do, but they also have one more trick up their sleeve.
Instead of firing only at the single discrete moment when a page is "loaded", the pageshow event is also fired when pages are restored from the page cache.
Similarly, the pagehide event fires when the unload event fires but also when a page is suspended into the page cache.
By including an additional property on the event called "persisted", the events tell the page whether they represent the load/unload events or saving/restoring from the page cache.
Here's the same example using pageshow/pagehide:
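A sketch of what the pageshow/pagehide version might contain, again as an illustration rather than the linked example itself. The function name and the `log` callback are hypothetical; the event names and the `persisted` property are the ones described above.

```javascript
// Illustrative only: use event.persisted to tell page cache
// transitions apart from a real load or unload.
function installPageShowHideHandlers(target, log) {
    function pageShown(evt) {
        if (evt.persisted) {
            log("pageshow: the page was restored from the page cache.");
        } else {
            log("pageshow: initial load, equivalent to the load event.");
        }
    }

    function pageHidden(evt) {
        if (evt.persisted) {
            log("pagehide: the page was suspended into the page cache.");
        } else {
            log("pagehide: page destruction, equivalent to the unload event.");
        }
    }

    target.addEventListener("pageshow", pageShown, false);
    target.addEventListener("pagehide", pageHidden, false);
}

// In a page this might be called as:
//   installPageShowHideHandlers(window, function (msg) { alert(msg); });
```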
Click here to view this example in a new window, but make sure you're using a recent WebKit nightly.
Remember to try clicking the link to leave the page then press the back button.
Pretty cool, right?
What these new events accomplish
The pagehide event is important for two reasons:
- It enables Web developers to distinguish between a page being suspended and one that is being destroyed.
- When used instead of the unload event, it enables browsers to use their page cache.
It's also straightforward to change existing code to use pagehide instead of unload. Here is an example of testing for the onpageshow attribute to choose pageshow/pagehide when supported, falling back to load/unload when they're not:
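A minimal sketch of that feature test, assuming the presence of an `onpageshow` attribute on the target is a reliable sign of support. The `installLifecycleHandlers` name and its parameters are hypothetical; `target` would be `window` in a page.

```javascript
// Hypothetical installer: prefer pageshow/pagehide when the target
// exposes an onpageshow attribute, otherwise fall back to load/unload.
function installLifecycleHandlers(target, onShow, onHide) {
    const supported = "onpageshow" in target;
    const showType = supported ? "pageshow" : "load";
    const hideType = supported ? "pagehide" : "unload";

    target.addEventListener(showType, onShow, false);
    target.addEventListener(hideType, onHide, false);

    // Returned so callers can see which pair of events was chosen.
    return { showType: showType, hideType: hideType };
}

// In a page the handlers might check evt.persisted, which is only
// set on pageshow/pagehide events:
//   installLifecycleHandlers(window, function (evt) {
//       if (evt.persisted) { return; } // restored from the page cache: skip one-time setup
//       /* one-time setup, as in the old load handler */
//   }, function (evt) {
//       if (evt.persisted) { return; } // suspended, not destroyed: skip final teardown
//       /* final cleanup, as in the old unload handler */
//   });
```

Because plain load/unload events carry no `persisted` property, the same handlers behave as ordinary load/unload handlers in browsers that take the fallback path.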
Piece of cake!
How you can help: Revisited
To reiterate, we've now identified three great ways web developers can help the page cache work better:
- Only install the event handler if the code is relevant to the current browser.
- Only install the event handler once your page actually needs it.
- If supported by the browser, use pagehide instead.
Web developers who willfully ignore any or all of these options are primarily accomplishing one thing:
Forcing their users into "slow navigation mode."
I say this both as a browser engineer and a browser user: That stinks!
The plot thickens
But now that we've covered what savvy and polite web developers can do to help in the future, we need to further scrutinize the current state of the web.
Browsers treat the unload handler as sacred because it is designed to do "important work." Unfortunately, many popular sites have unload event handlers that decidedly do not "do important work." I commonly see handlers that:
- Always update some cookies for tracking, even though they've already been updated.
- Always send an XHR update of draft data to a server, even though it's already been sent.
- Do nothing that could possibly persist to any future browsing session.
- Are empty. They literally do nothing.
Since these misbehaved pages are very common and will render improvements to WebKit's page cache ineffective, a few of us started to ask the question:
What *would* actually happen if we simply started admitting these pages to the page cache without running the unload event handler first?
What would break?
Can we detect any patterns to determine whether an unload event handler is "important" or not?
Our experiment
You never know for sure until you try.
Starting in revision 48388 we've allowed pages with unload handlers into the page cache. If a user closes the window while the page is visible, the unload event will fire as usual. But the unload event will not be fired as normal when the user navigates away from the page. If the user closes the window while the page is suspended and in the page cache, the unload event handler will never be run.
What this means for users is that their navigation experience could be noticeably smoother and quicker in the common case. What this means for developers is that we're consciously deciding not to run some of their code, and their web application might break.
For users and developers alike: please leave your feedback, observations, or suggestions in the bug tracking this experiment.
And remember, this is just an experiment. No one is planning to ship this drastic change in behavior in a production product. But the page cache is such an important part of browser performance that we're willing to push the envelope a little to improve it a lot.
We want to learn what breaks. We want to know if we can heuristically determine if an unload handler is truly critical or not. We want to know if we can detect certain patterns in some types of unload handlers and treat them differently. And, perhaps most importantly, we want to evangelize.
At least one popular JavaScript library has already adopted some of the advice we've given to help improve the landscape on the web. If just a few more developers for popular sites or libraries take notice of this experiment and change their code, then the Web will be a much friendlier place for all of us.