What happens when ...

Press the "G" key

The sections below explain how the physical keyboard and operating system interrupts work, though they do not cover everything. When you press the "G" key, the browser receives the event and triggers its auto-completion mechanism. Depending on the browser's algorithm and whether you are in private browsing mode, it shows suggestions in a dropdown below the address bar. Most of these algorithms rank suggestions by your search history and bookmarks. You are typing "google.com", so the first suggestions may not match. Still, a lot of code runs in the background as you type, and each key press refines the suggestions. The browser may even suggest "google.com" before you finish typing it.

The Enter key is pressed

To pick a starting point, we choose the moment the Enter key reaches the bottom of its travel. At this point a circuit dedicated to the Enter key is closed, either directly or capacitively, which lets a small current flow into the keyboard's logic circuitry. That circuitry scans the state of every key switch, debounces the electrical noise caused by the rapid intermittent closing of the switch, and converts the result to a keycode. Here the keycode for Enter is 13. The keyboard controller then encodes the keycode for transmission to the computer. Today this almost always happens over Universal Serial Bus (USB) or Bluetooth; historically it was over PS/2 or ADB connections.

USB Keyboard:

The keyboard's USB circuitry is powered by the 5V supply on pin 1 of the computer's USB port, provided by the USB host controller
The generated keycode is stored by the keyboard's internal circuitry in a register called "endpoint"
The host USB controller polls that "endpoint" about every 10ms (the minimum interval is declared by the keyboard) to fetch the stored keycode
The keycode goes to the USB serial interface engine (SIE), which converts it into one or more USB packets that follow the low-level USB protocol
These packets are sent from the keyboard to the computer as a differential electrical signal over the D+ and D- pins (the two middle pins) at a maximum speed of 1.5 Mb/s. The speed is limited because an HID (Human Interface Device) is always declared as a "low-speed device" (USB 2.0 compliance)
The serial signal is decoded by the computer's host USB controller and interpreted by the universal keyboard driver for Human Interface Devices (HID). The key value is then passed to the operating system's hardware abstraction layer

Virtual keyboard (Touch screen device):

On a modern capacitive touch screen, when the user puts a finger on the screen, a tiny current flows from the electrostatic field of the conductive layer through the finger. This completes a circuit and causes a voltage drop at that point on the screen. The screen controller then raises an interrupt that reports the coordinates of the touch
The mobile OS then notifies the currently focused app of a press event on one of its GUI elements, which in this case is a virtual keyboard button
The virtual keyboard raises a software interrupt to send a "key pressed" message back to the OS
This interrupt notifies the currently focused app of a "key pressed" event

Interrupt generation [non-USB keyboard]

The keyboard sends a signal on its interrupt request line (IRQ), which the interrupt controller maps to an interrupt vector (an integer). The CPU uses the Interrupt Descriptor Table (IDT) to map interrupt vectors to functions, called interrupt handlers, which are supplied by the operating system kernel. When an interrupt arrives, the CPU indexes the IDT with the interrupt vector and runs the corresponding handler. Thus, the kernel is entered.
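The vector-to-handler lookup can be pictured as a table indexed by an integer. The following is only a toy model in Python, not kernel code: the vector number `0x21`, the scancode value, and the names `keyboard_handler` and `dispatch` are assumptions made for illustration.

```python
# Toy model of interrupt dispatch: the IDT is a table that maps an
# interrupt vector (an integer) to a handler supplied by the kernel.
# The vector number and handler below are illustrative assumptions.

KEYBOARD_VECTOR = 0x21  # hypothetical vector assigned to the keyboard IRQ


def keyboard_handler(scancode: int) -> str:
    """Hypothetical kernel handler for a key press."""
    return f"kernel handled scancode 0x{scancode:02X}"


# The "IDT": vector -> handler function.
idt = {KEYBOARD_VECTOR: keyboard_handler}


def dispatch(vector: int, data: int) -> str:
    """Index the table with the vector and run the matching handler."""
    handler = idt.get(vector)
    if handler is None:
        raise LookupError(f"no handler installed for vector {vector:#x}")
    return handler(data)


print(dispatch(KEYBOARD_VECTOR, 0x1C))
```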

(Windows) A WM_KEYDOWN message is sent to the application

The HID transport passes the key-down event to the KBDHID.sys driver, which converts the HID signal into a scan code. Here the scan code for Enter is VK_RETURN (0x0D). The KBDHID.sys driver interacts with KBDCLASS.sys (the keyboard class driver), which is responsible for handling all keyboard and keypad input securely. It then calls into Win32K.sys, possibly after passing the message through any third-party keyboard filters that are installed. All of this happens in kernel mode.

Win32K.sys finds out which window is currently active through the GetForegroundWindow() API function. This function provides the window handle of the browser's address bar. The Windows "message pump" then calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). Here lParam is a bitmask that carries more information about the key press: the repeat count (0 in this case), the actual scan code (which can be OEM dependent, but generally is not for VK_RETURN), whether extended keys (ALT, SHIFT, CTRL) were also pressed (they were not), and some other state.

The Windows SendMessage API adds the message to the queue for the given window handle hWnd. Later, the main message processing function assigned to hWnd, called WindowProc, is invoked to process each message in the queue.

The currently active window hWnd is actually an edit control, and its WindowProc has a handler for WM_KEYDOWN messages. This code looks at the third parameter passed to SendMessage, wParam, and because it is VK_RETURN, it knows that the user pressed the ENTER key.
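To make the lParam bitmask more concrete, here is a small Python sketch that unpacks the fields of a WM_KEYDOWN lParam. It assumes the documented layout (bits 0-15 repeat count, bits 16-23 scan code, bit 24 extended-key flag, bit 30 previous key state). The sample value, including the hardware scan code 0x1C for Enter, is a hypothetical input chosen for illustration, and `decode_keydown_lparam` is not a Windows API.

```python
# Decode the lParam bit fields of a WM_KEYDOWN message.
# Layout: bits 0-15 repeat count, bits 16-23 scan code,
# bit 24 extended-key flag, bit 30 previous key state.

VK_RETURN = 0x0D  # the wParam value the text describes for ENTER


def decode_keydown_lparam(lparam: int) -> dict:
    """Split lParam into the fields described above."""
    return {
        "repeat_count": lparam & 0xFFFF,
        "scan_code": (lparam >> 16) & 0xFF,
        "extended": bool((lparam >> 24) & 1),
        "was_down": bool((lparam >> 30) & 1),
    }


# Hypothetical lParam: repeat count 0 (as in the text), scan code 0x1C.
example = decode_keydown_lparam(0x1C << 16)
print(example)
```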

(Mac OS X) A KeyDown NSEvent is sent to the application

The interrupt signal triggers an interrupt event in the I/O Kit kext keyboard driver. The driver translates the signal into a key code, which is passed to the OS X WindowServer process. The WindowServer then dispatches an event to the appropriate (active or listening) applications through their Mach ports, where it is placed in an event queue. Threads with sufficient privileges can read events from this queue by calling the mach_ipc_dispatch function. This most commonly happens through, and is handled by, the NSApplication main event loop, via an NSEvent of NSEventType KeyDown.

(GNU/Linux) The Xorg server listens for keycodes

When a graphical X server is used, the X server remaps keycodes to scancodes using its own keymaps and rules. Once the mapping is complete, the X server sends the character to the window manager (DWM, metacity, i3, etc.), which in turn sends it to the focused window. The focused window then uses its graphics API to draw the matching glyph in the input field.

Parsing URLs

From the URL, the browser learns the following:
- Protocol "http"
  
  Use the HTTP protocol
- Resource "/"
  
  The requested resource is the home page (index)

Are you entering a URL or a search keyword?

When no protocol or valid hostname is given, the browser passes the text entered in the address bar to its default search engine. In many cases a special string is appended to the URL to tell the search engine that the query came from this particular browser's address bar.

Check the HSTS list ...

The browser checks its "preloaded HSTS (HTTP Strict Transport Security)" list, which contains websites that have asked to be contacted over HTTPS only
If the site is on the list, the browser sends its request over HTTPS instead of HTTP. Otherwise, the initial request is sent over HTTP
Note that a website can still use an HSTS policy without being on the HSTS list. The first HTTP request the browser makes to the site receives a response asking the browser to use HTTPS only from then on. However, that single HTTP request can leave the user exposed to a downgrade attack, which is why modern browsers ship with the preloaded HSTS list

Convert non-ASCII Unicode characters

The browser checks the hostname for characters that are not in a-z, A-Z, 0-9, -, or .
Here the hostname is google.com, so there are none. If there were, the browser would apply Punycode encoding to the hostname part of the URL
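As a rough illustration of the Punycode step, the sketch below uses Python's built-in "idna" codec. That codec implements an older IDNA revision than current browsers use, so treat it as an approximation. The helper name `to_ascii_hostname` and the hostname "bücher.example" are made up for this example.

```python
# Convert an internationalized hostname to its ASCII (Punycode) form
# using Python's built-in "idna" codec.

def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII-compatible encoding of a hostname."""
    return hostname.encode("idna").decode("ascii")


# An ASCII-only hostname passes through unchanged.
print(to_ascii_hostname("google.com"))
# A label with non-ASCII characters is encoded with an "xn--" prefix.
print(to_ascii_hostname("bücher.example"))
```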

DNS queries ...

The browser checks whether the domain is in its cache
If it is not, the browser calls the gethostbyname library function (which varies by operating system) to do the lookup
Before trying DNS, gethostbyname checks whether the hostname can be resolved from the local hosts file, whose location varies by operating system
If gethostbyname has no cached record and cannot find the name in the hosts file, it sends a query to the DNS server configured in the network stack. This is typically the local router or the ISP's caching DNS server
Querying the local DNS server
If the DNS server is on the same subnet as our host, the network library follows the ARP process below for the DNS server
If the DNS server is on a different subnet, the network library follows the ARP process below for the default gateway
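The lookup order described above (cache, then hosts file, then DNS server) can be sketched as follows. This is a simplified model and not how any particular resolver library is implemented. The functions `parse_hosts` and `resolve`, the fake DNS table, and the address 192.0.2.10 (taken from a documentation-only range) are all assumptions for illustration.

```python
from typing import Callable, Dict, Optional


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse hosts-file text into a {hostname: ip} mapping."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)
    return table


def resolve(name: str, cache: Dict[str, str], hosts_text: str,
            dns_query: Callable[[str], Optional[str]]) -> Optional[str]:
    """Look up name in the cache, then the hosts table, then DNS."""
    name = name.lower()
    if name in cache:
        return cache[name]
    hosts = parse_hosts(hosts_text)
    if name in hosts:
        return hosts[name]
    ip = dns_query(name)  # stand-in for a real query to the DNS server
    if ip is not None:
        cache[name] = ip
    return ip


# Usage with a fake DNS function, so the sketch runs without a network.
fake_dns = {"google.com": "192.0.2.10"}.get
cache: Dict[str, str] = {}
hosts = "127.0.0.1 localhost  # loopback\n"
print(resolve("localhost", cache, hosts, fake_dns))
print(resolve("google.com", cache, hosts, fake_dns))
```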

ARP

To send an ARP (Address Resolution Protocol) broadcast, the network stack needs the target IP address to look up. It also needs to know the MAC address of the interface it will use to send the ARP broadcast.

The ARP cache is checked first. If the target IP has an entry in the cache, the result is returned: Target IP = MAC.

If the entry is not in the ARP cache:

The routing table is checked to see whether the target IP address is on any subnet in the local routing table. If it is, the interface attached to that subnet is used; if not, the interface attached to the default gateway's subnet is used
The MAC address of the selected network interface is looked up
A Layer 2 (data link layer) ARP request is sent:

ARP Request:

Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (broadcast)
Target IP: target.ip.goes.here
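For readers who want to see the request as bytes, here is a sketch that packs an Ethernet frame carrying an ARP request with Python's `struct` module. The MAC and IP addresses are made-up placeholders, and `build_arp_request` is a hypothetical helper. The target MAC field is filled with the broadcast address to mirror the listing above; many network stacks zero-fill that field and put the broadcast address only in the Ethernet header.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Return an Ethernet frame carrying an ARP request."""
    broadcast = mac_to_bytes("ff:ff:ff:ff:ff:ff")
    src = mac_to_bytes(sender_mac)
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    ethernet = struct.pack("!6s6sH", broadcast, src, 0x0806)
    # ARP payload: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, operation 1 (request).
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        src, socket.inet_aton(sender_ip),
        broadcast, socket.inet_aton(target_ip),
    )
    return ethernet + arp


# Placeholder addresses, chosen only for the example.
frame = build_arp_request("02:00:00:00:00:01", "192.168.1.10", "192.168.1.1")
print(len(frame), frame.hex())
```

The frame is only built here, not sent. Sending it would require a raw socket and elevated privileges.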

Depending on the hardware between the computer and the router, there are the following cases:

Directly connected:

If the computer is connected directly to the router, the router responds with an ARP Reply (see below).

Hub:

If the computer is connected to a hub, the hub broadcasts the ARP request out of all its other ports. If the router is connected to the same "wire", it responds with an ARP Reply (see below).

Switch:

If the computer is connected to a switch, the switch checks its local CAM/MAC table to see which port has the MAC address we are looking for. If the switch has no entry for the MAC address, it rebroadcasts the ARP request to all other ports.
If the switch has an entry in its MAC/CAM table, it sends the ARP request to the port that has the MAC address we are looking for
If the router is on the same "wire", it responds with an ARP Reply (see below)

ARP Reply:

Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here

Now that we have the MAC address of either the DNS server or the default gateway, we can continue with the DNS request:

The DNS client sends a UDP request packet to port 53 on the DNS server. If the response is too large, TCP is used instead
If the local/ISP DNS server does not have the answer, it issues a recursive query that works its way up the hierarchy of DNS servers until the SOA (start of authority) is reached. If an answer is found, it is returned
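To show what the UDP payload of such a query looks like, the sketch below builds a minimal DNS question for an A record. The function name `build_dns_query` and the query ID `0x1234` are arbitrary choices for illustration. The packet is only constructed, not sent.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("google.com")
print(len(query), query.hex())
```

A real client would hand these bytes to a UDP socket addressed to port 53 of the configured DNS server and then parse the reply.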

Using sockets

Once the browser has the IP address of the destination server and the port number from the URL (the default is 80 for HTTP and 443 for HTTPS), it calls the system library function socket and requests a TCP stream socket. The corresponding parameters are AF_INET and SOCK_STREAM.

This request is first passed to the transport layer, where it is encapsulated as a TCP segment. The destination port is added to the header, and the source port is chosen from the kernel's dynamic port range (ip_local_port_range on Linux)
The TCP segment is sent to the network layer, which adds an IP header containing the IP addresses of the destination server and of the local machine, forming a packet
The packet then reaches the link layer, which adds a frame header containing the MAC address of the local network card and the MAC address of the gateway (local router). As before, if the kernel does not know the gateway's MAC address, it must broadcast an ARP query to find it
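The socket call and the kernel's choice of source port can be observed with a few lines of Python. In this sketch a throwaway listener on the loopback address stands in for the destination server so that it runs without network access. That listener and the variable names are assumptions of the example, not part of what a browser does.

```python
import socket

# A throwaway listener on the loopback interface stands in for the
# destination server so the example runs without network access.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
server.listen(1)
server_port = server.getsockname()[1]

# The call the text describes: request a TCP stream socket.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", server_port))

# The kernel chose the source port from its dynamic port range.
source_ip, source_port = client.getsockname()
print(f"source port {source_port} -> destination port {server_port}")

client.close()
server.close()
```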

The packet is now ready to be transmitted over one of the following:

Ethernet
WiFi
Cellular data network

For most home and small business networks, the packet passes from the local computer, possibly through a local network, to a modem (modulator/demodulator). The modem converts the digital signal into an analog signal suitable for transmission over telephone lines, cable TV lines, or wireless telephony connections. At the other end of the line, another modem converts the analog signal back into digital data for the next network node, which examines the source and destination addresses further.

Larger businesses and some newer residences usually have fiber or direct Ethernet connections. In that case the data stays digital and is passed directly to the next network node for processing.

Eventually the packet reaches the router that manages the local subnet. From there it travels on to the border routers of the autonomous system (AS), through other autonomous systems, and finally to the destination server. Each router along the way extracts the destination address from the IP header and routes the packet to the appropriate next hop. The time to live (TTL) field in the IP header is decremented by 1 at each router. The packet is dropped if its TTL reaches 0, or if the router has no room left in its queue, for example because of network congestion.

The above send and receive processes occur many times during a TCP connection:

The client chooses an initial sequence number (ISN) and sends a packet to the server with the SYN bit set, indicating that it wants to open a connection and is setting its ISN
When the server receives the SYN packet, and if it can accept the connection:
- The server chooses its own initial sequence number
- The server sets the SYN bit to indicate that it has chosen its ISN
- The server copies (client ISN + 1) to its ACK field and sets the ACK bit to indicate that it received the client's first packet
The client confirms the connection by sending a packet in which it:
- Increases its own sequence number by 1
- Sets the acknowledgment number to the server's ISN + 1
- Sets the ACK bit
Data is transferred as follows:
- When one side sends N bytes of data, it increases its SEQ number by N
- When the other side confirms receipt of that packet (or a series of packets), it sends an ACK packet whose ACK value reflects the last sequence number received from the sender
To close the connection:
- The side that wants to close sends a FIN packet
- The other side acknowledges the FIN packet and sends its own FIN
- The closing side acknowledges the other side's FIN with an ACK
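The sequence and acknowledgment arithmetic of the handshake can be traced with a small model. This is a bookkeeping sketch only: the dictionaries stand in for TCP segments, and the ISN values 1000 and 5000 are arbitrary (real stacks pick them unpredictably).

```python
from typing import Dict, List


def three_way_handshake(client_isn: int, server_isn: int) -> List[Dict]:
    """Return the three segments that open a TCP connection."""
    syn = {"from": "client", "flags": {"SYN"}, "seq": client_isn, "ack": None}
    syn_ack = {
        "from": "server",
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": client_isn + 1,  # acknowledges the client's SYN
    }
    ack = {
        "from": "client",
        "flags": {"ACK"},
        "seq": client_isn + 1,  # the SYN consumed one sequence number
        "ack": server_isn + 1,  # acknowledges the server's SYN
    }
    return [syn, syn_ack, ack]


def seq_after_sending(seq: int, payload: bytes) -> int:
    """Sending N bytes advances the sequence number by N (mod 2**32)."""
    return (seq + len(payload)) % 2**32


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
print(seq_after_sending(1001, b"GET / HTTP/1.1\r\n"))
```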

UDP Packets

TLS handshake

The client sends a ClientHello message to the server with its TLS version, the list of cipher algorithms it supports, and the available compression methods
The server replies with a ServerHello message that contains the server's TLS version, the cipher and compression method it selected, and the server's public certificate, which contains a public key. The client uses this public key to encrypt the rest of the handshake until a symmetric key has been agreed upon
The client verifies the server's certificate against its own list of trusted CAs. If the certificate can be trusted, the client generates a string of pseudo-random bytes and encrypts it with the server's public key. These random bytes are used to derive the symmetric key
The server decrypts the random bytes with its private key and uses them to generate its own copy of the symmetric master key
The client sends a Finished message to the server, encrypting a hash of the communication so far with the symmetric key
The server generates its own hash, then decrypts the hash sent by the client and checks that the two match. If they do, it sends its own Finished message to the client, also encrypted with the negotiated symmetric key
From now on, the TLS session carries the application layer (HTTP) content encrypted with the agreed symmetric key
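In application code the handshake is normally delegated to a TLS library. The sketch below uses Python's standard `ssl` module to show the client-side setup that corresponds to checking the certificate against trusted CAs. The function `fetch_tls_info` is a hypothetical helper that needs network access, so it is defined but not called here.

```python
import socket
import ssl

# The default context loads the system's trusted CA certificates and
# turns on certificate and hostname verification.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)


def fetch_tls_info(host: str, port: int = 443) -> tuple:
    """Perform a TLS handshake and return (protocol version, cipher name).

    Needs network access; it is defined here but not called.
    """
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling `fetch_tls_info("google.com")` on a machine with network access would report whichever protocol version and cipher the two sides negotiated.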

TCP Packets

HTTP protocol ...

If the browser was written by Google, then instead of sending an HTTP request to fetch the page, it sends a request that tries to negotiate with the server an "upgrade" from HTTP to the SPDY protocol.

If the browser uses the HTTP protocol, it sends such a request to the server:

GET / HTTP/1.1
Host: google.com
[other headers]

"Other headers" stands for a series of colon-separated key-value pairs, formatted according to the HTTP specification and separated by single newlines. This assumes that the browser has no bugs that violate the HTTP specification and that it is using HTTP/1.1. Otherwise the request might not include the Host header, and the version in the GET request would be HTTP/1.0 or HTTP/0.9.

HTTP/1.1 defines the "close" connection option, which the sender uses to signal that the connection will be closed after the response completes:

Connection: close

HTTP/1.1 applications that do not support persistent connections must include the "close" option in every message.

After sending the request and headers, the browser sends a single blank line to tell the server that the request is complete.

The server returns a response code that indicates the status of the request, in a response of the form:

200 OK
[response headers]

This is followed by a single blank line and then the payload, which is the HTML content of www.google.com. The server may then close the connection or, if the client's headers asked for it, keep the connection open so it can be reused for further requests.
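The request and response framing described above can be reproduced with plain string handling. This is a minimal sketch rather than a full HTTP client: `build_get_request` and `parse_response_head` are hypothetical helpers, and the response is a hand-written sample instead of real output from Google's servers.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes):
    """Split a response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_get_request("google.com")
print(request.decode("ascii"))

# A hand-written response, used here instead of a live server.
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_response_head(sample))
```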

If the HTTP headers sent by the browser contain enough information (for example, an ETag header) for the server to determine that the version cached by the browser has not changed since it was last fetched, the server may instead respond with:

304 Not Modified
[response headers]

This response has no payload, and the browser retrieves the content from its own cache instead.

After parsing the HTML, the browser and server repeat this process for every resource (images, CSS, favicon.ico, and so on) referenced by the HTML page, except that the request line changes from GET / HTTP/1.1 to GET /$(URL relative to www.google.com) HTTP/1.1.

If the HTML references a resource on a domain other than www.google.com, the browser goes back to the domain resolution steps and follows all the steps up to this point for that domain. The Host header in the request is set to that domain name instead.

HTTP Server Request Processing

HTTPD (HTTP Daemon) is the software that handles requests and responses on the server side. The most common HTTPD servers are Apache and nginx on Linux, and IIS on Windows.

The HTTPD receives the request
The server breaks the request down into the following parameters:
- HTTP request method (GET, POST, HEAD, PUT or DELETE). For the Google request, the method is GET
- Domain name: google.com
- Requested path/page: / (no specific page under google.com was requested, so / is the default path)
The server verifies that a virtual host for google.com is configured on it
The server verifies that google.com accepts GET requests
The server verifies that the client is allowed to use this method (based on IP address, authentication, etc.)
If the server has a URL rewrite module installed (for example, mod_rewrite for Apache or URL Rewrite for IIS), it tries to match the request against the configured rules. If a rule matches, the server rewrites the request according to that rule
The server retrieves the content that corresponds to the request. In this case the path is "/", so it falls back to the index file (this can be overridden, but it is the most common behavior)
The server parses the file with the designated handler. If Google were running on PHP, the server would use PHP to interpret the index file and return PHP's output to the requester
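The checks listed above can be modeled in a few lines. The sketch below is a toy, not how Apache, nginx, or IIS are implemented: the `VIRTUAL_HOSTS` table, the `handle_request` function, the index file name, and the choice of 404 and 405 as the failure codes are all assumptions made for illustration.

```python
# Hypothetical virtual-host table: hostname -> allowed methods and index file.
VIRTUAL_HOSTS = {
    "google.com": {"methods": {"GET", "HEAD"}, "index": "index.html"},
}


def handle_request(raw: bytes):
    """Return (status code, resource) for a raw HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    vhost = VIRTUAL_HOSTS.get(headers.get("host", ""))
    if vhost is None:
        return 404, None              # no virtual host configured
    if method not in vhost["methods"]:
        return 405, None              # method not accepted for this host
    if path == "/":
        path = "/" + vhost["index"]   # default path maps to the index file
    return 200, path


print(handle_request(b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n"))
```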

The story behind the browser

After the server has provided the resources (HTML,CSS,JS, pictures, etc.), the browser performs the following actions:

Parsing: HTML, CSS, JS
Rendering: build the DOM tree -> build the render tree -> lay out the render tree -> paint the render tree

Browser

The function of the browser is to retrieve the resources you want from the server and then display them in the browser window. Resources are usually HTML files, or they can be PDFs, pictures, or other types of content. The location of the resource is determined by the user-supplied URI (Uniform Resource Identifier).

How the browser interprets and displays HTML files is specified in the HTML and CSS specifications. These specifications are maintained by the W3C (World Wide Web Consortium), the standards organization for the web.

The user interfaces of different browsers have a lot in common. The common UI elements include:

An address bar
Back and Forward Buttons
Bookmark options
Refresh and Stop buttons
Home button

Browser high-level architecture

The components that make up the browser are:

User interface: includes the address bar, back and forward buttons, bookmarks menu, and so on. Everything you see except the window that displays the requested page is part of the user interface
Browser engine: coordinates the work of the UI and the rendering engine
Rendering engine: responsible for displaying the requested content. If the requested content is HTML, the rendering engine parses the HTML and CSS and displays the parsed content on the screen
Networking: handles network calls such as HTTP requests through a platform-independent interface, with a separate implementation underneath for each platform
UI backend: used to draw basic widgets such as drop-down list boxes and windows. It exposes a generic, platform-independent interface and uses the operating system's user interface methods underneath
JavaScript engine: parses and executes JavaScript code
Data storage: the persistence layer. The browser may need to store many kinds of data locally, such as cookies. Browsers also support storage mechanisms such as localStorage, IndexedDB, WebSQL and FileSystem

HTML parsing

The rendering engine receives the contents of the requested document from the network layer, usually in 8 kB chunks.

The main task of the HTML parser is to parse the HTML markup into a parse tree.

The parse tree is a tree whose nodes are DOM elements and attributes. DOM is short for Document Object Model. It is the object representation of the HTML document and the interface that HTML elements expose to the outside world, such as JavaScript. The root of the tree is the "Document" object. The DOM has an almost one-to-one relationship with the markup.

Parsing algorithm

HTML cannot be parsed with regular top-down or bottom-up parsers. The main reasons are:

The forgiving nature of the language
HTML in the wild is often invalid, and browsers have traditionally been error-tolerant in order to support well-known cases of malformed markup
The parsing process is reentrant. In other languages the source does not change during parsing, but in HTML, dynamic code such as a script element containing a document.write() call can add tokens, so the parsing process actually modifies its input

Because regular parsing techniques cannot be used, browsers use a custom parser for HTML. The parsing algorithm is described in detail in the HTML5 specification and consists of two stages: tokenization and tree construction.
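The two stages can be imitated on a small scale with Python's standard `html.parser` module, which emits tag and text events (a rough analogue of tokenization) that a handler can assemble into a tree. This is not the HTML5 algorithm and it is far less forgiving than a browser. The `TreeBuilder` class, its short list of void tags, and the sample markup with a stray `</div>` are assumptions of the sketch.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested dict tree from the parser's tag and text events."""

    VOID = {"br", "img", "meta", "link", "input", "hr"}  # tags without an end tag

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "children": []}
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Tolerate stray end tags: close up to the nearest matching open element.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append({"text": data.strip()})


builder = TreeBuilder()
builder.feed("<html><body><p>Hello<br>world</p></div></body></html>")
builder.close()
tree = builder.root
print(tree)
```

The stray `</div>` is ignored instead of raising an error, which mirrors in miniature the error tolerance described above.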

After parsing is finished

The browser starts loading the page's external resources (CSS, images, JavaScript files, etc.).

At this point the browser marks the document as "interactive" and starts parsing scripts in "deferred" mode, that is, scripts that should run after the document has been parsed. The document state is then set to "complete" and the browser fires the "load" event.

Note that there is never a "syntax error" when parsing an HTML page: the browser fixes any invalid content and continues parsing.

The browser then executes the synchronous JavaScript code.

CSS parsing

CSS files and the contents of <style> tags are parsed according to the CSS lexical and syntax grammar
Each CSS file is parsed into a StyleSheet object that contains the CSS rules, with their selectors and other objects corresponding to the CSS grammar
The CSS parser may be top-down, or bottom-up when it is generated by a parser generator

Page rendering

Create a "Frame tree" or "Render tree" by traversing the DOM nodes and calculating the CSS style values for each node
Calculate the preferred width of each node in the "Frame tree" bottom-up by summing the preferred widths of the child nodes and the node's horizontal padding, borders, and margins
Calculate the actual width of each node top-down by allocating each node's available width to its children
Calculate the height of each node bottom-up by applying text wrapping and summing the heights of the child nodes and the node's padding, borders, and margins
Calculate the coordinates of each node using the results above
More complicated steps are needed when elements are floated, positioned absolutely or relatively, or use other complex features. See http://dev.w3.org/csswg/css2/ and http://www.w3.org/Style/CSS/current-work for details
Create layers to describe which parts of the page can be drawn as a group without being re-rasterized. Each frame object is assigned to a layer
Textures are allocated for each layer of the page
The frame objects of each layer are traversed and drawing commands are executed for their layer. The layer may be rasterized by the CPU or drawn directly on the GPU using D2D/SkiaGL
All of the steps above may reuse values calculated the last time the page was rendered, so that incremental changes require less work
The final position of each layer is computed and a set of composite commands is issued through Direct3D/OpenGL. The GPU command buffers are flushed to the GPU for asynchronous rendering, and the finished frame is sent to the window server
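The bottom-up width pass from the second step can be sketched as a recursive sum. This is a deliberate simplification that follows the description above: children are treated as if they sit side by side, so their widths add up. Real layout engines distinguish block, inline, and many other cases. The `Box` class and the pixel values are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    """A node in a toy frame tree; all sizes are in pixels."""
    content_width: float = 0.0   # intrinsic width of a leaf (e.g. text)
    padding: float = 0.0         # applied on both left and right
    border: float = 0.0
    margin: float = 0.0
    children: List["Box"] = field(default_factory=list)


def preferred_width(box: Box) -> float:
    """Bottom-up pass: child widths plus this node's horizontal
    padding, border, and margin."""
    inner = box.content_width + sum(preferred_width(c) for c in box.children)
    return inner + 2 * (box.padding + box.border + box.margin)


leaf_a = Box(content_width=100, padding=5)
leaf_b = Box(content_width=50, margin=10)
parent = Box(padding=10, border=1, children=[leaf_a, leaf_b])
print(preferred_width(parent))
```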

GPU rendering

During rendering, the graphics layers can use the general-purpose CPU or the graphics processor (GPU)
When the GPU is used for rendering, the graphics software layers split the work into many pieces so that they can exploit the GPU's massive parallelism for the floating-point calculations that rendering requires

Window Server

Post-rendering and user-initiated processing

After rendering is complete, the browser runs JavaScript code as the result of a timing mechanism (such as a Google Doodle animation) or of user interaction (such as typing a query into the search box and receiving suggestions). Plugins such as Flash or Java may also run, although not at this point on the Google home page. Scripts can trigger further network requests and can change the page's content or layout, causing another round of rendering and painting.

Reprinted from: https://github.com/skyline75489/what-happens-when-zh_CN

Original: https://github.com/alex/what-happens-when
