Press the "G" key
The following sections describe how the physical keyboard and operating system interrupts work, though not every detail is covered. When you press the "g" key, the browser receives the event and triggers its auto-completion mechanism. Depending on the browser's algorithm and whether you are in private browsing mode, it shows suggestions in a drop-down below the address bar. Most of these algorithms rank suggestions by search history and bookmarks. You intend to type "google.com", so the first suggestions may not match. Still, a lot of code runs in the background as you type, and each keypress refines the suggestions. The browser may even suggest "google.com" before you finish typing it.
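As a rough illustration of ranking suggestions by history and bookmarks, here is a minimal sketch. The function name `suggest`, the scoring rule, and the `BOOKMARK_BONUS` value are assumptions made for this example; real browsers use far more elaborate ranking.

```python
from typing import Dict, List, Set

BOOKMARK_BONUS = 10  # hypothetical weight given to bookmarked sites


def suggest(typed: str, history: Dict[str, int], bookmarks: Set[str],
            limit: int = 5) -> List[str]:
    """Return sites that start with the typed text, best score first.

    `history` maps a site to its visit count. The score is the visit
    count plus a fixed bonus for bookmarked sites.
    """
    prefix = typed.lower()
    candidates = set(history) | bookmarks
    scored = []
    for site in candidates:
        if site.lower().startswith(prefix):
            score = history.get(site, 0)
            if site in bookmarks:
                score += BOOKMARK_BONUS
            scored.append((score, site))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [site for _, site in scored[:limit]]


history = {"github.com": 40, "google.com": 25, "gmail.com": 3, "example.org": 9}
bookmarks = {"gmail.com"}
after_g = suggest("g", history, bookmarks)
after_go = suggest("go", history, bookmarks)
```

With this made-up history, typing "g" ranks another site first, and typing "go" narrows the list down to google.com, which mirrors how each keypress refines the suggestions.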
The Enter key bottoms out
To pick a zero point, we start at the moment the Enter key on the keyboard reaches the bottom of its travel. At this point, an electrical circuit dedicated to the Enter key is closed, either directly or capacitively, which lets a small current flow into the keyboard's logic circuitry. That circuitry scans the state of every key switch, debounces the electrical noise caused by the switch's intermittent closure, and converts the result to a keycode. Here the keycode for Enter is the integer 13. The keyboard controller then encodes the keycode for transmission to the computer. Today this is almost always done over a Universal Serial Bus (USB) or Bluetooth connection; historically it was done over PS/2 or ADB connections.
USB Keyboard:
- The keyboard's USB circuitry is powered by the 5V supply that the computer's USB host controller provides over pin 1 of the USB port
- The keycode is stored in a register inside the keyboard's circuitry called the "endpoint"
- The host USB controller polls that "endpoint" about every 10ms to fetch the stored keycode; this minimum interval is declared by the keyboard
- The keycode goes to the USB serial interface engine (SIE), which converts it into one or more USB packets that follow the low-level USB protocol
- These packets are sent as a differential electrical signal over the D+ and D- pins (the middle two pins) at a maximum speed of 1.5 Mb/s, because an HID (Human Interface Device) is always declared to be a "low-speed device" (USB 2.0 compliance)
- This serial signal is decoded by the computer's host USB controller and interpreted by the universal HID keyboard driver. The keycode is then passed to the operating system's hardware abstraction layer
Virtual keyboard (touchscreen devices):
- On a modern capacitive touchscreen, when the user places a finger on the screen, a tiny amount of current flows from the electrostatic field of the conductive layer to the finger. This completes a circuit and causes a voltage drop at that point on the screen. The screen controller then raises an interrupt that reports the coordinates of the touch
- The mobile OS then notifies the currently focused application of a press event on one of its GUI elements, which in this case is a virtual keyboard button
- The virtual keyboard raises a software interrupt to send a "key pressed" message back to the OS
- That message in turn notifies the currently focused application of a "key pressed" event
Interrupt generation [non-USB keyboard]
The keyboard sends a signal on its interrupt request line (IRQ), which the interrupt controller maps to an interrupt vector (an integer). The CPU uses the Interrupt Descriptor Table (IDT) to map interrupt vectors to functions called interrupt handlers, which are supplied by the operating system kernel. When an interrupt arrives, the CPU indexes the IDT with the interrupt vector and runs the corresponding handler. At that point, execution has entered the kernel.
(Windows) A WM_KEYDOWN message is sent to the application
The HID transport passes the key-down event to the KBDHID.sys driver, which converts the HID usage into a scan code. In this case, the scan code is VK_RETURN (0x0D). The KBDHID.sys driver interfaces with KBDCLASS.sys (the keyboard class driver), which is responsible for handling all keyboard and keypad input in a secure manner. It then calls into Win32K.sys, possibly after passing the message through any installed third-party keyboard filters. All of this happens in kernel mode.
Win32K.sys works out which window is currently active through the GetForegroundWindow() API. This API provides the window handle of the browser's address bar. The main Windows "message pump" then calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). lParam is a bitmask that carries further information about the keypress: the repeat count (0 in this case), the actual scan code (which can be OEM dependent, but generally would not be for VK_RETURN), whether extended keys (such as Alt, Shift, or Ctrl) were also pressed (they were not), and some other state.
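To make the lParam bitmask more concrete, the sketch below splits a value into the bit fields that Microsoft documents for WM_KEYDOWN. The function name `decode_keydown_lparam` and the sample value are illustrative assumptions; the sample was built by hand and was not captured from a real system.

```python
def decode_keydown_lparam(lparam: int) -> dict:
    """Split a WM_KEYDOWN lParam into its documented bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,               # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,            # bits 16-23
        "extended_key": bool((lparam >> 24) & 1),      # bit 24
        "context_code": bool((lparam >> 29) & 1),      # bit 29
        "previous_state": bool((lparam >> 30) & 1),    # bit 30: key was already down
        "transition_state": bool((lparam >> 31) & 1),  # bit 31
    }


# Illustrative value only: scan code 0x1C (Enter on a typical PC keyboard)
# in bits 16-23 and a repeat count of 1 in the low 16 bits.
sample_lparam = (0x1C << 16) | 1
decoded = decode_keydown_lparam(sample_lparam)
```

The virtual-key code itself (VK_RETURN) travels separately in wParam, so the handler only needs lParam when it cares about repeats, the hardware scan code, or key state.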
The Windows SendMessage API adds the message to a queue for the particular window handle (hWnd). Later, the main message-processing function assigned to that hWnd, called a WindowProc, is invoked to process each message in the queue.
The active window (hWnd) is actually an edit control, and in this case its WindowProc has a message handler for WM_KEYDOWN messages. This code looks at the third parameter passed to SendMessage (wParam) and, because it is VK_RETURN, knows that the user pressed the ENTER key.
(Mac OS X) A KeyDown NSEvent is sent to the application
The interrupt signal triggers an interrupt event in the I/O Kit kext keyboard driver, which translates the signal into a keycode and passes it to the OS X WindowServer process. The WindowServer then dispatches an event to the appropriate (active or listening) applications through their Mach ports, where it is placed in an event queue. Threads with sufficient privileges can read events from this queue by calling the mach_ipc_dispatch function. This most commonly happens through, and is handled by, an NSApplication main event loop, via an NSEvent of NSEventType KeyDown.
(GNU/Linux) The Xorg server listens for keycodes
When a graphical X server is in use, the X server remaps keycodes to scancodes according to its own keymaps and rules. Once the mapping is complete, the X server sends the character to the window manager (DWM, metacity, i3, etc.), which in turn sends it to the focused window. The graphical API of the window that receives the character draws the appropriate font symbol in the focused field.
Parsing URLs
Are you entering a URL or a search keyword?
When no protocol or valid hostname is given, the browser passes the text typed in the address bar to its default search engine. In most cases, the URL it builds carries a special piece of text that tells the search engine the query came from that particular browser's address bar.
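One way to picture this decision is the heuristic below. It is only a sketch: the function name `classify_address_bar_input` and the hostname rule are assumptions for illustration, and real browsers apply many more checks (known TLDs, local hostnames, IP addresses, and so on).

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule: dot-separated labels of letters, digits, and hyphens,
# ending in an alphabetic label of at least two characters.
HOSTNAME_RE = re.compile(r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")


def classify_address_bar_input(text: str) -> str:
    """Return "url" if the text looks like a URL, otherwise "search"."""
    text = text.strip()
    parts = urlsplit(text)
    # An explicit http(s) scheme with a host is treated as a URL.
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    # Without a scheme, look at whatever precedes the first slash.
    candidate_host = text.split("/", 1)[0]
    if " " not in text and HOSTNAME_RE.match(candidate_host):
        return "url"
    return "search"


results = {
    text: classify_address_bar_input(text)
    for text in ("google.com", "https://google.com/", "what happens when", "cats")
}
```

Anything classified as "search" would be handed to the default search engine instead of being resolved as a hostname.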
Check the HSTS list
- The browser checks its "preloaded HSTS (HTTP Strict Transport Security)" list, a list of websites that have asked to be contacted over HTTPS only
- If the website is on the list, the browser sends its request over HTTPS instead of HTTP. Otherwise, the initial request is sent over HTTP
- Note that a website can still use an HSTS policy without being on the list. The first HTTP request the user makes to the website receives a response asking that the browser send only HTTPS requests from then on. However, this single HTTP request could leave the user vulnerable to a downgrade attack, which is why modern browsers ship with the preloaded HSTS list.
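The two ways a site ends up on HTTPS can be sketched as follows. The preloaded set, the hostnames, and the function names are hypothetical; only the `Strict-Transport-Security` header name and its `max-age` and `includeSubDomains` directives come from the HSTS standard.

```python
from typing import Set

# Hypothetical preload list; real lists ship inside the browser.
PRELOADED_HSTS = {"secure-site.example"}


def parse_hsts_header(value: str) -> dict:
    """Parse a Strict-Transport-Security header value into a small dict."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in value.split(";"):
        name, _, argument = directive.strip().partition("=")
        name = name.strip().lower()
        if name == "max-age":
            policy["max_age"] = int(argument.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


def initial_scheme(hostname: str, preloaded: Set[str], learned: Set[str]) -> str:
    """Pick the scheme for the first request to a hostname."""
    if hostname in preloaded or hostname in learned:
        return "https"
    return "http"


learned_hosts: Set[str] = set()
first_visit = initial_scheme("other-site.example", PRELOADED_HSTS, learned_hosts)

# The site's response carries an HSTS policy, so the browser remembers it.
policy = parse_hsts_header("max-age=31536000; includeSubDomains")
if policy["max_age"]:
    learned_hosts.add("other-site.example")
second_visit = initial_scheme("other-site.example", PRELOADED_HSTS, learned_hosts)
```

The first visit to a site that is not preloaded still goes out over HTTP, which is exactly the window a downgrade attack can exploit.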
Convert non-ASCII Unicode characters
- The browser checks the hostname for any character that is not in a-z, A-Z, 0-9, -, or .
- Here the hostname is google.com, so there are none. If there were, the browser would apply Punycode encoding to the hostname portion of the URL
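For a feel of what Punycode encoding produces, the sketch below uses Python's built-in `idna` codec. Note that this codec implements the older IDNA 2003 rules, so it is an approximation of what a current browser does; the helper name `to_ascii_hostname` and the sample hostname are chosen for this example.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Encode each non-ASCII label of a hostname as Punycode ("xn--...")."""
    return hostname.encode("idna").decode("ascii")


plain = to_ascii_hostname("google.com")          # already ASCII, unchanged
encoded = to_ascii_hostname("bücher.example")    # contains a non-ASCII "ü"
```

Here `plain` stays "google.com", while `encoded` becomes "xn--bcher-kva.example", the form that is actually sent in DNS queries.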
DNS lookup
- The browser checks whether the domain is in its cache
- If it is not found, the browser calls the gethostbyname library function (which varies by operating system) to do the lookup
- Before trying to resolve the hostname through DNS, gethostbyname checks whether it can be resolved from the local hosts file (whose location varies by operating system)
- If gethostbyname has no cached record and cannot find the hostname in the hosts file, it sends a request to the DNS server configured in the network stack. This is typically the local router or the ISP's caching DNS server
- Querying the local DNS server
- If the DNS server is on the same subnet as our host, the network library follows the ARP process below for the DNS server
- If the DNS server is on a different subnet, the network library follows the ARP process below for the default gateway IP
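The same-subnet check can be sketched with Python's `ipaddress` module. The function name `arp_target` and all the addresses are made up for this example.

```python
import ipaddress


def arp_target(interface_cidr: str, dns_server: str, default_gateway: str) -> str:
    """Return the IP address whose MAC address must be resolved via ARP.

    `interface_cidr` is the local interface address with its prefix length,
    for example "192.168.1.20/24".
    """
    local_network = ipaddress.ip_interface(interface_cidr).network
    if ipaddress.ip_address(dns_server) in local_network:
        return dns_server        # same subnet: talk to the DNS server directly
    return default_gateway       # different subnet: go through the gateway


same_subnet = arp_target("192.168.1.20/24", "192.168.1.1", "192.168.1.254")
other_subnet = arp_target("192.168.1.20/24", "8.8.8.8", "192.168.1.254")
```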
ARP process
To send an ARP (Address Resolution Protocol) broadcast, the network library needs the target IP address to look up, and it needs the MAC address of the interface it will use to send the broadcast.
- The ARP cache is checked first for an entry for the target IP. If there is one, the library function returns the result: Target IP = MAC
If the entry is not in the ARP cache:
- The routing table is consulted to see whether the target IP address is on any of the subnets in the local routing table. If it is, the library uses the interface associated with that subnet. If it is not, the library uses the interface connected to the default gateway
- The MAC address of the selected network interface is looked up
- The network library sends a Layer 2 (data link layer of the OSI model) ARP request:
ARP Request:
Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here
Depending on the type of hardware between the computer and the router:
Directly connected:
- If the computer is directly connected to the router, the router responds with an ARP Reply (see below).
Hub:
- If the computer is connected to a hub, the hub broadcasts the ARP request out of all its other ports. If the router is connected on the same "wire", it responds with an ARP Reply (see below).
Switch:
- If the computer is connected to a switch, the switch checks its local CAM/MAC table to see which port has the MAC address we are looking for. If the switch has no entry for the MAC address, it rebroadcasts the ARP request to all other ports
- If the switch has an entry in its MAC/CAM table, it sends the ARP request to the port that has the MAC address we are looking for
- If the router is on the same "wire", it responds with an ARP Reply (see below)
ARP Reply:
Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here
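For readers who want to see the request on the wire, here is a sketch that packs an ARP request into an Ethernet frame with `struct`. The function name and the addresses are invented for the example. One detail differs slightly from the summary above: the broadcast address goes in the Ethernet destination field, while the target hardware address inside the ARP payload is conventionally zero-filled because it is not yet known.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request for target_ip."""
    broadcast_mac = b"\xff" * 6
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    ethernet_header = broadcast_mac + sender_mac + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: 1 = request, 2 = reply
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC: unknown, zero-filled
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_payload


# Made-up addresses for illustration.
frame = build_arp_request(bytes.fromhex("0242ac110002"), "192.168.1.20", "192.168.1.1")
```

Actually sending such a frame requires a raw socket and elevated privileges, which is why the operating system does this on the application's behalf.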
Now that the network library has the MAC address of either the DNS server or the default gateway, it can resume the DNS process:
- The DNS client sends a UDP request to port 53 on the DNS server. If the response is too large, TCP is used instead
- If the local or ISP DNS server does not have the answer, it requests a recursive search, which works its way up the hierarchy of DNS servers until the SOA (start of authority) is reached. If an answer is found, it is returned
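The UDP request itself is a small binary message. The sketch below builds a query for an A record in the standard DNS wire format; the function name and the fixed query ID are assumptions for the example, and nothing is sent over the network.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("google.com")
```

A client could hand these bytes to a UDP socket with `sendto(query, (dns_server_ip, 53))`, where `dns_server_ip` is whatever resolver the network stack is configured with.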
Using sockets
Once the browser has the IP address of the destination server, it takes that address and the port number from the URL (the HTTP protocol defaults to port 80, and HTTPS to port 443) and calls the system library function socket to request a TCP socket stream, with the parameters AF_INET and SOCK_STREAM.
- The request is first passed to the transport layer, where a TCP segment is crafted. The destination port is added to the header, and a source port is chosen from the kernel's dynamic port range (ip_local_port_range on Linux)
- The segment is sent to the network layer, which wraps it in an additional IP header. The IP addresses of the destination server and of the current machine are inserted to form a packet
- The packet next arrives at the link layer, where a frame header is added that includes the MAC address of the machine's NIC and the MAC address of the gateway (local router). As before, if the kernel does not know the MAC address of the gateway, it must broadcast an ARP query to find it
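The socket call and the kernel-chosen source port can be observed with a few lines of Python. To keep the sketch self-contained it connects to a throwaway listener on the loopback interface rather than to a real web server; the helper name `open_tcp_connection` is an assumption for the example.

```python
import socket


def open_tcp_connection(host: str, port: int) -> socket.socket:
    """Request a TCP stream socket (AF_INET, SOCK_STREAM) and connect it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    return sock


# Stand-in for the web server: a listener on a port picked by the kernel.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_port = listener.getsockname()[1]

client = open_tcp_connection("127.0.0.1", server_port)
source_ip, source_port = client.getsockname()   # source port chosen by the kernel
dest_ip, dest_port = client.getpeername()       # destination port we asked for

client.close()
listener.close()
```

The application only names the destination; the source port, the IP header, and the link-layer frame are all filled in by the layers below.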
At this point the packet is ready to be transmitted through one of the following:
- Ethernet
- WiFi
- Cellular data network
For most home and small business Internet connections, the packet passes from the computer, possibly through a local network, to a modem (MOdulator/DEModulator) that converts the digital signal into an analog one suitable for transmission over telephone, cable, or wireless telephony connections. At the other end of the connection, another modem converts the analog signal back into digital data to be processed by the next network node, where the source and destination addresses are examined further.
Most larger businesses and some newer residential connections use fiber or direct Ethernet. In that case the data stays digital and is passed directly to the next network node for processing.
Eventually, the packet reaches the router that manages the local subnet. From there, it continues to the border routers of the autonomous system (AS), through other ASes, and finally to the destination server. Each router along the way extracts the destination address from the IP header and routes the packet to the appropriate next hop. The time to live (TTL) field in the IP header is decremented by one at each router. The packet is dropped if its TTL reaches zero, or if the current router has no space in its queue (for example, because of network congestion).
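The TTL rule is simple enough to simulate. This toy sketch ignores queues and real routing tables; the function name and router names are made up.

```python
from typing import List, Tuple


def route_packet(ttl: int, path: List[str]) -> Tuple[bool, str]:
    """Walk a packet along path, where the last entry is the destination.

    Every router before the destination decrements the TTL and drops the
    packet if the TTL reaches zero. Returns (delivered, last_node_reached).
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl <= 0:
            return False, router
    return True, path[-1]


path = ["home-router", "isp-router", "border-router", "server"]
dropped = route_packet(3, path)     # TTL runs out at the third router
delivered = route_packet(64, path)  # a typical initial TTL is plenty
```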
The send and receive steps above happen many times over the course of a TCP connection.
UDP Packets
TLS handshake
- The client sends a ClientHello message to the server with its TLS version and the list of cipher algorithms and compression methods it supports
- The server replies with a ServerHello message that contains the TLS version, the selected cipher and compression methods, and the server's public certificate. The certificate contains a public key that the client uses to encrypt the rest of the handshake until a symmetric key has been agreed on
- The client verifies the server's certificate against its list of trusted CAs. If trust can be established, the client generates a string of pseudo-random bytes and encrypts it with the server's public key. These random bytes are used to determine the symmetric key
- The server decrypts the random bytes with its private key and uses them to generate its own copy of the symmetric master key
- The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key
- The server generates its own hash, then decrypts the hash sent by the client and checks that the two match. If they do, it sends its own Finished message to the client, also encrypted with the symmetric key
- From now on, the TLS session transmits the application (HTTP) data encrypted with the agreed symmetric key
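The step where both sides turn the shared random bytes into the same master key can be sketched as below. This assumes the TLS 1.2 style of key derivation (the pseudo-random function described in RFC 5246, with SHA-256) and deliberately skips the public-key encryption, certificate checks, and everything else in the handshake. The function names are chosen for this example.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand secret and seed into `length` bytes using chained HMAC-SHA256."""
    output = b""
    a = seed  # A(0)
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()             # A(i)
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def derive_master_secret(pre_master_secret: bytes, client_random: bytes,
                         server_random: bytes) -> bytes:
    """Derive the 48-byte master secret from the values both sides know."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master_secret, seed, 48)


# Random values exchanged in the hello messages, plus the client's secret bytes
# (which would reach the server encrypted with its public key).
client_random = os.urandom(32)
server_random = os.urandom(32)
pre_master_secret = os.urandom(48)

client_key = derive_master_secret(pre_master_secret, client_random, server_random)
server_key = derive_master_secret(pre_master_secret, client_random, server_random)
```

Because both sides feed identical inputs into the same function, they arrive at the same key without that key ever crossing the network.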
TCP Packets
HTTP protocol
If the browser was written by Google, then instead of sending an HTTP request to retrieve the page, it sends a request that tries to negotiate the use of the SPDY protocol with the server.
If the browser uses the HTTP protocol, it sends a request of this form to the server:
GET / HTTP/1.1
Host: google.com
[other headers]
Here "other headers" refers to a series of colon-separated key-value pairs formatted as the HTTP specification requires and separated by single newlines. This assumes that the browser has no bugs that violate the HTTP specification, and that it is using HTTP/1.1. Otherwise it may not include the Host header in the request, and the version given in the GET request will be HTTP/1.0 or HTTP/0.9.
HTTP/1.1 defines the "close" connection option, which the sender uses to signal that the connection will be closed after the response completes:
Connection: close
HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
After sending the request and headers, the browser sends a single blank line to the server to indicate that the content of the request is done.
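Put together, the bytes of such a request can be produced with a short helper. The function name `build_get_request` is an assumption for this sketch. On the wire, HTTP/1.1 ends each line with CRLF, and the terminating blank line is simply an extra CRLF.

```python
from typing import Dict, Optional


def build_get_request(host: str, path: str = "/",
                      extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; one more CRLF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("google.com", extra_headers={"Connection": "close"})
```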
The server responds with a response code that denotes the status of the request, in a response of the form:
200 OK
[response headers]
This is followed by a single blank line and then the payload, which is the HTML content of www.google.com. The server may then close the connection or, if the headers sent by the client requested it, keep the connection open so it can be reused for further requests.
If the HTTP headers sent by the browser contained enough information for the server to determine that the version of the file cached by the browser has not changed since the last retrieval (for example, an ETag header), the server may instead respond with:
304 Not Modified
[response headers]
This response has no payload, and the browser retrieves the content from its own cache instead.
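The server-side decision between 200 and 304 can be sketched as below. In practice the browser sends back the ETag it saved in an `If-None-Match` request header; the function name, the tag value, and the body here are invented for illustration.

```python
from typing import Dict, Tuple


def respond(request_headers: Dict[str, str], current_etag: str,
            body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a possibly conditional GET."""
    if request_headers.get("If-None-Match") == current_etag:
        # The cached copy is still current: no payload is sent.
        return 304, {"ETag": current_etag}, b""
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body


page = b"<html>...</html>"
first = respond({}, '"v1"', page)                          # nothing cached yet
repeat = respond({"If-None-Match": '"v1"'}, '"v1"', page)  # cache is fresh
stale = respond({"If-None-Match": '"v0"'}, '"v1"', page)   # cache is outdated
```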
After parsing the HTML, the browser and server repeat this process for every resource (image, CSS, favicon.ico, and so on) referenced by the HTML page, except that instead of GET / HTTP/1.1 the request is GET /$(URL relative to www.google.com) HTTP/1.1.
If the HTML references a resource on a domain other than www.google.com, the browser goes back to the domain resolution step and follows all the steps up to this point for that domain. The Host header in the request is set to that other domain name.
HTTP Server Request Processing
The HTTPD (HTTP Daemon) server handles the requests and responses on the server side. The most common HTTPD servers are Apache and nginx on Linux, and IIS on Windows.
The HTTPD receives the request.
The server breaks the request down into the following parameters:
- HTTP request method (GET, POST, HEAD, PUT, or DELETE). When visiting Google as in this case, the method is GET
- Domain: google.com
- Requested path/page: / (no specific page under google.com was requested, so / is the default path)
The server verifies that a virtual host for google.com is configured on it.
The server verifies that google.com accepts GET requests.
The server verifies that the client is allowed to use the GET method (based on IP address, authentication, etc.).
If the server has a URL rewrite module installed (such as mod_rewrite for Apache or URL Rewrite for IIS), it tries to match the request against the configured rules. If a matching rule is found, the server rewrites the request according to that rule.
The server fetches the content that corresponds to the request. In this case the path is "/", so it falls back to the index file (this can be overridden, but it is the most common behavior).
The server parses the file according to the configured handler. If Google is running on PHP, the server uses PHP to interpret the index file and returns PHP's output to the client.
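The first few of these steps (splitting the request into parameters, then checking the virtual host and method) might look like the sketch below. The `VIRTUAL_HOSTS` table and both function names are hypothetical and do not reflect how Apache, nginx, or IIS are actually configured.

```python
from typing import Dict, Set

# Hypothetical configuration: virtual host name -> allowed methods.
VIRTUAL_HOSTS: Dict[str, Set[str]] = {"google.com": {"GET", "HEAD", "POST"}}


def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP request into method, domain, path, and version."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "domain": headers.get("host"),
            "path": path, "version": version}


def check_request(params: dict) -> int:
    """Return the status code the early validation steps would produce."""
    allowed_methods = VIRTUAL_HOSTS.get(params["domain"])
    if allowed_methods is None:
        return 404  # no virtual host configured for this domain
    if params["method"] not in allowed_methods:
        return 405  # method not allowed for this host
    return 200


params = parse_request(b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n")
status = check_request(params)
```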
The story behind the browser
After the server has supplied the resources (HTML, CSS, JS, images, etc.), the browser goes through the following process:
- Parsing: HTML, CSS, JS
- Rendering: construct the DOM tree → render tree → layout of the render tree → painting the render tree
Browser
The function of the browser is to retrieve the resources you want from the server and then display them in the browser window. Resources are usually HTML files, or they can be PDFs, pictures, or other types of content. The location of the resource is determined by the user-supplied URI (Uniform Resource Identifier).
The way the browser interprets and displays HTML files is specified in the HTML and CSS specifications. These specifications are maintained by the W3C (World Wide Web Consortium), the standards organization for the web.
Browser user interfaces have a lot in common with each other. Common user interface elements include:
- An address bar
- Back and Forward Buttons
- Bookmark options
- Refresh and Stop buttons
- Home button
Browser high-level architecture
The components that make up the browser are:
- User interface: includes the address bar, back and forward buttons, bookmarks menu, and so on. It is every part of the browser display except the window where you see the requested page
- Browser engine: coordinates actions between the UI and the rendering engine
- Rendering engine: responsible for displaying the requested content. If the requested content is HTML, the rendering engine parses the HTML and CSS and displays the parsed content on the screen
- Networking: handles network calls such as HTTP requests, using different implementations for different platforms behind a platform-independent interface
- UI backend: draws basic widgets such as combo boxes and windows. It exposes a generic, platform-agnostic interface and uses the operating system's user interface methods underneath
- JavaScript engine: parses and executes JavaScript code
- Data storage: a persistence layer. The browser may need to save many kinds of data locally, such as cookies. Browsers also support storage mechanisms such as localStorage, IndexedDB, WebSQL, and FileSystem
HTML parsing
The rendering engine starts by getting the contents of the requested document from the networking layer, usually in 8kB chunks.
The main task of the HTML parser is to parse the HTML markup into a parse tree.
A parse tree is a tree of DOM element and attribute nodes. DOM is short for Document Object Model. It is the object representation of the HTML document and the interface of HTML elements to the outside world, such as JavaScript. The root of the tree is the "Document" object. The DOM has an almost one-to-one relationship with the HTML document.
The parsing algorithm
HTML cannot be parsed with regular top-down or bottom-up parsers. The main reasons are:
- The forgiving nature of the language
- HTML documents are often malformed, so browsers have traditional error tolerance to support well-known cases of invalid HTML
- The parsing process is reentrant. For other languages, the source does not change during parsing, but in HTML, dynamic code (such as script elements containing document.write() calls) can add content to the source, so the parsing process actually modifies the input
Unable to use regular parsing techniques, browsers use a custom parser for HTML. The parsing algorithm is described in detail in the HTML5 specification. It consists of two stages: tokenization and tree construction.
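The two stages can be imitated in miniature with Python's `html.parser`, which acts as the tokenizer here, plus a small stack that builds the tree. This is a simplified sketch and not the HTML5 algorithm: the `TreeBuilder` class, its recovery rules, and the short list of void elements are assumptions for the example.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested dict tree from start/end tag tokens, tolerating errors."""

    VOID_ELEMENTS = {"br", "img", "meta", "link", "input", "hr"}

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.open_elements = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.open_elements[-1]["children"].append(node)
        if tag not in self.VOID_ELEMENTS:
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, implicitly closing anything
        # opened inside it. Stray end tags with no match are ignored.
        for index in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[index]["tag"] == tag:
                del self.open_elements[index:]
                break

    def handle_data(self, data):
        if data.strip():
            self.open_elements[-1]["children"].append({"text": data})


def parse_html(markup: str) -> dict:
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


# Malformed input: <b> is never closed and </i> was never opened.
tree = parse_html("<p>Hello <b>world</p></i>")
paragraph = tree["children"][0]
```

Even with the unclosed `<b>` and the stray `</i>`, no error is raised: the builder still produces a `p` node containing the text and a `b` child.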
After parsing is finished
The browser starts fetching the external resources linked to the page (CSS, images, JavaScript files, etc.).
At this stage the browser marks the document as interactive and starts parsing scripts that are in "deferred" mode, meaning those that should be executed after the document has been parsed. The document state is then set to "complete" and a "load" event is fired.
Note that there is never an "Invalid Syntax" error on an HTML page. Browsers fix any invalid content and carry on parsing.
The browser executes the synchronous JavaScript code.
CSS parsing
- CSS files and the contents of <style> tags are parsed according to the CSS lexical and syntax grammar
- Each CSS file is parsed into a StyleSheet object, where each object contains CSS rules with selectors and objects corresponding to the CSS grammar
- A CSS parser can be top-down, or bottom-up when it is generated by a parser generator
Page rendering
- Create a "frame tree" or "render tree" by traversing the DOM nodes and calculating the CSS style values for each node
- Calculate the preferred width of each node in the frame tree bottom-up by summing the preferred widths of its child nodes and the node's horizontal margins, borders, and padding
- Calculate the actual width of each node top-down by allocating each node's available width to its children
- Calculate the height of each node bottom-up by applying text wrapping and summing the heights of its child nodes and the node's margins, borders, and padding
- Calculate the coordinates of each node using the information calculated above
- More complicated steps are taken when elements are floated or positioned absolutely or relatively. See http://dev.w3.org/csswg/css2/ and http://www.w3.org/Style/CSS/current-work for details
- Create layers to describe which parts of the page can be drawn as a group without being re-rasterized. Each frame/render object is assigned to a layer
- A texture is allocated for each layer of the page
- The frame/render objects of each layer are traversed and drawing commands are executed for their layer. This may be rasterized by the CPU or drawn directly on the GPU using D2D/SkiaGL
- All of the above steps may reuse values calculated the last time the page was rendered, which saves a lot of computation
- The final layer positions are computed and the composite commands are issued via Direct3D/OpenGL. The GPU command buffers are flushed to the GPU for asynchronous rendering, and the frame is sent to the window server.
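The bottom-up preferred-width step near the start of this list can be illustrated with a toy box model. The `Box` class and its fields are hypothetical, all children are treated as sitting side by side, and real layout engines handle many more cases (line breaking, floats, percentages, and so on).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    """A simplified frame-tree node; all sizes are in pixels."""
    content_width: int = 0   # intrinsic width of the node's own content
    margin: int = 0          # applied on both the left and right side
    border: int = 0
    padding: int = 0
    children: List["Box"] = field(default_factory=list)


def preferred_width(box: Box) -> int:
    """Compute the preferred width bottom-up.

    The children's preferred widths are summed, and the node's own
    horizontal margin, border, and padding are added on both sides.
    """
    children_width = sum(preferred_width(child) for child in box.children)
    inner_width = max(box.content_width, children_width)
    return inner_width + 2 * (box.margin + box.border + box.padding)


logo = Box(content_width=100, padding=5)     # 100 + 2 * 5  = 110
search = Box(content_width=50, margin=10)    # 50 + 2 * 10  = 70
page = Box(border=1, padding=4, children=[logo, search])
page_width = preferred_width(page)           # 110 + 70 + 2 * 5 = 190
```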
GPU rendering
- During rendering, the graphics computing layers can use the general-purpose CPU or the graphics processor (GPU)
- When the GPU is used for rendering, the graphics software layers split the task into multiple pieces so that they can take advantage of the GPU's massive parallelism for the floating-point calculations that rendering requires.
Window Server
Post-rendering and user-initiated processing
After rendering has completed, the browser executes JavaScript code as a result of some timing mechanism (such as a Google Doodle animation) or user interaction (such as typing a query into the search box and receiving suggestions). Plugins such as Flash or Java may run as well, although not on the Google homepage. Scripts can trigger additional network requests and can modify the page or its layout, causing another round of page rendering and painting.
Reprinted from: https://github.com/skyline75489/what-happens-when-zh_CN
Original: https://github.com/alex/what-happens-when