How Browsers Work
Users want web experiences whose content loads quickly and is smooth to interact with. These are the two goals we want to achieve.
To better achieve these goals, we need to understand how browsers work.
Two major issues in web performance are network latency and the fact that browsers are, for the most part, single-threaded.
Navigation
Navigation is the process of loading a web page. It involves the following steps:
- DNS lookup
The first step in the navigation process is to look up the IP address of the server that hosts the website. This is done using the Domain Name System (DNS). If you navigate to https://example.com, the HTML page is located on the server with the IP address 93.184.216.34. If you've never visited this site, a DNS lookup must happen.
After this initial request, the IP will likely be cached for a time, which speeds up subsequent requests by retrieving the IP address from the cache instead of contacting a name server again.
DNS lookups can still be problematic for performance, because a separate lookup is needed for each unique hostname the page references, and the cost is highest on mobile networks. When a user is on a mobile network, each DNS lookup has to go from the phone to the cell tower to reach an authoritative DNS server. The distance between a phone, a cell tower, and the name server can add significant latency.
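The caching behaviour described above can be sketched as a small in-memory cache in front of a resolver. This is only an illustration: the class name DnsCache and the fixed ttl_seconds lifetime are assumptions made for the example (real resolvers use the time-to-live carried in each DNS record), and the lookup function is injectable so the sketch can run without touching the network.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Tiny in-memory DNS cache (hypothetical helper, not a browser API)."""

    def __init__(self, lookup: Callable[[str], str] = socket.gethostbyname,
                 ttl_seconds: float = 60.0) -> None:
        self._lookup = lookup        # function that performs the real lookup
        self._ttl = ttl_seconds      # assumed fixed lifetime for every entry
        self._entries: Dict[str, Tuple[str, float]] = {}  # host -> (ip, expiry)

    def resolve(self, hostname: str) -> str:
        now = time.monotonic()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]         # cache hit: no round trip to a name server
        ip = self._lookup(hostname)  # cache miss: ask the resolver
        self._entries[hostname] = (ip, now + self._ttl)
        return ip
```

Each distinct hostname still costs one lookup, which is why pages that pull fonts, scripts, and images from many hosts pay the DNS cost several times.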
- TCP handshake
Once the browser has the IP address, it can establish a connection to the server. This is done using the Transmission Control Protocol (TCP). The browser sends a SYN packet to the server, which responds with a SYN-ACK packet, and the browser sends an ACK packet back. This is known as the TCP handshake.
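The three messages can be written out as data to make the sequence and acknowledgement numbers visible. This is a simulation only, not real networking code; the function name three_way_handshake and the tuple layout are assumptions chosen for the illustration.

```python
from typing import List, Tuple

Segment = Tuple[str, str, int, int]  # (sender, flags, seq, ack)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three handshake segments for the given initial sequence numbers."""
    # 1. The client opens the connection with its initial sequence number.
    syn = ("client", "SYN", client_isn, 0)
    # 2. The server acknowledges the client's SYN (ack = client_isn + 1)
    #    and sends its own initial sequence number.
    syn_ack = ("server", "SYN-ACK", server_isn, client_isn + 1)
    # 3. The client acknowledges the server's SYN; its own sequence number
    #    has advanced by one because a SYN consumes one sequence number.
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]
```

Three messages mean one and a half round trips before any application data can be sent.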
- TLS negotiation
If the website uses HTTPS, the browser and server must negotiate a secure connection. This is done using the Transport Layer Security (TLS) protocol. The browser sends a ClientHello message; the server responds with a ServerHello message and its certificate; the two sides then agree on encryption keys and exchange Finished messages. This is known as the TLS handshake, and it adds further round trips before the first byte of content can be requested.
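From a client's point of view, the whole negotiation usually hides behind a single library call. The sketch below uses Python's standard ssl module; the helper names make_client_context and tls_handshake_info are made up for this example, and calling the second one needs network access.

```python
import socket
import ssl
from typing import Optional, Tuple


def make_client_context() -> ssl.SSLContext:
    # The default context verifies the server certificate and its hostname.
    return ssl.create_default_context()


def tls_handshake_info(host: str, port: int = 443,
                       timeout: float = 5.0) -> Tuple[Optional[str], str]:
    """Connect, complete the TLS handshake, return (protocol version, cipher name)."""
    context = make_client_context()
    # The TCP connection has to exist first.
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # wrap_socket performs the handshake (ClientHello, ServerHello, ...).
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            cipher = tls_sock.cipher()
            return tls_sock.version(), cipher[0] if cipher else ""
```

Calling tls_handshake_info("example.com") would report which protocol version and cipher the two sides settled on; the result depends on the server, so none is shown here.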
Response
Once we have established a connection to the server, we can request the HTML page. The server responds with the HTML page, which the browser parses and renders.
Congestion control / TCP slow start
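During the TCP handshake, each side announces the maximum segment size (MSS) it can accept, which limits the size of a single segment. To avoid overwhelming the network, the sender (here, the server sending the response) starts by sending only a small number of segments, then roughly doubles the amount it sends each round trip as acknowledgements arrive, until a threshold is reached or congestion is detected. This is known as TCP slow start.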
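A rough way to see the effect of slow start is to count how many round trips a response of a given size needs when the amount sent doubles every round trip. The sketch below is a simplification under stated assumptions: the 14 KB initial window and 64 KB threshold are illustrative defaults, packet loss is ignored, and the window is simply capped at the threshold instead of switching to the slower growth real TCP uses after that point. The function name slow_start_bursts is hypothetical.

```python
from typing import List


def slow_start_bursts(total_bytes: int, initial_window: int = 14 * 1024,
                      threshold: int = 64 * 1024) -> List[int]:
    """Return the number of bytes sent in each round trip."""
    bursts: List[int] = []
    window = initial_window          # assumed initial congestion window
    remaining = total_bytes
    while remaining > 0:
        sent = min(window, remaining)
        bursts.append(sent)
        remaining -= sent
        # Each acknowledged round trip doubles the window, capped at the threshold.
        window = min(window * 2, threshold)
    return bursts
```

Under these assumptions a 100 KB response takes four round trips, while anything that fits in the first window arrives in one, which is one reason to keep the initial HTML small.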
Parsing
Once the browser receives the first chunk of data, it can begin parsing the information. Parsing is the step the browser takes to turn the data it receives over the network into a Document Object Model (DOM) tree and a CSS Object Model (CSSOM), which are used to render the page.
The browser will begin parsing and attempting to render the page as soon as it receives the first chunk of data. This is known as incremental rendering.
- Building the DOM tree
The process of building the DOM tree is described in the MDN article on the critical rendering path: https://developer.mozilla.org/zh-CN/docs/Web/Performance/Critical_rendering_path
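To make incremental tree building concrete, here is a minimal sketch that feeds HTML to a parser chunk by chunk, as if it were arriving over the network. It uses Python's standard html.parser module, not a browser engine; the Node and DomBuilder classes and the outline helper are invented for the example, and the sketch ignores attributes, void elements such as img, and the error recovery a real HTML parser performs.

```python
from html.parser import HTMLParser
from typing import List, Optional


class Node:
    def __init__(self, tag: str, parent: Optional["Node"] = None) -> None:
        self.tag = tag
        self.parent = parent
        self.children: List["Node"] = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a simplified tree as markup is fed in (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self._current = self.root

    def handle_starttag(self, tag: str, attrs: list) -> None:
        node = Node(tag, self._current)      # new element under the open one
        self._current.children.append(node)
        self._current = node

    def handle_endtag(self, tag: str) -> None:
        if self._current.parent is not None:
            self._current = self._current.parent  # close the open element

    def handle_data(self, data: str) -> None:
        self._current.text += data


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented tag names, one per line."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Feeding the builder "<html><body><h1>Hi" already yields a tree containing html, body, and h1, before the rest of the document has arrived.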
CRP: Critical Rendering Path
Web performance covers the following:
- server requests and responses
- loading
- scripting
- rendering
- layout
- painting
The CRP is the sequence of steps the browser goes through to convert the HTML, CSS, and JavaScript into pixels on the screen.
A request for a web page or app starts with an HTTP request. The server sends a response containing the HTML. The browser then begins parsing the HTML, converting the received bytes to the DOM tree. The browser initiates requests every time it finds links to external resources, be it stylesheets, scripts, or embedded image references. Some requests are blocking, which means the parsing of the rest of the HTML is halted until the imported asset is handled. The browser continues to parse the HTML, making requests and building the DOM, until it gets to the end, at which point it constructs the CSS object model.
With the DOM and CSSOM complete, the browser builds the render tree, computing the styles for all the visible content. After the render tree is complete, layout occurs, defining the location and size of all the render tree elements. Once complete, the page is rendered, or ‘painted’, on the screen.
Document Object Model (DOM)
DOM construction is incremental.
CSS Object Model (CSSOM)
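Unlike DOM construction, CSSOM construction is not incremental. CSS is render blocking: the browser blocks rendering of the page until it has received and processed all of the CSS, because later rules can override earlier ones.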
Render tree
The render tree combines the DOM and CSSOM and contains only the content that will be visible on the page. It is used to render the page.
To construct the render tree, the browser will:
- Traverse the DOM tree
- Match the CSSOM rules to the DOM nodes
- Apply the CSSOM rules to the DOM nodes
- Construct the render tree
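The steps above can be sketched with plain dictionaries standing in for the DOM and CSSOM. This is a toy model under several assumptions: only tag selectors are matched, only the color property is inherited, and nodes with display set to none are left out of the render tree. The names build_render_tree and INHERITED are hypothetical.

```python
from typing import Dict, List, Optional

Style = Dict[str, str]
INHERITED = {"color"}  # assumption: the only inherited property in this sketch


def build_render_tree(dom: dict, cssom: Dict[str, Style],
                      inherited: Optional[Style] = None) -> Optional[dict]:
    """Return a render node with its computed style, or None if not rendered."""
    # Start from inherited values, then apply the rules matching this node.
    style: Style = dict(inherited or {})
    style.update(cssom.get(dom["tag"], {}))
    if style.get("display") == "none":
        return None  # invisible nodes are not part of the render tree
    passed_down = {k: v for k, v in style.items() if k in INHERITED}
    children: List[dict] = []
    for child in dom.get("children", []):
        rendered = build_render_tree(child, cssom, passed_down)
        if rendered is not None:
            children.append(rendered)
    return {"tag": dom["tag"], "style": style, "children": children}
```

With a rule that hides head, the resulting tree contains body and its visible descendants only, each carrying the style that later stages will use.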
Layout
Layout is the process of determining the size and position of each element on the page. The browser will:
- Traverse the render tree
- Calculate the size and position of each element
- Determine the flow of the page
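As an illustration of these steps, the sketch below lays out a tree of block boxes: each box takes the full width of its container and children stack vertically in document order. It is a deliberately simplified model; real layout also handles inline content, margins, padding, floats, and more. The function name layout_block and the content_height key are assumptions made for the example.

```python
def layout_block(node: dict, x: int, y: int, width: int) -> int:
    """Assign x, y, width, and height to `node` in place; return its height."""
    node["x"], node["y"], node["width"] = x, y, width
    cursor_y = y
    for child in node.get("children", []):
        # Each child starts where the previous one ended (normal block flow).
        cursor_y += layout_block(child, x, cursor_y, width)
    # A leaf uses its own content height; a container is as tall as its children.
    content = node.get("content_height", 0)
    node["height"] = max(content, cursor_y - y)
    return node["height"]
```

Because a box's position depends on the heights of everything before it, changing one element's size can force the boxes after it to be laid out again.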
Painting
Painting is the process of filling in pixels on the screen. The browser will:
- Traverse the render tree
- Paint the pixels on the screen
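Painting can be pictured as filling rectangles on a grid, parents first and children on top. The sketch below uses a grid of characters in place of pixels and assumes every node already carries x, y, width, and height from layout; the paint function and the fill key are hypothetical names for this illustration.

```python
from typing import List


def paint(node: dict, canvas: List[List[str]]) -> None:
    """Fill the node's rectangle, then paint its children on top of it."""
    fill = node.get("fill")
    if fill is not None:
        for row in range(node["y"], node["y"] + node["height"]):
            for col in range(node["x"], node["x"] + node["width"]):
                # Skip anything that falls outside the canvas.
                if 0 <= row < len(canvas) and 0 <= col < len(canvas[row]):
                    canvas[row][col] = fill
    for child in node.get("children", []):
        paint(child, canvas)  # children are painted after (over) the parent
```

For example, painting a 4 by 3 body box filled with "b" that contains a 4 by 1 heading filled with "h" leaves the top row as "hhhh" and the two rows below as "bbbb".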