How does the Internet Work?

Ever wonder what happens when you click a link? 🌐 How The Internet Works takes you behind the scenes of the digital world, breaking down complex tech into simple, bite-sized insights. From data packets to servers and beyond, discover the magic that powers your online experience! (Hook written with the help of AI, because I can't :D)

What happens when you go to google.com?

The "g" key is pressed

Let me explain the physical keyboard actions and the OS interrupts. When you press the "g" key, the browser registers the event, triggering the auto-complete functions. Based on your browser's algorithm and whether you're in regular or private/incognito mode, various suggestions appear in a dropdown beneath the URL bar.

These suggestions are typically prioritized and sorted using factors such as your search history, bookmarks, cookies, and popular internet searches. As you continue typing "google.com," numerous processes run in the background, and the suggestions refine with each keystroke. The browser might even predict "google.com" before you've finished typing.

Browsing Autocomplete Sequences

The "enter" key bottoms out

To establish a starting point, let's consider the Enter key on a keyboard when it reaches the bottom of its travel range. At this moment, an electrical circuit dedicated to the Enter key is closed (either mechanically or capacitively), allowing a small current to flow into the keyboard's logic circuitry. This circuitry scans the state of each key switch, filters out electrical noise from the rapid closure of the switch (debouncing), and translates the action into a keycode—in this case, the integer 13. The keyboard controller then encodes this keycode for transmission to the computer. Today, this is almost always done over a Universal Serial Bus (USB) or Bluetooth connection, though older systems used PS/2 or ADB.

In the case of a USB keyboard:

The keyboard is powered by a 5V supply delivered through pin 1 of the computer's USB host controller.
The keycode generated by the keypress is stored in an internal register known as the "endpoint."
The USB host controller polls this "endpoint" roughly every 10ms (the minimum interval set by the keyboard), retrieving the stored keycode.
The keycode is sent to the USB Serial Interface Engine (SIE), where it is converted into one or more USB packets in accordance with the USB protocol.
These packets are transmitted over the D+ and D- lines (the two middle pins) at a maximum rate of 1.5 Mb/s, as the keyboard is classified as a "low-speed device" (per USB 2.0 standards).
The computer's host USB controller decodes this serial signal, and the Human Interface Device (HID) driver interprets the keypress. Finally, the key event is passed to the operating system's hardware abstraction layer. Sequence Diagram

In the case of a virtual keyboard (such as on touch screen devices):

When the user touches a capacitive touch screen, a small amount of current transfers to their finger. This interaction disturbs the electrostatic field of the screen’s conductive layer, creating a voltage drop at the point of contact.
The screen controller detects this and triggers an interrupt, reporting the coordinates of the touch.
The operating system then alerts the currently active application that a press event has occurred within its graphical interface, typically on a virtual keyboard button.
The virtual keyboard application raises a software interrupt, which notifies the operating system of a "key pressed" event.
The focused application receives this notification and processes the keypress accordingly. Sequence Diagram Describing the same

Interrupt Fires [Not for USB Keyboards]

For non-USB keyboards, such as those using legacy connections (e.g., PS/2), the keyboard signals an interrupt via its interrupt request line (IRQ). This IRQ is mapped to an interrupt vector (an integer) by the system's interrupt controller. The CPU consults the Interrupt Descriptor Table (IDT), which links each interrupt vector to a corresponding function known as an interrupt handler, supplied by the operating system’s kernel.

When the interrupt is triggered, the CPU uses the interrupt vector to index into the IDT and execute the appropriate interrupt handler. This process causes the CPU to transition into kernel mode, allowing the operating system to manage the keypress event.

A WM_KEYDOWN Message is Sent to the App (On Windows)

When the Enter key is pressed, the Human Interface Device (HID) transport passes the key down event to the KBDHID.sys driver, which converts the HID usage data into a scan code. In this case, the scan code is VK_RETURN (0x0D), representing the Enter key. The KBDHID.sys driver then communicates with the KBDCLASS.sys driver (the keyboard class driver), which securely manages all keyboard input. Before proceeding, the signal may pass through any third-party keyboard filters installed on the system, though this also happens in kernel mode.

Next, Win32K.sys comes into play, determining which window is currently active by invoking the GetForegroundWindow() API. This function retrieves the window handle (hWnd) of the active application, such as the browser’s address bar. At this point, the Windows "message pump" calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). The lParam parameter contains a bitmask that provides additional information about the keypress, including:

Repeat count (which is 0 in this case),
Scan code (which might be OEM-specific but typically standard for VK_RETURN),
Extended key flags (indicating whether modifier keys like Alt, Shift, or Ctrl were also pressed, which they weren’t).

The SendMessage API queues the message for the specific window handle. Later, the system’s main message processing function (known as WindowProc) assigned to the window (hWnd) retrieves and processes messages in the queue.

The active window in this case is an edit control, and its WindowProc function has a message handler that responds to WM_KEYDOWN events. The handler checks the third parameter (wParam) passed by SendMessage, recognizes that the value is VK_RETURN, and thus determines that the user has pressed the Enter key. This triggers the appropriate response for the application.

A KeyDown NSEvent is Sent to the App (On OS X)

When a key is pressed on OS X, the interrupt signal triggers an event in the I/O Kit keyboard driver (a kernel extension or "kext"). This driver translates the hardware signal into a key code. The key code is then passed to the WindowServer, which manages the graphical user interface.

The WindowServer dispatches the key press event to the appropriate applications (such as the active or listening ones) by sending it through their Mach port, where it is placed into an event queue. Applications with the proper privileges can access this event queue by calling the mach_ipc_dispatch function.

Most applications handle this process through the NSApplication main event loop, which is responsible for processing user input. When the event is a key press, it is represented as an NSEvent of type NSEventTypeKeyDown. The application then reads this event and responds accordingly, triggering any code related to keypress actions based on the key code received.

The Xorg Server Listens for Keycodes (On GNU/Linux)

When a key is pressed in a graphical environment using the X server, the X server employs the evdev (event device) driver to capture the keypress event. The keycode from the physical keyboard is then re-mapped into a scancode using X server-specific keymaps and rules.

Once the mapping is complete, the X server forwards the resulting scancode to the window manager (such as DWM, Metacity, i3, etc.). The window manager, in turn, sends the character or key event to the currently focused window. The graphical API of the focused window processes this event and displays the corresponding symbol in the appropriate field, using the correct font, based on the key pressed.

This flow ensures that the character is correctly rendered in the active application’s interface, completing the keypress interaction from hardware to graphical output.

Parse URL

When the browser parses the URL(Uniform Resource Locator), it extracts the following components:

Protocol: "http" The browser understands that this uses the Hyper Text Transfer Protocol to communicate with the server.
Resource: "/" This indicates that the browser should retrieve the main (index) page of the website, as the / path typically refers to the root or home page of the server.

Each of these components helps the browser interpret and fetch the desired resource from the web.

URL Parsing

Is it a URL or a Search Term?

When no protocol (e.g., "http") or valid domain name is provided, the browser interprets the text in the address bar as a potential search term. Instead of trying to resolve it as a URL, the browser forwards the text to its default web search engine.

In most cases, the browser appends a special identifier to the search query, indicating that the request originated from the browser's URL bar. This allows the search engine to handle and prioritize these searches accordingly, improving the relevance of the results based on the context.

This process helps the browser determine whether it should attempt to navigate directly to a website or provide search results based on the entered text.

Convert Non-ASCII Unicode Characters in the Hostname

The browser examines the hostname for any characters that fall outside the ASCII range, specifically those that are not in the sets of a-z, A-Z, 0-9, -, or ..
In this case, the hostname is google.com, which contains only ASCII characters, so no conversion is necessary. However, if there were non-ASCII characters present in the hostname, the browser would apply Punycode encoding to convert the hostname into a valid ASCII representation. This process ensures that all characters in the hostname can be correctly processed by the network protocols.

Check HSTS List

The browser first checks its preloaded HSTS (HTTP Strict Transport Security) list, which contains websites that have explicitly requested to be accessed only via HTTPS.

If the requested website is found on this list, the browser automatically sends the request using HTTPS rather than HTTP. If the website is not in the HSTS list, the initial request is sent via HTTP.

It’s important to note that a website can still implement HSTS without being included in the preloaded list. In such cases, the first HTTP request made by the user will return a response instructing the browser to only send subsequent requests via HTTPS. However, this initial HTTP request could expose the user to a downgrade attack, where an attacker might intercept the request and force it to remain unencrypted. This vulnerability is why modern web browsers include the HSTS list, enhancing security for users by preventing insecure connections from being established in the first place.

DNS Lookup

The browser begins the DNS lookup process by checking if the domain is already present in its cache. (To view the DNS cache in Google Chrome, navigate to chrome://net-internals/#dns.)

If the domain is not found in the cache, the browser calls the gethostbyname library function (the specific function may vary depending on the operating system) to perform the hostname resolution.

Local Hosts File Check:
- The gethostbyname function first checks if the hostname can be resolved by referencing the local hosts file, whose location varies by operating system. This file is a simple text file that maps hostnames to IP addresses and can provide a quick resolution without querying DNS.
DNS Server Request:
- If the hostname is not cached and cannot be found in the hosts file, the browser then sends a request to the DNS server configured in the network stack. This server is typically the local router or the ISP's caching DNS server, which stores previously resolved names to speed up future requests.
ARP Process for DNS Server:
- If the DNS server is on the same subnet, the network library follows the ARP (Address Resolution Protocol) process to resolve the IP address of the DNS server, ensuring that the request is directed correctly within the local network.
- If the DNS server is on a different subnet, the network library instead follows the ARP process for the default gateway IP, which acts as an intermediary to route the request to the appropriate subnet.

This systematic approach ensures that the browser efficiently resolves domain names to IP addresses, enabling it to establish a connection to the desired website. By checking the cache first, using the local hosts file, and finally querying the DNS server, the browser minimizes the time spent on hostname resolution.

Sequence of DNS Lookup

Sequence Diagram

ARP Process

In order to send an ARP (Address Resolution Protocol) broadcast, the network stack library needs two key pieces of information: the target IP address that needs to be looked up and the MAC address of the interface that will be used to send out the ARP broadcast.

Checking the ARP Cache:

The ARP cache is first checked for an entry corresponding to the target IP address. If an entry exists, the library function returns the result in the format: Target IP = MAC.

If the Entry is Not in the ARP Cache:

If there is no entry for the target IP address, the following steps are taken:

The route table is consulted to determine whether the target IP address is on any of the subnets listed in the local route table.
- If it is found, the library uses the interface associated with that subnet.
- If not, the library defaults to using the interface that connects to the default gateway.
The MAC address of the selected network interface is then retrieved.
Sending the ARP Request:

The network library constructs and sends a Layer 2 (data link layer of the OSI model) ARP request with the following format: ARP Request:

Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here

Depending on the hardware setup between the computer and the router, the behavior of the ARP request varies:

Directly Connected:

If the computer is directly connected to the router, the router will respond with an ARP Reply (see below).

Hub:

If the computer is connected to a hub, the hub will broadcast the ARP request out of all its other ports. If the router is connected to the same "wire," it will respond with an ARP Reply (see below).

Switch:

If the computer is connected to a switch, the switch will check its local CAM/MAC table to identify which port has the MAC address being queried. If the switch has no entry for the MAC address, it will rebroadcast the ARP request to all other ports. If the switch does have an entry in its MAC/CAM table, it will send the ARP request only to the port that has the corresponding MAC address.

If the router is on the same "wire," it will respond with an ARP Reply (see below).

ARP Reply:

The ARP reply will have the following format:

Sender MAC: target:mac:address:here

Sender IP: target.ip.goes.here

Target MAC: interface:mac:address:here

Target IP: interface.ip.goes.here

Now that the network library has obtained the IP address of either the DNS server or the default gateway, it can resume its DNS process:

The DNS client establishes a socket connection to UDP port 53 on the DNS server, utilizing a source port above 1023.
If the response size exceeds the UDP limit, TCP will be used instead to accommodate the larger response.
If the local or ISP DNS server does not have the requested information, it will initiate a recursive search, querying a hierarchy of DNS servers until the SOA (Start of Authority) is reached, at which point the answer is returned.

Opening of a Socket

Once the browser receives the IP address of the destination server, it combines this with the port number specified in the URL (where HTTP defaults to port 80 and HTTPS to port 443). The browser then makes a call to the system library function named socket, requesting a TCP socket stream using AF_INET or AF_INET6 and SOCK_STREAM.

Transport Layer Processing:

This request is first processed by the Transport Layer, where a TCP segment is crafted. The destination port is added to the header, and a source port is chosen from within the kernel’s dynamic port range (as specified by ip_local_port_range in Linux).

Network Layer Processing:

This segment is then sent to the Network Layer, which wraps it in an additional IP header. The IP addresses of both the destination server and the current machine are inserted to form a packet.

Link Layer Processing:

The packet next arrives at the Link Layer, where a frame header is added. This header includes the MAC address of the machine’s NIC (Network Interface Card) as well as the MAC address of the gateway (local router). If the kernel does not know the MAC address of the gateway, it must broadcast an ARP query to find it.

At this point, the packet is ready to be transmitted through one of the following methods:

Ethernet
WiFi
Cellular Data Network

For most home or small business Internet connections, the packet will pass from your computer, possibly through a local network, and then through a modem (Modulator/Demodulator). This modem converts digital 1’s and 0’s into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. On the other end of the connection, another modem converts the analog signal back into digital data for processing by the next network node, where the from and to addresses would be analyzed further.

In contrast, larger businesses and some newer residential connections will use fiber or direct Ethernet connections, allowing the data to remain digital and be passed directly to the next network node for processing.

Eventually, the packet will reach the router managing the local subnet. From there, it will continue to travel to the autonomous system’s (AS) border routers, traverse other ASes, and finally arrive at the destination server. Each router along the way extracts the destination address from the IP header and routes it to the appropriate next hop. The time to live (TTL) field in the IP header is decremented by one for each router that processes it. The packet will be dropped if the TTL field reaches zero or if the current router has no space in its queue (which may occur due to network congestion). This send and receive process happens multiple times following the TCP connection flow:

The client chooses an Initial Sequence Number (ISN) and sends a packet to the server with the SYN bit set to indicate it is setting the ISN.
The server receives the SYN and, if it is agreeable, performs the following:
- Chooses its own initial sequence number.
- Sets the SYN bit to indicate it is choosing its ISN.
Copies the (client ISN + 1) to its ACK field and adds the ACK flag to indicate it is acknowledging receipt of the first packet.
The client acknowledges the connection by sending a packet that:
- Increases its own sequence number.
- Increases the receiver acknowledgment number.
- Sets the ACK field.
Data Transfer: Data is transferred as follows:
- As one side sends N data bytes, it increases its sequence number (SEQ) by that number.
- When the other side acknowledges receipt of that packet (or a string of packets), it sends an ACK packet with the acknowledgment (ACK) value equal to the last received sequence from the other side.
Closing the Connection: To close the connection:
- The side initiating the closure sends a FIN packet.
- The other side acknowledges the FIN packet and sends its own FIN.
- The initiating side acknowledges the other side’s FIN with an ACK.

Sequence Diagram of Opening of a Socket

Opening of a Socket: Sequence Diagram

TLS Handshake

The client computer sends a ClientHello message to the server, which includes its Transport Layer Security (TLS) version, a list of available cipher algorithms, and compression methods.
In response, the server replies with a ServerHello message that specifies the TLS version, the selected cipher, the selected compression methods, and the server's public certificate signed by a Certificate Authority (CA). This certificate contains a public key that will be used by the client to encrypt the remainder of the handshake until a symmetric key can be agreed upon.
The client verifies the server's digital certificate against its list of trusted CAs. If trust can be established based on the CA, the client generates a string of pseudo-random bytes and encrypts this string using the server's public key. These random bytes will be used to determine the symmetric key.
The server decrypts the random bytes using its private key and utilizes these bytes to generate its own copy of the symmetric master key.
The client sends a Finished message to the server, encrypting a hash of the transmission that has occurred up to this point with the symmetric key.
The server generates its own hash and then decrypts the hash sent by the client to verify that it matches. If the hashes match, the server sends its own Finished message back to the client, which is also encrypted with the symmetric key.
From this point forward, the TLS session transmits application (HTTP) data encrypted with the agreed-upon symmetric key.

This handshake process establishes a secure connection between the client and server, ensuring that data transmitted over the connection is protected from eavesdropping and tampering.

If a Packet is Dropped

Sometimes, due to network congestion or flaky hardware connections, TLS packets may be dropped before reaching their final destination. In such cases, the sender must decide how to react. The algorithm governing this response is known as TCP congestion control. The specific implementation can vary depending on the sender, with the most common algorithms being Cubic on newer operating systems and New Reno on many others.

The client chooses a congestion window based on the maximum segment size (MSS) of the connection.
For each packet acknowledged, the congestion window doubles in size until it reaches the 'slow-start threshold.' In some implementations, this threshold is adaptive and can change based on network conditions.
Once the slow-start threshold is reached, the window increases additively for each packet acknowledged. If a packet is dropped, the window reduces exponentially until another packet is acknowledged.

This congestion control mechanism helps optimize network performance and stability, ensuring that data can be transmitted efficiently while minimizing the impact of packet loss.

HTTP Protocol

If the web browser used was developed by Google, instead of sending a standard HTTP request to retrieve a page, it may attempt to negotiate an "upgrade" from HTTP to the SPDY protocol with the server.

If the client is using the HTTP protocol and does not support SPDY, it sends a request to the server in the following format:

GET / HTTP/1.1
             Host: google.com
             Connection: close
             [other headers]

Here, [other headers] refers to a series of colon-separated key-value pairs formatted according to the HTTP specification and separated by single newlines. This assumes that the web browser is free of bugs that violate the HTTP specification and that it is using HTTP/1.1. If it were using a different version, such as HTTP/1.0 or HTTP/0.9, it might not include the Host header in the request.

HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after the response is completed. For example:

Connection: close

HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.

After sending the request and headers, the web browser sends a single blank newline to the server to indicate that the content of the request is complete.

The server then responds with a response code that denotes the status of the request, structured as follows:

200 OK
             [response headers]

This is followed by a single newline and then the payload containing the HTML content of www.google.com. The server may either close the connection or, if requested by the headers sent by the client, keep the connection open for reuse in further requests.

If the HTTP headers sent by the web browser contained sufficient information for the web server to determine whether the version of the file cached by the web browser has been unmodified since the last retrieval (for example, if the web browser included an ETagheader), the server may instead respond with:

304 Not Modified
             [response headers]

This response will have no payload, and the web browser will retrieve the HTML from its cache.

After parsing the HTML, the web browser (and server) repeats this process for every resource (image, CSS, favicon.ico, etc.) referenced in the HTML page. In these cases, instead of GET / HTTP/1.1, the request will be structured as:

GET /$(URL relative to www.google.com) HTTP/1.1

If the HTML references a resource on a different domain than www.google.com, the web browser returns to the steps involved in resolving the other domain, following all steps up to this point for that domain. The Host header in the request will be set to the appropriate server name instead of google.com.

HTTP Server Request Handling

The HTTPD (HTTP Daemon) server is responsible for handling requests and responses on the server side. The most common HTTPD servers include Apache and Nginx for Linux, as well as IIS for Windows.

Receiving the Request: The HTTPD server receives the incoming request from the client.
Breaking Down the Request: The server analyzes the request and extracts the following parameters:
- HTTP Request Method: This could be one of several methods, including GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS, or TRACE. In the case of a URL entered directly into the address bar, the method will typically be GET.
- Domain: In this case, the domain is google.com.
- Requested Path/Page: Here, the requested path is /, indicating that no specific page was requested; thus, / is treated as the default path.
Verifying the Virtual Host: The server checks whether a Virtual Host is configured for google.com.
Method Verification: The server verifies that google.com can accept GET requests.
Client Permission Check: The server checks if the client is allowed to use this method based on criteria such as IP address, authentication, etc.
Request Rewriting: If the server has a rewrite module installed (such as mod_rewrite for Apache or URL Rewrite for IIS), it attempts to match the request against any configured rules. If a matching rule is found, the server rewrites the request according to that rule.
Content Retrieval: The server retrieves the content that corresponds to the request. In this case, it will typically default to the index file since the request path is /. While there are cases that can override this behavior, using the index file is the most common method.
File Parsing and Processing: The server parses the index file according to the designated handler. If Google is using PHP, for example, the server will utilize PHP to interpret the index file and stream the output back to the client.

By following these steps, the HTTPD server efficiently processes incoming requests and returns the appropriate responses to the client.

Browser

The primary functionality of a browser is to present the web resources you choose by requesting them from a server and displaying them in the browser window. The resource is typically an HTML document but may also include PDFs, images, or other types of content. The location of the resource is specified by the user using a URI (Uniform Resource Identifier).

The way a browser interprets and displays HTML files is defined by the HTML and CSS specifications, which are maintained by the W3C (World Wide Web Consortium), the standards organization for the web.

Browser user interfaces share many common features, including:

An address bar for entering a URI
Back and forward buttons for navigation
Bookmarking options for saving favorite pages
Refresh and stop buttons for refreshing or halting the loading of current documents
A home button that takes you to your home page

Browser High-Level Structure

The components of a browser can be broken down as follows:

User Interface: This includes the address bar, back/forward buttons, bookmarking menu, and any other part of the browser's display except for the window where the requested page is shown.
Browser Engine: The browser engine acts as a bridge between the user interface and the rendering engine, managing actions and interactions.
Rendering Engine: Responsible for displaying requested content, the rendering engine parses HTML and CSS, transforming the parsed content into a visual representation on the screen.
Networking: This component handles network calls, such as HTTP requests, and utilizes different implementations tailored for various platforms while providing a platform-independent interface.
UI Backend: The UI backend is responsible for drawing basic widgets like combo boxes and windows. It exposes a generic interface that is not specific to any platform and relies on the operating system's user interface methods.
JavaScript Engine: This engine parses and executes JavaScript code, allowing for dynamic content and interactivity within web pages.
Data Storage: This acts as a persistence layer, enabling the browser to save various types of data locally, such as cookies. Browsers also support storage mechanisms like localStorage, IndexedDB, WebSQL, and FileSystem.

Each of these components works together to create a seamless browsing experience, allowing users to access and interact with web resources efficiently.

HTML Parsing

The rendering engine begins retrieving the contents of the requested document from the networking layer, typically in 8 kB chunks. The primary responsibility of the HTML parser is to transform the HTML markup into a structured representation known as a parse tree.

The output tree, referred to as the "parse tree," consists of a hierarchy of DOM (Document Object Model) element and attribute nodes. The DOM serves as the object representation of the HTML document, providing an interface for HTML elements to interact with external scripts, such as JavaScript. The root of this tree is the "Document" object, and prior to any scripting manipulations, the DOM maintains an almost one-to-one correspondence with the original markup.

The Parsing Algorithm

HTML cannot be parsed effectively using traditional top-down or bottom-up parsers due to several factors:

Forgiving Nature of the Language: HTML is designed to be lenient with syntax errors, allowing browsers to display content even when the markup is not perfectly structured.
Browser Error Tolerance: Browsers are built to handle common cases of invalid HTML, ensuring that users have a functional experience.
Reentrancy of the Parsing Process: In other programming languages, the source remains unchanged during parsing. However, in HTML, dynamic elements (like <script> tags containing document.write() calls) can modify the input during parsing, which necessitates a different approach. Because of these challenges, browsers employ a custom parser tailored for HTML. The parsing algorithm is thoroughly described in the HTML5 specification and consists of two primary stages: tokenization and tree construction.

Actions When Parsing is Finished

Once the parsing is complete, the browser proceeds to fetch external resources linked to the page, such as CSS stylesheets, images, and JavaScript files. At this point, the browser marks the document as interactive and begins parsing scripts that are in "deferred" mode, meaning those scripts are intended to execute after the document has been fully parsed. The document state is then set to "complete," and a "load" event is triggered.

Importantly, browsers do not generate an "Invalid Syntax" error for HTML pages. Instead, they automatically correct any invalid content and continue processing the document, ensuring that users can view web pages with minimal disruption.

CSS Interpretation

The process of CSS interpretation involves several key steps:

Parsing CSS Files: The browser parses external CSS files, the contents within <style> tags, and the values within style attributes. This parsing follows the "CSS lexical and syntax grammar," which defines the rules and structure of valid CSS.
Creating StyleSheet Objects: Each parsed CSS file is transformed into a StyleSheet object. Each StyleSheet object encapsulates the CSS rules, including selectors and the corresponding CSS declarations. This structured representation allows for efficient access and manipulation of styles.
Parsing Techniques: The CSS parser can utilize either top-down or bottom-up parsing techniques, depending on the specific parser generator employed. These techniques determine how the parser reads and processes the CSS rules, affecting the efficiency and accuracy of the parsing process.

Through this interpretation, the browser builds a comprehensive understanding of how to apply styles to the HTML elements in the DOM, facilitating the rendering of the web page with the intended visual presentation.

Page Rendering

The rendering process of a web page involves several structured steps:

Creating the Frame Tree: The rendering engine constructs a 'Frame Tree' or 'Render Tree' by traversing the DOM nodes and calculating the computed CSS styles for each node. This tree represents the visual structure of the page.
Calculating Preferred Width: The preferred width for each node in the Frame Tree is calculated in a bottom-up manner. This involves summing the preferred widths of the child nodes along with the node's horizontal margins, borders, and padding.
Calculating Actual Width: The actual width of each node is determined in a top-down approach by distributing the available width among its children based on their needs.
Calculating Height: The height of each node is calculated bottom-up by applying text wrapping and summing the heights of the child nodes along with the node's margins, borders, and padding.
Determining Node Coordinates: The coordinates of each node are computed using the width and height information gathered in the previous steps.
Handling Complex Elements: More intricate calculations are performed for elements that are floated, positioned absolutely or relatively, or that employ other complex features. For further details, refer to the CSS specifications at CSS2 and the current CSS work.
Creating Layers: Layers are created to describe which parts of the page can be animated together without requiring re-rasterization. Each frame/render object is assigned to a specific layer.
Allocating Textures: Textures are allocated for each layer of the page to optimize rendering performance.
Executing Drawing Commands: The frame/render objects for each layer are traversed, and drawing commands are executed for their respective layers. This rendering can be handled by the CPU or directly drawn on the GPU using technologies like D2D (Direct2D) or SkiaGL.
Reusing Calculated Values: The rendering process can leverage calculated values from the previous rendering of the webpage, enabling more efficient incremental changes that require less computational work.
Compositing Layers: The final page layers are sent to the compositing process, where they are combined with other visible content, such as the browser chrome, iframes, and addon panels.
Finalizing Render Commands: The final layer positions are computed, and composite commands are issued via graphics APIs like Direct3D or OpenGL. The GPU command buffers are flushed to the GPU for asynchronous rendering, and the completed frame is sent to the window server for display.

GPU Rendering

During the rendering process, graphical computing tasks can utilize either the general-purpose CPU or the specialized graphical processor GPU.
When leveraging the GPU for graphical rendering computations, the graphical software layers divide the workload into multiple smaller tasks. This approach allows them to take full advantage of the GPU's massive parallelism, which is particularly effective for the floating-point calculations required in the rendering process.
The GPU excels in handling numerous operations simultaneously, making it well-suited for rendering complex visual content efficiently and rapidly. This parallel processing capability significantly enhances performance, especially in applications involving high-resolution graphics, animations, and real-time rendering.
As a result, using the GPU not only speeds up the rendering process but also enables more sophisticated visual effects and smoother user experiences in modern web applications and graphics-intensive software.

Benefits of GPU Rendering

This image is also rendered by the GPU

Post-Rendering and User-Induced Execution

After the rendering process is complete, the browser executes JavaScript code triggered by various events, such as timing mechanisms (like a Google Doodle animation) or user interactions (e.g., typing a query into the search box and receiving suggestions).

Plugins: Additionally, plugins such as Flash or Java may also execute, although they typically do not run at this point on the Google homepage.
Network Requests: JavaScript scripts can initiate further network requests, fetching additional resources or data as needed.
DOM Modifications: These scripts have the ability to modify the existing page or its layout, which can lead to another round of page rendering and painting. This dynamic capability allows for interactive experiences, where content can change in real-time based on user actions or other conditions, enhancing the overall functionality and responsiveness of the web application. The interaction between JavaScript execution and the rendering engine is crucial for creating rich, engaging web experiences, allowing developers to build applications that respond intuitively to user input and changing contexts.

Introduction to Bits, Signals, and Packets

The ability to deliver and exchange information over the world's communication networks has revolutionized how people work, play, and live. At the turn of the century, the U.S. National Academy of Engineering listed 20 technologies with the most significant societal impact in the 20th century. This list included life-changing innovations like electrification, the automobile, and the airplane. It also included four communication technologies—radio and television, the telephone, the Internet, and computers—whose technological underpinnings are the focus of this book.

Surprisingly, the Internet ranked only #13, as it was developed late in the century, and the committee believed its most significant impacts would occur in the 21st century. That prediction seems accurate: the spread of wireless networks and mobile devices, the rise of social networks, and anytime, anywhere communication have changed commerce, social connections, and even driven societal and political changes.

Communication is fundamental to modern life. It's hard to imagine life without the Internet, its applications, or networked mobile devices. By early 2011, over 5 billion mobile phones were active worldwide, over a billion with "broadband" connectivity – exceeding the number of people with electricity, shoes, toothbrushes, or toilets in 2011!

Internet's Impact on Communication and Society

Objectives

This post/lesson(whatever you call it) aims to explain how communication networks work. This is worth studying for two reasons:

to understand the design principles and analysis techniques used in these systems;
because the technical ideas are relevant to other fields in computer science (CS) and electrical engineering (EE). This makes studying communication systems a great way to learn broadly applicable concepts.

At MIT(Massachusetts Institute of Technology), sophomores take such a course, with some exposure to probability and Fourier series.

Traditionally, "low-level communication" (how information moves across a single link) has been considered an EE topic, while "networking" (building networks of multiple links) has been a CS topic. Traditional digital communication courses rarely address network building, and computer network courses treat communication over physical links as a black box. This book aims to bridge this gap, providing a comprehensive understanding of both aspects.

Understanding Communication Networks

We'll cover communication systems end-to-end: from the source with information to transmit, to packets (messages broken down for transmission), to bits ("0" or "1"), to signals (analog waveforms sent over links like wires, fiber, radio, or acoustic waves). We'll examine diverse networks: simple point-to-point links, shared media with multiple nodes, and larger multi-hop networks connected to form even bigger networks.

Themes

Three challenges are central to digital communication: reliability, sharing, and scalability. This introductory section focuses on the first two.

Digital Communication

Reliability

Many factors make communication unreliable. We'll study techniques to improve reliability, all of which employ redundancy to achieve reliability with unreliable components, relying on independent component failures.

The main challenge is overcoming noise, interference, bit errors, packet losses (from uncorrectable errors, queue overflows, or link/software failures), all of which degrade communication quality.

Besides reliability, speed is also important. Reliability techniques often use redundancy, reducing speed. Many communication systems balance reliability and speed.

Communication speeds have increased dramatically, from kilobits per second in the early 1980s to 100+ Megabits per second wirelessly and 1-10 Gigabits per second over wired links today.

We'll explore why communication is unreliable and how to address it, using error-correcting codes, handling inter-symbol interference, retransmission protocols, and fault-tolerant routing.

Dedicated links for every node pair are prohibitively expensive. Sharing is essential. We'll study sharing point-to-point links, shared media, and multi-hop networks.

We'll cover sharing a medium (relevant to Ethernet, WiFi, cellular networks, and satellite networks), modulation/demodulation (transmitting over different frequencies), and medium access control (MAC) protocols (rules governing network node behavior). We'll explore time sharing (each node transmits for a dedicated time) and frequency sharing (dividing bandwidth). Then we'll move on to multi-hop networks, where many communications share links, orchestrated by switches.

Key questions include how multiple communications share the network, how messages traverse the network, and how to ensure reliable communication across a multi-hop network.

Sharing techniques and reliability mechanisms determine network efficiency. Efficiency can be framed as minimizing cost for given requirements or maximizing "useful work" (aggregate throughput, throughput variation, and average delay/latency) for a given network. This post focuses on throughput and latency.

Scalability

Scalability (designing networks that handle large sizes) is important. This post touches upon it briefly, leaving detailed discussion for later lessons.

Outline and Plan

The lesson is divided into four parts: the source, and the abstractions of bits, signals, and packets, studied in that order.

The source: We begin with the basics of information, entropy, source coding (data compression), Huffman codes, and the Lempel-Ziv-Welch algorithm.
Bits: We address overcoming bit errors with error-correcting codes: linear block codes and convolutional codes.
Signals: We cover modulation/demodulation, modeling signal distortions with linear time-invariant (LTI) channels, time/frequency domain representations, and frequency response of channels/filters.
Packets: We study medium sharing with MAC protocols, routing in multi-hop networks, and reliable data transport protocols.

Information, Entropy, and the Motivation for Source Codes

Claude Shannon's theory of information (developed in the late 1940s) is a groundbreaking idea that has transformed many technological fields, especially communication systems and networks. This chapter introduces the intuition behind information, defines it mathematically, and links it to entropy, a property of data sources.

These concepts enable efficient data compression before communication or storage, allowing for recovery of the original data without distortion. A core idea is source coding, which maps each symbol from a data source to a codeword with desirable properties. A message is a sequence of symbols. Our focus is lossless source coding, where the original message can be perfectly recovered from an uncorrupted transmission.

Information and Entropy

Shannon, building on Hartley's work, realized that information can be defined generally, independent of the application or message semantics. Communication involves a sender (S) choosing and sending one of several possible messages to a receiver (R). For example, S could indicate the British arrival route:

"1" if by land
"2" if by sea

If each choice is equally likely (no prior knowledge), the information conveyed is 1 bit. This bit can encode the choice. 1000 such independent events can be encoded with 1000 bits.

If prior knowledge suggests a higher probability for one choice (e.g., land due to a storm), then the less likely message (sea) conveys more information. Similarly, a temperature of 75°F in Boston in January is more informative than 32°F.

Information about an event depends on its probability (p). Lower probability (less likely event) implies higher information.

Definition of information

Hartley defined information (I) as:

I = log(1/p) = -log(p) (2.1)

The base-2 logarithm is used, and the unit of information is a bit. The logarithmic function ensures additivity: information from two independent events A and B (probabilities pA and pB) adds up:

IA + IB = log(1/pA) + log(1/pB) = log(1/(pA*pB)) = log(1/P(A and B))

Examples

Information Content of Decimal Digit Events

Entropy

Entropy (H) quantifies the expected information from a set of mutually exclusive events. If event i has probability pi:

H(p1, p2, ... pN) = Σ pi * log(1/pi) (2.2)

Solving Process of the Equation above

Entropy is measured in bits and represents the average uncertainty. For two mutually exclusive events with probabilities p and 1-p:

H(p, 1-p) = -plog(p) - (1-p)log(1-p) (2.3)

H(p) is symmetric around p = 1/2, with a maximum of 1 bit at p = 1/2. H(0) = H(1) = 0. Entropy is always non-negative and H(p1, p2, ... pN) ≤ log N.

Source Codes

Source coding efficiently encodes messages. Many messages have standard encodings (ASCII, image pixels, audio samples). These are fixed-length encodings, easily manipulated.

However, these encodings can be inefficient. In English text, 'e' occurs more frequently than 'x'. Encoding 'e' with fewer bits and 'x' with more bits can shorten the average message length. This aligns with the information concept: frequent symbols (higher pi) convey less information and need fewer bits.

A code maps information to bit sequences. A codeword is a bit sequence in the code. Source codes aim to compress data by matching encoded message length to the information content (entropy).

Example: encoding 1000 grades (A, B, C, D) with probabilities:

Fixed-length encoding: 2 bits/grade (total 2000 bits). Decoding is simple, but inefficient.

Variable-length encoding (example): A=10, B=0, C=110, D=111. Length depends on the message. Decoding requires sequential processing. This example code is not optimal.

How Much Compression Is Possible?

Ideally, compression uses only the necessary bits to represent the information. Entropy (Equation 2.2) provides the lower bound on the average number of bits needed to avoid ambiguity. Sending fewer bits leads to unresolved choices at the receiver.

In the grades example, the expected information per grade is 1.626 bits. Encoding 1000 grades requires 1626 bits on average. The example variable-length code uses 1667 bits, so it's not optimal. Encoding sequences of grades can improve compression.

Finding good codes is challenging. Sometimes, context-specific codes can be very efficient (e.g., encoding Shakespeare sonnets using just 8 bits if both sender and receiver know all the sonnets).

Why Compression?

Compression offers several advantages:

Faster transmission: Shorter messages reduce transmission time and free up network capacity.
Efficient resource use: Smaller messages conserve network resources (bandwidth, buffers), accommodating more communications.
Improved throughput over error-prone links: Compression before error-correction coding allows for optimized redundancy for error resilience.

Compression is typically an end-to-end function (application layer), but can also be applied at the link layer if the data is compressible and contains redundancy. The next chapter covers Huffman Codes and Lempel-Ziv-Welch (LZW) compression.

Compression Algorithms: Huffman and Lempel-Ziv-Welch (LZW)

This chapter discusses two source coding algorithms for message compression (where a message is a sequence of symbols): Huffman coding and Lempel-Ziv-Welch (LZW). Huffman coding is efficient when symbol probabilities are known and independent. LZW is adaptive, requiring no prior knowledge of probabilities. Both are widely used (GIF, JPEG, MPEG, MP3, etc.).

Properties of Good Source Codes

A code maps symbols from an alphabet (text, pixel intensities, or abstract symbols) to codewords. Binary codewords are convenient for many communication channels.

Example: Encoding grades in 6.02: A=1, B=01, C=000, D=001. A sequence of grades could be transmitted as 0010001110100001, decoded as "DCAAABCB".

Instantaneous codes: A symbol is decoded as soon as its codeword is received. The grade encoding above is instantaneous. Non-instantaneous codes require looking ahead or decoding from the end, making them harder to decode.

Code trees and prefix-free codes: A code tree visualizes codes. In a binary code tree, edges are labeled with 0 or 1. The path from the root to a symbol gives its encoding. Prefix-free codes have symbols only at leaf nodes, ensuring no codeword is a prefix of another. Prefix-free codes are instantaneous.

Expected code length (L): For N symbols with probabilities pi and codeword lengths li: L = Σ pi li. Codes with small L are desirable for compression. An optimal code* has minimum L. Shannon showed that L ≥ H (entropy), and codes achieving entropy asymptotically exist.

Huffman Codes

Huffman codes provide efficient encoding given symbol probabilities. More likely symbols get shorter codes.

Huffman's algorithm builds the code tree bottom-up, starting with the least probable symbols:

Input: Set S of (probability, symbol) tuples.
Combine the two least probable symbols into a new tuple (combined symbol, sum of probabilities). Add the new tuple to S.
Repeat step 2 until S has only one tuple (the root).

The resulting code tree defines the variable-length code.

Properties of Huffman Codes

Non-uniqueness: Multiple optimal codes (and trees) may exist due to arbitrary tie-breaking during tree construction.
Optimality: Huffman codes have minimal expected length among instantaneous codes for independent symbols drawn from a fixed, known probability distribution.

LZW: An Adaptive Variable-length Source Code

Simple Huffman coding based on letter probabilities has limitations. Adaptive encoding, which adjusts to the message content, can perform better. LZW is a popular adaptive algorithm.

LZW builds a string table mapping symbol sequences to/from N-bit indices. The table (2^N entries) is initialized with single-byte sequences (0-255). Encoding:

Accumulate bytes while the sequence (S) is in the table.
When S + next byte (b) is not in the table:
- Transmit the code for S.
- Add S + b to the table.
- Reset S to b.
Repeat until all bytes are processed; then transmit the code for the final S.

The decoder rebuilds the table as it receives codes, enabling it to recover the original message.

LZW observations:

Greedy: Finds longest matches.
Adaptive: Table entries reflect actual message sequences.
Suboptimal: May include unused entries.
Variable compression: Compression increases as the table fills.
Table reinitialization: The table is reset when full, adapting to changing probabilities. Some variants use a CLEAR code for explicit resets.

Why Digital? Communication Abstractions and Digital Signaling

This chapter explains analog and digital communication, focusing on the problems with analog and the rationale for digital. It presents methods for sending and receiving digital data over analog communication links (necessary because physical links are fundamentally analog at the lowest level). It also introduces a layered communication model: messages → packets → bits → signals, which forms the basis for the rest of the book.

Sources of Data

Communication technologies enable users (human or application) to exchange messages. Data sources can be inherently digital (e.g., computer-generated data) or analog (e.g., video, audio, sensor data). Modern systems often digitize all data, regardless of the source.

Why Digital?

Digital communication excels for two reasons:

Modularity: The digital abstraction enables building large systems by composing modules.
Sophisticated Processing: It allows using algorithms to enhance data quality and system performance.

However, physical links are analog, requiring digital-to-analog and analog-to-digital conversions.

Why analog is natural in many applications

Analog representations map well to physical link properties. Examples:

Black-and-white TV: Image luminance (shade of gray) is represented by voltage levels (0V = black, 1V = white).
Analog telephone: Sound waves are converted to electrical signals.

Analog signals can be sent at different voltage/intensity/wavelength levels, easily measured by the receiver.

So why not analog?

No communication link is error-free. Noise and distortions perturb signals, and these errors accumulate across multiple transmission stages (cascading effect). This makes analog systems difficult to build reliably, especially complex ones. Digital signaling addresses this problem.

Digital Signaling: Mapping Bits to Signals

Digital signaling uses discrete values to represent bits, enabling robust distinction from noise. Binary signaling uses two voltages: V0 for "0" and V1 for "1". V0 and V1 must be sufficiently separated to handle noise.

The receiver uses a threshold voltage (Vth = (V0 + V1) / 2) to map received voltages to bits (voltage < Vth → "0", voltage ≥ Vth → "1"). Precisely measuring near Vth is difficult, but not crucial if V0 and V1 are far apart.

Signals in this lesson

Transmission signals are fixed-voltage waveforms held for a specific time. Continuous signals are represented by discrete-time samples. The sample rate (samples per second) is agreed upon by sender and receiver. Sample interval is the time between samples.

Clocking Transmissions

Sender and receiver must agree on a clock rate. Bits are sent on clock transitions. Samples_per_bit is the number of samples per bit. The receiver infers clock edges from transitions in received samples.

Challenges:

Clock synchronization: Sender and receiver clocks may differ. Solution: Adaptive timing at the receiver based on detected transitions.
Ensuring frequent transitions: Needed for clock recovery. Solution: Line coding (e.g., 8b/10b).

Clock and Data Recovery

Receiver clock may be slightly faster or slower than the sender's. The receiver dynamically adjusts its sampling index based on transitions:

If the halfway sample between current and previous points is the same as the current sample (sampling too late): Increment the index by samples_per_bit - 1.
If the halfway sample is different (sampling too early): Increment by samples_per_bit + 1.
If no transition: Increment by samples_per_bit.

Initial correction is aided by a training sequence of alternating 0s and 1s at the start of transmission.

Line Coding with 8b/10b

8b/10b addresses DC balance and frequent transitions. It maps 8-bit symbols to 10-bit transmission symbols, ensuring:

Maximum run of 0s or 1s is 5 bits.
Maximum difference between 1s and 0s count at any sample is 6.
Special 7-bit sequences enable packet boundary detection.

Encoding process:

Packet data is divided into bytes.
Each byte is mapped to a 10-bit symbol.
Packets are framed with a training sequence and a SYNC pattern for synchronization.

Communication Abstractions

A communication system involves several modules: Mapper (bits to signals), Demapper (signals to bits), Channel coding (error correction), Channel decoding. Messages are broken into packets and transmitted over multiple links. The three key abstractions are packets, bits, and signals. The book focuses on these and their interactions within the communication network.

Coping with Bit Errors using Error Correction Codes

This chapter discusses techniques for reliable digital communication, focusing on adding redundancy to combat inevitable bit errors in communication channels and storage media. The core concept is channel coding: encoding at the sender and decoding at the receiver to correct errors or detect uncorrectable errors. The chapter focuses on error correction codes, specifically linear block codes and (later) convolutional codes.

Bit Errors and BSC

The Binary Symmetric Channel (BSC) model characterizes bit errors with a single parameter, ε (bit-flip probability), where a transmitted bit flips with probability ε, independently of other bits. ε can be estimated empirically. Packet error probability (PER) for a packet of size S bits:

PER = 1 - (1 - ε)^S (5.1)

When ε << 1: PER ≈ Sε (5.2)

Real-world channels may exhibit burst errors where error probability depends on history (higher if recent bits were also in error).

The Simplest Code: Repetition

A repetition code encodes bit b as n copies of b. The code rate is 1/n. Maximum likelihood decoding selects the most likely message given the received codeword. For a BSC, this means choosing the codeword with the most bits in common with the received one.

Decoding error probability for repetition code (see Equation 5.3 in the original text). The probability decreases exponentially with the code rate but is inefficient due to high overhead.

Embeddings and Hamming Distance

Hamming distance (HD) between two n-bit words is the number of differing bit positions. For single error correction (SEC), HD between any two valid codewords must be at least 3. A code with minimum Hamming distance D can detect D-1 errors and correct floor(D/2 -1) errors.

Linear Block Codes and Parity Calculations

Linear block codes (n, k) map k-bit messages to n-bit codewords using linear functions (weighted sums) over message bits. Algebraic block codes perform such operations within the block. (n,k,d) codes denote block codes with a minimum Hamming distance 'd'. Code rate = k/n.

Linear codes require that the sum of any two codewords is also a codeword. The all-zeros codeword exists in any linear code. The minimum Hamming distance of a linear block code equals the smallest non-zero codeword's weight (number of 1s). Binary linear codes use modulo-2 arithmetic (Galois Field F2).

Rectangular Parity SEC Code

Parity is the modulo-2 sum of bits. Even parity code adds a parity bit to each message, making the codeword have even parity. This detects single errors (HD=2).

The rectangular parity code arranges k bits into an r x c array and adds row and column parities. Codeword: message + row parities + column parities. Length: n = rc + r + c. This code is an SEC code (HD=3). Decoding involves checking row and column parities and correcting the corresponding bit if both parities indicate an error.

How many parity bits are needed in an SEC code?

Any linear code can be converted to a systematic code (message bits followed by parity bits). For SEC, the number of parity combinations (2^(n-k)) must be greater than or equal to the number of correctable situations (n+1):

n + 1 ≤ 2^(n-k) (5.6)

Parity bit count grows at least logarithmically with message bits.

Hamming Codes

Hamming codes are efficient SEC codes with logarithmic parity bit growth. Each parity bit protects multiple data bits, and single errors produce unique parity error combinations.

Syndrome bits (Ei) are computed at the receiver by checking parities. The combination of syndrome bits indicates the erroneous bit (if any).

Is There a Logic to the Hamming Code Construction?

Hamming code construction:

Assign parity bits to indices that are powers of 2 (1, 2, 4, 8,...).
Assign data bits to remaining indices.
Data bit di is included in the parity computation for pj if and only if index(pj) contributes to index(di) in binary representation (bitwise AND is non-zero).

The syndrome bits (E3E2E1 in the (7,4) example) treated as a binary number give the index of the bit to correct.

Note: This is just the information, one needs for Web Development. For Sysops, the networks and their fundamentals are a two-semester course.