Web latency is a solved problem, yet it is still regularly under-optimised because of the complexity of managing caches, rendering frameworks and dev teams. Thinking about theoretically optimal websites can help reset our understanding of what is possible and remind us just how fast websites can be. To keep things simple we can ignore the baggage of rendering frameworks and external API calls and consider a single HTML file being delivered to the client (this is not overly simplistic - you can probably fit the majority of a modern interface shell into a single HTML file).
The fastest loading experience possible is an HTML file that is already cached locally in the browser, so no network request is required at all. This is not possible for the first page load, but it is possible for subsequent loads and can be achieved with cache headers on the website assets (be that a single HTML file or a combination of HTML/JS/CSS). There is complexity in choosing a cache expiry that matches the acceptable staleness of the website: you wouldn't want your interface cached forever, since you'll likely make updates to its style and form, but an hour or so is probably fine.
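As a minimal sketch of what those cache headers look like, the handler below serves a single HTML shell with a one-hour `max-age` (the shell markup and the one-hour window are illustrative assumptions, not a recommendation):

```typescript
// Illustrative app shell; in practice this would be your real HTML file.
const shell = "<!doctype html><title>App</title><h1>App shell</h1>";

// Serve the shell with browser cache headers. While the response is fresh
// (within max-age), repeat visits are answered from the browser's local
// cache with no network request at all.
export function serveShell(): Response {
  return new Response(shell, {
    headers: {
      "Content-Type": "text/html; charset=utf-8",
      // public: any cache may store it; max-age=3600: fresh for one hour.
      "Cache-Control": "public, max-age=3600",
    },
  });
}
```

The `max-age` value is the knob that trades latency against staleness: a longer value means more zero-network loads but a longer wait before visitors see updates.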
The second fastest loading experience, and the fastest possible first page load, is a single HTML file that is cached in a CDN and delivered by a server located near the visitor. You can expect page loads in under 300ms this way, and often below 50ms. There is again some complexity in managing cache expiry, but with a modern CDN (e.g. Cloudflare) it's possible to purge the CDN cache on demand, which means you could cycle the file before expiry if you needed to (consider that this is not possible with the browser cache, since you cannot make a purge request to a browser). The main complexity with this architecture is the requirement to pre-generate your website files upfront and seed the CDN. This was easier in the days of handwritten HTML websites and single-page apps (one static bundle delivered to the client) but has become more complicated in the modern era of SSR, where we expect to generate dynamic assets per-page and per-user at runtime. Still, it is possible to pre-generate static pages, have these cached in a CDN upfront, and expire/purge them when content changes.
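To make the on-demand purge concrete, here is a sketch of building a request against Cloudflare's cache-purge endpoint. The zone ID, API token and URL below are placeholders; the endpoint path and `files` body shape follow Cloudflare's purge API, but treat the details as an assumption to verify against their docs:

```typescript
// Build (but don't send) a Cloudflare cache-purge request for specific URLs.
// zoneId and apiToken are placeholders for your own credentials.
export function buildPurgeRequest(
  zoneId: string,
  apiToken: string,
  urls: string[],
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // Purge named files; Cloudflare also accepts { purge_everything: true }.
      body: JSON.stringify({ files: urls }),
    },
  };
}

// Usage (not executed here):
//   const req = buildPurgeRequest(zoneId, token, ["https://example.com/index.html"]);
//   await fetch(req.url, req.init);
```

Calling this after a deploy gives you CDN-cached latency with browser-cache-style control over staleness, which is exactly the lever the browser cache lacks.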
The third most optimal page load - the one most website frameworks use today - is an HTML file served by an edge server (e.g. Cloudflare Workers, Vercel). In this case the file is not cached in the CDN but is generated at runtime and served by a machine located near the visitor. Modern edge networks are globally distributed and have a similar reach to CDNs, meaning latency can be very small even for a dynamic server-side rendered website. Latency is introduced by the runtime of your application which generates the HTML file (e.g. Next, Remix, Node) and by the time taken for your cloud provider to provision your runtime, which could be negligible if your website has been served recently in that location (warm start) or 100ms+ if a new instance needs to be provisioned (cold start). Given that your users are distributed across the globe and traffic is lumpy, you are relying on your cloud provider to predict traffic, auto-scale and manage cold starts (similar considerations apply to a vanilla CDN).
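The shape of this architecture can be sketched as a Cloudflare Worker-style fetch handler, where the HTML is rendered per request rather than pulled from a cache. `renderPage` here is a hypothetical stand-in for whatever your framework's server renderer does:

```typescript
// Hypothetical renderer standing in for a framework's SSR (Next, Remix, etc.).
function renderPage(path: string): string {
  return `<!doctype html><title>${path}</title><h1>Rendered at the edge: ${path}</h1>`;
}

// Worker-style handler: every request generates fresh HTML at runtime on a
// nearby edge machine. Latency is dominated by render time plus any
// cold-start provisioning, not by geographic distance.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    return new Response(renderPage(pathname), {
      headers: { "Content-Type": "text/html; charset=utf-8" },
    });
  },
};

export default worker;
```

Note that nothing stops you from combining this with the earlier approaches: the handler could emit cache headers, or sit behind the CDN cache, so that only uncached paths pay the render cost.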
In practice, the vast majority of latency on the web comes from code and design upstream of network latency: request waterfalls (poor prioritisation of network calls), janky rendering, hydration bailouts and more. The point of this article is to free ourselves of excuses and remind ourselves of the fastest theoretical page load so we can work towards it.
George