A Content Delivery Network (CDN) is made up of networked facilities, called Points of Presence (POPs), that form an intelligent mesh across a geographic area (regional, national, or worldwide, depending on the provider) to provide enhanced delivery of “web content” over HTTP/HTTPS. The strength of a CDN is its ability to push the “Edge” closer to users, using the physical scale of its POPs and its private network to distribute content and load efficiently. This helps to optimize and secure a website’s “Origin”.
Modern web development takes full advantage of the ability of modern web browsers (Edge, Chrome, Safari, Firefox, etc.) to participate in the dynamic nature of an application. JavaScript (JS) running in the user’s browser creates rich interactions, Cascading Style Sheets (CSS) provide consistent and elegant presentation and style, and a variety of other supporting files (images, fonts, etc.) round out the page. All of these are dependent web resources that must be downloaded with each web page, which means that each page load can trigger hundreds of requests for supporting files. To make it more complex, the need for this content to be available across large geographic areas (regionally, nationally, or worldwide) puts an emphasis on how far away a user is from the Origin.
This is where leveraging a CDN becomes important. In general, the basic benefits of a CDN can be categorized as either user-related or origin-related. For all the examples, the user will be in Tokyo, Japan, and the website’s “Origin” will be located in Azure’s South Central US region in the state of Texas.
User-Related benefits:
- Faster user experience - By “caching” some of the resources that a website requires, a CDN uses its POPs to create a distributed copy, or Cache, of those resources. This means that the user’s request for the web page (which has 40 different JS/CSS/image files) will go to Texas, but the 40 dependent requests will go to the nearest POP, which will most likely be in Tokyo, Japan.
- More consistent user experience no matter the location of the user - By using the CDN’s ability to “proxy” the initial request, the user’s request will be directed to the nearest POP and then follow an optimized network path to the Origin. The stronger the CDN’s backend network is, the more consistent the user’s experience is, whether the user is 500 feet or 5,000 miles from the website’s Origin.
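To get a feel for the caching benefit in the Tokyo-to-Texas scenario, here is a rough back-of-the-envelope sketch. The round-trip times are assumptions chosen for illustration, not measurements, and the calculation simply adds up round trips; it ignores parallel connections, bandwidth, and TLS setup.

```python
# Illustrative round-trip times in milliseconds. These values are
# assumptions picked for the example, not measured numbers.
RTT_TOKYO_TO_TEXAS_MS = 150
RTT_TOKYO_TO_POP_MS = 5

DEPENDENT_REQUESTS = 40  # JS/CSS/image files referenced by the page


def total_round_trip_ms(page_rtt_ms: int, asset_rtt_ms: int, assets: int) -> int:
    """Add one round trip for the page plus one for each dependent file.

    This is a rough comparison of the two setups, not a prediction of
    real page load time.
    """
    return page_rtt_ms + asset_rtt_ms * assets


# Without a CDN, the page and all 40 dependent files come from Texas.
without_cdn = total_round_trip_ms(
    RTT_TOKYO_TO_TEXAS_MS, RTT_TOKYO_TO_TEXAS_MS, DEPENDENT_REQUESTS
)

# With POP caching, only the page request travels to Texas.
with_pop_cache = total_round_trip_ms(
    RTT_TOKYO_TO_TEXAS_MS, RTT_TOKYO_TO_POP_MS, DEPENDENT_REQUESTS
)

print(f"Without CDN:      {without_cdn} ms of round trips")
print(f"With POP caching: {with_pop_cache} ms of round trips")
```

Under these assumed numbers the summed round-trip time drops from 6150 ms to 350 ms. Real browsers fetch files in parallel, so actual load times will differ, but the direction of the effect is the same.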
Origin-Related benefits:
- Connection Management to Origin - Instead of every user establishing and managing a discrete connection to the Origin, the “proxy” manages a pool of persistent connections to the Origin. The CDN’s proxy then acts as a buffer for all users requesting web resources. The Origin sees much longer-lived connections and doesn’t need to bear the overhead of continually creating and destroying connections.
- SSL Termination at the Edge - One of the most expensive operations when hitting a website is establishing a connection and completing the entire Transport Layer Security (TLS) handshake. By moving this to the Edge, the 9-part back-and-forth of the handshake happens between the user and the Edge, the point closest to the user. Note: A CDN’s connection to the Origin should also be over HTTPS, which requires a handshake, but the load is greatly reduced since this connection is persistent in nature.
- Network Security can be enforced at the Edge - For different forms of attack, such as Denial of Service (DoS), including Distributed Denial of Service (DDoS), and other malicious activity, a layered strategy of Firewalls (Layer 3 or Layer 4) and Web Application Firewalls (WAF - Layer 7) is employed for mitigation. The CDN itself will have its own layer of protection that mitigates massive DoS attacks to ensure the stability of the network in general. This protection is focused on safeguarding the CDN’s network as a whole (which is massive), not just a particular website. What counts as an attack is very different in scale for an entire CDN than for a single website. That is where the CDN’s WAF offering comes in: it provides per-website mitigation.
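The layered strategy of a Layer 3/4 Firewall plus a Layer 7 WAF can be pictured as two checks that see different amounts of each request. The sketch below is a simplified illustration, not how any particular CDN implements it. The `HttpRequest` type, the rule sets, and the function names are all hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class HttpRequest:
    """Hypothetical, minimal view of a request arriving at the Edge."""
    source_ip: str
    method: str
    path: str


# Hypothetical rule data for illustration only.
BLOCKED_NETWORKS = [ip_network("203.0.113.0/24")]  # reserved documentation range
BLOCKED_ROUTES = {("GET", "/admin/export")}


def firewall_allows(source_ip: str) -> bool:
    """Layer 3 style check: only the packet's source address is considered."""
    addr = ip_address(source_ip)
    return not any(addr in network for network in BLOCKED_NETWORKS)


def waf_allows(request: HttpRequest) -> bool:
    """Layer 7 style check: the HTTP method and URL path are visible too."""
    return (request.method, request.path) not in BLOCKED_ROUTES


def edge_allows(request: HttpRequest) -> bool:
    """Apply the cheaper address check first, then the per-request WAF check."""
    return firewall_allows(request.source_ip) and waf_allows(request)


print(edge_allows(HttpRequest("198.51.100.7", "GET", "/index.html")))    # True
print(edge_allows(HttpRequest("203.0.113.9", "GET", "/index.html")))     # False
print(edge_allows(HttpRequest("198.51.100.7", "GET", "/admin/export")))  # False
```

The second request is rejected because of where it comes from, while the third is rejected because of what it asks for, which is the kind of rule only a Layer 7 check can express.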
Definition of terms used above:
- Web Content refers to a resource that can be accessed via a URI with an HTTP or HTTPS scheme. This can be almost any kind of file; in web terms it is generally JavaScript (JS), Cascading Style Sheets (CSS), images (JPEG, PNG, GIF, etc.), or HTML, whether stored as a file or created dynamically by a website. Another popular use of Web Content is software distribution (full installs or updates).
- Edge refers to the closest ingress network location to a user that a website has control over. Without a CDN, the Edge is the first firewall/router/server a user’s request hits when making an HTTP request.
- Origin refers to the accessible location where the content is hosted. This is a location that accepts HTTP or HTTPS requests; it could be a firewall, a router (Layer 3 or Layer 7), a load balancer, a web server, an Azure Storage Account, etc.
- Caching means keeping redundant copies of the same resources. A CDN generally employs a layered caching strategy to efficiently replicate resources to each individual POP. Each POP has its own cache of a resource, generally populated after the first user request for that resource hits the specific POP. For example, a company’s logo gets updated (call it FabrikamLogo.png). There are three general cases to consider:
  - First Request from anywhere
    - User’s request hits the POP cache; the logo doesn’t exist there
    - User’s request hits the “Origin Shield”; the logo doesn’t exist there either
    - User’s request hits the Origin, which populates the Origin Shield cache and the POP’s cache
    - User gets the new Company Logo
  - First Request made to a different POP
    - User’s request hits the POP cache; the logo doesn’t exist there
    - User’s request hits the “Origin Shield”; the logo exists there and populates the POP’s cache
    - User gets the new Company Logo
  - Second Request made to a POP
    - User’s request hits the POP cache; the logo exists there
    - User gets the new Company Logo
- Proxy, and more specifically a Layer 7 Transparent Proxy, is where the CDN presents itself as the website and acts as the intermediary between the user and the Origin. A normal user web request to an Origin goes over the public internet, which involves a series of routing hand-offs.
  - Here is one example of how a user’s request can reach the Origin over the public internet (detail article):
    1. User’s device
    2. Local ISP (Internet Service Provider) near user
    3. Regional ISP near user
    4. Network Service Provider (NSP) near user
    5. The Network Access Point (NAP) or Metropolitan Area Exchange (MAE) closest to user
    6. Continues to hop from NAP/MAE to NAP/MAE until it reaches the location closest to the Origin (think city to city on a highway system). This can be several hops
    7. Network Service Provider (NSP) near Origin
    8. Regional ISP near Origin
    9. Local ISP (Internet Service Provider) near Origin
    10. Origin
  - Here is an example of a user accessing the Origin using a Proxy within a CDN:
    1. User’s device
    2. Local ISP (Internet Service Provider) near user
    3. Regional ISP near user
    4. Network Service Provider (NSP) near user
    5. CDN’s POP
    6. Direct route over the optimal path calculated by the CDN’s POP
       - If the website is using Azure CDN (Microsoft CDN Provider or Azure Front Door) and the Origin is in Azure, the next hop is your Origin.
       - If either the website is not in Azure or the website uses a different CDN Provider, then the most optimal egress point is calculated and the reverse of #1 thru #4 is traversed.
- Origin Shield refers to a centralized cache a CDN creates for an Origin. Instead of each POP (there could be thousands, depending on the CDN Provider) hitting the Origin for each resource, the POPs go through this cache, which acts as a “shield” for the Origin.
- Firewall (Layer 3 or Layer 4) refers to a capability (either a physical network appliance, a virtual network appliance, or software) that can interrogate the individual packets of data that route through it. Layer 3 Firewalls can decide to allow or block individual packets based on the header of each packet (IP Address, Port). Layer 4 Firewalls add the capability of inspecting the session made up of the series of packets in a connection between a client and a server, trending and understanding the communication patterns rather than just individual packets.
- Web Application Firewall (Layer 7) refers to a capability (either a physical network appliance or a virtual network appliance) that can analyze and mitigate based on the payload of the one or more packets that constitute a specific HTTP/HTTPS request. So instead of just being able to say “don’t allow requests from IP Address X”, it can say “don’t allow GET requests to URL Y”. This is particularly powerful because complicated attacks often involve specific HTTP request patterns (either through the Query String or posted bodies) that are indicative of a particular web application platform.
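The three caching cases described under Caching, together with the Origin Shield definition, can be sketched as a layered lookup. This is a minimal in-memory simulation for illustration only. The class names, the second POP in Osaka, and the `served_by` labels are hypothetical, and real CDN caches also deal with expiry, eviction, and invalidation, which are left out here.

```python
class Origin:
    """Stand-in for the website's Origin; counts how often it is hit."""

    def __init__(self, files: dict) -> None:
        self.files = files
        self.hits = 0

    def fetch(self, name: str) -> bytes:
        self.hits += 1
        return self.files[name]


class OriginShield:
    """Centralized cache that sits between every POP and the Origin."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache = {}

    def fetch(self, name: str) -> bytes:
        if name not in self.cache:
            # Miss at the shield: go to the Origin and keep a copy.
            self.cache[name] = self.origin.fetch(name)
        return self.cache[name]


class Pop:
    """A single POP with its own local cache."""

    def __init__(self, city: str, shield: OriginShield) -> None:
        self.city = city
        self.shield = shield
        self.cache = {}

    def fetch(self, name: str):
        """Return (content, layer that already had the content)."""
        if name in self.cache:
            return self.cache[name], "POP"
        shield_had_it = name in self.shield.cache
        content = self.shield.fetch(name)
        self.cache[name] = content  # populate this POP's cache
        served_by = "Origin Shield" if shield_had_it else "Origin"
        return content, served_by


origin = Origin({"FabrikamLogo.png": b"new-logo-bytes"})
shield = OriginShield(origin)
tokyo = Pop("Tokyo", shield)
osaka = Pop("Osaka", shield)  # hypothetical second POP

served_by = [
    tokyo.fetch("FabrikamLogo.png")[1],  # first request from anywhere
    osaka.fetch("FabrikamLogo.png")[1],  # first request at another POP
    tokyo.fetch("FabrikamLogo.png")[1],  # second request at the same POP
]
print(served_by)    # ['Origin', 'Origin Shield', 'POP']
print(origin.hits)  # 1
```

In this sketch the Origin is contacted only once, no matter how many POPs later ask for the logo, which is the shielding effect described above.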