HTTP


Overview

HTTP: Hypertext Transfer Protocol
HTTP is an application-layer protocol that works in a request-response manner: a client sends a request and a server returns a response. Its elements (methods, status codes and headers) all influence how well a website or application performs. By default, HTTP uses port 80 and HTTPS (HTTP Secure) uses port 443.

Basic HTTP transaction:
Diagram 9: Basic HTTP transaction:

Source: https://www.webnots.com
HTTP Transactions consist of 6 flows:

  • DNS Lookup
  • Connect
  • Send
  • Wait
  • Load
  • Close

Source: http://blog.catchpoint.com/
Below are short descriptions of the different versions, status codes and methods available. This information is followed by a complete list of request and response headers along with best practice for each.
Breakdown of an HTTP request

Example HTTP URL
http://www.domain.com/test/folder/image.jpg?v=1.0
HTTP Request: Example
Protocol: http:// or https://
Domain: www.domain.com
URI (note this is not the URL): test/folder/image.jpg
Query String: ?v=1.0
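
The breakdown above can be reproduced with Python's standard library. This is a minimal sketch: the function name break_down_url and the dictionary keys are illustrative choices, not part of any standard. Note that urlsplit returns the path with its leading slash and the query string without the leading "?".

```python
from urllib.parse import urlsplit


def break_down_url(url: str) -> dict:
    """Split a URL into the parts listed in the breakdown above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # "http" or "https"
        "domain": parts.hostname,  # host name without any port
        "path": parts.path,        # the part the breakdown calls the URI
        "query": parts.query,      # query string without the leading "?"
    }


if __name__ == "__main__":
    print(break_down_url("http://www.domain.com/test/folder/image.jpg?v=1.0"))
```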

Version

The two versions of the protocol covered here are HTTP/1.1 and HTTP/2. HTTP/1.1 has served most internet traffic for over 15 years; more recently, HTTP/2 has arrived as its successor. The table below notes the benefits of moving to the newer version. Most major CDN vendors and browsers now support HTTP/2.

Feature: Benefits of HTTP/2
Multiplexing: With HTTP/1.1, each connection handles one request at a time and any further requests queue behind it. HTTP/2 allows multiple requests and responses to be in flight in parallel over the same connection, which speeds up the delivery of content.
Single connection to server: A single TCP connection is opened to the server and is kept alive for as long as the website is open. There is no need for several TCP connections while browsing the same site.
Server push: Similar to pre-heating the CDN with content from the origin server, the server can push content to the end user (browser) for future use.
Prioritisation: As it sounds, HTTP/2 allows content to be loaded by priority.
HPACK: Header compression to reduce overhead.

Status Codes

Overview
Status codes are 3-digit codes that are split into 5 categories. A status code is sent from the server to the end user (browser) in the response to every request. The current standard for these codes was defined in HTTP/1.1 and (at the time of writing) there are no changes in HTTP/2. The five categories are shown below with best practice use (where required). The list contains only the most commonly seen status codes. For a complete list, please follow the link below: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

10x – Information

Code: Description and Best Practice (where applicable to applications/CDN)
100 Continue
The server has received the request headers and the client can now send the request body. This is generally only used for methods that carry a body, such as POST.
101 Switching Protocols
The client has asked to switch protocols and the server has agreed.

20x – Success

Code: Description and Best Practice
200 OK
Standard response for successful HTTP requests. Remember that applications and CDN configurations can manipulate response codes. It is very important not to send 200 OK responses for failed requests. For example, do not configure an application or the CDN to send 200 OK responses for 404 pages. This is bad practice for analytics, and it also affects how proxies and clients (browsers) cache or otherwise handle the responses.
203 Non-Authoritative Information
Specific to a web proxy (such as a CDN): the proxy received a 200 OK from the origin server but is sending a modified version of that response to the end user.
206 Partial Content
The server is delivering only part of the resource because the client sent a range request. This is commonly used for large files and is very useful for resuming large downloads after a connection is broken.
For example, consider a 1 GB file requested in ranges of 100 MB. If the connection breaks after the first 5 parts have been received, the client can request only the remaining ranges and does not need to start again. With proper use of range requests, a 206 is returned for every range, including the last one; a 200 OK in reply to a range request means the server ignored the range and is sending the whole file.
Note that range requests (often called chunking in CDN settings) must be enabled on both the CDN configuration and the origin server. Also note that the CDN may have a default list of file types for chunking; specific file types may require customised rules.
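
As a rough illustration of the range requests behind a 206 response, the sketch below builds the Range header values a client could send to fetch a file in pieces, and the value it could send to resume after a broken connection. The helper names byte_ranges and resume_header are hypothetical; only the "bytes=start-end" syntax comes from HTTP itself.

```python
def byte_ranges(total_size: int, chunk_size: int) -> list:
    """Return the Range header values needed to fetch a file in pieces."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        # Range ends are inclusive, so the last byte of a piece is start + size - 1.
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


def resume_header(bytes_received: int) -> str:
    """Range value asking for everything after the bytes already received."""
    return f"bytes={bytes_received}-"


if __name__ == "__main__":
    for value in byte_ranges(1000, 300):
        print("Range:", value)
    print("Range:", resume_header(500))
```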

30x – Redirections

CodeDescription and Best Practice
301 Moved Permanently
The requested URL has been permanently moved to another URL, which is given in the Location header. This request and all further requests for the original URL will be redirected.
Given proper use of this response code, it makes perfect sense to enable caching so that browsers and CDN servers can answer with the redirect themselves.
Example first request:

  • End User -> CDN -> Origin sends 301 redirect -> CDN -> End User
  • CDN caches the redirect

Example second request:

  • End User -> CDN -> End User
  • CDN responds with cached 301 redirect
302 Found
Although the name of this response code is “Found”, it is more easily described as a temporary redirect. It works in the same way as a 301, but the response should not be cached.
Example request:

  • End User requests an HTTP URL
  • The CDN or origin server accepts only HTTPS requests, so a response redirecting the same URL to HTTPS is sent back to the end user.
  • There is no need to cache this response, as a cached redirect may cause redirect loops for requests that are already HTTPS.
304 Not Modified
Indicates that there is no need to retrieve the requested resource again because the copy already held in cache is still valid. This is usually based on the request header If-Modified-Since or If-None-Match.

40x – Client Errors

Code: Description and Best Practice
400 Bad Request
The server cannot or will not process the request due to a client-side error.
401 Unauthorized
Specifically used for authentication. The request either did not include credentials or included credentials that failed. A very common case that results in a 401 is Basic Authentication, usually a username and password check that can be set up very easily on the origin or the CDN.
403 Forbidden
The request is valid, but the server has blocked the end user from accessing the site. When setting up IP blocks on the CDN, this is the status code most often used for rejected or blocked requests.
404 Not Found
The request is accepted by the server, but the object or resource does not exist. This is very common with specific, long URLs that can easily change over time.
Example: this link may exist today; however, it contains a version number in the query string that may change at any time, making this URL obsolete.
www.domain.com/test/hello.jpg?v=1.0
Many website/application developers create custom 404 pages for end users.

405 Method Not Allowed
States that the requested HTTP method is not supported or allowed for this resource. For example, if a GET request is used to submit information in a form, it may be rejected because a POST request is expected.

50x – Server Errors

Code: Description and Best Practice
500 Internal Server Error
A general message returned by the server when an unexpected issue occurs that does not fit any more specific category.
501 Not Implemented
Indicates that the server does not support the requested HTTP method or does not have the ability to complete the request.
502 Bad Gateway
Returned when a server acting as a proxy or gateway (e.g. a CDN) receives an invalid response from the upstream server. This is mostly seen when troubleshooting and watching the hops between servers over HTTP (not TCP).
503 Service Unavailable
The server is unavailable. This is a classic code used when a CDN server cannot get a response from, or establish a TCP connection to, the origin server. This could be for a number of reasons:

  • The origin is down
  • The origin is overloaded
  • The request timed out (CDNs always have a timeout set on connections and responses with the origin)

Note that the general error message seen in a browser for 503 responses via a CDN is:
“Error to Origin”. Again, developers often create custom pages for these responses.
Example: IF Status Code == 503; THEN “show custom error page”

Request Headers

Host

Explanation: The domain name of the server and, if used, the port number required.
Best Practice: When a CDN is in play, it is important to configure the Host header on the CDN as required by the origin. In most cases the CDN domain differs from the origin domain, so the initial request to the CDN carries the CDN domain in the Host header. This value should not be used when the CDN connects to the origin (unless the origin is configured to accept that specific value).
Example: CDN domain: cdn.domain.com
Origin domain: origin.domain.com
End User -> CDN – Host: cdn.domain.com
CDN -> Origin – Host: origin.domain.com
References
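
The Host rewrite in the example above might look like the following sketch of what a CDN does before contacting the origin. The function name headers_for_origin is hypothetical, and forwarding the original value in X-Forwarded-Host is a common convention assumed here, not something the text above requires.

```python
def headers_for_origin(client_headers: dict, origin_host: str) -> dict:
    """Copy the client's headers, replacing Host with the origin's domain."""
    forwarded = dict(client_headers)  # leave the caller's dictionary untouched
    original_host = forwarded.get("Host")
    forwarded["Host"] = origin_host
    if original_host is not None:
        # Assumption: keep the CDN-facing host for logging on the origin.
        forwarded["X-Forwarded-Host"] = original_host
    return forwarded


if __name__ == "__main__":
    incoming = {"Host": "cdn.domain.com", "Accept": "*/*"}
    print(headers_for_origin(incoming, "origin.domain.com"))
```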

User-Agent

Explanation: A string of information that allows network protocol peers to identify the application type, software vendor, operating system and software version of the client.
Best Practice: As this request header is set automatically by the browser, the best practice options for developers concern how their application responds with different content depending on the value of the header. It is important to note if and how this can be achieved in the CDN, saving the request from having to go to the origin.
Example: A common use is to read the User-Agent to determine the type of device, so that the appropriate version of a website or application can be served.

  • Request from desktop browser -> Respond with Desktop site
  • Request from mobile device -> Respond with Mobile version

Although the URLs for the above 2 sites might be the same, a CDN can cache both as separate instances based on the User-Agent.

References: User-Agent List:
http://www.useragentstring.com/pages/useragentstring.php
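
A minimal sketch of caching the desktop and mobile versions separately, as described in the example above. The marker list and the cache key layout are assumptions for illustration only; real CDNs use their own, far more thorough device detection.

```python
# Assumption: a few substrings that commonly appear in mobile User-Agent strings.
MOBILE_MARKERS = ("Mobile", "Android", "iPhone", "iPad")


def device_class(user_agent: str) -> str:
    """Classify a User-Agent string as "mobile" or "desktop"."""
    return "mobile" if any(m in user_agent for m in MOBILE_MARKERS) else "desktop"


def cache_key(url: str, user_agent: str) -> str:
    """Build a cache key that keeps one cached copy per device class."""
    return f"{device_class(user_agent)}|{url}"


if __name__ == "__main__":
    phone = "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) Mobile/15E148"
    laptop = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    print(cache_key("http://www.domain.com/", phone))
    print(cache_key("http://www.domain.com/", laptop))
```

Keying on the device class rather than the full User-Agent string keeps the number of cached variants small.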

Method

Explanation: Request methods are not strictly request headers but are still a major part of each request. The available methods (most common at the top of the list) are:

  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
  • TRACE
  • OPTIONS
  • CONNECT
  • PATCH
Best Practice: CDNs generally allow only GET and HEAD by default. If you require POST or other methods, these need to be enabled or explicitly defined in the CDN configuration. There is no specific best practice except the correct use of each method; however, it is important to understand how your CDN vendor caches requests and responses based on different methods.
References: A complete list of methods with explanations:
https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods

Referer

Explanation: Contains the URL of the previous web page from which the current request was made. Note that if the previous page was HTTPS and the current page is HTTP, a browser may not send the Referer header. In most cases, this information is used for analytics, logging or optimising caching.
Best Practice: Used in certain cases to route/proxy requests (CSS, JS etc.) to the same origin as an HTML page.
References

Origin

Explanation: This is similar to the Referer header but states only the scheme, host name and port of the site from which the request originated, without any path.
Best Practice: Directly linked to the CORS headers that are explained later in the document.
References

Accept-Encoding

Explanation: A set of values that are related to content encoding. These are usually compression-based values that are sent by the client (browser) to the server. Although an encoding may be advertised and supported by both the client and the server, this does not guarantee that compression is applied, for example if the content is already compressed and/or the server is overloaded. A commonly quoted rule of thumb is that servers will not compress content on the fly once they reach around 80% CPU usage.
Best Practice: Note that this header is sent by the client (browser); therefore, there is no need to actively enforce it unless the client does not send the header for specific content types that you deem compressible. It is imperative to understand that compression (if required) needs to be enabled in the CDN configuration and possibly customised for specific content or for “the amount of compression”. These are generally basic settings on the CDN but can be customised. The idea is to have the CDN request the content from the origin and receive a compressed version that can be cached. CDN servers can decompress content on the fly when required.

  • Enable compression on the CDN and check default usage.
  • Enable compression on the origin that matches the compression type on the CDN.
  • A “Content-Encoding” header in the response (for example, Content-Encoding: gzip) confirms compression.
References: Full encoding tokens: https://en.wikipedia.org/wiki/HTTP_compression
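
The negotiation described above can be sketched as follows: the server compresses only when the client advertises gzip in Accept-Encoding, and confirms it with Content-Encoding. The function names are hypothetical, and the parsing is deliberately simplified (it ignores the "*" wildcard and every encoding other than gzip).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value allows gzip (q > 0)."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower().replace(" ", "")
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0  # q=0 means "not acceptable"
            except ValueError:
                return False
        return True
    return False


def encode_body(body: bytes, accept_encoding: str):
    """Return (body, extra response headers), compressing when allowed."""
    if accepts_gzip(accept_encoding):
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


if __name__ == "__main__":
    payload = b"hello " * 100
    encoded, headers = encode_body(payload, "gzip, deflate")
    print(len(payload), "->", len(encoded), headers)
```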

Transfer-Encoding

Explanation: A set of values that are valid on a hop-by-hop basis between servers. It denotes the form of encoding used to transfer an object or entity to the next node. Similar to Accept-Encoding, there are several options but the most commonly used is “chunked”. With chunked encoding, the body is sent as a series of chunks, each prefixed with its size, instead of one block of known length; this lets a server start sending a response before it knows the total size. Note that this is separate from the range-based chunking of large files described under status code 206. Resuming a failed transfer from where it stopped and caching a large file in pieces on the CDN (very useful when delivering VoD, Video on Demand) rely on range requests, where every partial response carries the status code 206.
Best Practice:
  • Transfer-Encoding: chunked
  • Do not send a Content-Length header together with Transfer-Encoding: chunked; the chunk sizes take its place. Range-based chunking on a CDN, by contrast, generally needs the origin to report the size of the object.
  • Check that chunking is enabled (and if required customised) on the CDN. This is usually a checkbox and is generally enabled by default.
References: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
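
A minimal sketch of how a body is framed on the wire under Transfer-Encoding: chunked in HTTP/1.1. Each chunk is preceded by its length in hexadecimal, and a zero-length chunk marks the end. The function name encode_chunked and the tiny default chunk size are illustrative assumptions.

```python
def encode_chunked(body: bytes, chunk_size: int = 4) -> bytes:
    """Frame a body as HTTP/1.1 chunked transfer encoding."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Size line in hexadecimal, then the chunk data, each ended by CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk terminates the body
    return bytes(out)


if __name__ == "__main__":
    print(encode_chunked(b"Wikipedia"))
```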

Accept-Language

Explanation: This header is sent by the client (browser) by default and is very rarely amended by the end user. By default, the browser sends the preferred language and locale of the end user. The header is passed (by default) from the CDN to the origin, where the website/application may or may not be configured to read the values and make use of them. In general, the syntax is a 2 or 3 letter language code, optionally followed by a “-” (dash) and a 2 letter locale. For example: Accept-Language: en-US
Best Practice: Example: If a CDN has cached versions of a homepage in different language and country variations, there is no need for the request to be sent to the origin. The CDN can be customised to read the Accept-Language header and provide the suitable version for that end user. It is more common to use the following structure when creating sites in different languages/locales: www.domain.com/<language-locale>/
www.domain.com/en-US/
These redirects can easily be set up or cached on the CDN. The suggestion that the first request should always hit the origin is an older theory that applied before CDN configurations could handle these varied requests successfully. Best practice is to push as many processes as possible to the CDN so that requests do not need to be sent to the origin.
References: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language
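
The sketch below shows one way an edge rule could map Accept-Language onto the /<language-locale>/ structure described above. The function names, the exact-match rule and the en-US default are assumptions; a production rule would probably also fall back from a locale such as de-DE to the bare language de.

```python
def preferred_languages(accept_language: str) -> list:
    """Return language tags ordered by their q weight, highest first."""
    weighted = []
    for item in accept_language.split(","):
        tag, _, param = item.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        quality = 1.0  # a tag without a q value has the default weight 1
        param = param.strip()
        if param.startswith("q="):
            try:
                quality = float(param[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag))
    # sorted() is stable, so tags with equal weight keep the client's order.
    ordered = sorted(weighted, key=lambda pair: pair[0], reverse=True)
    return [tag for quality, tag in ordered if quality > 0]


def localized_path(accept_language: str, available: list, default: str = "en-US") -> str:
    """Pick the first preferred tag that has a cached site version."""
    lookup = {tag.lower(): tag for tag in available}
    for tag in preferred_languages(accept_language):
        if tag.lower() in lookup:
            return f"/{lookup[tag.lower()]}/"
    return f"/{default}/"


if __name__ == "__main__":
    print(localized_path("fr-FR, en-US;q=0.8", ["en-US", "fr-FR"]))
```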

Cookie

Explanation: The Cookie header contains previously stored cookies (set by the Set-Cookie response header, explained later).
Best Practice:
  • Reasonable TTL
  • Path restriction and/or domain restriction
References

If-Modified-Since

Explanation: Makes the request conditional. The server responds with 200 OK and the requested object only if the object has been modified after the date given in the If-Modified-Since header; otherwise it responds with 304 Not Modified and no body.
Best Practice: The header is ignored if used in conjunction with If-None-Match and can only be used with GET and HEAD requests. The most common use of If-Modified-Since is to keep cache entries up to date where no ETag header is being used.
Syntax example:
If-Modified-Since: Wed, 21 Oct 2015 07:28:00 GMT
Take note that the day of the week and the month are case-sensitive and the time zone is always GMT (not local).
References
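
A minimal sketch of how a server might evaluate If-Modified-Since, using the standard library to parse the HTTP date. The function name conditional_status is hypothetical, and last_modified is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since: str, last_modified: datetime) -> int:
    """Return 200 if the resource changed after the given date, else 304."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date is ignored and the resource is sent
    # HTTP dates have one-second resolution, so drop the microseconds.
    changed = last_modified.replace(microsecond=0) > since
    return 200 if changed else 304


if __name__ == "__main__":
    modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
    print(conditional_status("Wed, 21 Oct 2015 07:28:00 GMT", modified))
```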

Cache-Control

Explanation: Note that Cache-Control can be used as both a request and a response header. However, the header is unidirectional, which means that a directive sent on the request does not need to be matched by what is sent on the response.
Cache-Control carries a varied set of directives and values that control how a resource is cached and validated. In this case, it is used as a request header sent from a browser to the server. Cache-Control does not often come up as a request header (or is often overlooked).
Best Practice: Cache-Control: no-cache
The most popular use of this header in a request is the value “no-cache”, which tells proxies to revalidate content with the origin regardless of whether the cached copy is still fresh (i.e. it has not expired or reached its max-age).
References: Cache-Control contains many values and uses. Please refer to the following for a full list of usage.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control

Response Headers

Cache-Control

Explanation: Note that Cache-Control can be used as both a request and a response header. However, the header is unidirectional, which means that a directive sent on the request does not need to be matched by what is sent on the response.

Cache-Control is mainly used as a response header and is now the main header used to control how, and for how long, a resource or object is cached.

There are many values but the most common are:

  • public – the response can be cached by any cache.
  • private – the response is intended for a single user and is not to be stored by a shared cache (or proxy), only by a private cache (browser).
  • max-age – the amount of time (in seconds) that a resource or object is considered fresh.
  • no-cache – the resource or object must be validated against the origin server before being served.
  • no-store – caches should not store any request or response details.
Best Practice: Each application should spend time analysing the life cycle of all of its assets.
Example
References: Cache-Control contains many values and uses. Please refer to the following for a full list of usage.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
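
To make the directives above concrete, here is a simplified sketch of how a shared cache such as a CDN might read a Cache-Control response header. The function names are hypothetical, and treating a missing max-age as "do not serve from cache" is an assumption made for brevity; real caches apply further rules.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        name, _, argument = item.strip().partition("=")
        if name:
            directives[name.lower()] = argument.strip('"') or None
    return directives


def shared_cache_ttl(value: str) -> int:
    """Seconds a shared cache may serve the response without revalidating."""
    directives = parse_cache_control(value)
    # private and no-store forbid shared storage; no-cache forces validation.
    if any(name in directives for name in ("no-store", "private", "no-cache")):
        return 0
    max_age = directives.get("max-age")
    return int(max_age) if max_age and max_age.isdigit() else 0


if __name__ == "__main__":
    print(shared_cache_ttl("public, max-age=25754"))
    print(shared_cache_ttl("private, max-age=600"))
```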

Date

Explanation: This contains the date and time at which the message was sent. It is the responsibility of the server that replies to the request to include the Date header; the date must be as accurate as possible and follow the HTTP-date format:
Date = “Date” “:” HTTP-date
Best Practice: The main points of best practice with the Date header are as follows:

  • A CDN or cache server should not cache a response that does not contain the Date header unless it revalidates on every request.
  • Given that, every CDN vendor should use NTP to synchronise their clocks.
References

Set-Cookie

Explanation: Please note that this header is used to send cookie information from the origin server to the user agent (not the end user as such). The same end user could make a request for the same content on a laptop or a smart device, and the responses could, and most likely will, differ per device.
The syntax is as follows, where the attributes after the cookie value are optional:
Set-Cookie: <cookie-name>=<cookie-value>; <attributes>
Best Practice: Only in rare cases does the CDN look at this header.
References: For a complete list and usage of this header, please read the following: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie
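
As a sketch of the cookie best practice noted earlier (a reasonable TTL plus path and/or domain restriction), the following builds a Set-Cookie value with Python's standard http.cookies module. The function name build_set_cookie and the example values are hypothetical.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name: str, value: str, max_age: int,
                     path: str = "/", domain: str = "") -> str:
    """Return the value of a Set-Cookie header with a TTL and restrictions."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age  # TTL in seconds
    cookie[name]["path"] = path        # restrict the cookie to this path
    if domain:
        cookie[name]["domain"] = domain
    return cookie[name].OutputString()


if __name__ == "__main__":
    header = build_set_cookie("session", "abc123", 3600, "/app", "www.domain.com")
    print("Set-Cookie:", header)
```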

Content-Encoding

Explanation: Used in conjunction with the request header Accept-Encoding. The header describes the type of compression applied so that the client (browser) understands how to decompress the resource or object.

Some of the most common uses are:

  • Content-Encoding: gzip
  • Content-Encoding: compress
  • Content-Encoding: deflate
Best Practice: It is highly recommended to compress resources wherever possible. However, be careful about which resources you apply compression to, as they may already be compressed (for example, most image and video formats). Compressing an object that is already compressed may not decrease its size but will most likely increase the load time (as compression is handled on the fly).
References

Last-Modified

Explanation: The date and time at which the origin server believes the object or resource was last modified. Very useful for validating whether a cached object or resource is still the same as the one on the server.
Best Practice: Used in conjunction with the request headers If-Modified-Since and If-Unmodified-Since, so best practice is to provide accurate dates and times. However, some would argue that best practice is to use the ETag header instead (explained later).
References

Content-Type

Explanation: The client can read this header to determine the type of content of a specific object or resource. However, many browsers have MIME sniffing enabled, which infers this information from the content itself, so they may ignore the header.
Best Practice: Although browsers may enable MIME sniffing, application developers should always set the Content-Type header correctly, as no assumptions can be made about the client. More importantly, a CDN most likely does not MIME sniff; proper use of this header is therefore very important, especially when a CDN makes decisions about compression, caching and chunking.

Example:
Many CDNs will not cache HTML by default, and they will not compress content types that are not worth compressing.
Take note of the format below and check the reference for all different MIME types.
HTTP/1.1 200 OK
Server: Apache/2.4.12 (Red Hat)
ETag: "3ff8f-56154c7ef0b29"
X-Frame-Options: SAMEORIGIN
Last-Modified: Wed, 27 Dec 2017 17:13:29 GMT
Accept-Ranges: bytes
Content-Type: text/javascript
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: public, max-age=25754
Expires: Tue, 23 Jan 2018 20:46:12 GMT
Date: Tue, 23 Jan 2018 13:36:58 GMT
Content-Length: 20
Connection: keep-alive

References: MIME types: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types

CORS Headers – Access-Control-Allow-Origin

Explanation: Cross-Origin Resource Sharing (CORS) is a way for the client (browser) to access a resource or object that is hosted on a different origin (domain) from the page requesting it. There are many CORS headers; however, the most common is Access-Control-Allow-Origin.
This header is sent by the server hosting the resource and indicates whether the response can be shared with requesting code from the given origin.
The two main uses are:
Access-Control-Allow-Origin: *
Wildcard that allows access from all origins
Access-Control-Allow-Origin: <origin>
Access from a specific domain is allowed
Best Practice: Generally, CDNs will automatically pass what are considered “safe” headers between the origin and the end user without them having to be specified.
Many CDN vendors, however, require CORS headers to be strictly defined in each configuration.

Best practice is to avoid wildcard (*) values, as these defeat the purpose of using the header; stick with specific domains for proper use. This also prevents other applications or websites from hijacking resources. Fonts are a very common example of objects that are hijacked and loaded on other websites.

References: CORS overview and all CORS headers: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
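
A minimal sketch of the "specific domains instead of a wildcard" recommendation: the server echoes the request's Origin back only when it is on an allowlist. The allowlist contents and the function name cors_headers are hypothetical. Adding Vary: Origin is an extra step assumed here so that caches keep the per-origin responses apart.

```python
# Assumption: the set of sites that are allowed to load these resources.
ALLOWED_ORIGINS = {"https://www.domain.com", "https://app.domain.com"}


def cors_headers(request_origin: str) -> dict:
    """Return the CORS response headers for a request's Origin value."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Vary": "Origin",  # cached copies differ per requesting origin
        }
    return {}  # no header at all: the browser blocks the cross-origin read


if __name__ == "__main__":
    print(cors_headers("https://www.domain.com"))
    print(cors_headers("https://other-site.example"))
```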

Location

Explanation: Simple but important. This header tells the browser which URL to redirect a particular request to. Note that it is only valid for redirections (generally with a 30x response code) and not for rewrites.
Best Practice: The main point to consider here is the correct use of status code. Should the redirect be a 301 or a 302? There are other options, but these are the main two. As discussed before (in status codes), it is important to consider which redirects should be permanent and which temporary.
References

ETag

Explanation: A specific identifier for a version of a resource or object. It offers a more efficient way to determine whether a cache holds the latest version. To save bandwidth, a cache server can send a request to the origin that includes the ETag of the object it currently holds. If the ETags match, there is no need to send the object again.
Best Practice: Many CDNs support ETag but do not actively promote its use. It is very easy to use properly: for example, ensure a new ETag value is generated for each new version of the object sent.
References: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
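
The revalidation flow described above can be sketched as follows. Deriving the ETag from a hash of the body is only one possible scheme (servers are free to generate ETags however they like), and the function names are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content, so new content gets a new tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def revalidate(if_none_match, body: bytes):
    """Return (status, headers, body) for a request carrying If-None-Match."""
    etag = make_etag(body)
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
    if "*" in candidates or etag in candidates:
        return 304, {"ETag": etag}, b""  # cached copy is current: no body
    return 200, {"ETag": etag}, body


if __name__ == "__main__":
    content = b"hello"
    print(revalidate(make_etag(content), content))
    print(revalidate('"stale"', content))
```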

Vary

Explanation: A very sensitive response header that lists the request headers a cache must match to determine whether a cached resource can be served or a fresh copy should be fetched.
The format could be:

  • Vary: *
  • Vary: <header>, <header>

The use of a wildcard (*) indicates that a new copy needs to be provided for every request.

Listing a header asks the cache to compare the value of that request header on the incoming request with the value stored with the cached copy.

For example, it is possible to cache different versions of a resource based on User-Agent. This is very common with websites/applications that have desktop and mobile versions.

Best Practice: The recommendation is not to use the Vary header but to use meaningful Cache-Control headers and Content Variation instead. Vary headers can very easily be interpreted incorrectly and, more often than not, are used incorrectly.

Cache-Control used correctly can clearly mark objects that should not be cached.

Content Variation (supported by most CDN vendors), automatically caches different versions of objects based on specified headers.

References
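
A simplified sketch of how a cache could apply the Vary rules above: the request headers named in Vary become part of the cache key, and a wildcard means the response is never reused. The function name vary_cache_key and the tuple layout of the key are assumptions for illustration.

```python
def vary_cache_key(url: str, vary: str, request_headers: dict):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    if "*" in names:
        return None  # Vary: * means a stored response can never be reused
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {key.lower(): val for key, val in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in sorted(names))


if __name__ == "__main__":
    print(vary_cache_key("/home", "User-Agent", {"User-Agent": "A"}))
    print(vary_cache_key("/home", "*", {}))
```

Two requests share a cached copy only when their keys are equal, which is why varying on a header with many distinct values, such as the full User-Agent string, fragments the cache.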

3rd Party Objects

Explanation: Objects that are not loaded through your domain. For example, Google features, fonts from specific vendors etc.
Best Practice: A common mistake when looking at analytics for websites and applications is to overlook 3rd party objects. Consider running tests against your specific domain and not only against web page URLs, which also load 3rd party objects. Slow or restricted 3rd party domains can and will affect the load times of your website.
Consider that 3rd party domains may not use a CDN to accelerate their content. Do not fill your website or application with 3rd party objects; doing so is detrimental to performance and security.
Note: Some countries restrict major domains. For example, Google is blocked in China.
References