HTTP


Overview

HTTP: Hypertext Transfer Protocol
HTTP is an application-layer protocol that works in a request-response manner and contains many elements that are crucial to building a fast, reliable website or application. By default, HTTP uses port 80 and HTTPS (HTTP Secure) uses port 443.

Basic HTTP transaction:
Diagram 9: Basic HTTP transaction:

Source: https://www.webnots.com
An HTTP transaction consists of six phases:

  • DNS Lookup
  • Connect
  • Send
  • Wait
  • Load
  • Close

Source: http://blog.catchpoint.com/
Below are short descriptions of the different versions, status codes and methods available. This information is followed by a complete list of request and response headers along with best practice for each.
Breakdown of an HTTP request

Example HTTP URL
http://www.domain.com/test/folder/image.jpg?v=1.0
HTTP Request Example
Protocol http:// or https://
Domain www.domain.com
Path (often called the URI in CDN configurations; note this is not the full URL) /test/folder/image.jpg
Query String ?v=1.0
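
The breakdown above can be reproduced with a short sketch using Python's standard urllib.parse module. The function name break_down and the dictionary keys are illustrative choices, not part of any CDN tooling.

```python
from urllib.parse import urlsplit, parse_qs


def break_down(url):
    """Split a URL into the parts listed in the breakdown above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,        # http or https
        "domain": parts.hostname,        # www.domain.com
        "path": parts.path,              # /test/folder/image.jpg
        "query_string": parts.query,     # v=1.0 (without the leading "?")
        "query_params": parse_qs(parts.query),
    }


parts = break_down("http://www.domain.com/test/folder/image.jpg?v=1.0")
for name, value in parts.items():
    print(f"{name}: {value}")
```

Note that the parser reports the path with its leading slash and the query string without the leading question mark.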

Version

Two versions of the protocol are covered here: HTTP/1.1 and HTTP/2. HTTP/1.1 has served most internet traffic for over 15 years; more recently, as with all protocols, a newer version has arrived in the form of HTTP/2. The table below notes the benefits of moving to the newer version. Most major CDN vendors and browsers now support HTTP/2.

Feature Benefits of HTTP/2
Multiplexing In practice, HTTP/1.1 handles one request at a time per connection (pipelining is rarely enabled), so further requests queue behind it unless the browser opens more connections. HTTP/2 allows multiple requests and responses in parallel over the same connection, speeding up the delivery of content.
Single connection to server A single TCP connection is opened to the server and is kept alive for as long as the website is open. There is no need for several TCP connections while browsing the same site.
Server pushing Similar to pre-heating the CDN with content from the origin server, you can now push content to the end user (browser) for future use.
Prioritisation As it sounds, HTTP/2 allows for content to be loaded by priority.
HPACK Header Compression to reduce overheads

Status Codes

Overview
Status codes are 3-digit codes that are split into 5 categories. These codes are sent from a server to an end user (browser) in the response to every request. The current standard for these codes was defined in HTTP/1.1 and there are (at the time of writing) no changes moving towards HTTP/2. The five categories are shown below with best practice use (where required). The list contains only the most commonly seen status codes. For a complete list, please follow the link below: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

10x – Information

Code Description and Best Practice (where applicable to applications/CDN)
100 Continue
The server has received the request headers and is now ready for the request body. This is generally only used for specific methods such as POST.
101 Switching Protocols
The client has asked to switch protocols (using the Upgrade request header) and the server has agreed.

20x – Success

Code Description and Best Practice
200 OK
Standard response to all successful HTTP requests. Remember that applications and CDN configurations can manipulate response codes. It is very important not to send 200 OK responses for failed requests. For example, do not configure an application or the CDN to send 200 OK responses for 404 pages. This is bad practice not only for analytics but also because it affects how proxies and clients (browsers) cache or manipulate the responses.
203 Non-Authoritative Information
Specific to a web proxy (such as a CDN), when a 200 OK is received from the origin server, the web proxy sends a modified version to the end user.
206 Partial Content
The server is delivering only part of the resource in response to a range request (a Range request header) sent by the client. This is commonly used for large files and is very useful for resuming large downloads after a connection breaks. For example, a client can fetch a 1 GB file in ranges of 100 MB. If the connection breaks after the first 5 parts of the file have been received, the client can request only the remaining ranges and does not need to start again. With proper use, a 206 is received for every range, including the last one; a 200 OK in reply to a range request means the server ignored the Range header and is sending the whole resource. Note that, on the CDN configuration and the origin server, support for range requests (chunking) must be enabled to accept these requests. Also note that the CDN may have a default list of file types for chunking. Specific file types may require customised rules.
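
As a minimal sketch of how a client might split a download into range requests and resume after a failure: the helper names byte_ranges and remaining_ranges are hypothetical, and the sizes are deliberately small for illustration. The only HTTP detail relied on is that the end position in a Range header is inclusive.

```python
def byte_ranges(total_size, part_size):
    """Return the Range header values needed to fetch a file in parts."""
    ranges = []
    for start in range(0, total_size, part_size):
        # The end position in a Range header is inclusive.
        end = min(start + part_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


def remaining_ranges(total_size, part_size, parts_received):
    """Ranges still to be requested after a broken connection."""
    return byte_ranges(total_size, part_size)[parts_received:]


# A 1000-byte file fetched in 300-byte parts, resumed after two parts.
for value in remaining_ranges(1000, 300, parts_received=2):
    print("Range:", value)
```

Each of these requests would be expected to return a 206 with a matching Content-Range response header.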

30x – Redirections

Code Description and Best Practice
301 Moved Permanently
The requested URL has been moved permanently to another URL, which is given in the Location header. This request and all further requests for the original URL will be redirected. Given proper use of this response code, it makes perfect sense to enable caching so that browsers and CDN servers can answer with the redirect themselves.
Example first request:

  • End User -> CDN -> Origin sends 301 redirect -> CDN -> End User
  • CDN caches the redirect

Example second request:

  • End User -> CDN -> End User
  • CDN responds with cached 301 redirect
302 Found
Although the name of this response code is “Found”, it is more easily described as a temporary redirect. This works in the same way as a 301, but responses should generally not be cached.
Example request:

  • End User requests an HTTP URL
  • The CDN or origin server only accepts HTTPS requests, so a response redirecting the same URL to HTTPS is sent back to the end user.
  • There is no need to cache this response, as a cached redirect may cause redirect loops for requests that are already HTTPS.
304 Not Modified
Indicates that there is no need to retransmit the requested resource because the copy that is already cached is still fresh. This is usually the result of a conditional request carrying the If-Modified-Since or If-None-Match request header.

40x – Client Errors

Code Description and Best Practice
400 Bad Request
The server cannot or will not process the request due to a client-side error.
401 Unauthorized
Specifically used for authentication. This could be the result of authentication not being provided or of a failed attempt. A very common case that results in a 401 is Basic Authentication, usually a username and password operation that can be set up on the origin or the CDN very easily.
403 Forbidden
The request is valid, but the server has refused the end user access to the site. When setting up IP blocks on the CDN, this is the most used status code for any rejected/blocked user attempts.
404 Not Found

The request is accepted by the server, but the object or resource does not exist. This is very common with specific, long URLs that can easily change over time.
Example: this link may exist today, however, it contains a version number in the query string that may change at any time, therefore, making this URL obsolete.
www.domain.com/test/hello.jpg?v=1.0
Many website/application developers create custom 404 pages for end users.

405 Method Not Allowed
A basic status code that states the requested HTTP method is not supported or allowed. For example, if a GET request is used to send information in a form it may be rejected as this should be a POST request.

50x – Server Errors

Code Description and Best Practice
500 Internal Server Error
A general message provided by the server when an issue occurs that was unexpected and does not sit within any specific category.
501 Not Implemented
Indicates that the server does not recognise the requested HTTP method or does not have the ability to fulfil the request.
502 Bad Gateway
Returned when a server acting as a proxy or gateway (e.g. a CDN) receives an invalid response from the upstream server. This is mostly seen when troubleshooting and watching the hops between servers over HTTP (not TCP).
503 Service Unavailable
Server is unavailable. This is a classic code used when a CDN server cannot get a response or build a TCP connection to the origin server. This could be for a number of reasons:

  • The origin is down
  • The origin is overloaded
  • The request timed out (CDNs always have a timeout set on connections and responses with the origin; some report this case as 504 Gateway Timeout instead)

Note that the general error message seen in a browser for 503 responses via a CDN is:
“Error to Origin”
Again, developers create custom pages for these responses.
Example: IF Status Code == 503; THEN “show custom error page”
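
The five categories and the custom error page rule can be sketched as follows. The page paths in CUSTOM_ERROR_PAGES and the function names are hypothetical. Note that the sketch returns the original status code alongside the custom page, in line with the earlier advice not to send 200 OK for failed requests.

```python
CATEGORIES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

# Hypothetical custom pages; a real deployment would define its own.
CUSTOM_ERROR_PAGES = {
    404: "/errors/not-found.html",
    503: "/errors/unavailable.html",
}


def category(status_code):
    """Map a 3-digit status code to one of the five categories."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not a valid HTTP status code: {status_code}")
    return CATEGORIES[status_code // 100]


def respond(status_code):
    """Return (status code, custom page or None); the code is never rewritten."""
    return status_code, CUSTOM_ERROR_PAGES.get(status_code)


print(category(503), respond(503))
```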

Request Headers

Host

Explanation The domain name of the server and if used, the port number required.
Best Practice When a CDN is in play, it is important to configure the Host header on the CDN as required by the origin. In most cases the CDN domain differs from the origin domain, so the initial request to the CDN carries the CDN domain in the Host header. This value should not be used when the CDN connects to the origin (unless the origin is configured to accept that specific value).
Example CDN domain: cdn.domain.com
Origin domain: origin.domain.com
End User -> CDN – Host: cdn.domain.com
CDN -> Origin – Host: origin.domain.com
References
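
A minimal sketch of the Host rewrite described above, assuming headers are held in a plain dictionary. The function name origin_request_headers is hypothetical; real CDNs expose this as a configuration setting rather than code.

```python
def origin_request_headers(client_headers, origin_host):
    """Copy the end user's headers, replacing Host with the origin's value."""
    # Header names are case-insensitive, so drop any spelling of "Host".
    headers = {k: v for k, v in client_headers.items() if k.lower() != "host"}
    headers["Host"] = origin_host
    return headers


end_user_request = {"host": "cdn.domain.com", "Accept": "*/*"}
print(origin_request_headers(end_user_request, "origin.domain.com"))
```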

User-Agent

Explanation A string of information that allows network protocol peers to identify application type, software vendor, operating system and software version.
Best Practice As this request header is set automatically by the browser, best practice for developers concerns how their application responds with different content depending on the value of the header. It is important to establish whether and how this can be done on the CDN, saving the request from having to go to the origin.
Example A common use is to read the User-Agent to determine the type of device, so that a suitable version of the site can be served.

  • Request from desktop browser -> Respond with Desktop site
  • Request from mobile device -> Respond with Mobile version

Although the URLs for the above two sites might be the same, a CDN can cache both as separate instances based on the User-Agent.

References User-Agent List:
http://www.useragentstring.com/pages/useragentstring.php
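
The device decision above could be sketched as below. Looking for the substring "Mobi" is a common heuristic rather than a rule of the protocol, and the function names are hypothetical; CDN vendors typically ship their own device detection.

```python
SITE_VERSIONS = {"desktop": "Desktop site", "mobile": "Mobile version"}


def device_class(user_agent):
    """Rough device detection; assumes mobile browsers include 'Mobi'."""
    return "mobile" if "mobi" in user_agent.lower() else "desktop"


def select_version(user_agent):
    """Pick which version of the site to respond with."""
    return SITE_VERSIONS[device_class(user_agent)]


phone = ("Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
         "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 "
         "Mobile/15E148 Safari/604.1")
print(select_version(phone))
```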

Method

Explanation Request methods are not strictly request headers but are still a major part of each request. The available methods (most common at the top of the list) are:

  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
  • TRACE
  • OPTIONS
  • CONNECT
  • PATCH
Best Practice CDNs are generally closed to HTTP methods other than GET and HEAD. If you require POST or other methods, these need to be enabled or explicitly defined in the CDN configuration. There is no specific best practice except the correct use of each method; however, it is important to understand how your CDN vendor caches requests and responses based on different methods.
References A complete list of methods with explanations:
https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods

Referer

Explanation Contains the URL of the previous web page from which the currently requested page was reached. Note that if the previous page was HTTPS and the current page is HTTP, a browser may not send the Referer header. In most cases, this information is used for analytics, logging or optimising caching.
Best Practice Used in certain cases to route/proxy requests (CSS, JS etc.) to the same origin as an HTML page.
References

Origin

Explanation This is similar to the Referer header but states only the origin (scheme, host name and port) from which the request originated, without the path.
Best Practice Directly linked to CORS headers that are explained later in the document.
References

Accept-Encoding

Explanation A set of values that relate to content encoding. These are usually compression-based values sent by the client (browser) to tell the server which encodings it can accept. Although an encoding may be advertised and supported by both the client and the server, compression is not guaranteed; for example, the content may already be compressed or the server may be overloaded. As a rule of thumb, some servers stop compressing content on the fly once CPU usage is high (a figure of around 80% is often quoted).
Best Practice Note that this header is sent by the client (browser); therefore, there is no need to actively enforce it unless the client does not send the header for specific content types that you deem compressible. It is imperative to understand that compression (if required) needs to be enabled on the CDN configuration and possibly customised for specific content or for the level of compression. These are generally basic settings on the CDN but can be customised. The idea is to have the CDN request the content from the origin and receive a compressed version that can be cached. CDN servers can then decompress content on the fly for clients that do not accept the encoding.

  • Enable compression on the CDN and check default usage.
  • Enable compression on the origin that matches the compression type on the CDN.
  • “Content-Encoding: ” in the response confirms compression.
References Full encoding tokens: https://en.wikipedia.org/wiki/HTTP_compression
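
The negotiation described above can be sketched with Python's standard gzip module. The list of compressible content types and the function names are assumptions for illustration, and the parser is simplified (it ignores the "*" wildcard).

```python
import gzip

# Assumed list of types worth compressing; CDNs ship their own defaults.
COMPRESSIBLE_TYPES = {"text/html", "text/css", "text/javascript", "application/json"}


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value lists gzip with a non-zero weight."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def encode_body(body, content_type, accept_encoding):
    """Return (body, extra response headers), compressing only when sensible."""
    if content_type in COMPRESSIBLE_TYPES and accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


page = b"<p>hello</p>" * 500
encoded, headers = encode_body(page, "text/html", "gzip, deflate, br")
print(len(page), len(encoded), headers)
```

An image/jpeg body passes through untouched, since the format is already compressed.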

Transfer-Encoding

Explanation A set of values that are valid on a hop-by-hop basis between servers. It denotes the form of encoding used to transfer an object or entity to the next hop. Similar to Accept-Encoding, there are several options but the most commonly used is “chunked”. With chunked encoding, the body is sent as a series of chunks, each prefixed with its size, and a zero-length chunk marks the end. The main advantage is that the server can start sending before the total size is known, which is useful for dynamically generated or streamed content. Chunked encoding should not be confused with range requests: resuming a failed transfer and caching a large file in parts (very useful when delivering VoD, Video on Demand) rely on the Range request header and 206 Partial Content responses, not on Transfer-Encoding. Note that HTTP/2 does not use chunked encoding, as it has its own framing.
Best Practice
  • Transfer-Encoding: chunked
  • Do not send a Content-Length header together with Transfer-Encoding: chunked; the chunk sizes replace it.
  • Check that chunking is enabled (and, if required, customised) on the CDN. This is usually a checkbox and is generally enabled by default.
References https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
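
To make the wire format concrete, here is a small sketch of an encoder for the chunked transfer coding as used in HTTP/1.1: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-length chunk ends the body. The function name encode_chunked is illustrative; web servers and HTTP libraries normally do this for you.

```python
def encode_chunked(body, chunk_size):
    """Encode a body using the HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Size line in hexadecimal, then the chunk data, each ending in CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    # A zero-length chunk marks the end of the body.
    out += b"0\r\n\r\n"
    return bytes(out)


print(encode_chunked(b"Wikipedia", 4))
```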

Accept-Language

Explanation This header is sent by the client (browser) by default and is very rarely amended by the end user. With browser defaults, it carries the preferred language and locale of the end user. This header is passed (by default) from the CDN to the origin, where the website/application may or may not be configured to read the values present and make use of them. In general, the syntax used is a 2 or 3 letter language code, optionally followed by a “-” (dash) and a 2 letter locale. For example: Accept-Language: en-US
Best Practice Example: If a CDN has cached versions of a homepage in different language and country variations, there is no need for the request to be sent to the origin. The CDN can be customised to read the Accept-Language header and provide the suitable version for that end user. It is more common to use the following structure when creating sites in different languages/locales:
www.domain.com/<language-locale>/
www.domain.com/en-US/
These redirects can easily be set up or cached on the CDN. The suggestion that the first request should always hit the origin is an older theory that applied before CDN configurations could handle these varied requests successfully. Best practice would be to push as many processes as possible to the CDN, without the need for any requests to be sent to the origin.
References https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language
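
A sketch of how an edge rule might choose the /<language-locale>/ path from the header, assuming the site publishes a known list of locales. The function names, the supported list and the fallback are all hypothetical; the q-value handling follows the usual weighting syntax (for example "fr;q=0.9").

```python
def parse_accept_language(header):
    """Return language tags ordered by preference (highest q-value first)."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag, q))
    # sort() is stable, so equal weights keep their order in the header.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return [tag for tag, q in prefs if q > 0]


def pick_locale(header, supported, default):
    """Choose the best supported locale, falling back on the language alone."""
    by_lower = {s.lower(): s for s in supported}
    for tag in parse_accept_language(header):
        if tag.lower() in by_lower:
            return by_lower[tag.lower()]
        language = tag.split("-")[0].lower()
        for s in supported:
            if s.lower().split("-")[0] == language:
                return s
    return default


locale = pick_locale("fr-CH, fr;q=0.9, en;q=0.8", ["en-US", "de-DE"], "en-US")
print(f"Location: /{locale}/")
```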

Cookie

Explanation The cookie header contains previously stored cookies (from the Set-Cookie response header – explained later).
Best Practice
  • Reasonable TTL
  • Path restriction and/or domain restriction
References

If-Modified-Since

Explanation Makes the request conditional: the server responds with 200 OK and the object only if it was last modified after the date given in the If-Modified-Since header; otherwise it responds with 304 Not Modified and no body.
Best Practice The header is ignored when used in conjunction with If-None-Match, and it can only be used with GET and HEAD requests. The most common use of If-Modified-Since is to keep cache entries up to date where no ETag header is being used.
Syntax example:
If-Modified-Since: Wed, 21 Oct 2015 07:28:00 GMT
Take note that the day of the week and the month are case-sensitive and the time zone is always GMT (not local).
References
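
The server-side decision can be sketched with the standard library's HTTP date parser. The function name conditional_status is hypothetical, and the sketch assumes last_modified is a timezone-aware datetime; an unparseable header is simply ignored, which results in a normal 200 response.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the object is unchanged since the given date, else 200."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # invalid date: treat the request as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    return 200 if last_modified > since else 304


modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
print(conditional_status("Wed, 21 Oct 2015 07:28:00 GMT", modified))
```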

Cache-Control

Explanation Note that Cache-Control can be used as both a request and a response header; however, the header is unidirectional, which means a directive sent on the request does not need to be matched by what is sent on the response.
Cache-Control carries a varied set of directives that control how a resource is cached and validated. In this case, it is used as a request header sent from a browser to the server. Cache-Control does not often come up as a request header (or is often overlooked).
Best Practice Cache-Control: no-cache
The most popular use of this header in a request is the value “no-cache”, which tells proxies to revalidate content with the origin regardless of whether the cached copy is fresh (i.e. has not expired or reached its max-age).
References Cache-Control contains many values and uses. Please refer to the following for a full list of usage.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control

Response Headers

Cache-Control

Explanation Note that Cache-Control can be used as both a request and a response header; however, the header is unidirectional, which means a directive sent on the request does not need to be matched by what is sent on the response.

Cache-Control is mainly used as a response header and is now the main header used to control how and how long a resource or object is cached.

There are many values but the most common are:

  • public – responses can be cached by any cache
  • private – the cached response is only intended for a single user and is not to be cached by a shared cache (or proxy) but by a private cache only (browser).
  • max-age – the amount of time (in seconds) that a resource or object is considered fresh.
  • no-cache – the resource or object may be stored but must be validated against the origin server before being served.
  • no-store – caches should not store any request or response details.
Best Practice Each application should spend time analysing the life cycle of all its assets.
Example
References Cache-Control contains many values and uses. Please refer to the following for a full list of usage.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
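
As an illustration of how a shared cache (such as a CDN) might read the directives listed above, here is a simplified sketch. The function names are hypothetical, and the parser deliberately ignores quoted arguments that contain commas as well as directives such as s-maxage.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for item in value.split(","):
        name, sep, arg = item.strip().partition("=")
        name = name.strip().lower()
        if not name:
            continue
        directives[name] = arg.strip().strip('"') if sep else True
    return directives


def shared_cache_ttl(value):
    """Seconds a shared cache may serve the response without revalidating."""
    directives = parse_cache_control(value)
    if any(d in directives for d in ("no-store", "private", "no-cache")):
        return 0
    max_age = directives.get("max-age")
    if not isinstance(max_age, str):
        return 0
    try:
        return max(0, int(max_age))
    except ValueError:
        return 0


print(shared_cache_ttl("public, max-age=25754"))
```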

Date

Explanation This contains the date and time at which the message was sent. It is the responsibility of the server that replies to the request to include the Date header, and this date must be as accurate as possible and follow the HTTP-date format.
Date = "Date" ":" HTTP-date
Best Practice The main points of best practice with the Date header are as follows:

  • A CDN or cache server should not cache a response that does not contain the date header without revalidating on every request.
  • Given that, every CDN vendor should use NTP to synchronise their clocks.
References

Set-Cookie

Explanation Please note that this header is used to send cookie information from the origin server to the user-agent (not the end user as such). The same end user could make a request for the same content on a laptop or smart device, and the responses could, and most likely will, differ per device.
The syntax is as follows, where the attributes are optional:
Set-Cookie: <cookie-name>=<cookie-value>; <attributes>
Best Practice Only on rare occasions does the CDN look at this header.
References For a complete list and usage of this header, please read the following: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie
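
A short sketch of building a Set-Cookie value with a limited lifetime and a path restriction, using Python's standard http.cookies module. The cookie name, value and path are made up for illustration, and the Secure and HttpOnly attributes are an added assumption rather than something this document prescribes.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age, path, domain=None):
    """Return the value for a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["max-age"] = max_age   # lifetime (TTL) in seconds
    morsel["path"] = path         # path restriction
    if domain:
        morsel["domain"] = domain  # optional domain restriction
    morsel["secure"] = True       # assumption: only send over HTTPS
    morsel["httponly"] = True     # assumption: hide from page scripts
    return morsel.OutputString()


print("Set-Cookie:", build_set_cookie("session", "abc123", 3600, "/app"))
```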

Content-Encoding

Explanation Used in conjunction with the request header Accept-Encoding. The header describes the type of compression used so the client (browser) understands how to decompress the resource or object.

Some of the most common uses are:

  • Content-Encoding: gzip
  • Content-Encoding: compress
  • Content-Encoding: deflate
Best Practice It is highly recommended to compress resources wherever possible; however, be careful about which resources you apply compression to, as they may already be compressed. Applying compression to an object that is already compressed may not decrease the size of the object but will most likely increase the load time (as compression is handled on the fly).
References

Last-Modified

Explanation The date and time at which the origin server believes the object or resource was last modified. Very useful to validate or determine whether an object or resource received is unchanged.
Best Practice Used in conjunction with request headers If-Modified-Since and If-Unmodified-Since, best practice would be to have accurate dates and times. However, some would argue best practice would be to use the ETag header (explained later).
References

Content-Type

Explanation The client can read this header to determine the type of content of a specific object or resource; however, many browsers have MIME sniffing enabled, which provides this information, so they may ignore the header.
Best Practice Although browsers may enable MIME sniffing, application developers should always configure the content-type header correctly as assumptions cannot be made. More importantly, a CDN most likely does not MIME sniff; therefore, proper use of this header is very important, especially when a CDN considers compression, caching and chunking.

Example:
Many CDNs will not cache HTML by default, and they will not compress content types that are not worth compressing.
Take note of the format below and check the reference for all different MIME types.
HTTP/1.1 200 OK
Server: Apache/2.4.12 (Red Hat)
ETag: "3ff8f-56154c7ef0b29"
X-Frame-Options: SAMEORIGIN
Last-Modified: Wed, 27 Dec 2017 17:13:29 GMT
Accept-Ranges: bytes
Content-Type: text/javascript
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: public, max-age=25754
Expires: Tue, 23 Jan 2018 20:46:12 GMT
Date: Tue, 23 Jan 2018 13:36:58 GMT
Content-Length: 20
Connection: keep-alive

References MIME types: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types

CORS Headers – Access-Control-Allow-Origin

Explanation Cross-Origin Resource Sharing (CORS) is a way for the client (browser) to access a resource or object that is on a server different to the origin of the page making the request. There are many CORS headers; however, the most common is Access-Control-Allow-Origin.
This header is sent in the response and indicates whether the response can be shared with requesting code from the specified origin.
Two of the main uses are:
Access-Control-Allow-Origin: *
Wildcard that allows access from all origins
Access-Control-Allow-Origin: <origin>
Access from a specific domain is allowed
Best Practice Generally, CDNs will automatically pass what are considered “safe” headers between the origin and the end user without them having to be specified.
Many CDN vendors, however, require CORS headers to be strictly defined in each configuration.

Best practice would be to avoid using wildcard (*) values, as these defeat the purpose of using the header; stick with specific domains for proper use. This will also prevent other applications or websites from hijacking resources. Fonts are a very common example of objects that are hijacked and loaded on other websites.

References CORS overview and all CORS headers: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
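
One way to avoid the wildcard is to compare the request's Origin header against an allow-list and echo it back only on a match. The sketch below assumes a hypothetical allow-list and function name. It also adds Vary: Origin, which tells caches that the response differs per requesting origin when a single origin is returned.

```python
# Hypothetical allow-list of origins permitted to load these resources.
ALLOWED_ORIGINS = {"https://www.domain.com", "https://app.domain.com"}


def cors_headers(request_origin, allowed=ALLOWED_ORIGINS):
    """Return the CORS response headers for a request's Origin header."""
    if request_origin in allowed:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # The response now depends on the Origin request header.
            "Vary": "Origin",
        }
    # Unknown or missing origin: send no CORS header, so browsers block access.
    return {}


print(cors_headers("https://www.domain.com"))
print(cors_headers("https://other-site.example"))
```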

Location

Explanation Simple but important. This header tells the browser which URL to redirect that particular request to. Note, that this is only valid for redirections (generally with a 30x response code) but not for rewrites.
Best Practice The only point to consider here is the correct use of the status code. Should the redirect be a 301 or a 302? There are other options, but these are the main two. As discussed before (in status codes), it is important to consider which redirects should be permanent and which temporary.
References

ETag

Explanation A specific identifier for the version of a resource or object. A more efficient way to determine whether a server has the latest version. To save bandwidth, a server can send a request to the origin for an object together with the ETag of the copy it holds in cache (in the If-None-Match request header). If the ETags match, there is no need to send the object again and the origin responds with 304 Not Modified.
Best Practice Many CDNs support ETag but do not actively promote its use. It is very easy to use properly; for example, ensure a new ETag value is generated for each new version of the object sent.
References https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
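
A minimal sketch of ETag generation and revalidation, assuming the ETag is derived from a hash of the content so that every new version gets a new value. Real servers may build ETags differently (for example from file size and modification time), and the function names here are hypothetical.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the content, so each version differs."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def revalidate(if_none_match, body):
    """Return (status, etag): 304 if the caller's copy is still current."""
    etag = make_etag(body)
    if if_none_match:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so ignore any W/ prefix.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, etag
    return 200, etag


cached_tag = make_etag(b"version 1")
print(revalidate(cached_tag, b"version 1"))
print(revalidate(cached_tag, b"version 2"))
```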

Vary

Explanation A very sensitive response header that uses the matching of specified request headers to determine if a cached resource can be served or a fresh copy should be sent.
The format could be:

  • Vary: *
  • Vary: <header>, <header>

The use of a wildcard (*) indicates that a cached copy can never be matched, so a new copy needs to be provided for every request.

The use of a header name asks the cache to compare the value of that request header with the value stored alongside the cached copy.

For example, it is possible to cache different versions of a resource based on User-Agent. This is very common with websites/applications that have desktop and mobile versions.

Best Practice The recommendation would be not to use the Vary header but to use meaningful Cache-Control headers and Content Variation. Vary headers can very easily be interpreted incorrectly and, more often than not, are used incorrectly.

Cache-Control used correctly can clearly mark objects that should not be cached.

Content Variation (supported by most CDN vendors), automatically caches different versions of objects based on specified headers.

References
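
Whether driven by Vary or by a CDN's content variation rules, the underlying idea is that the named request headers become part of the cache key. The sketch below is one possible way to express that; the key format and the function name variant_key are made up for illustration.

```python
def variant_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary.

    Returns None for "Vary: *", meaning no cached copy can be reused.
    """
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    if "*" in names:
        return None
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    parts = [f"{name}={lowered.get(name, '')}" for name in sorted(names)]
    return "|".join([url] + parts)


print(variant_key("/home", {"Accept-Encoding": "gzip"}, "Accept-Encoding"))
```

Two requests share a cached copy only when their keys are equal, which is why varying on a high-cardinality header such as the raw User-Agent fragments the cache.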

3rd Party Objects

Explanation Objects that are not loaded through your domain. For example, Google services, fonts from specific vendors etc.
Best Practice This is a common oversight when looking at analytics for websites and applications. Always consider running tests on your specific domain and not only on web page URLs. Slow or restricted 3rd party domains can and will affect the load times of your website.
Consider that the 3rd party domains may not use a CDN to accelerate their content. Do not cover your website or application with 3rd party objects. It is detrimental to performance and security.
Note: Some countries restrict major domains. For example, Google is blocked in China.
References