The standard way of understanding the HTTP protocol is via the request-reply pattern. Each HTTP transaction consists of a finitely bounded HTTP request and a finitely bounded HTTP response.
However it's also possible for both parts of an HTTP 1.1 transaction to stream their possibly unbounded data. The advantage is that the sender can send data that exceeds its own memory limit, and the receiver can act on the data stream in chunks immediately instead of waiting for the entire payload to arrive. Basically you're either saving space or saving time. The advantages of streaming are elaborated in Wikipedia's Online algorithm article.
Note that HTTP streaming involves only the HTTP protocol, not WebSockets. Streaming is also the basis for HTML5 server-sent events.
So we're going to look at HTTP streaming architecture, and how to achieve streaming in a few different languages.
The first thing to understand is that HTTP streaming involves streaming within a single HTTP transaction. In a larger context, each HTTP transaction itself can represent an event as part of a larger event stream. This reveals that "streaming" is a context-specific concept: it's relative to what we consider the "stream" to be.
Firstly we have to consider the HTTP headers that support streaming. Open https://en.wikipedia.org/wiki/List_of_HTTP_header_fields up for reference:
The Content-Length header declares the byte length of the request/response body. If you neglect to specify the Content-Length header, many HTTP servers will implicitly add a Transfer-Encoding: chunked header instead. Without a declared length, the receiver has no idea how large the body is and cannot estimate the download completion time. If you do add a Content-Length header, make sure it matches the byte length of the entire body; if it is incorrect, the behaviour of receivers is undefined.
The Content-Length header does not allow streaming, but it is useful for large binary files where you want to support partial content serving. This basically means resumable downloads, paused downloads, partial downloads, and multi-source (segmented) downloads. It requires the use of an additional header called Range, and the technique is called byte serving.
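As a sketch of how byte serving works, here's a minimal Python function (a hypothetical helper handling only the simple "bytes=start-end" form; a real server must also support suffix ranges and multi-range requests) that turns a Range header into a 206 Partial Content response:

```python
def serve_range(body: bytes, range_header: str):
    """Return (status, headers, payload) for a single-range request.

    Handles only "bytes=start-end" and "bytes=start-"; this is a
    sketch, not a complete byte-serving implementation.
    """
    unit, _, spec = range_header.partition("=")
    if unit != "bytes":
        return 400, {}, b""
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else len(body) - 1  # open-ended range
    if start >= len(body):
        # requested range not satisfiable
        return 416, {"Content-Range": "bytes */%d" % len(body)}, b""
    end = min(end, len(body) - 1)
    payload = body[start:end + 1]
    headers = {
        "Content-Range": "bytes %d-%d/%d" % (start, end, len(body)),
        "Content-Length": str(len(payload)),
    }
    return 206, headers, payload
```

For instance, serve_range(b"0123456789", "bytes=2-5") responds with status 206, a Content-Range of "bytes 2-5/10", and the payload b"2345".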
The use of Transfer-Encoding: chunked is what allows streaming within a single request or response. It means the data is transmitted as a series of chunks, a property of the transfer that does not affect the representation of the content.
Officially an HTTP client may send a TE header field specifying which transfer encodings, beyond chunked, it is willing to accept. This is rarely sent in practice; chunked encoding itself must be understood by every HTTP 1.1 recipient, so servers assume that clients can process it.
The chunked transfer encoding makes better use of persistent TCP connections, which HTTP 1.1 assumes by default.
Chunked data is represented in this manner:
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n
Each chunk starts with its byte length expressed as a hexadecimal number, followed by optional parameters (chunk extensions) and a terminating CRLF sequence, followed by the chunk data, which is itself terminated by a CRLF. The stream ends with a zero-length chunk followed by a final CRLF.
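The framing above can be sketched as a small Python encoder (a hypothetical helper for illustration; it emits no chunk extensions or trailers):

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would terminate the stream early
            out += b"%x\r\n" % len(chunk)  # hex length + CRLF
            out += chunk + b"\r\n"         # chunk data + CRLF
    out += b"0\r\n\r\n"  # zero-length final chunk, then final CRLF
    return out
```

Calling encode_chunked([b"Wiki", b"pedia", b" in\r\n\r\nchunks."]) reproduces the wire format shown above.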
Chunk extensions can be used to indicate a message digest or an estimated progress. They are just custom metadata that your layer 7 receiver needs to parse; the syntax (semicolon-separated name=value pairs) is defined, but the semantics are not standardised. Because of this, it's probably better to just add your metadata (if any) into the chunk itself for your layer 7.5 application to parse.
For your application to send out chunked data, you must first send the Transfer-Encoding: chunked header, and then flush content in chunks according to the chunk format. If you don't have an HTTP server that handles this for you, then you need to implement the chunk framing yourself. Sometimes you can use a library to provide an abstract interface.
For example in PHP there's the Symfony HttpFoundation StreamedResponse, and in NodeJS the native HTTP module chunks all responses by default.
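If you don't have such a library, the framing is simple enough to do by hand. Here's a minimal sketch using Python's standard http.server (whose BaseHTTPRequestHandler does not chunk for you): the handler announces chunked encoding, then writes and flushes each frame itself.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class StreamingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for piece in (b"Wiki", b"pedia"):
            # each frame: hex length, CRLF, chunk data, CRLF
            self.wfile.write(b"%x\r\n%s\r\n" % (len(piece), piece))
            self.wfile.flush()  # push each chunk out immediately
        self.wfile.write(b"0\r\n\r\n")  # terminating zero-length chunk
```

Running HTTPServer(("127.0.0.1", 8000), StreamingHandler).serve_forever() and fetching / with any HTTP 1.1 client yields the body "Wikipedia", delivered as two chunks.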
Chunking is a 2-way street. The HTTP protocol allows the client to chunk HTTP requests, which lets the client stream the request body. This is useful for uploading large files. However not many servers (NGINX being an exception) support this feature, and most streaming-upload implementations instead rely on JavaScript libraries to cut up a binary file and send it in pieces to the server. Using JavaScript gives you more control over the uploading experience, but plain HTTP would be the simplest.
Browsers natively support chunked data. So if your server sends chunked data, they will start rendering it as soon as they receive it. However there's a buffer limit that browsers need to fill before they start rendering. This is different for each browser, but generally it's around 1KB. You can see the limits for various browsers here: http://stackoverflow.com/a/16909228/582917
If however you want to consume an API that supports streaming, you need to be aware of how your HTTP library handles chunked data. In most cases, you'll need to attach a callback handler that executes upon each chunk of data. This means the API should frame each chunk in a useful manner. If the API chunks too finely, you may end up needing to buffer the data into a "semantic protocol data unit" (PDU) before you can work on it, which of course defeats the purpose of chunking in the first place. For example in PHP, you can use the Guzzle library or cURL.
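As a sketch of that buffering step, suppose (hypothetically) the API emits newline-delimited messages but the chunks arrive split at arbitrary points. A tiny Python assembler would accumulate chunks until complete messages appear:

```python
class PduAssembler:
    """Accumulate raw chunks until complete newline-delimited PDUs arrive.

    The newline framing is an assumption for illustration; a real API
    might use length prefixes or any other delimiter.
    """

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk: bytes):
        """Buffer one raw chunk; return the list of completed messages."""
        self.buffer += chunk
        # everything before the last newline is complete; the remainder
        # stays buffered until the next chunk arrives
        *messages, self.buffer = self.buffer.split(b"\n")
        return messages
```

Feeding b'1}\n{"y"' after b'{"x":' yields the complete message b'{"x":1}' while the partial b'{"y"' stays buffered.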
In considering performance, you want to make sure that you're not producing overly chunky data. The more chunking you do, the more overhead there is in both producing and parsing the chunks. Furthermore, it results in more executions of buffering functions if the receiver can't make immediate use of the chunks. Chunking isn't always the right answer; it adds extra complexity on the recipient. So if you're sending small units of things that won't gain much from streaming, don't bother with it!
It is also possible to compress chunked or non-chunked data. In practice this is done via the Content-Encoding header.
Note that the Content-Length is equal to the length of the body after the Content-Encoding is applied. This means if you have gzipped your response, the length calculation happens after compression. You will need to be able to load the entire body in memory to calculate the length (unless you have that information elsewhere).
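A short Python illustration of that ordering, using the standard gzip module (the body here is made up):

```python
import gzip

body = b"<p>hello world</p>" * 1000  # hypothetical response body
compressed = gzip.compress(body)

# Content-Length must describe the bytes actually on the wire:
# the compressed representation, not the original body.
headers = {
    "Content-Type": "text/html",
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}
```

Notice that the whole body had to be in memory before it could be compressed and measured.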
When streaming using chunked encoding, the compression algorithm must also support online processing. Thankfully, gzip supports stream compression. The content gets compressed first, and then cut up into chunks: Content-Encoding applies to the representation, while Transfer-Encoding applies on the wire. The receiver strips the chunked framing first, then decompresses to acquire the real content. If it were the other way around, decompressing would yield chunk framing, which doesn't make sense.
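A sketch of that pipeline in Python, using zlib's incremental compressor in gzip mode (wbits=31 selects the gzip container): the sender compresses first and emits each compressed piece, which would then be framed as one Transfer-Encoding chunk, while the receiver decompresses chunk by chunk.

```python
import zlib

def compress_stream(pieces):
    """Yield gzip-compressed output incrementally as input pieces arrive."""
    co = zlib.compressobj(wbits=31)  # 31 = gzip container
    for piece in pieces:
        out = co.compress(piece)
        if out:  # the compressor may buffer and emit nothing yet
            yield out
    yield co.flush()  # emit whatever remains buffered

def decompress_stream(chunks):
    """Decompress gzip data incrementally on the receiving side."""
    dec = zlib.decompressobj(wbits=31)
    for chunk in chunks:
        yield dec.decompress(chunk)
    yield dec.flush()
```

Neither side ever needs the full payload in memory, which is exactly what online processing means here.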
A typical compressed stream response may have these headers:
Content-Type: text/html
Content-Encoding: gzip
Transfer-Encoding: chunked
Semantically the usage of Content-Encoding indicates an "end to end" encoding scheme, which means only the final client or final server is supposed to decode the content. Proxies in the middle are not supposed to decode the content.
If you want to allow proxies in the middle to decode the content, the correct header to use is in fact the Transfer-Encoding header. If the HTTP request possessed a TE: gzip, chunked header, then it would be legal to respond with Transfer-Encoding: gzip, chunked. However this is very rarely supported, so you should stick with Content-Encoding for your compression right now.
...to be continued...