The standard way of understanding the HTTP protocol is via the request-reply pattern. Each HTTP transaction consists of a finitely bounded HTTP request and a finitely bounded HTTP response.
However it's also possible for both parts of an HTTP 1.1 transaction to stream their possibly unbounded data. The advantage is that the sender can send data that exceeds its own memory limit, and the receiver can act on the data stream in chunks immediately instead of waiting for the entire payload to arrive. Basically you're either saving space or saving time. The advantages of streaming are elaborated in Wikipedia's Online algorithm article.
Note that HTTP streaming involves only the HTTP protocol, not WebSockets. Streaming is also the basis for HTML5 server-sent events.
So we're going to look at HTTP streaming architecture, and how to achieve streaming in a few different languages.
The first thing to understand is that HTTP streaming involves streaming within a single HTTP transaction. In a larger context, each HTTP transaction itself can represent an event as part of a larger event stream. This reveals that "streaming" is a context-specific concept: it's relative to what we consider the "stream" to be.
Firstly we have to consider the HTTP headers that support streaming. Open https://en.wikipedia.org/wiki/List_of_HTTP_header_fields up for reference:
The Content-Length header declares the byte length of the request/response body. If you neglect to specify the Content-Length header, many HTTP servers will implicitly add a Transfer-Encoding: chunked header instead. Without a declared length, the receiver has no idea how large the body is and cannot estimate the download completion time. If you do add a Content-Length header, make sure it matches the byte length of the entire body; if it is incorrect, the behaviour of receivers is undefined.
The Content-Length header does not allow streaming, but it is useful for large binary files where you want to support partial content serving. This basically means resumable downloads, paused downloads, partial downloads, and multi-source (segmented) downloads. It requires the use of an additional header called Range, and the technique is called byte serving.
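As a sketch of how byte serving works, here's a minimal Python function (a hypothetical helper handling only the simple "bytes=start-end" form; a real server must also support suffix ranges and multi-range requests) that turns a Range header into a 206 Partial Content response:

```python
def serve_range(body: bytes, range_header: str):
    """Return (status, headers, payload) for a single-range request.

    Handles only "bytes=start-end" and "bytes=start-"; this is a
    sketch, not a complete byte-serving implementation.
    """
    unit, _, spec = range_header.partition("=")
    if unit != "bytes":
        return 400, {}, b""
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else len(body) - 1  # open-ended range
    if start >= len(body):
        # requested range not satisfiable
        return 416, {"Content-Range": "bytes */%d" % len(body)}, b""
    end = min(end, len(body) - 1)
    payload = body[start:end + 1]
    headers = {
        "Content-Range": "bytes %d-%d/%d" % (start, end, len(body)),
        "Content-Length": str(len(payload)),
    }
    return 206, headers, payload
```

For instance, serve_range(b"0123456789", "bytes=2-5") responds with status 206, a Content-Range of "bytes 2-5/10", and the payload b"2345".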
The use of Transfer-Encoding: chunked is what allows streaming within a single request or response. It means the data is transmitted as a series of chunks, a property of the transfer that does not affect the representation of the content.
Officially an HTTP client may send a TE header field specifying which transfer encodings, beyond chunked, it is willing to accept. This is rarely sent in practice; chunked encoding itself must be understood by every HTTP 1.1 recipient, so servers assume that clients can process it.
The chunked transfer encoding makes better use of persistent TCP connections, which HTTP 1.1 assumes by default.
Chunked data is represented in this manner:
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n
Each chunk starts with its byte length expressed as a hexadecimal number, followed by optional parameters (chunk extensions) and a terminating CRLF sequence, followed by the chunk data, which is itself terminated by a CRLF. The stream ends with a zero-length chunk followed by a final CRLF.
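The framing above can be sketched as a small Python encoder (a hypothetical helper for illustration; it emits no chunk extensions or trailers):

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would terminate the stream early
            out += b"%x\r\n" % len(chunk)  # hex length + CRLF
            out += chunk + b"\r\n"         # chunk data + CRLF
    out += b"0\r\n\r\n"  # zero-length final chunk, then final CRLF
    return out
```

Calling encode_chunked([b"Wiki", b"pedia", b" in\r\n\r\nchunks."]) reproduces the wire format shown above.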
Chunk extensions can be used to indicate a message digest or an estimated progress. They are just custom metadata that your layer 7 receiver needs to parse; the syntax (semicolon-separated name=value pairs) is defined, but the semantics are not standardised. Because of this, it's probably better to just add your metadata (if any) into the chunk itself for your layer 7.5 application to parse.
For your application to send out chunked data, you must first send the Transfer-Encoding: chunked header, and then flush content in chunks according to the chunk format. If you don't have an HTTP server that handles this for you, then you need to implement the chunk framing yourself. Sometimes you can use a library to provide an abstract interface.
For example in PHP there's the Symfony HttpFoundation StreamedResponse, and in NodeJS the native HTTP module chunks all responses by default.
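If you don't have such a library, the framing is simple enough to do by hand. Here's a minimal sketch using Python's standard http.server (whose BaseHTTPRequestHandler does not chunk for you): the handler announces chunked encoding, then writes and flushes each frame itself.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class StreamingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for piece in (b"Wiki", b"pedia"):
            # each frame: hex length, CRLF, chunk data, CRLF
            self.wfile.write(b"%x\r\n%s\r\n" % (len(piece), piece))
            self.wfile.flush()  # push each chunk out immediately
        self.wfile.write(b"0\r\n\r\n")  # terminating zero-length chunk
```

Running HTTPServer(("127.0.0.1", 8000), StreamingHandler).serve_forever() and fetching / with any HTTP 1.1 client yields the body "Wikipedia", delivered as two chunks.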
Chunking is a 2-way street. The HTTP protocol allows the client to chunk HTTP requests, which lets the client stream the request body. This is useful for uploading large files. However not many servers (NGINX being an exception) support this feature, and most streaming-upload implementations instead rely on JavaScript libraries to cut up a binary file and send it in pieces to the server. Using JavaScript gives you more control over the uploading experience, but plain HTTP would be the simplest.
Browsers natively support chunked data. So if your server sends chunked data, they will start rendering it as soon as they receive it. However there's a buffer limit that browsers need to fill before they start rendering. This is different for each browser, but generally it's around 1KB. You can see the limits for various browsers here: http://stackoverflow.com/a/16909228/582917
If however you want to consume an API that supports streaming, you need to be aware of how your HTTP library handles chunked data. In most cases, you'll need to attach a callback handler that executes upon each chunk of data. This means the API should frame each chunk in a useful manner. If the API chunks too finely, you may end up needing to buffer the data into a "semantic protocol data unit" (PDU) before you can work on it, which of course defeats the purpose of chunking in the first place. For example in PHP, you can use the Guzzle library or cURL.
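As a sketch of that buffering step, suppose (hypothetically) the API emits newline-delimited messages but the chunks arrive split at arbitrary points. A tiny Python assembler would accumulate chunks until complete messages appear:

```python
class PduAssembler:
    """Accumulate raw chunks until complete newline-delimited PDUs arrive.

    The newline framing is an assumption for illustration; a real API
    might use length prefixes or any other delimiter.
    """

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk: bytes):
        """Buffer one raw chunk; return the list of completed messages."""
        self.buffer += chunk
        # everything before the last newline is complete; the remainder
        # stays buffered until the next chunk arrives
        *messages, self.buffer = self.buffer.split(b"\n")
        return messages
```

Feeding b'1}\n{"y"' after b'{"x":' yields the complete message b'{"x":1}' while the partial b'{"y"' stays buffered.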
In considering performance, you want to make sure that you're not producing overly chunky data. The more chunking you do, the more overhead there is in both producing and parsing the chunks. Furthermore, it results in more executions of buffering functions if the receiver can't make immediate use of the chunks. Chunking isn't always the right answer; it adds extra complexity on the recipient. So if you're sending small units of things that won't gain much from streaming, don't bother with it!
It is also possible to compress chunked or non-chunked data. In practice this is done via the Content-Encoding header.
Note that the Content-Length is equal to the length of the body after the Content-Encoding is applied. This means if you have gzipped your response, the length calculation happens after compression. You will need to be able to load the entire body in memory to calculate the length (unless you have that information elsewhere).
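A short Python illustration of that ordering, using the standard gzip module (the body here is made up):

```python
import gzip

body = b"<p>hello world</p>" * 1000  # hypothetical response body
compressed = gzip.compress(body)

# Content-Length must describe the bytes actually on the wire:
# the compressed representation, not the original body.
headers = {
    "Content-Type": "text/html",
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}
```

Notice that the whole body had to be in memory before it could be compressed and measured.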
When streaming using chunked encoding, the compression algorithm must also support online processing. Thankfully, gzip supports stream compression. The content gets compressed first, and then cut up into chunks: Content-Encoding applies to the representation, while Transfer-Encoding applies on the wire. The receiver strips the chunked framing first, then decompresses to acquire the real content. If it were the other way around, decompressing would yield chunk framing, which doesn't make sense.
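A sketch of that pipeline in Python, using zlib's incremental compressor in gzip mode (wbits=31 selects the gzip container): the sender compresses first and emits each compressed piece, which would then be framed as one Transfer-Encoding chunk, while the receiver decompresses chunk by chunk.

```python
import zlib

def compress_stream(pieces):
    """Yield gzip-compressed output incrementally as input pieces arrive."""
    co = zlib.compressobj(wbits=31)  # 31 = gzip container
    for piece in pieces:
        out = co.compress(piece)
        if out:  # the compressor may buffer and emit nothing yet
            yield out
    yield co.flush()  # emit whatever remains buffered

def decompress_stream(chunks):
    """Decompress gzip data incrementally on the receiving side."""
    dec = zlib.decompressobj(wbits=31)
    for chunk in chunks:
        yield dec.decompress(chunk)
    yield dec.flush()
```

Neither side ever needs the full payload in memory, which is exactly what online processing means here.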
A typical compressed stream response may have these headers:
Content-Type: text/html
Content-Encoding: gzip
Transfer-Encoding: chunked
Semantically the usage of Content-Encoding indicates an "end to end" encoding scheme, which means only the final client or final server is supposed to decode the content. Proxies in the middle are not supposed to decode the content.
If you want to allow proxies in the middle to decode the content, the correct header to use is in fact the Transfer-Encoding header. If the HTTP request possessed a TE: gzip, chunked header, then it would be legal to respond with Transfer-Encoding: gzip, chunked. However this is very rarely supported, so you should stick with Content-Encoding for your compression right now.
...to be continued...