@carmour24
Last active June 20, 2018 16:25
What's this reactive thing anyway?

While researching tech for a new project I've sure seen the words react, reactor and reactive come up a lot. Confusingly, the term reactive is somewhat overloaded, so I'll try to define what we mean by it in each context, giving some hopefully clarifying examples. These are just some notes to help me think about this stuff so they might not be 100% right.

As a brief introduction reactive programming is about non-blocking applications that are asynchronous and event-driven and require a small number of threads to scale vertically (i.e. within the JVM) rather than horizontally (i.e. through clustering).

Let me give a generic example which is pertinent and will hopefully illustrate why this has become a hot topic.

In a typical web server we would have a thread-per-request model, meaning that for each request made to the web server a single thread services that request for its lifetime. The thread may go off and perform some blocking IO, e.g. reading a file, connecting to a database or making a call to another web service, perhaps even synchronising with some other threads, but until this call returns the thread is not available to do work for any other connections. It is not unusual for web servers using this model to allow for thousands of simultaneous threads, and once this thread pool is saturated connections can no longer be made to the service (as is exploited in DDOS attacks). To achieve further simultaneous connections the service must either be scaled up by beefing up the server or scaled out by replicating the service and load balancing between instances.

However, much of the time that these exclusive per-request threads are running they are blocked waiting for the filesystem, database or web service to return a response and so are doing no work themselves. This brings us to the reactor pattern, which applies a single-threaded, event-driven approach to handling requests. At a high level, in the reactor pattern a single-threaded dispatcher waits for events like readiness for read or readiness for write and synchronously calls an event handler which has been registered to handle that type of event. The event handler will decode and process the event until either the operation is complete and a response can be returned, or a juncture is reached at which the event handler yields (e.g. to make a further request to a web service). At this point the dispatcher will check for further events, such as more readiness for read or write or a callback to some IO operation, and will dispatch these in turn. In this way the reactor pattern can be used to achieve greater utilisation of the available resources, provided that blocking for long periods can be avoided, e.g. on database connections or long-running computation.

To illustrate, take the following simplified example:

  1. An HTTP GET request is made by a browser to http://example.com/api/users
  2. Dispatcher gets the event and calls the appropriate event handler which is executed synchronously
  3. Event handler calls a remote web service asynchronously
  4. The currently executing handler registers a new event handler indicating interest in the result of the remote web service call and returns, allowing the next item in the event loop to be processed
  5. Other processing (e.g. other requests) occurs and is serviced sequentially by the same thread
  6. Our remote web service call completes and the dispatcher responds to the event by calling the callback registered by our original event handler
  7. Finally this child event handler completes by synchronously writing any response data out to the appropriate connection
  8. The browser displays the content returned by the server
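The dispatch cycle above can be sketched very loosely in plain Java. This is a toy model, not a real reactor: the `Dispatcher`, `EventHandler` and string-typed events are all hypothetical stand-ins for real IO readiness events, and a real implementation would block waiting for readiness (e.g. on a `java.nio.channels.Selector`) rather than drain an in-memory queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// A toy single-threaded dispatcher: handlers are registered per event type
// and invoked synchronously, one at a time, as events are drained from a queue.
class Dispatcher {
    interface EventHandler { void handle(String payload); }

    private final Queue<String[]> events = new ConcurrentLinkedQueue<>();
    private final Map<String, EventHandler> handlers = new HashMap<>();

    // Register interest in a type of event (step 2 and step 4 in the example).
    void register(String eventType, EventHandler handler) {
        handlers.put(eventType, handler);
    }

    // Queue an event for the loop to pick up (e.g. an incoming request).
    void submit(String eventType, String payload) {
        events.add(new String[] { eventType, payload });
    }

    // Drain the queue on the calling (single) thread; handlers must not block,
    // because nothing else runs while a handler is executing.
    void runOnce() {
        String[] event;
        while ((event = events.poll()) != null) {
            EventHandler handler = handlers.get(event[0]);
            if (handler != null) handler.handle(event[1]);
        }
    }
}
```

The key property to notice is that a single call to `runOnce` services every queued event in sequence on one thread, which is why a slow handler stalls everything behind it.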

Server implementers are finding that a single threaded event loop can serve what would previously have required hundreds of threads. Requests are not actually served faster, but the existing hardware can be better utilised for more throughput and more consistent workload patterns as long as blocking can be avoided. If we perform some blocking IO or computation on the event loop thread then suddenly requests cannot be served and the system can become unreliable. Therefore a requirement for using this architecture is never to block the main thread for an extended duration. How can this be achieved? Wherever possible by using non-blocking APIs provided by the host OS, and when this isn't possible by offloading blocking activities onto worker threads from a thread pool. If we exhaust this thread pool then we can end up in a similar situation to before, but ideally blocking should be kept to a minimum to prevent this. This is where issues may come up if using a reactive web framework like Spring WebFlux with blocking APIs, e.g. JDBC, and why it isn't an automatic choice to use one of these reactive web frameworks.
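The "offload blocking work onto a worker pool" idea can be sketched with standard `CompletableFuture` machinery. This is a minimal illustration, not how any particular framework does it; `blockingQuery` is a hypothetical stand-in (a sleep) for a blocking call like a JDBC query.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockingOffload {
    // A dedicated worker pool for blocking calls, sized separately
    // from (and never shared with) the event loop thread.
    static final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Simulates a blocking call (e.g. a JDBC query) that must not
    // run on the event loop thread.
    static String blockingQuery(String sql) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "rows for " + sql;
    }

    // Wraps the blocking call: the caller gets a future back immediately
    // and stays free to service other events while a worker thread blocks.
    static CompletableFuture<String> query(String sql) {
        return CompletableFuture.supplyAsync(() -> blockingQuery(sql), workers);
    }
}
```

The event loop thread would attach a continuation with `thenAccept` rather than calling `join`, so it never waits; if the pool is saturated the work simply queues, which mirrors the "similar situation to before" caveat above.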

In the context of web servers or web frameworks reactive means running a single threaded event loop with asynchronous non-blocking IO to provide lots of throughput. This is no panacea, however, and to reap the benefits we need to ensure we don't block the event loop or performance will suffer.

Now, how does this change how we interact with the framework/server and what does this mean for us during development?

When responding to an HTTP request we must respond in an asynchronous manner and avoid blocking in our handler. There are a number of approaches for this, the simplest of which is callbacks. Callbacks are just functions passed to an asynchronous operation to be invoked upon completion. These are very commonly used in Node.js but also in Vert.x, and while they are reactive, they can easily lead to untidily nested code which is difficult to reason about. Promises/futures are one approach to dealing with this complexity (see Callback Hell and CompletableFuture FTW for more detail), but a number of projects on the JVM adopt or extend the Reactive Streams API which, whilst requiring a change in approach from imperative to declarative, hopefully provides a cleaner approach with greater opportunities for reuse and reduced complexity.
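The difference between callbacks and futures can be sketched with two hypothetical operations, `fetchUser` and `fetchOrders` (both invented for illustration). With callbacks each dependent step nests another level deeper; with `CompletableFuture`, `thenCompose` keeps the chain flat:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class AsyncStyles {
    // Callback style: the async operation takes a callback, so each
    // dependent step adds another level of nesting at the call site.
    static void fetchUserCallback(int id, Consumer<String> callback) {
        CompletableFuture.runAsync(() -> callback.accept("user-" + id));
    }

    // Future style: each operation returns a future instead.
    static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<String> fetchOrders(String user) {
        return CompletableFuture.supplyAsync(() -> "orders for " + user);
    }

    // thenCompose chains the dependent call flatly, where the callback
    // version would have had to nest fetchOrders inside fetchUser's callback.
    static CompletableFuture<String> userOrders(int id) {
        return fetchUser(id).thenCompose(AsyncStyles::fetchOrders);
    }
}
```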

The reactive streams API defines a few simple interfaces which allow publication, transformation and consumption of an asynchronous sequence of events. The idea behind this is to make working with asynchronous operations more straightforward by encapsulating the flow of data and allowing us to apply transformations in a standard way much like in the Java Streams API. The difference is that while we execute the methods from the Streams API in an imperative fashion, with Reactive Streams we are declaratively defining a data flow process which may occur at some point in the future. Once a publisher begins publishing the data flow we have set up is invoked for each item in the sequence.

Take the following example using the Java Streams API to multiply each integer by 3 and return only the even elements:

List<Integer> integers = Arrays.asList(1, 2, 3, 4);

Stream<Integer> transformedIntegerStream = integers
    .stream()
    .map(integer -> integer * 3)
    .filter(integer -> integer % 2 == 0);

List<Integer> transformedIntegers = transformedIntegerStream
        .collect(Collectors.toList());

System.out.println(transformedIntegers.toString()); // prints [6, 12]

Now compare with this example using RxJava, an adopter of the Reactive Streams API:

Flowable<Integer> transformedIntegerFlow = Flowable.fromArray(1, 2, 3, 4)
        .map(integer -> integer * 3)
        .filter(integer -> integer % 2 == 0);

transformedIntegerFlow
        .toList()
        .subscribe(transformedIntegers -> System.out.println(transformedIntegers));

They look similar apart from a couple of details. First, the source in the second example is a Flowable rather than a list. Second, when we print the result in the first example we do so in the original calling context, but in the second example we print the result in a callback passed to the subscribe method. So what does that mean?

The Flowable type is a publisher of sequence elements. In this case it will publish the array items it has been given as these are requested by its subscriber.

So unlike the first example, where we performed a sequence of operations and received a result in the original calling context, here we have set up a data flow operation and nothing will happen until subscribe is called. After the call to subscribe, our subscriber will request from the publisher (the Flowable) a number of elements to be supplied. This will continue either until the publisher runs out of data (e.g. for a finite data source) or the subscription is cancelled.

While we have used a simple sequence from an array, a publisher could produce data from any type of source like a socket, filesystem or even the click of a button in a UI and elements could be emitted at varying intervals rather than continuously.

To prevent the consumer being overwhelmed by events, the publisher will only supply new items up to the number which has been requested by the subscriber. This pattern of allowing the destination to control the flow of data from the source is known as backpressure, and Reactive Streams implements this by requiring the subscriber to request some number of sequence elements before the publisher is allowed to provide them. This is just one possible implementation; in Node.js, for example, the publisher will supply items to a subscriber until the supply attempt fails, after which it will back off until the subscriber's buffer has been emptied and fires a drain event indicating it is ready for further data.
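The request-n protocol is visible in Java 9's built-in `java.util.concurrent.Flow` interfaces (which mirror Reactive Streams). Below is a minimal sketch of a subscriber that requests exactly one element at a time; real libraries request in larger batches, and `SubmissionPublisher` is used here purely as a convenient standard-library publisher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class OneAtATime {
    // Consumes items under backpressure: the publisher may not emit more
    // than the subscriber has requested, and we only ever request one.
    static List<Integer> consume(Integer... items) throws InterruptedException {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>();

        publisher.subscribe(new Flow.Subscriber<Integer>() {
            private Flow.Subscription subscription;

            public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1); // ask for exactly one item to start the flow
            }
            public void onNext(Integer item) {
                received.add(item);
                subscription.request(1); // ask for the next only after handling this one
            }
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        });

        for (Integer item : items) publisher.submit(item);
        publisher.close(); // signals onComplete once the buffer drains
        done.await();
        return received;
    }
}
```

Nothing here polls or blocks the publisher; the subscription's `request` calls are the only thing that lets data through, which is the essence of the Reactive Streams flavour of backpressure.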

Why would we want to use this declarative reactive streams approach over more traditional imperative approaches like callbacks or even promises? Reactive Streams implementations like RxJava let us wrap complex operations into reusable functions which can be combined in powerful ways. Let's say for example we want to attempt an HTTP request. We know there's a possibility that this request may fail, so we want to retry the request a few times whilst also delaying the retry attempts to avoid overwhelming the server. In a more imperative style this would be fairly complex, but in RxJava it can be expressed fairly concisely (example from RxJava's repeatWhen and retryWhen, explained):

source.retryWhen(errors ->  
  errors
    .zipWith(Observable.range(1, 3), (n, i) -> i)
    .flatMap(retryCount -> Observable.timer((long) Math.pow(5, retryCount), TimeUnit.SECONDS))
);
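For contrast, a plain-Java sketch of the same retry-with-growing-delay logic might look like the following. It is a hypothetical illustration that blocks the current thread between attempts (which is exactly what we said not to do on an event loop), and even so it already needs explicit loop, counter and exception plumbing:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class Retry {
    // Imperative equivalent of retryWhen: retry up to 3 times,
    // sleeping 5^attempt seconds between failed attempts.
    static <T> T withRetry(Supplier<T> operation) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (attempt >= 3) throw e; // out of retries, give up
                TimeUnit.SECONDS.sleep((long) Math.pow(5, attempt));
            }
        }
    }
}
```

Layering a whole-operation timeout with a cached fallback on top of this (as in the next example) would mean adding another thread or future around the loop, where the RxJava version needs only one more operator.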

Now say we find that while some requests fail straight away and the above handles this well, other requests hang, so we want a timeout to apply to the whole operation and return some cached data rather than appear unresponsive for too long. Suddenly the imperative approach becomes far more complicated, but the RxJava approach becomes:

source.retryWhen(errors ->  
  errors
    .zipWith(Observable.range(1, 3), (n, i) -> i)
    .flatMap(retryCount -> Observable.timer((long) Math.pow(5, retryCount), TimeUnit.SECONDS))
).timeout(30, TimeUnit.SECONDS, Observable.just(someCachedData));

So while it might take a while to get used to the syntax, we get quite a lot of power with less chance of introducing bugs, as we are composing and reusing well-tested functionality.

Hopefully this overview of what is meant by reactive and how it will apply to what we're working on will be helpful. Here's a glossary of some React* buzzwords you might see:

Reactive Extensions (ReactiveX) encompassing RxJava, Rx.Net, RxJs are libraries for composing asynchronous and event-based programs by using observable sequences.

Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure. The reactive streams API corresponds directly to the nested classes from Java 9's java.util.concurrent.Flow class.

RxJava (specifically version 2) also adopts the Reactive Streams API.

Project Reactor is a reactive library for building non-blocking applications on the JVM based on the Reactive Streams specification. This comprises Reactor Core, the foundation libraries for Reactor including the Flux and Mono publisher types representing multiple and single element publishers respectively (see Understanding Reactive types for more details), and Reactor Netty, an asynchronous runtime for building HTTP/TCP clients or servers. In practice this often means that Netty provides HTTP server functionality to an application. Reactor supplies numerous scheduler strategies for offloading work onto other threads to keep its single threaded event loop performant.

Spring WebFlux is a framework for building reactive web applications or services. Reactor Core provides its reactive types, e.g. Mono and Flux. As Reactor implements the Reactive Streams API, it can interoperate with other implementations, e.g. RxJava. WebFlux is the reactive replacement for Spring MVC. WebFlux can run on Netty, Tomcat, Undertow, RxNetty, etc.

Vert.x is another framework for building reactive web applications and services on the JVM, but with a focus on allowing development in many different languages by providing language-specific APIs through code generation. Whilst the typical pattern in use is the callback model, there are also Reactive Streams and RxJava integrations. Some of the languages supported are Java, Kotlin, Scala, Ceylon and JavaScript (somehow integrating with the JVM, but I don't know the details). Vert.x implements what it terms the Multi-Reactor pattern: each Vert.x instance runs an event loop per core to automatically maximise resource usage, without requiring the developer to deploy multiple instances as would be required with Node.js and its single event loop per process.

RxNetty is a Reactive Extensions (RxJava) adaptor for Netty.

The above are all JVM based reactive libraries, frameworks, specs etc. Here are some non-JVM projects you might see mentioned:

Node.js is a server-side JavaScript runtime built on Google Chrome's V8 engine, used to create reactive web applications. It essentially implements the reactor pattern and has become very popular for the development of web applications and microservices.

ReactJS is a JavaScript library for building user interfaces. The principle behind it is to declare how the user interface should be constructed and to allow React to respond to data changes, efficiently applying those changes to the DOM so we avoid manually managing UI change state. We intend to use it for the Operations Cloud. It's from Facebook.

Reactive systems, as defined by the Reactive Manifesto, is an architectural style with a focus on responsiveness, resilience, elasticity (remaining responsive under varying workloads) and asynchronous message passing to ensure "loose coupling, isolation and location transparency". This would often be achieved through a microservices architecture whose services have automatic health monitoring, dynamic deployment etc., but it is not directly related to the reactive programming described earlier.

And finally, note that Reactive Programming as discussed here is not the same as Functional Reactive Programming, although it is sometimes mistakenly labelled as such, probably due to the more functional style of APIs like RxJava. To be honest, I'm not really clear on exactly why this isn't the case, but Conal Elliott, the originator of FRP, was pretty emphatic about it, and I think I'd need to know more about functional programming to really get the distinction (though I'm pretty sure it's something to do with FRP's continuous time semantics...).
