Asynchronous I/O

July 26, 2019

This is more a record of notes for my own understanding. I’m not contributing anything novel here, although a lot of the cited sources are.


The fundamental insight of Node.js is two-fold.

  1. I/O is almost always the slowest operation a process has to deal with in its execution. It can be this way for many reasons—network latency, disk operations, and not insignificantly, waiting for human input (like in the browser).
  2. Maintaining a thread for each concurrent operation is wasteful, because each thread spends most of its time idle, waiting for I/O operations to complete.

Recognizing that I/O is slow, it becomes obvious that blocking I/O is wasteful.

Blocking vs. Non-Blocking I/O

Per the Node.js docs:

Blocking is when the execution of additional JS in the Node.js process must wait until a non-JavaScript operation completes [5].

When the main thread is blocked, it cannot run any other JavaScript until the pending operation finishes. Blocking calls execute synchronously.

Note that concurrency in Node.js actually refers to the event loop being allowed to attend to other things while some I/O (a non-JavaScript operation) occurs. Tasks are spread out over time instead of over threads [3].

With non-blocking I/O, the system returns immediately without waiting for the result of an asynchronous operation. We’ll analyze in the recipes section how we can define work that we’d like to be done after the async operation completes, through callbacks and promises, but first let’s dive a little deeper into the event loop.
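Before that, here is a minimal sketch of the difference, using the blocking and non-blocking variants of the same fs call. The scratch file name is an assumption made up for this example; everything else is the standard fs, os, and path modules.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical scratch file so the example is self-contained.
const file = path.join(os.tmpdir(), 'async-io-notes-example.txt')
fs.writeFileSync(file, 'hello')

const order = []

// Blocking: no other JavaScript runs until the whole file has been read.
const text = fs.readFileSync(file, 'utf8')
order.push('sync read: ' + text)

// Non-blocking: readFile returns immediately; the callback runs later.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err
  order.push('async read: ' + data)
  fs.unlinkSync(file) // clean up the scratch file
  console.log(order)
})

// This line runs before the readFile callback does.
order.push('after readFile call')
```

The logged array shows 'after readFile call' ahead of 'async read: hello', because the main thread kept going while the read was in flight.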

The Event Loop

The event loop is the mechanism through which the single thread executes your JavaScript while also handling timers and I/O.

With each iteration, it cycles through various phases, like the timer phase, which executes callbacks scheduled by setTimeout and setInterval, or the I/O phase, which deals with networking-related tasks, like server connections.
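One way to see the phases in action is a sketch based on an example in the Node.js Event Loop Guide. It relies on setImmediate, which that guide describes as running in a "check" phase right after the I/O phase; that phase is not covered in these notes, so treat the comments as my reading of the guide.

```javascript
const fs = require('fs')

const order = []

// The fs.stat callback runs in the I/O part of a loop iteration.
fs.stat('.', () => {
  // Timers are handled at the start of the *next* iteration...
  setTimeout(() => {
    order.push('timeout')
    console.log(order)
  }, 0)

  // ...while setImmediate callbacks run right after the current I/O phase.
  setImmediate(() => order.push('immediate'))
})
```

When both are scheduled from inside an I/O callback like this, the immediate callback runs before the zero-delay timer.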

It also has a worker pool that maintains worker threads, one of which would be used if we wanted to interact with the filesystem, for example.

When some new operation is kicked off (e.g. a timer is added to the timer heap, the program is notified by the OS of a new server connection, a worker thread is given a new task), we increment a reference counter. When a task is completed, we decrement the reference counter.

When the event loop detects there is no more work to be done (that is, when the reference counter is 0), the process exits.
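Node exposes a small piece of this bookkeeping on timers through ref(), unref(), and hasRef(). The sketch below assumes that an unreferenced timer is simply not counted as pending work, which is how I understand the mechanism described above.

```javascript
// An interval normally keeps the process alive indefinitely.
const heartbeat = setInterval(() => console.log('tick'), 10)

// unref() asks the loop not to count this timer as pending work.
heartbeat.unref()
console.log('heartbeat keeps the loop alive?', heartbeat.hasRef())

// This timer is still referenced, so the loop runs until it fires.
// After that, nothing referenced is left and the process exits,
// even though the interval was never cleared.
setTimeout(() => console.log('last referenced timer fired'), 35)
```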

To learn more about the event loop, watch this talk by the creator of libuv (Node’s internal async I/O library), and read the Node.js docs’ Event Loop Guide.


Node.js Asynchronous Recipes

Callbacks

A synchronous callback is invoked before the function it was passed to returns; an asynchronous one is invoked after [1]. Note that mixing the two execution styles in the same function (having a callback sometimes called asynchronously and other times not) is a bad practice [1, 2].
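A cache is the classic place where the two styles get mixed by accident. The sketch below uses made-up names (slowLookup, get) and defers the cache hit with process.nextTick so the callback is asynchronous on both paths.

```javascript
const cache = new Map()

// Hypothetical slow lookup standing in for real I/O.
function slowLookup(key, callback) {
  setTimeout(() => callback(null, key.toUpperCase()), 10)
}

function get(key, callback) {
  if (cache.has(key)) {
    // Calling callback(null, cache.get(key)) directly here would make get()
    // synchronous on a hit and asynchronous on a miss.
    process.nextTick(callback, null, cache.get(key))
    return
  }
  slowLookup(key, (err, value) => {
    if (err) return callback(err)
    cache.set(key, value)
    callback(null, value)
  })
}

const order = []
get('a', () => {
  order.push('miss callback')
  get('a', () => {
    order.push('hit callback')
    console.log(order)
  })
  // Runs before 'hit callback', even though the value was cached.
  order.push('after second get')
})
order.push('after first get')
```

Callers can rely on the code after a get() call always running before the callback, whether or not the value was cached.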

An asynchronous function with a callback

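const fs = require('fs')

// readFile returns immediately; the callback runs once the read finishes.
fs.readFile('filename.txt', function read(err, data) {
  if (err) throw err
  console.log(data) // a Buffer, since no encoding was given
})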

A common concern when dealing with callbacks is callback hell, where callbacks are nested inside callbacks ad nauseam until it’s infeasible to reason about the execution order and closures, and the code becomes fragile and difficult to change. Defining the callbacks separately and using a more pointfree style can mitigate this problem a bit and simplify the design. Largely because of this problem, I personally find async / await the easiest to reason about.
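Here is a small sketch of what defining the callbacks separately can look like. loadUser, loadPosts, and start are hypothetical names, with setTimeout standing in for real I/O.

```javascript
// Hypothetical async steps standing in for real I/O.
function loadUser(id, callback) {
  setTimeout(() => callback(null, { id, name: 'Ada' }), 10)
}

function loadPosts(user, callback) {
  setTimeout(() => callback(null, [`${user.name}'s first post`]), 10)
}

// Each step is a named function instead of a closure nested in the last one.
function start(id, done) {
  loadUser(id, onUser)

  function onUser(err, user) {
    if (err) return done(err)
    // Pointfree: hand `done` straight to the last step instead of wrapping it.
    loadPosts(user, done)
  }
}

let result
start(1, (err, posts) => {
  if (err) throw err
  result = posts
  console.log(posts)
})
```

The nesting depth stays constant no matter how many steps are added, and each step can be read and tested by name.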

I recommend chapters 3 & 4 of Node.js Design Patterns for how to chain asynchronous operations, run them “in parallel” (execution is carried out by the underlying non-blocking API and interleaved by the event loop), and even limit the number of tasks that run at any given time using a queue.
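To give a flavor of the last idea, here is my own rough sketch of limiting concurrency. It is not the book’s implementation. runLimited and makeTask are made-up names, and the sketch assumes every task calls its callback asynchronously and exactly once.

```javascript
// Run callback-style tasks with at most `limit` of them in flight at once.
function runLimited(tasks, limit, done) {
  const results = []
  let next = 0      // index of the next task to start
  let running = 0   // tasks currently in flight
  let finished = 0  // tasks that have completed
  let failed = false

  if (tasks.length === 0) return process.nextTick(done, null, results)

  function launch() {
    while (running < limit && next < tasks.length) {
      const index = next++
      running++
      tasks[index]((err, value) => {
        if (failed) return
        if (err) {
          failed = true
          return done(err)
        }
        results[index] = value
        running--
        finished++
        if (finished === tasks.length) return done(null, results)
        launch() // a slot freed up, so start the next queued task
      })
    }
  }

  launch()
}

// Usage: five fake tasks, two at a time, tracking the peak concurrency.
let inFlight = 0
let peak = 0
const makeTask = (n) => (callback) => {
  inFlight++
  peak = Math.max(peak, inFlight)
  setTimeout(() => {
    inFlight--
    callback(null, n * 2)
  }, 10)
}

let output
runLimited([1, 2, 3, 4, 5].map(makeTask), 2, (err, results) => {
  if (err) throw err
  output = results
  console.log(results, 'peak concurrency:', peak)
})
```

Results are stored by task index, so they come back in input order even though tasks finish at different times.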

Promises

A promise is an object representing the eventual result of an async operation [3]. It is always in one of three states:

  • pending
  • fulfilled
  • rejected

You can chain promises together with .then() and .catch(), since those functions return Promises as well.
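A slightly longer chain shows how values and errors travel through it. The delay helper is a made-up stand-in for a real async operation.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms))
}

const log = []

delay(10, 2)
  .then((n) => n * 10)           // a returned value feeds the next .then()
  .then((n) => delay(10, n + 1)) // a returned promise is waited on first
  .then((n) => {
    log.push(n)
    throw new Error('boom')      // throwing rejects the rest of the chain
  })
  .then(() => log.push('skipped')) // never runs
  .catch((err) => {
    log.push(err.message)
    return 'recovered'           // .catch() returns a promise too
  })
  .then((value) => {
    log.push(value)
    console.log(log)
  })
```

The rejection skips every .then() until it reaches a .catch(), after which the chain continues normally.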

fetch("https://www.starwars.com/")
  .then((response) => console.log(response.status))
/* if run in Chrome console */
// Promise {<pending>}
// 200
Check out the MDN docs to see additional Promise-related methods, e.g. Promise.all.
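As a quick sketch of Promise.all, again with a made-up delay helper: it waits for every promise in the array and resolves with their values, or rejects as soon as any one of them rejects.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms))
}

// All three timers run concurrently; the slowest one decides the total wait.
const all = Promise.all([delay(30, 'a'), delay(10, 'b'), delay(20, 'c')])

all.then((values) => {
  // Values keep the order of the input array, not the order of completion.
  console.log(values)
})
```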

async / await

async / await is a syntactically cleaner way to use Promises. An async function still returns a Promise implicitly, but its body reads more like synchronous code, which arguably makes it easier to reason about.
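To compare the two styles side by side, here is the same logic written both ways, including error handling. fetchNumber and the two doubled functions are hypothetical names for this sketch.

```javascript
// Hypothetical async operation: resolves with 21, or rejects on request.
function fetchNumber(shouldFail) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldFail) reject(new Error('no number'))
      else resolve(21)
    }, 10)
  })
}

// Promise style: transform with .then(), recover with .catch().
function doubledWithThen(shouldFail) {
  return fetchNumber(shouldFail)
    .then((n) => n * 2)
    .catch(() => -1)
}

// async / await style: the same flow with ordinary try / catch.
async function doubledWithAwait(shouldFail) {
  try {
    const n = await fetchNumber(shouldFail)
    return n * 2
  } catch (err) {
    return -1
  }
}

const results = Promise.all([
  doubledWithThen(false),
  doubledWithAwait(false),
  doubledWithThen(true),
  doubledWithAwait(true),
])

results.then((values) => console.log(values)) // [ 42, 42, -1, -1 ]
```

Both functions return a promise to their callers; only the way the body is written differs.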

Here’s an adapted version of the MDN async / await example.

function resolveAfter2Seconds() {
  return new Promise(resolve => {
    setTimeout(() => resolve('resolved'), 2000)
  })
}

async function asyncCall() {
  console.log('calling')
  const result = await resolveAfter2Seconds()
  console.log(result)
}

console.log(asyncCall())
console.log("other stuff happening")

calling
Promise { <pending> }
other stuff happening
resolved

Notice the transfer of control that occurs upon encountering the await keyword. asyncCall returns a pending promise to its caller right away, which is why “other stuff happening” is logged next. The rest of the function body is only queued to run once the awaited promise settles.

Summary

The pattern discussed above is called the Reactor pattern. It is not exclusive to Node.js, but Node is arguably its most successful example. As one of my colleagues pointed out, you have to go out of your way in Node.js to block the main thread, whereas in Java you have to go out of your way not to. This, combined with the social / cultural benefit of at least having a chance of being full-stack (if you use Node.js on the backend), makes Node.js a really good architectural choice.

Sources

  1. Havoc’s Blog
  2. Don’t Release Zalgo
  3. Node.js Design Patterns
  4. MDN Promises
  5. Overview of Blocking vs. Non-Blocking