After finding out that Node.js is single-threaded by nature, you might think that the remaining cores of your processor are bound to go to waste, but this is not necessarily the case. In the previous part of this series, we proved that Node.js creates additional threads under the hood by itself. Not only that, but you can also create new processes using the child process module and communicate with them. In this article, we learn how to launch a cluster of Node.js processes that share server ports, which means that you can distribute the workload over multiple processes automatically.
Node.js Cluster: sharing the workload between multiple processes
Handling a significant number of users attempting to connect to your API might put quite a strain on a single process. Fear not, because the cluster module offers an elegant solution for balancing the load over multiple processes. The child processes that you create with it all share server ports, which means that all your processes work under the same address. When someone makes a request to your API, the master process accepts the new connection and hands it to one of the child processes.
The first thing to do is to decide how many processes to start. To do that, we use the os module. It provides us with utilities related to the operating system that runs our code. One of them, os.cpus(), gives us information about the cores of our CPU. It returns an array of objects, one per logical CPU core. All we need here is the length of that array.
import * as os from 'os';
const numberOfCores = os.cpus().length;
Thanks to that information, we can start one process per CPU core.
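One process per core is a sensible default, but you may want to cap or override it in some environments. Below is a minimal sketch of such a helper. Both the resolveWorkerCount function and the WORKER_COUNT environment variable are hypothetical names made up for this illustration; Node.js does not define them.

```typescript
import * as os from 'os';

// Hypothetical helper: decides how many workers to start.
// `requested` is an optional override, e.g. read from an assumed
// WORKER_COUNT environment variable (not something Node.js defines).
function resolveWorkerCount(requested: string | undefined, availableCores: number): number {
  // Guard against environments that report zero cores.
  const cores = Math.max(1, availableCores);
  const parsed = Number.parseInt(requested ?? '', 10);

  // Fall back to one worker per core when the override is missing or invalid.
  if (Number.isNaN(parsed) || parsed < 1) {
    return cores;
  }

  // Never start more workers than there are cores.
  return Math.min(parsed, cores);
}

const workerCount = resolveWorkerCount(process.env.WORKER_COUNT, os.cpus().length);
console.log(`Starting ${workerCount} worker(s)`);
```

Capping the value at the number of cores is a design choice for this sketch: extra processes beyond that would compete for the same cores.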
Another piece of information that we need is whether the current process is the master. You could determine it through the process.env.NODE_UNIQUE_ID environment variable, but if you look into the source code of Node.js, you can see that at some point it gets deleted. Fortunately, the cluster module exports a boolean property called isMaster. Since Node.js 16, the same flag is also available as isPrimary, and isMaster is deprecated.
Forking our process
Once we have all of the above, we can call the fork function. It spawns a new worker process, and you can only call it from the master.
import * as cluster from 'cluster';
import * as http from 'http';
import * as os from 'os';

const numberOfCores = os.cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} started`);
  for (let i = 0; i < numberOfCores; i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end(`Process ${process.pid} says hello!`);
  }).listen(8000);
  console.log(`Worker ${process.pid} started`);
}
The same code executes for every process but behaves in a different manner depending on whether it is a master process or not. Let’s try it out using Postman!
Scheduling policy
If you make the request multiple times, you may observe that you always get the same response. This is caused by the way Node.js chooses the process that should handle incoming requests.
You can change this policy using the cluster.schedulingPolicy property. Unfortunately, if you look into the DefinitelyTyped repository where @types/node comes from, it is currently marked as TODO. It means that we need to assert the type of the cluster module to avoid TypeScript compilation errors.
On every platform except Windows, cluster.schedulingPolicy defaults to cluster.SCHED_RR, which represents the round-robin approach.
console.log((cluster as any).schedulingPolicy === (cluster as any).SCHED_RR); // true
Round-robin should imply distributing connections across processes in equal portions, but the Node.js documentation mentions some “built-in smarts” to avoid overloading a worker process. If you apply more significant traffic, the load distributes across more processes, so don’t worry!
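As a rough mental model, plain round-robin hands each new connection to the next worker in turn. The sketch below simulates only that idea. It is a simplified illustration and not the actual Node.js implementation, which adds the “built-in smarts” mentioned above; the distributeRoundRobin function and the worker identifiers are made up for the example.

```typescript
// Simplified model of round-robin scheduling (not Node.js internals):
// connection number i goes to worker i modulo the number of workers.
function distributeRoundRobin(workerIds: number[], connectionCount: number): Record<number, number> {
  const handled: Record<number, number> = {};
  for (const id of workerIds) {
    handled[id] = 0;
  }
  if (workerIds.length === 0) {
    return handled;
  }
  for (let i = 0; i < connectionCount; i++) {
    const id = workerIds[i % workerIds.length];
    handled[id] += 1;
  }
  return handled;
}

// Seven connections over three hypothetical workers:
// worker 101 handles 3 of them, workers 102 and 103 handle 2 each.
console.log(distributeRoundRobin([101, 102, 103], 7));
```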
Let’s test it out in the browser. For this purpose, I respond with just the process identifier (PID) and wait a second before replying to make the test more realistic.
setTimeout(() => {
  res.end(process.pid.toString());
}, 1000);
In the test, we count the number of times each process is used. While the approach is far from perfect, it gets the job done.
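A minimal sketch of that kind of tally might look as follows. It assumes that the response bodies (each one a PID, as returned by the server above) have already been collected into an array; the countResponsesByPid function and the sample PIDs are made up for this illustration.

```typescript
// Tallies how many responses came from each process identifier.
function countResponsesByPid(pids: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const pid of pids) {
    counts[pid] = (counts[pid] ?? 0) + 1;
  }
  return counts;
}

// Made-up sample: response bodies gathered from five requests.
const sampleResponses = ['4021', '4022', '4021', '4023', '4021'];
console.log(countResponsesByPid(sampleResponses));
```

In a real test, the array would be filled with the bodies of requests sent in parallel to the server, so that several workers have a chance to respond.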
In the second approach to scheduling, represented by cluster.SCHED_NONE, the master process creates a “listen socket” and sends it to interested workers, leaving the scheduling to the operating system. Instead of having the master process distribute the load, the workers accept incoming connections directly. While it should perform better in theory, the distribution tends to be unbalanced because the scheduler of the operating system sometimes behaves unexpectedly.
The events in the cluster
Both the cluster module and the workers returned by cluster.fork() emit events. One of the most useful is the “exit” event, which you can use to restart workers if they stop working. Others that we can use are the “fork” and “online” events, for example, to log the activity.
import * as cluster from 'cluster';
import * as http from 'http';
import * as os from 'os';

const numberOfCores = os.cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} started`);
  for (let i = 0; i < numberOfCores; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} stopped working`);
    cluster.fork();
  });
  cluster.on('fork', (worker) => {
    console.log(`Worker ${worker.process.pid} started`);
  });
} else {
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end(`Process ${process.pid} says hello!`);
  }).listen(8000);
}
The individual workers also emit the “online” and “exit” events:
const worker = cluster.fork();

worker.on('online', () => {
  console.log(`Worker ${worker.process.pid} started`);
});

worker.on('exit', () => {
  console.log(`worker ${worker.process.pid} stopped working`);
  cluster.fork();
});
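The handlers above fork a replacement every time a worker exits, even if the worker was shut down on purpose. One possible refinement, sketched below under stated assumptions, is to decide first whether a restart makes sense. The shouldRestartWorker function is a hypothetical helper, not part of the cluster module; it relies on the worker.exitedAfterDisconnect property and the exit code that the “exit” event provides.

```typescript
// Hypothetical helper: decides whether an exited worker should be replaced.
// `exitedAfterDisconnect` mirrors worker.exitedAfterDisconnect, which is true
// when the worker exited because of disconnect().
// `code` is the exit code; this sketch treats null (no exit code available,
// for example when a signal ended the process) as an abnormal exit.
function shouldRestartWorker(exitedAfterDisconnect: boolean, code: number | null): boolean {
  if (exitedAfterDisconnect) {
    // Intentional shutdown: do not bring the worker back.
    return false;
  }
  // Restart only after an abnormal exit.
  return code !== 0;
}

// Possible usage inside the master process:
// cluster.on('exit', (worker, code) => {
//   if (shouldRestartWorker(worker.exitedAfterDisconnect, code)) {
//     cluster.fork();
//   }
// });
console.log(shouldRestartWorker(false, 1)); // true: a crashed worker gets replaced
```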
You can also use the “message” event and the send() function to communicate between the master and the workers, in much the same way as we did in the previous part of the series with the child process module.
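import * as cluster from 'cluster';

if (cluster.isMaster) {
  const worker = cluster.fork();
  worker.on('message', (text) => {
    console.log(text);
  });
} else if (process.send) {
  // process.send is only defined when the process was spawned with an IPC channel
  process.send('Hello!');
}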
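Messages arrive in the “message” handler without any type guarantees, so it might be worth validating them before use. The sketch below shows one way to do that with a discriminated union and a type guard. The WorkerMessage shapes and the isWorkerMessage function are assumptions made up for this example, not something the cluster module prescribes.

```typescript
// Hypothetical message shapes exchanged between the workers and the master.
type WorkerMessage =
  | { kind: 'ready'; pid: number }
  | { kind: 'log'; text: string };

// Type guard: checks at runtime that an incoming value matches one of the shapes.
function isWorkerMessage(value: unknown): value is WorkerMessage {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as { kind?: unknown; pid?: unknown; text?: unknown };
  if (candidate.kind === 'ready') {
    return typeof candidate.pid === 'number';
  }
  if (candidate.kind === 'log') {
    return typeof candidate.text === 'string';
  }
  return false;
}

// Possible usage inside the master process:
// worker.on('message', (message: unknown) => {
//   if (isWorkerMessage(message) && message.kind === 'log') {
//     console.log(message.text);
//   }
// });
console.log(isWorkerMessage({ kind: 'log', text: 'Hello!' })); // true
```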
Summary
In this article, we covered the basics of using the cluster module to handle many child processes. It turns out to be very useful when load balancing our application. With the cluster module, we forked our app into multiple processes managed by the master process. This makes use of multiple cores of your processor and aims to increase the performance of your application.
The post Node.js TypeScript #11. Harnessing the power of many processes using a cluster appeared first on Marcin Wanago Blog - JavaScript, both frontend and backend.