Friday, 26 February, 2021 UTC


Summary

A quick note about myself

I’m a self-taught developer based in Paris. I worked as a freelancer for a few years before joining my family’s company, Place-des-Arts.
Place-des-Arts is an art gallery specialized in modern and contemporary prints, with more than 20,000 references; feel free to check out the open beta here.

Scaling my Apps

It’s been almost 3 years since I started using Meteor for my work, and it’s been great!
During this time I ran into several scalability issues and used a microservices approach to tackle heavy, scheduled, and recurring job problems such as:
  • Resource generation (images, PDF, JSON, CSV)
  • Automated React-templated emailing
  • Automated database backups to an external server
  • External services such as payment and delivery providers
  • Static React template generation
  • Search engine management
  • Sitemap management
At some point I found myself reinventing the wheel with underlying inter-dependencies, and I thought:
“why not go back to an all-Meteor solution?”
I decided to migrate all my external services back into Meteor by implementing a multi-process job queue (or worker pool) using the Node cluster module.
I was not disappointed: not only was my server-side codebase reduced by more than 40% 😮, but performance was better, and managing my code became a piece of 🍰 ​!
That’s why I’m thrilled to introduce the official release of nschwarz:cluster, a brainless multi-core solution for your app 🥳
It supports:
  • Both async and sync jobs
  • Both in-memory and MongoDB job queues
  • Scheduled and recurring jobs
  • Job prioritization
  • IPC messaging between master and worker
  • Event listeners
You can find the full documentation and examples here.
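To give a feel for job prioritization, here is a minimal in-memory sketch (illustrative only — `InMemoryTaskQueue` is a made-up name, not the package’s internals): pending tasks are simply kept sorted by a numeric priority, highest first.

```javascript
// Minimal in-memory job queue with priority ordering.
// Illustrative sketch only, not the actual nschwarz:cluster implementation.
class InMemoryTaskQueue {
  constructor() {
    this.tasks = []
  }

  addTask({ _id, taskType, data, priority = 0 }) {
    this.tasks.push({ _id, taskType, data, priority })
    // keep the queue sorted so the highest-priority task is pulled first
    this.tasks.sort((a, b) => b.priority - a.priority)
  }

  pull() {
    // next task to hand to a worker
    return this.tasks.shift()
  }
}
```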

A quick worker pool summary

A worker pool is inspired by both the master/slave and thread pool models, and it's composed of:
  • A job queue, which stores all the jobs in order
  • A master, which supervises the queue and the workers
  • Workers, which execute the tasks sent by the master in individual processes
The master checks whether jobs are in the queue; if so, it forks itself to create workers.
Each worker then receives a job, executes it, and tells the master when the job is done.
The master then removes the job; if no more jobs are available, it closes the worker.
This behavior is repeated at regular intervals.
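The cycle above can be sketched in plain Node.js. This is a simplified, single-process simulation where a “worker” is just a function call — in the real package each worker is a process forked with the cluster module:

```javascript
// Simplified simulation of the master's cycle: check the queue, hand jobs
// to workers, remove finished jobs, repeat until the queue is empty.
// Illustrative sketch only: workers are plain function calls here.
function runCycle(queue, taskMap, maxWorkers = 4) {
  const results = []
  while (queue.length > 0) {
    // the master takes up to maxWorkers pending jobs from the queue
    const batch = queue.splice(0, maxWorkers)
    for (const job of batch) {
      // each "worker" receives its job, executes it, and reports back
      results.push(taskMap[job.taskType](job))
    }
    // the batch is now removed from the queue; once the queue is empty
    // the master closes its workers and waits for the next interval
  }
  return results
}
```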
Using this approach has many advantages:
  • It enables you to offload your main event loop (which results in more responsiveness)
  • The offloaded tasks are executed more efficiently since they are not deferred by other events
  • It enables you to write less, clearer, and more modular code, because you’ll need to break it down into subroutines
  • It becomes painless to design scheduled and recurring events
  • It’s easier to manage your application growth (both in code, and performance)

A real-world usage

Race-condition-safe event management and a failure-proof mailer:
When a customer makes a payment on the website, 2 events occur:
  • As soon as my bank receives a payment, it sends a POST request to my server to notify me that a payment was made.
  • As soon as the payment is made on the bank’s website, the customer is redirected to a page saying “thank you”, “sorry, the payment was refused”, or “checking payment”. If the status is “checking payment”, the page polls the server at regular intervals until the status is updated to either success or failure.
In both cases, my server runs the same routine which:
  • Gets the payment status from my bank through their API
  • Updates the order in my database
  • Updates the product stocks
  • Sends an email to the customer with the order summary
Because Node.js is non-blocking and these 2 events are often triggered at the same time, a race condition could happen, and this routine could be fired multiple times.
Plus, if an error occurs while sending the email, I need to safely save it so I can send it later.
Thus, I can split this routine into three subroutines:
// gets the payment status from my bank through their API
// updates the order in my database
async function onNewOrder(job) {
  const order = Orders.findOne({ _id: job.data._id })
  // abort if the order has already been handled before
  if (order.status !== 'WAITING_PAYMENT') {
    return
  }
  const paymentStatus = await getBankPaymentStatus(order.paymentId)
  order.paymentStatus = paymentStatus
  if (paymentStatus === 'accepted') {
    order.status = 'WAITING_ACCEPT'
    order.save()
    /* technically this is not called here,
       but in a hook after the save event */
    handleStocks(order.products)
    TaskQueue.addTask({
      taskType: 'sendMail',
      data: { orderId: order._id, mailType: 'onNewOrder' },
      priority: 5
    })
  } else if (paymentStatus === 'refused') {
    order.status = 'REFUSED_PAYMENT'
    order.save()
  }
  // if neither of the 2, paymentStatus === 'waiting_payment'
}
// updates the product stocks
function handleStocks(products) {
  products.forEach(p => {
    const product = Products.findOne({ _id: p._id })
    product.stock -= p.qty
    product.save()
  })
}
// sends an email to the customer with the order summary
function sendMail(job) {
  const { orderId, mailType } = job.data
  const order = Orders.findOne({ _id: orderId })
  const html = getReactMailTemplate({ order, mailType })
  Email.send({
    html,
    to: order.customer.email,
    from: 'somemail@somedomain',
    subject: getMailSubject({ order, mailType })
  })
}
Now I can safely do:
// called by the client / bank requests
function getOrderStatus(_id) {
  const order = Orders.findOne({ _id })
  if (order === undefined) {
    throw new Error(`order${_id}:NOT_FOUND`)
  }
  if (order.status === 'WAITING_PAYMENT') {
    const taskId = `onNewOrder${order._id}`
    const exists = TaskQueue.findOne({
      taskType: 'onNewOrder',
      _id: taskId
    })
    if (exists === undefined) {
      TaskQueue.addTask({
        _id: taskId,
        taskType: 'onNewOrder',
        data: { _id },
        priority: 11
      })
    }
  }
  return order.paymentStatus
}
It’s now race condition safe because:
  • You can only have one task with the _id onNewOrder${order._id} at the same time.
  • Even if the task is fired a 2nd time (there’s a small window), it will abort because order.status was previously modified.
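Both guards can be demonstrated in isolation with a small sketch (illustrative stand-ins for the Mongo-backed TaskQueue and Orders collections — `addTaskOnce` is a made-up helper, not the package API):

```javascript
// Sketch of the two race-condition guards:
// 1. task _ids are unique, so a duplicate enqueue is a no-op;
// 2. the handler aborts if the order already left WAITING_PAYMENT.
const tasks = new Map()

function addTaskOnce(task) {
  // guard 1: at most one task per _id
  if (tasks.has(task._id)) return false
  tasks.set(task._id, task)
  return true
}

function handleOrder(order) {
  // guard 2: abort if the order was already handled
  if (order.status !== 'WAITING_PAYMENT') return 'aborted'
  order.status = 'WAITING_ACCEPT'
  return 'handled'
}
```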
Now let’s talk about the mailer safety net:
One time my mail server was down for a short period, leading to some unsent mails.
Because I hadn’t thought about this, some emails never reached the customers, and I didn’t notice for a while 😨.
Thankfully the failed tasks are kept in the queue, so I was able to resend them 🙏 .
To avoid any further fuss, I made a simple mechanism using an event listener:
import { add } from 'date-fns'

function onTaskError({ value, task }) {
  if (task.taskType === 'sendMail') {
    // notify the admins through a Notifications collection
    Notifications.insert({
      type: 'MAIL_ERROR',
      task: task._id,
      ...someData
    })
    // retry in 30 minutes
    const dueDate = add(new Date(), { minutes: 30 })
    TaskQueue.update({ _id: task._id }, { $set: {
      dueDate, onGoing: false
    }})
  }
}
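The retry works because the queue only picks up tasks whose dueDate has passed and that are not already running. A plain-Date sketch of that logic (without date-fns; the field names mirror the ones above, but the helpers are illustrative):

```javascript
// Sketch of dueDate-based retries: a failed task gets a dueDate 30 minutes
// in the future and onGoing: false, and the master only picks tasks whose
// dueDate is in the past. Plain Date arithmetic stands in for date-fns.
function scheduleRetry(task, now = new Date(), minutes = 30) {
  return {
    ...task,
    onGoing: false,
    dueDate: new Date(now.getTime() + minutes * 60 * 1000)
  }
}

function dueTasks(tasks, now = new Date()) {
  // only tasks that are not running and whose due date has passed
  return tasks.filter(t => !t.onGoing && t.dueDate <= now)
}
```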
It’s now time to configure the cluster:
const taskMap = {
  onNewOrder,
  sendMail
}

const cluster = new Cluster(taskMap, { maxAvailableWorkers: 4 })

Meteor.startup(() => {
  if (Cluster.isMaster()) {
    TaskQueue.addEventListener('error', onTaskError)
  }
})

What’s next?

Performance upgrades are coming up:
  • Using UNIX sockets instead of TCP for IPC, which is already available in Meteor
  • Adding a settable keepAlive field for each individual worker to reduce the cost of the worker startup/shutdown routine
  • Sending a new job to a worker as soon as its current job is finished, instead of waiting for the next cycle
  • Opening a Meteor core pull request to fully disable the TCP server with an environment variable, to reduce the cost and delay of worker startup
Features roadmap:
  • Multiple “filterable” event listeners
  • Support for load balancing (to be discussed) :
As I went through the Meteor feature request repo and the forum, I saw that one of the issues with scaling Meteor apps is load balancing (you can find one thread here).
Some Meteor packages such as meteorhacks/cluster or arunoda/meteor-cluster were originally developed for load balancing but have not been updated in a while (more than 5 years).
Since this package uses forked processes, it would be a no-brainer to assign workers to handle requests passed from the master instead of working on tasks.
Making an isomorphic worker could be interesting:
It could handle requests when traffic is high and switch back to handling jobs when the network load is manageable by the master.
If you’re interested, let me know 😉
  • Meteor core integration for build phases :
There are a lot of complaints about Meteor build times; providing the ability to build the app using multiple cores would be one way to reduce the delay.
Using multiple cores for the upcoming tree-shaking feature is also under discussion.
Get started with meteor add nschwarz:cluster!
Please let me know if you have any feedback!

Scale your app with multi-core was originally published in Meteor Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.