Monday, 8 January, 2018 UTC


Summary

Time to scale? Ready to add more containers? When you start to consider multiplying the number of containers you have powering your applications, many considerations arise. We'll walk through how to horizontally scale Docker containers using Nginx and Round Robin load balancing, then peer into how you can use Fly, Wormhole, and The Power of Two Random Choices load balancing for an easy and low-maintenance approach.
Example code available here.
Nginx + Docker: Dock 'Yer Socks Off
We're going to look at two methods. We'll start with the do-it-yourself method, which uses Nginx as a proxy. After that, the Fly method, a more automated, breezy path. The first method is more cumbersome -- even though we'll skip some of the more challenging bits, like HTTPS and TLS termination.
When you make a server instance bigger, you're scaling vertically. When you add more server instances, you're scaling horizontally. Containers make horizontally scaling a cleaner task than it has been in the past. But when you're dealing with systems at scale that need to satiate an international user-base, things are never going to be simple.
Consider that we have a base container; we'll call it templatio. Within templatio lives our application. It could be anything: a Node or Rails Application, or some super-secretive proto-type-y thing. It is one container that you will replicate to accommodate the connection and resource requirements of scale. And Scale is upon us!
Our base container, templatio, is going to contain two things:

index.js

We'll write a simple HTTP server using Node; templatio is a Node application.
var http = require('http');

// We want to be able to specify the SERVER_NAME.  
// This will come in handy, later!

var serverName = process.env.SERVER_NAME || 'default';
var port = process.env.PORT || 8000;

// The server will receive requests and return the name of the server that we've specified.

var server = http.createServer(function (request, response) {
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.end(`You have reached ${serverName}\n`);
});

server.listen(port);

// It's good to remind yourself when things start up!

console.log(`Server alive at localhost:${port}!`);
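Before we wrap it up in a container, a quick smoke test doesn't hurt. Assuming you have Node installed locally:
~/fly-docker-example ❯❯❯ SERVER_NAME=test node index.js

Server alive at localhost:8000!
Then, from another terminal:
~/fly-docker-example ❯❯❯ curl localhost:8000

You have reached test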

Dockerfile

No need to get fancy. We'll use the node:alpine base image; it's built on Alpine Linux, a distro that weighs in at less than 5MB. We want Docker to access our index.js application and run it.
FROM node:alpine
COPY index.js .
EXPOSE 8000

CMD ["node", "index.js"]
... We also want the container itself to be accessible via port 8000. When you expose ports in this way, you're saying that these ports can be accessed by other containers.
This does not facilitate exposure to the wild.

Great! These two files are now within our current working directory:
~/fly-docker-example ❯❯❯ ls
Dockerfile index.js
With Docker installed, we want to build templatio into an image using the command line:
~/fly-docker-example ❯❯❯ docker build -t fly-nodejs .

Sending build context to Docker daemon  51.71kB
Step 1/4 : FROM node:alpine
 ---> 466bcf8bf36e
Step 2/4 : COPY index.js .
 ---> Using cache
 ---> edbbf6c9b105
Step 3/4 : EXPOSE 8000
 ---> Using cache
 ---> c17e716675af
Step 4/4 : CMD node index.js
 ---> Running in c87297015c36
 ---> 7606f2b7aa7d
Removing intermediate container c87297015c36
Successfully built 7606f2b7aa7d
Successfully tagged fly-nodejs:latest
Docker now gives us the ability to run as many containers as we'd like, all based on the templatio image. We made room for environment variables within our index.js application, and we'd like to give each container a name. The SERVER_NAME variable influences the code, while --name is relevant within Docker:
# Container #1: We'll call it Statler. It's good.

~/fly-docker-example ❯❯❯ docker run -d -e "SERVER_NAME=statler" --name=statler fly-nodejs                                                       

da732b802ad52120dfaa27c63fad4c2bb697dc0fd90ae2156772526fc7bf6928

# Container #2: We'll call it Waldorf. It's not bad.

~/fly-docker-example ❯❯❯ docker run -d -e "SERVER_NAME=waldorf" --name=waldorf fly-nodejs 

f912b61077595fce224409ddad15797d9c8677b8604b435635a1c6db67117204
When we check in with Docker, things look great so far:
~/fly-docker-example ❯❯❯ docker ps                                                                                                                
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
f912b6107759        fly-nodejs          "node index.js"     About a minute ago   Up 2 minutes        8000/tcp            waldorf
da732b802ad5        fly-nodejs          "node index.js"     2 minutes ago        Up 2 minutes        8000/tcp            statler
But what can we do with these, now? Well, we designated port 8000; let's see what happens when we try to reach them...
~/fly-docker-example ❯❯❯ curl localhost:8000                           

curl: (7) Failed to connect to localhost port 8000: Connection refused
... We can't! Remember, the port we specified is only relevant when containers are communicating amongst one another. Even if they could communicate, which entity would tell our curl where to go? How would it know whether to hit statler or waldorf? That's where a proxy or a load balancer comes in.
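(An aside, for the curious: the containers do have addresses. Docker hands each one a private IP on its bridge network, which docker inspect will reveal; the exact IP will vary. On a Linux host you could curl that IP directly, though Docker for Mac doesn't route traffic from the host into the bridge network -- all the more reason to put something in front.)
~/fly-docker-example ❯❯❯ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' statler

172.17.0.2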
Nginx makes an excellent proxy and a pretty good load balancer; we can integrate it into our cluster of containers.
~/fly-docker-example ❯❯❯ mkdir nginx && cd $_
~/f/nginx ❯❯❯ 
Within our nginx directory, we'll have two files...

nginx.conf

A proxy is what sits in front of your containers and directs traffic. It's a vital tool for facilitating horizontal scale, as it determines which traffic goes where. Service Discovery describes how the proxy discovers the services to which it should route traffic.
Given that Docker has no magic to aid in service discovery out-of-the-box, we'll need to write an nginx.conf file so that the proxy knows what to do:
upstream fly {
  server statler:8000;
  server waldorf:8000;
}

server {
  # Listen on Nginx's standard HTTP port.
  listen 80;

  location / {
    proxy_pass http://fly;
  }
}
Requests that arrive at our server's / location are passed along to http://fly, the upstream group we defined above. Within it, we've indicated our container names and the ports at which they can be reached. Do those ports look fishy? They might, because they're container ports as opposed to public ports; we're going to run Nginx as a container, too.

Dockerfile

For that, another simple Dockerfile...
FROM nginx
COPY nginx.conf /etc/nginx/conf.d/default.conf
We're going to use nginx as our base image, then overwrite the default.conf with our customized nginx.conf.

Similar to before, we want to build our image:
~/f/nginx ❯❯❯ docker build -t fly-nginx .

Sending build context to Docker daemon  3.072kB
Step 1/2 : FROM nginx
latest: Pulling from library/nginx
e7bb522d92ff: Pull complete
6edc05228666: Pull complete
cd866a17e81f: Pull complete
Digest: sha256:cf8d5726fc897486a4f628d3b93483e3f391a76ea4897de0500ef1f9abcd69a1
Status: Downloaded newer image for nginx:latest
 ---> 3f8a4339aadd
Step 2/2 : COPY nginx.conf /etc/nginx/conf.d/default.conf
 ---> 2154a0cde02a
Successfully built 2154a0cde02a
Successfully tagged fly-nginx:latest
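Before wiring everything up, we can ask Nginx to vet our configuration. One caveat: Nginx resolves the upstream hostnames at start-up, so statler and waldorf need to be running and linked for the test to pass:
~/f/nginx ❯❯❯ docker run --rm --link statler --link waldorf fly-nginx nginx -t

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful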
Nginx is a web server. By default, Nginx can be accessed via port 80. Ah, port 80. She's a lovely, busy port. Using Docker, we can link containers to our Nginx proxy.
docker run -d -p 8080:80 --link statler --link waldorf fly-nginx
We've specified our ports like this: host:container. The first value, 8080, applies to the host; within our example, my MacBook Air is the host. The second value is the container's port. As Nginx is a web server first, it's listening on the standard HTTP port, 80. We chose 8080 and not 8000 to make it plain that there is no relationship between the container port we specified earlier and the one we're assigning to our host; it's a choice for clarity, to indicate separation.
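If you'd like to confirm the mapping took, docker port will report it; here, docker ps -lq grabs the ID of the most recently created container:
~ ❯❯❯ docker port $(docker ps -lq)

80/tcp -> 0.0.0.0:8080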
Now, when we apply curl...
~❯❯❯ curl localhost:8080                                         

You have reached statler

~ ❯❯❯ curl localhost:8080                                                     

You have reached waldorf
Awesome! By default, Nginx applies Round Robin load balancing. This means that incoming connections will alternate, one after the other, between our containers: statler, waldorf, statler, waldorf ~ ~ ~... There are other load balancing methods you can apply; your nginx.conf expands in complexity to match.
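If, say, you'd rather hand each request to whichever server has the fewest active connections, a one-line change to our upstream block applies Nginx's least_conn method:
upstream fly {
  # Favour the server with the fewest active connections,
  # instead of strictly alternating:
  least_conn;

  server statler:8000;
  server waldorf:8000;
}
Weighted servers and ip_hash, for clients that should stick to one backend, are among the other options.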
But what happens when one of your servers is removed from your pool? Servers go down sometimes.
After removing waldorf via docker stop waldorf, here's what happens when we try again:
~ ❯❯❯ curl localhost:8080                                         

You have reached statler

~ ❯❯❯ curl localhost:8080                                                     

// 10 second pause

You have reached statler
When a server cannot be reached, Nginx falls back on its upstream defaults: one failed attempt (max_fails=1) marks a server as unavailable for 10 seconds (fail_timeout=10s). This works alright, but it means some unlucky connection gets hit by that delay first; stack a retry on top and you're looking at a default 20 seconds. If we start up the container again, using docker start waldorf, we'll see waldorf return to the rotation.
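Both knobs live on the upstream's server lines, should you want different behaviour. A sketch that tolerates more failures but benches an unhealthy server for longer:
upstream fly {
  # Mark a server as unavailable only after 3 failed attempts,
  # and keep it out of the rotation for 30 seconds:
  server statler:8000 max_fails=3 fail_timeout=30s;
  server waldorf:8000 max_fails=3 fail_timeout=30s;
}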
... And now for something completely different.
Fly + Wormhole: My Name Is Wormholio
Wormhole is an open source utility released by Fly. It securely connects two end-points; let's have it connect two of ours. We're going to be bad, though, and hard-code some token values in this section. Remember: always be sure that your sensitive tokens and values are protected using a sane .env flow -- we'll sketch one shortly!

Dockerfile (Revised)

Wormhole can be added to your application in several ways; there's a Heroku buildpack and there are binaries for all major operating systems, among other things. Before we reconfigure our base templatio image, we want to head over to Fly.io and set up our site. It all begins with a hostname...
Adding your hostname creates a project. Within the project, our next stop is the Backends menu, where we'll retrieve a value we'll need shortly: our FLY_TOKEN.
There are a few ways to add Wormhole to Docker, but the simplest is to ADD the binary and then prepend it to your CMD so that it runs first:
FROM node:alpine
COPY index.js .
EXPOSE 8000

# FLY_LOCAL_ENDPOINT tells Wormhole where our application is listening. It should match the container's EXPOSE value.
ENV FLY_LOCAL_ENDPOINT 0.0.0.0:8000

# This will download Wormhole every time. You can add it to the project dir, instead.
ADD https://github.com/superfly/wormhole/releases/download/v0.5.36/wormhole_linux_amd64 /wormhole

# Make it executable

RUN chmod +x /wormhole

# Run, starting with the path to Wormhole, which we specified at the end of our ADD line.

CMD ["/wormhole", "node", "index.js"]
After the Dockerfile is set, we can then build the image using docker build -t fly-wormhole .. Now, time for our replicants! We create them, just like before:
~ ❯❯❯ docker run -d -e "SERVER_NAME=statler" -e "FLY_TOKEN 94bdae7f898ce9ed357b4c6cf1248a5771f1541cc5d0a627eaf677fdaaee2491" --name=statler fly-wormhole                                                     
f179d44e380fd4e4d299266906c495b41acc35f65899b0606e17029045bfaa99

~ ❯❯❯ docker run -d -e "SERVER_NAME=waldorf" -e "FLY_TOKEN 94bdae7f898ce9ed357b4c6cf1248a5771f1541cc5d0a627eaf677fdaaee2491" --name=waldorf fly-wormhole

57c9795ebb93b9173d453763e0ff1d8af5af1dbaa2db447571a58cd58531d7be
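That was the bad way we warned you about; the token is sitting in plain sight in our shell history. The saner .env flow we mentioned looks something like this, with a placeholder standing in for the real token:
~ ❯❯❯ cat .env

FLY_TOKEN=your-token-goes-here

~ ❯❯❯ docker run -d -e "SERVER_NAME=statler" --env-file .env --name=statler fly-wormhole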
But unlike before, we have no need to link these instances to an Nginx proxy. Fly acts as the proxy, instead: a reverse-proxy-as-a-service that load balances using a niftier algorithm and provides HTTPS, too!
When we refresh our backends page, we can see that there are now two secure instances! What if we curl our hostname, now?
~ ❯❯❯ curl wormhole.goodroot.ca                     

You have reached statler

~ ❯❯❯ curl wormhole.goodroot.ca                                               

You have reached waldorf
Radical. Earlier, we mentioned service discovery; Fly handles it for us. Each time a container is added or removed, it'll automatically join or exit the rotation. The algorithm applied for load balancing is The Power of Two Random Choices. We have an article that explains the algorithm in-depth, for those who are interested. Its use results in a smooth, even dispersal of traffic across available nodes.
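For a taste of how it works, here's a minimal sketch in JavaScript -- not Fly's implementation, and with a hypothetical third backend thrown in to make the randomness visible:
// The Power of Two Random Choices, in miniature.
var backends = [
  { name: 'statler', connections: 0 },
  { name: 'waldorf', connections: 0 },
  { name: 'scooter', connections: 0 }
];

function pickBackend() {
  // Pick two backends at random...
  var a = backends[Math.floor(Math.random() * backends.length)];
  var b = backends[Math.floor(Math.random() * backends.length)];

  // ...then choose whichever carries fewer active connections.
  return a.connections <= b.connections ? a : b;
}

// Simulate a pile of incoming connections:
for (var i = 0; i < 10000; i++) {
  pickBackend().connections++;
}

// Each backend ends up with roughly a third of the load.
console.log(backends);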
By adding your hostname to Fly, you have unlocked a global Application Delivery Network. Part of that is HTTPS. Your hostname receives an automatically renewing Let's Encrypt HTTPS certificate:
~ ❯❯❯ curl https://wormhole.goodroot.ca                                       

You have reached waldorf

~ ❯❯❯ curl https://wormhole.goodroot.ca             

You have reached statler
By leveraging that global network, Fly can terminate TLS connections much closer to your visitors, then blast everything over a backhaul connection through Wormhole to your application end-points. This is true end-to-end encryption; your traffic won't bobble around unencrypted in a CDN data-centre somewhere. It's optimized for speed, too, given the shorter termination distance.
Summary
Docker enables smoother scaling. You can use Nginx as a proxy to horizontally scale your containers to meet demand. As we demonstrated, much of the work is hands-on. We didn't incorporate HTTPS or a hostname, and we observed that without automated service discovery, you're fandangling configuration files.
Fly gives you that service discovery, allowing you to simply sunrise more containers when you need them or sunset them when you do not. Some of the trickier bits, like reverse-proxying HTTPS traffic and terminating it effectively, are part of basic onboarding when you add your hostname to Fly. On top of all that, you receive access to a whole array of edge Middleware, giving you control over the request/response cycle.
Fly started when we wondered, "What would a programmable edge look like?" Developer workflows work great for infrastructure like CDNs and optimization services. You should really see for yourself, though.