Thursday, 15 March, 2018 UTC


Summary

Have you ever wanted to serve a bunch of different apps on the same hostname? We have, and we've earned the VCL / HAProxy / nginx / mod_rewrite scars to show for it. It's a simple idea that's excruciating to implement.
Fly Edge Apps make sophisticated routing as easy as it should be. There aren't any janky config files, just good ol' JavaScript logic. Logic that can be tested, iterated, and shared.
Our One Hostname example application demonstrates routing between multiple backends.
const mounts = {
  '/example': backends.generic("https://example.com"),
  '/heroku': backends.heroku("example"),
  '/surge': backends.surge("onehostname"),
  '/debug': debug,
  '/': backends.githubPages("superfly/onehostname-comic")
}

fly.http.respondWith(routeMounts)
Easy peasy, right? The magic is really in those backends. Fly Edge Applications have access to the standard fetch API. Each backends.* call generates a proxy function that accepts a Request, tweaks it, grabs a Response from the origin, and tweaks that before ultimately returning it to the visitor.
Here's what happens when you visit https://onehostname.com/heroku.
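The routeMounts function isn't shown here, so here's a rough sketch of how a mount table like that could be dispatched. The names findMount and routeRequest are made up for illustration, and the longest-prefix-wins rule is an assumption, not necessarily what the example app does.

```javascript
// Hypothetical helper: find the mount whose prefix matches the request path.
// Longer prefixes are tried first so that '/' acts as the catch-all.
function findMount(mounts, pathname) {
  const prefixes = Object.keys(mounts).sort((a, b) => b.length - a.length)
  for (const prefix of prefixes) {
    const exact = pathname === prefix
    // '/heroku' should match '/heroku/about' but not '/herokuapp'
    const nested = pathname.startsWith(prefix.endsWith('/') ? prefix : prefix + '/')
    if (exact || nested) {
      return { basePath: prefix, handler: mounts[prefix] }
    }
  }
  return null
}

// Hypothetical dispatcher: hand the request and the matched prefix to the
// backend function, mirroring the (req, basePath) signature used in this post.
function routeRequest(mounts, req) {
  const { pathname } = new URL(req.url)
  const match = findMount(mounts, pathname)
  return match ? match.handler(req, match.basePath) : null
}
```

Passing the matched prefix along as basePath is what lets each backend strip it off again before talking to its origin.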
The edge app knows that backends.heroku("example") should generate a specific fetch function to handle that request:
export const heroku = function (appName) {
  return function herokuFetch(req, basePath) {
    const herokuHost = `${appName}.herokuapp.com`
    const headers = {
      'host': herokuHost,
      'x-forwarded-host': req.headers.get("host")
    }
    return proxy(req, `https://${herokuHost}`, { headers, basePath })
  }
}
Heroku apps are always available at https://<appName>.herokuapp.com. But they expect a very specific Host header. Browsers always send exactly what's in the address bar (Host: onehostname.com), but Heroku has no idea what to do with that. So we have to make sure to set Host: <appName>.herokuapp.com before sending the request.
But! The apps running on Heroku do need to know the original hostname so they can build links properly. Most popular web frameworks support the X-Forwarded-Host: onehostname.com header, so if we set that we're getting closer.
However! The original request was for /heroku, and the Heroku app itself works at /. In JavaScript-world, the path part of a URL is called pathname, so we can just strip the basePath out of the pathname before forwarding it along.
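To make that concrete, here's a minimal sketch of what the origin side might do with those headers. Both publicHost and absoluteLink are hypothetical helpers, headers are modeled as a plain object with lowercase keys, and the https scheme is an assumption. Real frameworks each have their own switch for trusting forwarded headers.

```javascript
// Hypothetical origin-side helper: prefer X-Forwarded-Host (the hostname the
// visitor typed) and fall back to Host when no proxy is in front of the app.
function publicHost(headers) {
  return headers['x-forwarded-host'] || headers['host']
}

// Hypothetical link builder; assumes the site is served over https.
function absoluteLink(headers, path) {
  return `https://${publicHost(headers)}${path}`
}

// What the Heroku app would see for a request proxied through the edge app
const proxiedHeaders = {
  'host': 'example.herokuapp.com',
  'x-forwarded-host': 'onehostname.com'
}
const link = absoluteLink(proxiedHeaders, '/about')
// link points at onehostname.com rather than example.herokuapp.com
```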
if (opts.basePath && typeof opts.basePath === 'string') {
  // remove basePath to serve `onehostname.com/heroku/`
  // from `example.herokuapp.com/`
  url.pathname = url.pathname.substring(opts.basePath.length)
}
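Here's a slightly more defensive take on the same idea, as a sketch. The stripBasePath name is made up, and the extra checks (only strip on a whole path segment, never return an empty path) are assumptions about what you'd want rather than what the example app does.

```javascript
// Hypothetical helper: remove a mount prefix from a pathname.
function stripBasePath(pathname, basePath) {
  // nothing to strip for the root mount or a missing basePath
  if (typeof basePath !== 'string' || basePath === '/') return pathname
  // '/heroku' itself maps to the origin's root
  if (pathname === basePath) return '/'
  // only strip when basePath is a whole leading segment
  if (pathname.startsWith(basePath + '/')) return pathname.substring(basePath.length)
  return pathname
}

// Rewriting a visitor URL into the URL the origin expects
const upstream = new URL('https://onehostname.com/heroku/about?x=1')
upstream.pathname = stripBasePath(upstream.pathname, '/heroku')
upstream.hostname = 'example.herokuapp.com'
// upstream.href is now 'https://example.herokuapp.com/about?x=1'
```

The segment check keeps a path like /herokuapp from being mangled by the /heroku mount.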
Phew. This was the simple example. Applying all this logic in config files used to mean whipping out gross regular expressions and saying a small prayer. Miserable.
Which made it downright impossible to properly support a more peculiar origin ... like GitHub Pages.
GitHub Pages have two different URL structures. The "canonical" URL for a repository looks like <organization>.github.io/<repo>. But some repositories have a CNAME specified in the project settings, and for those GitHub expects requests to arrive with that custom hostname (onehostname.com, in our case).
We can detect which configuration a given repository is running by requesting the canonical GitHub Pages URL and looking for a Location header:
➜  curl -I https://superfly.github.io/onehostname-comic
HTTP/2 301
date: Thu, 15 Mar 2018 21:29:51 GMT
server: GitHub.com
content-type: text/html
location: http://onehostname.com
The Location header indicates what Host header and path GitHub expects for this repository. Which means we can magically (a) detect which URL format to send, and (b) discover changes, so people don't unintentionally break things by changing their Pages config.
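Boiled down, the detection step is a small decision on that one header. Here's a sketch of it as a pure function. The pagesTargetFromLocation name is hypothetical, and treating a redirect that stays on <org>.github.io as "no CNAME" is an assumption on our part.

```javascript
// Hypothetical helper: given the Location header from the canonical
// <org>.github.io/<repo> URL, work out which Host header and path to use.
function pagesTargetFromLocation(location, org, repo) {
  const canonicalHost = `${org}.github.io`
  if (!location) {
    // no redirect: the repository is served at <org>.github.io/<repo>/
    return { host: canonicalHost, path: `/${repo}/` }
  }
  // resolve relative redirects against the canonical host
  const url = new URL(location, `https://${canonicalHost}`)
  if (url.hostname === canonicalHost) {
    // assumption: a redirect that stays on github.io is not a CNAME
    return { host: canonicalHost, path: `/${repo}/` }
  }
  // redirected to a custom domain: send that hostname and serve from /
  return { host: url.hostname, path: '/' }
}

const target = pagesTargetFromLocation('http://onehostname.com', 'superfly', 'onehostname-comic')
// target.host is 'onehostname.com' and target.path is '/'
```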
Which makes backends.githubPages a little more complex than Heroku:
export const githubPages = function (repository) {
  // we're doing more with the response than the others, making this async
  // enables `await` in the function body
  return async function githubPagesFetch(req, basePath) {
    const [org, repo] = repository.split("/")
    const ghHost = `${org}.github.io`
    const headers = {
      host: ghHost
    }
    let path = '/' // cnames use /, non cnames use /<repo>/
    
    // check for cached cname
    let hostname = await fly.cache.getString(`github:${repository}`)
    
    let resp = null
    if (!hostname) {
      // no cname, use <org>.github.io/<repo>
      path = `/${repo}/`
      resp = await proxy(req, `https://${ghHost}${path}`, {
        basePath,
        headers
      })
   
      let location = resp.headers.get('location')
      if (location) {
        //github is redirecting us, which means this has a cname
        resp = null // gotta get a new response
        const url = new URL(location)
        hostname = url.hostname

        // cache the cname for 5 min before checking again
        if (hostname){
          await fly.cache.set(`github:${repository}`, hostname, 300)
        }
      } else {
        return resp
      }
    }
    // if we got here, need to fetch with the hostname and no path
    headers.host = hostname
    return proxy(req, `https://${ghHost}`, { basePath, headers })
  }
}
In mostly English, this builds a proxy that:
  • Checks fly.cache for a project CNAME
  • If there's no cache entry
    • Request https://<organization>.github.io/<repo>/
    • If the response is good (no Location header), return that
    • Otherwise set hostname for future use
    • ... and cache it with fly.cache.set for future requests
  • Assuming we haven't returned a response yet
    • Set the host header to the CNAME value
    • Make a new request to <organization>.github.io
    • Return that response
Way back in '16, we started building Fly by generating and reloading nginx configs. One of the very first things we discovered was that this level of relatively simple logic ("get something from a GitHub Pages enabled repository") was exceedingly difficult to teach nginx. So we frequently had to offload all that decision making to our customers, which is kind of a silly UX. No one likes answering questions a computer should be able to figure out for itself.
You might have noticed that we only solved half the problem: this demo rewrites requests themselves, but doesn't rewrite the response content. Stay tuned for part 2 of this series, where we do magic like this:
const mounts = {
  // ghost
  "/articles": (req) => {
    return ghostImageMiddleware(req, backends.ghost)
  }
}
You can see that in action if you view source on this page. We write our posts in Markdown, and let our Edge App rewrite HTML, insert responsive image tags, and even resize and optimize images on-the-Fly. It's delightful.
Fly started when we wondered, "what would a programmable edge look like?" Developer workflows work great for infrastructure like CDNs and optimization services. You should really see for yourself, though.