Improving My Site With Cloudflare Workers and Amazon S3

So as of now, anything much over 1 MB isn’t going to take up space on every device hosting this site; I’ll just offload it to someone else. And how do I get it back? JavaScript.

No wait, I’m actually serious. As of now, I essentially have Cloudflare acting as a CDN that’s backed by the Amazon S3 storage service. Posts such as the one about light balance now have everything but the featured image stored… not here anymore.

So, let’s start with the basics: what are Amazon S3 and Cloudflare Workers?

Amazon S3

One of the many different services offered in… Amazon Web Services (AWS), S3 (Simple Storage Service) is an object storage system (so, files). You as a user can create “buckets” that you stuff your stuff into. If I, for example, had a file named important.zip, in a bucket named test, in the Ohio data center, I could access it with the URL https://test.s3.us-east-2.amazonaws.com/important.zip. Note that there’s no mention of your account in there: all bucket names have to be globally unique, which is why the one I use has a unique prefix.
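
In other words, object URLs follow this pattern (the second line is just the example from above, not a real bucket):

https://<bucket-name>.s3.<region>.amazonaws.com/<object-key>
https://test.s3.us-east-2.amazonaws.com/important.zip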

BIG IMPORTANT NOTE: DO NOT PUT PERIODS IN YOUR BUCKET NAME. data-teknikaldomain is valid, but data.teknikaldomain will cause every fetch to have a TLS handshake error: invalid certificate. The Amazon S3 TLS certificate for, let’s say, the us-east-2 region, is *.s3.us-east-2.amazonaws.com. That wildcard does not count for multiple subdomain levels, meaning that the TLS certificate for your bucket will not match the name of your bucket. This is why CloudFront won’t take buckets with dots in their name.
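
To see the mismatch spelled out (both names here are just the examples from above, not real buckets):

Certificate name:  *.s3.us-east-2.amazonaws.com
data-teknikaldomain.s3.us-east-2.amazonaws.com   ->  one label under the wildcard: matches
data.teknikaldomain.s3.us-east-2.amazonaws.com   ->  two labels under the wildcard: no match, handshake fails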

So this is the storage part: I upload files, grant them public read access, and off we go. It only costs around 2.3¢ per GB per month for the standard storage class, which isn’t that bad. But now that I can store stuff elsewhere, I need a way to actually get it back. That’s where the workers come in.
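
For reference, the public read part can be granted per object (ACLs) or bucket-wide with a policy. A rough sketch of the bucket-wide version, with a placeholder bucket name, looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*"
        }
    ]
}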

Cloudflare Workers

Cloudflare offers a free service called Workers, which allows you to run JavaScript (or WebAssembly) at their edge, meaning requests are handled before they ever get passed on through the rest of the Cloudflare network. At the free tier, I get 100,000 worker invocations (so, web requests) per day, and each one has a limit of 10 ms of CPU time (not wall-clock time) before being aborted.

On the paid plan, Workers can be used to, say, serve an entire static content site (like this one!) with no origin server behind it at all. A completely serverless infrastructure.1 Or, in my case, you can just.. grab data.

Workers are given a route, which is basically a very restricted pattern match for a URL (wildcards, not full RegEx), and if the URL matches, the worker is executed and handles the request instead of it going to the origin server. Workers can perform their own requests, synthesize new responses out of thin air, manipulate Cloudflare’s cache, and.. just do a ton of stuff.

Workers can be written in raw JavaScript, JavaScript that gets bundled and minified on upload, or Rust that gets compiled to WebAssembly on upload. You can even take advantage of your favorite Node modules, meaning all sorts of cool stuff.
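
As a bare-minimum illustration (not my actual worker, just the shape of one), the whole model is: listen for fetch events, answer them:

addEventListener('fetch', event => {
    // Never contact the origin at all: synthesize a response right at the edge
    event.respondWith(new Response('Hello from the edge!', {
        headers: { 'Content-Type': 'text/plain' },
    }))
})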

So, workers can intercept a request, pretty much at the earliest point, and handle it. So how do I marry these two systems?

The Result

You’ll notice that any file not stored locally now has a URL like this: https://teknikaldomain.me/cdn?fetch=image.png. These URLs can take a few query parameters, but the most important one is fetch, which is the path to the file in the S3 bucket.
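
A few example forms, using the attach and noupgrade parameters that show up in the code below (the paths here are illustrative):

https://teknikaldomain.me/cdn?fetch=stickers/td-mdp.png             (inline, WebP upgrade allowed)
https://teknikaldomain.me/cdn?fetch=stickers/td-mdp.png&noupgrade   (inline, always the PNG)
https://teknikaldomain.me/cdn?fetch=important.zip&attach            (sent as an attachment, forces a download)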

Here’s the code that gets run the moment you request that, aptly named cdn-serve:

addEventListener('fetch', event => {
    event.respondWith(handleRequest(event))
})

async function handleRequest(event) {
    const request = event.request
    // Block all methods except GET and HEAD
    if (request.method === 'GET' || request.method === 'HEAD') {
        let response = await serveAsset(event)
        // Set error code if error
        if (response.status > 399) {
            response = new Response(
                'Error fetching resource from CDN: ' + response.statusText,
                { status: response.status },
            )
        }
        return response
    } else {
        return new Response(null, { status: 405 })
    }
}

function checkWebPSupport(url, img, request) {
    const webpRegex = /image\/webp/ // Match string "image/webp" in header data
    const urlStr = String(url)
    const acceptHeader = request.headers.get('Accept')
    // If the header exists AND contains WebP accept
    if (acceptHeader !== null && webpRegex.exec(acceptHeader) !== null) {
        let newUrl = urlStr.replace(/\.png$/, '.webp') // Replace PNG resource with WebP one
        return new Response(null, {
            status: 302,
            headers: { Location: newUrl },
        })
    }
    return null // Check failed, do not modify
}

async function serveAsset(event) {
    const request = event.request
    const reqHead = request.headers
    const url = new URL(request.url)
    const fields = url.searchParams
    let item = fields.get('fetch')
    const attach = fields.get('attach') == null ? 'inline' : 'attachment' // 'attachment' if it exists, 'inline' otherwise

    // Preconditions
    // fetch param actually specified
    if (item === null) {
        return new Response(null, { status: 400 })
    }

    // If requesting a PNG and the 'noupgrade' param is absent, check upgrade capability
    const itemTypeRaw = item.split('.')
    const itemType = itemTypeRaw[itemTypeRaw.length - 1] // Everything after the final '.'
    if (itemType.toLowerCase() == 'png' && fields.get('noupgrade') === null) {
        const response = checkWebPSupport(url, item, request)
        if (response) {
            return response
        }
    }

    // Do fetch
    const fetchHeaders = new Headers({
        'User-Agent': 'Cloudflare worker CDN-Serve/1.0.0',
    })
    if (reqHead.get('If-None-Match') !== null) {
        fetchHeaders.set('If-None-Match', reqHead.get('If-None-Match'))
    }
    if (reqHead.get('If-Modified-Since') !== null) {
        fetchHeaders.set('If-Modified-Since', reqHead.get('If-Modified-Since'))
    }
    if (reqHead.get('Range') !== null) {
        fetchHeaders.set('Range', reqHead.get('Range'))
    }

    const response = await fetch(
        `https://${BUCKET_NAME}.s3.${BUCKET_REGION}.amazonaws.com/${item}`,
        { headers: fetchHeaders },
    )

    // S3 returns a 403 Forbidden if not found, transform that into a 404 Not Found
    if (response.status == 403) {
        return new Response(null, { status: 404 })
    }

    const respHead = response.headers
    const type = respHead.get('Content-Type') || 'application/octet-stream' // 'octet-stream' is a good default for unknown content
    const itemNameP = item.split('/') // Split item name on path segments (extract raw name later)

    // Set headers
    const headers = new Headers({
        'Cache-Control': `public, max-age=604800`,
        'Content-Type': type,
        'Content-Disposition': `${attach}; filename="${
            itemNameP[itemNameP.length - 1]
        }"`, // Just the file name, not any path before it
        ETag: respHead.get('ETag'),
        'Last-Modified': respHead.get('Last-Modified'),
        'CF-Cache-Status': respHead.get('CF-Cache-Status')
            ? respHead.get('CF-Cache-Status')
            : 'UNKNOWN',
        Date: respHead.get('Date'),
        'Accept-Ranges': 'bytes',
    })
    if (respHead.get('Age') !== null) {
        headers.set('Age', respHead.get('Age'))
    }
    if (response.status == 206) {
        headers.set('Content-Range', respHead.get('Content-Range'))
    }

    return new Response(response.body, { ...response, headers })
}

The first bit of this code, the addEventListener, is complete boilerplate.

The rest though.. let’s take it piece by piece.

handleRequest

async function handleRequest(event) {
    const request = event.request
    // Block all methods except GET and HEAD
    if (request.method === 'GET' || request.method === 'HEAD') {
        let response = await serveAsset(event)
        // Set error code if error
        if (response.status > 399) {
            response = new Response(
                'Error fetching resource from CDN: ' + response.statusText,
                { status: response.status },
            )
        }
        return response
    } else {
        return new Response(null, { status: 405 })
    }
}

There’s not too much logic here before things are handed off to serveAsset, but it does check one thing before and one thing after. Before going to serveAsset, it makes sure that only GET and HEAD requests are allowed; anything else returns a 405 Method Not Allowed. After fetching, if the response is an error (either 4xx or 5xx), it passes the status code through but responds with a short error message as the body.

In the future, I might do something like request a special error page and load that in instead, but that’s beyond the scope of this right now.

checkWebPSupport

function checkWebPSupport(url, img, request) {
    const webpRegex = /image\/webp/ // Match string "image/webp" in header data
    const urlStr = String(url)
    const acceptHeader = request.headers.get('Accept')
    // If the header exists AND contains WebP accept
    if (acceptHeader !== null && webpRegex.exec(acceptHeader) !== null) {
        let newUrl = urlStr.replace(/\.png$/, '.webp') // Replace PNG resource with WebP one
        return new Response(null, {
            status: 302,
            headers: { Location: newUrl },
        })
    }
    return null // Check failed, do not modify
}

This one is “simple” in its purpose, but its implementation is a little weird.

When your browser requests a resource, it will send over an Accept header to the server to announce what content types it is willing to handle in return. You can see it in this sample request:

GET https://teknikaldomain.me/cdn?fetch=stickers/td-mdp.png HTTP/1.1
Host: teknikaldomain.me
Accept: image/webp,image/apng,image/*,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Sec-Fetch-Dest: image
Sec-Fetch-Mode: no-cors
Sec-Fetch-Site: cross-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36

So what happens is that the if check looks for two things:

  1. There’s actually an Accept header present
  2. Said header contains image/webp in it somewhere (via RegEx)

If either check fails, it exits immediately with null. Otherwise, it replaces the .png with .webp and returns a 302 Found to serveAsset, which will spit this right back out. If your browser can handle WebP images and didn’t request one, it’ll be redirected to the WebP version.
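
Put together, an upgrade-capable request and the redirect it gets back look roughly like this (trimmed to the relevant headers):

GET https://teknikaldomain.me/cdn?fetch=stickers/td-mdp.png HTTP/1.1
Accept: image/webp,image/apng,image/*,*/*;q=0.8

HTTP/1.1 302 Found
Location: https://teknikaldomain.me/cdn?fetch=stickers/td-mdp.webp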

After all this is done, we return to serveAsset.

serveAsset

async function serveAsset(event) {
    const request = event.request
    const reqHead = request.headers
    const url = new URL(request.url)
    const fields = url.searchParams
    let item = fields.get('fetch')
    const attach = fields.get('attach') == null ? 'inline' : 'attachment' // 'attachment' if it exists, 'inline' otherwise

    // Preconditions
    // fetch param actually specified
    if (item === null) {
        return new Response(null, { status: 400 })
    }

    // If requesting a PNG and the 'noupgrade' param is absent, check upgrade capability
    const itemTypeRaw = item.split('.')
    const itemType = itemTypeRaw[itemTypeRaw.length - 1] // Everything after the final '.'
    if (itemType.toLowerCase() == 'png' && fields.get('noupgrade') === null) {
        const response = checkWebPSupport(url, item, request)
        if (response) {
            return response
        }
    }

    // Do fetch
    const fetchHeaders = new Headers({
        'User-Agent': 'Cloudflare worker CDN-Serve/1.0.0',
    })
    if (reqHead.get('If-None-Match') !== null) {
        fetchHeaders.set('If-None-Match', reqHead.get('If-None-Match'))
    }
    if (reqHead.get('If-Modified-Since') !== null) {
        fetchHeaders.set('If-Modified-Since', reqHead.get('If-Modified-Since'))
    }
    if (reqHead.get('Range') !== null) {
        fetchHeaders.set('Range', reqHead.get('Range'))
    }

    const response = await fetch(
        `https://${BUCKET_NAME}.s3.${BUCKET_REGION}.amazonaws.com/${item}`,
        { headers: fetchHeaders },
    )

    // S3 returns a 403 Forbidden if not found, transform that into a 404 Not Found
    if (response.status == 403) {
        return new Response(null, { status: 404 })
    }

    const respHead = response.headers
    const type = respHead.get('Content-Type') || 'application/octet-stream' // 'octet-stream' is a good default for unknown content
    const itemNameP = item.split('/') // Split item name on path segments (extract raw name later)

    // Set headers
    const headers = new Headers({
        'Cache-Control': `public, max-age=604800`,
        'Content-Type': type,
        'Content-Disposition': `${attach}; filename="${
            itemNameP[itemNameP.length - 1]
        }"`, // Just the file name, not any path before it
        ETag: respHead.get('ETag'),
        'Last-Modified': respHead.get('Last-Modified'),
        'CF-Cache-Status': respHead.get('CF-Cache-Status')
            ? respHead.get('CF-Cache-Status')
            : 'UNKNOWN',
        Date: respHead.get('Date'),
        'Accept-Ranges': 'bytes',
    })
    if (respHead.get('Age') !== null) {
        headers.set('Age', respHead.get('Age'))
    }
    if (response.status == 206) {
        headers.set('Content-Range', respHead.get('Content-Range'))
    }

    return new Response(response.body, { ...response, headers })
}

This is kinda the main entry point of the program, not counting the Cloudflare boilerplate.

A lot of variables are set up immediately: extracting data from the URL, parsing bits out, the works. Only the fetch param (which is assigned to item) is actually required, and it is explicitly checked for absence; a missing value returns a 400 Bad Request.

Next, if you’re requesting a PNG and there is no noupgrade parameter in the URL, it calls checkWebPSupport as discussed above. That may return a new Response; if it does, it gets returned immediately, since it’s a redirect.

After this is the actual resource fetch. Before fetching, we set the headers used. Besides setting a custom User-Agent so it shows up better in my logs, your browser’s If-None-Match, If-Modified-Since, and Range headers are passed through. The first two allow for conditional requesting, where S3 can return a 304 Not Modified if you already have the correct resource, saving bandwidth (and transfer costs for me!) where it’s not needed. The last one, Range, allows for requesting only parts of a file, say, to resume a long download.
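
As a quick sketch of what those conditional headers buy (the ETag value and bucket name here are made up), S3 can answer a repeat request with a body-less 304 instead of re-sending the whole object:

GET /image.png HTTP/1.1
Host: data-example.s3.us-east-2.amazonaws.com
If-None-Match: "9b2cf535f27731c974343645a3985328"

HTTP/1.1 304 Not Modified
ETag: "9b2cf535f27731c974343645a3985328"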

Afterwards, fetch the result from S3. If S3 returned a 403 Forbidden, convert that to a 404 Not Found: S3 returns 403 if the object (file) you requested does not exist, and 404 if the bucket you requested does not exist, which is weird. After that, set the Content-Type header to the one S3 returned (or application/octet-stream as a fallback, which is basically a “no format” response). Responses are given a cache lifetime of one week, as they are unlikely to change. A few important headers are set here, like Content-Disposition (download automatically or not, and the filename to save as), and the ETag, Date, and Last-Modified headers, which can be used to save on fetches by allowing a client to make requests with If-None-Match or If-Modified-Since, as discussed above. The ETag is an opaque header (meaning that, for you, it’s just a label; you don’t know why it is what it is) that identifies the resource: if two ETags are identical, the resources are identical. So, by requesting a certain ETag, we can be a bit more precise than relying on the modification date. (ETags are being simplified here just a little.)

After this, if the response was a 206 Partial Content, make sure to pass through the Content-Range header so the browser knows what the actual requested range was.

Finally, return the result.

The end result of all this processing? Cloudflare’s cache is essentially a CDN now, backed by storage from Amazon S3, all because of about 1 KiB (!) of server-side code (after packing and minification).

Any calls to fetch() are cached in Cloudflare’s network cache, meaning I don’t even have to worry about that myself; it takes care of it for me.

Issues

Security

The only real “security” issue here is that (part of) the value of the fetch param appears in the output Content-Disposition header. In theory you could do some damage if there were a way to send newlines into that, but neither Cloudflare nor Amazon is particularly welcoming to attempts to access files with newlines in their names, and you’ll get errors back long before anything malicious could happen.

And even then, let’s say that you do find a way… congratulations, you can now inject code into… yourself? That data never goes anywhere near the rest of this, and pretty much gets bounced right back out to you, or anyone else who clicks a link. Meaning you’d have a malicious link to spread… and it’s not like we haven’t seen those before. Yes, I try not to add to the list of exploitable sites, but right now I’m going to take the slightly lazy way and say that unless something changes, errors will happen because a filename contains a CRLF, and that’s the protection that takes.. exactly no effort on my part.
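
That said, if I ever wanted the belt-and-suspenders version, the fix would only be a couple of lines in serveAsset. Something like this sketch (safeName is hypothetical, not in the deployed worker):

// Hypothetical hardening: strip CR, LF, and double quotes from the file name
// before it gets interpolated into the Content-Disposition header value
const safeName = itemNameP[itemNameP.length - 1].replace(/[\r\n"]/g, '')
headers.set('Content-Disposition', `${attach}; filename="${safeName}"`)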

Addendum

Fun fact: here’s what the Cloudflare Workers dashboard reports this worker’s code as:

!function(e){var t={};function n(r){if(t[r])return t[r].exports;var o=t[r]={i:r,l:!1,exports:{}};return e[r].call(o.exports,o,o.exports,n),o.l=!0,o.exports}n.m=e,n.c=t,n.d=function(e,t,r){n.o(e,t)||Object.defineProperty(e,t,{enumerable:!0,get:r})},n.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},n.t=function(e,t){if(1&t&&(e=n(e)),8&t)return e;if(4&t&&"object"==typeof e&&e&&e.__esModule)return e;var r=Object.create(null);if(n.r(r),Object.defineProperty(r,"default",{enumerable:!0,value:e}),2&t&&"string"!=typeof e)for(var o in e)n.d(r,o,function(t){return e[t]}.bind(null,o));return r},n.n=function(e){var t=e&&e.__esModule?function(){return e.default}:function(){return e};return n.d(t,"a",t),t},n.o=function(e,t){return Object.prototype.hasOwnProperty.call(e,t)},n.p="",n(n.s=0)}([function(e,t){addEventListener("fetch",e=>{e.respondWith(async function(e){const t=e.request;if("GET"===t.method||"HEAD"===t.method){let t=await async function(e){const t=e.request,n=t.headers,r=new URL(t.url),o=r.searchParams;let s=o.get("fetch");const a=null==o.get("attach")?"inline":"attachment";if(null===s)return new Response(null,{status:400});const u=s.split(".");if("png"==u[u.length-1].toLowerCase()&&null===o.get("noupgrade")){const e=function(e,t,n){const r=String(e),o=n.headers.get("Accept");if(null!==o&&null!==/image\/webp/.exec(o)){let e=r.replace(/\.png$/,".webp");return new Response(null,{status:302,headers:{Location:e}})}return null}(r,0,t);if(e)return e}const l=new Headers({"User-Agent":"Cloudflare worker CDN-Serve/1.0.0"});null!==n.get("If-None-Match")&&l.set("If-None-Match",n.get("If-None-Match"));null!==n.get("If-Modified-Since")&&l.set("If-Modified-Since",n.get("If-Modified-Since"));null!==n.get("Range")&&l.set("Range",n.get("Range"));const i=await fetch(`https://${BUCKET_NAME}.s3.${BUCKET_REGION}.amazonaws.com/${s}`,{headers:l});if(403==i.status)return new Response(null,{status:404});const c=i.headers,f=c.get("Content-Type")||"application/octet-stream",g=s.split("/"),d=new Headers({"Cache-Control":"public, max-age=604800","Content-Type":f,"Content-Disposition":`${a}; filename="${g[g.length-1]}"`,ETag:c.get("ETag"),"Last-Modified":c.get("Last-Modified"),"CF-Cache-Status":c.get("CF-Cache-Status")?c.get("CF-Cache-Status"):"UNKNOWN",Date:c.get("Date"),"Accept-Ranges":"bytes"});null!==c.get("Age")&&d.set("Age",c.get("Age"));206==i.status&&d.set("Content-Range",c.get("Content-Range"));return new Response(i.body,{...i,headers:d})}(e);return t.status>399&&(t=new Response("Error fetching resource from CDN: "+t.statusText,{status:t.status})),t}return new Response(null,{status:405})}(e))})}]);

…Let’s beautify that:

! function(e) {
    var t = {};

    function n(r) {
        if (t[r]) return t[r].exports;
        var o = t[r] = {
            i: r,
            l: !1,
            exports: {}
        };
        return e[r].call(o.exports, o, o.exports, n), o.l = !0, o.exports
    }
    n.m = e, n.c = t, n.d = function(e, t, r) {
        n.o(e, t) || Object.defineProperty(e, t, {
            enumerable: !0,
            get: r
        })
    }, n.r = function(e) {
        "undefined" != typeof Symbol && Symbol.toStringTag && Object.defineProperty(e, Symbol.toStringTag, {
            value: "Module"
        }), Object.defineProperty(e, "__esModule", {
            value: !0
        })
    }, n.t = function(e, t) {
        if (1 & t && (e = n(e)), 8 & t) return e;
        if (4 & t && "object" == typeof e && e && e.__esModule) return e;
        var r = Object.create(null);
        if (n.r(r), Object.defineProperty(r, "default", {
                enumerable: !0,
                value: e
            }), 2 & t && "string" != typeof e)
            for (var o in e) n.d(r, o, function(t) {
                return e[t]
            }.bind(null, o));
        return r
    }, n.n = function(e) {
        var t = e && e.__esModule ? function() {
            return e.default
        } : function() {
            return e
        };
        return n.d(t, "a", t), t
    }, n.o = function(e, t) {
        return Object.prototype.hasOwnProperty.call(e, t)
    }, n.p = "", n(n.s = 0)
}([function(e, t) {
    addEventListener("fetch", e => {
        e.respondWith(async function(e) {
            const t = e.request;
            if ("GET" === t.method || "HEAD" === t.method) {
                let t = await async function(e) {
                    const t = e.request,
                        n = t.headers,
                        r = new URL(t.url),
                        o = r.searchParams;
                    let s = o.get("fetch");
                    const a = null == o.get("attach") ? "inline" : "attachment";
                    if (null === s) return new Response(null, {
                        status: 400
                    });
                    const u = s.split(".");
                    if ("png" == u[u.length - 1].toLowerCase() && null === o.get("noupgrade")) {
                        const e = function(e, t, n) {
                            const r = String(e),
                                o = n.headers.get("Accept");
                            if (null !== o && null !== /image\/webp/.exec(o)) {
                                let e = r.replace(/\.png$/, ".webp");
                                return new Response(null, {
                                    status: 302,
                                    headers: {
                                        Location: e
                                    }
                                })
                            }
                            return null
                        }(r, 0, t);
                        if (e) return e
                    }
                    const l = new Headers({
                        "User-Agent": "Cloudflare worker CDN-Serve/1.0.0"
                    });
                    null !== n.get("If-None-Match") && l.set("If-None-Match", n.get("If-None-Match"));
                    null !== n.get("If-Modified-Since") && l.set("If-Modified-Since", n.get("If-Modified-Since"));
                    null !== n.get("Range") && l.set("Range", n.get("Range"));
                    const i = await fetch(`https://${BUCKET_NAME}.s3.${BUCKET_REGION}.amazonaws.com/${s}`, {
                        headers: l
                    });
                    if (403 == i.status) return new Response(null, {
                        status: 404
                    });
                    const c = i.headers,
                        f = c.get("Content-Type") || "application/octet-stream",
                        g = s.split("/"),
                        d = new Headers({
                            "Cache-Control": "public, max-age=604800",
                            "Content-Type": f,
                            "Content-Disposition": `${a}; filename="${g[g.length-1]}"`,
                            ETag: c.get("ETag"),
                            "Last-Modified": c.get("Last-Modified"),
                            "CF-Cache-Status": c.get("CF-Cache-Status") ? c.get("CF-Cache-Status") : "UNKNOWN",
                            Date: c.get("Date"),
                            "Accept-Ranges": "bytes"
                        });
                    null !== c.get("Age") && d.set("Age", c.get("Age"));
                    206 == i.status && d.set("Content-Range", c.get("Content-Range"));
                    return new Response(i.body, {...i,
                        headers: d
                    })
                }(e);
                return t.status > 399 && (t = new Response("Error fetching resource from CDN: " + t.statusText, {
                    status: t.status
                })), t
            }
            return new Response(null, {
                status: 405
            })
        }(e))
    })
}]);

I mean yeah, it’s definitely smaller and minified.. though deciphering the exact logic used is left as an exercise for the reader.


  1. Editor’s note: Since writing this post, this isn’t true! Workers K/V, the part that made this possible, is now accessible on the free plan, too! ↩︎