Improving My Site With Cloudflare Workers and Amazon S3
So, as of now, anything much over 1 MB isn’t going to take up space on the devices hosting this site; I’ll just offload it to someone else. And how do I get it back? JavaScript.
No, wait, I’m actually serious. As of now, I essentially have Cloudflare acting as a CDN that’s backed by the Amazon S3 storage service. Posts such as the one about light balance have everything but the featured image stored… not here anymore.
So, let’s start with the basics: what are Amazon S3 and Cloudflare Workers?
Amazon S3
One of the many different services offered under… Amazon Web Services (AWS), S3 (Simple Storage Service) is an object storage system (so, files).
You as a user can create “buckets” that you stuff your stuff into.
If, for example, I had a file named important.zip in a bucket named test, in the Ohio data center, I could access it with the URL https://test.s3.us-east-2.amazonaws.com/important.zip.
Note that there’s no mention of your account in there: all bucket names have to be globally unique, which is why the one I use has a unique prefix.
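The URL scheme is mechanical enough to build by hand; a quick sketch using the example names above:

```javascript
// S3 "virtual-hosted-style" URL: the bucket name becomes a subdomain of
// the regional endpoint, and the object key becomes the path.
const bucket = "test";       // example bucket from above
const region = "us-east-2";  // the Ohio data center
const key = "important.zip"; // the object key (the "file name")

const objectUrl = `https://${bucket}.s3.${region}.amazonaws.com/${key}`;
// → "https://test.s3.us-east-2.amazonaws.com/important.zip"
```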
BIG IMPORTANT NOTE: DO NOT PUT PERIODS IN YOUR BUCKET NAME. data-teknikaldomain is valid, but data.teknikaldomain will cause every fetch to fail with a TLS handshake error: invalid certificate.
The Amazon S3 TLS certificate for, let’s say, the us-east-2 region is for *.s3.us-east-2.amazonaws.com, and that wildcard does not match across multiple subdomain levels, meaning the certificate will not match the name of your bucket. This is also why CloudFront won’t take buckets with dots in their name.
So this is the storage part: I upload files, grant them public read access, and off we go. It only costs around 2.3¢ per GB per month for the standard storage class, which isn’t that bad. But now that I can store stuff elsewhere, I need a way to actually get it back. That’s where the workers come in.
Cloudflare Workers
Cloudflare offers a free service called Workers, which allows you to run JavaScript (or WebAssembly) at their edge, meaning before requests start getting passed through the rest of the Cloudflare network. At the free tier, I get 100,000 worker invocations (so, web requests) per day, each with a limit of 10 ms of CPU time (not wall-clock time) before being aborted.
On the paid plan, you can even serve an entire static content site (like this one!) using nothing but Workers: a completely serverless infrastructure.1 Or, in my case, you can just… grab data.
Workers are given a route, which is basically a really restricted RegEx for a URL; if the URL matches, the worker is executed, and it handles the request instead of it going to the origin server. Workers can perform their own requests, synthesize new responses out of thin air, manipulate Cloudflare’s cache, and… just do a ton of stuff.
Workers can be written in raw JavaScript, JavaScript that gets bundled and packed on upload, or Rust that gets compiled to WebAssembly on upload. You can even take advantage of your favorite Node modules, meaning all sorts of cool stuff.
So, workers can intercept a request, pretty much at the earliest point, and handle it. So how do I marry these two systems?
The Result
You’ll notice that any file not stored locally now has a URL like this: https://teknikaldomain.me/cdn?fetch=image.png.
These URLs can take a few query parameters, but the most important one is fetch, which is the path to the file in the S3 bucket.
Here’s the code that gets run the moment you request that, aptly named cdn-serve:
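The readable source didn’t survive here, but the packed code in the Addendum boils down to the standard Workers entry point, sketched as (handleRequest is covered next):

```javascript
// Standard Workers boilerplate: every incoming request fires a "fetch"
// event, and respondWith() hands the (async) handler's Response back.
const fetchHandler = (event) => {
  event.respondWith(handleRequest(event));
};

// addEventListener is a global in the Workers runtime; the guard just
// lets this snippet also load outside that runtime.
if (typeof addEventListener === "function") {
  addEventListener("fetch", fetchHandler);
}
```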
The first bit of this code, the addEventListener call, is complete boilerplate.
The rest though… let’s take it piece by piece.
handleRequest
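De-minified from the packed code in the Addendum, it looks roughly like this:

```javascript
async function handleRequest(event) {
  const request = event.request;

  // Only GET and HEAD make sense for a read-only CDN endpoint.
  if (request.method !== "GET" && request.method !== "HEAD") {
    return new Response(null, { status: 405 });
  }

  let response = await serveAsset(event);

  // Turn any 4xx/5xx into a plain-text error message,
  // passing the status code through.
  if (response.status > 399) {
    response = new Response(
      "Error fetching resource from CDN: " + response.statusText,
      { status: response.status }
    );
  }
  return response;
}
```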
There’s not too much logic here before things are handed off to serveAsset, but it does check one thing before and one thing after.
Before going to serveAsset, it makes sure that only GET and HEAD requests are allowed; anything else gets a 405 Method Not Allowed.
After fetching, if the response is an error (either 4xx or 5xx), it passes through the status code but responds with an error message as the body.
In the future, I might do something like request a special error page and load that in instead, but that’s beyond the scope of this right now.
checkWebPSupport
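Reconstructed from the packed code, the helper looks something like this (the unused middle parameter is preserved from the minified source):

```javascript
function checkWebPSupport(url, _unused, request) {
  const target = String(url);
  const accept = request.headers.get("Accept");

  // Redirect only when there IS an Accept header
  // and it mentions image/webp somewhere.
  if (accept !== null && /image\/webp/.exec(accept) !== null) {
    const webpTarget = target.replace(/\.png$/, ".webp");
    return new Response(null, {
      status: 302,
      headers: { Location: webpTarget },
    });
  }
  return null;
}
```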
This one is “simple” in its purpose, but its implementation is a little weird.
When your browser requests a resource, it sends over an Accept header to the server to announce what content types it is willing to handle in return.
You can see it in this sample request:
GET https://teknikaldomain.me/cdn?fetch=stickers/td-mdp.png HTTP/1.1
Host: teknikaldomain.me
Accept: image/webp,image/apng,image/*,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Sec-Fetch-Dest: image
Sec-Fetch-Mode: no-cors
Sec-Fetch-Site: cross-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36
So what happens is that the function checks for two things:
- There’s actually an Accept header present
- Said header contains image/webp in it somewhere (matched via RegEx)
If either check fails, it exits immediately with null.
Otherwise, it replaces the .png with .webp, and returns a 302 Found to serveAsset, which will spit this right back out.
So if your browser can handle WebP images but asked for a PNG, it’ll be redirected to the WebP version.
After all this is done, we return to serveAsset.
serveAsset
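Again de-minified from the Addendum code, roughly (BUCKET_NAME and BUCKET_REGION are the worker’s environment bindings, and checkWebPSupport is the helper above; the status pass-through at the end is written out explicitly here):

```javascript
async function serveAsset(event) {
  const request = event.request;
  const reqHeaders = request.headers;
  const url = new URL(request.url);
  const params = url.searchParams;

  const item = params.get("fetch"); // path within the S3 bucket
  const disposition = params.get("attach") == null ? "inline" : "attachment";
  if (item === null) return new Response(null, { status: 400 });

  // PNG requests may be upgraded to WebP, unless ?noupgrade is given.
  const extParts = item.split(".");
  if (extParts[extParts.length - 1].toLowerCase() === "png" &&
      params.get("noupgrade") === null) {
    const redirect = checkWebPSupport(url, 0, request);
    if (redirect) return redirect;
  }

  // Forward the browser's conditional / range headers to S3.
  const s3ReqHeaders = new Headers({
    "User-Agent": "Cloudflare worker CDN-Serve/1.0.0",
  });
  for (const name of ["If-None-Match", "If-Modified-Since", "Range"]) {
    if (reqHeaders.get(name) !== null) s3ReqHeaders.set(name, reqHeaders.get(name));
  }

  const s3Response = await fetch(
    `https://${BUCKET_NAME}.s3.${BUCKET_REGION}.amazonaws.com/${item}`,
    { headers: s3ReqHeaders }
  );

  // S3 says 403 for a missing object; present that as a 404 instead.
  if (s3Response.status === 403) return new Response(null, { status: 404 });

  const s3Headers = s3Response.headers;
  const contentType = s3Headers.get("Content-Type") || "application/octet-stream";
  const pathParts = item.split("/");
  const outHeaders = new Headers({
    "Cache-Control": "public, max-age=604800", // one week
    "Content-Type": contentType,
    "Content-Disposition": `${disposition}; filename="${pathParts[pathParts.length - 1]}"`,
    "ETag": s3Headers.get("ETag"),
    "Last-Modified": s3Headers.get("Last-Modified"),
    "CF-Cache-Status": s3Headers.get("CF-Cache-Status") || "UNKNOWN",
    "Date": s3Headers.get("Date"),
    "Accept-Ranges": "bytes",
  });
  if (s3Headers.get("Age") !== null) outHeaders.set("Age", s3Headers.get("Age"));
  if (s3Response.status === 206) {
    outHeaders.set("Content-Range", s3Headers.get("Content-Range"));
  }

  // Pass S3's status (e.g. 304, 206) through with our own headers.
  return new Response(s3Response.body, {
    status: s3Response.status,
    statusText: s3Response.statusText,
    headers: outHeaders,
  });
}
```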
This is kinda the main entry point of the program, not counting the Cloudflare boilerplate.
A lot of variables are set up immediately, extracting data from the URL, parsing bits out, the works.
Only the fetch param (which is assigned to item) is actually required, and its absence is explicitly checked: if it’s missing, you get a 400 Bad Request.
Next, if you’re requesting a PNG and there is no noupgrade parameter in the URL, it calls checkWebPSupport as discussed above.
That may return a new Response; if it does, it’s a redirect, and it’s returned immediately.
After this is the actual resource fetch.
Before fetching, we set the headers used.
Besides setting a custom User-Agent so the worker shows up better in my logs, your browser’s If-None-Match, If-Modified-Since, and Range headers are passed through.
The first two allow for conditional requests, where S3 can return a 304 Not Modified if you already have the current version of the resource, saving bandwidth (and transfer costs for me!) where a transfer isn’t needed.
The last one, Range, allows for requesting only part of a file, say, to resume a long download.
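To illustrate the client side of a conditional request (this is not part of the worker; the URL and ETag are made up, and the fetch implementation is a parameter purely so the sketch is self-contained):

```javascript
// Revalidate a cached resource using its ETag. A 304 means the cached
// copy is still good and no body was transferred; anything else means
// a fresh copy came back.
async function revalidate(url, cachedETag, fetchFn = fetch) {
  const resp = await fetchFn(url, {
    headers: { "If-None-Match": cachedETag },
  });
  if (resp.status === 304) return "cached copy is still fresh";
  return "fetch returned a new copy";
}
```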
Afterwards, fetch the result from S3.
If S3 returned a 403 Forbidden, convert that to a 404 Not Found.
Weirdly, S3 returns 403 if the object (file) you requested does not exist, and 404 if the bucket you requested does not exist.
After that, set the Content-Type header to the one S3 returned (or application/octet-stream as a fallback, which is basically a “no particular format” response).
Responses are given a cache lifetime of one week, as they are unlikely to change.
A few important headers are set here, like Content-Disposition (whether to automatically download or not, and the filename to save as), and the ETag, Date, and Last-Modified headers, which can be used to save on fetches by allowing a client to make requests with If-None-Match or If-Modified-Since, as discussed above.
The ETag is an opaque header (meaning that, to you, it’s just a label; you don’t know why it is what it is) that identifies the resource.
If two ETags are identical, the resources are identical.
So, by requesting a certain ETag, we can be a bit more precise than relying on the modification date. (ETags are being simplified here just a little.)
After this, if the response was a 206 Partial Content, make sure to pass through the Content-Range header so the browser knows what range was actually returned.
Finally, return the result.
The end result of all this processing? Cloudflare’s cache is essentially a CDN now, backed by storage from Amazon S3, all because of 1 KiB (!) of server-side code (after packing and minification).
Any calls to fetch() are cached in Cloudflare’s network cache, meaning I don’t even have to worry about that myself; it takes care of it for me.
Issues
Security
The only real “security” issue here is that (part of) the value of the fetch param appears in the output Content-Disposition header.
In theory you could do some damage if there were a way to sneak newlines into that, but neither Cloudflare nor Amazon are particularly welcoming to attempts to access files with newlines in their names, and you’ll get errors long before anything malicious could happen.
And even then, let’s say that you do find a way… congratulations, you can now inject code into… yourself? That data never goes anywhere near the rest of this; it pretty much gets bounced right back out to you, or anyone else who clicks a link. Meaning you’d have a malicious link to spread… and it’s not like we haven’t seen those before. Yes, I try not to add to the list of exploitable sites, but right now I’m going to take the slightly lazy way and say that unless something changes, errors will happen because a filename contains a CRLF, and that’s the protection that takes… exactly no effort on my part.
Addendum
Fun fact, here’s what the Cloudflare Workers dashboard reports this worker’s code is:
!function(e){var t={};function n(r){if(t[r])return t[r].exports;var o=t[r]={i:r,l:!1,exports:{}};return e[r].call(o.exports,o,o.exports,n),o.l=!0,o.exports}n.m=e,n.c=t,n.d=function(e,t,r){n.o(e,t)||Object.defineProperty(e,t,{enumerable:!0,get:r})},n.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},n.t=function(e,t){if(1&t&&(e=n(e)),8&t)return e;if(4&t&&"object"==typeof e&&e&&e.__esModule)return e;var r=Object.create(null);if(n.r(r),Object.defineProperty(r,"default",{enumerable:!0,value:e}),2&t&&"string"!=typeof e)for(var o in e)n.d(r,o,function(t){return e[t]}.bind(null,o));return r},n.n=function(e){var t=e&&e.__esModule?function(){return e.default}:function(){return e};return n.d(t,"a",t),t},n.o=function(e,t){return Object.prototype.hasOwnProperty.call(e,t)},n.p="",n(n.s=0)}([function(e,t){addEventListener("fetch",e=>{e.respondWith(async function(e){const t=e.request;if("GET"===t.method||"HEAD"===t.method){let t=await async function(e){const t=e.request,n=t.headers,r=new URL(t.url),o=r.searchParams;let s=o.get("fetch");const a=null==o.get("attach")?"inline":"attachment";if(null===s)return new Response(null,{status:400});const u=s.split(".");if("png"==u[u.length-1].toLowerCase()&&null===o.get("noupgrade")){const e=function(e,t,n){const r=String(e),o=n.headers.get("Accept");if(null!==o&&null!==/image\/webp/.exec(o)){let e=r.replace(/\.png$/,".webp");return new Response(null,{status:302,headers:{Location:e}})}return null}(r,0,t);if(e)return e}const l=new Headers({"User-Agent":"Cloudflare worker CDN-Serve/1.0.0"});null!==n.get("If-None-Match")&&l.set("If-None-Match",n.get("If-None-Match"));null!==n.get("If-Modified-Since")&&l.set("If-Modified-Since",n.get("If-Modified-Since"));null!==n.get("Range")&&l.set("Range",n.get("Range"));const i=await 
fetch(`https://${BUCKET_NAME}.s3.${BUCKET_REGION}.amazonaws.com/${s}`,{headers:l});if(403==i.status)return new Response(null,{status:404});const c=i.headers,f=c.get("Content-Type")||"application/octet-stream",g=s.split("/"),d=new Headers({"Cache-Control":"public, max-age=604800","Content-Type":f,"Content-Disposition":`${a}; filename="${g[g.length-1]}"`,ETag:c.get("ETag"),"Last-Modified":c.get("Last-Modified"),"CF-Cache-Status":c.get("CF-Cache-Status")?c.get("CF-Cache-Status"):"UNKNOWN",Date:c.get("Date"),"Accept-Ranges":"bytes"});null!==c.get("Age")&&d.set("Age",c.get("Age"));206==i.status&&d.set("Content-Range",c.get("Content-Range"));return new Response(i.body,{...i,headers:d})}(e);return t.status>399&&(t=new Response("Error fetching resource from CDN: "+t.statusText,{status:t.status})),t}return new Response(null,{status:405})}(e))})}]);
…Let’s beautify that:
! function(e) {
var t = {};
function n(r) {
if (t[r]) return t[r].exports;
var o = t[r] = {
i: r,
l: !1,
exports: {}
};
return e[r].call(o.exports, o, o.exports, n), o.l = !0, o.exports
}
n.m = e, n.c = t, n.d = function(e, t, r) {
n.o(e, t) || Object.defineProperty(e, t, {
enumerable: !0,
get: r
})
}, n.r = function(e) {
"undefined" != typeof Symbol && Symbol.toStringTag && Object.defineProperty(e, Symbol.toStringTag, {
value: "Module"
}), Object.defineProperty(e, "__esModule", {
value: !0
})
}, n.t = function(e, t) {
if (1 & t && (e = n(e)), 8 & t) return e;
if (4 & t && "object" == typeof e && e && e.__esModule) return e;
var r = Object.create(null);
if (n.r(r), Object.defineProperty(r, "default", {
enumerable: !0,
value: e
}), 2 & t && "string" != typeof e)
for (var o in e) n.d(r, o, function(t) {
return e[t]
}.bind(null, o));
return r
}, n.n = function(e) {
var t = e && e.__esModule ? function() {
return e.default
} : function() {
return e
};
return n.d(t, "a", t), t
}, n.o = function(e, t) {
return Object.prototype.hasOwnProperty.call(e, t)
}, n.p = "", n(n.s = 0)
}([function(e, t) {
addEventListener("fetch", e => {
e.respondWith(async function(e) {
const t = e.request;
if ("GET" === t.method || "HEAD" === t.method) {
let t = await async function(e) {
const t = e.request,
n = t.headers,
r = new URL(t.url),
o = r.searchParams;
let s = o.get("fetch");
const a = null == o.get("attach") ? "inline" : "attachment";
if (null === s) return new Response(null, {
status: 400
});
const u = s.split(".");
if ("png" == u[u.length - 1].toLowerCase() && null === o.get("noupgrade")) {
const e = function(e, t, n) {
const r = String(e),
o = n.headers.get("Accept");
if (null !== o && null !== /image\/webp/.exec(o)) {
let e = r.replace(/\.png$/, ".webp");
return new Response(null, {
status: 302,
headers: {
Location: e
}
})
}
return null
}(r, 0, t);
if (e) return e
}
const l = new Headers({
"User-Agent": "Cloudflare worker CDN-Serve/1.0.0"
});
null !== n.get("If-None-Match") && l.set("If-None-Match", n.get("If-None-Match"));
null !== n.get("If-Modified-Since") && l.set("If-Modified-Since", n.get("If-Modified-Since"));
null !== n.get("Range") && l.set("Range", n.get("Range"));
const i = await fetch(`https://${BUCKET_NAME}.s3.${BUCKET_REGION}.amazonaws.com/${s}`, {
headers: l
});
if (403 == i.status) return new Response(null, {
status: 404
});
const c = i.headers,
f = c.get("Content-Type") || "application/octet-stream",
g = s.split("/"),
d = new Headers({
"Cache-Control": "public, max-age=604800",
"Content-Type": f,
"Content-Disposition": `${a}; filename="${g[g.length-1]}"`,
ETag: c.get("ETag"),
"Last-Modified": c.get("Last-Modified"),
"CF-Cache-Status": c.get("CF-Cache-Status") ? c.get("CF-Cache-Status") : "UNKNOWN",
Date: c.get("Date"),
"Accept-Ranges": "bytes"
});
null !== c.get("Age") && d.set("Age", c.get("Age"));
206 == i.status && d.set("Content-Range", c.get("Content-Range"));
return new Response(i.body, {...i,
headers: d
})
}(e);
return t.status > 399 && (t = new Response("Error fetching resource from CDN: " + t.statusText, {
status: t.status
})), t
}
return new Response(null, {
status: 405
})
}(e))
})
}]);
I mean, yeah, it’s definitely smaller and minified… though deciphering the exact logic used is left as an exercise for the reader.
1. Editor’s note: Since writing this post, this isn’t true! Workers KV, the part that made this possible, is now accessible on the free plan, too! ↩︎