Teknikal's_Domain

#<NTA:NnT:SSrgS:H6.6-198:W200-90.72:CBWg>

TDNET 2.0: the New Homelab (Part 2)

2021-02-11 25 min read
Part 2 of 2

Now, this is the second part of a two-part post. The first part covered the tech and background, and this one is the tour. So, let’s begin, running this one front-to-back.

Physical Hosts

These are all real-metal machines living in my room (for the time being).

Entry: gateway (pfSense)

My main firewall / router here is a pfSense install running in a little mini-ITX box with an extra NIC expansion card, giving it two NICs total. This thing isn’t the most powerful (it’s definitely not the least), but it’s plenty for my current needs. pfSense handles the actual connection routing and NAT, and firewalls my systems from the rest of the internet. Most of my NAT rules are made of aliases and tables, making it easy to extend them later if I expand my service set.

Also, pfSense handles internal certificates, meaning my internal S/MIME playground,¹ Zabbix certs (more on that later), and other similar things are all managed here.

This is also what handles all my VPNs, of which there are three configured at the moment: OpenVPN, which is what I use for almost everything; IPsec with a mobile client config, since my phone natively supports IPsec VPNs; and a WireGuard instance that’s not used all that often, but is there more for testing.

Also, funnily enough, one of the front-panel USB ports is being used to power my Pi.

DNS and other tiny features: singularity

A tiny little Raspberry Pi 3B+ plugged into the front of gateway (PoE hat coming soon, maybe), whose main job, as you might guess from the hardware and hostname, is to run Pi-Hole. I’ve also played around with offloading some other relatively lightweight things onto here, like the Borg endpoints, or the Salt master.

Auxiliary storage: filecube (ReadyNAS)

Yeah, that little brick I had to rip a button off the PCB to get working. Still got 12 TB of storage, still exporting everything over NFS, still IP-locked, but it still has better performance in this network than SMB does. For those unaware of how this NAS fits into the system, it’s my bulk data storage. Movies, Syncthing data, Nextcloud, and a few other pieces (like Proxmox VM/CT backups) are all stored on this. Pretty simple, really.

There’s an external USB 3.0 hard drive that all my shares back up to weekly (minus the media share, I don’t care if I lose all my movies), and then I run occasional jobs that push all the data over to the AWS Storage Gateway over, yes, you guessed it, NFS, which will upload it all to Amazon S3 and transition into Glacier tier storage. That external drive exists as my “grab and go” box, I can rebuild the entire network from there if I have the data, given… comparable hardware.

Main Hypervisor: centproc / centcom (Dell R710 / Proxmox VE)

Literally everything else happens in here, really. As stated before, I’m on a container architecture now, and the only true virtual machine in this network is an AWS storage gateway (which we’ll get to later). It’s also got an iDRAC card, allowing for some remote management (remote console is completely broken though) and monitoring (via IPMI). Somehow not the noisiest piece of equipment (that award goes to power), but definitely second noisiest. It has just around 120 GB of RAM, and on the average day I’m using about one third of that. The unit itself has 8 TB of usable RAID 5 space for containers, which I really do need to move to SSDs: the IOWait times on that array can often spike into literal minutes during heavy load, “heavy” meaning “I joined a Matrix chatroom.”

And just for fun (not really), every week it does a full container backup to filecube, meaning that every week, every single container I have is capable of being instantly restored to its previous state at the push of a button (and what’s effectively a container reboot).

For the curious, centproc (Central processor) is the actual hypervisor’s hostname, and centcom (Central Command) is the name given to the iDRAC’s IP, which itself is a name lifted from SS13.

No brownouts, no downtime: power

Aptly named, I know. This is the Eaton UPS sitting right next to centproc, and is what everything is connected to. Not all devices are connected to battery sockets, some only get surge protection, but everything is, in one form or another, powered through here. I also have a fair bit of management available through a web management card, which I might write about sometime soon, we will see, because there’s a pretty good story with that tiny little thing.

Too many cables! nethub

And you thought that filecube was the only piece of Netgear equipment that I had? Not exactly. The true reason for all the spaghetti wiring around here is this Netgear-brand 16 port, gigabit, managed, PoE capable, and not-half-bad switch, which I have talked about before, along with that battery.

Virtual Hosts

None of these are actual hardware. All of them, minus one, are LXC containers running on centproc.

Core hosts

Consider these hosts the “critical path”: either they’re providing services, or they’re infrastructure to provide said services. In other words, these can’t be played with completely willy-nilly, because taking one of these down would have some noticeable effects.

GitLab: gitlab

This one likely doesn’t need much explanation. This is where, well… yeah I can’t exactly remember what software package lives on here, otherwise this one isn’t too special.

Okay, there is one thing that’s special: I’m using Amazon S3 as the backing store for Git LFS objects, but besides that, yeah, nothing special.

General Version Control Services: gen

Gen is the only one on this network that allows shell access onto the machine, in an extremely sandboxed manner (read: Docker), but this allows access to the rest of the VCS services that I’m running: CVS, Subversion, BitKeeper, darcs, GNU Bazaar, Monotone, Mercurial, and Fossil. Unfortunately, not all of them are really meant to be hosted the way that I’m running them, meaning that I am really thankful that all of them support connections over SSH, one way or another.

I also have Apache set up providing web interfaces for CVS, Subversion, darcs, and Mercurial. BitKeeper and Fossil both provide a web interface by themselves, and, if your user agent and host are right, Apache will allow read-only HTTP access to a repository if you give it permission to.

And about that shell access: not only is that controlled by LDAP, but it’s also possible to upload your SSH key into your LDAP object, which it will indeed pick up on. I also, for extra fun, have 2FA enabled. If you decide to generate an HOTP or TOTP key with the google-authenticator command, any HOTP / TOTP app can grab the key and be usable as 2FA. It can even generate QR codes on the CLI to scan!
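For anyone wondering what an HOTP / TOTP app actually does with that key, here is a minimal standard-library sketch of the two algorithms as specified in RFC 4226 and RFC 6238. It is not the google-authenticator code itself, and the function names are made up for illustration.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226), HMAC-SHA1 variant."""
    # HMAC the 8-byte big-endian counter with the shared secret.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time-step counter."""
    return hotp(secret, int(unix_time) // step, digits)
```

Calling `totp(secret, time.time())` on both the server and the phone gives the same code as long as the clocks agree to within one 30-second step, which is why no network connection is needed on the app side.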

Self-hosted mail: mail

If I said Postfix and Dovecot, would you be surprised? Postfix is set up with a load of anti-spam and security measures on it, and Dovecot has Sieve ready for use. Both of them will reference LDAP for your user accounts, as well as email addresses you’re authorized to send from, and what email addresses are addressed to you.

Some of those security and anti-spam measures include the holy triplet of email (SPF, DKIM, and DMARC) with outgoing reporting all configured. MTA-STS and TLSRPT are configured on my domain, though reporting and checking MTA-STS on outgoing mail are still WIP. But hey, I have DANE checking, and valid DANE records, with a rollover mechanism in place.

Additionally, SpamAssassin backed by Redis is handling spam filtering, with a pretty heavily trained Bayes filter, and the “Junk” button in the webmailer (Roundcube) will automatically classify said message as spam.

Speaking of, I’m using Roundcube as the webmailer with a pretty okay plugin set, complete with PGP support, Thunderbird labels, showing DKIM status on messages, all sorts of fun stuff.

Honestly, say what you like, but I will claim that I’m more secure and less accepting of spam than Gmail, and I’ll stick by that.

Integrated IRC server: irc

Pretty simple and pretty dormant, actually. UnrealIRCd with a tiny bit of custom config, as well as Anope services, all under the KiwiIRC WebIRC interface, assuming you don’t want to use your own IRC client.

Again, a rather simple machine. And, fun fact, the last four here (gitlab, gen, mail, and irc) are the original four that I actually started with.

HTTP routing, TLS-terminating network edge: border

HAProxy that handles like 80% of all network traffic, and 100% of all HTTP traffic. I have HAProxy running TLS termination, meaning this is the first stop once your traffic reaches my network (after going through gateway), and is the last one where it’s encrypted. Maybe a little insecure, but it makes things a lot easier to manage: I don’t have to figure out all the weird certificate deployments, and can just have HAProxy handle the HTTPS side. This also means that this is the one machine that handles certificate generation and distribution (to mail for SMTP / IMAP and irc for… IRC), and part of that periodic script will also update the DANE records for the mail server.

Some traffic, like SMTP and SSH, just bypasses HAProxy completely, usually for a lack of understanding the PROXY protocol, and needing the client’s IP address. But since all HTTP traffic goes through here, it does a lot of Host header-based routing.
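As a rough illustration of what Host header-based routing boils down to, here is a small sketch of prefix matching on the Host header. It is not HAProxy’s implementation, and the prefixes, backend names, and example hostnames are all hypothetical.

```python
def pick_backend(host_header: str, prefixes: dict[str, str], default: str) -> str:
    """Return the backend whose prefix the Host header begins with.

    Lowercasing first makes the match case-insensitive; the first
    matching prefix (in insertion order) wins.
    """
    host = host_header.strip().lower()
    for prefix, backend in prefixes.items():
        if host.startswith(prefix):
            return backend
    return default


# Hypothetical routing table: Host prefix -> backend name.
ROUTES = {"blog.": "blog", "files.": "files", "gitlab.": "gitlab"}
```

With this table, `pick_backend("files.example.com", ROUTES, "blog")` lands on the `files` backend, and anything unrecognized falls through to the default.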

Matrix homeserver: matrix

This one runs Synapse, but I will transition over to Dendrite once it’s stable and good. This container has actually gone through quite a few resource limit changes, since Synapse is quite memory hungry, especially when joining even small to medium sized rooms. I’m running it off PostgreSQL, but unfortunately, just due to the latency of the RAID array that it’s on, some things are way too slow. And it seems that until we get something more performant (like Dendrite), or I can upgrade literally the entire array in here to SSDs, I’m stuck with my kinda slow homeserver.

Oh, and don’t worry if you don’t know what Matrix is. All I’ll say for right now is that it’s a chat protocol, my favorite chat protocol, but there’s a post coming up in a few days that talks about it.

This: blog

Honestly this is likely one of the simplest ones here, consisting of basically 4 programs:

• NGINX
• Git
• at
• Hugo

Git to pull content from the repository, Hugo to generate the actual HTML pages, NGINX to serve them, and then at to schedule posts. See, I tend to write these in batches, and Hugo provides three different date fields to play with:

• date, which affects the date displayed on the post, and its sort order
• publishDate, which is the date after which the post will be included in the generated content
• expiryDate, which is the date after which the post will be removed from the generated content again

So what I generally do, after writing one, is give it a publishDate of 8 AM (EST) on the next day that has no post. When I run my update script, it will look at all the posts with those dates (hugo list future), and will schedule jobs to re-run Hugo at that time, using at. Meaning that once I’ve written a batch of 4 or 5 posts, I can just assign them sequential publishDates and then the server handles including them all automatically.
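A rough sketch of that scheduling step might look like the following. It is not the actual update script: it assumes the listing is CSV with a header row that includes a publishDate column (check what your Hugo version prints), that the server’s local timezone matches the offsets in the dates, and that a plain `hugo` run is the rebuild command. The function names are made up.

```python
import csv
import io
import subprocess
from datetime import datetime


def at_timestamps(listing: str) -> list[str]:
    """Turn a CSV listing of future posts into unique `at -t` timestamps."""
    stamps = set()
    for row in csv.DictReader(io.StringIO(listing)):
        when = datetime.fromisoformat(row["publishDate"])
        # `at -t` takes CCYYMMDDhhmm, interpreted in the machine's local time.
        stamps.add(when.strftime("%Y%m%d%H%M"))
    return sorted(stamps)


def schedule_rebuilds(listing: str, command: str = "hugo") -> None:
    """Queue one rebuild per distinct publish time (needs `at` installed)."""
    for stamp in at_timestamps(listing):
        # `at` reads the commands to run from standard input.
        subprocess.run(["at", "-t", stamp], input=command + "\n",
                       text=True, check=True)
```

De-duplicating the timestamps means two posts sharing a publish time only trigger one rebuild.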

Giant headache — User account control: ldap

Amazingly, I got this working! I know the last time I talked about this, I said it was, well, a giant headache, but now I’ve gotten it working relatively well. Most public services that I have here, meaning stuff provided by gitlab, gen, mail, mediacenter, and files, are all controlled by LDAP. Even better, I have used groupOfNames objects to do a really cool trick: easily specifying which services a user has access to! It’s pretty simple, I have a group like, say, cn=mail,ou=services,ou=groups, and I can add or remove users to/from that group. When the mail server wants to check a user, it has the following search filter:²

(&(objectClass=inetOrgPerson)(memberOf=cn=mail,ou=services,ou=groups,dc=tdstoragebay,dc=com))


The first condition is that you have the inetOrgPerson class, which brings in some required attributes like cn and sn. The second is that you are a memberOf the named group there, which is an extension that the LDAP server supports. Thus, if you aren’t in the group, you literally do not exist to the service that’s looking.
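Since every service uses the same filter shape with only the group name changing, generating it is a one-liner. Here is a small sketch that builds the simplified filter shown above; the helper name and the name check are illustrative assumptions, not part of the real setup.

```python
BASE_DN = "dc=tdstoragebay,dc=com"
SERVICE_GROUPS = "ou=services,ou=groups"


def service_filter(service: str) -> str:
    """Build the 'is this user allowed to use <service>' search filter."""
    # Keep group names boring so nothing needs LDAP filter or DN escaping.
    if not service.isalnum():
        raise ValueError(f"unexpected service name: {service!r}")
    group_dn = f"cn={service},{SERVICE_GROUPS},{BASE_DN}"
    return f"(&(objectClass=inetOrgPerson)(memberOf={group_dn}))"
```

Swapping `mail` for another service name gives the filter to paste into that service’s LDAP configuration.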

After figuring out that one thing (and I knew I was not the only person that would want to do this), the rest all fell into place. Now I’ve got a fully working setup, and all I need to do is set the appropriate search filters in any application that needs to use it (which for gen, using LDAP to control logins, was a right pain).

TV on my time: mediacenter

This one runs a few different services on it at once. The first is Jellyfin, which is an open source fork of Emby, which itself started out as an open source alternative to Plex. See, some people didn’t like how Plex was getting pretty greedy, as well as their privacy policy basically saying “we reserve the right to share your watch habits of your own data, if not parts of your data,” so the not-perfectly-compatible, open source Emby picked up a following, and people were happy. But then Emby got greedy, wanting you to pay for things like watching with any clients other than a web browser, and LDAP support, and also went closed-source, so someone created a not-perfectly-compatible fork called Jellyfin, which is open source, and people were happy. Until Jellyfin gets greedy…

Let’s not think of that right now. I find it funny the cycle that has been going on and, knowing humans, is likely to keep repeating, but Jellyfin is the current iteration of the recommended self-hosted media server. I can throw any movies, TV shows, even my music collection on the NAS share it’s using (in the correct folders), and it will grab all the metadata it can: posters, IMDb ratings, descriptions, cast and crew, similar shows that you have, etc. Even with the little container I’ve assigned it here, it’s still plenty capable of doing live transcoding of media as it’s streaming out to a client.

In addition to that, which is the side that needs a disclaimer, so, disclaimer: I do not support illegal activities. Illegal activities are… illegal, and can carry penalties. What I am about to describe is a setup that is popular but is not one that I am actively using at the moment.

There’s also the pretty holy trio of Sonarr, Radarr, and Transmission. Again, I will reiterate that none of these are being used right now, but for those curious:

• Sonarr is a TV show grabber. You give it some torrent indexers, a torrent client, and tell it “I want this show,” and it’ll scan the indexers, find the files, and send them to your torrent client.
• Radarr is the exact same thing, but for Movies, not TV shows
• Transmission is a pretty popular torrent client, and is capable of being run in a daemon mode, which is how I would prefer to run it, if I was using it.
• Bazarr, the fourth one that’s not well known, is the only piece of totally legally ok software on here, because Bazarr has integrations with a number of subtitle APIs, as well as Sonarr’s and Radarr’s API, meaning that it will automatically attempt to grab subtitles from open subtitle providers if none was found in the download, or the ones in the download weren’t a high enough match.

Sonarr and Radarr have some really cool features too. On the Sonarr side, you can toggle watching for individual seasons, or even individual episodes of a TV series, and it will tell you when new episodes air and download them as they air if you let it. And, common to both of them, you can tell them what resolutions you want to download, and they will only consider those as candidates. If they had to grab a pretty low one last time, they will keep searching to see if they can replace that download with a higher-quality one when it becomes available.

And if there’s anyone else curious: all the media accessible to this server is currently media that I have acquired legally (usually through borrowing digital copies at the library).

Personal cloud: files

Nextcloud, all the way. Being LDAP controlled, I do give out accounts to some of my friends who use it, but it’s mainly there basically for me to just disregard Dropbox, Mega, and Google Drive. Automatic upload means that folders off my phone can be automatically synced (like pictures saved from Reddit), the Joplin plugin makes it really easy to sync and share my Joplin notes, though that’s not a requirement if you just want to synchronize, since it can do that using Nextcloud’s WebDAV connector. I also have tons of other things installed on here, which, not an exhaustive list, includes a Matrix client (Element) that points to matrix by default, some utilities for handling camera raw files, some GPX file handling, and, well, just a lot of “hey that might be useful” stuff. I also have some “external storages” plugged in, like the S3 bucket for this site’s CDN, making it a lot easier to manage than through the AWS console, or the aws CLI.

In addition, Syncthing is also running here (since the NAS is too weak to handle that itself), which is restricted to a different share than Nextcloud. Every device I have has a Syncthing instance running, doing some arbitrary things, like keeping one specific folder up to date across all devices, making sure that my main workstations can access my giant folder of camera raws, or, automatically uploading my phone’s camera roll to the NAS for safe keeping, and maybe later, importing into pics for organization if I like them enough.

And in addition to that, there’s one more service that’s offloaded here because the NAS can’t handle it: BorgBackup. This one isn’t because it’s too weak, it could easily handle that, but some of the requisite packages literally do not exist for a 32-bit ARM CPU. Meaning while it’s not literally too weak, the CPU itself is just not supported. And really, all that means is one login account available over SSH (pubkey authentication only!), and another NAS share with all the Borg data. Every week, a cron job in all my containers will start a series of Borg tasks, creating a few different backups in their respective repositories.

Thing that screams a lot: monitoring

The single highest source of incoming emails as reported by mail, this one has two main jobs: network monitoring, with Zabbix, and Syslog ingest, with Graylog. Zabbix is a very cool, free, almost unlimited monitoring system that’s a little complex to set up but so, so beautiful when it’s working. In Zabbix, “hosts” (devices to monitor) are usually assigned a set of “templates.” These templates contain “items” (individual probes or data points, like “Current swap usage”), and then “triggers” (expressions for generating an alert) can be built off of those items. It’s a pretty hierarchical system (especially when you get into item discovery), but once you learn it and learn what goes where, it’s very powerful. Most of Zabbix’s monitoring comes from a Zabbix agent, a small service running in the background on any given host, either sending metrics to the Zabbix server, or waiting for a request from the Zabbix server to respond to. But in addition to this, say for a NAS that has no agent builds available, it can also monitor using SNMP (including v3), or JMX (Java Management eXtensions, you get 0 guesses as to what can be monitored with this), or IPMI (basically, server stuff). I use the Zabbix agent wherever possible since it’s the most flexible, but I do have some others in use:

• centproc is monitored via IPMI from its iDRAC card, centcom, meaning I have constant readouts of power draw, fan speed, and overall system status. (I also have a Zabbix agent installed on Proxmox for OS-level monitoring too.)
• power and filecube both use SNMP for monitoring, and I’ve imported their custom MIBs, meaning I have all the fancy readouts.
• gen also has a JMX connector, because I do also have Apache Tomcat running, currently to support the PlantUML WAR that gitlab uses for in-document graph rendering, and I occasionally use for building graphs too.
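The host / template / item / trigger hierarchy described above can be pictured with a toy data model. This is only a sketch of the relationships, not Zabbix’s schema or API, and every name in it (including the item key) is made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trigger:
    name: str
    item_key: str                      # the item this trigger is built on
    fires: Callable[[float], bool]     # expression evaluated on the latest value


@dataclass
class Template:
    name: str
    item_keys: list[str]
    triggers: list[Trigger] = field(default_factory=list)


@dataclass
class Host:
    name: str
    templates: list[Template]

    def alerts(self, latest: dict[str, float]) -> list[str]:
        """Evaluate every inherited trigger against the latest item values."""
        return [
            f"{self.name}: {trigger.name}"
            for template in self.templates
            for trigger in template.triggers
            if trigger.item_key in latest and trigger.fires(latest[trigger.item_key])
        ]


# Hypothetical template: one item, one trigger built on top of it.
linux = Template(
    name="Linux basics",
    item_keys=["swap.used.percent"],
    triggers=[Trigger("Swap almost full", "swap.used.percent", lambda v: v > 90)],
)
```

Assigning the same template to a second host gives that host the same items and triggers for free, which is the main payoff of the hierarchy.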

Almost anything on the network here is going to be captured, graphed, and given alert triggers, meaning even when I’m away, the constant email feed keeps me up to date with what’s going on. One note on Zabbix, though, is that you have to do a little balancing with how you structure your Zabbix agent items. Normal (passive) items work by having the server contact the agent and request a particular item, which it will return. You can also specify that an item is an “active” type, whereby the agent periodically asks the server for its active configuration and, on seeing these active items, will then send them to the server itself without needing a request, thus pushing most of the processing load onto the agent’s device, not the server’s device.

And on the other side we have Graylog, which is a Syslog server that’s backed by the power of Elasticsearch. I’ve defined a number of Syslog inputs, one per host, so that I can segregate my extractors per host. I also have a few for the so-called “sidecar”, which in my case would be used for streaming over log file contents as if they were Syslog messages, allowing me to filter log events too. Realistically, just having all my logs in one place to search through is cool enough, but it gets better: I can define “extractors” that will capture a piece of the message and break it out (ahem, extract it) into its own separate field in the event, allowing for separate handling or processing. I can even have Graylog send me alerts when it sees certain messages. The average day sees hundreds of thousands of messages being taken in and processed with exactly no sweat by the Elasticsearch backend.
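Conceptually, an extractor is just “run a pattern over the message, and copy what it captured into new fields.” Here is a small sketch of that idea with a regular expression; it is not Graylog’s extractor engine, and the field names and sample log line are invented for the example.

```python
import re

# Capture a client address of the form "a.b.c.d:port" anywhere in the message.
CLIENT = re.compile(r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<client_port>\d+)")


def extract(event: dict) -> dict:
    """Return a copy of the event with captured groups added as fields."""
    match = CLIENT.search(event["message"])
    if match is None:
        return dict(event)             # nothing to extract, pass it through
    return {**event, **match.groupdict()}


# Made-up log line, loosely shaped like a proxy connection log.
sample = {"source": "border", "message": "haproxy[123]: 203.0.113.7:51234 https~ blog/nginx"}
```

After extraction the event carries `client_ip` and `client_port` as their own fields, so they can be searched, graphed, or alerted on without re-parsing the raw message.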

Would you like to take a guess as to what host generates the most log messages? Once you have made up your mind, you can read the answer here:³

“Rapid” prototyping area: docker

Okay, everyone in this area has at least one thing, somewhere, that has Docker available for use. Besides being where my password manager resides,⁴ it’s also where I put containers while I’m testing them, usually things like OpenLDAP, or SonarQube, or any of those fun things. I’ve also used it to play with other kinds of databases, like neo4j, MongoDB, CouchDB, and OrientDB. Realistically though, this one does next to nothing most of the time, since I’m either spinning up a Docker container with the intent of not keeping it around for long, or, if I decide I will keep it around, it gets its own container to keep it segregated from everything else.

GitLab job runners for pipelines: runner-1, runner-2

There are two here: one runs jobs in a Docker environment, and one does not. This means that if you have jobs that themselves require Docker, the non-Docker one can take the job without needing some DinD shenanigans. Either way, any pipelines or general CI/CD jobs from gitlab are going to be sent here to be dealt with.

SKS PGP Keyserver: keys

I am currently, I kid you not, writing my own HKP-compatible keyserver, which takes a few cues from SKS, some from Hagrid, and some from my own experiences with some related topics. Put simply, it’s a server implementation written in Go that handles the HKP /pks/lookup endpoint, allowing for multiple keys per email / UID, key deletion, and cryptographic key verification, meaning for a key to be made publicly searchable, you are required to sign a message with said key to have it marked as legitimate and searchable.
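For a feel of what handling the /pks/lookup endpoint involves, here is a minimal sketch of parsing a lookup request’s query string, based on the operations named in the HKP draft (get, index, vindex). It is not the Go server described above, and the function name and returned field names are assumptions for the example.

```python
from urllib.parse import parse_qs, urlsplit

VALID_OPS = {"get", "index", "vindex"}


def parse_lookup(url: str) -> dict:
    """Parse an HKP lookup URL into its operation and search term."""
    parts = urlsplit(url)
    if parts.path != "/pks/lookup":
        raise ValueError("not an HKP lookup request")
    query = parse_qs(parts.query)      # also percent-decodes the values
    op = query.get("op", [""])[0]
    search = query.get("search", [""])[0]
    if op not in VALID_OPS:
        raise ValueError(f"unsupported op: {op!r}")
    if not search:
        raise ValueError("missing search term")
    options = query.get("options", [""])[0].split(",")
    return {"op": op, "search": search, "machine_readable": "mr" in options}
```

A search term can be a key ID (prefixed with `0x`) or free text such as an email address, so the server still has to decide how to interpret `search` after this step.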

But since that’s still in development, right now I have it running the old SKS: too old to be modified, with known issues (certificate spamming attack), and currently synchronizing with nobody. Once I finish that development project, though, I’ll be deploying it here, and once I’m comfortable with it, I’ll probably make that keyserver my first ever truly public project given to the world. Nice.

Details of my insanity — Internal bookkeeping: docs

Tiny little BookStack instance that I’m using to document things like my LDAP and SNMP extensions, some internal configuration weirdness, and any pieces of behavior by programs that aren’t obvious, enough so that I feel they’re worth writing down. Ideally, I’d have stuff documented so well that literally anyone with an ounce of smarts would be able to, with that documentation, continue management of the entire network, but… well, since I ain’t handing off ownership to anyone else anytime soon, it exists just so I have consistent records of what I’ve done and what I know.

EVE online community — Central management: salt

Salt master, and the rest of the core hosts are all Salt minions that are looking to it. Every service, every config file, every package has been marked, by this host, as needing to be installed for said minion host. They’re not perfectly tied to it, I can still modify things minion-side, say, if I’m testing out some modifications to Postfix’s config, but realistically I’m making all changes on the master side, and then using Salt to apply the changes, and since all the configuration is stored in Git, I have the full power of Git to manage different attempts, ideas, and, yes, fails.

Here, AWS, hold this for me: aws-gateway

The only VM on the network, not a container, is an appliance that Amazon distributes in a manner that… took a little bit of fudging to make work right. But once I got it working right, I now have this, which, if you’re not aware, is kinda cool. The device here is an AWS Storage Gateway, which allows you to export one or more buckets, or paths under an S3 bucket, as either SMB or NFS shares to the rest of the network. I use it pretty much exclusively to have NFS-mountable shares for the NAS to back up to. It doesn’t tend to do this that frequently, since every backup adds to my monthly S3 costs, but I do have it as my off-site.

Auxiliary Hosts

All these are either things that I have that aren’t really related to anything, or are just testing systems / playgrounds for various experiments. They could crash, they could get corrupted, they could completely disappear off the face of the earth, and it wouldn’t be a serious issue for me.

Literally just cats: pics

This one runs Lychee. Lychee is one of a few different self-hosted photo managers out there, à la Google Photos. Others like PhotoPrism do exist, but this one is the one I’m going with. I could very easily just add an extra use_backend with if { req.hdr(host) -m beg pics } to HAProxy and make this public, and Lychee even has the ability to make photos public or private, meaning I’d only let the world see what I wanted them to see, or just make them unlisted, so I can share the URL, but it’s not visible from the main page. But for right now, just having some place, on here, to store all my photos, built for actual tagging and organization, is all I need.

A totally real Time Capsule: time-capsule

Post on this one coming too, but this was where I was testing getting a valid Time Machine backup destination working on Ubuntu 20.04. And it turns out, all you need are a few configuration options in Samba, an extra piece of software to handle indexing and searching, if you feel like it, and then Netatalk, which handles AFP. AFP is obsolete at this point (all modern Macs recommend SMB), but it’s still valid, recognized, and useful.

And that should wrap it up. Over 30 minutes of total estimated reading time, once again taking the crown of longest post on this site, with this one taking the award for longest single post, and both together holding the spot for longest multi-post. It’s been a pretty interesting journey for me just to walk through and list off everything that’s going on, and I bet someone, somewhere, is probably going to read this and think there’s a much better solution or piece of software that I could be using instead, but this is what I have now, and it works… so that’s what I’m using. It’s a little bit of a pain to maintain everything at times, but it’s also plenty fun just to be able to say “look, I built this.” Plus, the extra experience I get with all the relevant tech here has to be worth something, somewhere, right? Well, if nothing else, at least the waste heat from the few physical machines in here is enough for me to stay nice and warm in the winter. Yes, the heating bill actually drops a ton just from the heat all the computers in this place (including those that aren’t mine) produce. Heck, it’s -2.17°C outside, and 19.89°C inside right now, and I know most of that isn’t from the furnace, so it’s definitely contributing a not-insignificant amount to that.

Anyways, enough rambling, if you’ve stuck around to read all of this, thank you, and I hope you’ll continue to take a look around the rest, and keep up with new content as it comes out.

1. No post on that one. I’ve played around a little but I’m not going to get serious about S/MIME until the Enigma plugin for Roundcube starts adding support (if they ever do). ↩︎

2. The one actually used is more complex, but I stripped out the parts that aren’t important, like only taking into consideration email attributes with the right domain name. ↩︎

3. Not a host per se, but… border's HAProxy. HAProxy itself emits its own Syslog messages instead of relying on the host’s own logging, and HAProxy logs are a good 80% of what I get, since there’s one log per connection. ↩︎

4. Since writing, it’s on its own container now. Even if you did break into my network somehow, no passwords for you! ↩︎