Securing servers and managing backups are two of the many endless tasks I’ve undertaken as a SysAdmin. This post will be about the former, as I’m not sure I’ll ever have the energy to write about the latter.

How the hell can you even hit an inode limit when like half the disk is still unused?

I digress…

There’s always something to think about when working through security tasks. New software almost always means new vulnerabilities, so is it worth it to install a piece of new software? How can we balance security with ease of use and existing workflows? What are the best and quickest ways to respond when (not if) a compromise happens? At what point do I snap and embrace security-through-obscurity to the point I migrate all of our infrastructure to three Haiku servers running in my closet?

In short: it never ends.

Rock goes up. Rock comes down.

One project I’ve been working on for the past few months at Reclaim to help increase our security has been spinning up SSH Gateway servers: centralized points of access for the team to log in to our fleet of servers via SSH. This centralization allows us to more tightly restrict who can log into which servers, keep track of logins, more easily cycle SSH keys in and out when needed, and so on. For a remote company distributed across the country, it’s a lot easier to do this than to explicitly allow 10+ different (often changing) IP addresses across ~200 servers.

But notice I said points and not point. Plural.

A centralized point of access also means a centralized point of failure. That’s why I set up multiple such SSH Gateways in separate locations with different server providers. If one goes down due to maintenance or otherwise, we have something to fall back on. Sure, there’s always the chance that everything fails, but at that point I think we’d have bigger things to worry about. I’ll probably spin up a few more in the future just to be safe…
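The fallback idea can be sketched in a few lines of Python. This is only an illustration, not how our setup actually picks a gateway: the hostnames are placeholders, and "up" here just means that a TCP connection to port 22 succeeds.

```python
import socket
from typing import Callable, Iterable, Optional

# Hypothetical gateway hostnames, listed in order of preference.
GATEWAYS = ["gw1.example.com", "gw2.example.net"]


def tcp_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_gateway(
    gateways: Iterable[str],
    is_up: Callable[[str], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first gateway that responds, or None if every one is down."""
    for host in gateways:
        if is_up(host):
            return host
    return None
```

Calling `pick_gateway(GATEWAYS)` returns the first reachable host in the list. The `is_up` parameter exists so the check can be swapped out, for example for something stricter than a bare TCP connect.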

To harden the servers, I referred back to my personal project of setting up a Tor Hidden Service. Password authentication is totally disabled, and both Fail2Ban and UFW are running to lock things down to only legitimate SSH traffic. I even installed the firewall we use infrastructure-wide (BitNinja) to help out as well. More hardening will be done as I continue to “balance security with ease of use”.
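As a sanity check on the "password authentication is totally disabled" part, here is a rough Python sketch that inspects the text of an sshd_config file. It is a simplification under a few assumptions: it only looks at global settings (it stops at the first Match block), it ignores Include directives, and it treats anything after a # as a comment. The function name is made up for this example.

```python
def password_auth_disabled(sshd_config_text: str) -> bool:
    """Return True only if the first global PasswordAuthentication value is 'no'.

    sshd uses the first value it finds for a keyword, so later duplicates
    are ignored here as well.
    """
    for raw in sshd_config_text.splitlines():
        # Simplification: drop everything after '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # only global settings are checked in this sketch
        if keyword == "passwordauthentication" and len(parts) == 2:
            return parts[1].strip() == "no"  # arguments are case-sensitive
    # If the directive is absent, OpenSSH defaults to allowing passwords.
    return False
```

For a real server, the text would come from reading /etc/ssh/sshd_config. Running `sshd -T` as root prints the effective configuration and is a more reliable source than parsing the file by hand.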

Say it again: it never ends.

Ansible (as is the case with about half the work I do) has been instrumental in this project. That software has made it seamless to push out (and remove) SSH keys, update sshd config files, allow-list the gateway servers in firewalls, set rules in /etc/hosts.{allow,deny}, and so on.
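To give a feel for the /etc/hosts.{allow,deny} part, here is a small Python sketch that renders the two lines such a rule set might boil down to. This is not the Ansible playbook itself. The IP addresses are documentation placeholders, the function name is invented, and it only handles IPv4. It also assumes an sshd that was built with TCP wrappers support, which not every distribution ships.

```python
import ipaddress
from typing import Iterable, Tuple


def render_tcp_wrappers(
    gateway_ips: Iterable[str], daemon: str = "sshd"
) -> Tuple[str, str]:
    """Return (hosts.allow line, hosts.deny line) for the given gateway IPs.

    Raises ValueError if the list is empty or contains an invalid IPv4 address.
    """
    validated = [str(ipaddress.IPv4Address(ip)) for ip in gateway_ips]
    if not validated:
        raise ValueError("at least one gateway IP is required")
    allow_line = f"{daemon}: {', '.join(validated)}\n"
    deny_line = f"{daemon}: ALL\n"  # everything not allowed above is refused
    return allow_line, deny_line


# Example with placeholder addresses from the documentation ranges.
allow, deny = render_tcp_wrappers(["203.0.113.10", "198.51.100.7"])
```

Validating the addresses before rendering means a typo fails loudly instead of quietly locking the gateways out of the fleet.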

So far, the SSH Gateways have felt more useful to the Infrastructure team than anyone else at Reclaim. After all, we’re the ones who spend most of the day in a terminal looking at bash (or in my case, zsh) prompts, while everyone else usually opts for whatever web interface WHM offers. Nonetheless, everyone on the team does still need SSH access, but getting everyone to generate their own ed25519 keys, send them to me, and then modify their ~/.ssh/config files to use those keys can be like herding cats. Luckily, Cloudflare has a way to make access easier without really sacrificing any security: browser-rendered SSH terminals that, with some configuration, automatically authenticate with GSuite (Google Workspace?) and automatically drop someone into their account on the SSH Gateway servers. All without them ever needing to generate keys or open a local config file (I still want you to generate an SSH key and modify your conf though).
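For anyone who does go the key-and-config route, the ~/.ssh/config change is small. The Python sketch below renders one possible stanza. The alias, hostnames, and username are placeholders, and routing other hosts through the gateway with ProxyJump is an assumption about how such a setup could be wired, not a description of ours. The key itself would come from something like `ssh-keygen -t ed25519`.

```python
def render_ssh_config(
    alias: str,
    gateway_host: str,
    user: str,
    key_path: str = "~/.ssh/id_ed25519",
    behind_gateway: str = "",
) -> str:
    """Render an ssh_config stanza for a gateway, plus an optional jump rule.

    behind_gateway is a Host pattern (hypothetical here) for servers that
    should be reached by hopping through the gateway.
    """
    lines = [
        f"Host {alias}",
        f"    HostName {gateway_host}",
        f"    User {user}",
        f"    IdentityFile {key_path}",
        "    IdentitiesOnly yes",  # only offer the key named above
    ]
    if behind_gateway:
        lines += ["", f"Host {behind_gateway}", f"    ProxyJump {alias}"]
    return "\n".join(lines) + "\n"


# Example with placeholder values.
print(render_ssh_config("gateway", "gw1.example.com", "alice",
                        behind_gateway="*.internal.example.com"))
```

With a stanza like that in place, `ssh gateway` logs in to the gateway directly, and any host matching the pattern is reached through it without extra flags.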

Cloudflare has been very useful to this and related projects. Beyond the terminal-in-a-browser, we’ve started to look into using them to lock down the WHM login so that only people who authenticate with GSuite can even access it. More testing needs to be done, especially in regard to how our billing system would work with this, but each step still moves us uphill. And while something like that makes perfect sense for the infrastructure that is 100% ours (since WHM here will only be accessed by us), there’s more to consider for the infrastructure we run for/with other folks (DoOO, Managed Hosting, and so on). How can we still provide the low level of access (some) current admins are familiar with, while still keeping things locked down? Are WHM resellers enough for this? Or do we need something more?

More questions. It never ends.

And it never will. Security is a moving target, and the rock will always roll back downhill waiting to be pushed up again.

I’ll end this post with a song from one of my favorite groups. Not one that’s relevant to anything I talked/wrote about above (except maybe on a philosophical level), but rather one that’s helped keep me sane as I respond to security issues and dig through confs. Rolling the rock uphill again and again every day.

The only song (one of four, apparently) that I’ve been able to find from their Peel session.

Their name is admittedly provocative, but in the liner notes to their album they explain it as reclaiming “a tool of degradation” and using it as a “reminder that things have not and will not change until we change them”.

Michael Franti is an absolute genius.
