When I joined the team at Reclaim Hosting doing customer support last month, one of the biggest issues we were (and still are) facing was high server loads and the resulting downtime. Being the aspiring Linux administration nerd I am, I got tasked with figuring out a way to find who and/or what was causing all this downtime. Now, I could go into each server and run a series of cryptic iostat commands, manually compile a list of users and processes, and then send the results over to the office via Telex, because doing all of that work by hand would be so archaic I would need access to a teleprinter.
Enter: python3. My personal language of choice when it comes to doing literally anything. Simple scripts? Yup. Automation? Yeah. Writing a rudimentary engine to build text adventure games on? Yessir.
I decided to write a script that would record the uptime of a server, and then log whatever processes were running right before it crashed. I wanted to call it Chris’ Uptime Monitor, but I don’t think that would go very well with the UNIX-standard three-letter acronyms for naming programs. So I called it Aftermath Blame Assigner and pivoted to have it log all reboots and monitor the CPU, RAM, and swap usage. Once the usage of any of these hits a set percentage, it starts to log the 15 most intensive processes and the 5 users running the most processes, as well as the current CPU/RAM/swap usage and disk I/O. After any crash, check the logs, and you should find who or what caused it.
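To make the logging rule above a bit more concrete, here is a minimal sketch of the idea. It is not the actual Aftermath Blame Assigner code. The threshold numbers, the function names, the sample data, and the choice to rank "most intensive" by CPU are all assumptions made for illustration; a real script would pull live numbers from something like the psutil package instead of a hard-coded list.

```python
from collections import Counter

# Hypothetical thresholds in percent; the post only says "a certain percent".
THRESHOLDS = {"cpu": 90.0, "ram": 90.0, "swap": 50.0}


def over_threshold(usage, thresholds=THRESHOLDS):
    """Return the names of resources whose usage meets or exceeds its limit."""
    return [name for name, limit in thresholds.items()
            if usage.get(name, 0.0) >= limit]


def top_processes(processes, count=15):
    """Return the `count` processes using the most CPU, heaviest first.

    Ranking by CPU is an assumption; the real tool may rank differently.
    """
    return sorted(processes, key=lambda p: p["cpu"], reverse=True)[:count]


def top_users(processes, count=5):
    """Return (user, process_count) pairs for the users running the most processes."""
    return Counter(p["user"] for p in processes).most_common(count)


def snapshot(usage, processes):
    """Build a log record if any resource is over its limit, else return None."""
    triggered = over_threshold(usage)
    if not triggered:
        return None
    return {
        "triggered": triggered,
        "usage": usage,
        "processes": top_processes(processes),
        "users": top_users(processes),
    }


# Made-up sample data standing in for a live reading of the server.
sample_usage = {"cpu": 97.5, "ram": 61.0, "swap": 12.0}
sample_processes = [
    {"pid": 101, "user": "alice", "name": "php-fpm", "cpu": 55.0},
    {"pid": 102, "user": "bob", "name": "mysqld", "cpu": 30.0},
    {"pid": 103, "user": "alice", "name": "php-fpm", "cpu": 10.0},
]

report = snapshot(sample_usage, sample_processes)
```

With the sample data, only the CPU reading crosses its limit, so the record lists the three processes by CPU and counts two processes for alice and one for bob. Run something like this on a timer and append each record to a log file, and the last few entries before a crash point at the culprit.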
Now my daily driver distro is Manjaro; it’s where I program, where I game (and I am so glad I got World of Warcraft to finally run, thank you Lutris), and where I spend hours a day looking at Reddit. I also keep a small Ubuntu server around to store files and whatnot. So I wrote the script with these two distros in mind, but most of the company servers run on CentOS. And CentOS doesn’t have python3 installed by default, nor do the distro’s maintainers make it easy to install.
After spending some time figuring out the way to install python3, pip3, and the rest of the dependencies on CentOS, the next step was working on getting it to run on each reboot. On Ubuntu, it was as simple as adding an @reboot line to the crontab. Now, why would CentOS make it that easy? I ended up just writing a systemd service file and including instructions on how to install it. That seemed simpler than dealing with any more problems from CentOS.
Finally, it was done: The Aftermath Blame Assigner (GitHub, GitLab). On to the next task to do on another slow Saturday.
This time, it was finding broken links on the company’s community forums. I found a nice guide online on scraping a website with wget and then grep-ing through the logs to find errors, so I wanted to automate this with yet another script.
Then came more problems. The grep turned up not only the actual errors, but also every URL with the word error in it. Now I could have spent a few more days working on this, but I knew deep in my heart there was already a tool that would do this, so why reinvent the wheel and write another setup.py?
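For what it’s worth, the false positives come from matching on a word instead of on the HTTP status code. Here is a rough sketch of the status-based approach, not the script I abandoned. It assumes the log looks like typical wget output, where a timestamped line carries the URL and a later line reads "awaiting response... 404 Not Found"; the function name, the regexes, and the sample log are made up for illustration.

```python
import re

# Assumed wget log shapes:
#   --2020-02-01 10:00:00--  https://example.com/some/page
#   HTTP request sent, awaiting response... 404 Not Found
URL_LINE = re.compile(r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")
STATUS_LINE = re.compile(r"awaiting response\.\.\. (\d{3})")


def broken_links(log_text):
    """Return (url, status) pairs for every response with a 4xx or 5xx status."""
    broken = []
    current_url = None
    for line in log_text.splitlines():
        url_match = URL_LINE.match(line)
        if url_match:
            # Remember which URL the following status line belongs to.
            current_url = url_match.group(1)
            continue
        status_match = STATUS_LINE.search(line)
        if status_match and current_url is not None:
            status = int(status_match.group(1))
            if status >= 400:
                broken.append((current_url, status))
    return broken


# Made-up log: the first URL contains the word "error" but loads fine.
sample_log = """\
--2020-02-01 10:00:00--  https://example.com/t/fixing-error-500/42
HTTP request sent, awaiting response... 200 OK
--2020-02-01 10:00:01--  https://example.com/t/missing-page/43
HTTP request sent, awaiting response... 404 Not Found
"""

found = broken_links(sample_log)
```

On the sample log this reports only the missing-page URL with its 404, while a plain grep for error would have flagged the first URL instead. It still says nothing about which page contained the broken link, which is part of why an existing tool was the better choice.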
And here it was, pylinkvalidator, by Barthelemy Dagenais. The solution to all my problems for this specific task. I cloned the repo, and one sudo pip install pylinkvalidator && python2 ./pylinkvalidator/pylinkvalidator/bin/pylinkvalidate.py -o ./reclaim.log -O -P https://community.reclaimhosting.com/ later I had a list of every broken link and what page(s) they were on.
Automation at its finest. This is what my Computer Science degree prepared me for: writing basic scripts and googling things correctly to find great software that already does what I was trying to do.