Sovereignty Over Convenience
I’ve recently been on a journey reclaiming sovereignty over all of my data and infrastructure. I still remember the era before the cloud when you had to do everything yourself.
Cloud changed all of that, and it was really nice - and convenient. In the back of my mind, though, there was a tiny little scratch.
Security is inversely proportional to convenience
– Professor Evi Nemeth
GitHub was so convenient compared to running your own infrastructure. The scratch stayed small. Then Microsoft bought GitHub.
Over the last few years, things changed. Trust in big tech has eroded heavily.
LLMs have also made things a lot worse - in two ways.
Firstly, every organisation seems to be trusting LLMs to write and deploy code. I’ve had LLMs write code - and I would not trust that anywhere near security-critical code. GitHub is using non-deterministic, hallucination-prone models to make security decisions.
Secondly, LLMs are vacuuming up all the data they can get their hands on - no permission sought.
However, my much bigger issue is that I have no choice in the matter. My consent was not sought - my concerns were not voiced. My only choice is to carry on in their boat or get out.
Ultimately, though, way down deep, at the very core of my concerns, there were two basic questions.
- What is all my stuff on the cloud being used for?
- How safe is it?
As each day passed, and my trust in big tech eroded, the pain of setting up the infrastructure started to pale in comparison to the pain of my data being used without my informed consent.
I had to act. But first - an inventory!
My cloud data mainly lived across:
- GitHub
  - multiple repos, public & private
  - blog site (on GitHub Pages with custom domain)
- Google
  - Emails
  - Documents
  - Photos
- Facebook
  - Photos
  - Posts
  - who knows what else
As a starting point, I wanted to tackle my public infra - mainly GitHub.
I ultimately wanted to feel as safe as I felt before my trust in the cloud was eroded, with minimal effort. That meant:
- Truly private storage for my private data
- Strong GDPR support.
- Hosting the public aspects on trustworthy platforms
- Resilience. I’d need to build that manually, starting with offsite backups.
- High availability. I’d need to get as close to that as I reasonably could - so I also needed monitoring
It would be no small task, and the biggest pain point would be ongoing maintenance and supporting it if something goes wrong.
I would need to document how every bit ties together, make it easy enough to redeploy services, reinstall servers and to monitor them, all the while also keeping it safe, secure and minimising any attack surface.
Right… There is a cost to getting all of these “for free” on the cloud.
Tooling & Services
Cloud IaC
Documenting all of this would be tricky - unless I used infrastructure as code. I’d used terraform quite a bit and was comfortable with it - well, opentofu now. However, I knew that I would want to separate things into small units and make re-usable components - for static site hosting, for example. These features were easier with pulumi - and I could skip its cloud features.
Bare Metal IaC
That would not, however, cover any server configuration. I would have servers both at home and remotely. I could just configure them once and hope for the best - but they will inevitably need a full upgrade, which might need config tweaking again. I might need to reinstall, and then I’d need to figure out all the configuration I did.
I have been pretty disciplined in the past about noting config changes so I could reproduce it later, but that’s neither an easy nor a pleasant task.
I wanted something automated - I considered something like etckeeper, but that would store every config change, including from package updates. I wanted to track only my changes.
I wondered if overlayfs could be a solution, with my version overlaid over the
package-installed /etc - but it wasn’t really designed for such a use case. I’d
also need to somehow redirect the package installs to a different place than
/etc.
Another option was stow. I was already using it for my dotfiles. I could have
root stow files into /etc. This option felt the most straightforward until
I had to re-install my home server (omv -> proxmox).
I wanted to be able to automate any reinstalls further - not just /etc, but
also package installs. The most straightforward tool fit for this was ansible.
I did consider chef, puppet, etc. but they offered a lot of features I
didn’t need. The only feature that ansible didn’t give me that I wanted was
state management like pulumi, but from what I could see, the alternatives did
not provide that.
In the end, I just decided to manually clean up after ansible when it leaves any files, config or packages behind. Worst case, I also had the nuclear option of wiping and reinstalling everything to get rid of any cruft since I configure everything through ansible. I’ve already done that once.
Cloud Services
There are specific cloud services I would still need. I need offsite backup - in case something happens to my server. I also need to host my blog and my public source repos.
In Europe, I found two strong candidates. Hetzner immediately felt more enterprise-level, which the pricing confirmed. Scaleway felt friendlier and its pricing was more accessible.
Ultimately, I picked Scaleway because the pricing was cheaper at lower usage levels, giving me a bit of time to ramp up. The interface was also easier to understand and navigate.
Scaleway has Object Storage for offsite backup and for static site hosting.
I considered hosting a forgejo instance but that meant a VPS, a database server, patches, potential issues around bots / LLM training, higher level of complexity, and cost. It would also add friction to user interaction - they’d have to register to my instance, which would have only my code.
I instead opted for codeberg. If it’s good enough for zig, it’ll be good enough for me.
forgejo might become interesting again after it integrates ForgeFed, which would make cross-instance collaboration easier.
Backup
For backup, restic was a better fit than borg. It has better native support for S3 / Object Storage backends. More importantly, borg needs its own software on the server end, which means more maintenance. With restic, plain sftp access to the server is enough for every operation: backup, view or restore.
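As a rough illustration of that workflow, here is a minimal Python sketch that builds the restic command lines for backing up, listing snapshots and restoring against an sftp repository. The user, host, repository path and function names are all hypothetical; only the `-r sftp:user@host:/path` repository syntax and the `backup`, `snapshots` and `restore` subcommands are standard restic usage.

```python
import subprocess

# Hypothetical repository location: an sftp path on the backup server.
REPO = "sftp:backup@atlas:/tank/restic/desktop"


def restic_cmd(repo: str, *args: str) -> list:
    """Build a restic argv list for the given repository."""
    return ["restic", "-r", repo, *args]


def backup_cmd(repo: str, *paths: str) -> list:
    """Command to back up one or more local paths."""
    return restic_cmd(repo, "backup", *paths)


def snapshots_cmd(repo: str) -> list:
    """Command to list the snapshots stored in the repository."""
    return restic_cmd(repo, "snapshots")


def restore_cmd(repo: str, snapshot: str, target: str) -> list:
    """Command to restore a snapshot (e.g. "latest") into a target directory."""
    return restic_cmd(repo, "restore", snapshot, "--target", target)


def run(cmd: list) -> None:
    """Run a restic command; the repository password is expected to come
    from the environment (e.g. RESTIC_PASSWORD_FILE)."""
    subprocess.run(cmd, check=True)
```

Something like `run(backup_cmd(REPO, "/home/me"))` would then perform a backup, with nothing but sshd running on the receiving end.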
Monitoring
The last time I did monitoring and alerting in production, I was using munin / monit / nagios. Everyone has moved on. I’d evaluated datadog, grafana, New Relic etc in a previous job, but of course, I am not opting for a cloud option.
While there are a few options out there, prometheus and grafana came up as fairly standard, and I wanted more experience in them, so they were picked for the stack.
Migrating
The first step was to back everything up.
I set up restic on my desktop, backing up to my server (atlas), which backed
everything up to Scaleway’s object storage.
That took many, many hours to complete.
The next step was to tidy up my home infrastructure.
atlas
The server I have was already running omv and plex. While 15+ years old, it is a dual-CPU box with 32G of RAM. It had four 2T magnetic drives on mdadm using raid-6.
I had three more 2T drives on my desktop that I wanted to move over to the server because it would be good to have the space.
I would also need additional services on the box.
- prometheus and grafana and any other related services for monitoring and alerting
- codeberg actions runner in a contained environment to mitigate security risks
- forgejo instance for my private source repos
I could still run debian with docker and deploy services on there. However, I wanted more isolation for the codeberg actions runner - just in case it managed to escape the container.
Proxmox Virtual Environment was a good fit. I’d used it years earlier, but wiped it after it’d fallen behind on updates and it felt like a huge task to upgrade it. This time, I’d be using ansible, so I could even wipe and reinstall if I had to.
Proxmox would also bring zfs, with raidz2 which provided safer array expansion and scrubbing to catch bitrot early.
There is a single SSD on atlas, and installing proxmox on there was the easy part. I then had to find temporary storage for around 4T of data.
I spread it out over several drives on my desktop. I had some space on my M.2 SSD and on my games drives, which had a lot of space but no resilience. I put all of the data I could recreate if I needed to on there - e.g. my blu-ray rips.
All the more valuable data luckily fit on my desktop’s raid5.
Worst case, all of it was also backed up using restic to a scaleway storage bucket.
Base Config
atlas remains for the most part the core install of proxmox.
Apart from basics like neovim, it’ll also host the zfs storage:
- through sftp for restic
- nfsv4 for the other linux boxes
- samba (when I need shares on my dual boot or VMs)
- ssh for git
- data directories for all the services running in docker.
It also hosts the LXC containers for:
- Codeberg Actions Runner (ussain)
  - on a separate network so that it can’t access my LAN
- docker services (loom)
  - prometheus
  - blackbox-exporter (web site monitoring)
  - prometheus-pve-exporter (for proxmox stats)
  - grafana
  - jellyfin (privacy respecting alternative to plex)
Since all of these are deployed via ansible with the data stored in atlas’ zfs, it’s easy enough to change the configuration and deploy them to another lxc. The only other manual step would be to delete the previous instance, since ansible doesn’t clean up after itself.
Scaleway
Scaleway will host my blog site. I’d been meaning to rename it anyway, so this was a good opportunity to do that. My website uses hugo, which outputs static html, so I can host it on S3-like Object Storage. The problem with that is that it gives you a long domain name and doesn’t support custom domains directly.
To be able to route it via a custom domain, the simplest solution is Edge Services on Scaleway. On AWS, I’d used CloudFront - but it is annoying, and you needed to set up http -> https redirect as well.
Scaleway also has an additional cost component. You need to pay €0.99/month minimum. That’ll give you one pipeline (i.e. one domain) and it’s €4 for each additional pipeline.
I wanted to host two domains - so that’d set me back €4.99/month. Doesn’t break the bank, but it’s also much more expensive than AWS. A small price to pay for privacy and sovereignty.
I found the edge services quite fiddly though. I could delete it from the console, but pulumi didn’t detect the changes correctly. I also ran into issues that burned through 50 generated ssl certificates, after which I had to wait 7 days before trying again.
At this point, I decided to deploy a VPS instead.
I knew that I would eventually need a VPS for remark42 to support commenting on my blog. A VPS is around €7, only about €2 more than the €4.99 for the two Edge Services pipelines.
I used pulumi to provision the VPS (hera), including ipv4 and ipv6, and used ansible to configure it.
I also put together a pulumi stack for ansible outputs. It picks up the
bucket s3 urls and writes a config file for ansible. ansible then uses this
to configure caddy to route the relevant domains to their s3 buckets. This
automation helps to keep the cloud state synced with the server configuration
without manual intervention.
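To give a feel for the shape of that hand-off, here is a minimal standalone sketch in Python. It is not the actual stack: it assumes the outputs arrive as a dictionary (for example parsed from `pulumi stack output --json`) containing a hypothetical `sites` mapping of domain to bucket website URL, and it writes them as a JSON vars file, which ansible can read because JSON is valid YAML. The key names `sites`, `caddy_sites`, `domain` and `upstream` are made up for illustration.

```python
import json
from pathlib import Path


def build_ansible_vars(stack_outputs: dict) -> dict:
    """Turn stack outputs into variables a caddy template could loop over.

    Assumes (hypothetically) that the stack exports a `sites` mapping of
    domain name -> bucket website URL.
    """
    sites = stack_outputs.get("sites", {})
    return {
        "caddy_sites": [
            {"domain": domain, "upstream": url}
            # Sort so the generated file is stable between runs.
            for domain, url in sorted(sites.items())
        ]
    }


def write_ansible_vars(stack_outputs: dict, path: Path) -> None:
    """Write the variables as JSON, which ansible accepts as a vars file."""
    content = json.dumps(build_ansible_vars(stack_outputs), indent=2)
    path.write_text(content + "\n")
```

Keeping the generated file sorted and deterministic means it only changes when the cloud state actually changes, which makes diffs easy to review.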
What hera does have, though, is a node exporter for prometheus. However, it’s
not easy for prometheus to connect to it. The safest option I could come up
with was to use wireguard between the docker lxc (loom) and hera.
I also added a firewall rule on loom to prevent any new connections originating
from hera. I want prometheus to be able to access hera, but not the other
way around. That adds an extra layer of protection if hera is ever
compromised.
Again, all of this is configured through ansible, so if hera is ever
compromised, I could just wipe it and reconfigure it with one command.
Deploying the websites
Now that the websites were configured, I wanted to give the visitors more than an error page.
This was also the appropriate time to move my repo from GitHub to codeberg. I
created a new repo on codeberg and pushed the repo up. I just edited
.git/config instead of removing the origin and adding it back in.
The site was previously using both GitHub Actions and GitHub Pages. It would now be using forgejo actions and rclone to push to Scaleway. But wait - it needed auth.
So far, codeberg has been the one bit of infrastructure I couldn’t code. I had to actually go on to the website and click through the UI manually. It has an API, but pulumi does not support it.
I did however put together a little script which takes a CODEBERG_TOKEN env
var, picks up the secret key from pulumi outputs and sets it as a secret on the
specified repo.
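A script along those lines could look roughly like the Python sketch below. It is a guess at the shape, not the actual script: the endpoint path and the `{"data": ...}` payload are my assumption based on the Forgejo/Gitea REST API for Actions secrets and should be checked against Codeberg's API documentation, and the owner, repo and secret names are placeholders. The secret value is passed in as a parameter; in practice it would come from the pulumi outputs.

```python
import json
import os
import urllib.request

API_BASE = "https://codeberg.org/api/v1"


def build_secret_request(owner: str, repo: str, name: str,
                         value: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the request that creates or updates a secret.

    Assumed endpoint: PUT /repos/{owner}/{repo}/actions/secrets/{name}
    """
    url = f"{API_BASE}/repos/{owner}/{repo}/actions/secrets/{name}"
    body = json.dumps({"data": value}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


def set_secret(owner: str, repo: str, name: str, value: str) -> int:
    """Send the request using CODEBERG_TOKEN from the environment."""
    token = os.environ["CODEBERG_TOKEN"]
    request = build_secret_request(owner, repo, name, value, token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Splitting request construction from sending keeps the token handling in one place and lets the request be inspected without touching the network.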
I have a function in pulumi for static sites which also creates and exports this token, which makes it pretty straightforward to add more static sites. This reusable functionality was the reason I wanted pulumi from the start. I’d done reusable components in terraform as well, but they were a lot clunkier.
I considered collecting web server logs, as I remembered doing back in the day with apache, and analysing them with awstats. These days, it would probably be GoAccess. However, the privacy concerns around storing ip addresses needed handling properly, so I put that on the backburner.
Monitoring & Alerting
Monitoring and Alerting was honestly the biggest reason why I kept putting off bringing everything in-house. Not the work involved in building or running it, but the feeling of being constantly on call - which I was for 13 years.
That was for multiple high profile, high traffic websites. This is my personal blog - it’s not a problem if the site is offline for a few hours - nobody is losing money. Still, the pavlovian response was one of stress.
I was able to work through it largely by reminding myself that there are no SLAs.
I got prometheus to scrape all the data it can from the server, the LXC containers and my desktop. I pulled in some dashboards from grafana to visualise them. It’s nice to see historical usage for my desktop and the server, including averages, growth rate etc.
However, alerting was the main reason for all of this. blackbox-exporter monitors my blog and another site I’ve got set up. This is valuable: it’s nice to see green across the board, and it also checks the expiry of the ssl certificate. I’d been burned in the past with certificate expiry and it’s nice to not have to worry.
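The certificate check boils down to comparing an expiry timestamp against the current time, which is what an alert on blackbox exporter's `probe_ssl_earliest_cert_expiry` metric typically does. A minimal Python sketch of that logic follows; the 14-day threshold and the function names are arbitrary assumptions, not the values used here.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate expires.

    `not_after` uses the format found in a certificate's notAfter field,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def should_alert(not_after: str, threshold_days: float = 14,
                 now: Optional[float] = None) -> bool:
    """True when the certificate expires within the threshold."""
    return days_until_expiry(not_after, now) < threshold_days
```

Alerting a couple of weeks ahead leaves time to fix a failed renewal before visitors ever see a warning.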
When I set up the new site, I also noticed that I’d forgotten to set up a redirect from a previous domain. That domain had a lot of SEO juice, which was cut off for a while because I hadn’t tested it.
This time, I made sure to add all redirects to the blackbox_exporter tests.
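For a sense of what such a redirect test verifies, here is a small Python sketch: request the old URL without following redirects, then confirm the response is a permanent redirect to the expected new URL. The function names are hypothetical, and treating only 301 and 308 as acceptable is my assumption (permanent redirects are the ones that matter for SEO).

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

PERMANENT_REDIRECTS = (301, 308)


def is_expected_redirect(status: int, location: Optional[str],
                         expected_url: str) -> bool:
    """True if the response permanently redirects to the expected URL.

    Trailing slashes are ignored when comparing.
    """
    if status not in PERMANENT_REDIRECTS or location is None:
        return False
    return location.rstrip("/") == expected_url.rstrip("/")


def fetch_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """HEAD the URL and return (status, Location header).

    http.client never follows redirects, so this sees the first response.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A check would then be `is_expected_redirect(*fetch_redirect(old_url), new_url)`, with both URLs supplied by whoever owns the domains.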
I configured the alerts to go to a channel in a discord server, the path of least resistance. A better option would probably be fluxer.app but it doesn’t yet have a mobile app.
An alert triggered the other day. My heart skipped a beat before I remembered that it was just my blog. It resolved itself while I was investigating it. I could find nothing wrong on the server, which was unlikely to be the culprit anyway.
It was probably the Object Storage Bucket, so I added the website endpoints to the monitoring as well. In the event of a future site failure, I will be able to see at a glance at which point the failure is.
For a moment, I regretted switching it all to my infrastructure - because it would not have gone down for a few minutes on GitHub. I then realised that I simply don’t know how often my site went down when it was on GitHub or for how long - it was never monitored.
Journey So Far
I am thrilled to have my blog live on my own infra instead of on GitHub’s. Having the whole setup documented in pulumi and ansible makes it much easier to maintain and manage. It also makes it much easier to take a look at how I set something up.
I also appreciate that I can redeploy all or parts of the services with ease if required, and that upgrades should be relatively pain-free.
The infrastructure that is deployed feels less opaque than when I had done similar things 15+ years ago, thanks to it all being managed via IaC.
Next Steps
I also have to pull everything down from google and facebook. I already have the file storage ready - I just need to pull everything down, then delete it from the cloud.
Email will be a bit more work. I am using mailbox.org for my new domain, which feels good enough. I have half a dozen email addresses. I can rationalise them down to three, but I also want to wipe out all the junk email in the process.
Google Photos will be trickier still, partly because I’ve not looked for a self-hosted solution for that yet. There is also the problem with how to provide access to my internal servers to the internet safely so that photos can be uploaded / downloaded from the phone when I’m out and about. Headscale might be one way to solve this.
I also want to put together a dashboard on grafana that will show me a comprehensive high level overview of my whole estate on one monitor - that’s an endeavour for another day.
Conclusion
So far in my journey, I have migrated key repos from GitHub, both private and
public, to safer places. I feel substantially safer already. Most of the repos
were code, but one was my zettelkasten / second brain knowledge archive. This
used to be a private repo on GitHub. It is now stored on atlas across from me
in my room, encrypted and backed up to Scaleway. That was the biggest win of
this whole process.
The rest of the data will come down here in time, and I have no doubt that I’ll feel safer by the end of it.
Am I actually safer? That is much harder to measure. I know that I have limited the attack surface as much as possible. Plus, my scale is so small that I am unlikely to be targeted.
On the other hand, GitHub and other cloud platforms have a lot of people worrying about and considering security and safety on a daily basis. They regularly patch the servers and track security vulnerabilities.
Am I actually safer? I don’t know - but I feel safer.
