My k8s journey continues... The Aftermath

Over my last three posts, I have shared some of my k8s anecdotes, detailing how I have fared since last Christmas. In this concluding entry of the series, I want to bring this chapter of my k8s journey to a close.

In this blog post, I will delve into hardware issues, the k8s workload, resolving storage dilemmas, and k8s maintenance, providing a comprehensive overview of my journey with this technology.

But let's start...

Hardware Issue

As I discussed in my previous blog post about building my k8s cluster, I arranged my five Raspberry Pi 4B+ devices in a stack. However, it soon became apparent that one of the Pis was having issues: intermittent reboots that occurred unpredictably. Sometimes the Pi would reboot after just one day; at other times it would run for over a week before restarting. There were even days when it rebooted twice.

Initially, I suspected overheating, so I adjusted the fan that cools the Pis. However, this proved futile. I also checked whether any script was causing the reboots, but my efforts yielded no results. I tried to narrow down the issue by monitoring the system log with the command journalctl -f -p 3, but this too failed to provide any useful information. Switching from RAM-based logging to disk-based logging was also to no avail.
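For anyone chasing similar phantom reboots, these are the kinds of checks I mean; a minimal sketch, assuming the Raspberry Pi firmware tools (vcgencmd) are available:

journalctl -b -p 3       # only errors and worse from the current boot
journalctl --list-boots  # when did the previous (re)boots happen?
vcgencmd get_throttled   # undervoltage/throttling flags; 0x0 means all clear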

I even swapped out the USB-C cable and tried a different socket on the power supply. Additionally, I updated the DietPi OS and the Raspberry Pi firmware regularly, replaced the network cable, and performed several other troubleshooting steps. None of these measures resolved the problem.

There are only three options that I have yet to try:

  • Swapping out the USB stick from which the OS runs. However, doing this would require reconfiguring k8s and would entail several steps to bring it back to the same OS/k8s state it is currently in.

  • Switching to RaspbianOS instead of DietPi. This would be a time-consuming task and would require a lot of configuration work, similar to the previous option.

  • Changing the problematic Pi's position in the cluster stack. Unfortunately, this is the most challenging option, as the faulty Pi sits in the middle of the stack. It would mean disassembling the entire Pi cluster stack and reassembling it with a different Pi in the middle, and it would take at least a day or two to confirm whether the Pi still reboots after the change. Therefore, I have yet to attempt this.

Despite knowing that I must ultimately address this problem, I am choosing to view it positively. I now have a Chaos Monkey in my k8s cluster, which enables me to check if the cluster remains stable even when one server is rebooting. Unfortunately, the unexpected reboots have had an unintended side effect, which I will describe in a section below.
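To check whether the cluster really tolerates this involuntary Chaos Monkey, watching the node status during a reboot is enough; the node name below is a placeholder:

kubectl get nodes -w   # watch the node flip between Ready and NotReady
kubectl get pods -A -o wide --field-selector spec.nodeName=<flaky-node>   # list the pods scheduled on the flaky node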

K3s Maintenance

Just as old steam and combustion engines need their oil, coolant, filters, and fuel and water levels checked and maintained regularly, machines running k8s need regular care too.

In my case, this means taking care of the OS, k3s, and the apps running on it.

DietPi Update

Let's start with the OS. Since installing DietPi, I have decided to run updates every three months: first, to avoid too much disruption to a smoothly running system, and second, because updating five Raspberry Pis is time-consuming.

Updating DietPi is straightforward: run dietpi-update and follow the steps.
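Per node, it is nothing more than this; the hostname is a placeholder, and I assume a user with sudo rights:

ssh dietpi@pi-node-1   # log in to the node
sudo dietpi-update     # run the updater and follow the prompts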

Tip: Before you do this, make sure you have disabled "Headless" mode. I didn't, and one of the Pis came back with a different IP address, so I could no longer reach it via SSH. I had to attach a keyboard and monitor to log in locally and fix the problem.

Other than that, DietPi is easy to update, and the other Pis had no problems with this procedure. Also, make sure to read all the prompts during the update to avoid other surprises.

K3s Update

Updating k3s is also very easy but time-consuming. First, you drain the node you want to update, which is easily done with the Lens k8s management software. You should also cordon the node so that no new pods are scheduled on it, which Lens can do as well. After that, you can update the k3s server software: update k3sup (k3sup.dev) to the newest version, which usually brings the latest k3s version with it, then run the same install/join commands I used for the original installation. That's it. Rinse and repeat for all the other Pis; normally, this is a smooth operation.
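Expressed in commands instead of Lens clicks, one node's cycle looks roughly like this; the node name, IP, and user are placeholders, and k3sup installs the latest stable k3s unless you pin a version:

kubectl cordon pi-node-2                                             # stop new pods landing here
kubectl drain pi-node-2 --ignore-daemonsets --delete-emptydir-data   # move workloads elsewhere
k3sup install --ip 192.168.1.12 --user dietpi                        # re-run the original install (server node)
kubectl uncordon pi-node-2                                           # let the node take workloads again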

Image Update

The image update is one of the more problematic updates. First, there is the problem of finding out that a new image is available at all; I haven't found a good solution for that yet. And once you become aware of a new image, you have to dig through the documentation to see if and why you should update.

If you decide to update, there are two ways to do it. The first is to update the image version in the k8s deployment config via the Lens app, which is easily done; Lens takes care of the update and the redeployment. Usually, it runs smoothly, but as I described previously, in one case it forced me to move my CouchDB instance back to a Docker host. This route is time-consuming because of the research and decision-making; the actual work, in my opinion, is very easy.
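For those not using Lens, the same image bump is a single kubectl command; the deployment, container, and tag names here are made-up examples:

kubectl set image deployment/gitea gitea=gitea/gitea:1.19.0   # point the deployment at the new image
kubectl rollout status deployment/gitea                       # watch the rolling update finish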

The second way is to update the Helm charts, which I will describe now.

Updating Helm Charts

Updating the Helm charts is theoretically very easy: download the newest version and redeploy. In practice, however, Helm charts are often not up to date: they don't use the latest image versions, or maintenance has been abandoned. So you have to manually override the image version in the Helm configs, with the risk that something goes wrong, and you also have to figure out which Helm chart version to set.
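A typical round of the Helm route, sketched with placeholder repo, release, and version names:

helm repo update                                 # refresh the chart repositories
helm search repo gitea-charts/gitea --versions   # list the available chart versions
helm upgrade gitea gitea-charts/gitea --version 8.3.0 --set image.tag=1.19.0   # pin the chart, override the image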

In my case, I took the first route (updating the image directly in the deployment config), and with that route I lost compatibility with the original Helm charts. But I was willing to take that risk.

All in all, everything is manageable, but you need to plan time for troubleshooting and for the actual work.

Gitea Problems

Gitea is a great self-hosted Git service, but it is not immune to problems. In my case, any reboot of one of the nodes or of the NFS server caused Gitea to stop working. It turned out that the NFS mount used to store the Git repositories was not being re-established after the reboot, so Gitea could not find its repositories and settings and thus could not function properly.

To solve this issue, I needed to make sure that the NFS mount was re-established after a reboot. I did this by adding the "auto" option to the fstab entry, which tells the system to mount the NFS share automatically at boot time. Once this was done, Gitea was able to access the repositories again, but that wasn't all I had to fix.
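For reference, the resulting fstab entry looks something like this; the server address and paths are placeholders:

# /etc/fstab: "auto" mounts the share at boot; "nofail" keeps the boot from hanging if the NFS server is down
192.168.1.20:/export/gitea  /mnt/gitea  nfs  auto,nofail  0  0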

However, I also encountered another problem: the Git repositories were being cleaned up by k8s, even though I had set the NFS share to "Retain" mode. This was a serious issue, as I did not want to lose any repositories to this error.

To solve this problem, I found in the k8s documentation that I needed to patch the Gitea PV configuration with this command:

kubectl patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

This command switches the "persistentVolumeReclaimPolicy" from the default "Delete" to "Retain". This ensures that the repository data is kept on the NFS share, even if the NFS or k3s servers are rebooted. After making this change, I tested the system and found that the repositories were no longer being lost.
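To verify that the change stuck:

kubectl get pv <your-pv-name>   # the RECLAIM POLICY column should now read "Retain"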

In the end, I was able to solve both the NFS mount problem and the repository retention problem, and Gitea was functioning properly once again. It was a reminder, though, that even the best self-hosted services can run into problems, and that it takes a solid understanding of the system and its components to troubleshoot and solve the issues that arise.

Conclusion

Yes, Kubernetes (k8s) is indeed powerful and versatile, but it is also complex and requires patience to master its concepts and configurations. It took me some time to understand its deeper complexities, and I still have more to learn. While I can manage basic tasks, for more complex problems I often rely on Google to find solutions. That approach may not be suitable for professional admins managing large k8s clusters, but it works for me.

Would I use k8s again? No, I wouldn't. Most of the fancy features, such as rolling updates and zero downtime, are not necessary for me. My Docker host on my old Mac Mini ran without issues for years, and when I needed to update the OS or an image, it was easy and fast. In contrast, managing my five Raspberry Pis with k8s requires significant time and effort, and I never know when I'll run into the problems I described earlier.

In the future, I plan to return to plain Linux servers as Docker hosts. I'm interested in trying Docker Swarm this time, which needs only two Docker hosts. Although I currently have five Raspberry Pis in service (four k8s nodes and one Docker service host), I prefer to run just two Pis and use the other three for other projects.
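For the curious: bootstrapping such a two-Pi Swarm is just two commands; the IP and the join token are placeholders:

docker swarm init --advertise-addr 192.168.1.30              # on the first Pi (the manager)
docker swarm join --token <worker-token> 192.168.1.30:2377   # on the second Pi, using the token printed by init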

However, I have to say that I learned a lot about k8s, and I now have a better understanding of it, which is helpful when colleagues discuss it at the office. For that reason, I'm glad I took on this k8s adventure.

Well, let's see what the rest of the year brings in terms of k8s adventures.

As always, apply this rule: "Questions, feel free to ask. If you have ideas or find errors, mistakes, problems or other things which bother or enjoy you, use your common sense and be a self-reliant human being."

Have a good one. Alex