makerhouse_network

Public scripts, services, and configuration for running MakerHouse's home network. This network supports:

TODO features here

For more high level details, see [this blog post](TODO TODO TODOOOOOO)

TODO use the drawing at https://docs.google.com/drawings/d/1UkQKlT5fA8L5bAdiAecp-bR1siNsGnlf4KK2kBhsDHk/edit

Setup

Setting up a replicated pi cluster from scratch is an involved process consisting of several steps:

Setting up the cluster

Purchasing the hardware
(optional) Network setup
Flashing the OS
Installing K3S and linking the nodes together

Configuring the cluster to be useful

Configuring the load balancer and reverse proxy
Installing a distributed storage solution
Setting up SSL certificate handling and dynamic DNS

Setting up customization for IoT and other uses

Deploying an image registry for custom container images
Setting up monitoring/alerting and IoT messaging

There are knowledge prerequisites for following this guide:

Some basic networking (e.g. how to find a remote device's IP address and SSH into it)
Linux command line fundamentals (navigating to files, opening and editing them, and running commands)
It's also useful to know what DHCP is and how to configure it and subnets in your router, for the optional network setup step.

Even if you have advanced knowledge of kubernetes, be prepared to spend several hours on initial setup, plus an hour or two here and there to further refine it.

Purchasing the Hardware

For the cluster network, you will need:

An ethernet switch (preferably gigabit) with as many ports as the number of nodes in your cluster, plus one.
A power supply for your switch
An ethernet cable running to whatever existing network you have.

For each node, you will need:

A raspberry pi 4 (or better), recommended 4GB. Ideally all nodes are the same type of pi with the same hardware specs.
A USB C power supply (5V with at least 2A)
A short ethernet cable (to connect the pi to the network switch)

For sufficient storage, you will need (per node):

A USB 3 NVMe M.2 SSD enclosure https://www.amazon.com/gp/product/B07MNFH1PX
An NVMe M.2 SSD (I picked this 256GB one)

Before continuing on:

connect your switch to power and the LAN
connect each raspberry pi via ethernet to the switch (whichever port doesn't matter)
Install an SSD into each enclosure, then plug one enclosure into one of the blue USB ports on each raspberry pi
- At this point, it helps to label the SSDs with the name you expect each node to be, e.g. k3s1, k3s2 etc. to keep track of where the image 'lives'.

A note on earlier versions of raspberry pi:

Try to avoid using raspberry pi's earlier than the pi 4. To check for compatibility, run:

uname -a

If the output contains armv6l then kubernetes does not support the device. This issue describes that kubernetes support for armv6l has been dropped, so there are no official precompiled k8s binaries for armv6l; you'd have to compile them manually or use third-party builds.

A comment at the end of that issue links to compiled binaries for armv6l:

https://github.com/aojea/kubernetes-raspi-binaries
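
The compatibility check above can be scripted. This is a minimal sketch, assuming `uname -m` (the machine field of `uname -a`) is enough to tell the boards apart; `is_supported_arch` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# is_supported_arch is a hypothetical helper used only for this sketch.
# It takes a machine string as printed by `uname -m`.
is_supported_arch() {
  case "$1" in
    armv6l) return 1 ;;  # kubernetes support dropped, per the note above
    *)      return 0 ;;
  esac
}

arch="$(uname -m)"
if is_supported_arch "$arch"; then
  echo "$arch: not armv6l, ok to continue"
else
  echo "$arch: armv6l detected, use a newer pi"
fi
```

This only rules out armv6l. It does not check for arm64, which the distributed storage step later in this guide requires.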

(Optional) Network setup

This guide will assume your router is set up with a LAN subnet of 192.168.0.0/23 (i.e. allowing for IP addresses from 192.168.0.1 all the way to 192.168.1.254).

192.168.0.1 is the address of the router
IP addresses from 192.168.0.2-254 are for exposed cluster services (i.e. virtual devices)
IP addresses from 192.168.1.2-254 are for physical devices (the raspi's, other IoT devices, laptops, phones etc.)
- We recommend having a static IP address range not managed by DHCP, e.g. 192.168.1.2-30 and avoiding leasing 192.168.1.1 as it'd be confusing.

If you wish to have public services, set up port forwarding rules for 192.168.0.2 (or the equivalent loadBalancerIP set below) for ports 80 and 443, so that your services can be viewed outside the local network.
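
To make the addressing plan concrete, here is a small sketch that labels an address according to the layout above. `classify_ip` and its labels are hypothetical, and the function assumes a well-formed dotted-quad IPv4 address.

```shell
#!/bin/sh
# classify_ip is a hypothetical helper; it assumes a well-formed dotted quad
# and the 192.168.0.0/23 layout described above.
classify_ip() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  if [ "$#" -ne 4 ] || [ "$1" != 192 ] || [ "$2" != 168 ]; then
    echo outside; return
  fi
  case "$3" in
    0|1) ;;
    *) echo outside; return ;;
  esac
  if [ "$3" = 0 ] && [ "$4" = 1 ]; then
    echo router; return
  fi
  # Only .2 through .254 are handed out in either half of the subnet.
  if [ "$4" -lt 2 ] || [ "$4" -gt 254 ]; then
    echo unassigned; return
  fi
  if [ "$3" = 0 ]; then echo service; else echo device; fi
}

for ip in 192.168.0.1 192.168.0.10 192.168.1.5 10.0.0.7; do
  printf '%s -> %s\n' "$ip" "$(classify_ip "$ip")"
done
```

Here "service" means an address MetalLB may hand to a cluster service and "device" means a physical machine; 192.168.1.1 comes back as "unassigned" because the plan avoids leasing it.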

Flashing the OS

Setup SSD boot

Follow these instructions to install a USB bootloader onto each raspberry pi. Stop when you get to step 9 (inserting the Raspberry Pi OS) as we'll be installing Ubuntu instead.

Use https://www.balena.io/etcher/ or similar to write an Ubuntu 20.04 ARM 64-bit LTS image to one of the SSDs. We'll do the majority of setup on this drive, then clone it to the other pi's (with some changes).

Enable cgroups and SSH

Unplug and re-plug the SSD, then navigate to the boot partition and ensure there's a file labeled ssh there (if not, create a blank one). This allows us to remote in to the raspi's.

Now we will enable cgroups which are used by k3s to manage the resources of processes that are running on the cluster.

Append to /boot/firmware/cmdline.txt (see here):

cgroup_enable=memory cgroup_memory=1

Example of a correct config:

$ cat /boot/firmware/cmdline.txt
net.ifnames=0 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc cgroup_enable=memory cgroup_memory=1
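
If you prefer not to edit the file by hand, the append can be done idempotently. This is a sketch, not the method used for this cluster: `enable_cgroups` is a hypothetical helper, and it assumes cmdline.txt is a single line (as in the example above).

```shell
#!/bin/sh
# enable_cgroups is a hypothetical helper. It appends the two flags to the
# first (and only) line of a cmdline.txt-style file, unless already present.
enable_cgroups() {
  file="$1"
  if grep -q 'cgroup_enable=memory' "$file"; then
    return 0
  fi
  sed '1 s/$/ cgroup_enable=memory cgroup_memory=1/' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Demonstrate on a scratch copy rather than the real boot partition.
demo="$(mktemp)"
echo 'console=tty1 root=LABEL=writable rootwait fixrtc' > "$demo"
enable_cgroups "$demo"
enable_cgroups "$demo"   # second call is a no-op
cat "$demo"
```

To apply it for real, point the function at cmdline.txt on the mounted boot partition, running with whatever privileges that mount requires.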

Verify installation

Plug in the SSD, then plug in power to your raspberry pi. Look on your router to find the IP address of the raspberry pi.

You should be able to SSH into it with username and password ubuntu.

While we're inside, run passwd to change away from the default password.

Run sudo shutdown now (sudo password is ubuntu) and remove power once its LED stops blinking.

Clone to other pi's

Remove the SSD and use your software of choice (e.g. gparted for linux) to clone it to the other blank SSDs. For each SSD, mount it and edit /etc/hostname to be something unique (e.g. k3s1, k3s2...)

At this time, you can edit your router settings to assign static IP addresses to each raspberry pi for easier access later.

Installing k3s and linking the nodes together

We will have one server node named k3s1 and two worker nodes (k3s2 and k3s3). These instructions generally follow the installation guide from Rancher.

Set up k3s1 as master

SSH into the pi, and run the install script from get.k3s.io (see install options for more details):

export INSTALL_K3S_VERSION=v1.19.7+k3s1
curl -sfL https://get.k3s.io | sh -s - --disable servicelb --disable local-storage

Note:

We include the K3S version for repeatability.
ServiceLB and local storage are disabled to make way for MetalLB and Longhorn (distributed storage) configured later in this guide.

Before exiting k3s1, run sudo cat /var/lib/rancher/k3s/server/node-token and copy it for the next step of linking the client nodes.

Install and link the remaining nodes

To install on worker nodes and add them to the cluster, run the installation script with the K3S_URL and K3S_TOKEN environment variables. Note use of raw IP - this is more reliable than depending on the cluster DNS (Pihole) to be serving, since that service will itself be hosted on k3s.

export K3S_URL=https://<k3s1 IP address>:6443 
export INSTALL_K3S_VERSION=v1.19.7+k3s1
export K3S_TOKEN=<token from k3s1>
curl -sfL https://get.k3s.io | sh -

Where K3S_URL is the URL and port of a k3s server, and K3S_TOKEN comes from /var/lib/rancher/k3s/server/node-token on the server node (described in the prior step)
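
Since the same three values are needed on every worker, it can help to generate the command once and review it before pasting it into each node. The sketch below only prints the command; `join_command` is a hypothetical helper, and the IP and token shown are placeholders.

```shell
#!/bin/sh
# join_command is a hypothetical helper: it prints the agent install command
# instead of running it, so the values can be reviewed first.
join_command() {
  server_ip="$1"
  token="$2"
  version="${3:-v1.19.7+k3s1}"
  printf 'curl -sfL https://get.k3s.io | K3S_URL=https://%s:6443 K3S_TOKEN=%s INSTALL_K3S_VERSION=%s sh -\n' \
    "$server_ip" "$token" "$version"
}

# 192.168.1.5 and EXAMPLETOKEN are placeholders, not real values.
join_command 192.168.1.5 EXAMPLETOKEN
```

Passing the variables inline on the `sh -` side of the pipe is equivalent to the `export` lines above, and keeps the token out of the node's shell environment afterwards.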

Verifying

That should be it! You can confirm the node successfully joined the cluster by running kubectl get nodes when SSH'd into k3s1:

~ kubectl get nodes
NAME   STATUS   ROLES                  AGE    VERSION
k3s1   Ready    control-plane,master   5m   v1.21.0+k3s1
k3s2   Ready    <none>                 1m   v1.21.0+k3s1
k3s3   Ready    <none>                 1m   v1.21.0+k3s1
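
For scripted checks, the same listing can be tested rather than read by eye. A sketch, assuming the default column layout shown above; `all_nodes_ready` is a hypothetical helper and the sample input is copied from the listing.

```shell
#!/bin/sh
# all_nodes_ready is a hypothetical helper. It reads `kubectl get nodes`
# output on stdin and succeeds only if every node's STATUS is exactly "Ready".
all_nodes_ready() {
  awk '
    NR == 1 { next }              # skip the header row
    { total++ }
    $2 != "Ready" { bad++ }
    END { if (total == 0 || bad > 0) exit 1 }
  '
}

# Sample input copied from the listing above; on a real cluster use:
#   kubectl get nodes | all_nodes_ready
if all_nodes_ready <<'EOF'
NAME   STATUS   ROLES                  AGE    VERSION
k3s1   Ready    control-plane,master   5m   v1.21.0+k3s1
k3s2   Ready    <none>                 1m   v1.21.0+k3s1
k3s3   Ready    <none>                 1m   v1.21.0+k3s1
EOF
then
  echo "all nodes ready"
else
  echo "some nodes are not ready"
fi
```

An empty listing is treated as a failure, so a misconfigured kubectl does not pass silently.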

Set Up Remote Access

It's useful to run cluster management commands from a personal computer rather than having to SSH into the master every time.

Let's grab the k3s.yaml file from master, and convert it into our local config:

ssh ubuntu@<k3s1 IP address> "sudo cat /etc/rancher/k3s/k3s.yaml" > ~/.kube/config

Now edit the server address to be the address of the pi, since from the server's perspective the master is localhost:

sed -i "s/127.0.0.1/<actual server IP address>/g" ~/.kube/config
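
A slightly more defensive variant of the substitution is sketched below. It escapes the dots so that only the literal loopback address is matched, and writes through a temporary file so it does not rely on `sed -i`. `point_kubeconfig_at` is a hypothetical helper and 192.168.1.5 is a placeholder address.

```shell
#!/bin/sh
# point_kubeconfig_at is a hypothetical helper that rewrites the loopback
# address in a kubeconfig to the given server IP.
point_kubeconfig_at() {
  file="$1"
  ip="$2"
  # Escape the dots so "127.0.0.1" is matched literally, not as a regex.
  sed "s/127\.0\.0\.1/$ip/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demonstrate on a scratch file; 192.168.1.5 is a placeholder address.
demo="$(mktemp)"
echo '    server: https://127.0.0.1:6443' > "$demo"
point_kubeconfig_at "$demo" 192.168.1.5
cat "$demo"
```

Because the file is replaced rather than edited in place, check the permissions on ~/.kube/config afterwards if you restrict them.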

Configuring the load balancer and reverse proxy

We will be using MetalLB to allow us to "publish" virtual cluster services on actual IP addresses (in our 192.168.0.2-254 range). This allows us to type in e.g. 192.168.0.10 in a browser and see a webpage hosted from our cluster, without having a device with that specific IP address.

We will also use Traefik to reverse-proxy incoming requests. This lets different services respond to different subdomains (mqtt.mkr.house and registry.mkr.house, for instance) without having to do lots of manual IP address mapping.

MetalLB load balancing / endpoint handling

Install MetalLB onto the cluster following https://metallb.universe.tf/installation/:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
kubectl apply -f metallb-configmap.yml
- See ./core/metallb-configmap.yml
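
If you are not starting from the repository's file, a configmap for this MetalLB version might look roughly like the one written by the sketch below. The contents are an assumption (a single layer 2 pool covering the 192.168.0.2-254 range from the network setup section) and may differ from ./core/metallb-configmap.yml; the output filename is also hypothetical.

```shell
#!/bin/sh
# Hypothetical contents for a MetalLB v0.9-style configmap; the real file
# lives in ./core/ and may differ. The pool matches the 192.168.0.2-254
# range reserved for cluster services earlier in this guide.
out="${1:-metallb-configmap.example.yml}"
cat > "$out" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.0.2-192.168.0.254
EOF
echo "wrote $out"
```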

Note: instructions say to do kubectl edit configmap -n kube-system kube-proxy but there's no such config map in k3s. This wasn't a problem for our installation.

Test whether metallb is working by starting an exposed service, then cleaning up after:

kubectl apply -f ./core/lbtest.yaml
kubectl describe service hello
- Look for "IPAllocated" in event log
- Visit 192.168.0.3 and confirm "Welcome to nginx!" is visible
kubectl delete service hello
kubectl delete deployment hello

Troubleshooting

Some failure modes of MetalLB leave only a fraction of the VIPs unresponsive.

Check to see if all MetalLB pods are in state "running"

kubectl get pods -n metallb-system -o wide

  ```
  speaker-7l7kv                 1/1     Running   2          16d   192.168.1.5   pi4-1   <none>           <none>
  controller-65db86ddc6-fkpnj   1/1     Running   2          16d   10.42.0.75    pi4-1   <none>           <none>
  speaker-st749                 1/1     Running   1          16d   192.168.1.7   pi4-3   <none>           <none>
  speaker-8wcwj                 1/1     Running   0          16m   192.168.1.6   pi4-2   <none>           <none>

  ```

More details - download kubetail - see bottom of this page

./kubetail.sh -l component=speaker -n metallb-system
If you see an error like "connection refused" referencing 192.168.1.#:7946, check to see if one of the "speaker" pods isn't actually running.

Traefik configuration

Traefik is already installed by default with k3s. We still need to configure it, though.

Generate the dashboard password:

htpasswd -c passwd admin
cat ./passwd
get the part after the colon, before the trailing slash. That's $password
Update config (/var/lib/rancher/k3s/server/manifests/traefik.yaml, move it to traefik-customized.yaml):

ssl.insecureSkipVerify: true
metrics.serviceMonitor.enabled: true
dashboard.enabled: true
dashboard.serviceType: "LoadBalancer"
dashboard.auth.basic.admin: $password
loadBalancerIP: "192.168.0.2"
logLevel: "debug"
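
The password extraction step above can also be done without copying by hand. A minimal sketch, assuming the file has the usual one-line `user:hash` form; `hash_from_htpasswd_line` is a hypothetical helper, and it keeps everything after the first colon without trimming anything from the end.

```shell
#!/bin/sh
# hash_from_htpasswd_line is a hypothetical helper. Given one line of an
# htpasswd file ("user:hash"), it prints everything after the first colon.
hash_from_htpasswd_line() {
  printf '%s\n' "${1#*:}"
}

# Use the file generated by `htpasswd -c passwd admin`, if it exists.
if [ -f ./passwd ]; then
  password="$(hash_from_htpasswd_line "$(head -n 1 ./passwd)")"
  echo "$password"
fi
```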

Edit /etc/systemd/system/k3s.service and add --disable traefik to disable original traefik config

sudo systemctl daemon-reload
sudo service k3s restart

Test the configuration:

kubectl apply -f ./core/default-ingress.yml
kubectl get ingress
- You should see something like hello <none> i.mkr.house 192.168.0.2 80 2m2s

Note: Attempts to query *.mkr.house internally lead to the router admin page. You'll need to use a mobile network to test external ingress properly, i.e. that with the lbtest.yaml and default-ingress.yml applied, a "Welcome to nginx!" page is displayed from outside the network.

Troubleshooting tips

You can use journalctl -u k3s to view k3s logs and look for errors.

Installing a distributed storage solution

Now we can set up a distributed storage solution, so that workloads can move freely between the raspberry pi's without worrying about which pi holds their data.

We'll be using Longhorn, the recommended solution from Rancher.

Follow the installation guide to set it up. See core/longhorn.yaml for the MakerHouse configured version.

Note: this solution requires an arm64 architecture, NOT armhf/armv7l which is the default for Raspbian / Raspberry PI OS.

Be sure to also set it as the default storage class, or else certain helm charts will fail to provision their persistent volumes without specifying storageClass specifically:

kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

The Longhorn UI is not exposed by default; you can expose it with these instructions.

Setting up SSL certificate handling and dynamic DNS

Now we will set up SSL certificate handling, so that we can serve https pages without browsers complaining about "risky business".

Dynamic DNS will also be configured so that an external DNS provider (in our case, Hover) can direct web traffic to our cluster using a domain name.

Certificate Management

The following instructions are based on https://opensource.com/article/20/3/ssl-letsencrypt-k3s, but with substitutions for arm64 packages (this tutorial assumes just "arm").

Note that you will need to have ports 80 and 443 forwarded to whatever address is given by kubectl get ingress, which is what Traefik is configured to use in /var/lib/rancher/k3s/server/manifests/traefik-customized.yaml (See "Traefik configuration" above).

The first two instructions aren't needed if core/cert-manager-arm.yaml is correct for the setup:

curl -sL https://github.com/jetstack/cert-manager/releases/download/v0.11.0/cert-manager.yaml | sed -r 's/(image:.*):(v.*)$/\1-arm64:\2/g' > cert-manager-arm.yaml
grep "image:" cert-manager-arm.yaml

Now we apply the cert manager:

kubectl create namespace cert-manager
kubectl apply -f cert-manager-arm.yaml
kubectl --namespace cert-manager get pods
kubectl apply -f letsencrypt-issuer-prod.yaml
kubectl apply -f ingresstest.yaml (TODO ingress test file)
- including "annotations" and "tls" sections described here, "request a certificate for our website"
kubectl get certificate
- Should be "true", although this may take a couple seconds after init
- If not, check if i.mkr.house resolves to the current house IP. May have to update Hover manually for this portion.
kubectl describe certificate
- Should say "Certificate issued successfully"
Confirm behavior by going to https://i.mkr.house from external network and seeing the test page.

Private Registry

A private registry hosts customized containers - such as our custom NodeRed installation with specific addons for handling google sheets, google assistant etc.

This parallels the guide at https://www.linuxtechi.com/setup-private-docker-registry-kubernetes/

For "simple password" i.e. htpasswd setup (following these instructions):

sudo apt -y install apache2-utils
htpasswd -Bc htpasswd registry_htpasswd
kubectl create secret generic private-registry-htpasswd --from-file ./htpasswd
kubectl describe secret private-registry-htpasswd

Values:
- user: registry_htpasswd
- pass: <your password here>

Then start the deployment:

kubectl apply -f private-registry.yml
- This creates a persistent volume (via Longhorn), deployment/pod, an exposed service on 192.168.0.5 and a TLS certificate.
Add to pihole DNS: "registry" and "registry.lan" mapping to that IP

To test the registry, let's try tagging and pushing an image:

docker login registry.mkr.house:443
- (add username & password when prompted)
docker pull ubuntu:20.04
docker tag ubuntu:20.04 registry.mkr.house:443/ubuntu
docker push registry.mkr.house:443/ubuntu

To see what's in the registry:

curl -X GET --basic -u registry_htpasswd https://registry.mkr.house:443/v2/_catalog | python -m json.tool

To pull the image:

docker pull registry.mkr.house:443/ubuntu

Now we need to set up each node so it knows to look for the registry, following these instructions (note: not TLS)

ssh ubuntu@<k3s1 IP address>

sudo vim /etc/rancher/k3s/registries.yaml

mirrors:
  "registry.mkr.house:443":
    endpoint:
      - "https://registry.mkr.house:443"
configs:
  "registry.mkr.house:443":
    auth:
      username: "registry_htpasswd"
      password: "<your password here>"
    tls:
      insecure_skip_verify: true

sudo service k3s restart, then logout
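
To avoid keeping the registry password in a file you edit by hand, the same registries.yaml can be rendered from arguments. This is a sketch: `render_registries_yaml` is a hypothetical helper and `REGISTRY_PASSWORD` is an assumed environment variable name. It does no YAML escaping, so a password containing a double quote or backslash would need extra care.

```shell
#!/bin/sh
# render_registries_yaml is a hypothetical helper that prints a
# registries.yaml like the one above from three arguments.
render_registries_yaml() {
  registry="$1"; user="$2"; pass="$3"
  cat <<EOF
mirrors:
  "$registry":
    endpoint:
      - "https://$registry"
configs:
  "$registry":
    auth:
      username: "$user"
      password: "$pass"
    tls:
      insecure_skip_verify: true
EOF
}

# REGISTRY_PASSWORD is an assumed environment variable name.
render_registries_yaml registry.mkr.house:443 registry_htpasswd \
  "${REGISTRY_PASSWORD:-<your password here>}"
```

Redirect the output to a file and move it to /etc/rancher/k3s/registries.yaml with sudo, as in the manual steps.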

Let's copy it to the remaining nodes and reboot them:

scp ubuntu@<k3s1 IP address>:/etc/rancher/k3s/registries.yaml .
scp ./registries.yaml ubuntu@<k3s2 IP address>:/home/ubuntu/
ssh ubuntu@<k3s2 IP address>
sudo mkdir -p /etc/rancher/k3s/
sudo mv registries.yaml /etc/rancher/k3s/
sudo service k3s-agent restart
Repeat the steps from copying registries.yaml to the node through restarting k3s-agent for k3s3.
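
With more than a couple of workers, the copy-and-restart sequence is easier as a loop. The sketch below prints the commands by default and only executes them when DRY_RUN=0 is set. `run` and `sync_registries` are hypothetical helpers, and `ubuntu@<node>` assumes the default user and that the node names resolve from your machine (substitute IP addresses otherwise).

```shell
#!/bin/sh
# run and sync_registries are hypothetical helpers. With DRY_RUN=1 (the
# default here) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

sync_registries() {
  for node in "$@"; do
    # ubuntu@<node> assumes the default user and resolvable hostnames.
    run scp ./registries.yaml "ubuntu@$node:/home/ubuntu/"
    run ssh "ubuntu@$node" "sudo mkdir -p /etc/rancher/k3s/ && sudo mv registries.yaml /etc/rancher/k3s/ && sudo service k3s-agent restart"
  done
}

sync_registries k3s2 k3s3
```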

Prometheus monitoring & Grafana dashboarding

We'll set up Prometheus to collect metrics for us - including timeseries data we expose from IoT devices via NodeRed.

Grafana will host dashboards showing visualizations of the data we collect.

To install Prometheus we will be using Helm, as there is a nice community provided helm "chart" that does a lot of config and setup work for us.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --values k3s-prometheus-stack-values.yaml
If you need to modify the config, you can see what changes to the *values.yaml file do by running: helm upgrade --dry-run prometheus prometheus-community/kube-prometheus-stack --values k3s-prometheus-stack-values.yaml

We'll set up an additional scrape config (for e.g. nodered custom metrics; see here for documentation on the config).

kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml > additional-scrape-configs.yaml
kubectl apply -f additional-scrape-configs.yaml

Troubleshooting

If prometheus runs out of space, the "prometheus-prometheus-kube-prometheus-prometheus-0" job will crashloop forever with an obscure stack trace. Resizing the volume that prometheus uses is somewhat tricky:

go to 192.168.0.4 (the longhorn web ui) to assess how much storage you can assign.
kubectl edit deployment prometheus-kube-prometheus-operator
- Set "replicas" to 0. The operator automatically updates other prometheus entities in kubernetes, so if it's running you can't edit replicasets etc. without them immediately being reverted.
kubectl edit statefulset prometheus-prometheus-kube-prometheus-prometheus
- Set "replicas" to 0. This generates the pod which binds to the data volume. Longhorn storage must be unbound before it can be resized.
vim ~/makerhouse/k3s/k3s-prometheus-stack-values.yaml
- Under prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage, change to e.g. "50Gi"
helm upgrade prometheus prometheus-community/kube-prometheus-stack --values k3s-prometheus-stack-values.yaml
- Longhorn should indicate the volume is being resized. You can also check with kubectl describe pvc prometheus-prometheus-prometheus-kube-prometheus-prometheus-0 and look for an event like "External resizer is resizing volume pvc-9da184ed-28f9-48d1-82ea-3e0c0a93cf1d"
- If the status of the pvc is still "Bound", run kubectl get pods | grep prometheus to see whether the prometheus operator or the main prometheus pod is still running for some reason. It should be deletable with kubectl delete pod <foo> if the deployment and statefulset are both set to 0 replicas.

If you want to delete unneeded metrics:

curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=a_bad_metric&match[]={region="mistake"}'
- See https://www.robustperception.io/deleting-time-series-from-prometheus
curl -X POST -g 'http://prometheus:9090/api/v1/admin/tsdb/delete_series?match[]={instance="192.168.1.5:6443"}'
- Deletes all metrics for a particular target/instance.
curl -X POST -g http://prometheus:9090/api/v1/admin/tsdb/clean_tombstones
- Do this to actually garbage collect the data - note that this may grow the used disk size (up to 2X if you're deleting most things!) before it shrinks it

MQTT (NodeRed + Mosquitto)

We will be using MQTT to pass messages to and from embedded IoT and other devices, and Node-RED to set up automation flows based on messages seen.

Let's build the nodered image to include some extra plugins not provided by the default one:

cd ./nodered && docker build -t registry.mkr.house:443/nodered:latest . && docker image push registry.mkr.house:443/nodered:latest

Both MQTT and NodeRed are included in the mqtt.yaml config. "mosquitto" is the specific MQTT broker we're installing.

kubectl apply -f mqtt.yaml -f configmap-mosquitto.yml

To support Google Assistant commands, we'll need a JWT file. More details on the plugin page for how to acquire this file for your particular instance.

kubectl create secret generic nodered-jwt-key --from-file=/home/ubuntu/makerhouse/k3s/secretfile.json

Maintenance Log

2021-04-30 Master node reinstall

Prep:

Set router DHCP to 8.8.8.8 DNS
Copied pihole config ("Teleporter" setting)
Saved Nodered flows
TODO Copy k3s keys

Unlisted dependency:

When setting up SSL cert-manager, certificates couldn’t be issued because the Hover IP hadn’t been updated. Manually update IP in Hover to current house IP.

2021-07-22 personal website install

Needed to extend the "SUBDOMAIN" env var in ddns-lexicon.yml, and possibly also add the record to hover.com (may be doing an update, not an upsert?) in addition to creating ingress/service/deployment k3s configs

2021-09-02 pihole out of disk

Ran pihole -g -r to recreate gravity.db, also deleted /etc/pihole/pihole-FTL.db

Owner

Scott Martin
