20 January 2021

Adding bricks to the k8s/gluster cluster

I’ve brought a second node into the cluster, but it didn’t go perfectly right off the bat, so the third brick will be the proof. It failed because I had some fancy automation set up that had already created an unattached volume on the second node. Adding a node with pre-existing volumes is ludicrous, now that I think about it, and I only thought about it once I tried to do it and was told “No.”

Here’s the story.

Adding the second brick

I’ve expanded MicroK8s clusters before but never Gluster, so let’s try Gluster first. I’ve already got a volume going, and I want to extend the volume to another brick–another storage node. I’ve got the second node set up similarly, with the same size partition mounted with the same filesystem. In fact, I’ve gone the extra distance and created a volume there, exactly the same as I have done on the first node.

To add a node to a Gluster cluster, one probes it from the first node, which adds it to the pool of trusted peers. This failed immediately.

brick0 $ sudo gluster
gluster> peer probe brick1
peer probe: failed: brick1 is either already part of another cluster or having
volumes configured

This makes sense. So eventually I went to brick1 and issued sudo gluster volume stop gv0, then sudo gluster volume delete gv0. Now Gluster on brick1 believed the volume was gone, and I was able to connect from brick0.

gluster> peer status
Number of Peers: 0
gluster> peer probe brick1
peer probe: success.
gluster> peer status
Number of Peers: 1

Hostname: brick1
Uuid: 5d0e6754-4211-4105-b001-58bc83dc4bd6
State: Peer in Cluster (Connected)
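A check like this could be scripted rather than eyeballed. Here is a minimal sketch, assuming the peer status output keeps the Hostname/State layout shown above; the function name peer_connected is made up for illustration and is not part of Gluster.

```shell
# peer_connected HOST: read `gluster peer status` output on stdin and
# succeed only if HOST is listed as "Peer in Cluster (Connected)".
# (Hypothetical helper; it relies on the output layout shown above.)
peer_connected() {
    awk -v host="$1" '
        $1 == "Hostname:" { current = $2 }
        $1 == "State:" && current == host && /Peer in Cluster \(Connected\)/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage, on brick0:
#   sudo gluster peer status | peer_connected brick1 && echo "brick1 is connected"
```

Reading from stdin keeps the helper independent of how the status text was obtained, so it can be tried against saved output first.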

Next I tried to add a Gluster brick and that didn’t go well, either. In three attempts, I got the following messages from Gluster:

volume add-brick: failed: Pre Validation failed on brick1. Failed to create brick directory for brick brick1:/data/brick1/gv0. Reason : No such file or directory

Then this again. I don’t remember if I did anything on brick1 in between.

volume add-brick: failed: Pre Validation failed on brick1. /srv/brick1/gv0 is already part of a volume

Here’s where I went back to brick1 and tried to clean up everything again. What I didn’t realize is that the previous attempt had actually managed to get the two Gluster bricks into the cluster. I’m not certain of this, because I never checked on brick0: it looked like the operation had been completely unsuccessful. But it must have worked, and I must then have issued the command to delete gv0, because on the third attempt in this sequence to add brick1’s gv0:

volume add-brick: failed: Unable to get volinfo for volume name gv0

I found there were no volumes in the cluster at all, including the one on brick0.

The brick directory was still there on disk, though, and I could even see the test data I’d placed there yesterday. I’d only stopped the volume and torn down its definition. I spent a little time trying to find the equivalent of mdadm’s assemble operation, so that Gluster would recognize an inactive volume and reactivate it. In the end it was this:

gluster> volume create gv0 brick0:/srv/brick1/gv0
volume create: gv0: failed: /srv/brick1/gv0 is already part of a volume
gluster> volume create gv0 brick0:/srv/brick1/gv0 force
volume create: gv0: success: please start the volume to access data
gluster> volume start gv0
volume start: gv0: success

So, back to where I was at the beginning, but I’ve got my volume back.

In the end, what worked was this. With brick0’s gv0 a single-brick volume, I went back to brick1, stopped and deleted its gv0, stopped Gluster, and rm -Rf’d the /srv/brick1/gv0 directory. Then I started Gluster back up and, back on brick0, confirmed with peer status that the peer was connected again, and then:

gluster> volume add-brick gv0 replica 2 brick1:/srv/brick1/gv0
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid
this. See:
http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
Do you still want to continue?
 (y/n) y
volume add-brick: success

By the way, before confirming, I found documentation on this (the link given was a 404) and I think I’m fine since I will be adding another replica after this. (Split-brain is when multiple storage nodes have inconsistent views of the data–I assume with three replicas, quorum will be more effective.)
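For the record, the brick1 cleanup that preceded the add-brick can be collected into one place. This is only a sketch under assumptions: the systemd unit is taken to be glusterd (the name varies by distribution), and run, reset_brick, and DRY_RUN are names invented here. Because volume definitions are shared across the trusted pool, the stop and delete steps are only safe while brick1’s gv0 is its own standalone volume; run against a shared definition, they take the whole cluster’s volume with them, which is what bit me earlier.

```shell
# Destructive commands are only printed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# reset_brick VOLUME BRICK_DIR: remove a stale local volume and its brick
# directory so the node can be added to another volume from scratch.
# Assumption: the Gluster daemon runs as a systemd unit named glusterd.
reset_brick() {
    volume="$1"
    brick_dir="$2"
    # --mode=script suppresses the interactive y/n confirmation prompts.
    run sudo gluster --mode=script volume stop "$volume"
    run sudo gluster --mode=script volume delete "$volume"
    run sudo systemctl stop glusterd
    # Removing the directory also removes the extended attributes that make
    # Gluster report the path as "already part of a volume".
    run sudo rm -rf "$brick_dir"
    run sudo systemctl start glusterd
}

# Usage, on brick1 only (never on the node holding data you want to keep):
#   reset_brick gv0 /srv/brick1/gv0             # print the commands
#   DRY_RUN=0 reset_brick gv0 /srv/brick1/gv0   # actually run them
```

Defaulting to a dry run is deliberate: given how this session went, seeing the commands before they execute seems worth the extra step.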

I poked around a bit and it looks like all the basics are working. I am ready to go with the third node and then start expanding k8s similarly.

Adding the third brick

I brought it up with Gluster installed and the partition and filesystem ready, so on the first brick:

$ sudo gluster
gluster> peer probe brick2
peer probe: success.
gluster> peer status
Number of Peers: 2

Hostname: brick1
Uuid: 5d0e6754-4211-4105-b001-58bc83dc4bd6
State: Peer in Cluster (Connected)

Hostname: brick2
Uuid: dc05b74e-0d8a-4704-b1d0-c855fdbf79ca
State: Peer in Cluster (Connected)
gluster> volume add-brick gv0 replica 3 brick2:/srv/brick1/gv0
volume add-brick: success
gluster>

And that’s that.
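One way to double-check the result would be to read the brick count back from gluster volume info gv0. The sketch below assumes that output contains a line of the form "Number of Bricks: 1 x 3 = 3" (or just "Number of Bricks: 1" for a single brick); I have not pasted that output here, so treat the format as an assumption. The function name brick_count is hypothetical.

```shell
# brick_count: read `gluster volume info VOLUME` output on stdin and print
# the total number of bricks, taken as the last field of the
# "Number of Bricks:" line (assumed format, see above).
brick_count() {
    awk '/^Number of Bricks:/ { print $NF }'
}

# Usage, on any node in the pool:
#   sudo gluster volume info gv0 | brick_count
```

After the two add-brick operations, a value of 3 would be the expected answer for gv0.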

Now back to k8s on the second brick

Bring up the second node and configure it the same way as the first
Remember to create a new CSR template like the one for the first brick, and run sudo microk8s refresh-certs
Enable important add-ons I forgot to earlier, on each node:
```
$ microk8s enable dns storage rbac
```
From first node, add node:
```
$ microk8s add-node
```
Copy the given command and paste it into a terminal window on the second brick.
```
$ microk8s join 10.0.0.16:25000/a1ca3395f08c9796977c3acca0689a2c
```

Repeating for the third brick went the same

Pretty much.

I now have a high-availability K8s cluster:

$ microk8s status
microk8s is running
high-availability: yes
  datastore master nodes: 10.0.0.16:19001 10.0.0.17:19001 10.0.0.18:19001
  datastore standby nodes: none

And that’s the cluster.
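If this check ever needs to be repeated, the two facts that matter in that status output can be pulled out with a couple of small helpers. This is a sketch that assumes the status text keeps the layout shown above; ha_enabled and datastore_master_count are hypothetical names, not MicroK8s commands.

```shell
# ha_enabled: read `microk8s status` output on stdin and succeed only if
# it contains the line "high-availability: yes".
ha_enabled() {
    grep -q '^high-availability: yes[[:space:]]*$'
}

# datastore_master_count: read the same output on stdin and print how many
# addresses follow "datastore master nodes:" (0 if the line is missing).
datastore_master_count() {
    awk -F': ' '
        /datastore master nodes:/ { print split($2, nodes, " "); found = 1 }
        END { if (!found) print 0 }
    '
}

# Usage:
#   microk8s status | ha_enabled && echo "HA is on"
#   microk8s status | datastore_master_count
```

With the three bricks joined, the second helper should print 3 for the output above.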

Next steps

There are my bricks. The next step is to deploy a few things here that I want and kick the tires a bit before I go to the next level and invest the time in getting this going on the laptops.

The first thing to do in deploying this on the laptops will be to plan it out. For example, unlike these virtual machines, the laptops are heterogeneous and, particularly relevant for Gluster, have different-sized hard drives. Probably I’ll maximize the disk use for Gluster on the smallest one (that is, everything but the OS partition) and use that as the size of the Gluster partition on the rest.
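The sizing rule is simple arithmetic: take the smallest disk, subtract what the OS needs, and use that everywhere. A minimal sketch follows; the disk sizes and the 40 GB OS reserve are made-up numbers for illustration, not measurements from the actual laptops, and gluster_partition_gb is a hypothetical helper name.

```shell
# gluster_partition_gb OS_GB DISK_GB...: print the largest Gluster partition
# size (in whole GB) that fits on every disk after reserving OS_GB for the OS.
gluster_partition_gb() {
    os_gb="$1"
    shift
    smallest=""
    for disk_gb in "$@"; do
        if [ -z "$smallest" ] || [ "$disk_gb" -lt "$smallest" ]; then
            smallest="$disk_gb"
        fi
    done
    echo $((smallest - os_gb))
}

# Hypothetical example: laptops with 256, 500, and 1000 GB disks,
# each keeping 40 GB for the OS partition.
gluster_partition_gb 40 256 500 1000   # prints 216
```

Since every brick in a replicated volume holds a full copy, space on the larger disks beyond this figure would not add capacity to the volume anyway.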

Choosing the right-sized partitions will be trickier with the laptops as well, because I won’t be able to change them later. The laptops’ hard drives are also quite a bit bigger than I expect to need for local storage, so splitting them with CockroachDB might be useful, but I don’t see needing a tonne of space for that either.