Ceph Cluster on Raspbian (English Version)
A small howto explaining how to install Ceph on Raspbian Stretch. Many tutorials exist, but I didn't find one that works well, so here is a small post describing my installation and the difficulties I encountered.
What do you need
- 3 Raspberry PI 3
- 3 SD cards (8 GB)
- 5 USB keys (I was missing a sixth 😉)
- USB charger
- One switch
- Network cables
Here is all the good stuff:
Deployed architecture
A Ceph cluster needs some components:
- monitor: supervises the cluster's health
- osd: where the data is stored
- mds: useful only for CephFS
We will not use CephFS, so we will not deploy any mds components. You will find more information in the official documentation: Doc
A best practice is not to install mon and osd on the same machine, but we only have 3 machines, so we will not follow this advice. We will have this architecture:
| Hostname | Function |
| --- | --- |
| Ceph01 | Admin/Monitor/OSD |
| Ceph02 | Monitor/OSD |
| Ceph03 | Monitor/OSD |
This setup is not the best, but it will be enough for our tests. Please note that ceph01 has an admin function: this is because we will deploy the cluster from this machine using ceph-deploy.
Set up your machines
For the Raspbian installation, I refer you to my article (Installation Raspbian).
There are two very important points:
- your machines must be synchronized with NTP (mandatory for the Ceph cluster to establish a quorum)
- all your machines' names must be resolvable, so either install and configure a DNS server or fill in the /etc/hosts file on all your machines (see the example below)
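If you go the /etc/hosts route, a minimal sketch (reusing the addresses that appear later in the `ceph -s` output; adapt them to your own network) could look like this:

```bash
# Append the cluster nodes to /etc/hosts on every machine (addresses are examples)
cat >> /etc/hosts << EOF
192.168.1.37    ceph01
192.168.1.38    ceph02
192.168.1.39    ceph03
EOF
```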
Install Ceph
In order to deploy our cluster, we will use ceph-deploy.
The Stretch packages are too old (version 0.94, i.e. Hammer: Ceph Release).
The Ceph project doesn't provide (not yet?) packages for the armhf architecture, so we will grab them from Raspbian testing.
- Create /etc/apt/sources.list.d/testing.list:
# echo 'deb http://mirrordirector.raspbian.org/raspbian/ testing main' > /etc/apt/sources.list.d/testing.list
- Pin the testing packages to avoid a full upgrade:

```bash
# cat << EOF > /etc/apt/preferences.d/ceph
Package: *
Pin: release a=stable
Pin-Priority: 900

Package: *
Pin: release a=testing
Pin-Priority: 300
EOF
```
- The Ceph packages in testing are in the Jewel version. ceph-deploy isn't packaged by Raspbian, so we will grab the package provided by Ceph:
# echo 'deb http://download.ceph.com/debian-jewel/ stretch main' > /etc/apt/sources.list.d/ceph.list
- Get the repository key:
# wget -q -O - http://download.ceph.com/keys/release.asc | apt-key add -
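The original steps do not show it, but the package lists have to be refreshed before installing anything; `apt-cache policy` is also a convenient way to confirm the pinning behaves as expected (both commands are standard apt tooling, not part of the original howto):

```bash
# apt-get update
# apt-cache policy ceph   # the candidate from testing should show up with priority 300
```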
- Install all the packages. Be careful, there are a few traps, so install them in this order (or else your cluster will not work):
# apt-get install libleveldb1v5 ceph-deploy btrfs-progs
# apt-get install -t testing ceph rbd-nbd
Here are the traps:
* Install the right version of *libleveldb1v5*, or else the Ceph tools will not work
* Install *ceph-deploy* **before** *ceph* to avoid dependency problems with the Python packages
* Install *btrfs-progs* to format the OSDs with *btrfs*. It's not mandatory; *XFS* is generally preferred
* Install *rbd-nbd* because the *rbd* kernel module doesn't exist on Raspbian
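As a quick sanity check (my addition, not part of the original steps), you can confirm on each machine that the Jewel packages really came from testing:

```bash
# ceph --version          # should report a 10.2.x (Jewel) release
# ceph-deploy --version
```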
Deploy Ceph
Now that Ceph is installed on all your machines, we will deploy the cluster using ceph-deploy. For this, we need to meet these prerequisites:
- Use a dedicated account on each machine. We will use the ceph user created by the ceph package.
- This account must have full sudo access.
- The admin machine (ceph01) must be able to connect to all the other machines over ssh, without a password, as the ceph user.
- Create a file to give sudo access to the ceph user:
# echo 'ceph ALL = (root) NOPASSWD:ALL' > /etc/sudoers.d/ceph
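To verify the sudo rule is in place (an optional check, not in the original howto), `sudo -l` can list the privileges granted to the ceph user:

```bash
# sudo -l -U ceph   # should show: (root) NOPASSWD: ALL
```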
- On ceph01 (our admin machine), log in as ceph and generate an ssh key:

```bash
# su -s /bin/bash - ceph
$ ssh-keygen
```
- Now change the shell of the ceph user on all the machines:
# chsh -s /bin/bash ceph
This will allow the admin machine to connect to all the machines over ssh (even to itself).
- As the ceph user, copy the ssh key to all machines (including ceph01 itself):

```bash
# su -s /bin/bash - ceph
$ for h in ceph01 ceph02 ceph03 ; do ssh-copy-id ${h} ; done
```
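To confirm the key distribution worked (a quick check of my own, not in the original steps), the following loop should print each hostname without ever prompting for a password:

```bash
$ for h in ceph01 ceph02 ceph03 ; do ssh ${h} hostname ; done
```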
- As the ceph user, we will create a working directory for ceph-deploy:

```bash
$ mkdir ceph-deploy && cd ceph-deploy
$ ceph-deploy new --public-network 192.168.1.0/25 ceph01 ceph02 ceph03   # Of course, adapt the names and the network
$ ceph-deploy mon create-initial                                         # Deploy mon on all the machines
$ ceph-deploy admin ceph01 ceph02 ceph03                                 # Copy the conf to all the machines
```
From there, you should have a functional cluster, but without any OSDs (so the cluster health is HEALTH_ERR):

```bash
$ ceph -s
```
Now we need to add OSDs to our cluster. For this we will use our USB keys as follows (a quick device check is shown right after the list):
* ceph01 : 2 keys ( /dev/sda and /dev/sdb )
* ceph02 : 2 keys ( /dev/sda and /dev/sdb )
* ceph03 : 1 key ( /dev/sda )
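Before zapping anything, it is worth double-checking the device names on each node so you don't wipe the SD card by mistake. This check is my own addition, not part of the original procedure:

```bash
$ for h in ceph01 ceph02 ceph03 ; do echo "--- ${h}" ; ssh ${h} lsblk -d -o NAME,SIZE,TYPE ; done
```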
We will initialize (zap) our keys (still as the ceph user):

```bash
$ ceph-deploy disk zap ceph01:sda ceph01:sdb ceph02:sda ceph02:sdb ceph03:sda
```
Once initialized, we will format them. I chose BTRFS, but it's not mandatory; by default it will be XFS:

```bash
$ ceph-deploy osd prepare --fs-type btrfs ceph01:sda ceph01:sdb ceph02:sda ceph02:sdb ceph03:sda
```
This command will create two partitions on each key: one for the data and one for the journal.
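If you want to inspect the result before activation (an optional step I added), ceph-deploy can list the disks and the partitions it created:

```bash
$ ceph-deploy disk list ceph01 ceph02 ceph03
```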
- Then we activate them:

```bash
$ ceph-deploy osd activate ceph01:/dev/sda1:/dev/sda2 ceph01:/dev/sdb1:/dev/sdb2 ceph02:/dev/sda1:/dev/sda2 ceph02:/dev/sdb1:/dev/sdb2 ceph03:/dev/sda1:/dev/sda2
```
- Now our cluster should be up and in good shape:

```bash
$ ceph -s
    cluster 2a6de943-36d5-40bb-8c16-fb39b71846c0
     health HEALTH_OK
     monmap e2: 3 mons at {ceph01=192.168.1.37:6789/0,ceph02=192.168.1.38:6789/0,ceph03=192.168.1.39:6789/0}
            election epoch 68, quorum 0,1,2 ceph01,ceph02,ceph03
     osdmap e90: 5 osds: 5 up, 5 in
            flags sortbitwise,require_jewel_osds
      pgmap v16247: 64 pgs, 1 pools, 0 bytes data, 1 objects
            1169 MB used, 45878 MB / 48307 MB avail
                  64 active+clean
  client io 13141 B/s rd, 1812 B/s wr, 15 op/s rd, 43 op/s wr
```
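Two more read-only commands (my addition) give a useful view of the OSD layout and the available space:

```bash
$ ceph osd tree   # shows the 5 OSDs and which host each one lives on
$ ceph df         # global and per-pool usage
```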
To finish:

- In the working directory of ceph-deploy (usually /var/lib/ceph/ceph-deploy), you will find a ceph.conf file. We need to add these two lines:

```
[client]
admin_socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
```

Then, as the ceph user, redeploy the configuration:

```bash
$ ceph-deploy admin ceph01 ceph02 ceph03   # redeploy the conf
```

This is to avoid admin socket conflicts.
- By default (??), the mon service is not enabled in systemd, so if you reboot your machines the cluster will stop working. We must enable it on each machine:

```bash
# systemctl enable ceph-mon.target
```
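From what I understand of the Jewel systemd units, the OSD services can suffer from the same problem, so enabling their target (and the umbrella ceph.target) on each machine should not hurt; treat this as an assumption to verify on your own setup:

```bash
# systemctl enable ceph-osd.target
# systemctl enable ceph.target
```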
- Change the shell of the ceph user back (on each machine):
# chsh -s /bin/false ceph
And now?
We will verify that everything works well. As said before, the rbd kernel module doesn't exist on Raspbian, so we will use the rbd-nbd package.
- View the pools:

```bash
# ceph osd lspools
0 rbd,
```
By default, rbd uses the rbd pool.
- Create a new pool:

```bash
# ceph osd pool create containers 256
```
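To confirm the pool exists and was created with the expected number of placement groups (an optional check I added):

```bash
# ceph osd pool ls detail
# ceph osd pool get containers pg_num
```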
- Create an "object" (an RBD image):
# rbd create -p containers --size 3G test
- Check that everything is OK:

```bash
# rbd -p containers ls
test
# rbd -p containers info test
rbd image 'test':
        size 3072 MB in 768 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.31d6a2ae8944a
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
```
- Map it:

```bash
# rbd-nbd map containers/test
/dev/nbd0
```
- Checks:

Use fdisk:

```bash
# fdisk -l /dev/nbd0
Disk /dev/nbd0: 3 GiB, 3221225472 bytes, 6291456 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
```
Format it:

```bash
# mkfs.btrfs /dev/nbd0
btrfs-progs v4.7.3
See http://btrfs.wiki.kernel.org for more information.

Detected a SSD, turning off metadata duplication. Mkfs with -m dup if you want to force metadata duplication.
Performing full device TRIM (1.00GiB) ...
Label:              (null)
UUID:
Node size:          16384
Sector size:        4096
Filesystem size:    3.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       yes
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     3.00GiB  /dev/nbd0
```
Mount it and write:
```bash
# mount /dev/nbd0 /mnt && cd /mnt && echo test > test
```
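Optionally (my addition, not in the original steps), read the file back and unmount before unmapping the device:

```bash
# cat /mnt/test      # should print "test"
# cd / && umount /mnt
```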
- Show the mapped devices:

```bash
# rbd-nbd list-mapped
/dev/nbd0
```
- Unmap the device and delete the image:

```bash
# rbd-nbd unmap /dev/nbd0 && rbd rm containers/test
rbd-nbd: the device is not used
Removing image: 100% complete...done.
```
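If you want to clean up completely, the test pool can be removed as well. Depending on the monitor configuration, pool deletion may additionally need to be allowed (mon_allow_pool_delete); a possible final step:

```bash
# ceph osd pool delete containers containers --yes-i-really-really-mean-it
```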