Ceph Cluster on Raspbian (English Version)
A small howto explaining how to install Ceph on Raspbian Stretch. Many tutorials exist, but I didn't find one that works well, so here is a small post describing my installation and the difficulties I encountered.
What do you need
- 3 Raspberry PI 3
- 3 SD cards (8 GB)
- 5 USB keys (I was missing a sixth 😉)
- USB charger
- One switch
- Network cables
Here is all the good stuff:
Deployed architecture
A Ceph cluster needs some components:
- monitor: supervises the cluster's health
- osd: where the data is stored
- mds: useful only for CephFS
We will not use CephFS, so we will not deploy any mds components. You will find more information in the official documentation: Doc
A best practice is not to install mon and osd on the same machine, but we only have 3 machines, so we will not follow this advice. We will have this architecture:
| Hostname | Function |
| --- | --- |
| Ceph01 | Admin/Monitor/OSD |
| Ceph02 | Monitor/OSD |
| Ceph03 | Monitor/OSD |
This setup is not the best, but it will be enough for our tests. Please note that ceph01 has an admin function: this is because we will deploy the cluster from this machine using ceph-deploy.
Set up your machines
For the Raspbian installation, I refer you to my article (Installation Raspbian).
There are two very important points:
- your machines must be synchronized with NTP (mandatory for the Ceph cluster to establish a quorum)
- all your machines' names must be resolvable, so either install and configure a DNS server or fill in the /etc/hosts file on all your machines (see the example below)
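If you go the /etc/hosts route, a minimal sketch (reusing the addresses that appear later in the `ceph -s` output; adapt them to your own network) could look like this:

```bash
# Append the cluster nodes to /etc/hosts on every machine (addresses are examples)
cat >> /etc/hosts << EOF
192.168.1.37    ceph01
192.168.1.38    ceph02
192.168.1.39    ceph03
EOF
```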
Install Ceph
In order to deploy our cluster, we will use ceph-deploy.
The Stretch packages are too old (version 0.94, i.e. Hammer: Ceph Release).
The Ceph project doesn't provide (not yet?) packages for the armhf architecture, so we will grab them from Raspbian testing.
- Create /etc/apt/sources.list.d/testing.list:
# echo 'deb http://mirrordirector.raspbian.org/raspbian/ testing main' > /etc/apt/sources.list.d/testing.list
- Pin the testing packages to avoid a full upgrade:

```bash
# cat << EOF > /etc/apt/preferences.d/ceph
Package: *
Pin: release a=stable
Pin-Priority: 900

Package: *
Pin: release a=testing
Pin-Priority: 300
EOF
```
- The Ceph packages in testing are in the Jewel version. ceph-deploy isn't packaged by Raspbian, so we will grab the package provided by Ceph:
# echo 'deb http://download.ceph.com/debian-jewel/ stretch main' > /etc/apt/sources.list.d/ceph.list
- Get the repository key:
# wget -q -O - http://download.ceph.com/keys/release.asc | apt-key add -
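The original steps do not show it, but the package lists have to be refreshed before installing anything; `apt-cache policy` is also a convenient way to confirm the pinning behaves as expected (both commands are standard apt tooling, not part of the original howto):

```bash
# apt-get update
# apt-cache policy ceph   # the candidate from testing should show up with priority 300
```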
- Install all the packages. Be careful, there are a few traps, so install them in this order (or else your cluster will not work):
# apt-get install libleveldb1v5 ceph-deploy btrfs-progs
# apt-get install -t testing ceph rbd-nbd
Here are the traps:
* Install the right version of *libleveldb1v5*, or else the Ceph tools will not work
* Install *ceph-deploy* **before** *ceph* to avoid dependency problems with the Python packages
* Install *btrfs-progs* to format the OSDs with *btrfs*. It's not mandatory; *XFS* is generally preferred
* Install *rbd-nbd* because the *rbd* kernel module doesn't exist on Raspbian
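As a quick sanity check (my addition, not part of the original steps), you can confirm on each machine that the Jewel packages really came from testing:

```bash
# ceph --version          # should report a 10.2.x (Jewel) release
# ceph-deploy --version
```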
Deploy Ceph
Now that Ceph is installed on all your machines, we will deploy the cluster using ceph-deploy. For this, we need to meet these prerequisites:
- Use a dedicated account on each machine. We will use the ceph user created by the ceph package.
- This account must have full sudo access.
- The admin machine (ceph01) must be able to connect to all the other machines over ssh, without a password, as the ceph user.
- Create a file to give sudo access to the ceph user:
# echo 'ceph ALL = (root) NOPASSWD:ALL' > /etc/sudoers.d/ceph
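To verify the sudo rule is in place (an optional check, not in the original howto), `sudo -l` can list the privileges granted to the ceph user:

```bash
# sudo -l -U ceph   # should show: (root) NOPASSWD: ALL
```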
- On ceph01 (our admin machine), log in as ceph and generate an ssh key:

```bash
# su -s /bin/bash - ceph
$ ssh-keygen
```
- Now change the shell of the ceph user on all the machines:
# chsh -s /bin/bash ceph
This will allow the admin machine to connect to all the machines over ssh (even to itself).
- As the ceph user, copy the ssh key to all machines (including ceph01 itself):

```bash
# su -s /bin/bash - ceph
$ for h in ceph01 ceph02 ceph03 ; do ssh-copy-id ${h} ; done
```
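To confirm the key distribution worked (a quick check of my own, not in the original steps), the following loop should print each hostname without ever prompting for a password:

```bash
$ for h in ceph01 ceph02 ceph03 ; do ssh ${h} hostname ; done
```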
- As the ceph user, we will create a working directory for ceph-deploy:

```bash
$ mkdir ceph-deploy && cd ceph-deploy
$ ceph-deploy new --public-network 192.168.1.0/25 ceph01 ceph02 ceph03   # Of course, adapt the names and the network
$ ceph-deploy mon create-initial                                         # Deploy mon on all the machines
$ ceph-deploy admin ceph01 ceph02 ceph03                                 # Copy the conf to all the machines
```
From there, you should have a functional cluster, but without any OSDs (so the cluster health is HEALTH_ERR):

```bash
$ ceph -s
```
Now we need to add OSDs to our cluster. For this we will use our USB keys as follows (a quick device check is shown right after the list):
* ceph01 : 2 keys ( /dev/sda and /dev/sdb )
* ceph02 : 2 keys ( /dev/sda and /dev/sdb )
* ceph03 : 1 key ( /dev/sda )
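Before zapping anything, it is worth double-checking the device names on each node so you don't wipe the SD card by mistake. This check is my own addition, not part of the original procedure:

```bash
$ for h in ceph01 ceph02 ceph03 ; do echo "--- ${h}" ; ssh ${h} lsblk -d -o NAME,SIZE,TYPE ; done
```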
We will initialize (zap) our keys (still as the ceph user):

```bash
$ ceph-deploy disk zap ceph01:sda ceph01:sdb ceph02:sda ceph02:sdb ceph03:sda
```
Once initialized, we will format them. I chose BTRFS, but it's not mandatory; by default it will be XFS:

```bash
$ ceph-deploy osd prepare --fs-type btrfs ceph01:sda ceph01:sdb ceph02:sda ceph02:sdb ceph03:sda
```
This command will create two partitions on each key: one for the data and one for the journal.
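If you want to inspect the result before activation (an optional step I added), ceph-deploy can list the disks and the partitions it created:

```bash
$ ceph-deploy disk list ceph01 ceph02 ceph03
```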
- Then we activate them:

```bash
$ ceph-deploy osd activate ceph01:/dev/sda1:/dev/sda2 ceph01:/dev/sdb1:/dev/sdb2 ceph02:/dev/sda1:/dev/sda2 ceph02:/dev/sdb1:/dev/sdb2 ceph03:/dev/sda1:/dev/sda2
```
- Now our cluster should be up and in good shape:

```bash
$ ceph -s
    cluster 2a6de943-36d5-40bb-8c16-fb39b71846c0
     health HEALTH_OK
     monmap e2: 3 mons at {ceph01=192.168.1.37:6789/0,ceph02=192.168.1.38:6789/0,ceph03=192.168.1.39:6789/0}
            election epoch 68, quorum 0,1,2 ceph01,ceph02,ceph03
     osdmap e90: 5 osds: 5 up, 5 in
            flags sortbitwise,require_jewel_osds
      pgmap v16247: 64 pgs, 1 pools, 0 bytes data, 1 objects
            1169 MB used, 45878 MB / 48307 MB avail
                  64 active+clean
  client io 13141 B/s rd, 1812 B/s wr, 15 op/s rd, 43 op/s wr
```
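Two more read-only commands (my addition) give a useful view of the OSD layout and the available space:

```bash
$ ceph osd tree   # shows the 5 OSDs and which host each one lives on
$ ceph df         # global and per-pool usage
```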
To finish:

- In the working directory of ceph-deploy (usually /var/lib/ceph/ceph-deploy), you will find a ceph.conf file. We need to add these two lines:

```
[client]
admin_socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
```

Then, as the ceph user, redeploy the configuration:

```bash
$ ceph-deploy admin ceph01 ceph02 ceph03   # redeploy the conf
```

This is to avoid admin socket conflicts.
- By default (??), the mon service is not enabled in systemd, so if you reboot your machines the cluster will stop working. We must enable it on each machine:

```bash
# systemctl enable ceph-mon.target
```
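From what I understand of the Jewel systemd units, the OSD services can suffer from the same problem, so enabling their target (and the umbrella ceph.target) on each machine should not hurt; treat this as an assumption to verify on your own setup:

```bash
# systemctl enable ceph-osd.target
# systemctl enable ceph.target
```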
- Change the shell of the ceph user back (on each machine):
# chsh -s /bin/false ceph
And now?
We will verify that everything works well. As said before, the rbd kernel module doesn't exist on Raspbian, so we will use the rbd-nbd package.
- View the pools:

```bash
# ceph osd lspools
0 rbd,
```
By default, rbd uses the rbd pool.
- Create a new pool:

```bash
# ceph osd pool create containers 256
```
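To confirm the pool exists and was created with the expected number of placement groups (an optional check I added):

```bash
# ceph osd pool ls detail
# ceph osd pool get containers pg_num
```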
- Create an "object" (an RBD image):
# rbd create -p containers --size 3G test
- Check that everything is OK:

```bash
# rbd -p containers ls
test
# rbd -p containers info test
rbd image 'test':
        size 3072 MB in 768 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.31d6a2ae8944a
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
```
- Map it:

```bash
# rbd-nbd map containers/test
/dev/nbd0
```
- Checks:

Use fdisk:

```bash
# fdisk -l /dev/nbd0
Disk /dev/nbd0: 3 GiB, 3221225472 bytes, 6291456 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
```
Format it:

```bash
# mkfs.btrfs /dev/nbd0
btrfs-progs v4.7.3
See http://btrfs.wiki.kernel.org for more information.

Detected a SSD, turning off metadata duplication. Mkfs with -m dup if you want to force metadata duplication.
Performing full device TRIM (1.00GiB) ...
Label:              (null)
UUID:
Node size:          16384
Sector size:        4096
Filesystem size:    3.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       yes
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     3.00GiB  /dev/nbd0
```
Mount it and write:
```bash
# mount /dev/nbd0 /mnt && cd /mnt && echo test > test
```
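Optionally (my addition, not in the original steps), read the file back and unmount before unmapping the device:

```bash
# cat /mnt/test      # should print "test"
# cd / && umount /mnt
```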
- Show the mapped devices:

```bash
# rbd-nbd list-mapped
/dev/nbd0
```
- Unmap the device and delete the image:

```bash
# rbd-nbd unmap /dev/nbd0 && rbd rm containers/test
rbd-nbd: the device is not used
Removing image: 100% complete...done.
```
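If you want to clean up completely, the test pool can be removed as well. Depending on the monitor configuration, pool deletion may additionally need to be allowed (mon_allow_pool_delete); a possible final step:

```bash
# ceph osd pool delete containers containers --yes-i-really-really-mean-it
```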