Add LVM cache to existing LVM RAID1 mirror

The new HP Microserver gen8 is configured with Proxmox.

Proxmox is installed on a RAID1 LVM that runs on two uSD cards: one in the uSD card slot integrated into the motherboard, the other via a USB-to-uSD adapter in the internal USB-A connector on the motherboard.
I had totally forgotten about it, until I realized why apt dist-upgrade was running at a snail's pace.

Otherwise there is little remarkable about the server's performance. I do worry a bit about writing log records, and I did not configure any swap on the SD cards.

To alleviate the pressure on the memory cards, I add a single GB of LVM cache on SSD (against a total size of 18 GB for each of the mirrored partitions).

Steps to perform:

  • SSD partitioning
    • fdisk /dev/sdh
    • for a new/unused medium: g for a GPT partition label
    • n for a new partition, t for type lvm, w to write and save the configuration.
  • Earmark the partitions for LVM-usage
    • pvcreate /dev/sdh1
  • Add the physical volume to the root partition’s volume group
    • vgextend mt_usbsdraid /dev/sdh1
  • Add the future caching device to the existing volume group
    • lvcreate -vn cache_mt_usbsdraid -l254 mt_usbsdraid /dev/sdh1
  • Convert the volume to a cached volume, indicating the caching volume
    • lvconvert --type cache --cachepool cache_mt_usbsdraid mt_usbsdraid/mt_prox_sys
  • IF THIS IS YOUR BOOT VOLUME: REGENERATE YOUR GRUB CONFIG AND IMAGE (see the sketch below this list)
    • I forgot about this, and had to revert the changes on the next power outage, as the system had become unbootable.
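
On a Debian-based Proxmox install, regenerating the boot pieces boils down to something like the sketch below, run after the lvconvert and before the next reboot (exact steps may differ if you boot via UEFI/systemd-boot):

# update-initramfs -u -k all
# update-grub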

That is all. View the result by calling lvs mt_usbsdraid:

# lvs mt_usbsdraid
  LV          VG           Attr       LSize  Pool                       Origin              Data%  Meta%  Move Log Cpy%Sync Convert
  mt_prox_sys mt_usbsdraid Cwi-aoC--- 18.00g [cache_mt_usbsdraid_cpool] [mt_prox_sys_corig] 0.04   2.10            0.00     
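
Should the cache ever need to be removed again (as I had to do after the failed boot), lvconvert can detach and drop the cache pool; a minimal sketch with the names above:

# lvconvert --uncache mt_usbsdraid/mt_prox_sys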


Containers do have internet, Proxmox does not

I was unable to upgrade Proxmox. It turned out that internet access was lacking. More specifically: the whole LAN, with the exception of (the internal side of) the router, was reachable. On closer inspection PBS, the backup server, had the same problem.

Reaching those servers from the router was not possible either. In the meantime the containers were accessible from the Internet, and backups were sent as usual.

After checking DNS, dynamic and static IPs, bonds and bridges, ARP came into view as the culprit. arping gave timeouts on the IPs of these servers, while arping on the hostname gave the right MAC but the wrong IP.

The cause of it all was the static ARP entries I had set for a couple of machines, to have their MACs at hand for wake-on-LAN.

The culprit: a static ARP record

After unchecking the check boxes, applying the changes and flushing the ARP table, everything started working immediately!
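
For reference, inspecting and flushing the neighbour/ARP table on the Linux side can be done with iproute2; a quick sketch (the bridge name is an assumption):

# ip neigh show dev vmbr0
# ip neigh flush all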


LVM extend, and grow ext4 file system

First run in test mode (-t):

root@linhovo:~# lvextend /dev/mapper/data-linh /dev/sda3 -tvrL +30G
TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
Executing: /sbin/fsadm --dry-run --verbose check /dev/data/linh
fsadm: "ext4" filesystem found on "/dev/mapper/data-linh".
fsadm: Skipping filesystem check for device "/dev/mapper/data-linh" as the filesystem is mounted on /home/linh
/sbin/fsadm failed: 3
Test mode: Skipping archiving of volume group.
Extending logical volume data/linh to <285.58 GiB
Size of logical volume data/linh changed from <255.58 GiB (65428 extents) to <285.58 GiB (73108 extents).
Test mode: Skipping backup of volume group.
Logical volume data/linh successfully resized.
Executing: /sbin/fsadm --dry-run --verbose resize /dev/data/linh 299450368K
fsadm: "ext4" filesystem found on "/dev/mapper/data-linh".
fsadm: Device "/dev/mapper/data-linh" size is 274424922112 bytes
fsadm: Parsing tune2fs -l "/dev/mapper/data-linh"
fsadm: Resizing filesystem on device "/dev/mapper/data-linh" to 306637176832 bytes (66998272 -> 74862592 blocks of 4096 bytes)
fsadm: Dry execution resize2fs /dev/mapper/data-linh 74862592

Notice ‘/sbin/fsadm failed: 3’, because of a mounted file system. Unmount it:

root@linhovo:~# umount /home/linh

Now run lvextend again: first another test run with -t (the file system check is no longer skipped), then without -t to actually have the logical volume and the file system resized:

root@linhovo:~# lvextend /dev/mapper/data-linh /dev/sda3 -tvrL +30G
TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
Executing: /sbin/fsadm --dry-run --verbose check /dev/data/linh
fsadm: "ext4" filesystem found on "/dev/mapper/data-linh".
fsadm: Filesystem has not been checked after the last mount, using fsck -f
fsadm: Dry execution fsck -f -p /dev/mapper/data-linh
Test mode: Skipping archiving of volume group.
Extending logical volume data/linh to <285.58 GiB
Size of logical volume data/linh changed from <255.58 GiB (65428 extents) to <285.58 GiB (73108 extents).
Test mode: Skipping backup of volume group.
Logical volume data/linh successfully resized.
Executing: /sbin/fsadm --dry-run --verbose resize /dev/data/linh 299450368K
fsadm: "ext4" filesystem found on "/dev/mapper/data-linh".
fsadm: Device "/dev/mapper/data-linh" size is 274424922112 bytes
fsadm: Parsing tune2fs -l "/dev/mapper/data-linh"
fsadm: Resizing filesystem on device "/dev/mapper/data-linh" to 306637176832 bytes (66998272 -> 74862592 blocks of 4096 bytes)
fsadm: Dry execution resize2fs /dev/mapper/data-linh 74862592
root@linhovo:~# lvextend /dev/mapper/data-linh /dev/sda3 -vrL +30G
Executing: /sbin/fsadm --verbose check /dev/data/linh
fsadm: "ext4" filesystem found on "/dev/mapper/data-linh".
fsadm: Filesystem has not been checked after the last mount, using fsck -f
fsadm: Executing fsck -f -p /dev/mapper/data-linh
fsck from util-linux 2.33.1
linh: 74099/16752640 files (5.8% non-contiguous), 63802054/66998272 blocks
Archiving volume group "data" metadata (seqno 4).
Extending logical volume data/linh to <285.58 GiB
Size of logical volume data/linh changed from <255.58 GiB (65428 extents) to <285.58 GiB (73108 extents).
Loading table for data-linh (254:3).
Suspending data-linh (254:3) with device flush
Resuming data-linh (254:3).
Creating volume group backup "/etc/lvm/backup/data" (seqno 5).
Logical volume data/linh successfully resized.
Executing: /sbin/fsadm --verbose resize /dev/data/linh 299450368K
fsadm: "ext4" filesystem found on "/dev/mapper/data-linh".
fsadm: Device "/dev/mapper/data-linh" size is 306637176832 bytes
fsadm: Parsing tune2fs -l "/dev/mapper/data-linh"
fsadm: Resizing filesystem on device "/dev/mapper/data-linh" to 306637176832 bytes (66998272 -> 74862592 blocks of 4096 bytes)
fsadm: Executing resize2fs /dev/mapper/data-linh 74862592
resize2fs 1.44.5 (15-Dec-2018)
Resizing the filesystem on /dev/mapper/data-linh to 74862592 (4k) blocks.
The filesystem on /dev/mapper/data-linh is now 74862592 (4k) blocks long.

Check that the space is added. Some part of the system is smart enough to translate the mount point of the unmounted partition to the right partition:

root@linhovo:~# df -h /home/linh
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/data-home 58G 17G 39G 31% /home

UPDATE: I was quite surprised that the free space for the file system was found even when it was not mounted. Upon rereading, I notice that this is actually not the case. The amount of free space matched my expectations, so I glossed over the details.

What is actually shown is df for the partition on which the mount point /home/linh resides.
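
Remounting and running df again shows the grown file system; a minimal check, assuming /home/linh has its own entry in /etc/fstab:

# mount /home/linh
# df -h /home/linh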


Fritz!Box configuration editing – as text

AVM offers a friendly user interface with Fritz!OS for their Fritz!Boxes, unleashing a nice range of features. Many things are possible, though some take quite a few clicks.

One of the things that takes many clicks is arranging port forwarding for hosts on the LAN.

The easiest way, after a fashion, is to make a backup of the configuration, make the changes in the exported file and import it again. The difficulty here is that the file is checksummed, and the Fritz!Box refuses to read a configuration whose checksum does not match.

Over the years, solutions have been written in various languages. I use a Python script, fritzchecksum, written by Mementum/DRe.

It is a pip-installable module (pip install fritzchecksum) that takes a Fritz export file as input and returns the old and the new checksum for easy search and replace. The program supports editing the file automatically as well, but I have always replaced the checksum by hand.
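
A sketch of the workflow; the exact fritzchecksum invocation and the export filename below are assumptions on my part, check its README for the current options:

# pip install fritzchecksum
# fritzchecksum FRITZ.Box_export.txt    # assumed invocation: prints the old and the new checksum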

For future reference, the attached zip contains the current download from GitHub. It should not be necessary to download it from here, but you never know!


Open vSwitch on Proxmox

Since Open vSwitch (OVS) and Linux bridging can't mix, it is advisable to switch to OVS before creating a whole lot of bridges.

# apt install openvswitch-switch

In the node's Networking view, I deleted the existing vmbr0 of type 'Linux Bridge' and added a new vmbr0 of type 'OVS Bridge'. Proxmox likes a reboot after that; perhaps restarting networking would have been enough, but restarting only openvswitch-switch was not.
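
For reference, an OVS bridge in /etc/network/interfaces looks roughly like the sketch below on Proxmox; the NIC name and addresses are made-up placeholders:

auto eno1
iface eno1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.2/24
    gateway 192.168.1.1
    ovs_type OVSBridge
    ovs_ports eno1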

A presentation at an Open vSwitch conference suggested editing two files:

  • /etc/default/lxc-net
    • set USE_LXC_BRIDGE="false"
  • /etc/lxc/default.conf
    • set lxc.net.0.link = ovsbr1 ; would be vmbr0 in my case

The first is already set in /etc/default/lxc. The file lxc-net does not exist (though it is referenced in the lxc file).

The second file is almost empty; it has only the line 'lxc.net.0.type = empty'.
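
Had I followed the presentation, the file would presumably have looked like this instead (the veth type is my assumption; the presentation only mentioned the link line):

lxc.net.0.type = veth
lxc.net.0.link = vmbr0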

Adding a new container uses vmbr0 by default; I don't know whether that is because I gave the new bridge the same name as the old one, or because OVS is used by default.

Anyway: the container asked for a DHCP lease and received it from the router.


Proxmox first impressions

Once rebooted after installation, I visited the IP of the server at port 8006. Unable to connect. Now what?

SSH did answer; a quick search for ‘proxmox installation complete no answer on 8006’ turned up someone without pve-manager installed. I did have it installed, but I did not see any Proxmox repositories in /etc/apt/sources.list.

ServeTheHome has a checklist for Proxmox 6; I took the pve-no-subscription repository from there:

deb http://download.proxmox.com/debian buster pve-no-subscription

apt update complains about missing keys for jessie. Not a problem for my installation as such, but it might break something that does not expect warnings (even a simple apt update && apt upgrade -y will halt), so add the key:

# wget http://download.proxmox.com/debian/key.asc
http://download.proxmox.com/debian/key.asc
Resolving download.proxmox.com (download.proxmox.com)... 2a01:7e0:0:424::249, 212.224.123.70
Connecting to download.proxmox.com (download.proxmox.com)|2a01:7e0:0:424::249|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1703 (1.7K) [application/octet-stream]
Saving to: ‘key.asc’
key.asc 100%[===========================>] 1.66K --.-KB/s in 0s
# apt-key add key.asc
OK

Retrying the web interface. Still no go: “the connection was reset”. Hmm… that is not a ‘not found’; should I perhaps take the trouble of manually specifying the https protocol? Yes, that did it: add https:// in front of the IP and log in.

Shiny interface, just like in the screenshots 🙂

The summary view in the second-level tree menu stays active while clicking through the first-level tree menu on the left-hand side, with the datacenter and node list. Depending on the subject it shows different statistics in different representations (meters or graphs), which helps in recognizing and remembering what I am looking at.

I now realize that I do have close to 30GB of space for the basic install (my 15GB doubled because of copies=2), but no separate partition for /var. I'll let it be, keep an eye on the amount of free space and add space when needed. Reinstalling based on Debian Buster is well documented, but with no experience with Proxmox I'll keep the installation close to the defaults.

To see how much of the assigned space is used in which way, I compared the different views, and some relations are not immediately clear to me. I expected around 30GB at some level and 15GB at another. What I actually see is:

  • Bare sectors, fdisk: 61863937 * 512 ~ 29.5 GB
  • At ‘datacenter’ level: 2.2GB of 54.0GB used
  • At node level: 2.2GB of 28.0GB used
  • Two storage objects:
    • local: 2.2GB of 28.0GB used
    • local-ZFS: 0 of 25.8GB used

I can see the relation between 28 and 54, and between 28 and 25.8, but not between 54GB on the datacenter and 29.5GB actual usage on disk.

After defining ‘copies = 2’, I had expected all downstream values to be half of what I see here. Thinking a bit further: was the fdisk value read from inside the node rather than from the actual hardware? Then I would not see the total size of the SSD. I posted a question on the Proxmox forum; it is awaiting moderation and a reaction.


Proxmox

Once I had decided on virtualization and settled on Proxmox as the host platform for the new NAS, I got stranded on the decisions to make with regard to the installation.

The first problem was the installation medium itself: I wrote the installer to a uSD card, put it in an adapter to regular SD, that into an SD-to-USB reader and finally into the USB3 port of the NAS.

Booting OK, installer menu OK, but starting the installer: not OK.

I tried a few times, but the image seemed broken. The kernel would halt somewhere between loading the initial ramdisk and actually starting the installer.

I checked the SHA256 checksum, reflashed onto another memory card and used another SD card reader: all to no avail. In the end I started trying other USB ports; not just the USB3 port on the front, but first USB3 on the back, and finally USB2 on the back.

That worked. The card readers are USB3, but maybe the uSD cards get upset at some point. Anyway, on to the installation.

Target harddisk: SSD or HDD? Which file system? Am I happy with automatic partitioning?

System disk: SSD. I want to be able to spin down as many HDDs as possible, as soon as possible, for as long as possible. If I start building the system on one of the spinning disks, it is more difficult to replace it in the future and I am stuck with it. Besides that, the base system should take only tens of GBs, so there is enough space left for caching, swap and other things that also benefit from increased I/O.

Which file system? For as long as I can remember I have used LVM, the last ten years in combination with ext4 on md stacks. It is very flexible and offers live resizing of nearly anything.

Proxmox does not do LVM out of the box, but it does offer ZFS (ZoL, ZFS on Linux). I have been postponing ZFS usage for a couple of years now, and am still hesitant to take the plunge. How many disks? How many pools? What size? Shit, non-ECC-RAM; and how about mirrors with differently sized disks? What about TRIM on the SSD?

I started reading about ZFS on a single-device pool, retraced my steps and considered a pool with the SSD and a spinning disk mirrored. One viable option I found is to combine the 1TB spinning disk with the SSD, for a few reasons:

  • It is the only non-SMR-disk I have available at the moment;
  • It is similar in size to the 1TB SSD;
  • I am not committed to keeping either the 4TB disk or the 6TB disk in the machine:
    • I might take one or the other out and use it in a USB case for external backups;
    • I might swap either for a disk the size of the other, to create another mirrored set.

I hope that the pool will be smart enough not to spin up the HDD for reads, as that would nullify the benefit of the non-spinning SSD. If it does spin up, I'll investigate how to make a single-device pool out of it.

The other option is running the pool on a single device with copies = 2; I’d use double the space, but there is no spinning disk involved.

Then there is the question of partitioning. Just the root partition could do with some 10GB. I'd put at least /var on a separate partition, so that log files and temporary files don't fill / to 100%; some 3GB. ZFS copies = 2 means some 30GB. Swap equal to RAM adds another 32GB, and then a couple of GB for the ZFS ZIL. That leaves roughly 900GB of unpartitioned space.
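
A rough back-of-the-envelope under those assumptions, taking 1TB as roughly 930GB usable:

(10GB root + 3GB /var) * 2 copies ≈ 26GB, rounded up to ~30GB
~30GB + 32GB swap + a few GB ZIL ≈ 65-70GB
~930GB - ~70GB ≈ 860-900GB left unpartitioned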

It is all a bit theoretical up front, and I hesitate to just take the defaults and see how it goes down the road. The times I did that, I regretted it later on. Now in particular, with a quad-level flash disk rated for fewer than 400 rewrites per cell (360, according to Samsung), I prefer not to start on the wrong footing. Anyway, it can't be helped, so I just have to bite the bullet. All things considered, I take the SSD-only route for the initial installation.

Phew, that took just four nights to get through the first screen of the installer!

That was the single choice to make then; from there to installation complete and reboot took about five minutes.


A new NAS

For my birthday I got parts for a new server/NAS, complete up to a 1TB SSD. After adding some HDDs that were not in use elsewhere, the hardware looks like this:

  • Intel J5005, 4 single-threaded cores at 2GHz, give or take
  • 32GB of DDR4 memory
  • 1 TB of SSD storage
  • 11 TB of spinning storage, spread over
    • 6TB SMR
    • 4TB SMR
    • 1TB regular PMR
  • 1Gbit Ethernet
  • 1 PCIe 2x connection

The whole box should run at 5W or less when idle, with the HDDs spun down. At boot, when the disks spin up, peak usage could hit 100W, while active use with spinning disks should cost about 20W.

All is powered by a 90W power brick and a 120W PicoPSU; no software is installed yet, so actual measurements have to wait.

The server comes just in time: the current home server is running out of both disk space and SATA ports. The NAS will have multiple roles:

  • Provide space for photos and videos;
  • Provide a backup target for laptops;
  • Replace a host of ARM-powered Yunohosts
  • Provide reverse proxying for Yunohosts that will not get virtualized and Yunohosts that are in the household temporarily (while being installed or while waiting for an appointment with the host-household)
  • Home-automation-related things (suse.AI, sensors, remote switching), VPN, firewall, the list goes on!

I am somewhat nervous about running backups and web servers on the same hardware, but after playing with virtualization ‘on premise’ and at hosting providers, I feel confident it is a viable combination.

As the NAS is a step up from the old one memory-wise, virtualization should run fine. Still, 32GB is not a huge amount, so my first thoughts went to LXC and Docker. After diving into stories of others' installations, I found Proxmox.

There are some caveats: the advanced options of the Debian installer are not available (the alternative: install Debian first, then add Proxmox), you are bound to the decisions the Proxmox team makes with regard to the development of the platform and, unhelpfully enough, root login needs to be enabled.

Most things I read about Proxmox are positive. Management of storage space is flexible, LXC hosts and VMs are easily managed and there is a large community. It goes without saying that the platform is Free/Libre Open Source software. All in all, the benefits made me prepare an installation medium.

Not knowing exactly what I want, what I need and what the result will be in the long run (software maintenance, storage options) made the installation less than straightforward.


Fediverse with Nextcloud/nginX on Yunohost, part II

The result of part I was being able to find users on the same system. Nice, but not the goal. Over the past weeks I spent some evenings glancing at the problem, hoping to read a post with a nice one-liner as a solution. No such luck.

Today I spent more time on it than I had hoped, but did find part of the solution. The result is that the webfinger test tool now returns a result:

https://webfinger.net/lookup/?resource=wbk%40online.osba.nl

returns

Request log

18:20:13 Looking up WebFinger data for acct:wbk@online.osba.nl
18:20:13 GET https://online.osba.nl/.well-known/webfinger?resource=acct%3Awbk%40online.osba.nl

JSON Resource Descriptor (JRD)

{
  "subject": "acct:wbk@online.osba.nl",
  "links": [
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "http://online.osba.nl/nextcloud/nextcloud/index.php/apps/social/@wbk"
    },
    {
      "rel": "http://ostatus.org/schema/1.0/subscribe"
    }
  ]
}

It used to return an error, 404, not found.

More details are at the Yunohost forum, but the important bit is the configuration and where to put it. The config file is the Nginx configuration at

/etc/nginx/conf.d/online.osba.nl.conf

There are two servers defined, one on port 80 and one on port 443. The first one had a preconfigured redirect, while there was none in the secure section. That was strange, but it didn't raise any alarms with me. It turns out that the redirect has to be available over TLS as well, and that it has to be a combined redirect/rewrite.

The combined stanza looks like:

location = /.well-known/webfinger {
  rewrite ^ https://online.osba.nl/nextcloud/public.php?service=webfinger&$1 last; # $1 will use the first parameter (?resource=…)
}

Put this bit between the include stanzas and the log definitions.
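
After a configuration test and a reload of Nginx, the lookup can be verified locally as well; a quick sketch:

# nginx -t && systemctl reload nginx
# curl -s 'https://online.osba.nl/.well-known/webfinger?resource=acct:wbk@online.osba.nl'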

Even though the test at webfinger.net succeeds, I have no such success when contacting this Nextcloud from a Nextcloud on another Yunohost.

To be continued…


Not enough swap

Installing lxml (for Synapse/Matrix) on one of the Yunohosts ended in an error every time.

The error indicated that not enough memory was available at some point. The machines are fairly limited in RAM, so that explains it. Extra swap space to the rescue!

Armbian creates a swap partition and resizes the system partition (and the file system) to fit the uSD-card during first boot, so there is no space available to increase swap space.

Unfortunately, shrinking an ext4 file system is not as trivial as growing it. It helps that the uSD card can easily be put in another system, avoiding the need to resize a (semi-)online root partition. One bit of information made me rethink my shrink-root, grow-swap strategy: apparently a lot of writing might go on while resizing:

The filesystems considered here tend to disperse data throughout the underlying block device, so a large amount of copying may be needed even if the filesystem has never been filled beyond the size it is being reduced to. Do not start resize2fs unless you are confident you can allow it to finish. You can request a progress bar using the -p option.

That is not what I expected or had taken into account.

Since I expect this to be a temporary measure, I decided to create a swap file instead of a swap partition. Arch, as usual, has a very helpful page on its wiki. There are some warnings against using swap files on other file system types, especially with non-contiguous swap files, but ext4 gets no special mention.

The steps for creating and using a swap file:

fallocate -l 512M /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile
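
Should the swap file need to survive a reboot (mine is meant to be temporary), an entry in /etc/fstab takes care of that; a minimal sketch:

/swapfile none swap defaults 0 0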

I used a 1GB file, and when lxml still wouldn’t compile, I added another gig of swap.

“yunohost app install synapse” is running. If it ends in an error once more, I need to check whether the cause is actually still memory-related.

