
ZFS deduplication, loop device, sparse file, LVM, and tmpfs

// fun with ZFS deduplication, loop device, sparse file, LVM, and tmpfs
// Fun/interesting and practical uses of the above. A little
// mini(/micro?)-project / demonstration

// Lines starting with // are my comments/explanations – in case that
// isn’t obvious
// LVM – Logical Volume Manager
// http://www.rawbw.com/~mp/linux/lvm/whatis.html
// http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)
// tmpfs http://en.wikipedia.org/wiki/Tmpfs
// “tmpfs is a common name for a temporary file storage facility on
// many Unix-like operating systems. It is intended to appear as a
// mounted file system, but stored in volatile memory instead of a
// persistent storage device.”
// I’ll add that the “stored in volatile memory” isn’t necessarily
// limited to RAM; it may, e.g., also include swap. On Linux, tmpfs
// will for the most part preferentially use RAM, but if memory gets
// tight, it will also use swap - as needed and available.
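// (Quick illustrative aside, not part of the session below: a tmpfs can
// be created on any mountpoint with an explicit size cap, and that cap
// can later be changed with a remount, e.g.
// # mount -t tmpfs -o size=256m tmpfs /mnt/scratch
// # mount -o remount,size=1g /mnt/scratch
// Nothing is actually allocated until files are written there.)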
// sparse file http://en.wikipedia.org/wiki/Sparse_file
// On Unix/BSD/Linux, a sparse file is a file with one or more blocks
// in the file which are logically present, but not physically
// allocated. Such blocks are read as all ASCII nulls, and only
// become allocated when they are written. Such blocks are created
// when a file is open for write or append and a seek is done into or
// over blocks that have not yet been written to. Native
// Unix/BSD/Linux filesystems support sparse files, but not all
// supported filesystem types do (e.g. FAT does not).
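// (Illustrative aside, not from this session - filename is made up: a
// sparse file can be created simply by seeking past the end without
// writing anything, e.g.
// # dd if=/dev/null of=sparse.img bs=1M seek=1024
// # ls -ls sparse.img    # logical size ~1 GiB, 0 blocks allocated
// The same technique is used for /tmp/zdata further below.)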
// ZFS http://en.wikipedia.org/wiki/ZFS
// “ZFS is a combined file system and logical volume manager”
// deduplication http://en.wikipedia.org/wiki/Data_deduplication
// “deduplication is a specialized data compression technique for
// eliminating duplicate copies of repeating data”
// loop device http://en.wikipedia.org/wiki/Loop_device
// “loop device” … “is a pseudo-device that makes a file accessible
// as a block device”
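// (Illustrative aside, not from this session: the basic loop-device
// operation is attaching a file so it can be treated as a block
// device, e.g. with util-linux losetup:
// # losetup --find --show disk.img    # prints e.g. /dev/loop0
// # losetup -d /dev/loop0             # detach when done
// Below, mount's "-o loop" option does that losetup step implicitly.)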

// I’m not going to give a whole overview of these technologies here,
// nor point out all, or even necessarily many, of their key advantages
// and disadvantages. Here I’ll just show examples using them, and
// point out some of their key features/benefits.

// So, setting the stage a bit. With USB and (micro)SD(HC) devices,
// when I newly obtain them, I generally save a compressed full image of
// them. Why? In case I ever want to lay down precisely the original
// format they had, I can do so again - that format and block layout
// (most notably alignment) is generally fairly well optimized for the
// media, and improper block alignment can negatively impact both
// performance and life of the media. It's also a quite sure-fire way
// of replicating the file format and layout (which I may or may not
// otherwise have a simple way of reliably reproducing, particularly
// for any and all arbitrary filesystem formats, construction, and
// layout), and it also backs up any and all actual data on the device,
// in case I ever want that back at some point.
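// (A minimal sketch of that - device and file names here are
// illustrative, not from this session: with the device unmounted,
// # dd if=/dev/sdX bs=1M | xz -9 > some_device.img.xz
// and, to lay the image back down later,
// # xz -dc some_device.img.xz | dd of=/dev/sdX bs=1M
// )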

// So, here's a bit of an example of such a collection.
# pwd
/home/r/root/projects/reference_images
# ls -al
total 29960
drwx------  3 root root     4096 Sep 19 13:45 .
drwx------ 10 root root     4096 Nov 24  2012 ..
-r--------  1 root root    11194 Apr 14 03:59 Kingston_16G_microSDHC.bz2
-rw-------  1 root root  1576380 Jun 17 07:38 Lexar_8GB.xz
-r--------  1 root root      870 May 14  2012 Oracle_MySQL_1GB.bz2
-r--------  1 root root     7278 Feb  4  2012 PNY_8GB.original.bz2
-rw-------  1 root root       12 Sep 19 02:43 SanDisk_16GB.size_in_bytes
-rw-------  1 root root 28739284 Feb 23  2012 SanDisk_16GB.xz
-r--------  1 root root    12032 Feb 23  2012 microSDHC_16GB.bz2
-r--------  1 root root   255187 Feb  8  2012 pen.bz2
-rw-------  1 root root     4171 Sep 18 13:33 suse.bz2
drwx------  2 root root     4096 Sep 18 06:21 usb
#

// But I notice SanDisk_16GB.xz seems atypically large – and it’s not a
// matter of the xz compression format (which happened to be smaller
// than bzip2 (bz2), for that particular data set).
// Now, to optimize such compressions, before I actually snag that
// image from the device, I generally mount the filesystem, remove any
// and/or all files present (if I've absolutely zero interest in
// retaining them), and 100% fill the free filesystem space with large
// files consisting of nothing but ASCII nulls. Why? Because otherwise
// some of the data on the device - even if not in any files - may be
// random, or not so efficiently compressed, whereas a lot of
// contiguous ASCII nulls compresses very efficiently and tightly. So,
// I create those files. Then I remove them - so they're not there
// if/when I restore such an image, but the blocks thus written are
// lots of contiguous ASCII nulls, so the filesystem image compresses
// quite well. I then unmount that filesystem, and save a compressed
// image of it.
// So, with the larger SanDisk_16GB.xz file, did I possibly forget to do
// my customary writing of ASCII nulls like that? Possibly, but
// probably not. In any case, I decide to go through the exercise, to
// see if repeating that process gives me a (much?) smaller resultant
// compressed image.
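// (In its simplest form, that prep amounts to something like the
// following - an illustrative sketch, not from this session; device
// and mountpoint names are made up:
// # mount /dev/sdX1 /mnt/stick
// # dd if=/dev/zero of=/mnt/stick/zero bs=1M   # runs until the filesystem is full
// # rm /mnt/stick/zero && umount /mnt/stick
// A fuller version appears later in this post, looping over multiple
// zero-fill files since FAT32 caps any single file at just under 4 GiB.)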

// But do I have the space to conveniently plop down an approximately
// 16 GiB image?
# vgdisplay tigger
--- Volume group ---
VG Name tigger
System ID
Format lvm2
Metadata Areas 8
Metadata Sequence No 334
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 44
Open LV 34
Max PV 0
Cur PV 8
Act PV 8
VG Size 148.78 GiB
PE Size 4.00 MiB
Total PE 38088
Alloc PE / Size 34219 / 133.67 GiB
Free PE / Size 3869 / 15.11 GiB
VG UUID vKj9LG-KchE-HO12-uVSx-EyuA-7b88-m6EB3H

#
// No, not really; I've only got about 15 GiB free, not 16, and would
// need a bit more anyway for filesystem overhead - presuming I write
// the image to a file within a filesystem, and not to some "raw"
// (block) device. But, that's not *quite* what I had in mind anyway.

// tmpfs uses RAM and swap. This host has 8 GiB of RAM - not itself
// enough (most of the time it's running with roughly 5 GiB or more
// free). But what's the picture with swap?
# swapon -s && free
Filename Type Size Used Priority
/dev/mapper/tigger-swap1 partition 1048572 246800 -1
/dev/mapper/tigger-swap2 partition 1048572 0 -2
/dev/mapper/tigger-swap3 partition 1048572 0 -3
/dev/mapper/tigger-swap4 partition 1048572 4 -4
/dev/mapper/tigger-swap5 partition 1048572 4 -5
/dev/mapper/tigger-swap6 partition 1048572 0 -6
/dev/mapper/tigger-swap7 partition 1048572 0 -7
/dev/mapper/tigger-swap8 partition 1048572 0 -8
total used free shared buffers cached
Mem: 8179656 5088444 3091212 0 176952 1445276
-/+ buffers/cache: 3466216 4713440
Swap: 8388576 246808 8141768
// 8 GiB of swap (in 1G LVM LV pieces), so, RAM + existing swap would
// still not be quite enough, though close.

// Well, exactly how many bytes do I need to write out the full image of
// that file? And I use nice, just to go a bit easier on the CPU - at
// least if much of anything else wants many of those CPU cycles - as
// the decompression is fairly CPU intensive, though (much) less so than
// the compression. Whereas decompression might otherwise bottleneck
// on writing to media, and thus not hammer the CPU, in this case I'm
// not (yet?) writing to media, so it burns through CPU until it's
// done.
# nice -19 xz -d < SanDisk_16GB.xz | wc -c > SanDisk_16GB.size_in_bytes
# cat SanDisk_16GB.size_in_bytes
16008609792
#
// Yup, … and like disk manufacturers, for flash, when they're talking
// G, they do not mean GiB.
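// (Putting numbers on that: 16008609792 bytes is about 16.0 GB in
// decimal units, but only about 14.9 GiB in binary units - e.g.
// # echo '16008609792/(1000^3); 16008609792/(1024^3)' | bc -l
// would show roughly 16.01 and 14.91 respectively.)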
// So, let's plan to (temporarily) add quite a bit more swap - it should
// be fairly clear later why, but perhaps you can already guess at this
// point.
// We already saw the swap names earlier from swapon -s, but let's peek
// again, in a slightly different way - and here shown without the
// mapper bit - there are symbolic links 'n all that, but I didn't
// bother to show that here.
# ls /dev/tigger/*swap*
/dev/tigger/swap1 /dev/tigger/swap3 /dev/tigger/swap5 /dev/tigger/swap7
/dev/tigger/swap2 /dev/tigger/swap4 /dev/tigger/swap6 /dev/tigger/swap8
// And the names were much earlier picked to be quite clear on what they
// are. Those are all LVs under LVM. And why did I do it like that,
// with LVM and all, rather than the more typical practice of just
// using a partition? Because I can change it dynamically - very
// easily adding or removing swap on-the-fly - and we'll soon see a bit
// of that by example.
// So, I create an LV, using *all* of the remaining VG space! Yes,
// only to be used for a relatively short temporary bit, so I’m not
// particularly concerned about using all of it and leaving zero free.
// Were there an issue in that short period of time where I needed more
// space in the VG for something else, I could quickly give up that
// additional swap space and return it to the VG. And, again, I name
// it clearly to indicate the nature and intent of the LV.
# lvcreate -l 3869 -n swap-tmp tigger
Logical volume "swap-tmp" created
# ls /dev/tigger/*swap*
/dev/tigger/swap-tmp /dev/tigger/swap3 /dev/tigger/swap6
/dev/tigger/swap1 /dev/tigger/swap4 /dev/tigger/swap7
/dev/tigger/swap2 /dev/tigger/swap5 /dev/tigger/swap8
// Let’s do something consistent on the swap labeling – what have I
// already got?
# blkid /dev/tigger/*swap*
/dev/tigger/swap1: LABEL="swap1" UUID="8e3b03f0-a84d-4ae8-aa77-cc0defae12e6" TYPE="swap"
/dev/tigger/swap2: LABEL="swap2" UUID="98adf266-fe75-4f54-bc15-ad0a0d7d22a6" TYPE="swap"
/dev/tigger/swap3: LABEL="swap3" UUID="3ce54a18-8f3e-4ac0-8343-b053ea6fdf6f" TYPE="swap"
/dev/tigger/swap4: LABEL="swap4" UUID="29e068fe-88d7-4166-b062-8bf6a1c2ae94" TYPE="swap"
/dev/tigger/swap5: LABEL="swap5" UUID="1d8ffb30-7f8e-46cc-8f47-6a2c5bba8f17" TYPE="swap"
/dev/tigger/swap6: LABEL="swap6" UUID="fdeab129-1983-48f2-804a-6836667054ca" TYPE="swap"
/dev/tigger/swap7: LABEL="swap7" UUID="1f770a80-ef29-4bc2-994c-be2a07585120" TYPE="swap"
/dev/tigger/swap8: LABEL="swap8" UUID="8ad37575-eb4f-4413-9e5b-728317984c8b" TYPE="swap"
# mkswap -L swap-tmp /dev/tigger/swap-tmp
mkswap: /dev/tigger/swap-tmp: warning: don't erase bootbits sectors
on whole disk. Use -f to force.
Setting up swapspace version 1, size = 15847420 KiB
LABEL=swap-tmp, UUID=d961e6ff-0630-48b3-9cf7-3da7e9831395
// And activate it and have a peek.
# swapon /dev/tigger/swap-tmp
// and what does our available swap/RAM look like?
# swapon -s && free
Filename Type Size Used Priority
/dev/mapper/tigger-swap1 partition 1048572 246800 -1
/dev/mapper/tigger-swap2 partition 1048572 0 -2
/dev/mapper/tigger-swap3 partition 1048572 0 -3
/dev/mapper/tigger-swap4 partition 1048572 4 -4
/dev/mapper/tigger-swap5 partition 1048572 4 -5
/dev/mapper/tigger-swap6 partition 1048572 0 -6
/dev/mapper/tigger-swap7 partition 1048572 0 -7
/dev/mapper/tigger-swap8 partition 1048572 0 -8
/dev/mapper/tigger-swap--tmp partition 15847420 0 -9
total used free shared buffers cached
Mem: 8179656 5094252 3085404 0 177028 1445332
-/+ buffers/cache: 3471892 4707764
Swap: 24235996 246808 23989188
// About 4.7 G of RAM free (much of which Linux is opportunistically
// using for buffers/cache until it's otherwise needed), and about 24 G
// of swap - highly ample now for a potentially much larger tmpfs.
// And, what’s our /tmp filesystem space look like?
# df -k /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 524288 204 524084 1% /tmp
// Definitely nowhere near 16+ G free. However, tmpfs not only can be
// grown dynamically, it can even be shrunk dynamically! So there's
// really no problem in this case with temporarily giving our tmpfs on
// /tmp much more space - we can very easily take it back later, even
// while it's still mounted and all; so long as that much of /tmp is
// unused at the time, we can remove that storage from our tmpfs.
// And now that we've got about 24 G available in swap (and about 4.7 G
// free in RAM), we bump /tmp up from 512 MiB in size to 18 GiB in
// size - quite adequate for what we'll be doing.
# mount -o remount,size=$(expr 18 '*' 1024 '*' 1024 '*' 1024) /tmp
# df -k /tmp && df -h /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 18874368 532 18873836 1% /tmp
Filesystem Size Used Avail Use% Mounted on
tmpfs 18G 532K 18G 1% /tmp
// And, how many bytes is 17.5 GiB?
# echo '17.5*1024*1024*1024' | bc -l
18790481920.0
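// Create the (for now zero-length) file that will hold all of this,
// and give it owner/group and permissions paralleling those of a disk
// block device (per the /dev/sda listing just below).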
# >>/tmp/zdata
# ls -ld /dev/sda
brw-rw---T 1 root disk 8, 0 Jul 17 08:09 /dev/sda
# chown root:disk /tmp/zdata
# chmod ug+rw,+t /tmp/zdata && ls -ld /tmp/zdata
-rw-rw---T 1 root disk 0 Sep 25 12:15 /tmp/zdata
// And how many MiB is 17.5 GiB?
# echo '1024*17.5' | bc -l
17920.0
// Why do I pick that number? Because I’ll create a file, up to
// potentially 17.5 GiB in size, on an 18 GiB filesystem.
// But, I can be *much* more efficient than actually writing out all
// that data - especially to SSD, where the VG and swap reside.
// So, I’ll bring in a few things to make that much more efficient.
// First, I don’t allocate all the blocks to that file … in fact I
// allocate zero blocks to my file – I just make it logically 17.5 GiB
// in size. This is then a sparse file - not all the blocks for the
// file's logical size are allocated. This is perfectly legal for Unix
// and Linux - at least on any filesystem that in fact supports sparse
// files. With a sparse file, any blocks not present are read as all
// ASCII nulls, and when written, the blocks are allocated at that time
// (at least presuming they can be allocated and nothing else prevents
// it).
# dd if=/dev/null of=/tmp/zdata bs=$(expr 1024 '*' 1024) seek=17920
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.5802e-05 s, 0.0 kB/s
# ls -ls /tmp/zdata
0 -rw-rw---T 1 root disk 18790481920 Sep 25 12:16 /tmp/zdata
// Notice in the above, the 0 on the left - that's the space for the
// blocks ... none, thus far, in this case. The -s option to ls shows
// size - not the logical size, but the size of the allocated blocks -
// how much space the file is actually consuming (not including some
// other wee bits of overhead).
# ls -lhs /tmp/zdata
0 -rw-rw---T 1 root disk 18G Sep 25 12:16 /tmp/zdata
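// (Aside: GNU du makes the same distinction - by default it reports
// allocated blocks, while with --apparent-size it reports the logical
// size, e.g. du -h /tmp/zdata vs. du -h --apparent-size /tmp/zdata.)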
// And now, I bring ZFS into the picture. Why? Because one of the
// capabilities it has, is deduplication. My aim/hope here, is with
// deduplication, since I expect a lot of redundancy (many blocks of
// ASCII nulls), I may need nowhere near 16 G physically.
# zpool create pool /tmp/zdata
# ls -lhs /tmp/zdata
1.1M -rw-rw---T 1 root disk 18G Sep 25 12:17 /tmp/zdata
// note above that our ZFS pool is thus far only using 1.1M of space
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
pool 17.4G 94K 17.4G 0% 1.00x ONLINE -
// Note also above, it shows a DEDUPlication ratio – presently at 1 (no
// deduplication yet, as it’s just barely been initialized, and just
// has some metadata there).
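// (If you want just that ratio, it's also exposed as a pool property,
// e.g. zpool get dedupratio pool )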
// Let’s make a mountpoint (in this case, I don’t want ZFS’s
// default behavior on mount point location)
# (umask 022 && mkdir /tmp/myzfs)
// And now, create a zfs filesystem, with deduplication enabled:
// Note that in many cases, lines starting with "> " are PS2
// (the continuation prompt issued by the shell, not literally entered text)
# zfs create -o dedup=verify -o mountpoint=/tmp/myzfs -o setuid=off
> -o devices=off pool/myzfs
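// A note on dedup=verify: with plain dedup=on, ZFS treats blocks whose
// checksums match as duplicates; verify additionally does a
// byte-for-byte comparison before sharing a block, guarding against
// the (astronomically unlikely) case of a checksum collision.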
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
pool 17.4G 190K 17.4G 0% 1.00x ONLINE -
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
pool 111K 17.1G 21K /pool
pool/myzfs 21K 17.1G 21K /tmp/myzfs
# df -k /tmp/myzfs
Filesystem 1K-blocks Used Available Use% Mounted on
pool/myzfs 17934246 21 17934225 1% /tmp/myzfs
# df -h /tmp/myzfs
Filesystem Size Used Avail Use% Mounted on
pool/myzfs 18G 21K 18G 1% /tmp/myzfs
// So, now we have our approximately 18G filesystem on ZFS in a ZFS pool
// that is on a sparse file on tmpfs.
// Why all the bother with making tmpfs that big? Mostly as
// demonstration and “just in case”. If, for any reason, ZFS needs to
// allocate those physical blocks, tmpfs has the space to handle it.
# ls -al /tmp/myzfs
total 2
drwxr-xr-x 2 root root 2 Sep 25 12:19 .
drwxrwxrwt 21 root root 700 Sep 25 12:20 ..
# pwd
/home/r/root/projects/reference_images
# ls
Kingston_16G_microSDHC.bz2 PNY_8GB.original.bz2 microSDHC_16GB.bz2 usb
Lexar_8GB.xz SanDisk_16GB.size_in_bytes pen.bz2
Oracle_MySQL_1GB.bz2 SanDisk_16GB.xz suse.bz2
// So, let's write out our uncompressed image on that ZFS filesystem
# xz -d < SanDisk_16GB.xz | cat > /tmp/myzfs/SanDisk_16GB
// In this exercise, for demonstration purposes, we used a pipe and cat
// to prevent xz from creating the file as sparse - which it would
// otherwise have done by default in this case if we'd merely redirected
// standard output of our xz -d command to an ordinary file.
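// (xz also has a --no-sparse option to the same effect when
// decompressing directly to a file; piping through cat works
// regardless of xz version.)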
// Now we see that, both logically and physically *within* the
// filesystem, it's about 15G.
# ls -onsh /tmp/myzfs/SanDisk_16GB
15G -rw------- 1 0 15G Sep 25 12:24 /tmp/myzfs/SanDisk_16GB
// but physically for the entire ZFS pool that contains it …
# ls -onsh /tmp/zdata
70M -rw-rw---T 1 0 18G Sep 25 12:24 /tmp/zdata
// Sweet … only 70M
// A lot of deduplication, as we didn't write that file out sparse at
// all - though it is on a filesystem whose backing storage is sparse -
// but that's independent of the deduplication we see here.
// Looking a bit more closely at the ZFS pool
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
pool 17.4G 50.3M 17.3G 0% 318.06x ONLINE -
// Some serious deduplication going on there
// For comparison, here’s size of our compressed file on an ext3
// filesystem:
# ls -al SanDisk_16GB.??
-rw------- 1 root root 28739284 Feb 23 2012 SanDisk_16GB.xz
# ls -alsh SanDisk_16GB.??
28M -rw------- 1 root root 28M Feb 23 2012 SanDisk_16GB.xz
// Not as tight/efficient as the xz compression, but xz is very high on
// CPU consumption, whereas ZFS deduplication is light enough to be
// quite feasible for use on a live filesystem.
# cd /tmp/myzfs
# ls
SanDisk_16GB
// But that’s an image of a partitioned device, what do those partitions
// look like?
# sfdisk -uS -l *
Disk SanDisk_16GB: cannot get geometry

Disk SanDisk_16GB: 1946 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
SanDisk_16GB1 32 31266815 31266784 c W95 FAT32 (LBA)
SanDisk_16GB2 0 – 0 0 Empty
SanDisk_16GB3 0 – 0 0 Empty
SanDisk_16GB4 0 – 0 0 Empty
// I want to mount that partition, so let's create a mountpoint
# (umask 022 && mkdir SanDisk_16GB1)
// I want to mount the partition, but it’s an ordinary file, not a block
// device, and the partition does not start at the very beginning of
// that file. What to do? Enter use of loop device, with offset
// option, and (for safety/sanity sake) the sizelimit option
# mount -o loop,nosuid,nodev,offset=$(expr 32 '*'
> 512),sizelimit=$(expr 31266784 '*' 512) SanDisk_16GB SanDisk_16GB1
// We now have the filesystem within the “partition” of that image file
// mounted as filesystem
# df -k *1/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop2 15625744 52224 15573520 1% /tmp/myzfs/SanDisk_16GB1
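// (An alternative, with a reasonably current util-linux, is to let
// losetup scan the partition table itself, e.g.
// # losetup --find --show --partscan SanDisk_16GB
// which attaches the image and creates per-partition devices such as
// /dev/loopNp1 that can then be mounted directly; the offset/sizelimit
// approach used above avoids depending on that feature.)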
# cd *1/
// And we have a look at what’s in it.
# ls
RunClubSanDisk.exe SanDiskSecureAccess club_application
RunSanDiskSecureAccess_Win.exe autorun.inf
// I want to keep those files
# pwd
/tmp/myzfs/SanDisk_16GB1
# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/loop2 15G 51M 15G 1% /tmp/myzfs/SanDisk_16GB1
// But I want to very efficiently compress the rest of the space on the
// filesystem, so I write ASCII nulls to files, which I subsequently
// remove before doing the image compression
# (n=1; while :; do >>/dev/null 2>&1 dd if=/dev/zero of=zero$n
> bs=$(expr 1024 '*' 1024); df -k . | >>/dev/null grep ' 0 100%' &&
> break; n=$(expr $n + 1); done); ls -ons zero?*
4194304 -rwxr-xr-x 1 0 4294967295 Sep 25 12:34 zero1
4194304 -rwxr-xr-x 1 0 4294967295 Sep 25 12:37 zero2
4194304 -rwxr-xr-x 1 0 4294967295 Sep 25 12:39 zero3
2990608 -rwxr-xr-x 1 0 3062382592 Sep 25 12:41 zero4
// Now the filesystem is 100% full – no more free blocks – let’s peek
# df -k /tmp/myzfs/SanDisk_16GB1 /tmp/myzfs
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop2 15625744 15625744 0 100% /tmp/myzfs/SanDisk_16GB1
pool/myzfs 33267795 15635440 17632355 47% /tmp/myzfs
// Note that our ZFS filesystem claims it has used about half its space
// ... but has it really? Some things about ZFS aren't fully POSIX,
// etc. compliant, so some things might get slightly confused - e.g.
// df - as in some cases ZFS has no way to fully and completely
// represent the information in a fully POSIX-compliant way, so some
// things may only show an approximation of reality.
# zpool list && zfs list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
pool 17.4G 57.9M 17.3G 0% 278.21x ONLINE -
NAME USED AVAIL REFER MOUNTPOINT
pool 14.9G 16.8G 21K /pool
pool/myzfs 14.9G 16.8G 14.9G /tmp/myzfs
// Note the now very nice high DEDUPlication ratio shown - 278.21x - way
// up from our starting point of 1, though down a bit from the 318.06x
// we saw at our earlier peek.

// Note that for our ZFS filesystem, it shows 16.8G available - even
// though it's about a 17G filesystem with a roughly 16G file on it.
// Lots of deduplication going on.
// And the file that holds the entire ZFS pool and its filesystem?
# ls -ons /tmp/zdata
151700 -rw-rw---T 1 0 18790481920 Sep 25 12:42 /tmp/zdata
# ls -onsh /tmp/zdata
149M -rw-rw---T 1 0 18G Sep 25 12:42 /tmp/zdata
// Only about 149 M of physical storage used to hold all of that.
// So, remember, we're using tmpfs, and had about 4.7 G of free RAM,
// so that ought to be pretty much all in RAM, hardly - if at all -
// touching swap.
# swapon -s && free
Filename Type Size Used Priority
/dev/mapper/tigger-swap1 partition 1048572 246788 -1
/dev/mapper/tigger-swap2 partition 1048572 0 -2
/dev/mapper/tigger-swap3 partition 1048572 0 -3
/dev/mapper/tigger-swap4 partition 1048572 4 -4
/dev/mapper/tigger-swap5 partition 1048572 4 -5
/dev/mapper/tigger-swap6 partition 1048572 0 -6
/dev/mapper/tigger-swap7 partition 1048572 0 -7
/dev/mapper/tigger-swap8 partition 1048572 0 -8
/dev/mapper/tigger-swap--tmp partition 15847420 0 -9
total used free shared buffers cached
Mem: 8179656 8022476 157180 0 183340 4154136
-/+ buffers/cache: 3685000 4494656
Swap: 24235996 246796 23989200

// So, as hoped, pretty much all in RAM - or nearly so. We've got
// roughly 256 MiB less free RAM (about 4.7 G vs. about 4.5 G) - but
// that's with ZFS in use with deduplication, and dear knows what else
// the system might be up to. And swap - hardly touched ... in fact it
// showed *more* swap in use earlier, so we're down in the noise level,
// if it's even touching swap at all.
// So, we're working with our image file of about 16G within only about
// 150 M of actual used space - and mostly or entirely in RAM - which is
// quite good, considering the host only has 8 GiB of RAM total.
# pwd
/tmp/myzfs/SanDisk_16GB1
// Let’s get rid of those files of nulls – but their zeroed blocks
// remain (which compress well).
# rm *zero*
# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop2 15625744 52224 15573520 1% /tmp/myzfs/SanDisk_16GB1
# cd ..
# umount SanDisk_16GB1 && rmdir SanDisk_16GB1
# cd /home/r/root/projects/reference_images
// And, let’s recompress, and see if we got any actual space savings
# nice -n 40 xz -9 < /tmp/myzfs/SanDisk_16GB > SanDisk_16GB-2.xz &&
> ls -ons SanDisk*xz
28096 -rw------- 1 0 28737228 Sep 25 13:19 SanDisk_16GB-2.xz
28100 -rw------- 1 0 28739284 Feb 23 2012 SanDisk_16GB.xz
// Yes, … barely, anyway.
// Good test/demonstration of ZFS deduplication
// and some additional tools and technology
// Can we get better compression on these with bzip2?
# (for tmp in SanDisk_16GB*.xz; do b=$(basename "$tmp" .xz); {
> xz -d < "$tmp" | bzip2 > "$b".bz2 && touch -m -r "$tmp" "$b".bz2
> }; done); ls -ons San*.xz San*.bz2
29268 -rw------- 1 0 29931918 Sep 25 13:19 SanDisk_16GB-2.bz2
28096 -rw------- 1 0 28737228 Sep 25 13:19 SanDisk_16GB-2.xz
29268 -rw------- 1 0 29933205 Feb 23 2012 SanDisk_16GB.bz2
28100 -rw------- 1 0 28739284 Feb 23 2012 SanDisk_16GB.xz
// Not for these particular files. Good exercise, but time to say
// goodbye to those
# rm San*.bz2 && touch -m -r SanDisk_16GB.xz SanDisk_16GB-2.xz &&
> mv -f SanDisk_16GB-2.xz SanDisk_16GB.xz
// And I save the slightly smaller xz file, and I set the mtime to match
// the older one - not that the contents exactly correspond to then, but
// they better correspond to when I first grabbed the image of that device
# ls -ons /tmp/myzfs/*
15635413 -rw------- 1 0 16008609792 Sep 25 12:49 /tmp/myzfs/SanDisk_16GB
# ls -onsh /tmp/myzfs/*
15G -rw------- 1 0 15G Sep 25 12:49 /tmp/myzfs/SanDisk_16GB
// On the ZFS filesystem, it’s a 15G image, with logically all those
// blocks allocated – as we did write them out from within that filesystem.
# zpool list && zfs list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
pool 17.4G 50.0M 17.3G 0% 319.72x ONLINE -
NAME USED AVAIL REFER MOUNTPOINT
pool 14.9G 16.8G 21K /pool
pool/myzfs 14.9G 16.8G 14.9G /tmp/myzfs
# ls -ons /tmp/zdata
152284 -rw-rw---T 1 0 18790481920 Sep 25 13:40 /tmp/zdata
# ls -onsh /tmp/zdata
149M -rw-rw---T 1 0 18G Sep 25 13:40 /tmp/zdata
// But, deduplication – it hardly uses any physical space at all – about
// 149 M, and zpool tells us it achieved 319.72x deduplication ratio
// Anyway, done with our exercise … wrap it up …
// Note also, I never altered /etc/fstab – all this stuff just a
// temporary exercise, thus no need, and I wasn’t worried about losing
// any of that temporary stuff, as it could all be easily recreated.
# zfs destroy pool/myzfs && rmdir /tmp/myzfs && zpool destroy pool
# zpool list
no pools available
# df -h /tmp
Filesystem Size Used Avail Use% Mounted on
tmpfs 18G 150M 18G 1% /tmp
// let’s get /tmp back as it was –
// what size do I normally have it mounted as?
# fgrep /tmp /etc/fstab | fgrep size= | sed -e 's/[ ]\{1,\}/ /g'
tmpfs /tmp tmpfs rw,nosuid,nodev,size=536870912 0 0
// and down to 512 MiB /tmp again
# mount -o remount,size=536870912 /tmp && df -h /tmp
Filesystem Size Used Avail Use% Mounted on
tmpfs 512M 150M 363M 30% /tmp
# swapon -s && free
Filename Type Size Used Priority
/dev/mapper/tigger-swap1 partition 1048572 246788 -1
/dev/mapper/tigger-swap2 partition 1048572 0 -2
/dev/mapper/tigger-swap3 partition 1048572 0 -3
/dev/mapper/tigger-swap4 partition 1048572 4 -4
/dev/mapper/tigger-swap5 partition 1048572 4 -5
/dev/mapper/tigger-swap6 partition 1048572 0 -6
/dev/mapper/tigger-swap7 partition 1048572 0 -7
/dev/mapper/tigger-swap8 partition 1048572 0 -8
/dev/mapper/tigger-swap--tmp partition 15847420 0 -9
total used free shared buffers cached
Mem: 8179656 5285796 2893860 0 176712 1613580
-/+ buffers/cache: 3495504 4684152
Swap: 24235996 246796 23989200
// remove our temporarily added swap
# swapoff /dev/tigger/swap-tmp
// And let's clobber any label bits on there, so it doesn't possibly get
// misidentified later if something else reuses that storage –
// especially if it starts from that same point.
# dd if=/dev/zero of=/dev/tigger/swap-tmp bs=$(expr 1024 '*' 1024)
> count=8
8+0 records in
8+0 records out
8388608 bytes (8.4 MB) copied, 0.131853 s, 63.6 MB/s
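// (With a reasonably current util-linux, wipefs -a /dev/tigger/swap-tmp
// would be a more targeted way to clear just the signature blocks; the
// dd above is the lowest-common-denominator approach.)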
// and let's get rid of the LV, returning that space to the VG
# lvchange -a n /dev/tigger/swap-tmp && lvremove /dev/tigger/swap-tmp
Logical volume "swap-tmp" successfully removed
// And we get rid of the file that held all our ZFS data, including the
// ZFS pool itself.
# rm /tmp/zdata
#
// I'll also note that there are other filesystem deduplication
// technologies available for Linux and/or in development. I picked ZFS
// in this particular case because I happened to already have it
// installed, and I hadn't yet tried out deduplication on ZFS, so I was
// rather curious to give it a go.
