Mostly unknown OpenSSH tricks

I’m not really keen on writing “tricks” posts about software, especially software as widely known as OpenSSH, but this stuff is quite intriguing, to me at least, and I suspect most people don’t know about it.

If you’ve used cvs, svn or other SSH-backed services in the recent past, you probably know that one of the most time-wasting parts of working with these systems is establishing the ssh session itself. Thankfully, quite a while ago OpenSSH introduced the “master” connection feature: the first ssh connection you open to a host (with a given username) creates a Unix domain socket, and the following sessions go through that instead of opening new connections. This was a real godsend when committing a huge number of packages more or less at the same time, back when I was still on the KDE team.

Unfortunately, this still required the first connection to be a persistent one; to work around that, I used to start a screen session that opened a few ssh connections after asking me for the key passphrase. This wasn’t particularly nice on the system, or for security, among other things. But fear not, OpenSSH 5.6 brought magic into the mix!
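For the record, the usual way to get that persistent first connection without dedicating a terminal to it is something along these lines (master mode, no remote command, dropping to the background right after authentication; the hostname is just an example) – my screen setup was a more roundabout version of the same idea:

ssh -M -N -f somehost.gentoo.org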

The new ControlPersist option allows the master connection to be created in a detached ssh process the first time you connect to a box, so there is no longer any need to set up processes to be kept around in advance. Basically, all you have to do to make use of master connections now is something like this in your ~/.ssh/config:

Host *.gentoo.org
ControlMaster auto
ControlPersist yes

and you’re set: ssh takes care of creating the master connection if it isn’t present, and of recreating it if it’s dropped for whatever reason; you can also force the connection to be closed by using ssh $host -O exit. Lovely!
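One caveat, as far as I can tell: on some versions and setups ControlPath defaults to none, in which case ControlMaster auto silently falls back to plain, unshared connections. If multiplexing doesn’t seem to kick in, adding an explicit socket path should do it; the path below is just an example:

Host *.gentoo.org
ControlMaster auto
ControlPersist yes
ControlPath ~/.ssh/master-%r@%h:%p

You can also check whether a master is alive for a given host with ssh somehost.gentoo.org -O check.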

There is one more trick that I wish to share, although this time around I’m not sure which OpenSSH version introduced it. From time to time you may need to connect to a box that sits behind a not-too-hostile firewall, which you also have access to. This is the case at one of my customers, where only the main router has a (dynamic) IPv6 address and I have to go through it to reach the other boxes. The usual trick in such a situation is to use the ProxyCommand option, running ssh and nc (or in my case nc6) to get a raw connection to the other side. I was always a bit bothered by having to do it this way, to be honest. Once again, recent versions of OpenSSH solve the problem, this time with the -W option:

Host router.mycustomer.example.com
ProxyCommand none

Host *.mycustomer.example.com
ProxyCommand ssh router.mycustomer.example.com -W %h:%p

With this method, it is the ssh on my customer’s router that takes care of connecting to the box further down the network and hooking it up to the input/output streams, without the need to spawn an nc process. While this doesn’t look like a biggie, it’s still one fewer package I have to keep installed on the system, among other things.
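For comparison, the nc-based version of the same setup would look more or less like this (a sketch; it assumes nc is available on the router):

Host *.mycustomer.example.com
ProxyCommand ssh router.mycustomer.example.com nc %h %p

With either variant in place, a plain ssh box.mycustomer.example.com (box standing in for whatever internal host you need), or an scp or rsync towards it, goes through the router transparently.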

A smarter netcat?

Today I restored data from the hard disk of a friend of mine’s previous laptop, which burned out (almost literally) a few months ago. To do so, my usual first step is to back up the whole content of the drive itself, so that even if I screw up with the data there is no loss on the original drive. Having more than 2TB addressable on this workstation certainly helps.

Now, I’m really not sure whether the copy operation was network-bound or disk-bound, and if it was the disks, whether reading or writing was the limit; but it certainly wasn’t blazing fast: around an hour to copy 120GB. Sure, that’s an average of 2GB/min (roughly 34MB/s), so it’s not really slow either, but I still see room for improvement.

The host reading the data (Enterprise, living again) was booted from SysRescueCD, so its CPU was doing nothing but the copy; CPU power was certainly not the problem there. Yamato was building, but it obviously wasn’t CPU-bound either. The source was a 2.5” SATA drive, while the destination was an ext4 partition on LVM, on one of my SATA-II disks. The medium was the classic nc over a Gigabit Ethernet point-to-point connection (I borrowed the link I normally use for the laptop’s iSCSI to back it up with Time Machine).
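The transfer itself was nothing fancier than the usual dd-over-netcat pipeline; something along these lines, with hostname, port and device names being stand-ins for the example (and depending on the netcat flavour the listening syntax might be nc -l 9000 instead):

On the receiving side (Yamato): nc -l -p 9000 > laptop-disk.img
On the reading side (Enterprise): dd if=/dev/sda bs=1M | nc yamato 9000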

Now, I admit this is not something I do that often, but it is something I do from time to time nonetheless, so I think spending a couple of hours thinking of a solution could help, especially since I can see similar situations happening in the future when I’ll be much tighter on time. The one thing that seemed obvious to me was the lack of a compression layer: since I wasn’t using a tunnel of any sort, netcat was sending the data through the TCP connection just as it was read from the disk, without any kind of compression.

While data deduplication, as was suggested to me on a previous related post, would be cool, it might be a bit of overkill in this situation, especially considering that Partimage’s support for NTFS is still experimental (and, as you might guess, I tend to have this kind of need with non-Linux systems, mostly Windows and much more rarely Mac OS X). But even just being able to compress the vast areas of zeros on the disk would have been a major gain.

I guess that even just hooking up a zlib filter would have reduced the amount of traffic, but I wonder why there isn’t a simple option to handle the compression and decompression transparently, rather than via a pipe filter. While filters are cool when you have to do advanced mojo on your inputs/outputs, tar has shown how much easier it is to just have an option for it. I guess I’ll have to investigate alternatives to netcat, although I’d be happy with an alternative that simply let me use a gzip filter on the stream while keeping a standard, non-smart nc command on the other end.
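The filter idea really just amounts to wrapping the same pipeline with gzip on the sending side and gunzip on the receiving one, so that the mostly-zero regions compress down to almost nothing while the receiving nc stays completely standard; a sketch, with the same stand-in names as above:

On the receiving side: nc -l -p 9000 | gunzip > laptop-disk.img
On the sending side: dd if=/dev/sda bs=1M | gzip -1 | nc yamato 9000

Using gzip -1 keeps the compression cheap enough that the CPU is unlikely to become the new bottleneck.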

I guess I’ll try my filter idea the next time I use the network to transfer a hard disk image; for now, the next image I have to take care of is my mother’s iBook, which supports IEEE1394 (FireWire) target disk mode, which is so nice to have.