Working in bioinformatics, we often find tools that cannot read compressed sequence files directly. When the tool takes only a single sequence file, we can usually use a standard shell pipe, for example:
zcat seq.fa.gz | wc
However, when the tool takes multiple files, we need another approach. Imagine we have a tool, "seqs", which takes two file parameters, "-file1" and "-file2", and we want to give it the compressed files without decompressing them to the filesystem. We could do this with named pipes created by mkfifo:
mkfifo pipe1 pipe2
zcat seq1.fa.gz > pipe1 &
zcat seq2.fa.gz > pipe2 &
seqs -file1 pipe1 -file2 pipe2
rm pipe1 pipe2
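There is a much nicer way to do this using process substitution. The shell runs each command inside <(...) and replaces the expression with the name of a pipe connected to that command's output, so the tool simply sees two file names: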
seqs -file1 <(zcat seq1.fa.gz) -file2 <(zcat seq2.fa.gz)
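To try this without the "seqs" tool, here is a minimal sketch that uses paste as a stand-in for any program expecting two file arguments. The sample FASTA records, the temporary directory and the variable names are all made up for illustration, and gzip -dc is used in place of zcat only because zcat behaves differently on some systems.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo data: two tiny gzip-compressed FASTA files in a temp directory.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '>a\nACGT\n' | gzip > "$tmpdir/seq1.fa.gz"
printf '>b\nACGA\n' | gzip > "$tmpdir/seq2.fa.gz"

# paste stands in for "seqs": it expects two file names.
# Each <(...) is replaced by a path to a pipe carrying the decompressed data,
# so nothing is written to disk in decompressed form.
merged=$(paste <(gzip -dc "$tmpdir/seq1.fa.gz") <(gzip -dc "$tmpdir/seq2.fa.gz"))

# Show the two files side by side, one tab-separated line per input line.
printf '%s\n' "$merged"
```

Because the substituted path is a pipe rather than a regular file, this only works with tools that read their input once from start to end; a tool that seeks within the file or reads it twice still needs a real decompressed copy.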
It is also possible to use this for a program's output, with >(...) instead of <(...). This is often useful in conjunction with the tee tool to send one program's output to several different programs. Here is an example from the tee manual that downloads a file and computes its SHA-1 and MD5 hashes at the same time:
wget -O - http://example.com/dvd.iso \
| tee >(sha1sum > dvd.sha1) \
>(md5sum > dvd.md5) \
> dvd.iso
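The same pattern can be tried offline. The sketch below swaps the download for a small local file, so the file names, the temporary directory and the wait_for_output helper are hypothetical, and it assumes sha1sum and md5sum from GNU coreutils are installed. It also handles one caveat: the commands inside >(...) run asynchronously and may still be writing when tee returns, so the script polls for their output before reading it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the download: a small local file in a temp directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'pretend this is a DVD image\n' > "$workdir/source.bin"

# Illustrative helper: wait (up to about 10 seconds) until a file is non-empty.
wait_for_output() {
  local file=$1 tries=0
  until [ -s "$file" ]; do
    tries=$((tries + 1))
    [ "$tries" -le 100 ] || { echo "timed out waiting for $file" >&2; return 1; }
    sleep 0.1
  done
}

# cat plays the role of wget here. tee copies its input to each >(...) consumer
# and to its own stdout, which is redirected to the saved copy.
cat "$workdir/source.bin" \
  | tee >(sha1sum > "$workdir/copy.sha1") \
        >(md5sum > "$workdir/copy.md5") \
  > "$workdir/copy.bin"

# The checksum commands may finish after tee does, so wait for their files.
wait_for_output "$workdir/copy.sha1"
wait_for_output "$workdir/copy.md5"

cat "$workdir/copy.sha1" "$workdir/copy.md5"
```

The input is read only once, yet three consumers see it: the saved copy and the two checksum programs.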
Process substitution is supported by bash and zsh, but not by csh-based shells.