
Fast manipulation of tar files using 7zip

tar reads every byte of an archive in sequence (it never calls seek()), so it is very slow at extracting a single file from a large archive.

One solution is to use 7z instead, which is found in the p7zip-full Debian package. For some operations, 7z is three orders of magnitude faster than tar. Here are some timings on a 4.5GB archive that illustrate how much faster 7z is:

#4.5GB archive listing using tar
time tar tvf fifteenthcensus00reel2149_jp2.tar 
 
(output suppressed, timing stable whether file cache is warmed or not)
 
real	2m1.876s
user	0m0.444s
sys	0m7.740s
 
#4.5GB archive listing using 7z with cold file cache
time 7z l fifteenthcensus00reel2149_jp2.tar 
 
(output suppressed)
 
real	0m10.419s
user	0m0.080s
sys	0m0.124s
 
#4.5GB archive listing using 7z with hot file cache
time 7z l fifteenthcensus00reel2149_jp2.tar 
 
(output suppressed)
 
real	0m0.145s
user	0m0.052s
sys	0m0.040s
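
The listing timings above are consistent with a reader that jumps from header to header instead of reading the file data. In the tar format each member is a 512-byte header followed by its data, padded to a multiple of 512 bytes; the member name sits in the first 100 bytes of the header, and the size is a 12-byte octal field at offset 124. The sketch below walks an archive that way using dd. It is only an illustration of the idea, not 7z's actual code: the function name list_tar_by_seeking is made up for this example, and it assumes an uncompressed archive in a regular file and ignores the ustar prefix field, GNU long-name entries, pax extended headers and sizes of 8GB or more.

```shell
#!/usr/bin/env bash
# Sketch: list tar members by reading only the 512-byte headers and
# seeking past the file data. list_tar_by_seeking is a hypothetical name.
list_tar_by_seeking() {
    local archive=$1 offset=0 name size total
    total=$(( $(wc -c < "$archive") ))
    while [ "$offset" -lt "$total" ]; do
        # Member name: first 100 bytes of the header, NUL-padded.
        name=$(dd if="$archive" bs=1 skip="$offset" count=100 2>/dev/null | tr -d '\0')
        # An all-zero block marks the end of the archive.
        [ -z "$name" ] && break
        # Member size: 12 bytes at header offset 124, octal digits
        # terminated by a NUL or a space.
        size=$(dd if="$archive" bs=1 skip=$((offset + 124)) count=12 2>/dev/null | tr -d '\0 ')
        size=$((8#${size:-0}))
        printf '%s\t%d\n' "$name" "$size"
        # Skip the header plus the data rounded up to a 512-byte boundary.
        offset=$((offset + 512 + (size + 511) / 512 * 512))
    done
}

# Usage (archive name taken from the timings above):
#   list_tar_by_seeking fifteenthcensus00reel2149_jp2.tar
```

Each iteration reads 112 bytes no matter how large the member is, so the work grows with the number of files rather than with the size of the archive.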
 
 
#extraction of last file in a 4.5GB archive using tar
time tar xvf fifteenthcensus00reel2149_jp2.tar fifteenthcensus00reel2149_jp2/fifteenthcensus00reel2149_0185.jp2
 
real	2m3.545s
user	0m0.436s
sys	0m7.824s
 
#extraction of last file in a 4.5GB archive using 7z and a hot file cache
time 7z e fifteenthcensus00reel2149_jp2.tar fifteenthcensus00reel2149_jp2/fifteenthcensus00reel2149_0185.jp2
 
real	0m0.104s
user	0m0.036s
sys	0m0.036s
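
Note that 7z e, as used above, drops the member's directory path and writes the file into the current directory, while 7z x keeps the full path, which is closer to what tar xvf does. A small wrapper along the following lines could prefer 7z and fall back to tar on machines where p7zip-full is not installed. The function name extract_member and its argument order are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract one member of a tar archive into a directory,
# preferring 7z and falling back to tar when 7z is not installed.
extract_member() {
    local archive=$1 member=$2 dest=${3:-.}
    mkdir -p "$dest"
    if command -v 7z >/dev/null 2>&1; then
        # "x" keeps the member's directory path ("e" would flatten it).
        # -o sets the output directory (no space after -o); -y answers
        # any overwrite prompt with yes.
        7z x -y -o"$dest" "$archive" "$member" >/dev/null
    else
        # Fallback: tar still works, it just reads through the archive.
        tar -xf "$archive" -C "$dest" "$member"
    fi
}

# Usage (names taken from the timings above):
#   extract_member fifteenthcensus00reel2149_jp2.tar \
#       fifteenthcensus00reel2149_jp2/fifteenthcensus00reel2149_0185.jp2 out
```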

One Response to “Fast manipulation of tar files using 7zip”

  1. shag
    February 8th, 2010 | 1:51 pm

    maybe you can file a bug with john gilmore the next time you see him :-)
