Tiny Hack: Extracting a Small File from a Large Zip File in the Cloud

Today I had a problem. I have a large (2 GB) zip file in the cloud (Dropbox) that I suspect contains a small file I need. I don’t want to download the whole 2 GB archive to my local machine, and I definitely don’t want to extract all of it to my local hard disk just to see whether the file is there.

I discovered that Dropbox actually has a nice feature: it will open largish zip files in its preview and let you extract individual files. But it’s limited to 0.5 GB, so not quite large enough for my needs.

I found that with ExpanDrive (highly recommended) I can mount my Dropbox folder without downloading any content, so at least the target zip file is exposed to my local workstation.

Now I just need an application to open the zip file locally and explore/extract its contents. On Windows, I would have just opened the file in Explorer, but since I’m on macOS, I’ve got to track down an alternative.

My first attempt was FUSE. I installed FUSE and then, through Homebrew, installed fuse-zip. From there I could try to mount the archive, but when I ran fuse-zip file.zip /Volumes/file, it blocked for a very long time, as if it had to read the whole file just to set up the mount :(.

I realized I could download some third-party zip file explorer, but searching online and in the Mac App Store, I found no obviously reputable solution. Then I remembered that Python 3.8 adds a new class, zipfile.Path, that makes it easy to traverse zip files like pathlib objects, and since my shell is already running under Python 3.8, I just started inspecting the file:

$ import zipfile
$ root = zipfile.Path('/Volumes/Dropbox/path/to/myfile.zip')
$ list(root.iterdir())
[Path('/Volumes/Dropbox/path/to/myfile.zip', 'Takeout/')]
$ [x.name for x in (root / 'Takeout').iterdir()]
['archive_browser.html',
 'Contacts',
 'Calendar',
 'Tasks',
 'Hangouts',
 'Drive',
 'Mail']
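Browsing level by level works when you roughly know where the file lives. When you don’t, a small recursive walk over zipfile.Path can search by name instead. This is only a sketch: find_members is a hypothetical helper (not part of zipfile), and the archive is built in memory with made-up, Takeout-style names so the example runs on its own. It also uses Path.exists() to check for one specific member without extracting anything.

```python
import io
import zipfile


def find_members(path, suffix):
    """Recursively yield zipfile.Path entries whose name ends with suffix."""
    for child in path.iterdir():
        if child.is_dir():
            # Directories (explicit, or implied by member names) are walked recursively.
            yield from find_members(child, suffix)
        elif child.name.endswith(suffix):
            yield child


# Build a small stand-in archive in memory (hypothetical names, not real Takeout data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Takeout/archive_browser.html", "<html></html>")
    zf.writestr("Takeout/Drive/some doc.docx", b"fake docx bytes")
    zf.writestr("Takeout/Mail/All mail.mbox", b"From nobody\n")

root = zipfile.Path(zipfile.ZipFile(buf))

# Check for a specific member without reading its contents.
print((root / "Takeout/Drive/some doc.docx").exists())  # True
print((root / "Takeout/Drive/missing.docx").exists())  # False

matches = [p.name for p in find_members(root, ".docx")]
print(matches)  # ['some doc.docx']
```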

Once I found the file I needed, I wanted to save it, so I wrote this quick function:

$ def save_text(path):
.     with open(path.name, 'wb') as strm:
.         strm.write(path.read_bytes())
.

Then I ran it:

$ save_text(root / 'Takeout/Drive/some doc.docx')
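read_bytes() loads the entire member into memory before it is written out, which is fine for a small document. For a larger member, a variant along these lines could stream it to disk in chunks with shutil.copyfileobj. This sketch assumes Python 3.9 or later, where zipfile.Path.open() accepts a binary "rb" mode (in 3.8 its plain "r" mode already returned bytes). The name save_streaming and its dest_dir parameter are hypothetical.

```python
import io
import os
import shutil
import tempfile
import zipfile


def save_streaming(path, dest_dir="."):
    """Copy a zipfile.Path member into dest_dir in chunks; return the output filename."""
    target = os.path.join(dest_dir, path.name)
    # "rb" yields a binary file object (Python 3.9+); copyfileobj reads it piecewise,
    # so the whole member never has to sit in memory at once.
    with path.open("rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Usage against a tiny in-memory archive (a stand-in for the real Dropbox file).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Takeout/Drive/some doc.docx", b"fake docx bytes")

root = zipfile.Path(zipfile.ZipFile(buf))
with tempfile.TemporaryDirectory() as tmp:
    saved = save_streaming(root / "Takeout/Drive/some doc.docx", tmp)
    with open(saved, "rb") as fh:
        print(fh.read())  # b'fake docx bytes'
```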

And then I had a copy of the file I needed, having transferred only that file’s bytes and a little metadata overhead. Hooray!
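Why so little traffic? A zip archive ends with a central directory that lists every member and its offset, so zipfile can seek to the end, read that index, and then jump straight to the one member it needs. The sketch below makes this visible by wrapping an in-memory archive in a hypothetical CountingBytesIO class that tallies the bytes read. It stands in for the mounted Dropbox file, on the assumption that the mount serves seeks and partial reads without fetching the whole archive.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """BytesIO that records how many bytes callers have read from it."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


# Build an archive with one large member and one small member (stored, uncompressed).
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("big.bin", b"\0" * 5_000_000)
    zf.writestr("small.txt", b"the file I need")

archive = CountingBytesIO(raw.getvalue())
small = zipfile.Path(zipfile.ZipFile(archive), "small.txt").read_bytes()

print(small)  # b'the file I need'
print(len(raw.getvalue()), archive.bytes_read)  # total archive size vs. bytes actually read
```

The large member is never touched: only the end-of-archive records, the central directory, and the small member’s local header and data are read.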

Written on October 3, 2019