Renaming Images to Their EXIF Timestamps

EXIF is a metadata specification largely used by modern versions of JPEG and TIFF. It allows you to embed a wealth of descriptive information into a picture taken by your camera, and, notably, this usually includes one or more timestamps. If you find yourself with a directory of anonymously-named images that you’d like to rename with their timestamps, I’ve written a quick Python script to automate the task. The script assumes that you have the exif tool installed; it’s readily available for both Linux and Mac OS.
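
To see the kind of data the script will be parsing, you can run the exif tool in machine-readable mode against one of your images. The output below is illustrative (the exact tags will vary by camera); the script treats it as tab-delimited Tag/Value lines:

$ exif -m IMG_3871.jpg
Manufacturer	Canon
Model	Canon EOS 60D
Date and Time	2013:11:27 14:38:32
Orientation	Top-left
...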

Code

import os
import subprocess
import glob
import datetime

_PICTURE_PATH = '/my/pictures/are/here'

_EXIF_COMMAND = 'exif'
_FILESPEC = '*.jpg'
_NEW_FILENAME_TEMPLATE = '{timestamp_phrase}.jpg'
_COMMON_EXIF_TIMESTAMP_FIELD_NAME = 'Date and Time'
_EXIF_TIMESTAMP_FORMAT = '%Y:%m:%d %H:%M:%S'
_OUTPUT_TIMESTAMP_FORMAT = '%Y%m%d-%H%M%S'

def _get_exif_info(filepath):
    cmd = [_EXIF_COMMAND, '-m', filepath]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    exif_tab_delimited = p.stdout.read()
    r = p.wait()
    if r != 0:
        raise ValueError("EXIF command failed: %s" % (cmd,))

    # Skip the first line, then split each remaining "Tag<TAB>Value" line
    # into a pair.
    lines = exif_tab_delimited.strip().split('\n')[1:]
    pairs = [l.split('\t', 1) for l in lines]
    return dict(pairs)

def _get_filepaths(path):
    full_pattern = os.path.join(path, _FILESPEC)
    for filepath in glob.glob(full_pattern):
        yield filepath

def _main():
    for original_filepath in _get_filepaths(_PICTURE_PATH):
        exif = _get_exif_info(original_filepath)

        try:
            exif_timestamp_phrase = exif[_COMMON_EXIF_TIMESTAMP_FIELD_NAME]
        except KeyError:
            print("ERROR: {0}: Missing timestamp field".format(original_filepath))
            print('')

            continue

        timestamp_dt = datetime.datetime.strptime(
                        exif_timestamp_phrase,
                        _EXIF_TIMESTAMP_FORMAT)

        output_timestamp_phrase = timestamp_dt.strftime(_OUTPUT_TIMESTAMP_FORMAT)

        new_filename = _NEW_FILENAME_TEMPLATE.format(
                        timestamp_phrase=output_timestamp_phrase)

        new_filepath = os.path.join(_PICTURE_PATH, new_filename)

        print("{0} => {1}".format(original_filepath, new_filepath))
        os.rename(original_filepath, new_filepath)

if __name__ == '__main__':
    _main()

Usage

  1. Save the script to a file.
  2. Update the value for _PICTURE_PATH to the path of your pictures.
  3. Optionally, update _FILESPEC to the correct pattern/casing of your files.
  4. Optionally, update _COMMON_EXIF_TIMESTAMP_FIELD_NAME to the correct EXIF field-name if the device that created the pictures used a different one (you can use the exif tool directly to explore your images).
  5. If the timestamp is not formatted using the standard colon-delimited EXIF timestamp (e.g. 2012:04:29 20:51:32), update _EXIF_TIMESTAMP_FORMAT to reflect the proper format (a quick way to test a format string follows this list).
  6. Run using whatever name you gave the script:
    $ python rename_images.py
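
If your device writes a nonstandard timestamp, you can sanity-check a strptime pattern from the shell before running the script. The value below is the standard EXIF example from step 5; substitute your own timestamp and pattern:

$ python -c "import datetime; print(datetime.datetime.strptime('2012:04:29 20:51:32', '%Y:%m:%d %H:%M:%S'))"
2012-04-29 20:51:32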
    

Output

Success output will look something like:

./IMG_3871.jpg => ./20131127-143832.jpg
./IMG_3872.jpg => ./20131127-143836.jpg
./IMG_3879.jpg => ./20131127-144045.jpg
./IMG_3880.jpg => ./20131127-144105.jpg
./IMG_3927.jpg => ./20131128-172021.jpg
...

Uploading Massive Backups to Amazon Glacier via boto

This is an example of how to use the boto library in Python to perform large, multipart, concurrent uploads to Amazon Glacier.

Notes

  1. The current version of the library (2.38.0) is broken for multipart uploads under Python 2.7.
  2. The version that we’re using for multipart uploads (2.29.1) is broken for Python 3, as are all of the adjacent versions.
  3. Because of (1) and (2), we’re using version 2.29.1 under Python 2.7 and suggest that you do the same (see the pin command after this list).
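
To pin the known-good version explicitly (use your environment’s pip for Python 2.7):

$ sudo pip install boto==2.29.1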

Example

#!/usr/bin/env python2.7

import os.path

import boto.glacier.layer2

def upload(access_key, secret_key, vault_name, filepath, description):
    # Layer2 is boto's high-level Glacier interface.
    l = boto.glacier.layer2.Layer2(
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key)

    v = l.get_vault(vault_name)

    # Splits the file into parts and uploads them concurrently.
    archive_id = v.concurrent_create_archive_from_file(
            filepath,
            description)

    # Keep this ID; you'll need it to retrieve the archive later.
    print(archive_id)

if __name__ == '__main__':
    access_key = 'XXX'
    secret_key = 'YYY'
    vault_name = 'images'
    filepath = '/mnt/array/backups/big_archive.xz'
    description = os.path.basename(filepath)

    upload(access_key, secret_key, vault_name, filepath, description)
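
Assuming you saved the example as glacier_upload.py (the name is arbitrary) and filled in your credentials, run it under the Python 2.7 interpreter:

$ python2.7 glacier_upload.py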

Amazon Glacier for Massive Long-Term Backups, for Cheap

Amazon Glacier is a backup service that trades convenience for low cost. It’s built around the concept of archive-files: you can upload a single archive to a vault immediately, request the download of an archive from a vault and wait about four hours for the request to be fulfilled, or request an inventory of what you currently have stored in a particular vault (which also takes about four hours to fulfill). You can provide an Amazon Simple Notification Service (SNS) topic in order to get an email or another type of notification when your inventory or download is ready.
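
For illustration, here is a minimal sketch of the inventory flow using boto’s layer2 interface (the same library used in the previous post); the credentials, vault name, and SNS topic ARN are placeholders:

import boto.glacier.layer2

l = boto.glacier.layer2.Layer2(
        aws_access_key_id='XXX',
        aws_secret_access_key='YYY')

v = l.get_vault('images')

# Ask Glacier to build an inventory. It will post to the SNS topic
# (placeholder ARN) when the job completes, hours later.
job_id = v.retrieve_inventory(
            sns_topic='arn:aws:sns:us-east-1:000000000000:glacier-jobs')

# Once notified (or after polling), fetch the result.
job = v.get_job(job_id)
if job.completed:
    inventory = job.get_output()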

Notes on File Organization

Before proceeding, it’s worth mentioning the advantages of uploading several large archives rather than many small ones (a command for building such archives follows this list):

  • The fast, multipart uploads only work with files that are at least 1M large (because the parts have to be at least 1M).
  • You’ll have to submit a download-request for each file that you want to download. You’ll likely not want to submit a request for each file that was on your hard-drive or storage-array.
  • You will generally want to keep track of your archive-IDs. Even if you’d prefer to just retrieve a list of what you have backed-up a few hours before you’re ready to request the downloads, it requires considerably more labor to manage an inventory that has hundreds of thousands of entries in it.
  • It’s not very practical to expect access to every individual document/image from your hard-drive or array when it takes four hours to gain access to each. This is a long-term backup strategy, not an external hard-drive.
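
For example, a year’s worth of images can be rolled into a single xz-compressed tarball before upload (the paths are illustrative):

$ tar -cJf images-main-2010.tar.xz /mnt/array/images/2010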

The tool

We’re using a tool called glacier_tool to perform the upload. It performs fast, multipart, concurrent uploads, and was written because the author had trouble finding anything that already existed and was still maintained. Most of the tools that were found were either UI-based or didn’t seem to support multipart uploads.

Installing

The backend Amazon library currently has issues. One of them is buggy Python 3 support. So, make sure you have Python 2.7 and install via PyPI:

$ sudo pip install glacier_tool

Usage

$ export AWS_ACCESS_KEY=XXX
$ export AWS_SECRET_KEY=YYY

$ gt_upload_large -em 11.33 image-backups /mnt/tower/backups/images-main-2010-20150617-2211.tar.xz images-main-2010-20150617-2211.tar.xz
Uploading: [/mnt/array/backups/images-main-2010-20150617-2211.tar.xz]
Size: (15.78) G
Start time: [2015-07-05 01:22:01]
Estimated duration: (3.17) hours => [2015-07-05 04:32:11] @ (11.33) Mbps
Archive ID: [IEGZ8uXToCDIgO3pMrrIHBIcJs...YyNlPigEwIR2NA]
Duration: (3.16) hours @ (11.37) Mbps

$ gt_upload_large -em 11.37 image-backups /mnt/tower/backups/images-main-2011-20150617-2211.tar.xz images-main-2011-20150617-2211.tar.xz
Uploading: [/mnt/array/backups/images-main-2011-20150617-2211.tar.xz]
Size: (26.66) G
Start time: [2015-07-05 10:07:58]
Estimated duration: (5.33) hours => [2015-07-05 15:28:03] @ (11.37) Mbps

Note that the output of one upload prints the approximate rate at which the upload was performed. This can be fed into subsequent commands (via the -em argument) to estimate how long they will take to complete.
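
As a sanity-check on the second estimate above (assuming 1024 MB per GB): 26.66 G × 1024 × 8 ≈ 218,400 Mb, and 218,400 Mb ÷ 11.37 Mbps ≈ 19,200 seconds ≈ 5.33 hours, which matches the printed estimate.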

Notes

  • This tool is only useful for massive, multipart, concurrent uploads. A tool for massive, multipart, concurrent downloads will be provided in the near future. If you’re struggling to find any solution for this, post an issue at the project’s GitHub website and it’ll help move things along.