Split a Media File by a List of Time Offsets

We’ll use SplitMedia to split a single audio file containing the whole Quake soundtrack into one file per track.

The list file:

0:00:00 Quake Theme
0:05:08 Aftermath
0:07:34 The Hall of Souls
...
1:08:21 Scourge of Armagon 4
1:11:34 Scourge of Armagon 5
...
1:39:57 Dissolution of Eternity 6 
1:43:01 Dissolution of Eternity 7
1:46:07 Dissolution of Eternity 8

The command:

$ splitmedia Quake\ Soundtrack.m4a list_file.quake quake_output
OFF 000:00:00.000 DUR 000308.000 01_QuakeTheme.m4a
OFF 000:05:08.000 DUR 000146.000 02_Aftermath.m4a
OFF 000:07:34.000 DUR 000500.000 03_TheHallofSouls.m4a
...
OFF 001:08:21.000 DUR 000193.000 14_ScourgeofArmagon4.m4a
OFF 001:11:34.000 DUR 000193.000 15_ScourgeofArmagon5.m4a
...
OFF 001:39:57.000 DUR 000184.000 24_DissolutionofEternity6.m4a
OFF 001:43:01.000 DUR 000186.000 25_DissolutionofEternity7.m4a
OFF 001:46:07.000 DUR 000000.000 26_DissolutionofEternity8.m4a
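Judging by the output above, each duration is just the gap between one offset and the next, and the last track shows a duration of zero (presumably meaning "run to the end of the file"). Here is a minimal sketch of that bookkeeping. It is not SplitMedia's actual code: the function names are my own, and the file-naming rule (two-digit index plus the title with spaces removed) is inferred from the output.

```python
def parse_offset(text):
    """Convert an 'H:MM:SS' offset into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def plan_segments(lines, extension=".m4a"):
    """Return a list of (offset_seconds, duration_seconds, filename).

    Hypothetical helper: the duration of the last entry is left at 0
    because the list file does not say where the final track ends.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The offset is the first token; the rest of the line is the title.
        offset_text, title = line.split(" ", 1)
        entries.append((parse_offset(offset_text), title))

    segments = []
    for index, (offset, title) in enumerate(entries):
        if index + 1 < len(entries):
            duration = entries[index + 1][0] - offset
        else:
            duration = 0
        filename = "{0:02d}_{1}{2}".format(
            index + 1, title.replace(" ", ""), extension)
        segments.append((offset, duration, filename))
    return segments
```

Feeding it the first two lines of the list file gives an offset of 0 and a duration of 308 seconds for 01_QuakeTheme.m4a, which matches the first line of the real output.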

Calculating a Hash for a Path (Recursively)

PathFingerprint recursively generates hashes for a directory structure. While doing so, it builds a catalog in a separate directory to serve as a cache, so subsequent runs over large directories finish much more quickly. You can also do simple lookups against an existing catalog and print a report of what has changed since the last run.

Build a test directory:

$ mkdir -p scan_path/subdir1
$ mkdir -p scan_path/subdir2
$ touch scan_path/subdir1/aa
$ touch scan_path/subdir1/bb

Calculate the hash (with reporting enabled):

$ pfhash -s scan_path -c catalog_path -R - 
create file subdir1/aa
create file subdir1/bb
create path subdir1
create path subdir2
create path .
0df9bc5a7657b7d481c219656441f10d21fd5668

Run again with a couple of changes (with reporting enabled):

$ touch scan_path/subdir1/aa
$ touch scan_path/subdir2/new_file

$ pfhash -s scan_path -c catalog_path -R - 
update file subdir1/aa
create file subdir2/new_file
update path subdir2
update path .
e700843c1b5c2f40a68098e1df96ef08b6081fe8

Look up the hash using the lookup tool:

$ pflookup -c catalog_path
e700843c1b5c2f40a68098e1df96ef08b6081fe8

$ pflookup -c catalog_path -r subdir1
426a98d313a0a740b8445daa5102b3ed6dd7f4ed

$ pflookup -c catalog_path -r subdir1/aa
da39a3ee5e6b4b0d3255bfef95601890afd80709
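The hash reported for subdir1/aa is the SHA-1 of empty input, which is what you would expect for a file created with touch. That suggests file hashes are plain SHA-1 digests of the content. The sketch below shows the general idea of a recursive fingerprint under that assumption. How PathFingerprint actually combines child hashes into a directory hash is not shown here, so the directory scheme (SHA-1 over sorted "name:hash" lines) is my own invention and will not reproduce the directory values above. It also skips the catalog/caching part entirely.

```python
import hashlib
import os


def hash_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file's content."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_path(path):
    """Return a fingerprint for a file or, recursively, a directory.

    Assumed scheme (not PathFingerprint's): a directory's hash is the
    SHA-1 of its children's names and hashes, visited in sorted order.
    """
    if os.path.isfile(path):
        return hash_file(path)

    digest = hashlib.sha1()
    for name in sorted(os.listdir(path)):
        child_hash = hash_path(os.path.join(path, name))
        digest.update("{0}:{1}\n".format(name, child_hash).encode("utf-8"))
    return digest.hexdigest()
```

Because every directory hash depends on the hashes beneath it, a change to any file bubbles up to the root, which is the same behavior the "update path ." lines in the report show.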

The Numb-Nuts Tutorial to the Celery Distributed Task Queue (using Python)

Celery is a distributed task queue that is very easy to pick up. I’ll do two quick examples: one that sends a job and returns, and another that sends a job and then retrieves a result. I’m going to use SQLite for this example (interfaced via SQLAlchemy). Since Celery seems to have some issues importing SQLite under Python 3, we’ll use Python 2.7. Make sure that you install the “celery” and “sqlalchemy” Python packages.

Without Results

Define the tasks module and save it as sqlite_queue_without_results.py:

import celery

_BACKEND_URI = 'sqla+sqlite:///tasks.sqlite'
_APP = celery.Celery('sqlite_queue_without_results', broker=_BACKEND_URI)

@_APP.task
def some_task(incoming_message):
    return "ECHO: {0}".format(incoming_message)

Start the server:

$ celery -A sqlite_queue_without_results worker --loglevel=info

Execute the following Python to submit a job:

import sqlite_queue_without_results

_ARG1 = "Test message"
sqlite_queue_without_results.some_task.delay(_ARG1)

That’s it. You’ll see something similar to the following in the server window:

[Screenshot: Celery (Without Result)]

With Results

This time, when we define the tasks module, we’ll provide Celery a results backend. Call this module sqlite_queue_with_results.py:

import celery

_RESULT_URI = 'db+sqlite:///results.sqlite'
_BACKEND_URI = 'sqla+sqlite:///tasks.sqlite'

_APP = celery.Celery('sqlite_queue_with_results', broker=_BACKEND_URI, backend=_RESULT_URI)

@_APP.task
def some_task(incoming_message):
    return "ECHO: {0}".format(incoming_message)

Start the server:

$ celery -A sqlite_queue_with_results worker --loglevel=info

Execute the following Python to submit a job:

import sqlite_queue_with_results

_ARG1 = "Test message"
r = sqlite_queue_with_results.some_task.delay(_ARG1)
value = r.get(timeout=2)

print("Result: [{0}]".format(value))

Since we’re using a traditional DBMS (albeit a fast, local one) to store our results, the client internally polls for a state change and then fetches the result. That makes retrieval a more costly operation, so I’ve used a two-second timeout to accommodate it.
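To make the "poll, then fetch" idea concrete, here is a small standalone sketch of a polling loop with a timeout. It only illustrates the pattern and is not Celery's internal implementation. The helper name, its parameters, and the choice of RuntimeError are all my own.

```python
import time


def poll_for_result(is_ready, fetch, timeout=2.0, interval=0.1):
    """Call is_ready() until it returns True, then return fetch().

    Hypothetical helper: raises RuntimeError if the result is still
    not ready once `timeout` seconds have passed.
    """
    deadline = time.time() + timeout
    while True:
        if is_ready():
            return fetch()
        if time.time() >= deadline:
            raise RuntimeError(
                "no result after {0} seconds".format(timeout))
        # Sleep between checks so we do not hammer the results store.
        time.sleep(interval)
```

With the result object from the example above, poll_for_result(r.ready, r.get) would be a rough hand-rolled equivalent of r.get(timeout=2). In practice you would just let get() do the waiting.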

The server output will be similar to the following:

[Screenshot: Celery (With Result)]

The client output will look like:

Result: [ECHO: Test message]

Celery has many more features not explored by this tutorial, including:

  • exception propagation
  • custom task states (including providing metadata that can be read by the client)
  • task ignore/reject/retry responses
  • HTTP-based tasks (for calling your tasks in another place or language)
  • task routing
  • periodic/scheduled tasks
  • workflows
  • drawing visible graphs in order to inspect behavior

For more information, see the Celery user guide.