Geometric Hashing

Like clustering (but in the backwards Bizarro World):

This principle is widely used in computer graphics, computational geometry and many other disciplines, to solve many proximity problems in the plane or in three-dimensional space, such as finding closest pairs in a set of points, similar shapes in a list of shapes, similar images in an image database, and so on. In these applications, the set of all inputs is some sort of metric space, and the hashing function can be interpreted as a partition of that space into a grid of cells. The table is often an array with two or more indices (called a grid file, grid index, bucket grid, and similar names), and the hash function returns an index tuple. This special case of hashing is known as geometric hashing or the grid method. Geometric hashing is also used in telecommunications (usually under the name vector quantization) to encode and compress multi-dimensional signals.

I could not find a sample/pseudocode implementation quickly.

 

 

Hash function

 

Advertisements

Ridiculously Simple Google Hash Function

A Go implementation based on the paper “A Fast, Minimal Memory, Consistent Hash Algorithm”, describing an evenly distributed hashing function called the “Jump” algorithm:

func Hash(key uint64, numBuckets int) int32 {
    var b int64 = -1
    var j int64

    for j < int64(numBuckets) {
        b = j
        key = key*2862933555777941757 + 1
        j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
    }

    return int32(b)
}

If you’re curious about that constant, it is “known to produce a good random number list” for 64-bit generators:

Calculating a Hash for a Path (Recursively)

PathFingerprint allows you to recursively generate hashes for a directory structure. While doing this, it builds a catalog in a separate directory to serve as a cache. Subsequent runs of large directories will run much quicker. You can also do simple lookups against an existing catalog and generate/print a report of what has changed since the last run.

Build a test directory:

$ mkdir -p scan_path/subdir1
$ mkdir -p scan_path/subdir2
$ touch scan_path/subdir1/aa
$ touch scan_path/subdir1/bb

Calculate the hash (with reporting enabled):

$ pfhash -s scan_path -c catalog_path -R - 
create file subdir1/aa
create file subdir1/bb
create path subdir1
create path subdir2
create path .
0df9bc5a7657b7d481c219656441f10d21fd5668

Run again with a couple of changes (with reporting enabled):

$ touch scan_path/subdir1/aa
$ touch scan_path/subdir2/new_file

$ pfhash -s scan_path -c catalog_path -R - 
update file subdir1/aa
create file subdir2/new_file
update path subdir2
update path .
e700843c1b5c2f40a68098e1df96ef08b6081fe8

Lookup the hash using the lookup tool:

$ pflookup -c catalog_path
e700843c1b5c2f40a68098e1df96ef08b6081fe8

$ pflookup -c catalog_path -r subdir1
426a98d313a0a740b8445daa5102b3ed6dd7f4ed

$ pflookup -c catalog_path -r subdir1/aa
da39a3ee5e6b4b0d3255bfef95601890afd80709