Geonames API for RDF/Semantic Web Queries

Geonames has a sprawling set of APIs to search for all manner of geographical information.

Their Search API can also return RDF-compatible XML. RDF is often used for Semantic Web-related applications.

The geonames_rdf Python library lets you query the API from your Python code and also provides a tool that can often perform identical queries from the command-line. When you use it as a library, you can get the raw RDF document, a list of XML nodes, or a simple list of 2-tuples of keys and values. When you use it as a command-line tool, you can print the raw RDF result or a flat list of keys and values.

To install, use PyPI:

$ sudo pip install geonames_rdf

Sourcecode Example

Code fragment

import geonames.adapters.search
import geonames.compat

sa = geonames.adapters.search.Search('username')
result = sa.query('detroit').country('us').max_rows(2).execute()
for id_, name in result.get_flat_results():
    # make_unicode() is only used here for Python version-compatibility.
    print(geonames.compat.make_unicode("[{0}]: [{1}]").format(id_, name))

Output

[http://sws.geonames.org/4990729/]: [Detroit]
[http://sws.geonames.org/6955112/]: [Detroit-Warren-Livonia]

Command-line example

Simple list:

Pass the exact same parameter names and values to zero or more “-p” parameters:

$ gn_search dsoprea -p query detroit -p country us -p max_rows 2

Output:

[http://sws.geonames.org/4990729/]: [Detroit]
[http://sws.geonames.org/6955112/]: [Detroit-Warren-Livonia]

Raw RDF response:

Pass the “-x” parameter:

$ gn_search dsoprea -p query detroit -p country us -p max_rows 2 -x

Output:

<rdf:RDF xmlns:cc="http://creativecommons.org/ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:gn="http://www.geonames.org/ontology#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#">
<gn:Feature rdf:about="http://sws.geonames.org/4990729/">
<rdfs:isDefinedBy rdf:resource="http://sws.geonames.org/4990729/about.rdf"/>
<gn:name>Detroit</gn:name>
<gn:alternateName xml:lang="af">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ar">ديترويت</gn:alternateName>
<gn:alternateName xml:lang="az">Detroyt</gn:alternateName>
<gn:alternateName xml:lang="be">Горад Дэтройт</gn:alternateName>
<gn:alternateName xml:lang="bg">Детройт</gn:alternateName>
<gn:alternateName xml:lang="bn">ডেট্রয়েট</gn:alternateName>
<gn:alternateName xml:lang="bs">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ca">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ce">Детройт</gn:alternateName>
<gn:alternateName xml:lang="cs">Detroit</gn:alternateName>
<gn:alternateName xml:lang="da">Detroit</gn:alternateName>
<gn:alternateName xml:lang="de">Detroit</gn:alternateName>
<gn:alternateName xml:lang="el">Ντιτρόιτ</gn:alternateName>
<gn:alternateName xml:lang="en">Detroit</gn:alternateName>
<gn:alternateName xml:lang="eo">Detroit</gn:alternateName>
<gn:alternateName xml:lang="es">Detroit</gn:alternateName>
<gn:alternateName xml:lang="et">Detroit</gn:alternateName>
<gn:alternateName xml:lang="fa">دیترویت</gn:alternateName>
<gn:alternateName xml:lang="fi">Detroit</gn:alternateName>
<gn:alternateName xml:lang="fr">Détroit</gn:alternateName>
<gn:alternateName xml:lang="gl">Detroit</gn:alternateName>
<gn:alternateName xml:lang="he">דטרויט</gn:alternateName>
<gn:alternateName xml:lang="hi">डेट्राइट</gn:alternateName>
<gn:alternateName xml:lang="hu">Detroit</gn:alternateName>
<gn:alternateName xml:lang="hy">Դետրոյթ</gn:alternateName>
<gn:alternateName xml:lang="id">Detroit</gn:alternateName>
<gn:alternateName xml:lang="io">Detroit</gn:alternateName>
<gn:alternateName xml:lang="is">Detroit</gn:alternateName>
<gn:alternateName xml:lang="it">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ja">デトロイト</gn:alternateName>
<gn:alternateName xml:lang="ka">დეტროიტი</gn:alternateName>
<gn:alternateName xml:lang="kk">Детройт</gn:alternateName>
<gn:alternateName xml:lang="ko">디트로이트</gn:alternateName>
<gn:alternateName xml:lang="la">Detroitum</gn:alternateName>
<gn:alternateName xml:lang="lt">Detroitas</gn:alternateName>
<gn:alternateName xml:lang="lv">Detroita</gn:alternateName>
<gn:alternateName xml:lang="mk">Детроит</gn:alternateName>
<gn:alternateName xml:lang="mr">डेट्रॉईट</gn:alternateName>
<gn:alternateName xml:lang="mrj">Детройт</gn:alternateName>
<gn:alternateName xml:lang="mzn">دیترویت</gn:alternateName>
<gn:alternateName xml:lang="nl">Detroit</gn:alternateName>
<gn:alternateName xml:lang="nn">Detroit</gn:alternateName>
<gn:alternateName xml:lang="no">Detroit</gn:alternateName>
<gn:alternateName xml:lang="oc">Detroit</gn:alternateName>
<gn:alternateName xml:lang="pl">Detroit</gn:alternateName>
<gn:alternateName xml:lang="pt">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ro">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ru">Детройт</gn:alternateName>
<gn:alternateName xml:lang="sah">Детройт</gn:alternateName>
<gn:alternateName xml:lang="sk">Detroit</gn:alternateName>
<gn:alternateName xml:lang="sr">Детроит</gn:alternateName>
<gn:alternateName xml:lang="sv">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ta">டிட்ராயிட்</gn:alternateName>
<gn:alternateName xml:lang="te">డెట్రాయిట్</gn:alternateName>
<gn:alternateName xml:lang="tg">Детройт</gn:alternateName>
<gn:alternateName xml:lang="th">ดีทรอยต์</gn:alternateName>
<gn:alternateName xml:lang="tr">Detroit</gn:alternateName>
<gn:alternateName xml:lang="ug">Détroyt</gn:alternateName>
<gn:alternateName xml:lang="uk">Детройт</gn:alternateName>
<gn:alternateName xml:lang="vi">Detroit</gn:alternateName>
<gn:alternateName xml:lang="xmf">დეთროითი</gn:alternateName>
<gn:alternateName xml:lang="yi">דעטרויט</gn:alternateName>
<gn:alternateName xml:lang="zh">底特律</gn:alternateName>
<gn:featureClass rdf:resource="http://www.geonames.org/ontology#P"/>
<gn:featureCode rdf:resource="http://www.geonames.org/ontology#P.PPLA2"/>
<gn:countryCode>US</gn:countryCode>
<gn:population>713777</gn:population>
<gn:postalCode>48258</gn:postalCode>
<wgs84_pos:lat>42.33143</wgs84_pos:lat>
<wgs84_pos:long>-83.04575</wgs84_pos:long>
<wgs84_pos:alt>183</wgs84_pos:alt>
<gn:parentCountry rdf:resource="http://sws.geonames.org/6252001/"/>
<gn:nearbyFeatures rdf:resource="http://sws.geonames.org/4990729/nearby.rdf"/>
<gn:locationMap rdf:resource="http://www.geonames.org/4990729/detroit.html"/>
<gn:wikipediaArticle rdf:resource="http://en.wikipedia.org/wiki/Detroit"/>
<rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Detroit"/>
</gn:Feature>
<gn:Feature rdf:about="http://sws.geonames.org/6955112/">
<rdfs:isDefinedBy rdf:resource="http://sws.geonames.org/6955112/about.rdf"/>
<gn:name>Detroit-Warren-Livonia</gn:name>
<gn:featureClass rdf:resource="http://www.geonames.org/ontology#L"/>
<gn:featureCode rdf:resource="http://www.geonames.org/ontology#L.RGNE"/>
<gn:countryCode>US</gn:countryCode>
<gn:population>4425110</gn:population>
<wgs84_pos:lat>42.34231</wgs84_pos:lat>
<wgs84_pos:long>-83.07175</wgs84_pos:long>
<gn:parentCountry rdf:resource="http://sws.geonames.org/6252001/"/>
<gn:nearbyFeatures rdf:resource="http://sws.geonames.org/6955112/nearby.rdf"/>
<gn:locationMap rdf:resource="http://www.geonames.org/6955112/detroit-warren-livonia.html"/>
</gn:Feature>
</rdf:RDF>
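If you only have the raw RDF in hand, the flat list shown earlier can be recovered from it with the standard library. This is a minimal sketch using xml.etree.ElementTree, not the geonames_rdf implementation; flatten_features and SAMPLE are hypothetical names, and SAMPLE is a trimmed-down copy of the response above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the response above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "gn": "http://www.geonames.org/ontology#",
}


def flatten_features(rdf_text):
    """Return a list of (feature URI, name) tuples from a GeoNames RDF document."""
    root = ET.fromstring(rdf_text)
    results = []
    for feature in root.findall("gn:Feature", NS):
        # Attributes in a namespace are addressed as "{namespace-uri}name".
        uri = feature.get("{%s}about" % NS["rdf"])
        name = feature.findtext("gn:name", namespaces=NS)
        results.append((uri, name))
    return results


# Trimmed-down sample input (assumption: real responses carry many more elements).
SAMPLE = """<rdf:RDF xmlns:gn="http://www.geonames.org/ontology#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<gn:Feature rdf:about="http://sws.geonames.org/4990729/">
<gn:name>Detroit</gn:name>
</gn:Feature>
<gn:Feature rdf:about="http://sws.geonames.org/6955112/">
<gn:name>Detroit-Warren-Livonia</gn:name>
</gn:Feature>
</rdf:RDF>"""

for uri, name in flatten_features(SAMPLE):
    print("[{0}]: [{1}]".format(uri, name))
```

The same function should work on the full response, since it only looks at gn:Feature elements and their gn:name children.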

For more information, go to the project homepage.

Open Health APIs with SMART on FHIR

SMART on FHIR is an initiative to create open-standard health APIs (SMART) on open-standard health data-formats (FHIR).

Here is a tutorial with general SMART/FHIR notes and a sample project to query the SMART API on the public sandbox server and plot the data using Seaborn:

SMARTOnFHIRExample

The tutorial also has information on how to boot a sandbox server with Vagrant.

Screenshots

[Screenshot: community diastolic blood-pressure]

[Screenshot: community systolic blood-pressure]

Build Recursive Patches Against an Application Directory

PathManifest is a utility that deposits a manifest of an application directory into its root, and allows you to build differential patches over time. As the tools can also print JSON-encoded data on their completion, they can be readily integrated from other tools/applications.

Example usage:

$ pm_write_manifest /application/root
$ pm_check_for_changes /application/root
New
---
new_directory/new_file2
new_file

Updated
-------
updated_file

$ pm_make_differential_patch /application/root 201507282031 /tmp
Created/Updated Files
---------------------

new_directory/new_file2
updated_file
new_file

Patch file-path:

/tmp/pm-patch-201507282031.tar.bz2

To apply the patch, simply expand it. Note that this doesn’t support file removal yet, but it will in the future (still with a basic archive, but with the aid of an additional tool):

$ tar xjf /tmp/pm-patch-201507282031.tar.bz2 

Display applied patches:

$ pm_show_applied_patches /application/root
Applied Patches
---------------

201507282031

Affected Files
--------------

new_directory/new_file2
updated_file
new_file
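The general idea of a manifest plus a change check can be sketched in a few lines of Python. This is only an illustration of the concept and makes no claim about how PathManifest works internally: the use of SHA-1 content digests is an assumption, and build_manifest, compare_manifests, and _write are hypothetical names. Like the patches above, it ignores removed files.

```python
import hashlib
import os
import tempfile


def build_manifest(root):
    """Map each file's path (relative to root) to a SHA-1 digest of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            with open(filepath, "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            manifest[os.path.relpath(filepath, root)] = digest
    return manifest


def compare_manifests(old, new):
    """Return sorted lists of paths that are new or updated since the old manifest."""
    created = sorted(p for p in new if p not in old)
    updated = sorted(p for p in new if p in old and new[p] != old[p])
    return created, updated


def _write(root, rel_path, text):
    """Helper for the demo: write a small text file, creating directories as needed."""
    path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as root:
    _write(root, "updated_file", "v1")
    before = build_manifest(root)

    # Change one file and add two more, mirroring the session above.
    _write(root, "updated_file", "v2")
    _write(root, "new_file", "hello")
    _write(root, os.path.join("new_directory", "new_file2"), "hello")

    created, updated = compare_manifests(before, build_manifest(root))
    print("New:", created)
    print("Updated:", updated)
```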

Using NetworkX to Plot Graphs

I’ve previously mentioned graphviz for plotting graphs. In truth, these resemble flowcharts. To create something that looks like a more traditional vertex and edge representation, you might consider NetworkX.

Whereas graphviz is a fairly general-purpose utility that is not specific to Python and is developed around the well-defined DOT format, NetworkX is Python-specific but creates very nice graphics. It’s also significantly easier to get something acceptable without spending much time monkeying with it. With that said, there are multiple layout algorithms that you can invoke to calculate the positions of the elements in the output image, and the only apparent way to get a consistent, well-organized/balanced representation seems to be to arrange them using the circular layout.

Digraph example:

import networkx as nx
import matplotlib.pyplot as plt

def _main():
    g = nx.DiGraph()

    g.add_edge(2, 3, weight=1)
    g.add_edge(3, 4, weight=5)
    g.add_edge(5, 1, weight=10)
    g.add_edge(1, 3, weight=15)

    g.add_edge(2, 7, weight=1)
    g.add_edge(13, 6, weight=5)
    g.add_edge(12, 5, weight=10)
    g.add_edge(11, 4, weight=15)

    g.add_edge(9, 2, weight=1)
    g.add_edge(10, 13, weight=5)
    g.add_edge(7, 5, weight=10)
    g.add_edge(9, 4, weight=15)

    g.add_edge(10, 3, weight=1)
    g.add_edge(11, 2, weight=5)
    g.add_edge(9, 6, weight=10)
    g.add_edge(10, 5, weight=15)

    pos = nx.circular_layout(g)

    edge_labels = { (u,v): d['weight'] for u,v,d in g.edges(data=True) }

    nx.draw_networkx_nodes(g,pos,node_size=700)
    nx.draw_networkx_edges(g,pos)
    nx.draw_networkx_labels(g,pos)
    nx.draw_networkx_edge_labels(g,pos,edge_labels=edge_labels)

    plt.title("Graph Title")
    plt.axis('off')

    plt.savefig('output.png')
    plt.show()

if __name__ == '__main__':
    _main()

Notice that NetworkX depends on matplotlib to do the actual drawing. The boots (highlighted parts on the edges) represent directedness.

Output:

[Image: NetworkX rendering of the digraph]

As I said before, it’s easier to get a nicer representation, but it appears that this is at the cost of flexibility. Notice that in the image, there’s a tendency to overlap. In fact, all of the edge-labels are dead-center. Since the nodes are arranged in a circle, all edges that cross from one side to another will have labels that overlap in the middle. Technically you can adjust where the label sits along its edge (toward one end, the middle, or the other end) with the label_pos argument of draw_networkx_edge_labels, but it’s limited to that (rather than being calculated on the fly).
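The overlap is easy to confirm with a little arithmetic. The sketch below is plain Python rather than NetworkX: circular_positions and point_on_edge are hypothetical helpers that place nodes evenly on a unit circle and interpolate along an edge, which is assumed to be close enough to what a circular layout with centered labels does.

```python
import math


def circular_positions(nodes):
    """Place nodes evenly on the unit circle (the same idea as a circular layout)."""
    nodes = list(nodes)
    n = len(nodes)
    return {
        node: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


def point_on_edge(pos, u, v, fraction=0.5):
    """Return the point a given fraction of the way from u to v."""
    (x1, y1), (x2, y2) = pos[u], pos[v]
    return (x1 + (x2 - x1) * fraction, y1 + (y2 - y1) * fraction)


pos = circular_positions(range(8))

# Nodes 0/4, 1/5, and 2/6 sit opposite each other, so these edges are diameters.
crossing_edges = [(0, 4), (1, 5), (2, 6)]

for u, v in crossing_edges:
    x, y = point_on_edge(pos, u, v)
    # Distance of the label position from the center of the circle.
    print((u, v), "centered label distance from origin:", round(math.hypot(x, y), 6))

for u, v in crossing_edges:
    x, y = point_on_edge(pos, u, v, fraction=0.25)
    print((u, v), "shifted label position:", (round(x, 3), round(y, 3)))
```

With the labels centered, every diameter puts its label at the same spot. Shifting the fraction moves the labels apart, but a single fraction applied to every edge still isn't the same as choosing a position per edge.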

A Complete Huffman Encoder Implementation

I’ve written a Huffman implementation for the purpose of completely showing how to build the frequency-table, Huffman tree, encoding table, as well as how to serialize the tree, store the tree and data to a file, restore both structures from a file, decode the data using the tree, and how to make this more fun using Python.

This is the test code (test_steps):

clear_bytes = test_get_data()
_dump_hex("Original data:", clear_bytes)

tu = TreeUtility()

# Build encoding table and tree.

he = Encoding()
encoding = he.get_encoding(clear_bytes)

print("Weights:\n{0}".format(pprint.pformat(encoding.weights)))
print('')

print("Tree:")
print('')

tu.print_tree(encoding.tree)
print('')

flat_encoding_table = { 
    (hex(c)[2:] + ' ' + chr(c).strip()): b
    for (c, b) 
    in encoding.table.items() }

print("Encoding:\n{0}".format(pprint.pformat(flat_encoding_table)))
print('')

# Encode the data.

print("Encoded characters:\n\n{0}\n".
      format(encode_to_debug_string(encoding.table, clear_bytes)))

encoded_bytes = encode(encoding.table, clear_bytes)
_dump_hex("Encoded:", encoded_bytes)

# Decode the data.

decoded_bytes_list = decode(encoding.tree, encoded_bytes)
decoded_bytes = bytes(decoded_bytes_list)

assert \
    clear_bytes == decoded_bytes, \
    "Decoded does not equal the original."

_dump_hex("Decoded:", decoded_bytes)

print("Decoded text:")
print('')
print(decoded_bytes)
print('')

# Serialize and unserialize tree.

serialized_tree = tu.serialize(encoding.tree)
unserialized_tree = tu.unserialize(serialized_tree)

decoded_bytes_list2 = decode(unserialized_tree, encoded_bytes)
decoded_bytes2 = bytes(decoded_bytes_list2)

assert \
    clear_bytes == decoded_bytes2, \
    "Decoded does not equal the original after serializing/" \
    "unserializing the tree."

This is its output:

(Dump) Original data:

54 68 69 73 20 69 73 20 61 20 74 65 73 74 2e 20
54 68 61 6e 6b 20 79 6f 75 20 66 6f 72 20 6c 69
73 74 65 6e 69 6e 67 2e 0a

Weights:
{10: 1,
 32: 7,
 46: 2,
 84: 2,
 97: 2,
 101: 2,
 102: 1,
 103: 1,
 104: 2,
 105: 4,
 107: 1,
 108: 1,
 110: 3,
 111: 2,
 114: 1,
 115: 4,
 116: 3,
 117: 1,
 121: 1}

Tree:

LEFT>
. LEFT>
. . LEFT>
. . . VALUE=(69) [i]
. . RIGHT>
. . . VALUE=(73) [s]
. RIGHT>
. . LEFT>
. . . LEFT>
. . . . VALUE=(54) [T]
. . . RIGHT>
. . . . VALUE=(65) [e]
. . RIGHT>
. . . LEFT>
. . . . LEFT>
. . . . . VALUE=(66) [f]
. . . . RIGHT>
. . . . . VALUE=(72) [r]
. . . RIGHT>
. . . . LEFT>
. . . . . VALUE=(6c) [l]
. . . . RIGHT>
. . . . . VALUE=(a) []
RIGHT>
. LEFT>
. . LEFT>
. . . LEFT>
. . . . VALUE=(6f) [o]
. . . RIGHT>
. . . . VALUE=(61) [a]
. . RIGHT>
. . . LEFT>
. . . . VALUE=(74) [t]
. . . RIGHT>
. . . . VALUE=(6e) [n]
. RIGHT>
. . LEFT>
. . . VALUE=(20) []
. . RIGHT>
. . . LEFT>
. . . . LEFT>
. . . . . LEFT>
. . . . . . VALUE=(6b) [k]
. . . . . RIGHT>
. . . . . . VALUE=(79) [y]
. . . . RIGHT>
. . . . . VALUE=(68) [h]
. . . RIGHT>
. . . . LEFT>
. . . . . VALUE=(2e) [.]
. . . . RIGHT>
. . . . . LEFT>
. . . . . . VALUE=(75) [u]
. . . . . RIGHT>
. . . . . . VALUE=(67) [g]

Encoding:
{'20 ': bitarray('110'),
 '2e .': bitarray('11110'),
 '54 T': bitarray('0100'),
 '61 a': bitarray('1001'),
 '65 e': bitarray('0101'),
 '66 f': bitarray('01100'),
 '67 g': bitarray('111111'),
 '68 h': bitarray('11101'),
 '69 i': bitarray('000'),
 '6b k': bitarray('111000'),
 '6c l': bitarray('01110'),
 '6e n': bitarray('1011'),
 '6f o': bitarray('1000'),
 '72 r': bitarray('01101'),
 '73 s': bitarray('001'),
 '74 t': bitarray('1010'),
 '75 u': bitarray('111110'),
 '79 y': bitarray('111001'),
 'a ': bitarray('01111')}

Encoded characters:

0100 11101 000 001 110 000 001 110 1001 110 1010 0101 001 1010 11110 110 0100 11101 1001 1011 111000 110 111001 1000 111110 110 01100 1000 01101 110 01110 000 001 1010 0101 1011 000 1011 111111 11110 01111

(Dump) Encoded:

4e 83 81 d3 a9 4d 7b 27 66 f8 dc c7 d9 90 dc e0
69 6c 5f fe 7c

(Dump) Decoded:

54 68 69 73 20 69 73 20 61 20 74 65 73 74 2e 20
54 68 61 6e 6b 20 79 6f 75 20 66 6f 72 20 6c 69
73 74 65 6e 69 6e 67 2e 0a

Decoded text:

b'This is a test. Thank you for listening.\n'
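For readers who want the core steps (frequency table, tree, encoding table, encode, decode) in one place, here is a compact sketch. It is not the implementation exercised above: it represents the tree with nested tuples, works on strings of '0' and '1' instead of a bitarray, skips serialization, and assumes the input has at least two distinct byte values. All of the function names are hypothetical.

```python
import heapq
from collections import Counter


def build_tree(data):
    """Build a Huffman tree; leaves are byte values, inner nodes are (left, right) tuples."""
    weights = Counter(data)
    # Each heap entry is (weight, tie_breaker, node). The unique tie-breaker
    # keeps heapq from ever having to compare the nodes themselves.
    heap = [(w, i, byte) for i, (byte, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, counter, (left, right)))
        counter += 1
    return heap[0][2]


def build_table(node, prefix=""):
    """Walk the tree: going left appends '0', going right appends '1'."""
    if not isinstance(node, tuple):
        return {node: prefix}
    left, right = node
    table = build_table(left, prefix + "0")
    table.update(build_table(right, prefix + "1"))
    return table


def encode_to_bits(table, data):
    """Concatenate the code of every byte into one string of bits."""
    return "".join(table[byte] for byte in data)


def decode_bits(tree, bits):
    """Follow the bits down the tree, emitting a byte at every leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):
            out.append(node)
            node = tree
    return bytes(out)


clear_bytes = b"This is a test. Thank you for listening.\n"
tree = build_tree(clear_bytes)
table = build_table(tree)
bits = encode_to_bits(table, clear_bytes)

assert decode_bits(tree, bits) == clear_bytes
print(len(clear_bytes) * 8, "bits ->", len(bits), "bits")
```

The individual codes can differ from the table above, because ties between equal weights may be broken differently, but the total encoded length of any Huffman code for the same data is the same.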

PriorityQueue versus heapq

Python’s queue.PriorityQueue is actually based on the heapq module, but provides a traditional Python queue interface. The difference appears to be largely the interface: OO vs. passing a list (which heapq can act on directly).

The documentation for PriorityQueue is a little misleading, at least if you don’t take a moment to think about how the sorting works. This is what it says:

A typical pattern for entries is a tuple in the form: (priority_number, data)

I ran into an issue where I was getting an error when the second element (the actual item) couldn’t be used to sort. Whereas the documentation implies that there’s a convention that expects the priority to be in the first spot, it looks like the sort is just evaluating the entire tuple. This means that, when I was trying to insert with a priority that was already in the queue, the second elements of both tuples were being compared (this is how tuples are sorted). Curiously, I guess most of my previous use-cases either involved priorities (such as timestamps) that were sparse enough or data that happened to be sortable. Crap.
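The failure is easy to reproduce on Python 3. In this sketch, Job is a hypothetical stand-in for "the actual item": a class with no ordering defined.

```python
import queue


class Job:
    """A payload with no ordering defined (hypothetical stand-in for the real item)."""

    def __init__(self, name):
        self.name = name


q = queue.PriorityQueue()
q.put((1, Job("first")))

error = None
try:
    # Same priority as the entry already queued, so the tuples tie on the
    # first element and the comparison falls through to the Job objects.
    q.put((1, Job("second")))
except TypeError as e:
    error = e
    print("TypeError:", e)
```

With distinct priorities the second element is never consulted, which is why the problem can stay hidden for a long time.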

Now, looking back at the documentation for heapq, I’ve noticed one of the examples:

>>> h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')

So, it turns out that heapq also [hastily] recommends using tuples, but we now know that this comes with a lazy assumption: It only works if you’re willing to allow it to sort by the item itself if two or more items share a priority.

So, in conclusion, the nicest strategy is to use an object that has the “rich-comparison methods” defined on it (e.g. __lt__ and __eq__) rather than tuples. This will allow you to constrain the comparison operations.
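A minimal sketch of that strategy follows. PrioritizedItem is a hypothetical wrapper name; functools.total_ordering fills in the remaining comparison methods from __eq__ and __lt__, and only the priority ever takes part in a comparison.

```python
import functools
import queue


@functools.total_ordering
class PrioritizedItem:
    """Wrap an arbitrary item so that only the priority takes part in comparisons."""

    def __init__(self, priority, item):
        self.priority = priority
        self.item = item

    def __eq__(self, other):
        return self.priority == other.priority

    def __lt__(self, other):
        return self.priority < other.priority


q = queue.PriorityQueue()
q.put(PrioritizedItem(5, {"task": "write code"}))
q.put(PrioritizedItem(1, {"task": "write spec"}))
# Duplicate priority with an unorderable payload (dicts don't support "<").
q.put(PrioritizedItem(1, {"task": "review spec"}))

popped = []
while not q.empty():
    popped.append(q.get().priority)

print(popped)  # → [1, 1, 5]
```

Note that the order in which the two priority-1 items come out is not guaranteed. If that matters, one option is to add an insertion counter to the wrapper and compare on (priority, counter).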