Processing Text for Sentiment and Other Good Stuff

textblob integrates nltk and pattern. It allows you to easily extract and derive information from a passage of text.

To install:

$ sudo pip install textblob

Based on the example, here:

import textblob
import pprint

text = '''
The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact. Snide comparisons to gelatin be darned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.
'''

blob = textblob.TextBlob(text)

# Get parts of speech.
blob.tags

# Get list of individual noun-phrases.
blob.noun_phrases

# Print sentence and sentiment polarity:
for sentence in blob.sentences:
    print(sentence)
    print('')
    print(sentence.sentiment.polarity)
    print('')
    print('--')
    print('')

# Convert to Spanish.
blob.translate(to="es")

Output:

>>> import textblob
>>> import pprint
>>> 
>>> text = '''
... The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact. Snide comparisons to gelatin be darned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.
... '''
>>> 
>>> blob = textblob.TextBlob(text)
>>>
>>> # Get parts of speech.
>>> blob.tags
[(u'The', u'DT'), (u'titular', u'JJ'), (u'threat', u'NN'), (u'of', u'IN'), (u'The', u'DT'), (u'Blob', u'NNP'), (u'has', u'VBZ'), (u'always', u'RB'), (u'struck', u'VBD'), (u'me', u'PRP'), (u'as', u'IN'), (u'the', u'DT'), (u'ultimate', u'JJ'), (u'movie', u'NN'), (u'monster', u'NN'), (u'an', u'DT'), (u'insatiably', u'RB'), (u'hungry', u'JJ'), (u'amoeba-like', u'JJ'), (u'mass', u'NN'), (u'able', u'JJ'), (u'to', u'TO'), (u'penetrate', u'VB'), (u'virtually', u'RB'), (u'any', u'DT'), (u'safeguard', u'VB'), (u'capable', u'JJ'), (u'of--as', u'JJ'), (u'a', u'DT'), (u'doomed', u'VBN'), (u'doctor', u'NN'), (u'chillingly', u'RB'), (u'describes', u'VBZ'), (u'it', u'PRP'), (u'assimilating', u'VBG'), (u'flesh', u'NN'), (u'on', u'IN'), (u'contact', u'NN'), (u'Snide', u'NNP'), (u'comparisons', u'NNS'), (u'to', u'TO'), (u'gelatin', u'NN'), (u'be', u'VB'), (u'darned', u'JJ'), (u'it', u'PRP'), (u"'", u'POS'), (u's', u'PRP'), (u'a', u'DT'), (u'concept', u'NN'), (u'with', u'IN'), (u'the', u'DT'), (u'most', u'RBS'), (u'devastating', u'JJ'), (u'of', u'IN'), (u'potential', u'JJ'), (u'consequences', u'NNS'), (u'not', u'RB'), (u'unlike', u'IN'), (u'the', u'DT'), (u'grey', u'JJ'), (u'goo', u'NN'), (u'scenario', u'NN'), (u'proposed', u'VBN'), (u'by', u'IN'), (u'technological', u'JJ'), (u'theorists', u'NNS'), (u'fearful', u'JJ'), (u'of', u'IN'), (u'artificial', u'JJ'), (u'intelligence', u'NN'), (u'run', u'VB'), (u'rampant', u'JJ')]
>>>
>>> # Get list of individual noun-phrases.
>>> blob.noun_phrases
WordList([u'titular threat', 'blob', u'ultimate movie monster', u'amoeba-like mass', 'snide', u'potential consequences', u'grey goo scenario', u'technological theorists fearful', u'artificial intelligence run rampant'])
>>>
>>> # Print sentence and sentiment polarity:
>>> for sentence in blob.sentences:
...     print(sentence)
...     print('')
...     print(sentence.sentiment.polarity)
...     print('')
...     print('--')
...     print('')
... 

The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact.

0.06

--

Snide comparisons to gelatin be darned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.


-0.341666666667

--

>>>
>>> # Convert to Spanish.
>>> blob.translate(to="es")
TextBlob("La amenaza titular de The Blob siempre me ha parecido como el último monstruo de la película: una, la masa insaciablemente hambriento ameba capaz de penetrar prácticamente cualquier salvaguardia, capaz de - como médico condenado escalofriantemente lo describe - "asimilar carne en contacto. comparaciones Snide a la gelatina ser condenados, es un concepto con el más devastador de las posibles consecuencias, no muy diferente del escenario plaga gris propuesta por los teóricos tecnológicos temerosos de la inteligencia artificial ejecutar rampante.")

Awesome, right?

Advertisements

Build an R-Tree in Python for Fun and Profit

There might come a time when you will prefer to stylishly load spatial data into a memory-structure rather than clumsily integrating a database just to quickly answer a question over a finite amount of data. You can use an R-tree by way of the rtree Python package that wraps the libspatialindex native library.

It’s both Python 2 and 3 compatible.

Building libspatialindex:

  1. Download it (using either Github or an archive.
  2. Configure, build, and install it (the shared-library won’t be created unless you do the install):
$ ./configure
$ make
$ sudo make install
  1. Install the Python package:
$ sudo pip install rtree
  1. Run the example code, which is based on their example code:
import rtree.index

idx2 = rtree.index.Rtree()

locs = [
    (14, 10, 14, 10),
    (16, 10, 16, 10),
]

for i, (minx, miny, maxx, maxy) in enumerate(locs):
    idx2.add(i, (minx, miny, maxx, maxy), obj={'a': 42})

for distance in (1, 2):
    print("Within distance of: ({0})".format(distance))
    print('')

    r = [
        (i.id, i.object) 
        for i 
        in idx2.nearest((13, 10, 13, 10), distance, objects=True)
    ]

    print(r)
    print('')

Output:

Within distance of: (1)

[(0, {'a': 42})]

Within distance of: (2)

[(0, {'a': 42}), (1, {'a': 42})]

NOTE: You need to represent your points as bounding-boxes, which is the basic structure of an R-tree (polygons inside of polygons inside of polygons).

In this case, we assign arbitrary objects that are associated with each bounding box. When we do a search, we get the objects back, too.