Some notes about using pyjnius to access Java libraries from Python.
Have a look at this post on my blog for more information on what I did with it.
Installation is pretty straightforward:
pip install cython
pip install git+https://github.com/kivy/pyjnius.git
Just use autoclass() to import Java classes inside Python:
from jnius import autoclass
Tika = autoclass('org.apache.tika.Tika')
Metadata = autoclass('org.apache.tika.metadata.Metadata')
FileInputStream = autoclass('java.io.FileInputStream')
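The autoclass() calls above only work if the JVM can find the Tika classes, so the Tika jar has to be on the Java classpath. Here is a minimal sketch of one way to arrange that, assuming pyjnius picks up the CLASSPATH environment variable when it starts the JVM (which, as far as I can tell, happens around the first import of jnius). The helper names set_java_classpath and load_tika, and any jar path you pass in, are made up for illustration:

```python
import os


def set_java_classpath(*jars):
    """Put the given jar paths on CLASSPATH and return the resulting string."""
    # Assumption: pyjnius reads CLASSPATH when the JVM starts, so this has
    # to run before the first `import jnius` in the process.
    classpath = os.pathsep.join(jars)
    os.environ['CLASSPATH'] = classpath
    return classpath


def load_tika(tika_jar):
    """Return the Tika class, loading it from `tika_jar` (hypothetical path)."""
    set_java_classpath(tika_jar)
    # Imported late on purpose: the classpath must be set before this line.
    from jnius import autoclass
    return autoclass('org.apache.tika.Tika')
```

With that in place, something like Tika = load_tika('/path/to/tika-app.jar') would replace the bare autoclass() call, where the path is a placeholder for wherever your Tika jar lives.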
And then use them right away:
tika = Tika()
meta = Metadata()
# filename is the path of the document to extract text from
text = tika.parseToString(FileInputStream(filename), meta)
Warning
I’m experiencing memory leaks when using Tika like that. I’m still debugging the problem, but I may have missed something related to garbage collection.
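To check whether memory really keeps growing, one rough approach is to run the same call many times and watch the peak memory of the process. The sketch below is only a measuring aid, not a diagnosis. It uses the standard resource module (Unix only), and peak_rss_after_each_call is a name invented for this example:

```python
import gc
import resource


def peak_rss_after_each_call(func, iterations=5):
    """Call func() repeatedly and record the process's peak RSS after each call."""
    samples = []
    for _ in range(iterations):
        func()
        # Collect Python-side garbage so unreferenced objects are released
        # before the measurement is taken.
        gc.collect()
        # ru_maxrss is the peak resident set size of the whole process:
        # kilobytes on Linux, bytes on macOS.
        usage = resource.getrusage(resource.RUSAGE_SELF)
        samples.append(usage.ru_maxrss)
    return samples
```

You could pass it something like lambda: tika.parseToString(FileInputStream(filename), meta) with a few hundred iterations. Keep in mind that the number covers the JVM heap as well, which can grow until Java's own garbage collector runs, so a rising curve alone does not prove a leak.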