PYLUCENE 3.0 IN 60 SECONDS — TUTORIAL and SAMPLE CODE

facebooktwittergoogle_plusredditpinterestlinkedinmail

I pro­vide a basic index­ing and retrieval code using the PyLucene 3.0 API.Lucene In Action (2nd Ed) cov­ers Lucene 3.0, but the PyLucene code sam­ples for have not been updated for the 3.0 API, only the Java ones. Unfor­tu­nately, there is cur­rently lit­tle (no?) exam­ple PyLucene code in blo­gos­phere. If you have links to more Lucene 3.0 tuto­ri­als and sam­ples, please share them in the comments.

SAM­PLE PYLUCENE 3.0 CODE

In the spirit of Lingpipe’s Lucene 2.4 in 60 sec­onds, here are rel­e­vant PyLucene 3.0 code snip­pets from my biased-text-sample project, for index­ing and retrieval.

INDEX­ING

import lucene
from lucene import \
    SimpleFSDirectory, System, File, \
    Document, Field, StandardAnalyzer, IndexWriter, Version

if __name__ == "__main__":
    lucene.initVM()
    indexDir = "/Tmp/REMOVEME.index-dir"
    dir = SimpleFSDirectory(File(indexDir))
    analyzer = StandardAnalyzer(Version.LUCENE_30)
    writer = IndexWriter(dir, analyzer, True, IndexWriter.MaxFieldLength(512))

    print >> sys.stderr, "Currently there are %d documents in the index..." % writer.numDocs()

    print >> sys.stderr, "Reading lines from sys.stdin..."
    for l in sys.stdin:
        doc = Document()
        doc.add(Field("text", l, Field.Store.YES, Field.Index.ANALYZED))
        writer.addDocument(doc)

    print >> sys.stderr, "Indexed lines from stdin (%d documents in index)" % (writer.numDocs())
    print >> sys.stderr, "About to optimize index cialis le moins cher of %d documents..." % writer.numDocs()
    writer.optimize()
    print >> sys.stderr, "...done optimizing index of %d documents" % writer.numDocs()
   prix cialis en pharmacie  print >> sys.stderr, "Closing index of %d documents..." % writer.numDocs()
    writer.close()
    print >> sys.stderr, "...done closing index of %d documents" % writer.numDocs()

RETRIEVAL

import lucene
from lucene import \
    SimpleFSDirectory, System, File, \
    Document, Field, StandardAnalyzer, IndexSearcher, Version, QueryParser

if __name__ == "__main__":
    lucene.initVM()
    indexDir = "/Tmp/REMOVEME.index-dir"
    dir = SimpleFSDirectory(File(indexDir))
    analyzer = StandardAnalyzer(Version.LUCENE_30)
    searcher = IndexSearcher(dir)

    query = QueryParser(Version.LUCENE_30, "text", analyzer).parse("Find this sentence please")
    MAX = 1000
    hits = searcher.search(query, MAX)

    print "Found %d document(s) that matched query '%s':" % (hits.totalHits, query)

    for hit in hits.scoreDocs:
        print hit.score, hit.doc, hit.toString()
        doc = searcher.doc(hit.doc)
        print doc.get("text").encode("utf-8")

 

facebooktwittergoogle_plusredditpinterestlinkedinmail

Leave a Reply

Your email address will not be published. Required fields are marked *