Contact us

Phone: +1 (727) 498-0222

E-Mail: info@appliedrelevance.com

About AR.Lucene

AR.Lucene is a simple, free utility from Applied Relevance that allows you to explore Lucene.Net indexes. It provides functionality similar to Luke and Lucli, but for Lucene.Net only.

AR.Lucene works with an existing Lucene index. You can:

  • List all documents
  • Search for documents
  • List all fields
  • Display search results
  • Show count of documents
  • Find and delete duplicate documents

AR.Lucene is easy to install and use (for a Windows command line utility, anyway). All input is through command-line arguments and all output is plain text to the console, making it something of a RESTful command line utility.

AR.Lucene is very basic, but if there is enough interest, new features will be added over time.  

System Requirements

AR.Lucene is a Microsoft .Net 2.0 application that will run on modern Microsoft operating systems.  

It has only been tested with Lucene.Net 2.3.1. It is possible, but not guaranteed, that it will work with newer indexes and with indexes created with the Java version of Lucene.

Installation

First, download the AR.Lucene executable arlucene.exe. AR.Lucene is NOT an installer - it is a command line executable without a Windows interface.  You should download it to your local hard drive and run it from the command prompt.  If you try to run it directly from your browser, you will get an error message.

AR.Lucene is a stand-alone Windows executable.  To install it, simply download it and put the executable in a convenient place on your hard drive.   For best results, place it somewhere on your search path. For information on setting your system path, see http://support.microsoft.com/kb/310519.

Usage

First, you need a Lucene index.  AR.Lucene cannot yet add documents to a Lucene index, so you will have to build one some other way for now.   For best results, the index should have been created with a current version of Lucene.Net.   

Help

To find out the current options of AR.Lucene, add -help or -? to the command line:

>arlucene -?

ARLucene  version 1.0.0.0
Copyright (c) 2009 Applied Relevance.

Options:
   -?, -help          Displays this help text
   -e, -erasedups     Erase duplicate entries in the index.
   -f, -fields        Display the specified field values.  You may specify
                      multiple -f options to include multiple fields.
   -m, -maxdocs       Maximum number of documents to return from the query.
                       Default is 10.  0 means return all matching documents

Commands:
   -a,                List all documents.  If you really want all
   -alldocuments      documents make sure to set MaxDocs to 0.
   -d, -duplicates    List duplicate documents based on the given -f field
   -l, -listfields    List all field names in the index.
   -q, -query         The query string to search for.

Required:
   -c, -collection    The path to the Lucene index directory to open.

Specifying the Index Path

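All commands except -help require a path to a valid Lucene index. Use the -c option to specify the path to the index (collection). The -c option does nothing by itself: if you supply it without one of the commands, AR.Lucene reports an error.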

> arlucene -c c:\ardata\arindex

Errors:

   * At least one of the option "a", "d", "l", "q" must be specified

Listing Fields in the Index

To list the available fields, use the -l (listfields) command:

>arlucene -c c:\ardata\arindex -l

Opening collection c:\ardata\arindex

DC.FORMAT

DC.TYPE

DC.TYPE.INTERACTIVITYLEVEL

DC.IMAGESEARCH.CODE

DC.VODCAST.URL

urn:schemas.microsoft.com:fulltextqueryinfo:xmlfilter:rss/version

DC.SUBJECT.MISSION

DC.SUBJECT

DC.IDENTIFIER

DC.AUDIENCE.LEVEL

DC.RIGHTS

DC.IMGGALLERY.TNURL

DC.IMAGESEARCH.IMAGE_URL

DC.VODCAST.LABEL

LANGUAGE

DC.PODCAST.LABEL

Path
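Because the field listing is plain text with one name per line, it is easy to consume from a script. The following is a minimal Python sketch, not part of AR.Lucene: the function name parse_field_list is hypothetical, and it assumes the output always consists of the "Opening collection" banner followed by field names, as in the abridged sample above.

```python
def parse_field_list(output):
    """Return the field names printed by `arlucene -c <index> -l`.

    Assumes one field name per line, preceded by an
    "Opening collection ..." banner line.
    """
    fields = []
    for line in output.splitlines():
        name = line.strip()
        # Skip blank lines and the banner that names the index path.
        if not name or name.startswith("Opening collection "):
            continue
        fields.append(name)
    return fields


# Abridged copy of the listing shown above.
sample = r"""Opening collection c:\ardata\arindex
DC.FORMAT
DC.TYPE
urn:schemas.microsoft.com:fulltextqueryinfo:xmlfilter:rss/version
LANGUAGE
Path
"""

for field in parse_field_list(sample):
    print(field)
```

Field names are kept exactly as printed, including ones that contain colons or slashes, so they can be passed back to the -f option unchanged.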

Querying the Index

To search the index, use the -q (query) command:

>arlucene -c c:\ardata\arindex -q nasa

Opening collection c:\ardata\arindex

Rank: 1

Score: 0.2328081

ID: 1339

AUTHOR: 65001

CMS DOCUMENT ID: 87028

CONTENT-TYPE: text/html; charset=UTF-8

DC.DATE.MODIFIED: 2008-02-12

<EOD>

...

Rank: 10

Score: 0.2055809

ID: 1412

AUTHOR: 65001

CMS DOCUMENT ID: 72082

CONTENT-TYPE: text/html; charset=UTF-8

DC.AUDIENCE: General Public, Informal Education, Press and Media, Parents, Students, Teachers

DC.CONTRIBUTOR: Brian Dunbar

<EOD>

Your search for body:nasa found 4557 documents in 31 ms.

Retrieved top 10 documents
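The result format above (one "NAME: value" line per field, each document closed by an <EOD> line, then a summary line) lends itself to simple parsing. The sketch below is one possible way to turn it into Python dictionaries. It is an illustration only: parse_query_output is a hypothetical helper, and it assumes that field names never contain a colon followed by a space and that the summary line always has the wording shown above.

```python
import re

# Matches the summary line, e.g.
# "Your search for body:nasa found 4557 documents in 31 ms."
SUMMARY_RE = re.compile(
    r"^Your search for (.+) found (\d+) documents in (\d+) ms\.$"
)


def parse_query_output(output):
    """Split `arlucene -q` output into per-document dicts plus a summary."""
    documents, current, summary = [], {}, None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        if line == "<EOD>":
            # End of one document: store it and start a new one.
            documents.append(current)
            current = {}
            continue
        match = SUMMARY_RE.match(line)
        if match:
            summary = {
                "query": match.group(1),
                "hits": int(match.group(2)),
                "millis": int(match.group(3)),
            }
            continue
        # Split on the first ": " only, so values such as
        # "text/html; charset=UTF-8" stay intact.
        name, sep, value = line.partition(": ")
        if sep:
            current[name] = value
    return documents, summary


# Abridged copy of the output shown above.
sample = r"""Opening collection c:\ardata\arindex
Rank: 1
Score: 0.2328081
ID: 1339
CONTENT-TYPE: text/html; charset=UTF-8
<EOD>
...
Rank: 10
Score: 0.2055809
ID: 1412
DC.CONTRIBUTOR: Brian Dunbar
<EOD>
Your search for body:nasa found 4557 documents in 31 ms.
Retrieved top 10 documents
"""

docs, summary = parse_query_output(sample)
print(len(docs), summary["hits"])
```

Lines without a ": " separator, such as the banner and the "Retrieved top 10 documents" line, are ignored by this sketch.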

 

Setting the Number of Documents to Retrieve

It can take a long time to retrieve a large result set. If your query returns thousands of documents, you could wait for hours for all the results to be streamed to your terminal window. The -m (maxdocs) option sets the maximum number of documents to retrieve from the search results. The default value is 10. Specify a value of 0 to bring back all results.

Specifying Fields to Retrieve

By default, the -q command returns all fields for the first 10 documents. You can specify which fields to return in the results with the -f (field) option. To return more than one field, add further -f options to the command line.

>arlucene -c c:\ardata\arindex -q data -m 3 -f TITLE -f AUTHOR -f DC.SUBJECT

Opening collection c:\ardata\arindex

Rank: 1

Score: 0.2035873

ID: 452

TITLE: NASA - News - Highlights

AUTHOR: 65001

DC.SUBJECT: news, events

<EOD>

Rank: 2

Score: 0.2035873

ID: 567

TITLE: NASA - Missions Index Page

AUTHOR: 65001

DC.SUBJECT: NULL

<EOD>

Rank: 3

Score: 0.2035873

ID: 578

TITLE: NASA - About NASA

AUTHOR: 65001

DC.SUBJECT: NULL

<EOD>

Your search for body:data found 4032 documents in 15 ms.

Retrieved top 3 documents
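When AR.Lucene is driven from a script, the repeated -f options are easiest to generate from a list. The following Python sketch assembles the argument list for the command shown above. The function name build_query_args is hypothetical, and only options documented in the help text (-c, -q, -m, -f) are used.

```python
def build_query_args(collection, query, maxdocs=None, fields=()):
    """Build an argument list for an arlucene query.

    maxdocs=None leaves out -m, so the tool's default of 10 applies.
    """
    args = ["arlucene", "-c", collection, "-q", query]
    if maxdocs is not None:
        args += ["-m", str(maxdocs)]
    # One -f option per field, in the order given.
    for field in fields:
        args += ["-f", field]
    return args


args = build_query_args(
    r"c:\ardata\arindex",
    "data",
    maxdocs=3,
    fields=["TITLE", "AUTHOR", "DC.SUBJECT"],
)
print(" ".join(args))
# prints: arlucene -c c:\ardata\arindex -q data -m 3 -f TITLE -f AUTHOR -f DC.SUBJECT
```

Assuming arlucene.exe is on the search path, a list like this could be handed to Python's subprocess.run(args, capture_output=True, text=True) to capture the console output as text.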

Finding Duplicate Documents

AR.Lucene can find and optionally delete duplicate documents based on any field in the index. The -d (duplicates) command looks for documents that have the same value in a given field. The default field is "Path"; use the -f (field) option to compare a different field.

If you specify more than one field to compare, AR.Lucene will use the first one only.

>arlucene -c c:\ardata\arindex -d

Opening collection c:\ardata\arindex

http://www.nasa.gov/sitemap/sitemap_nasa.html (2)

http://www.nasa.gov/audience/forkids/kidsclub/flash/index.html (2)

...

C:\play\M Divisions Orientation Manual.pdf (8)

C:\play\mng_rc2_tp_email_jun2006.pdf (17)

C:\play\DMSDR1S-#3158827-v11-M Divisions Orientation Manual.pdf (4)

C:\play\powerpoint.ppt (6)

C:\play\word.doc (2)

Found 4961 documents

Found 212 duplicates.
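Each line of the duplicate report ends with a number in parentheses, so the report can be summarised with a short script. The Python sketch below is illustrative only: parse_duplicates is a hypothetical helper, and it assumes every report line has the form "value (count)" as in the abridged sample above, with the count being the number printed by AR.Lucene for that value.

```python
import re

# A field value followed by a space and a count in parentheses.
# The value itself may contain spaces, "#" or parentheses.
DUP_RE = re.compile(r"^(?P<value>.+) \((?P<count>\d+)\)$")


def parse_duplicates(output):
    """Map each duplicated field value to the count printed next to it."""
    groups = {}
    for raw in output.splitlines():
        match = DUP_RE.match(raw.strip())
        if match:
            groups[match.group("value")] = int(match.group("count"))
    return groups


# Abridged copy of the report shown above.
sample = r"""Opening collection c:\ardata\arindex
http://www.nasa.gov/sitemap/sitemap_nasa.html (2)
...
C:\play\M Divisions Orientation Manual.pdf (8)
C:\play\mng_rc2_tp_email_jun2006.pdf (17)
C:\play\word.doc (2)
Found 4961 documents
Found 212 duplicates.
"""

groups = parse_duplicates(sample)
# The value with the highest count.
worst = max(groups, key=groups.get)
print(worst, groups[worst])
```

Reviewing the values with the highest counts first is one way to check that the chosen field really identifies duplicates before anything is deleted.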

Deleting Duplicate Documents

Duplicates located with the -d command can be deleted from the index by adding the -e (erasedups) option. There is no guarantee about the order in which the extra documents will be deleted.

WARNING: Duplicate documents are deleted with extreme prejudice. You are given no warning and no opportunity to cancel the process. You probably want to make a backup of the index before using the -e option. This is the only warning you will receive.

>arlucene -c c:\ardata\arindex -d -e

Opening collection c:\ardata\arindex

http://www.nasa.gov/sitemap/sitemap_nasa.html (2)

http://www.nasa.gov/audience/forkids/kidsclub/flash/index.html (2)

http://www.nasa.gov/centers/kennedy/multimedia/index.html (2)

http://www.nasa.gov/centers/glenn/technology/index.html (2)

http://www.nasa.gov/centers/kennedy/stationpayloads/index.html (2)

...

Deleted document 4955

Deleted document 4956

Deleted document 4958

Deleted document 4959

Deleted document 4960

Found 4961 documents

Found 212 duplicates.

Deleted 249 documents.
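Since -e offers no confirmation and no undo, the backup recommended in the warning is worth automating. The following Python sketch copies the index directory to a time-stamped sibling directory. It is not part of AR.Lucene: backup_index and the "-backup-" naming scheme are hypothetical, and the sketch assumes that no other process is writing to the index while it is being copied.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_dir, backup_root=None):
    """Copy a Lucene index directory and return the path of the copy.

    The copy is created next to the index unless backup_root is given.
    """
    source = Path(index_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"index directory not found: {source}")
    root = Path(backup_root) if backup_root else source.parent
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = root / f"{source.name}-backup-{stamp}"
    # copytree refuses to overwrite an existing directory, so an
    # earlier backup is never silently replaced.
    shutil.copytree(source, target)
    return target


# Example (not executed here):
#   backup_index(r"c:\ardata\arindex")
# followed by:
#   arlucene -c c:\ardata\arindex -d -e
```

If the deletion turns out to be wrong, the index can be restored by replacing the index directory with the copy.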