Class PdfLibrary
public final class PdfLibrary
PdfLibrary implements a SQLite-based full-text-search engine. You can register documents to be indexed in the background and then search for keywords within that collection. There can be multiple libraries, although usually one is enough for the common use case.
-
-
Nested Class Summary
Nested Classes
public @interface PdfLibrary.Tokenizer
-
Field Summary
Fields
public final static String PORTER_TOKENIZER
public final static String UNICODE_TOKENIZER
-
Method Summary
static PdfLibrary get(@NonNull() String path)
Returns a library for a given path.
static PdfLibrary get(@NonNull() String path, @NonNull() String tokenizer)
Returns a library for a given path.
boolean getSaveReverseText()
Indicates whether saving the reverse text is enabled.
void setSaveReverseText(boolean saveReverseText)
Will save a reversed copy of the original page text.
void search(@NonNull() String searchString, @Nullable() QueryOptions options, @NonNull() QueryResultListener resultListener)
Query the database for a match of searchString.
boolean isIndexing()
Indicates whether the indexing is in progress or not.
List<String> getQueuedUIDs()
Returns list of UIDs of documents queued for indexing.
List<String> getIndexedUIDs()
Returns list of UIDs of documents currently indexed.
int size()
Returns number of indexed documents in this library.
LibraryIndexStatus getIndexStatusForUID(@NonNull() String uid)
Returns indexing status for a document with passed UID.
void enqueueDocuments(@NonNull() List<PdfDocument> documents)
Queues a list of documents for indexing.
void enqueueDocuments(@NonNull() List<PdfDocument> documents, @NonNull() IndexingOptions indexingOptions)
Queues a list of documents for indexing.
void enqueueDocumentSources(@NonNull() List<DocumentSource> documentSources)
Queues a list of document sources for indexing.
void enqueueDocumentsWithMetadata(@NonNull() List<Pair<PdfDocument, Array<byte>>> documents)
Queues a list of documents for indexing together with passed free-form metadata.
void enqueueDocumentsWithMetadata(@NonNull() List<Pair<PdfDocument, Array<byte>>> documents, @NonNull() IndexingOptions indexingOptions)
Queues a list of documents for indexing together with passed free-form metadata.
void enqueueDocumentSourcesWithMetadata(@NonNull() List<Pair<DocumentSource, Array<byte>>> documentSources)
Queues a list of document sources for indexing together with passed free-form metadata.
Array<byte> getMetadataForUID(@NonNull() String uid)
Returns metadata appended to document with enqueueDocumentsWithMetadata call.
void removeDocuments(@NonNull() List<String> documentUIDs)
Invalidates index for documents.
void clearIndex()
Completely clears the index for this library.
void stopSearch()
Stops search and all in-progress preview text generator tasks.
void addLibraryIndexingListener(@NonNull() LibraryIndexingListener listener)
Adds a LibraryIndexingListener to monitor document indexing status.
void removeLibraryIndexingListener(@NonNull() LibraryIndexingListener listener)
Removes a registered LibraryIndexingListener added with addLibraryIndexingListener.
-
Method Detail
-
get
static PdfLibrary get(@NonNull() String path)
Returns a library for a given path. If no library exists for this path yet, this method will create and return one.
- Parameters:
path
- Writable path to library database file.
-
get
static PdfLibrary get(@NonNull() String path, @NonNull() String tokenizer)
Returns a library for a given path. If no library exists for this path yet, this method will create and return one.
- Parameters:
path
- Writable path to library database file.
tokenizer
- The tokenizer to use, one of PORTER_TOKENIZER or UNICODE_TOKENIZER.
-
getSaveReverseText
boolean getSaveReverseText()
Indicates whether saving the reverse text is enabled.
- Returns:
true if saving reverse text is enabled, false otherwise.
-
setSaveReverseText
void setSaveReverseText(boolean saveReverseText)
Will save a reversed copy of the original page text. If enabled, the index database will be about twice as large, but ends-with matches become possible.
- Parameters:
saveReverseText
- true to save reversed text to index, true by default.
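The trade-off described for setSaveReverseText can be illustrated with a small standalone sketch. It is not the PdfLibrary implementation: the class ReverseTextSketch and its methods are hypothetical, and the assumption is only that an index which answers begins-with queries cheaply can also answer ends-with queries if it stores each word reversed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only; all names here are hypothetical. */
final class ReverseTextSketch {
    // Sorted map from reversed word to original word. A sorted structure
    // makes prefix lookups a contiguous range scan.
    private final TreeMap<String, String> reversedWords = new TreeMap<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Stores the reversed copy; this is the extra storage cost. */
    void add(String word) {
        reversedWords.put(reverse(word), word);
    }

    /** An ends-with query becomes a begins-with scan over reversed words. */
    List<String> endsWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : reversedWords.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the range of keys sharing the prefix
            }
            hits.add(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        ReverseTextSketch index = new ReverseTextSketch();
        index.add("indexing");
        index.add("searching");
        index.add("library");
        System.out.println(index.endsWith("ing"));
    }
}
```

Storing every word a second time in reversed form is what roughly doubles the size of such an index.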
-
search
void search(@NonNull() String searchString, @Nullable() QueryOptions options, @NonNull() QueryResultListener resultListener)
Queries the database for a match of searchString. Only direct matches, begins-with and ends-with matches are supported. The results, a map of document UIDs to the set of matching pages inside each document, are delivered to resultListener.
- Parameters:
searchString
- String to search for.
options
- Options object determining search behaviour.
resultListener
- Callback listener which will be called with search results.
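The shape of the search results (document UIDs mapped to sets of matching pages, handed to a callback) can be sketched with a toy in-memory index. Everything in this sketch is an assumption for illustration: SearchResultSketch, the Consumer-based callback, the whitespace tokenization and the zero-based page numbers are hypothetical and do not reflect QueryResultListener or the real index.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Toy illustration only; all names here are hypothetical. */
final class SearchResultSketch {
    // UID -> page texts; the list position is used as the page number.
    private final Map<String, List<String>> pagesByUid = new LinkedHashMap<>();

    void addDocument(String uid, List<String> pageTexts) {
        pagesByUid.put(uid, pageTexts);
    }

    /** Reports direct, begins-with and ends-with word matches to the callback. */
    void search(String searchString, Consumer<Map<String, Set<Integer>>> listener) {
        String needle = searchString.toLowerCase(Locale.ROOT);
        Map<String, Set<Integer>> results = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> doc : pagesByUid.entrySet()) {
            List<String> pages = doc.getValue();
            for (int page = 0; page < pages.size(); page++) {
                for (String word : pages.get(page).toLowerCase(Locale.ROOT).split("\\s+")) {
                    // A direct match is the case where both checks are true.
                    if (word.startsWith(needle) || word.endsWith(needle)) {
                        results.computeIfAbsent(doc.getKey(), k -> new TreeSet<>()).add(page);
                        break; // one hit is enough to report this page
                    }
                }
            }
        }
        listener.accept(results);
    }
}
```

Note that a match in the middle of a word (for example "uic" in "quick") is not reported, which mirrors the restriction to direct, begins-with and ends-with matches.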
-
isIndexing
boolean isIndexing()
Indicates whether the indexing is in progress or not.
- Returns:
true if indexing is in progress, false otherwise.
-
getQueuedUIDs
@NonNull() List<String> getQueuedUIDs()
Returns list of UIDs of documents queued for indexing.
- Returns:
List of queued UIDs.
-
getIndexedUIDs
@NonNull() List<String> getIndexedUIDs()
Returns list of UIDs of documents currently indexed.
- Returns:
List of indexed UIDs.
-
size
int size()
Returns number of indexed documents in this library.
- Returns:
number of documents that have finished indexing.
-
getIndexStatusForUID
@NonNull() LibraryIndexStatus getIndexStatusForUID(@NonNull() String uid)
Returns indexing status for a document with passed UID.
- Parameters:
uid
- UID of the document.
- Returns:
Indexing status of a document with provided UID.
-
enqueueDocuments
void enqueueDocuments(@NonNull() List<PdfDocument> documents)
Queues a list of documents for indexing. Any documents already queued or fully indexed will be ignored.
NOTE: This call requires all documents to be opened when indexing and will most likely lead to out-of-memory conditions if a lot of documents are passed. Prefer enqueueDocumentSources if possible.
- Parameters:
documents
- List of documents to index.
-
enqueueDocuments
void enqueueDocuments(@NonNull() List<PdfDocument> documents, @NonNull() IndexingOptions indexingOptions)
Queues a list of documents for indexing. Any documents already queued or fully indexed will be ignored.
- Parameters:
documents
- List of documents to index.
indexingOptions
- Options for indexing the given documents.
-
enqueueDocumentSources
void enqueueDocumentSources(@NonNull() List<DocumentSource> documentSources)
Queues a list of document sources for indexing. Any documents already queued or fully indexed will be ignored. This call avoids opening documents until they are indexed and is thus significantly more memory-friendly than enqueueDocuments.
- Parameters:
documentSources
- List of document sources to index.
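Why queuing sources is lighter on memory than queuing opened documents can be shown with a standalone sketch. This is not how PdfLibrary is implemented: LazyQueueSketch and its nested Source type are hypothetical stand-ins, and a Supplier plays the role of "open the document only when its turn comes". The sketch also mirrors the rule that already queued or indexed UIDs are ignored.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.function.Supplier;

/** Toy illustration only; all names here are hypothetical. */
final class LazyQueueSketch {
    /** Stand-in for a document source: a UID plus a deferred opener. */
    static final class Source {
        final String uid;
        final Supplier<String> opener;

        Source(String uid, Supplier<String> opener) {
            this.uid = uid;
            this.opener = opener;
        }
    }

    private final Queue<Source> queue = new ArrayDeque<>();
    private final Set<String> knownUids = new HashSet<>(); // queued or indexed
    private int indexedCount = 0;

    /** Returns false when the UID is already queued or indexed (request ignored). */
    boolean enqueue(Source source) {
        if (!knownUids.add(source.uid)) {
            return false;
        }
        queue.add(source);
        return true;
    }

    /** Opens one source at a time, so only one document's text is held at once. */
    void drain() {
        Source next;
        while ((next = queue.poll()) != null) {
            String text = next.opener.get(); // the document is opened only here
            if (text != null) {
                indexedCount++;
            }
        }
    }

    int size() {
        return indexedCount;
    }
}
```

With opened documents, every entry in the queue would already hold its content in memory; with deferred openers, the queue holds only lightweight descriptors until each item is processed.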
-
enqueueDocumentsWithMetadata
void enqueueDocumentsWithMetadata(@NonNull() List<Pair<PdfDocument, Array<byte>>> documents)
Queues a list of documents for indexing together with passed free-form metadata. Metadata can be retrieved after indexing with the getMetadataForUID method.
NOTE: This call requires all documents to be opened when indexing and will most likely lead to out-of-memory conditions if a lot of documents are passed. Prefer enqueueDocumentSourcesWithMetadata if possible.
Any documents already queued or fully indexed will be ignored.
- Parameters:
documents
- List of documents to index with metadata to be stored.
-
enqueueDocumentsWithMetadata
void enqueueDocumentsWithMetadata(@NonNull() List<Pair<PdfDocument, Array<byte>>> documents, @NonNull() IndexingOptions indexingOptions)
Queues a list of documents for indexing together with passed free-form metadata. Metadata can be retrieved after indexing with the getMetadataForUID method.
Any documents already queued or fully indexed will be ignored.
- Parameters:
documents
- List of documents to index with metadata to be stored.
indexingOptions
- Options for indexing the given documents.
-
enqueueDocumentSourcesWithMetadata
void enqueueDocumentSourcesWithMetadata(@NonNull() List<Pair<DocumentSource, Array<byte>>> documentSources)
Queues a list of document sources for indexing together with passed free-form metadata. This call avoids opening documents until they are indexed and is thus significantly more memory-friendly than enqueueDocumentsWithMetadata.
Metadata can be retrieved after indexing with the getMetadataForUID method.
Any documents already queued or fully indexed will be ignored.
- Parameters:
documentSources
- List of document sources to index.
-
getMetadataForUID
@Nullable() Array<byte> getMetadataForUID(@NonNull() String uid)
Returns the metadata attached to a document with an enqueueDocumentsWithMetadata call.
- Parameters:
uid
- UID of the document.
- Returns:
Metadata for the passed document or null if no metadata was found.
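Because the metadata is a free-form byte array, the caller decides on the encoding. The following sketch shows one possible convention, UTF-8 text, including the null case that getMetadataForUID may return. MetadataCodecSketch and its methods are hypothetical helpers, not part of the library, and the JSON-like payload is only an example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for storing text as free-form metadata bytes. */
final class MetadataCodecSketch {
    /** Encodes text as UTF-8 bytes, suitable for a byte-array metadata slot. */
    static byte[] encode(String metadata) {
        return metadata.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes written by encode; passes null through for "no metadata". */
    static String decode(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = encode("{\"title\":\"Annual report\"}");
        System.out.println(decode(stored));
        System.out.println(decode(null));
    }
}
```

Fixing the charset explicitly avoids depending on the platform default when the bytes are read back on another device.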
-
removeDocuments
void removeDocuments(@NonNull() List<String> documentUIDs)
Invalidates index for documents.
- Parameters:
documentUIDs
- List of document UIDs to be invalidated.
-
clearIndex
void clearIndex()
Completely clears the index for this library.
-
stopSearch
void stopSearch()
Stops search and all in-progress preview text generator tasks.
-
addLibraryIndexingListener
void addLibraryIndexingListener(@NonNull() LibraryIndexingListener listener)
Adds a LibraryIndexingListener to monitor document indexing status. If the listener has already been added previously, this method will be a no-op. Adding null is not allowed and will result in an exception.
- Parameters:
listener
- LibraryIndexingListener that should be notified.
-
removeLibraryIndexingListener
void removeLibraryIndexingListener(@NonNull() LibraryIndexingListener listener)
Removes a registered LibraryIndexingListener added with addLibraryIndexingListener. Upon calling this method the listener will no longer be notified of any changes. If the listener has not been added, this method will be a no-op. Passing null is not allowed and will result in an exception.
- Parameters:
listener
- LibraryIndexingListener that should be removed.
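The add and remove contract described above (duplicate adds and unknown removes are no-ops, null is rejected) can be sketched with a generic registry. This is a hypothetical illustration, not the library's code: ListenerRegistrySketch is an invented name, and throwing NullPointerException is an assumption, since the documentation only states that an exception results.

```java
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

/** Toy illustration only; all names here are hypothetical. */
final class ListenerRegistrySketch<L> {
    // Set semantics give the "already added" no-op for free; the
    // copy-on-write variant tolerates changes while listeners are notified.
    private final Set<L> listeners = new CopyOnWriteArraySet<>();

    void add(L listener) {
        Objects.requireNonNull(listener, "listener"); // assumed exception type
        listeners.add(listener); // re-adding the same listener changes nothing
    }

    void remove(L listener) {
        Objects.requireNonNull(listener, "listener"); // assumed exception type
        listeners.remove(listener); // removing an unknown listener changes nothing
    }

    int count() {
        return listeners.size();
    }
}
```

A caller following this contract can register a listener defensively in several places without receiving duplicate notifications.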
-