Activate (or de-activate) this option by selecting the checkbox.
Allowing users to access the index while it is being updated typically degrades the performance of both the query and update processes by about 15%. However, this is offset by the benefit of updating the index without having to log your query users off before running the update.
Note: You must not change this option while there are any users using the index. Take the index offline before changing this setting.
Use this option to select the language that documents to be indexed by Perceptive Enterprise Search are written in. Selecting Korean, Chinese, Hong Kong Chinese or Japanese enables multi-byte character support in Perceptive Enterprise Search. This will not change the language of the documents being indexed.
Setting this option indexes all characters as Unicode, using the encoding of the document where available (formats that specify their encoding include Microsoft Office formats and Adobe Acrobat files). For documents that do not specify their encoding, such as text files, the encoding set in the option above will be used.
This allows a single index to contain documents from multiple languages, e.g. French, Chinese, Russian and English in a single index.
Setting this option will lead to a larger index file.
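The encoding fallback described above can be pictured with a minimal Python sketch. The function name, its parameters and the cp1252 default are assumptions made for illustration; they are not part of Perceptive Enterprise Search.

```python
def decode_document(raw, declared_encoding=None, default_encoding="cp1252"):
    """Decode document bytes to Unicode text (hypothetical helper)."""
    # Prefer the encoding declared by the document format; otherwise fall
    # back to the index-level default encoding.
    encoding = declared_encoding or default_encoding
    # Undecodable bytes are replaced rather than aborting the whole document.
    return raw.decode(encoding, errors="replace")
```

With this model, a Russian text file with no declared encoding decodes correctly only if the default is set to match it, for example `decode_document(data, default_encoding="cp1251")`.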
The language pack controls language-specific settings for the index, such as common words or synonyms.
When this option is enabled, Perceptive Enterprise Search will try to determine the language of the document. It will inject a metadata property called ISYS_LANG that contains the two-character language code. You can then use this value to search for documents in a given language, or to refine a result list to a particular language.
These are individual characters which convey meaning and are an important part of a word. Although only the numbers 0 through 9 appear in this field, the letters A to Z (in both upper and lower case) and the international character set are also considered significant and are automatically placed in this category. You may want to specify other characters to suit your purposes, for instance the dollar sign ($) if you need to search prices. As an example, if the hyphen were made significant, then post-graduate would be indexed as post-graduate, and a query for postgraduate would not return post-graduate (and vice versa).
To add a new significant character enter it into the Significant field. To delete a significant character simply delete it from the field.
You must perform a Reindex for any changes made to significant characters to take effect.
These are individual characters which are part of a word but are not regarded as important. Insignificant characters are treated as if they are invisible. For instance, if the hyphen is made insignificant then words which are hyphenated will be treated as one word, for example post-graduate would be indexed as postgraduate.
The default insignificant characters are the apostrophe ('), underscore (_) and single quote (' '). To add a new insignificant character, enter it in the Insignificant field. To delete an insignificant character, simply delete it from the field.
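The effect of the three character classes (significant, insignificant, and everything else acting as a word delimiter) can be sketched in Python. This is a simplified model under stated assumptions, not the product's actual tokenizer; the function name and parameters are hypothetical.

```python
def tokenize(text, significant="", insignificant="'_"):
    """Split text into index terms using a three-way character model."""
    terms, current = [], []
    for ch in text:
        if ch.isalnum() or ch in significant:
            current.append(ch)      # significant: kept as part of the word
        elif ch in insignificant:
            continue                # insignificant: treated as invisible
        else:
            if current:             # any other character ends the word
                terms.append("".join(current))
                current = []
    if current:
        terms.append("".join(current))
    return terms
```

For example, `tokenize("post-graduate", significant="-")` keeps the hyphenated word whole, `tokenize("post-graduate", insignificant="-")` yields `postgraduate`, and with the hyphen in neither class the word splits into `post` and `graduate`.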
Intelligent recognition options allow Perceptive Enterprise Search to automatically identify parts of the text as dates, numbers or entities.
By selecting this option Perceptive Enterprise Search will be able to recognize a variety of date formats in queries and documents. Examples of valid dates are:
Dates are located regardless of the form in which they are expressed in the query or in the document. For the purposes of proximity searching the date is considered to be a single hit, even if the date is actually expressed in three or four words. The exact location of the hit is taken to be that of the last component of the date. This is why only the final portion of the date sequence is highlighted in the Browse window.
Selecting this option will slightly increase the size of your indexes. Unless your data is mostly dates, the increase in index size should not be significant. Enabling the feature may also slightly reduce indexing performance.
If you intend to use the Intelligent Date Handling option on highly numeric data files, such as financial transactions that contain primarily numeric data intermixed with dates, and if you expect the dates to be in YY-MM-DD format or MM/DD/YY format, it is recommended that you configure your index with the "-" or "/" character defined as either significant or insignificant (see below). That is, do not accept the default, in which "-" is interpreted as a word delimiter (punctuation character). This will greatly assist Perceptive Enterprise Search in making the correct interpretation of the dates when they appear in highly numeric data.
Where a date is ambiguous, for example 1-4-94, Perceptive Enterprise Search resolves the day/month ambiguity according to the regional settings that were in effect when you started Perceptive Enterprise Search (i.e. the Regional options under Windows Control Panel).
Note that any ambiguity in the indexed data is resolved at the time the data is indexed, whereas ambiguities in a query are resolved at the time of the query. This means that you can, for example, index your source data with ambiguities resolved according to your REGIONAL settings, then ship the index to another user running a different REGIONAL setting. Perceptive Enterprise Search automatically normalizes and remaps the ambiguities accordingly.
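As a rough illustration of resolving a day/month ambiguity once and storing a normalized value, consider the Python sketch below. The `day_first` flag stands in for the regional setting, and the two-digit-year pivot is an assumption added for the example; neither reflects the product's internal logic.

```python
from datetime import date

def resolve_date(text, day_first, pivot=50):
    """Resolve an ambiguous D-M-YY or M-D-YY string to a concrete date."""
    a, b, yy = (int(part) for part in text.split("-"))
    # The regional setting decides which component is the day.
    day, month = (a, b) if day_first else (b, a)
    # Assumed rule for two-digit years: values >= pivot belong to the 1900s.
    if yy >= 100:
        year = yy
    else:
        year = 1900 + yy if yy >= pivot else 2000 + yy
    return date(year, month, day)
```

Under a day-first setting, `resolve_date("1-4-94", day_first=True)` gives 1 April 1994; under a month-first setting the same string gives 4 January 1994. Once the resolved value is stored, it no longer depends on the settings of whoever later queries the index.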
This option works in a similar way to the Intelligent Date recognition option, but for numbers only. For example, the number 1,029 could be expressed in any of the following ways:
1029
1,029
one thousand and twenty nine
one thousand, twenty nine
1,029.00
1 029
one zero two nine
a thousand and twenty nine
Perceptive Enterprise Search will interpret each of the above examples as the same number.
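One way to picture this normalization is a small Python function that maps each of the forms listed above to the same integer. It is only a sketch covering those forms; the function name and word tables are assumptions, and the real recognizer is certainly more thorough.

```python
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}

def normalize_number(text):
    """Map digit or word forms of a number to one canonical value."""
    cleaned = text.lower().replace(",", " ")
    compact = cleaned.replace(" ", "")
    # Digit forms: "1029", "1,029", "1 029", "1,029.00"
    if re.fullmatch(r"\d+(\.\d+)?", compact):
        value = float(compact)
        return int(value) if value.is_integer() else value
    words = ["one" if w == "a" else w
             for w in cleaned.replace("-", " ").split() if w != "and"]
    # Digit-by-digit form: "one zero two nine"
    if len(words) > 1 and all(UNITS.get(w, 10) < 10 for w in words):
        return int("".join(str(UNITS[w]) for w in words))
    # Ordinary word form: "one thousand and twenty nine"
    total = current = 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current *= 100
        elif w in SCALES:
            total += current * SCALES[w]
            current = 0
        else:
            raise ValueError(f"not a number word: {w!r}")
    return total + current
```

Because every variant reduces to the same value, a query written in one form can match a document that uses another.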
The Index dots when embedded in words or numbers option will be automatically enabled when Intelligent Number recognition is used.
When this option is selected, Perceptive Enterprise Search will identify the "who, what and where" entities involved in documents. The entities Perceptive Enterprise Search recognizes include:
It does this automatically using a combination of heuristic and dictionary based techniques.
When queries are performed Perceptive Enterprise Search will find all the documents which match your search terms. As well as displaying the found documents, it will also provide an outline of the main entities involved in the cluster of found documents. This can be used to identify the key players related to your search term.
You can then drill-down by clicking on one of the entities. The presence of an entity may even suggest a whole new line of inquiry. Sometimes the entity itself will constitute the answer for which you are looking (for example, what company does John Smith work for).
Perceptive Enterprise Search Entity detection works very well without any sort of user configuration. However, because it is being done by a computer rather than a person, it will occasionally mis-categorize entities. For example, "John Street" could be a person's name or a street. In general, Perceptive Enterprise Search tries to avoid making judgments which should require human knowledge to make properly.
Other times, Perceptive Enterprise Search may fail to recognize entities about which you care greatly. For example, you may be involved with organizations whose names are not being reliably detected by Perceptive Enterprise Search, or dealing with individuals with unusual names. In these scenarios, you can augment the standard Perceptive Enterprise Search lexicon to include your own local knowledge.
Choosing this option will allow Perceptive Enterprise Search to index the summary information in formats that support it. These formats include Microsoft Word, Excel, WordPerfect, Acrobat PDF and HTML. This information can then be searched on and will appear at the top of each document when browsed. This information can also be used in Field Level Searching: see Named Sections in the Query help.
Store a copy of the metadata information in the index for faster retrieval and processing. Allows for metadata to be used for document categories. Recommended if Index document metadata is enabled.
If selected, Perceptive Enterprise Search will automatically detect and index any text note annotations that have been created for documents in the index.
Because this option affects which files will and will not be indexed (much like an indexing rule), rather than affecting how previously indexed documents should be read (like Intelligent Date Handling or Smart Dot Handling), you can change it without Reindexing the index. Changes will be reflected in the next Update run just like a rule change.
Note that only the text annotations are indexed - not hyperactivities, images or linked queries.
If you choose to index filenames you can then search for files by their names, extensions or any portion thereof. For example:
TEST.DOC
TEST
TE*
MYDIR \\ ASC
Perceptive Enterprise Search indexes each portion of the filename as though it were a word located at the beginning of the document.
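The idea of indexing each portion of a filename as a separate word can be sketched as follows. The splitting rule (path separators, dots and the drive colon) and both function names are assumptions for illustration; the product's exact rules are not documented here.

```python
import re
from fnmatch import fnmatchcase

def filename_terms(path):
    """Split a path into upper-cased portions, each treated as a word."""
    return [part.upper() for part in re.split(r"[\\/.:]+", path) if part]

def matches(pattern, path):
    """True if a single (possibly wildcarded) term matches any portion."""
    wanted = pattern.upper()
    return any(fnmatchcase(term, wanted) for term in filename_terms(path))
```

Here `filename_terms(r"C:\MYDIR\TEST.DOC")` yields the portions `C`, `MYDIR`, `TEST` and `DOC`, so queries such as `TEST`, `DOC` or `TE*` would each find the file.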
When this option is checked, dots occurring in the middle of a string of characters are not treated as word separators. Dots are considered significant in cases like 3.2.12 but not significant at the end or start of a word.
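A regular expression gives a compact way to see the difference this option makes. The sketch below is an approximation in Python with a hypothetical function name, not the product's parser.

```python
import re

def split_words(text, index_embedded_dots):
    """Split text into words, optionally keeping dots embedded in a word."""
    if index_embedded_dots:
        # A dot is kept only when word characters appear on both sides of it.
        pattern = r"\w+(?:\.\w+)*"
    else:
        # Every dot acts as a word separator.
        pattern = r"\w+"
    return re.findall(pattern, text)
```

With the option on, `split_words("See version 3.2.12.", True)` keeps `3.2.12` as one term and drops the sentence-ending dot; with it off, the version number splits into `3`, `2` and `12`.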
Also known as Fuzzy Pre-compensation.
The use of Fuzzy Pre-compensation is appropriate when you know your source data is likely to have a high proportion of errors as a result of being captured by means of Optical Character Recognition (OCR) scanning.
When Fuzzy pre-compensation is chosen, Perceptive Enterprise Search queries will automatically and transparently retrieve words that it considers may be OCR scanning errors or other typographical errors. For example, searches for "duck" will also retrieve "cluck", as it is possible that the "d" was slightly broken and misread as a "cl" by the OCR process.
Perceptive Enterprise Search uses sophisticated heuristic, algorithmic and statistical means to determine which words are likely errors of other words. In some cases, Perceptive Enterprise Search may incorrectly suggest that one word is a misspelling of another. There will always be some degree of 'false alarms'.
It is recommended you inform your users when an index has been configured with the fuzzy pre-compensation feature enabled. This should avoid any possible confusion as to why additional words are being returned during their query activities.
Note that use of this feature will slightly increase your index size and indexing time.
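To make the "duck"/"cluck" example concrete, here is a minimal Python sketch that expands a query term using a table of character sequences OCR commonly confuses. The table (including the m/rn pair), the function names, and the choice to expand at query time are all assumptions for illustration; the product describes its own method only as heuristic, algorithmic and statistical.

```python
# Hypothetical confusion table: each pair may be misread for the other.
CONFUSIONS = [("d", "cl"), ("m", "rn")]

def ocr_variants(word):
    """Return words reachable by one OCR-style substitution."""
    variants = set()
    for a, b in CONFUSIONS:
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                variants.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    variants.discard(word)
    return variants

def expand_query(word, vocabulary):
    """Add only those variants that actually occur in the index vocabulary."""
    return {word} | (ocr_variants(word) & set(vocabulary))
```

Restricting the expansion to words present in the vocabulary keeps the number of extra matches down, although false alarms remain possible, as noted above.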
This option treats pure numbers or words which are made exclusively of numerals (e.g. 765390) as common words. This means these numbers or words will not be indexed, and thus not be searchable. Note that this option has no effect on alphanumeric strings, e.g. A1234B, which are always indexed.
Note that changing this option will require you to perform a Reindex.
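The distinction between pure numbers and alphanumeric strings can be shown with a short Python filter. The function name and flag are hypothetical and the sketch ignores every other indexing rule.

```python
def indexable_terms(terms, numbers_are_common):
    """Drop purely numeric terms when numbers are treated as common words."""
    return [t for t in terms if not (numbers_are_common and t.isdigit())]
```

With the option on, `indexable_terms(["765390", "A1234B", "invoice"], True)` keeps `A1234B` and `invoice` but drops `765390`, which would therefore not be searchable.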
Creates a list of similar words for the index. This allows for 'Did you mean?' prompts for queries.
Cache the NTFS file security descriptor at indexing time. This dramatically increases the speed at which result lists can be filtered to show only documents that are accessible to the current user.
The cached security information will be updated when documents are updated in the index. To force an update of the cache use Refresh Security. This can also be updated via a Scheduled Update task.
Note that some files may be shown in the result list that the user no longer has access to, though the document itself will not be able to be viewed.
Indicates that the documents should be de-duplicated at query time. Because de-duplication is based on the text content of the document, Perceptive Enterprise Search may recognize the same document stored in two different formats as duplicates.
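Content-based de-duplication can be sketched by hashing normalized text, which is why two files in different formats can collapse to one result. The normalization (lower-casing and whitespace collapsing) and the use of SHA-256 are assumptions chosen for the example, not the product's documented method.

```python
import hashlib

def content_key(text):
    """Fingerprint a document by its normalized text content."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(results):
    """Keep the first (name, text) result for each distinct content key."""
    seen, unique = set(), []
    for name, text in results:
        key = content_key(text)
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique
```

A Word file and a PDF whose extracted text is the same would produce the same key, so only the first of them would appear in the result list.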
To fully enable this option, go to WebSites --> Default --> Search Settings --> Search Defaults and perform the following steps:
Indicates the quality of the documents within this index. This value is used when calculating document relevance across multiple indexes. Documents from an index rated as high quality should appear higher in the results than documents in an index rated as low quality.
Note: this value is used as a factor in the relevance algorithm and does not guarantee that a document will appear higher.
Use title from document metadata or summary information if available.
If Perceptive Enterprise Search finds metadata in your indexed documents it will use the title included in this metadata as the document title.
This option requires metadata indexing to be enabled.
On or after line - Perceptive Enterprise Search will use the first non-blank line of the document after the line number you specify as the document title.
That contains - Perceptive Enterprise Search will use the first line of the document that contains text you specify as the document title.
You can combine the two parameters if required. For example if the string Re: appears somewhere after line number 4, then enter 4 into the After line number field and Re: into the That contains field.
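The two title rules and their combination can be sketched in Python. Line numbers are assumed to be 1-based and the starting line is assumed to be included in the search; both are interpretations for the purposes of the example, and the function name is hypothetical.

```python
def pick_title(lines, on_or_after=1, contains=""):
    """Return the first non-blank line at or after a line number
    that also contains the given text, or None if there is none."""
    start = max(on_or_after, 1) - 1   # convert the 1-based line number
    for line in lines[start:]:
        if line.strip() and contains in line:
            return line.strip()
    return None
```

For a memo whose fifth line is `Re: Budget`, calling `pick_title(lines, on_or_after=4, contains="Re:")` selects that line as the title, matching the combined example above.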
You must perform a Reindex for any changes made to Document titles to take effect.
Option to automatically keep multi-generational backups of indexes. Backups are always placed in a sub-folder of the current index location called ISYSIndexBackups with individual folders for each version.
The index is backed up every time the index is modified. This can result in multiple backups for a single index update run.
Sometimes the slowest part of an index update can be removing references to changed or deleted documents. Perceptive Enterprise Search lets you defer this work until later. Then Perceptive Enterprise Search only 'marks' those documents as no longer existing, or no longer existing in their old form. The benefit is your index updates may go much faster where changed or deleted documents are involved. The deferred documents remain in the index taking up space, but will never be found by your queries.
Documents marked for deferred deindexing are automatically purged when the amount of space used by them reaches about 20% of your total index size. This way you never have to worry about deferred deindexing consuming large amounts of space (the actual proportion is determined by many factors including the number of words and documents in the index).
Additionally you may manually purge deferred deleted items at any time using the Purge Deleted option.
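The mark-now, purge-later behaviour can be modelled with a small Python class. The class, its methods and the fixed 20% threshold are assumptions for illustration; as noted above, the real proportion depends on several factors.

```python
class DeferredIndex:
    """Toy index that marks removed documents and purges them lazily."""

    def __init__(self, purge_threshold=0.20):
        self.sizes = {}          # doc id -> space used in the index
        self.deferred = set()    # doc ids marked as no longer existing
        self.purge_threshold = purge_threshold

    def add(self, doc_id, size):
        self.sizes[doc_id] = size
        self.deferred.discard(doc_id)

    def remove(self, doc_id):
        if doc_id in self.sizes:
            self.deferred.add(doc_id)          # cheap: only mark the document
        if self.deferred_fraction() >= self.purge_threshold:
            self.purge()                       # automatic purge

    def deferred_fraction(self):
        total = sum(self.sizes.values())
        wasted = sum(self.sizes[d] for d in self.deferred)
        return wasted / total if total else 0.0

    def visible(self):
        # Marked documents are never returned by queries.
        return sorted(d for d in self.sizes if d not in self.deferred)

    def purge(self):
        # Physically remove marked documents (the manual option does the same).
        for doc_id in self.deferred:
            del self.sizes[doc_id]
        self.deferred.clear()
```

Removing a small document only marks it, so it disappears from query results immediately while its space is reclaimed later, either when the threshold is crossed or when `purge()` is called directly.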
These options determine how indexing of documents occurs. These settings apply to all email attachments, SQL BLOBS, Lotus Notes attachments and other non-file system objects. The options can be overridden for File and FTP indexing rules.
Document caching stores the textual content of each document within the index. This gives greater performance when displaying the document content on the website. It is most useful when using the {CONTEXT} tag, or for complex document formats such as PDF.
Note: This option approximately doubles the size of your index.