In the ISYS.CFG file, which directs the Update process, each entry is on a separate line. Keywords are followed by at least one space. Blank lines and lines with an asterisk '*' as the first character are ignored. Each entry is either a configuration definition or an indexing rule. The ISYS.CFG file may be encoded in either ANSI or UTF-8.
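As a rough illustration of these line rules, the following Python sketch reads ISYS.CFG text into (keyword, remainder) pairs. The function name parse_entries is hypothetical, and this is not how Perceptive Search itself parses the file. It assumes only what is stated above: blank lines and lines starting with an asterisk are skipped, and a keyword is separated from the rest of the entry by at least one space.

```python
def parse_entries(text):
    """Split ISYS.CFG text into (keyword, remainder) pairs (illustrative only)."""
    entries = []
    for raw in text.splitlines():
        # Skip blank lines and lines whose first character is an asterisk.
        if not raw.strip() or raw.startswith("*"):
            continue
        # The keyword ends at the first run of whitespace.
        parts = raw.strip().split(None, 1)
        keyword = parts[0]
        remainder = parts[1] if len(parts) > 1 else ""
        entries.append((keyword, remainder))
    return entries


sample = "* sample comment\n\nVersion 9\nNAME My Documents Index\nConcurrent\n"
for keyword, remainder in parse_entries(sample):
    print(keyword, "|", remainder)
```

Keywords that take no value, such as Concurrent, come back with an empty remainder.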
Within the ISYS.CFG file, each indexing rule applies to the document area identifier it follows. There may be more than one indexing rule for each document area identifier.
An indexing rule consists of three parts: a format keyword, a folder and file specification pattern, and optional processing instructions that can be added after the folder and file pattern.
The format keyword identifies the format in which the documents are stored, or that the documents should not be indexed. The file and directory specification pattern gives a directory and file pattern that documents must match. The optional processing indicates any additional requirements.
When comparing a document with the indexing rules, Perceptive Search works in the order in which the rules have been presented. The first configuration rule under the appropriate document area identifier that matches the file specification is the rule used for the document.
A list of ISYS.CFG keywords is included at the bottom for reference.
Sample of a segment of an ISYS.CFG:
Version 9
NAME My Documents Index
FORMATS ASCII HTMLRAW MSG EXCEL RTF WINWORD PDF POWERPOINT WINWRITE WordPerfect
FORMATS EML
SIGNIFICANT 0123456789ŠŒšœŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
INSIGNIFICANT '/_`‘
Concurrent
DateRecognition
SummaryIndexing
FileNameIndexing
MetaTitles
SpellingTips
CacheMetaData
UNDER C:\
AUTO "DOCUMENTS AND SETTINGS\ISYSUSER\MY DOCUMENTS\**\*.*"
In the ISYS.CFG file, file formats included in the index are specified by Format keywords. These are:
Format | Description |
AMIPRO | AmiPro |
ASCII | Plain ASCII text files |
AUTO | Select from the formats listed in the FORMATS statements |
AUTOCAD | AutoCAD DWG files |
BYPASS | Files matching the file specification should not be indexed |
CHM | Microsoft compiled HTML help files; requires an HTML format to be included |
COMPUSERVE | CompuServe/e-mail, WinCIM 1 & 2 only. |
DBASEIII | dBASE III Plus & IV and FoxPro indexes |
DCA | IBM Document Content Architecture (Revisable Form Text) |
DW4 | DisplayWrite 4 |
DW5 | DisplayWrite 5 |
EML | Microsoft Outlook Express saved Emails |
EXCEL | Microsoft Excel |
EXTERNAL | Used for Perceptive Search OEM customers only |
FFT | IBM Document Content Architecture (Final Form Text) |
FLASH | Macromedia Flash files, requires the Flash viewer to be installed |
FW3 | FrameWork III |
HTML | HTML with codes hidden |
HTMLMETAONLY | HTML metadata only |
HTMLRAW | HTML with codes visible; the HTML format is recommended instead |
IFILTER | Include any additional format available via an IFilter |
JPEG | Index JPEG EXIF metadata |
MAILBOX | Sendmail format mailboxes, such as MBX files |
MANUSCRIPT | Lotus Manuscript version 2 |
MASS11 | MASS 11 version 8 |
MHT | Microsoft Internet Explorer web archive single file |
MP3 | MP3 ID3 metadata |
MS_MDI | Microsoft Document Imaging file, indexes embedded text streams |
MSG | Microsoft Outlook saved email message format |
MSWORD | Microsoft Word for DOS versions 4 and 5 |
MSWORKS | Microsoft Works version 2, 3, 4 and 95 |
MULTIMATE | MultiMate version 3, Advantage, and 4 |
ONENOTE | Microsoft OneNote documents, textual content only |
OPENACCESS | Open Access II |
OPENOFFICE | OpenOffice and StarOffice formats |
PDF | Adobe Acrobat PDF |
POWERPOINT | Microsoft PowerPoint |
PROWRITE | Professional Write II |
PST | Microsoft Outlook personal storage file for emails, requires Outlook to be installed |
Q&A | Q&A Write version 3.0, limited Q&A Write for Windows |
RTF | Microsoft RTF (Rich Text Format) |
SGML | SGML (excluding user defined entities) |
SOURCE | ASCII source file, where each line is considered a separate paragraph; this is designed for use with programming source code |
SPREADSHEET | Generic spreadsheet |
TIFF | Textual metadata tags included in TIFF files |
TRANSCRIPT | ASCII Transcript, requires specially formatted files, cannot be used with AUTO rules |
UNIPLEX | Uniplex |
VCARD | vCard, electronic business card format |
VISIO | Microsoft Visio |
WANGWP | Wang IWP version 3.0 |
WANGWPPLUS | Wang WP Plus |
WINMEDIA | Microsoft Windows Multimedia |
WINWORD | Microsoft Word for Windows |
WINWRITE | Windows Write |
WordPerfect | WordPerfect for DOS versions 5, 5.1 and WordPerfect for Windows |
WordPerfect42 | WordPerfect for DOS version 4.2 |
WORDPRO | Lotus WordPro, European language support only |
WORDSTAR | WordStar word processor |
WORDSTAR2000 | WordStar 2000 III Plus |
WORDSTAR4 | WordStar version 4 |
WORDSTAR5 | WordStar version 5 |
XPS | XML Paper Specification files |
XML | XML record files, for XML files containing multiple records |
XMLDOC | XML document files, for XML files containing a single record |
XYWRITE | XyWrite Plus 3.53 |
ZIP | Include files in ZIP files |
See the list of Perceptive Search supported File Types for more information on these file formats.
In the ISYS.CFG file, the document area identifier tells Perceptive Search to which drive the indexing rules apply. You cannot specify a directory with the document area identifier; it must be specified in the indexing rules.
An example of a document area identifier is UNDER F:\
There may be zero or more document area identifiers in the ISYS.CFG file; these can be a drive letter or the word "NETWORK". If there is no identifier or the identifier UNDER HERE is used, Perceptive Search assumes the directory in which the ISYS.CFG file is located is the only directory to be indexed.
The order in which the document area identifiers are specified is unimportant. However, the indexing rules that specify which documents and directories to index or ignore must follow the document area identifier to which they apply.
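The relationship between document area identifiers and the rules that follow them can be sketched in Python as below. This is an illustrative model only: group_rules is a hypothetical helper, FORMAT_KEYWORDS is a small subset of the format keywords chosen for the example, and treating rules that appear before any identifier as belonging to UNDER HERE is an assumption based on the description above.

```python
# Small illustrative subset of format keywords; the real list is longer.
FORMAT_KEYWORDS = {"AUTO", "BYPASS", "ASCII", "EXCEL", "XML"}


def group_rules(lines):
    """Map each document area identifier to its indexing rules, in file order."""
    areas = {}
    current = "HERE"  # assumed default when no UNDER line has been seen yet
    for line in lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue
        keyword = parts[0].upper()
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "UNDER":
            # A new identifier: subsequent rules belong to this area.
            current = rest.upper()
            areas.setdefault(current, [])
        elif keyword in FORMAT_KEYWORDS:
            # Rule order is preserved within each area.
            areas.setdefault(current, []).append((keyword, rest))
    return areas


config = [
    "UNDER F:\\",
    "BYPASS REVENUE\\**\\DRAFT*.*",
    "AUTO REVENUE\\**\\*.*",
    "UNDER C:\\",
    "ASCII NOTES\\*.TXT",
]
print(group_rules(config))
```

Swapping the two UNDER groups in the input changes nothing about which rules belong to which area, but reordering the rules inside one group does change the stored rule order.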
Where the document area identifier is "UNDER NETWORK", the directory patterns use UNC paths to unmapped network shares.
In the ISYS.CFG file, the file specification is a pattern used to determine whether the format applies to any given file. The DOS wild card characters ("*" and "?") may be used in the file name.
FolderName\FileName.ext
Include or exclude just this folder and file pattern
FolderName\**\FileName.ext
Include or exclude documents matching the file pattern in and under this folder
FolderName\*\FileName.ext
Include or exclude documents matching this file pattern under, but not in, this folder.
FolderName\*\temp\*\FileName.ext
Include or exclude all matching files in the sub-folders called temp under the FolderName folder.
The following are keywords that may appear in the ISYS.CFG file under particular circumstances:
VENTURA | Indicates that documents may contain Ventura Publisher paste up markers. |
IDENT | Combined with the DBASEIII format keyword, this nominates a field to be used for identification. For XML record files, it indicates the level at which to start new documents. |
FULLREC | For indexing both memo and non-memo fields in dBASE and FoxPro indexes. |
WIDELINES | Documents contain lines formatted wider than 78 characters. |
OEM | Characters should be interpreted using the current MS DOS OEM code page instead of the ANSI code page |
HARDSPACE | Indicates that each line of the document ends with a hard return and that two hard returns indicate a new paragraph. |
DOUBLESPACED | Indicates that the entire document is double-spaced and that three hard returns indicate a new paragraph. |
TITLE_FIELD | Used for XML record files to specify the field name to be used as the title. |
PLAIN | Ignore the current Default Option parameters for this rule. |
Examples:
ASCII *.TXT HARDSPACE
This indicates that every file in the current directory with a .TXT extension is to be indexed as an ASCII text document, and that all lines in those documents end with a hard return.
UNDER F:\
BYPASS REVENUE\**\DRAFT*.*
EXCEL REVENUE\**\*.XLS
AUTO REVENUE\**\*.*
Exclude all files that have a name starting with DRAFT under the REVENUE folder and all sub-folders; index all files with an XLS extension as Excel documents; index all remaining files under the REVENUE folder using AUTO logic, which selects from the formats listed in the FORMATS statements.
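The first-match behaviour behind this example can be sketched in Python. This is a simplified model, not the product's matching code: matches and first_match are hypothetical names, "**" is assumed to stand for zero or more folder levels, any other path component is compared with DOS-style "*" and "?" wildcards via the standard fnmatch module, and comparison is assumed to be case-insensitive.

```python
from fnmatch import fnmatchcase


def _match(pattern_parts, path_parts):
    """Recursively match pattern components against path components."""
    if not pattern_parts:
        return not path_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # Assumption: "**" consumes zero or more folder levels.
        return any(_match(rest, path_parts[i:]) for i in range(len(path_parts) + 1))
    if not path_parts:
        return False
    return fnmatchcase(path_parts[0], head) and _match(rest, path_parts[1:])


def matches(pattern, path):
    """True if a drive-relative path matches a folder and file pattern."""
    return _match(pattern.upper().split("\\"), path.upper().split("\\"))


def first_match(rules, path):
    """Return the format keyword of the first rule that matches, else None."""
    for keyword, pattern in rules:
        if matches(pattern, path):
            return keyword
    return None


# The three rules from the example, in the order they appear under UNDER F:\
rules = [
    ("BYPASS", r"REVENUE\**\DRAFT*.*"),
    ("EXCEL", r"REVENUE\**\*.XLS"),
    ("AUTO", r"REVENUE\**\*.*"),
]
for path in (r"REVENUE\2020\DRAFT1.XLS", r"REVENUE\Q1.XLS", r"REVENUE\A\B\NOTES.DOC"):
    print(path, "->", first_match(rules, path))
```

Because the BYPASS rule comes first, a draft spreadsheet is excluded even though it would also match the EXCEL and AUTO rules. Moving the AUTO rule to the top would make it catch every file under REVENUE before the other two rules were consulted.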
UNDER F:\
XML REVENUE\*\**\ANNSUMM.XML IDENT 2 TITLE_FIELD FINCODE
All ANNSUMM.XML files under, but not in, the REVENUE folder on drive F should be indexed as XML record files, where the level 2 layer is used to identify new records and the FINCODE value is used as the record title.
The following keywords may be used in the ISYS.CFG file:
AnnotationDrive | Look for the annotations on the specified drive. e.g. AnnotationDrive F |
AnnotationIndexing | Index annotations so that they are searchable |
AutoIndexBackup | The number of automatic backups of the index that should be rotated |
CacheDocuments | The textual content of documents will be compressed and cached in the index to make document browsing and context results perform more rapidly |
CacheMetaData | A copy of the document meta data should be stored in the index for faster retrieval |
CacheSecurityDescriptors | Store a copy of NTFS file security information in the index for faster filtering |
CloseIndexEvery | Commit the working files into the Perceptive Search index every n documents or words. This option should be used with care as it can have a profound effect on indexing performance. |
Concurrent | Whether updating may be performed at the same time as queries are performed. Do not include this entry if you do not need concurrent updating mode |
DateRecognition | Enables dates to be handled intelligently in queries. Also allows dates to be used in range queries |
DefaultOption | Indexing rule options applied to all files |
DeferredDeleteCache | A number giving, as a percentage of maximum capacity, the largest share of the index that deleted items can occupy. The actual maximum in terms of number of documents depends on the index. |
DotHandling | Dots occurring in the middle of a string of characters which appear to form a paragraph number (e.g. "3.2.12") are not treated as word separators. |
EntityRecognition | Specifies that Perceptive Search should recognize and index entities (people, organizations, websites, email addresses and locations). |
ExplicitExtension | Overrides or assists the standard automatic file format detection in Perceptive Search. The keyword is followed by a file extension, then a document format keyword. For example, "ExplicitExtension TXT ASCII" tells Perceptive Search that any file with a TXT extension should be interpreted as an ASCII file if it cannot be positively identified as anything else. |
FileNameIndexing | Whether file names are to be indexed |
Formats | The list of possible formats for indexed documents. For more information, see Format keywords |
FrontPage | Path to a document used as a front page for the index |
FuzzyPrecompensation | Attempt to compensate for OCR and typographical errors |
ImagesWithDocs | Forces Perceptive Search to assume that associated images are in the same locations as documents. Should only be used by CD publishers who use the AnnotationDrive option |
Include | Points to another index for chaining indexes together |
IndexType | Set to Spider for website spider indexes |
Insignificant | The list of characters to be considered insignificant |
Language | The language of the documents in the index. May be one of the following: European, InsignifAccents, Korean, ChineseTraditional, ChineseHK, Japanese, Arabic, ChineseSimplified, Cyrillic, Greek, Turkish, Hebrew, Vietnamese, Baltic, CentralEuropean or Unicode. You can also specify a two-letter ISO 639 language code after the primary language; this controls which common word set and other language-specific options are used during indexing. |
LanguageDetect | Enables language detection for the index. All documents will have a metadata field called ISYS_LANG injected that stores the two-letter ISO 639 language code. |
Latency | Sets the latency period in minutes, e.g. Latency 120. Documents will not be added to the index until they have remained unchanged for the specified number of minutes. |
Load | Specifies an External Access Module (DLL) to load, applicable to SDK. |
MaxWordLength | Maximum length of a word. The default is 20 and the maximum is 64; increasing the value increases the space required proportionally. |
MetaDataLimit | Controls the maximum amount of metadata which will be stored by the CacheMetaData keyword. Defaults to 6 KB and may never exceed 64 KB per document. |
MetaTitles | Document title should be set to the TITLE information in the document meta data. Requires SummaryIndexing. |
Name | The index name. |
NGRAM | Enables NGRAM indexing; choices are On, Off, or Asian. There is an optional setting for the NGRAM minimum/maximum range; the default is 2/6. (Example: "NGRAM On 5 8"). The Asian option enables NGRAM indexing for Asian text only. |
NonUnicodeCodePage | The code page to use for non-Unicode documents (such as ASCII Text files) in a Unicode index. |
NullParasCount | Normally, empty paragraphs are ignored by Perceptive Search. Setting this option allows empty paragraphs to be used in positional searching. |
NumberRecognition | Enables numbers to be handled intelligently in queries, including numerical ranges. |
NumbersCommon | Whether pure numbers are to be considered common. Do not include this entry if you want to index pure numbers. |
PathToDocs | Allows relative indexes to be moved for distribution. |
ResetLastAccessDates | After documents have been indexed, Perceptive Search will reset the operating system "Last Accessed" date to make it appear the document was not accessed. |
Significant | The list of characters to be considered significant. |
SortAccelerator | Sort accelerators cache extra information in the index that allows for faster sorting. You can apply an accelerator to any standard Perceptive Search sort method, or any metadata field. The syntax is a semicolon list of fields. A numeric value will be considered to be one of the Perceptive Search standard sort methods, a string value is considered to be a metadata field. You can force a metadata field to be treated as a specific data type by prefixing it with i: for integer or f: for float. Example: 4;7;f:Cost |
SpellingTips | Create spelling tips for this index. |
Static | Entries indicating those directories to be considered static, i.e. not to be included in Updates. |
SummaryIndexing | Index document meta data or summary for Word, Excel, PowerPoint, PDF or WordPerfect. |
SuppressEmailFileAttachments | Do not index attachments in file system emails, such as MSG, EML and MBX files. |
TitleAfter | Consider the title of a document to start after the nth line. For instance, TitleAfter 4. |
TitleContains | Consider the title of a document to be the first line after the TitleAfter line that contains the string of characters, e.g. TitleContains Re:. |
Version | The version of Perceptive Search used to create this ISYS.CFG file. |
WideMeans | When WIDELINES is used as a rule modifier, the default width is 128 characters. WideMeans can increase this to up to 255 characters. |
WorkFiles | Store the Perceptive Search temporary files for this index in the specified directory. By default, Perceptive Search will automatically either place the workfiles in your Windows temporary directory, or in the index directory, whichever is more efficient. |
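The SortAccelerator value syntax described in the table above can be illustrated with a short Python sketch. The function parse_sort_accelerator and the tuple layout it returns are hypothetical and exist only for this illustration. The sketch assumes what the SortAccelerator entry states: fields are separated by semicolons, a purely numeric field is a standard sort method, any other field is a metadata field name, and the i: and f: prefixes force integer or float handling.

```python
def parse_sort_accelerator(value):
    """Parse a SortAccelerator value such as "4;7;f:Cost" (illustrative only)."""
    prefixes = {"i": "integer", "f": "float"}
    fields = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        if item.isdigit():
            # A numeric value is treated as a standard sort method number.
            fields.append(("sort_method", int(item), None))
            continue
        prefix, sep, name = item.partition(":")
        if sep and prefix.lower() in prefixes:
            # A type prefix forces the metadata field to a specific data type.
            fields.append(("metadata", name, prefixes[prefix.lower()]))
        else:
            # No prefix: a metadata field with no forced data type.
            fields.append(("metadata", item, None))
    return fields


print(parse_sort_accelerator("4;7;f:Cost"))
```

For the example value from the table, this yields two standard sort methods (4 and 7) and one metadata field, Cost, forced to float.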