ISYS.CFG Format

In the ISYS.CFG file, which directs the Update process, each entry is on a separate line. Keywords are followed by at least one space. Blank lines and lines with an asterisk '*' as the first character are ignored. Each line is an entry, which may be a configuration definition or an indexing rule. The ISYS.CFG file may be encoded in either ANSI or UTF8 Unicode.
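The line rules above can be sketched in Python. This is a minimal illustration only, not the parser Perceptive Search uses; the function name `iter_entries` is hypothetical, and the sketch assumes the file has already been decoded to text.

```python
from typing import Iterator, Tuple


def iter_entries(config_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (keyword, remainder) for each entry line.

    Blank lines and lines whose first character is '*' are skipped,
    as described above. Every other line is treated as one entry.
    """
    for raw_line in config_text.splitlines():
        if not raw_line.strip():
            continue  # blank line
        if raw_line.startswith("*"):
            continue  # comment line: asterisk as the first character
        parts = raw_line.strip().split(None, 1)
        keyword = parts[0]
        remainder = parts[1] if len(parts) > 1 else ""
        yield keyword, remainder
```

A keyword with no arguments, such as Concurrent, yields an empty remainder.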

Within the ISYS.CFG file, each indexing rule applies to the document area identifier it follows. There may be more than one indexing rule for each document area identifier.

An indexing rule consists of three parts:

Format keyword

Document area identifiers

Folder and file pattern

Additional optional processing instructions can be added to the folder and file pattern.

The format keyword identifies the format in which the documents are stored, or that the documents should not be indexed. The folder and file pattern gives the folder and file name pattern that documents must match. The optional processing instructions indicate any additional requirements.

When comparing a document with the indexing rules, Perceptive Search works in the order in which the rules have been presented. The first configuration rule under the appropriate document area identifier that matches the file specification is the rule used for the document.
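The first-match behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: `select_format` is a hypothetical name, the rules use file-name-only patterns, matching is assumed to be case-insensitive, and Python's `fnmatch` stands in for DOS wildcard matching (the two may differ in edge cases, for example for names without a dot).

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (format keyword, file pattern)


def select_format(file_name: str, rules: List[Rule]) -> Optional[str]:
    """Return the format keyword of the first rule whose pattern matches."""
    name = file_name.upper()
    for format_keyword, pattern in rules:
        if fnmatchcase(name, pattern.upper()):
            return format_keyword  # later rules are never consulted
    return None  # no rule matched


# Rules in the order they would appear under one document area identifier.
rules = [
    ("BYPASS", "DRAFT*.*"),
    ("EXCEL", "*.XLS"),
    ("AUTO", "*.*"),
]
```

Because the BYPASS rule comes first, a file named DRAFT_BUDGET.XLS is excluded even though it also matches the EXCEL and AUTO rules. Reordering the rules would change the outcome.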

A list of ISYS.CFG keywords is included at the bottom for reference.

ISYS.CFG syntax

Sample of a segment of an ISYS.CFG:

Version 9
NAME My Documents Index

FORMATS ASCII HTMLRAW MSG EXCEL RTF WINWORD PDF POWERPOINT WINWRITE WordPerfect
FORMATS EML

SIGNIFICANT 0123456789ŠŒšœŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
INSIGNIFICANT '/_`‘
Concurrent
DateRecognition
SummaryIndexing
FileNameIndexing
MetaTitles
SpellingTips
CacheMetaData

UNDER C:\
  AUTO            "DOCUMENTS AND SETTINGS\ISYSUSER\MY DOCUMENTS\**\*.*"

Format keywords

In the ISYS.CFG file, file formats included in the index are specified by Format keywords. These are:

Format Description
AMIPRO AmiPro
ASCII Plain ASCII text files
AUTO Select from the formats listed in the FORMATS statements
AUTOCAD AutoCAD DWG files
BYPASS Files matching the file specification should not be indexed
CHM Microsoft compiled HTML help files, requires an HTML format to be included
COMPUSERVE CompuServe/e-mail, WinCIM 1 & 2 only.
DBASEIII dBASE III Plus & IV and FoxPro indexes
DCA IBM Document Content Architecture (Revisable Form Text)
DW4 DisplayWrite 4
DW5 DisplayWrite 5
EML Microsoft Outlook Express saved Emails
EXCEL Microsoft Excel
EXTERNAL Used for Perceptive Search OEM customers only
FFT IBM Document Content Architecture (Final Form Text)
FLASH Macromedia Flash files, requires the Flash viewer to be installed
FW3 FrameWork III
HTML HTML with codes hidden
HTMLMETAONLY HTML metadata only
HTMLRAW HTML with codes visible, recommend HTML format
IFILTER Include any additional format available via an IFilter
JPEG Index JPEG EXIF metadata
MAILBOX Sendmail format mailboxes, such as MBX files
MANUSCRIPT Lotus Manuscript version 2
MASS11 MASS 11 version 8
MHT Microsoft Internet Explorer web archive single file
MP3 MP3 ID3 metadata
MS_MDI Microsoft Document Imaging file, indexes embedded text streams
MSG Microsoft Outlook saved email message format
MSWORD Microsoft Word for DOS versions 4 and 5
MSWORKS Microsoft Works version 2, 3, 4 and 95
MULTIMATE MultiMate version 3, Advantage, and 4
ONENOTE Microsoft OneNote documents, textual content only
OPENACCESS Open Access II
OPENOFFICE OpenOffice and StarOffice formats
PDF Adobe Acrobat PDF
POWERPOINT Microsoft PowerPoint
PROWRITE Professional Write II
PST Microsoft Outlook personal storage file for emails, requires Outlook to be installed
Q&A Q&A Write version 3.0, limited Q&A Write for Windows
RTF Microsoft RTF (Rich Text Format)
SGML SGML (excluding user defined entities)
SOURCE ASCII source file, where each line is considered a separate paragraph; this is designed for use with programming source code
SPREADSHEET Generic spreadsheet
TIFF Textual metadata tags included in TIFF files
TRANSCRIPT ASCII Transcript, requires specially formatted files, cannot be used with AUTO rules
UNIPLEX Uniplex
VCARD vCard, electronic business card format
VISIO Microsoft Visio
WANGWP Wang IWP version 3.0
WANGWPPLUS Wang WP Plus
WINMEDIA Microsoft Windows Multimedia
WINWORD Microsoft Word for Windows
WINWRITE Windows Write
WordPerfect WordPerfect for DOS versions 5, 5.1 and WordPerfect for Windows
WordPerfect42 WordPerfect for DOS version 4.2
WORDPRO Lotus WordPro, European language support only
WORDSTAR WordStar word processor
WORDSTAR2000 WordStar 2000 III Plus
WORDSTAR4 WordStar version 4
WORDSTAR5 WordStar version 5
XPS XML Paper Specification files
XML XML record files, for XML files containing multiple records
XMLDOC XML document files, for XML files containing a single record
XYWRITE XyWrite Plus 3.53
ZIP Include files in ZIP files

See the list of Perceptive Search supported File Types for more information on these file formats.

Document area identifier

In the ISYS.CFG file, the document area identifier tells Perceptive Search to which drive the indexing rules apply. You cannot specify a directory with the document area identifier; it must be specified in the indexing rules.

An example of a document area identifier is UNDER F:\

There may be zero or more document area identifiers in the ISYS.CFG file; these can be a drive letter or the word "NETWORK". If there is no identifier or the identifier UNDER HERE is used, Perceptive Search assumes the directory in which the ISYS.CFG file is located is the only directory to be indexed.

The order in which the document area identifiers are specified is unimportant. However, the indexing rules that specify which documents and directories to index or ignore must follow the document area identifier to which they apply.
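As a rough sketch of how rules attach to the identifier they follow, the snippet below groups rule lines by document area. The function name `group_rules_by_area` and the caller-supplied set of format keywords are hypothetical, and treating rules that appear before any identifier as belonging to HERE is an assumption based on the description above.

```python
from typing import Dict, List, Set


def group_rules_by_area(config_text: str, format_keywords: Set[str]) -> Dict[str, List[str]]:
    """Map each document area identifier to the rule lines that follow it."""
    areas: Dict[str, List[str]] = {}
    current_area = "HERE"  # assumed default when no identifier has been seen
    for raw_line in config_text.splitlines():
        if not raw_line.strip() or raw_line.startswith("*"):
            continue  # blank or comment line
        parts = raw_line.strip().split(None, 1)
        keyword = parts[0].upper()
        remainder = parts[1] if len(parts) > 1 else ""
        if keyword == "UNDER":
            current_area = remainder.strip().upper()
            areas.setdefault(current_area, [])
        elif keyword in format_keywords:
            # An indexing rule: it belongs to the most recent identifier.
            areas.setdefault(current_area, []).append(raw_line.strip())
        # Any other line is a configuration definition and is ignored here.
    return areas
```

The order of the areas does not matter, but the order of the rule lines within each area is preserved because rules are evaluated in the order presented.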

Where the document area identifier is "UNDER NETWORK", the directory patterns use UNC paths to unmapped network shares.

File and folder specification pattern

In the ISYS.CFG file, the file specification is a pattern used to determine whether the format applies to any given file. The DOS wild card characters ("*" and "?") may be used in the file name.

FolderName\FileName.ext

Include or exclude just this folder and file pattern

FolderName\**\FileName.ext

Include or exclude documents matching the file pattern in and under this folder

FolderName\*\FileName.ext

Include or exclude documents matching this file pattern under, but not in, this folder.

FolderName\*\temp\*\FileName.ext

Include or exclude all matching files in the sub-folders called temp under the FolderName folder.
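The folder wildcards above can be approximated by translating a pattern into a regular expression. This is a sketch under assumptions, not the matching code Perceptive Search uses: it assumes that `**` as a folder component matches zero or more folder levels, that `*` as a folder component matches exactly one folder level, and that matching is case-insensitive. The names `pattern_to_regex` and `matches` are hypothetical.

```python
import re


def _component_to_regex(component: str) -> str:
    """Translate DOS wildcards inside a single folder or file name."""
    out = []
    for ch in component:
        if ch == "*":
            out.append(r"[^\\]*")  # any run of characters within one name
        elif ch == "?":
            out.append(r"[^\\]")   # exactly one character within one name
        else:
            out.append(re.escape(ch))
    return "".join(out)


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Build a regex for a folder and file pattern (assumed semantics)."""
    *folders, file_part = pattern.split("\\")
    parts = []
    for folder in folders:
        if folder == "**":
            parts.append(r"(?:[^\\]+\\)*")  # assumed: zero or more levels
        else:
            parts.append(_component_to_regex(folder) + r"\\")
    parts.append(_component_to_regex(file_part))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, relative_path: str) -> bool:
    """Return True if the whole relative path matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(relative_path) is not None
```

Under these assumptions, `REVENUE\**\DRAFT*.*` matches draft files both in REVENUE and in any folder beneath it, while `REVENUE\*\ANNSUMM.XML` matches only files one level below REVENUE.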

Optional processing keywords for file and folder patterns

The following are keywords that may appear in the ISYS.CFG file under particular circumstances:

VENTURA Indicates that documents may contain Ventura Publisher paste up markers.
IDENT Combined with the DBASEIII format keyword, this nominates a field to be used for identification. For XML record files, it indicates the level at which to start new documents.
FULLREC For indexing both memo and non-memo fields in dBASE and FoxPro indexes.
WIDELINES Documents contain lines formatted wider than 78 characters.
OEM Characters should be interpreted using the current MS DOS OEM code page instead of the ANSI code page
HARDSPACE Indicates that each line of the document ends with a hard return and that two hard returns indicate a new paragraph.
DOUBLESPACED Indicates that the entire document is double-spaced and that three hard returns indicate a new paragraph.
TITLE_FIELD Used for XML record files to specify the field name to be used as the title.
PLAIN Ignore the current Default Option parameters for this rule.

Examples:

    ASCII *.TXT HARDSPACE

This indicates that every file in the current directory with a .TXT extension is to be indexed as an ASCII text document, and that all lines in the documents end with a hard return.

UNDER F:\
    BYPASS  REVENUE\**\DRAFT*.*
    EXCEL   REVENUE\**\*.XLS
    AUTO    REVENUE\**\*.*

Exclude all files whose names start with DRAFT in the REVENUE folder and all its sub-folders; index all files with an XLS extension as Excel documents; index all remaining files in and under the REVENUE folder using AUTO logic, which selects from the formats listed in the FORMATS statements.

UNDER F:\
    XML REVENUE\*\**\ANNSUMM.XML IDENT 2 TITLE_FIELD FINCODE

All ANNSUMM.XML files under, but not in, the REVENUE sub-folder of the F drive are indexed as XML record files. Level 2 is used to identify new records, and the FINCODE value is used as the record title.
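The three parts of a rule line in the examples above (format keyword, pattern, optional processing keywords) can be separated with a short sketch like the one below. It is illustrative only: `parse_rule` is a hypothetical name, and the set of options that take a value is an assumption drawn from the examples (IDENT and TITLE_FIELD); every other option is treated as a simple flag.

```python
import re
from typing import Dict, Tuple, Union

# Assumption: only these options are followed by a value.
OPTIONS_WITH_VALUE = {"IDENT", "TITLE_FIELD"}

# Format keyword, then a pattern (optionally in double quotes), then options.
RULE_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|(\S+))\s*(.*)$')


def parse_rule(line: str) -> Tuple[str, str, Dict[str, Union[str, bool]]]:
    """Split a rule line into (format keyword, pattern, options)."""
    match = RULE_RE.match(line)
    if match is None:
        raise ValueError(f"not an indexing rule: {line!r}")
    format_keyword = match.group(1)
    # Group 2 is a quoted pattern (may contain spaces); group 3 is unquoted.
    pattern = match.group(2) if match.group(2) is not None else match.group(3)
    tokens = match.group(4).split()
    options: Dict[str, Union[str, bool]] = {}
    i = 0
    while i < len(tokens):
        name = tokens[i].upper()
        if name in OPTIONS_WITH_VALUE and i + 1 < len(tokens):
            options[name] = tokens[i + 1]  # option followed by its value
            i += 2
        else:
            options[name] = True  # flag-style option
            i += 1
    return format_keyword, pattern, options
```

Quoted patterns are handled so that folder names containing spaces stay in one piece.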

 

ISYS.CFG Keywords

The following keywords may be used in the ISYS.CFG file:

AnnotationDrive Look for the annotations on the specified drive. e.g. AnnotationDrive F
AnnotationIndexing Index annotations so that they are searchable
AutoIndexBackup The number of automatic backups of the index that should be rotated
CacheDocuments The textual content of documents will be compressed and cached in the index to make document browsing and context results perform more rapidly
CacheMetaData A copy of the document meta data should be stored in the index for faster retrieval
CacheSecurityDescriptors Store a copy of NTFS file security information in the index for faster filtering
CloseIndexEvery Commit the working files into the Perceptive Search index every n Doc or Words. This option should be used with care as it can have a profound effect on indexing performance.
Concurrent Whether updating may be performed at the same time as queries are performed. Do not include this entry if you do not need concurrent updating mode
DateRecognition Enables dates to be handled intelligently in queries. Also allows dates to be used in range queries
DefaultOption Indexing rule options applied to all files
DeferredDeleteCache A percentage figure for the maximum size that deleted items can occupy in the index. The actual maximum number of documents depends on the index; this number represents the percentage of maximum capacity you want.
DotHandling Dots occurring in the middle of a string of characters that appears to form a paragraph number (e.g. "3.2.12") are not treated as word separators.
EntityRecogntion Specifies that Perceptive Search should recognize and index entities (people, organizations, websites, email addresses and locations).
ExplicitExtension Over-rides or assists the standard automatic file format detection in Perceptive Search. The keyword is followed by a file extension, then a document format keyword, for example, "ExplicitExtension TXT ASCII" would tell Perceptive Search that any file with a TXT extension should be interpreted as an ASCII file if it cannot be positively identified as anything else.
FileNameIndexing Whether file names are to be indexed
Formats The list of possible formats for indexed documents. For more information, see Format keywords
FrontPage Path to a document used as a front page for the index
FuzzyPrecompensation Attempt to compensate for OCR and typographical errors
ImagesWithDocs Forces Perceptive Search to assume that associated images are in the same locations as documents. Should only be used by CD publishers who use the AnnotationDrive option
Include Points to another index for chaining indexes together
IndexType Set to Spider for website spider indexes
Insignificant The list of characters to be considered insignificant
Language The language of the documents in the index. May be one of the following: European, InsignifAccents, Korean, ChineseTraditional, ChineseHK, Japanese, Arabic, ChineseSimplified, Cyrillic, Greek, Turkish, Hebrew, Vietnamese, Baltic, CentralEuropean or Unicode.

You can also specify a two-letter ISO 639 language code after the primary language; this controls which common word set and other language-specific options are used during indexing.

LanguageDetect Enables language detection for the index. All documents will have a metadata field called ISYS_LANG injected that stores the two-letter ISO 639 language code.
Latency Sets the latency period in minutes, e.g. Latency 120. Documents will not be added to the index until they have remained unchanged for the specified number of minutes.
Load Specifies an External Access Module (DLL) to load, applicable to SDK.
MaxWordLength Maximum length of a word. The default is 20 and the maximum is 64. Larger values increase the space required proportionally.
MetaDataLimit Controls the maximum amount of metadata that will be stored by the CacheMetaData keyword. Defaults to 6 KB and may never exceed 64 KB per document.
MetaTitles Document title should be set to the TITLE information in the document meta data, requires SummaryIndexing.
Name The index name.
NGRAM Enables NGRAM indexing; choices are On, Off, or Asian. There is an optional setting for the NGRAM minimum/maximum range; the default is 2/6. (Example: "NGRAM On 5 8"). The Asian option enables NGRAM indexing for Asian text only.
NonUnicodeCodePage The code page to use for non-Unicode documents (such as ASCII Text files) in a Unicode index.
NullParasCount Normally, empty paragraphs are ignored by Perceptive Search. Setting this option allows empty paragraphs to be used in positional searching.
NumberRecognition Enables numbers to be handled intelligently in queries, including numerical ranges.
NumbersCommon Whether pure numbers are to be considered common. Do not include this entry if you want to index pure numbers.
PathToDocs Allows relative indexes to be moved for distribution.
ResetLastAccessDates After documents have been indexed, Perceptive Search will reset the operating system "Last Accessed" date to make it appear the document was not accessed.
Significant The list of characters to be considered significant.
SortAccelerator Sort accelerators cache extra information in the index that allows for faster sorting. You can apply an accelerator to any standard Perceptive Search sort method, or any metadata field. The syntax is a semicolon list of fields. A numeric value will be considered to be one of the Perceptive Search standard sort methods, a string value is considered to be a metadata field. You can force a metadata field to be treated as a specific data type by prefixing it with i: for integer or f: for float.  Example: 4;7;f:Cost
SpellingTips Create spelling tips for this index.
StaticEntries Indicates the directories to be considered static, i.e. not to be included in Updates.
SummaryIndexing Index document meta data or summary for Word, Excel, PowerPoint, PDF or WordPerfect.
SuppressEmailFileAttachments Do not index attachments in file system emails, such as MSG, EML and MBX files.
TitleAfter Consider the title of a document to start after the nth line. For instance, TitleAfter 4.
TitleContains Consider the title of a document to be the first line after the TitleAfter line that contains the string of characters, e.g. TitleContains Re:.
Version The version of Perceptive Search used to create this ISYS.CFG file.
WideMeans When using WIDELINES as a rule modifier the default width is 128 characters. Here this may be increased to be up to 255 characters.
WorkFiles Store the Perceptive Search temporary files for this index in the specified directory. By default, Perceptive Search will automatically either place the workfiles in your Windows temporary directory, or in the index directory, whichever is more efficient.
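
The SortAccelerator value syntax described in the list above (a semicolon-separated list in which numbers are standard sort methods, strings are metadata fields, and an i: or f: prefix forces a data type) can be read with a sketch like this. The function name `parse_sort_accelerator` and the labels in the returned tuples are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def parse_sort_accelerator(value: str) -> List[Tuple[str, str, str]]:
    """Return (kind, name, data_type) triples for a SortAccelerator value."""
    result = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing semicolon
        if item.isdigit():
            # A numeric value names one of the standard sort methods.
            result.append(("sort_method", item, ""))
        elif item.startswith("i:"):
            result.append(("metadata_field", item[2:], "integer"))
        elif item.startswith("f:"):
            result.append(("metadata_field", item[2:], "float"))
        else:
            # A plain string is a metadata field with no forced data type.
            result.append(("metadata_field", item, "default"))
    return result
```

For the documented example `4;7;f:Cost`, this yields two standard sort methods and one metadata field, Cost, forced to float.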