Crawler Rules

Indexing rules control what content should be crawled in a content source. You may use rules to specify which directories should be scanned, how to handle subdirectories, and what types of files to include.

When you first configure a collection, you will be asked to provide a starting directory for the crawler. By default, the crawler starts at this directory and processes every file and subdirectory contained within it.

A collection may have multiple starting directories, although it may be a better option to have one collection per starting point.

With a starting directory specified, you may add extra rules to control how the contents of the directory are processed. There are two types of rules: inclusion and exclusion.

Any file or directory that matches an inclusion rule will be considered for indexing. Conversely, any file or directory that matches an exclusion rule will be excluded from indexing. Rules are processed in order, and the first rule that matches is applied. For this reason, you will typically specify all exclusion rules first.
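The first-match behaviour can be sketched in a few lines of Python. This is only an illustration, not the crawler's implementation: the name `first_matching_action` is hypothetical, pattern matching is stood in for by plain Python predicates, and returning `None` when no rule matches is an assumption, since the behaviour in that case is not described here.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs an action ("INCLUDE" or "EXCLUDE") with a predicate on the path.
# The predicate is a stand-in for real pattern matching (assumption).
Rule = Tuple[str, Callable[[str], bool]]


def first_matching_action(rules: List[Rule], path: str) -> Optional[str]:
    """Return the action of the first rule whose predicate accepts the path."""
    for action, predicate in rules:
        if predicate(path):
            return action  # later rules are never consulted
    return None  # no rule matched (what the product does here is not stated)


# Exclusion first, then a catch-all inclusion.
rules = [
    ("EXCLUDE", lambda p: p.lower().endswith(".pst")),
    ("INCLUDE", lambda p: True),
]

print(first_matching_action(rules, "mail/archive.pst"))
print(first_matching_action(rules, "docs/report.doc"))
```

If the two rules were listed in the opposite order, the catch-all inclusion would match every path first and the exclusion would never take effect, which is why exclusions usually come first.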

The rule patterns may include the standard wildcard characters of '*' (string wildcard) and '?' (character wildcard). There is a directory and file component to each pattern. Consider the following examples:

TEST.DOC           Matches a file named TEST.DOC, that must be in the starting directory.
**/TEST.DOC        Matches any files named TEST.DOC, that may be in any directory under the starting directory.
DIR/**/TEST.DOC    Matches a file named TEST.DOC, that is in a subdirectory of DIR, in the starting directory.
**/*               Matches all files in all directories, under the starting directory.

Info: If you do not provide any rules against a starting directory, **/*.* is assumed.

Hint: You may also have exclusion rules based on metadata by using Routes.
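As a rough illustration of the pattern semantics described above, the following Python sketch translates a rule pattern into a regular expression. It is not the crawler's actual implementation: the names `pattern_to_regex` and `matches` are hypothetical, and it assumes that `**/` spans zero or more directory levels, that `*` and `?` never cross a `/`, and that matching is case-insensitive. None of these details is confirmed here.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a rule pattern into a compiled regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # Assumption: '**/' matches zero or more directory levels.
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            # String wildcard: any run of characters within one path segment.
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            # Character wildcard: exactly one character within a segment.
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # Assumption: matching ignores case.
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, relative_path: str) -> bool:
    """Check a path, given relative to the starting directory, against a pattern."""
    return pattern_to_regex(pattern).fullmatch(relative_path) is not None


print(matches("**/TEST.DOC", "reports/2015/TEST.DOC"))  # True
print(matches("TEST.DOC", "reports/TEST.DOC"))          # False
```

Paths are written relative to the starting directory with `/` as the separator, mirroring the patterns in the table.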

Examples

Process all files in a directory

INCLUDE **/*.*

Process all files, except for *.PST

EXCLUDE **/*.pst
INCLUDE **/*.*

Process all files, but exclude *.PDF unless the filename contains 2015

INCLUDE **/*2015*.pdf
EXCLUDE **/*.pdf
INCLUDE **/*.*