File Rules

You can use scripting rules to control which files Perceptive Enterprise Search indexes during a reindex or update. The same approach lets you index content stored outside the file system, such as in databases or on WebDAV servers.

A file rule script works by implementing specifically named functions that the Perceptive Enterprise Search engine looks for at indexing time. When Perceptive Enterprise Search updates an index, it scans the system for files that are not currently in the index. A file rule script hooks into this scanning step.

There are two file rule modes, simple and advanced. The simple mode is good for small datasets or single folders, whereas the advanced mode is used to walk larger volumes or to handle databases.

To create a file rule script, you need to implement some of the functions below.

Sub ScanFiles(Response)

ScanFiles provides a simple way to scan all your files in one function. It is recommended only for small datasets.

Parameters

Response - populate the response object with the files you wish to index. If you add a folder, each file within that folder is passed to ScanProcessFile, which lets you filter individual files out. Examples of use:

Response.Add "c:\pathtofile.ext"
Response.Add "c:\pathtofolder"
Response.Add FileSystemObject.GetFolder("c:\pathtofolder")
Response.Add FileSystemObject.GetFolder("c:\pathtofolder").Files

Example
set FileSystemObject = CreateObject("Scripting.FileSystemObject")  

sub ScanFiles(Response)
  ' Add files or folders into the response object by calling 
  ' Response.Add.  These may include folder and file objects
  ' from the FileSystemObject object.
  Response.Add FileSystemObject.GetFolder("c:\documents").Files
end sub

function ScanProcessFile(Filename)
  ' ScanProcessFile is called for each file specified in ScanFiles.
  ' This gives you the ability to filter out particular files
  ' by simply returning false
  ScanProcessFile = True
end function

Function ScanProcessFile(Filename)

ScanProcessFile is called for every file added through ScanFiles. This lets you add entire folders in ScanFiles and then filter individual files out.

Parameters

Filename - The absolute path to a file

Returns

Return a boolean to indicate if the specified file should be indexed.

Example

See ScanFiles
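
As a further illustration, the sketch below filters by file extension. The extensions chosen ("tmp" and "bak") and the FilterFso variable name are assumptions made for this example only. It uses CreateObject in the same way as the ScanFiles example.

```vbscript
Dim FilterFso
Set FilterFso = CreateObject("Scripting.FileSystemObject")

Function ScanProcessFile(Filename)
  Dim Ext
  ' GetExtensionName returns the text after the last dot, without the dot.
  Ext = LCase(FilterFso.GetExtensionName(Filename))
  ' Hypothetical policy: skip temporary and backup files, index everything else.
  ScanProcessFile = Not (Ext = "tmp" Or Ext = "bak")
End Function
```

With this filter in place, ScanFiles can add a whole folder and leave the per-file decision to ScanProcessFile.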

Function ScanFirst(Response)
Function ScanNext(Response)

ScanFirst and ScanNext give you full control over the indexing process. When indexing starts, Perceptive Enterprise Search calls ScanFirst to retrieve the first document. You can use this call to initialize any global data or configuration options needed during the indexing run. Perceptive Enterprise Search then calls ScanNext repeatedly until it returns False, which indicates that there are no more documents.

Parameters

Response - populate the response object with information about the document, properties are:

Response.Filename - The filename of the document as it should appear in Perceptive Enterprise Search. The filename does not need to be an actual file on disk, but can be information that is meaningful to your application. If the filename does not map to disk, you must implement RequestDocument.
Response.Timestamp - If you know the last modified date of the item you want to index, you should set it here. Perceptive Enterprise Search will use this date when trying to identify whether the item has changed.
Response.CheckSum - If timestamps are not available, you may alternatively set a CheckSum. The checksum must fit in a 32-bit integer.

Returns

Boolean value indicating whether this call to ScanFirst/ScanNext succeeded.

Remarks

The filenames of the items that you return through ScanFirst/ScanNext do not need to be actual files on disk. They can be synthetic filenames that are meaningful to your script. This gives you the ability to index sources that do not have physical files stored locally, such as databases or remote storage.

If you intend to use a synthetic filename, you must implement RequestDocument to fetch the actual document for indexing.
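
When a source has no usable timestamp, Response.CheckSum can carry a value derived from the content instead. The helper below is a minimal sketch of one way to compute such a value in script. TextCheckSum is a hypothetical name, and the algorithm is a simple Adler-style sum chosen for this example, not standard Adler-32 and not something the product prescribes. The moduli are picked so that the result always stays below 2^31 and therefore fits in a signed 32-bit integer.

```vbscript
' Hypothetical helper: returns a non-negative checksum that fits in 32 bits.
Function TextCheckSum(Text)
  Dim A, B, I, Code
  A = 1
  B = 0
  For I = 1 To Len(Text)
    Code = AscW(Mid(Text, I, 1))
    ' AscW returns a signed 16-bit value, so map negatives back to 0..65535.
    If Code < 0 Then Code = Code + 65536
    A = (A + Code) Mod 65521
    B = (B + A) Mod 32749
  Next
  ' B is below 32749 and A is below 65521, so the result stays below 2^31.
  TextCheckSum = B * 65536 + A
End Function
```

Inside ScanFirst or ScanNext, the result could then be assigned with Response.CheckSum = TextCheckSum(SomeText), where SomeText stands for whatever content identifies a change in your source.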

Example
' This script demonstrates the ScanFirst/ScanNext functions of Perceptive Enterprise Search scripts.
' It uses the FileSystemObject to get a reference to all files in c:\Documents,
' and then enumerates them using a Server.Enumerator object.

dim FileSystem
dim ScanEnum

set FileSystem = Server.CreateObject("Scripting.FileSystemObject")

function ScanFirst(Response)
  set ScanEnum = Server.Enumerator(FileSystem.GetFolder("c:\Documents").Files)
  ScanFirst = ScanNext(Response)
end function

function ScanNext(Response)
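  ' Report "no more documents" unless a matching file is found below.
  ScanNext = false
  do while ScanEnum.Next
    ' Only files smaller than 1024 bytes are returned; larger files are skipped.
    if ScanEnum.Item.Size < 1024 then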
      Response.Filename = ScanEnum.Item.Path
      Response.Timestamp = ScanEnum.Item.DateLastModified
      ScanNext = true
      exit do
    end if
  loop
end function
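
The same pattern also works with synthetic filenames. The sketch below walks a fixed list of URLs, which pairs with a RequestDocument implementation that treats Request.Filename as a URL. The URLs and the variable names Urls and UrlIndex are hypothetical. The sketch sets neither Response.Timestamp nor Response.CheckSum, so add one of them if your source can supply it.

```vbscript
Dim Urls, UrlIndex

' Hypothetical list of documents to index, identified by URL.
Urls = Array("http://intranet.example/policies.html", _
             "http://intranet.example/handbook.html")

Function ScanFirst(Response)
  ' Reset the position at the start of each indexing run.
  UrlIndex = 0
  ScanFirst = ScanNext(Response)
End Function

Function ScanNext(Response)
  ScanNext = False
  If UrlIndex <= UBound(Urls) Then
    ' The synthetic filename is the URL itself.
    Response.Filename = Urls(UrlIndex)
    UrlIndex = UrlIndex + 1
    ScanNext = True
  End If
End Function
```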

See Also

Database Example

Sub RequestDocument(Request, Response)

RequestDocument is called when Perceptive Enterprise Search needs access to the documents specified through ScanFirst/ScanNext.  If this function is not present, Perceptive Enterprise Search will look for the files on disk. RequestDocument can be used in one of two ways:

  1. Mapping a synthetic filename to a physical file on disk.  Use this option when you need to temporarily recall a file from a non-file system source, such as a database BLOB or a remote system.
  2. Creating a virtual document of either text or HTML.  Use this option when the content you want to index is not a file and contains text or HTML only, such as database records.

Parameters

Request - contains information about the document being requested, properties include:

Request.Filename returns the filename of the document being requested (as returned from ScanFirst/ScanNext).
Request.IndexPath returns the path to the index being updated.
Request.Properties returns a collection of NTFS file system properties (Windows only).
Request.Script returns the name of the script currently executing.
Request.Size returns the size of the file on disk, if available.
Request.TimeStamp returns the timestamp on disk, if available.

Response - populate with information about the document.

Response.Filename set to a physical file on disk.
Response.Timestamp [optional] set to the timestamp to store as the last modified date.
Response.FileType [optional] allows you to set the Perceptive Enterprise Search file type for this document.
Response.CheckSum [optional] set to a 32-bit checksum value if desired.
Response.Write(Text) writes the given text to the virtual document.
Response.WriteProp(Name, Value) writes the given name/value pair to the document output.
Response.NewParagraph starts a new paragraph in the output.
Response.Template passes control to the Perceptive Enterprise Search template engine.  See templates for details.

Example
sub RequestDocument(Request, Response)
    ' Assumes that the Request.Filename is a URL
    set WinHTTP = Server.CreateObject( "WinHttp.WinHttpRequest.5.1" )
    WinHTTP.Open "GET", Request.Filename, False
    WinHTTP.Send
    if WinHTTP.Status = 200 then
        Response.Write WinHTTP.ResponseText
    end if
end sub
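
For database records, a virtual document can be assembled from several columns. The following is a minimal sketch, not a definitive implementation. It assumes synthetic filenames of the form "db://articles/<id>", a data source named ArticlesDb, and an Articles table with Id, Title, Summary and Body columns. All of these names, and the "Title" property name passed to WriteProp, are hypothetical. A script has only one RequestDocument, so this would replace the example above rather than sit alongside it.

```vbscript
Sub RequestDocument(Request, Response)
  Dim RecordId, Conn, Rs

  ' Hypothetical synthetic filename format: "db://articles/<numeric id>".
  ' CLng raises a runtime error if the last segment is not numeric.
  RecordId = CLng(Mid(Request.Filename, InStrRev(Request.Filename, "/") + 1))

  Set Conn = CreateObject("ADODB.Connection")
  Conn.Open "DSN=ArticlesDb"   ' hypothetical data source name

  ' RecordId is numeric (CLng above), so concatenating it into the SQL is safe.
  Set Rs = Conn.Execute("SELECT Title, Summary, Body FROM Articles WHERE Id = " & RecordId)

  If Not Rs.EOF Then
    ' Appending "" turns Null column values into empty strings.
    Response.WriteProp "Title", Rs("Title").Value & ""
    Response.Write Rs("Summary").Value & ""
    Response.NewParagraph
    Response.Write Rs("Body").Value & ""
  End If

  Rs.Close
  Conn.Close
End Sub
```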

Sub ReleaseDocument (Filename)

ReleaseDocument is called once the processing of a document is complete. This gives you the ability to handle any cleanup that is required for each document.

Parameters

Filename - indicates the filename of the document that is being released.
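
Example

The sketch below pairs RequestDocument with ReleaseDocument for the first usage mode: it downloads a document to a temporary file, points Response.Filename at that file, and deletes the file once the document is released. It assumes that Request.Filename is a URL and that ReleaseDocument receives the same filename that was returned from ScanFirst/ScanNext. The second assumption is not confirmed by this page, so verify it before relying on the cleanup. The names TempFso and TempFiles are hypothetical.

```vbscript
Dim TempFso, TempFiles
Set TempFso = CreateObject("Scripting.FileSystemObject")
Set TempFiles = CreateObject("Scripting.Dictionary")   ' synthetic name -> temp path

Sub RequestDocument(Request, Response)
  Dim Http, Stream, TempPath

  Set Http = CreateObject("WinHttp.WinHttpRequest.5.1")
  Http.Open "GET", Request.Filename, False
  Http.Send
  If Http.Status <> 200 Then Exit Sub

  ' Build a uniquely named path in the temp folder (special folder 2).
  TempPath = TempFso.BuildPath(TempFso.GetSpecialFolder(2).Path, TempFso.GetTempName)

  ' Save the raw response bytes to that path.
  Set Stream = CreateObject("ADODB.Stream")
  Stream.Type = 1                  ' adTypeBinary
  Stream.Open
  Stream.Write Http.ResponseBody
  Stream.SaveToFile TempPath, 2    ' adSaveCreateOverWrite
  Stream.Close

  ' Remember the mapping so ReleaseDocument can clean up later.
  TempFiles(Request.Filename) = TempPath
  Response.Filename = TempPath
End Sub

Sub ReleaseDocument(Filename)
  ' Assumption: Filename is the synthetic name used as the dictionary key.
  If TempFiles.Exists(Filename) Then
    If TempFso.FileExists(TempFiles(Filename)) Then
      TempFso.DeleteFile TempFiles(Filename), True
    End If
    TempFiles.Remove Filename
  End If
End Sub
```

GetTempName produces a name ending in .tmp. If the engine needs more than that to recognize the document type, Response.FileType is the documented place to set it.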