You can use scripting rules to control which files Perceptive Enterprise Search indexes during its reindex or update process. The same approach lets you index content stored outside the file system, such as in databases or on WebDAV servers.
A file rule script works by implementing specifically named functions that the Perceptive Enterprise Search engine looks for at indexing time. When Perceptive Enterprise Search updates an index, it scans the system for files that are not currently in the index; a file rule script hooks into this scan.
There are two file rule modes, simple and advanced. The simple mode is good for small datasets or single folders, whereas the advanced mode is used to walk larger volumes or to handle databases.
To create a file rule script, you need to implement some of the functions below.
ScanFiles provides a simple way to scan all your files in one function. It is recommended only for small datasets.
Response - populate the response object with the files you wish to index. If you pass in a folder, the files within the folder will be passed to ScanProcessFile, enabling you to filter them out. Examples of use:
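Response.Add "c:\pathtofile.ext" | adds a single file by path. |
Response.Add "c:\pathtofolder" | adds a folder by path. |
Response.Add FileSystemObject Folder | adds a Folder object obtained from the FileSystemObject. |
Response.Add FileSystemObject Folder Files | adds the Files collection of a FileSystemObject Folder. |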
    set FileSystemObject = CreateObject("Scripting.FileSystemObject")

    sub ScanFiles(Response)
        ' Add files or folders into the response object by calling
        ' Response.Add. These may include folder and file objects
        ' from the FileSystemObject object.
        Response.Add FileSystemObject.GetFolder("c:\documents").Files
    end sub

    function ScanProcessFile(Filename)
        ' ScanProcessFile is called for each file specified in ScanFiles.
        ' This gives you the ability to filter out particular files
        ' by simply returning false
        ScanProcessFile = True
    end function
ScanProcessFile is called for every file added through the ScanFiles method. It allows you to push entire folders using ScanFiles and then filter individual files out.
Filename - The absolute path to a file
Return a boolean to indicate if the specified file should be indexed.
See ScanFiles
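As an illustration of filtering, the sketch below pushes a whole folder in ScanFiles and rejects files by extension in ScanProcessFile. The folder path and the list of skipped extensions are assumptions chosen for the example, not product defaults.

```vbscript
' Assumption: the standard Scripting.FileSystemObject is available,
' as in the ScanFiles example.
dim FileSystemObject
set FileSystemObject = CreateObject("Scripting.FileSystemObject")

sub ScanFiles(Response)
    ' Push the whole folder; the filtering happens in ScanProcessFile.
    Response.Add FileSystemObject.GetFolder("c:\documents")
end sub

function ScanProcessFile(Filename)
    dim Extension
    ' GetExtensionName returns the extension without the leading dot.
    Extension = LCase(FileSystemObject.GetExtensionName(Filename))
    ' Hypothetical policy: skip temporary and backup files, index the rest.
    select case Extension
        case "tmp", "bak"
            ScanProcessFile = False
        case else
            ScanProcessFile = True
    end select
end function
```

Because the decision is made per file, the same pattern can test size, date, or name instead of extension.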
ScanFirst/ScanNext give you full control over the indexing process. When the indexing process starts, Perceptive Enterprise Search calls ScanFirst to retrieve the first document. You can use this opportunity to initialize any global data or configuration options that you need during the indexing run. Perceptive Enterprise Search then continues to call ScanNext until it returns False, indicating that there are no more documents.
Response - populate the response object with information about the document. Properties are:
Response.Filename | The filename of the document as it should appear in Perceptive Enterprise Search. The filename does not need to be an actual file on disk, but can be information that is meaningful to your application. If the filename does not map to disk, you must implement RequestDocument. |
Response.Timestamp | If you know the last modified date of the item you want to index, you should set it here. Perceptive Enterprise Search will use this date when trying to identify whether the item has changed. |
Response.CheckSum | If timestamps are not available, you may set a CheckSum instead. The checksum must fit in a 32-bit integer. |
Boolean value indicating whether this call to ScanFirst/ScanNext succeeded.
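Where no timestamp is available, a checksum can be derived from the item's content. The sketch below is one possible approach, not a product-supplied routine: SimpleCheckSum is a hypothetical helper that computes a polynomial checksum and reduces it modulo 2^31 - 1. It assumes the checksum is stored as a signed 32-bit value, which the source does not state explicitly.

```vbscript
' SimpleCheckSum is a hypothetical helper, not part of the product API.
' The modulus keeps every result inside the signed 32-bit (Long) range.
const CheckSumModulus = 2147483647

function SimpleCheckSum(Text)
    dim Position, Code, Sum
    Sum = 0
    for Position = 1 to Len(Text)
        Code = AscW(Mid(Text, Position, 1))
        ' AscW returns a negative value for characters above 32767.
        if Code < 0 then Code = Code + 65536
        ' Work in floating point: Sum * 31 can exceed the Long range
        ' before it is reduced by the modulus.
        Sum = CDbl(Sum) * 31 + Code
        Sum = Sum - Int(Sum / CheckSumModulus) * CheckSumModulus
    next
    SimpleCheckSum = CLng(Sum)
end function
```

Inside ScanFirst or ScanNext this would be used as `Response.CheckSum = SimpleCheckSum(Content)`, where Content is whatever text identifies the current state of the item.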
The filenames of the items that you return through ScanFirst/ScanNext do not need to be actual files on disk. They can be synthetic filenames that are meaningful to your script. This gives you the ability to index sources that do not have physical files stored locally, such as databases or remote storage.
If you intend to use a synthetic filename, you must implement RequestDocument to fetch the actual document for indexing.
    ' This script demonstrates the ScanFirst/ScanNext functions of Perceptive Enterprise Search scripts.
    ' It uses the FileSystemObject to get a reference to all files in c:\Documents,
    ' and then enumerates them using a Server.Enumerator object.
    dim FileSystem
    dim ScanEnum
    set FileSystem = Server.CreateObject("Scripting.FileSystemObject")

    function ScanFirst(Response)
        set ScanEnum = Server.Enumerator(FileSystem.GetFolder("c:\Documents").Files)
        ScanFirst = ScanNext(Response)
    end function

    function ScanNext(Response)
        ScanNext = false
        do while ScanEnum.Next
            ' Only files smaller than 1024 bytes are returned; larger files are skipped.
            if ScanEnum.Item.Size < 1024 then
                Response.Filename = ScanEnum.Item.Path
                Response.Timestamp = ScanEnum.Item.DateLastModified
                ScanNext = true
                exit do
            end if
        loop
    end function
Database Example
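A minimal sketch of how a database source might be walked with ScanFirst/ScanNext and served with RequestDocument is shown here. The connection string, the Articles table with its Id, Title, Body and Modified columns, the "db://articles/" filename prefix and the "Title" property name are all hypothetical. Only the ADO calls and the Response members described on this page are relied upon.

```vbscript
' Hypothetical names throughout: the connection string, the Articles table
' (Id, Title, Body, Modified) and the synthetic filename prefix.
const ConnectionString = "Provider=SQLOLEDB;Data Source=dbserver;Initial Catalog=Docs;Integrated Security=SSPI"
const FilenamePrefix = "db://articles/"

dim ScanConnection
dim ScanRecords

function ScanFirst(Response)
    ' Open the database once at the start of the indexing run.
    set ScanConnection = Server.CreateObject("ADODB.Connection")
    ScanConnection.Open ConnectionString
    set ScanRecords = ScanConnection.Execute("SELECT Id, Modified FROM Articles")
    ScanFirst = ScanNext(Response)
end function

function ScanNext(Response)
    ScanNext = false
    if ScanRecords.EOF then
        ' No more rows: release the database resources and end the scan.
        ScanRecords.Close
        ScanConnection.Close
        exit function
    end if
    ' Report a synthetic filename and the row's last modified date.
    Response.Filename = FilenamePrefix & ScanRecords("Id").Value
    Response.Timestamp = ScanRecords("Modified").Value
    ScanRecords.MoveNext
    ScanNext = true
end function

sub RequestDocument(Request, Response)
    dim Connection, Records, Id
    ' CLng fails on non-numeric text, so only a number is ever embedded in the query.
    Id = CLng(Mid(Request.Filename, Len(FilenamePrefix) + 1))
    set Connection = Server.CreateObject("ADODB.Connection")
    Connection.Open ConnectionString
    set Records = Connection.Execute("SELECT Title, Body FROM Articles WHERE Id = " & Id)
    if not Records.EOF then
        Response.WriteProp "Title", Records("Title").Value
        Response.Write Records("Body").Value
    end if
    Records.Close
    Connection.Close
end sub
```

RequestDocument opens its own connection because this sketch does not assume that documents are requested while the scan's recordset is still open.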
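RequestDocument is called when Perceptive Enterprise Search needs access to the documents specified through ScanFirst/ScanNext. If this function is not present, Perceptive Enterprise Search will look for the files on disk. RequestDocument can be used in one of two ways: it can set Response.Filename to a physical file on disk that holds the document, or it can write the content of a virtual document directly using Response.Write and the related methods.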
Request - contains information about the document being requested. Properties include:
Request.Filename | returns the filename of the document being requested (as returned from ScanFirst/ScanNext). |
Request.IndexPath | returns the path to the index being updated. |
Request.Properties | returns a collection of NTFS file system properties (Windows only). |
Request.Script | returns the name of the script currently executing. |
Request.Size | returns the size of the file on disk, if available. |
Request.TimeStamp | returns the timestamp on disk, if available. |
Response - populate with information about the document.
Response.Filename | set to a physical file on disk. |
Response.Timestamp | [optional] set to the timestamp to store as the last modified date. |
Response.FileType | [optional] allows you to set the Perceptive Enterprise Search file type for this document. |
Response.CheckSum | [optional] set to a 32-bit checksum value if desired. |
Response.Write(Text) | Writes the given text to the virtual document. |
Response.WriteProp(Name, Value) | Writes the given name/value pair to the document output. |
Response.NewParagraph | Starts a new paragraph in the output. |
Response.Template | Passes control over to the Perceptive Enterprise Search template engine. See templates for details. |
    sub RequestDocument(Request, Response)
        ' Assumes that the Request.Filename is a URL
        dim WinHTTP
        set WinHTTP = Server.CreateObject("WinHttp.WinHttpRequest.5.1")
        WinHTTP.Open "GET", Request.Filename, False
        WinHTTP.Send
        if WinHTTP.Status = 200 then
            Response.Write WinHTTP.ResponseText
        end if
    end sub
ReleaseDocument is called once the processing of a document is complete. This gives you the ability to handle any cleanup that is required for each document.
Filename - indicates the filename of the document that is being released.
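One use for ReleaseDocument is removing temporary files created in RequestDocument. The sketch below downloads a document to a temporary file, hands that file to the engine through Response.Filename, and deletes it on release. It assumes that Request.Filename is a URL, that ReleaseDocument is declared as a sub, and that it receives the same name that was passed in Request.Filename. None of these is confirmed by this page.

```vbscript
' Assumptions: Request.Filename is a URL, and ReleaseDocument receives
' that same name once the document has been processed.
dim FileSystem
dim TempFiles
set FileSystem = Server.CreateObject("Scripting.FileSystemObject")
set TempFiles = Server.CreateObject("Scripting.Dictionary")

sub RequestDocument(Request, Response)
    dim WinHTTP, Stream, TempPath
    set WinHTTP = Server.CreateObject("WinHttp.WinHttpRequest.5.1")
    WinHTTP.Open "GET", Request.Filename, False
    WinHTTP.Send
    if WinHTTP.Status <> 200 then exit sub

    ' Build a uniquely named path in the temporary folder (special folder 2).
    TempPath = FileSystem.BuildPath(FileSystem.GetSpecialFolder(2).Path, FileSystem.GetTempName)

    ' Save the raw response bytes to that path.
    set Stream = Server.CreateObject("ADODB.Stream")
    Stream.Type = 1 ' adTypeBinary
    Stream.Open
    Stream.Write WinHTTP.ResponseBody
    Stream.SaveToFile TempPath, 2 ' adSaveCreateOverWrite
    Stream.Close

    ' Remember the temporary file so that it can be removed on release.
    TempFiles(Request.Filename) = TempPath
    Response.Filename = TempPath
end sub

sub ReleaseDocument(Filename)
    if TempFiles.Exists(Filename) then
        if FileSystem.FileExists(TempFiles(Filename)) then
            FileSystem.DeleteFile TempFiles(Filename)
        end if
        TempFiles.Remove Filename
    end if
end sub
```

The temporary name ends in .tmp, so if the engine relies on the extension to detect the format, Response.FileType may also need to be set.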