By default, the Perceptive Search WebSite Spider stores the page date/time where the server provides one, and otherwise computes a checksum over the entire page. Using this section, you can configure Perceptive Search to always use checksums for changed-page detection, even when date/times are available.
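The default behavior described above can be pictured with a minimal sketch. The function names, the `always_checksum` flag, and the use of SHA-256 are illustrative assumptions, not the Spider's actual internals.

```python
import hashlib

def change_token(body: bytes, last_modified=None, always_checksum=False):
    """Return the value that would be stored to detect a later change.

    Hypothetical helper: prefers the server date/time when one is available,
    unless always_checksum is set, and otherwise hashes the whole page.
    SHA-256 is an assumption; the product's checksum algorithm is not documented here.
    """
    if last_modified and not always_checksum:
        return ("date", last_modified)
    return ("checksum", hashlib.sha256(body).hexdigest())

def has_changed(stored_token, current_token) -> bool:
    """A page counts as changed when its stored token differs from the new one."""
    return stored_token != current_token
```

With `always_checksum=True`, two fetches of identical content produce the same token even if the server reports a different date/time on each request.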
If you are unsure how date/times work on a particular site, visit the site in a browser and display the page properties (in Internet Explorer, go to File > Properties). If the date/time fields are blank, the server is not providing time information. If the fields are filled in but show a time that is effectively the moment of the request (allowing for time zone differences), then the pages are being generated dynamically and stamped with a fresh date/time on every request.
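The same check can be scripted. The sketch below classifies a `Last-Modified` header value that you have already obtained (for example from a HEAD request); the function name and the one-minute tolerance are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def classify_last_modified(last_modified, now, tolerance=timedelta(minutes=1)):
    """Return 'missing', 'dynamic', or 'usable' for a Last-Modified value.

    'missing': the server sent no (or an unparseable) date/time.
    'dynamic': the date/time is essentially the request time, which suggests
               the page is generated on the fly.
    'usable' : the date/time looks like a real modification time.
    """
    if not last_modified:
        return "missing"
    try:
        stamp = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return "missing"
    if stamp.tzinfo is None:
        # Treat a date without zone information as UTC (assumption).
        stamp = stamp.replace(tzinfo=timezone.utc)
    if abs(now - stamp) <= tolerance:
        return "dynamic"
    return "usable"
```

Because the comparison is done on timezone-aware values, time zone differences between client and server do not affect the result.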
This tab also lets you specify regions of the page to exclude from the checksum. Such regions typically contain variable content, such as rotating advertisements, and a page should not be treated as changed when only the content inside these regions changes.
The portions to exclude from the checksum are indicated by begin/end sequences, and can occur many times within the page. If no such regions are found, the checksum is computed over the entire page.
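A minimal sketch of this exclusion logic follows. The marker strings, function names, SHA-256 hash, and the handling of an unterminated region are assumptions for illustration, not the product's documented behavior.

```python
import hashlib

def strip_excluded(page: str, begin: str, end: str) -> str:
    """Remove every region delimited by begin/end, markers included."""
    kept = []
    pos = 0
    while True:
        start = page.find(begin, pos)
        if start == -1:
            break
        stop = page.find(end, start + len(begin))
        if stop == -1:
            # Unterminated region: keep the remainder as-is (assumption).
            break
        kept.append(page[pos:start])
        pos = stop + len(end)
    kept.append(page[pos:])
    return "".join(kept)

def page_checksum(page: str, begin: str, end: str) -> str:
    """Checksum of the page with excluded regions removed.

    If no region is found, this is the checksum of the entire page.
    """
    stripped = strip_excluded(page, begin, end)
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()
```

For example, with hypothetical markers `<!--ad-->` and `<!--/ad-->`, two copies of a page that differ only in the advertisement text between the markers yield the same checksum.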
The De-duplicate Lotus Domino Documents option provides special functionality for Domino servers. Domino is designed heavily around the concept of views, and it is normal for a Domino site to have many views of the same underlying documents.
This means that in the process of exploring the site, the Spider will see the same documents many times, accessed via different paths and with different URLs. If you select the De-duplicate Lotus Domino Documents option, then the Spider will apply special processing logic such that it indexes each of the underlying documents once only. This is the recommended setting for Lotus Domino sites.
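One way to picture this de-duplication is sketched below. It assumes the common Domino URL shape `/database.nsf/View/DocumentUNID?OpenDocument`, where the final path segment is a 32-character hexadecimal document ID, and treats two URLs as the same document when host, database, and ID match. This is an illustrative assumption, not a description of the Spider's actual logic.

```python
import re
from urllib.parse import urlsplit

_UNID = re.compile(r"^[0-9A-Fa-f]{32}$")

def domino_doc_key(url: str):
    """Return (host, database, document ID) for a Domino document URL, else None."""
    parts = urlsplit(url)
    path = parts.path
    idx = path.lower().find(".nsf/")
    if idx == -1:
        return None
    database = path[: idx + 4].lower()
    segments = [s for s in path[idx + 5:].split("/") if s]
    if not segments or not _UNID.match(segments[-1]):
        return None
    return (parts.netloc.lower(), database, segments[-1].upper())

def dedupe(urls):
    """Keep the first URL seen for each underlying document; pass others through."""
    seen = set()
    unique = []
    for url in urls:
        key = domino_doc_key(url) or url
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

URLs that do not match the assumed document pattern, such as view pages, are keyed on the full URL and are therefore only removed when they are exact repeats.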
Domino sites also have the peculiar problem that most external Web search engines (AltaVista, HotBot, etc.) do not index them, because Domino URLs contain question marks and these engines tend not to follow links that contain question marks.
The solution is for Perceptive Search to generate an 'index' file that contains an entry for every page on the site, expressed in a form without question marks. The external search engines can then be directed to index this file directly, and should follow through and index the underlying Domino pages as well.
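The idea can be sketched as follows. The format of the file that Perceptive Search actually writes is not described here, so the HTML layout and function names are hypothetical. The default rewrite replaces the first `?` with `!`, on the assumption that the Domino server accepts `!` in place of `?` in URL commands; supply a different `rewrite` function if your server uses another scheme.

```python
from html import escape

def without_question_mark(url: str) -> str:
    """Rewrite a URL so it contains no '?' (assumes the server accepts '!')."""
    return url.replace("?", "!", 1)

def build_index(urls, rewrite=without_question_mark) -> str:
    """Build a simple HTML page with one link per site page."""
    lines = ["<html><body>"]
    for url in urls:
        link = escape(rewrite(url), quote=True)
        lines.append(f'<a href="{link}">{link}</a><br>')
    lines.append("</body></html>")
    return "\n".join(lines)
```

Pointing an external search engine at the resulting page gives it a set of links it is willing to follow, each leading to one of the underlying Domino pages.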