
Spider Rules

An index of a website contains a set of rules that instruct Perceptive Enterprise Search what to index and how to navigate the website. A spider rule can be a starting URL, an exclusion filter, or an external domain. The following sections describe how to add each type of rule.

Adding a Starting URL

  1. Select Spider Rules for the index to which you wish to add a Spider Rule. If Spider Rules is not visible, this is most likely because the index is configured as a "File System" index. You cannot add Spider Rules to a File System index.
  2. Click New Starting URL. The Perceptive Enterprise Search Spider URL Wizard will appear.
  3. Enter the URL for the website you wish to index. The URL should be formatted like http://www.lexmark.com. Click Next.
  4. Select the crawl depth for the website. The crawl depth indicates how many links to follow from the starting URL: a link on the starting URL is considered to be level 0, a document linked to from the starting URL is level 1, and so on (see the example after these steps). Click Next.
  5. If a site map is found for this site, you will be given the option either to have the Perceptive Enterprise Search Spider crawl the website to the depth specified in the previous step or to use the site map. If you choose the site map option, all URLs listed in the site map will be indexed (this sets the crawl depth in the spider rule to -2). Select an option and click Next.
  6. Review the information. If you wish to make any changes, click Back. Once you are satisfied with the settings, click Finish to add the new Starting URL.
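For example (an illustrative, hypothetical site): with a starting URL of http://www.example.com and a crawl depth of 2, the spider indexes the starting page, follows the links on it (level 0) to their target documents (level 1), and follows the links on those documents to reach level 2, but goes no further. Choosing the site map option in step 5 instead records a crawl depth of -2 in the spider rule, which tells the spider to index the URLs listed in the site map.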

Adding an Exclusion Filter

  1. Select Spider Rules for the index to which you wish to add an exclusion filter. If Spider Rules is not visible, this is most likely because the index is configured as a "File System" index. You cannot add Spider Rules to a File System index.
  2. Click New Exclusion Filter. The Perceptive Enterprise Search Spider Filter Wizard will appear.
  3. Enter a pattern that will match the URLs you wish to exclude from the index. For example, to exclude all files under "/sitemap/", enter */sitemap/* (further illustrative patterns are shown after these steps). Click Next.
  4. Review the information. If you wish to make any changes, click Back. Once you are satisfied with the settings, click Finish to add the new Exclusion Filter.
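The following additional patterns are illustrative only; they assume the same wildcard syntax as the */sitemap/* example above, where * matches any sequence of characters:

  */archive/*    excludes any URL containing /archive/
  *.pdf          excludes URLs that end in .pdf
  *print=*       excludes URLs whose query string contains print=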

Adding an External Domain

  1. Select Spider Rules for the index to which you wish to add an external domain. If Spider Rules is not visible, this is most likely because the index is configured as a "File System" index. You cannot add Spider Rules to a File System index.
  2. Enter the external domain containing web pages that should also be crawled and indexed. Click Next.
  3. Select the crawl depth for the external domain. The crawl depth indicates how many links Perceptive Enterprise Search will follow from the external link. Click Next.
  4. Review the information. If you wish to make any changes, click Back. Once you are satisfied with the settings, click Finish to add the new External Domain.

Note: If you have access to the HTML source of a website, you can control which content the spider indexes within an HTML file. The spider indexes content from the top of the page until it reaches the first <!-- ISYSINDEXINGOFF --> HTML comment, and only starts again when it finds an <!-- ISYSINDEXINGON --> comment. Indexing can be switched off and on as many times as needed, and the comments do not affect how the page is displayed. In this way, sections of the page can be excluded from indexing and will never be returned as a hit on that page. Prime candidates are headers, footers, navigation panels, adverts, and other elements that appear on every page.
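For example, a page author might wrap the navigation panel and footer in these comments so that only the main content can produce hits. The page below is a simplified, hypothetical illustration; only the comment markers come from the product:

  <html>
    <body>
      <!-- ISYSINDEXINGOFF -->
      <div class="navigation">
        <a href="/">Home</a> | <a href="/products">Products</a>
      </div>
      <!-- ISYSINDEXINGON -->
      <h1>Product overview</h1>
      <p>This text is indexed and can be returned as a hit.</p>
      <!-- ISYSINDEXINGOFF -->
      <div class="footer">Footer links and copyright notice</div>
      <!-- ISYSINDEXINGON -->
    </body>
  </html>

Because there is no indexable text before the first ISYSINDEXINGOFF comment, only the heading and paragraph between the first ISYSINDEXINGON and the second ISYSINDEXINGOFF are indexed; the markers themselves are ordinary HTML comments and do not change how the page is rendered.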