URLs:
You can give the Expertrec crawler any website URLs that are publicly available on the internet. The crawler discards invalid URLs. The following types of URLs cannot be crawled and should not be added:
- Localhost URLs (127.0.0.1) – These URLs can never be reached from the internet, so they cannot be crawled.
- Staging website URLs – A staging site is a clone of your production site that is meant for internal use only. Do not add this type of URL to the crawl URL list.
- Intranet websites – Some websites are available only inside your enterprise network. These sites are unreachable from the internet, so they should not be added to the URL list.
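As a rough illustration of the rules above, the following Python sketch pre-checks a URL before you add it to the crawl list. It is not Expertrec's own validation logic; the function name `looks_publicly_reachable` is hypothetical, and the check only catches the obvious cases (non-HTTP schemes, localhost, and loopback or private IP addresses). A staging or intranet site that uses an ordinary host name cannot be detected this way.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_publicly_reachable(url: str) -> bool:
    """Hypothetical pre-check: reject URLs that are obviously not public."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unclosed IPv6 bracket

    # Only http and https URLs are treated as crawlable in this sketch.
    if parts.scheme not in ("http", "https"):
        return False

    host = parts.hostname
    if not host:
        return False

    # "localhost" always points back at the machine making the request.
    if host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # The host is a DNS name, not an IP literal. This sketch cannot
        # tell whether it is a staging or intranet site, so it passes.
        return True

    # Loopback (127.0.0.1), private (10.x.x.x, 192.168.x.x, ...) and
    # link-local addresses are not reachable from the internet.
    return not (ip.is_loopback or ip.is_private or ip.is_link_local)
```

For instance, `looks_publicly_reachable("http://127.0.0.1:8000/")` returns `False`, while a public address such as `https://www.example.com/` passes the check.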
To add another URL, click the “Add more urls” button. The same rules apply to each new URL.
How to crawl a certain section of the site?
Generally, a crawl that starts on a URL completes once every reachable URL has been crawled. But what if the site is very large, or architectural issues on the site mean you need to restrict the crawl to a fixed section? The Expertrec crawler takes the path or paths from the URLs given in the URL section and restricts the crawl to those paths only.
For example, if the URL to crawl is “https://www.example.com/pages/history.html”, the crawl will discard all URLs that do not match the pattern “https://www.example.com/pages/”.
This keeps the crawl within the expected path or section of the website, and the search results will contain only URLs under that path or section.
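The path restriction can be pictured with a small Python sketch. This is only an approximation of the behaviour described above, not the crawler's actual code: the helper names `crawl_scope` and `in_scope` are hypothetical, and the sketch assumes the pattern is a simple prefix made of the scheme, host, and the directory part of the given URL. A last path segment without a trailing slash is treated as a file name here, which may differ from what the crawler does.

```python
from urllib.parse import urlsplit


def crawl_scope(seed_url: str) -> str:
    """Hypothetical helper: derive the path prefix a crawl is limited to."""
    parts = urlsplit(seed_url)
    # Keep the path up to and including its last "/", dropping the file name.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def in_scope(url: str, scope: str) -> bool:
    """Return True when the URL falls under the crawl scope prefix."""
    return url.startswith(scope)


scope = crawl_scope("https://www.example.com/pages/history.html")
candidates = [
    "https://www.example.com/pages/about.html",
    "https://www.example.com/pages/archive/1990.html",
    "https://www.example.com/blog/post.html",
]
# Only URLs under the scope prefix are kept for crawling.
kept = [u for u in candidates if in_scope(u, scope)]
```

With the example URL from the text, `scope` is `https://www.example.com/pages/`, so the two URLs under `/pages/` are kept and the `/blog/` URL is discarded.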