You may want to add a search box to your authenticated pages (pages that require a login). For a search engine crawler to index these pages, you have to give it certain details so that it can get past the login page. This is easy to set up with the expertrec site search engine.
Let’s take an example. We want to index the pages behind the following login page, along with their related URLs: https://www.thornburg.com/registration/logon.aspx
Steps to add search to protected pages
- Go to https://cse.expertrec.com
- Go to Crawl -> Advanced -> Protected Pages
Login URL: this is the URL of the login page. In this case, it is https://www.thornburg.com/registration/logon.aspx
Login form ID: to find the login form ID, open the developer tools in your Chrome browser (Ctrl+Shift+I) and select the form element in the Elements panel. In this case, it is p_login
Enter the id/name for the username box: select the username input box in the developer tools and copy its id or name. Here it is p_txtUsername
Enter the id/name for the password box: in inspect-element mode, click on the password box and copy its id or name.
Enter a username and password for the crawler and press update.
Now go to the home page and press recrawl. That’s it. You can now go to your demo page and check whether your search results include the pages behind the login.
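If you prefer not to hunt through the developer tools by hand, the form ID and the input box ids/names can also be read out of the login page's HTML with a short script. The following is a minimal sketch using only Python's standard library. The sample markup is an assumption made for illustration: it reuses p_login and p_txtUsername from the example above, while placeholder_password_id is a made-up name, since the real password box id has to be read from your own page.

```python
from html.parser import HTMLParser


class LoginFormInspector(HTMLParser):
    """Collects every <form> on a page together with its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per form found
        self._current = None   # form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            # Fall back to the name attribute when the form has no id.
            self._current = {
                "id": attributes.get("id") or attributes.get("name"),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append({
                "type": attributes.get("type", "text"),
                "id": attributes.get("id"),
                "name": attributes.get("name"),
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def inspect_forms(html):
    """Return a list of forms, each with its id and input fields."""
    inspector = LoginFormInspector()
    inspector.feed(html)
    return inspector.forms


# Hypothetical login page markup, for illustration only.
SAMPLE_LOGIN_PAGE = """
<form id="p_login" method="post" action="logon.aspx">
  <input type="text" id="p_txtUsername" name="p_txtUsername">
  <input type="password" id="placeholder_password_id" name="placeholder_password_id">
  <input type="submit" value="Log in">
</form>
"""

if __name__ == "__main__":
    for form in inspect_forms(SAMPLE_LOGIN_PAGE):
        print("form id:", form["id"])
        for field in form["inputs"]:
            print("  ", field["type"], field["id"], field["name"])
```

To use it on a real page, save the page source from your browser and pass its contents to inspect_forms. The entry with type "text" is usually the username box and the entry with type "password" is the password box.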
Common pitfalls:
Pages behind authentication usually have a logout link. The crawler follows all the links present on your pages, so it will invariably follow the logout link as well. Logout links should not be crawled, because following one logs the crawler out, and the remaining pages will not get indexed properly. We recommend that you mark the logout link as not to be followed. For example, if your logout link currently looks like
<a href="logout.php">Logout</a>
replace this with
<a href="logout.php" rel="nofollow">Logout</a>
The rel="nofollow" attribute instructs the crawler not to follow this link.
If you see incomplete results on expertrec search, please check if you have guarded the logout link correctly.
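To check a page yourself, you can list the links that a crawler honoring nofollow would still follow. The sketch below uses Python's standard library and illustrates the general technique only. It is not expertrec's actual crawler code, and the sample markup and the account.php link are assumptions made for the example.

```python
from html.parser import HTMLParser


class FollowableLinkCollector(HTMLParser):
    """Collects href values of <a> tags that are not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.links.append(href)


def followable_links(html):
    """Return the links a nofollow-respecting crawler would follow."""
    collector = FollowableLinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical page fragment, for illustration only.
SAMPLE_PAGE = (
    '<a href="account.php">My account</a>'
    '<a href="logout.php" rel="nofollow">Logout</a>'
)

if __name__ == "__main__":
    print(followable_links(SAMPLE_PAGE))
```

If logout.php (or whatever your logout URL is) still shows up in the returned list, the logout link is not guarded yet.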
Advanced users should ideally change their logout to use the POST method instead of GET. This will also protect you from some browser pre-fetching. For more details, please refer to this Stack Overflow post on what HTTP method should be used for logout.
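As a rough illustration of a POST-only logout, here is a minimal sketch written as a plain Python WSGI application. The /logout path, the form markup, and the helper names are all assumptions made for this example, and your own site will use whatever framework and session handling it already has. The point is that a GET request to the logout URL, which is what a crawler or a prefetching browser sends when it follows a link, is rejected, and only a submitted form ends the session.

```python
# Hypothetical logout form that replaces the plain <a href="logout.php"> link.
LOGOUT_FORM = (
    '<form method="post" action="/logout">'
    '<button type="submit">Logout</button>'
    "</form>"
)


def app(environ, start_response):
    """Tiny WSGI app: /logout only ends the session when called with POST."""
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET")

    if path == "/logout":
        if method != "POST":
            # Followed links and prefetches arrive as GET and are refused.
            start_response(
                "405 Method Not Allowed",
                [("Allow", "POST"), ("Content-Type", "text/plain")],
            )
            return [b"Use POST to log out.\n"]
        # A real application would invalidate the session here.
        start_response("303 See Other", [("Location", "/")])
        return [b""]

    # Every other path just shows the page with the logout form.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [LOGOUT_FORM.encode("utf-8")]


def request_status(method, path):
    """Call the app directly and return the status line it produced."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response)
    return captured["status"]


if __name__ == "__main__":
    print(request_status("GET", "/logout"))
    print(request_status("POST", "/logout"))
```

Because crawlers only follow links and do not submit forms on their own, a logout implemented this way cannot be triggered by a crawl, even without rel="nofollow".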