Sometimes you may want to add a search box to your authenticated pages (pages that require a login). For a search engine crawler to index these pages, you have to give it the details it needs to get past the login page. The expertrec site search engine makes this easy.

Let’s take an example. We want to index the pages behind the following login page, along with their related URLs: https://www.thornburg.com/registration/logon.aspx


Login URL: the URL of the login page. In this case, it is https://www.thornburg.com/registration/logon.aspx

Login form ID: To find the login form ID, open the developer tools in Chrome (Ctrl+Shift+I) and select the form element. In this case, it is p_login.

Enter the id/name for username box: In the developer tools, select the username input box and copy its id or name. Here it is p_txtUsername.

Enter the id/name for password box: In inspect element mode, click the password box to find its id or name.
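If you have many login pages to set up, you can also pull these values out of the page source instead of clicking through the developer tools. The sketch below uses Python's built-in html.parser to list the id/name of every form and input on a saved login page. It is only an illustration and not part of expertrec: the sample markup is hypothetical, modelled on the example above, and p_txtPassword is an assumed id.

```python
from html.parser import HTMLParser


class LoginFieldFinder(HTMLParser):
    """Collects the id/name of every <form> and every <input> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []   # (id, name) of each <form> tag
        self.inputs = []  # (type, id, name) of each <input> tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append((attrs.get("id"), attrs.get("name")))
        elif tag == "input":
            # An <input> without an explicit type defaults to "text" in HTML.
            self.inputs.append(
                (attrs.get("type", "text"), attrs.get("id"), attrs.get("name"))
            )


def find_login_fields(html):
    finder = LoginFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.forms, finder.inputs


# Hypothetical login markup; p_txtPassword is an assumed id for illustration.
SAMPLE = """
<form id="p_login" method="post" action="logon.aspx">
  <input type="text" id="p_txtUsername" name="p_txtUsername">
  <input type="password" id="p_txtPassword" name="p_txtPassword">
  <input type="submit" value="Log in">
</form>
"""

if __name__ == "__main__":
    forms, inputs = find_login_fields(SAMPLE)
    print("forms:", forms)
    for kind, id_, name in inputs:
        print(kind, id_, name)
```

The input of type "password" is the one to use for the password box, and the text input inside the same form is usually the username box.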

Enter a username and password for the crawler to log in with, then press Update.

Now go to the home page and press Recrawl. That’s it. Once the crawl has finished, go to your demo page and check whether your search results include the pages behind the login.
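This post does not describe how expertrec's crawler uses these settings internally, but a generic form-login crawler typically does something like the sketch below: it submits the username and password as a form POST, keeps the session cookie from the response, and sends that cookie with every later request. Everything here is an assumption for illustration. A browser submits each input under its name attribute, so if an input's id and name differ, the name is what ends up in the request. Real login forms often also require hidden fields (such as CSRF tokens), which this sketch does not handle.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(login_url, username_field, password_field, username, password):
    """Build a form-encoded POST that submits the crawler's credentials.

    username_field / password_field are the input names (e.g. p_txtUsername).
    The form's real submit target may differ from the login page URL.
    """
    body = urlencode({username_field: username, password_field: password})
    return Request(
        login_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def make_session():
    """Return an opener that stores cookies set by the login response and
    sends them with every later request, plus the underlying cookie jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


# Usage (performs network requests, so it is not run here):
# opener, jar = make_session()
# opener.open(build_login_request(login_url, "p_txtUsername", "p_txtPassword", user, pw))
# opener.open(protected_page_url)  # fetched with the session cookie from the login
```

Because the session lives in a cookie, anything that invalidates it, such as visiting a logout URL, cuts the crawler off from every page fetched after that point.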


Common pitfalls:

Pages behind authentication usually have a logout link. The crawler follows every link on your pages, so it will inevitably follow the logout link as well. Logout links should not be crawled: following one logs the crawler out, and the remaining pages will not be indexed properly. We recommend marking the logout link so that it is not followed. For example, if your logout link currently looks like this:

<a href="logout.php">Logout</a>

replace this with

<a href="logout.php" rel="nofollow">Logout</a>

The rel="nofollow" attribute instructs the crawler not to follow this link.
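To see why this works, here is a minimal sketch of how a crawler can honour rel="nofollow" while collecting links from a page. It is a generic illustration in Python, not expertrec's actual crawler code, and the page markup and base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FollowableLinkExtractor(HTMLParser):
    """Collects href targets of <a> tags, skipping links marked rel="nofollow"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.follow = []   # links the crawler will visit
        self.skipped = []  # links marked nofollow

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links like logout.php
        # rel may hold several space-separated tokens, e.g. rel="nofollow noopener".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(url)
        else:
            self.follow.append(url)


def extract_links(html, base_url):
    parser = FollowableLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.follow, parser.skipped


# Hypothetical page behind the login.
PAGE = """
<a href="reports.php">Reports</a>
<a href="help.php" rel="noopener">Help</a>
<a href="logout.php" rel="nofollow">Logout</a>
"""

if __name__ == "__main__":
    follow, skipped = extract_links(PAGE, "https://www.example.com/members/")
    print("follow:", follow)
    print("skipped:", skipped)
```

Make sure the attribute uses straight quotes, as in rel="nofollow". With typographic quotes such as rel=”nofollow”, an HTML parser treats the curly quote characters as part of the value, so it no longer matches nofollow.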

If you see incomplete results in expertrec search, check that you have marked the logout link correctly.

Advanced users should ideally change their logout to use the POST method instead of GET. This also protects you from some browser prefetching. For more details, please refer to this Stack Overflow post on which HTTP method should be used for logout.
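With a POST-only logout, following a plain link (which is always a GET request) can never end the session. On the page, the logout link would become a small form, for example <form method="post" action="logout.php"><button type="submit">Logout</button></form>. The server side is sketched below as a minimal Python WSGI app rather than PHP, purely as an illustration; the /logout path and the session cookie name are hypothetical.

```python
def logout_app(environ, start_response):
    """Minimal WSGI app in which /logout only ends the session on POST."""
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET").upper()

    if path == "/logout":
        if method != "POST":
            # A GET (crawler following a link, browser prefetch) changes nothing.
            start_response(
                "405 Method Not Allowed",
                [("Allow", "POST"), ("Content-Type", "text/plain")],
            )
            return [b"Use POST to log out.\n"]
        # Expire the (hypothetical) session cookie, then redirect home.
        start_response(
            "303 See Other",
            [("Location", "/"), ("Set-Cookie", "session=; Max-Age=0; Path=/; HttpOnly")],
        )
        return [b""]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Members area\n"]


def simulate(method, path):
    """Call the app directly, without a server; returns (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(logout_app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(simulate("GET", "/logout")[0])   # the crawler's GET is refused
    print(simulate("POST", "/logout")[0])  # a real logout form submission
```

Returning 405 with an Allow header tells clients which method the URL expects, while the GET itself leaves the session untouched.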


Muthali Ganesh

Muthali loves writing about emerging technologies and easy solutions for complex tech issues. You can reach out to him through chat or by raising a support ticket on the left-hand side of the page.
