Index page editor proof pages with Sitecore 7

Posted by Robin Hermanussen on January 30, 2014 · 2 mins read

After competing in the Sitecore hackathon last weekend, I got excited. So when Kyle Heon posted a particular Tweet today, I immediately knew what I would be doing this evening; create a new (small) project on GitHub.

And here you can find the result. Basically, I created a computed field that actually does a http request and downloads the page that you are indexing.

The reason you would want to do this, is because of how content composition in the page editor works. You can add components to a page that reference separate Sitecore items as associated content (a.k.a. datasources). But the content of those referenced items is indexed separately and therefore difficult to search for using Sitecore.ContentSearch (Sitecore 7 search) – if you want ‘pages’ to be the result, like regular site searches usually need.

You can find more information, including installation instructions, on the GitHub page for Sitecore Html Crawler (just scroll down a little).

There are some limitations to this approach that you should know about.

  1. This doesn’t really work well with wildcard items, because they can only be indexed once; not for every item that you want to display on the wildcard item’s page. You would need to change the actual indexer to support that, or handle the difficulty of it in your search query.
  2. I’ve only tested this with Lucene. I think changing the configuration a little could make it work for other providers such as Solr.