Index Page Editor-proof pages with Sitecore 7

After competing in the Sitecore hackathon last weekend, I got excited. So when Kyle Heon posted a particular Tweet today, I immediately knew what I would be doing this evening: creating a new (small) project on GitHub.

And here you can find the result. Basically, I created a computed field that makes an HTTP request and downloads the rendered page for the item that is being indexed.

The reason you would want to do this is the way content composition works in the Page Editor. You can add components to a page that reference separate Sitecore items as associated content (a.k.a. datasources). The content of those referenced items is indexed separately, which makes it difficult to find with Sitecore.ContentSearch (Sitecore 7 search) if you want ‘pages’ to be the results, as regular site searches usually do.
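The idea of downloading the rendered page and indexing its text can be sketched roughly as follows. This is not the code from the GitHub project (which is a Sitecore computed field written for .NET); it is a minimal, language-agnostic illustration in Python using only the standard library. The names `BodyTextExtractor`, `extract_body_text` and `fetch_page_text` are hypothetical, and skipping `<script>` and `<style>` content is an assumption about what you would want to keep out of the index.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class BodyTextExtractor(HTMLParser):
    """Collects the text found inside <body>, skipping <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}  # assumption: not useful as search content

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and self.skip_depth == 0:
            # Collapse runs of whitespace so the indexed text stays compact.
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def extract_body_text(html):
    """Return the text of the <body> section as a single string."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_page_text(url, timeout=10):
    """Download the rendered page at `url` and return its body text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_body_text(html)
```

In the Sitecore setting, the equivalent of `fetch_page_text` would run inside the computed field for each indexed item, with the URL resolved from that item, so that the text of all datasource components ends up in the page's own index entry.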

You can find more information, including installation instructions, on the GitHub page for Sitecore Html Crawler (just scroll down a little).

There are some limitations to this approach that you should know about.

  1. This doesn’t work well with wildcard items, because a wildcard item is only indexed once, not once for every item that can be displayed on the wildcard item’s page. You would need to change the actual indexer to support that, or deal with it in your search query.
  2. I’ve only tested this with Lucene. I think changing the configuration a little could make it work for other providers such as Solr.

Comments (2)

  1. 23:51, 2014/01/30 Johann

    Hi Robin,

    I was thinking of something like that, but my problem was the device.
    If you use more than one device (like a mobile website plus the default desktop one), your computed field will only store one of the page renderings. The mobile device may not show the same components (and therefore content) as the default one, so if you search from the mobile site, you’ll get results from the desktop rendering.

    If you have any idea how to solve this problem ^^ (maybe by creating a computed field per device, but that is hard to turn into a module, or at least requires changes)

    • 09:58, 2014/01/31 Robin Hermanussen

      Hi Johann,

      I’m sure there are several cases where this solution is not sufficient ‘out of the box’. But since it’s only one class and one config file, it should be easy enough to adapt to your specific needs.

      Adding another configured computed field and changing its behaviour to (for example) index the mobile site should be easy enough.

      Also, I’m currently indexing all the content in the ‘body’ section. For specific sites it may be better to index only the parts of the page that contain relevant content (excluding menus and overview lists, for example). So changing that logic for your site would be a good idea.

      The solution on GitHub is really nothing more than a starting point. Hopefully, a very good one that will already be adequate for the simplest scenarios. So go ahead and fork!

      Best Regards,

      Robin
