After competing in the Sitecore hackathon last weekend, I got excited. So when Kyle Heon posted a particular tweet today, I immediately knew what I would be doing this evening: creating a new (small) project on GitHub.
So how does #Sitecore v7 Search work when your page is comprised of page editor components?
— Kyle Heon (@kyleheon) January 30, 2014
And here you can find the result. Basically, I created a computed field that makes an HTTP request and downloads the rendered page that you are indexing.
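To make the idea concrete, here is a rough, language-neutral sketch in Python. It is not the code from the GitHub project (a real Sitecore computed field would be a .NET class), and the names `extract_text` and `compute_field_value` are made up for illustration. It only shows the general shape, assuming the indexed value is simply the visible text of the rendered page: request the page, strip the markup, and hand the remaining text to the index.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def compute_field_value(page_url):
    """Hypothetical stand-in for the computed field: fetch the rendered
    page (datasource content included) and return its text for indexing."""
    with urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_text(html)
```

Because the text comes from the rendered page rather than from the item's own fields, anything a component pulls in from a datasource ends up in the same indexed value as the page itself.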
The reason you would want to do this is how content composition in the page editor works. You can add components to a page that reference separate Sitecore items as associated content (a.k.a. datasources). But the content of those referenced items is indexed separately, and is therefore difficult to search for using Sitecore.ContentSearch (Sitecore 7 search) if you want ‘pages’ to be the results, as regular site searches usually do.
You can find more information, including installation instructions, on the GitHub page for Sitecore Html Crawler (just scroll down a little).
There are some limitations to this approach that you should know about.