Making Good Progress on the Search feature

I'm making good progress implementing the site Search feature using DotLucene (not to be confused with Lucene.NET). Big thanks to Martijn Boland for good advice and for good code examples of using DotLucene in his Cuyahoga Web Site Framework. His project is also a good reference for those who want to use NHibernate in ASP.NET.

The search is functional and working on this site, though it still needs some refinement before I make a release. The search index is maintained as content is created and updated, so it always stays in sync. Search results are filtered by role, and the index is updated whenever view permissions for a page change, so the filtering is always correct.
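
To make the idea concrete, here is a minimal sketch in Python of an index that is updated as content changes and filtered by role at query time. It is not the mojoPortal or DotLucene code. The names SearchIndex, IndexEntry, upsert, set_page_roles and search are all hypothetical, and a plain in-memory dictionary stands in for the real full-text index.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    page_id: int
    title: str
    text: str
    view_roles: frozenset  # roles allowed to view the containing page


@dataclass
class SearchIndex:
    # Hypothetical in-memory stand-in for a real full-text index.
    entries: dict = field(default_factory=dict)  # keyed by (page_id, item_key)

    def upsert(self, item_key, entry):
        """Add or replace one content item; call on create and on update."""
        self.entries[(entry.page_id, item_key)] = entry

    def set_page_roles(self, page_id, view_roles):
        """Rewrite the stored roles for every indexed item on one page."""
        for entry in self.entries.values():
            if entry.page_id == page_id:
                entry.view_roles = frozenset(view_roles)

    def search(self, term, user_roles):
        """Return sorted titles of matches the user is allowed to view."""
        term = term.lower()
        roles = set(user_roles)
        return sorted(
            e.title
            for e in self.entries.values()
            if term in e.text.lower() and roles & e.view_roles
        )
```

Storing the view roles with each entry is what makes a permission change expensive: every item on the page has to be touched, which is why a page holding a large forum is the worst case.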

Search is implemented in:
Html Content
Blogs
Forums
Event Calendar
Image Gallery
Links
Shared Files

The biggest challenge is getting good performance when building the index for an existing site, which will be needed for users upgrading from older versions of mojoPortal. I've created a button to build the index for the whole site, but the quandary I face is that after the upgrade people should really never click it. Especially if the site gets really big, say hundreds of pages or thousands of forum posts, it would be best to leave the indexes to manage themselves. But if the button is there, people will click it.

One option, I suppose, is to put the button in the page settings and make the user manually index each page rather than the whole site. This would break the work into smaller chunks, but it still would not be a good thing to do on the page that has the forums once you have a thousand posts, as each post will be indexed. I don't know of anyone currently using mojoPortal who has hundreds of pages or thousands of posts, so maybe I shouldn't worry about it too much. Building the index will definitely be an Admin-only feature.

Comments

re: Making Good Progress on the Search feature

Friday, July 8, 2005 5:13:52 PM
Hi Joe,

Can you replace the button with a command-line program that builds the index for the whole site, and then place the program along with the DB upgrade scripts?  That way it will be clear that only upgraders need to use it and should only use it once.

--Dean

Joe

re: Making Good Progress on the Search feature

Friday, July 8, 2005 5:35:12 PM
That actually crossed my mind, but I do have to have a facility for re-indexing all the content for a page if role permissions for the page change, so having the button at the page settings level is no worse than what I already have to accommodate. But it will still be ugly if you decide to change permissions on the page that contains the forums after you have a bajillion posts. Hopefully people will be wise about the page that has forums and not change permissions after they have a lot of posts. It really shouldn't have that much impact on normal pages with just a typical length of Html content. View permissions are at the page level and I don't want to return search results to users who don't have view permissions, so if the permissions change I need to rebuild the index for all modules contained within the page. Same thing if content is moved from one page to another as part of an authoring/publishing process: the view permissions on the published page are likely to be different from those on the page in the authoring area where content is written and approved. So I'm leaning toward having a button at the page level, with a warning dialog when the button is clicked so they know the issue and have a chance to back out.

I'm going to look into whether I can spawn a separate thread from a web request to kick off the process but return the response quickly to the browser and just notify that processing has been initiated.
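
Roughly what I have in mind, sketched in Python rather than the actual ASP.NET code. The names start_background_job and handle_reindex_request are made up for illustration, and build_index is assumed to be whatever function does the slow work.

```python
import threading


def start_background_job(job, *args):
    """Start job(*args) on a worker thread and return without waiting for it."""
    # daemon=True means the thread will not keep the process alive on exit,
    # so an unfinished job is simply dropped if the process shuts down.
    worker = threading.Thread(target=job, args=args, daemon=True)
    worker.start()
    return worker


def handle_reindex_request(build_index, page_id):
    """Hypothetical request handler: kick off the work, then respond right away."""
    worker = start_background_job(build_index, page_id)
    return "Indexing has been started for page %d." % page_id, worker
```

The response only says that processing has been initiated. Nothing here reports completion, so that would have to come from somewhere else, such as a log entry.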

re: Making Good Progress on the Search feature

Saturday, July 9, 2005 6:56:46 PM
I understand why changing permissions would require reindexing, but why make it a separate button?  Why not just reindex whenever the permissions change?  Regardless, doing the reindexing on a separate thread makes sense.

--Dean

Joe

re: Making Good Progress on the Search feature

Sunday, July 10, 2005 3:23:24 AM
Yes the plan is to automatically re-index when page permissions change, no need to click a button. The only need for the button really is for upgrading from previous versions.

Right now I have the button in Site Settings, which means it will re-index all site content at once. If I were to remove that button and put one at the page level, it would be a little more work for the admin doing the upgrade because he would have to click the button on every page's Page Settings, but it would index smaller chunks of the site at a time.

Indexing at the page level would also solve another cosmetic issue that is minor but difficult to solve. The menu is set up to use the pageindex to determine which menu item to highlight. If the page is not in the top level, the pageindex used is really the pageindex of the topmost parent, so that the parent is highlighted. It makes no sense to highlight the actual page's menu item if it is not a top-level page, because its menu item is not visible except during mouse over of the top level. The Site Settings knows the pageindex for the active page, but if it has to index all pages it cannot know the pageindex of every page, as that is really a mechanism managed by the skmMenu, not the SiteSettings itself. So the upshot is that if you re-index the whole site, it may not highlight the correct page when you click a search results link, while if you index a page at a time the correct pageindex is stored in the search index, so the links in the search results are built correctly every time and do highlight the correct item.
 
I do have it working on a separate thread now and that seems much better, though currently there is nothing indicating when the job is done. Again, since indexing is not something that should be done more than once, I'm inclined not to put in any notification. I've avoided the use of session variables entirely so far and would rather keep it that way. If you have the log4net.config file set for logging DEBUG info, it does show in the log, but that is also not something you would want to leave configured that way.

re: Making Good Progress on the Search feature

Sunday, July 10, 2005 9:07:50 AM
I'm afraid I still don't see why you need a button for the upgrade case.  From a user interface perspective, I think it makes more sense for the admin to run an executable one time during an upgrade, instead of having an ever-present button that shouldn't normally be clicked or a bunch of buttons that shouldn't normally be clicked.

Regarding the pageindex issue, couldn't the reindexing executable iterate through all pages and determine their pageindexes before reindexing them?

Regarding notification, I agree it isn't a big deal at this point, though it might be more appropriate to log the "reindexing complete" message at INFO level.

Anyway, all this is just my $.02.  You obviously know better than I do what is practical and what isn't with the current codebase.

--Dean

Joe

re: Making Good Progress on the Search feature

Sunday, July 10, 2005 9:38:11 AM
Well, the command line would have the same problem as far as the pageindex, because it is not a property of the page object per se but rather something managed by the skmMenu when it builds the menu for the request. Its only purpose is to highlight the correct menu node at the root level of the menu. Any page that is not at the root level of the menu gets the same pageindex as its topmost parent, so that the topmost parent, which is a root-level menu item, is highlighted.
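
Just to illustrate the rule itself, here is a small Python sketch. It is not how skmMenu assigns the value. It assumes you already have each page's parent and the order of the root-level menu items, which is exactly the information that is only available while the menu is being built. The names topmost_parent, menu_page_indexes, parent_of and root_order are hypothetical.

```python
def topmost_parent(page_id, parent_of):
    """Walk up the parent links until a root-level page is reached."""
    seen = set()
    current = page_id
    while parent_of.get(current) is not None:
        if current in seen:
            raise ValueError("cycle in page hierarchy at %r" % current)
        seen.add(current)
        current = parent_of[current]
    return current


def menu_page_indexes(root_order, parent_of):
    """Map every page to the menu position of its topmost parent.

    root_order lists the root-level pages in menu order (an assumption);
    parent_of maps each page to its parent, or None for root-level pages.
    """
    root_index = {page_id: i for i, page_id in enumerate(root_order)}
    return {
        page_id: root_index[topmost_parent(page_id, parent_of)]
        for page_id in parent_of
    }
```

With two root items, home and docs, a page nested two levels under docs ends up with the same index as docs, which is what lets the root-level item be highlighted.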

Your point about more buttons that should not be clicked is well taken, though, so I'm not going to put the button at the Page Settings level.

I could either stick with the one button on the Site Settings page, which will prompt before executing to give the user a chance to cancel, or I may try to do something in the Application_OnStart in the Global.asax.cs to see if the index folder is empty and, if so, spawn a thread to build it. If that works I could do away with the button on Site Settings and not have any buttons that should not be clicked.

Good point on the logging at INFO level also! I guess I could make that the default log level instead of ERROR and just use it judiciously to keep the logging from getting too verbose.

Thanks as always for your good input! It helps having people to bounce these ideas around.

Joe

re: Making Good Progress on the Search feature

Sunday, July 10, 2005 12:57:49 PM
I was able to do it with no buttons!

My idea of checking for the index in Application_OnStart didn't work because you can't get at any of the request information in that method.

So I did it in the Search Results page. If you do a search and no results are returned, it will check whether any index files exist. If not, it will queue up the job of building the index on a new thread and display a message indicating that the index is being built and to wait a few minutes, then try the search again.
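
The flow looks roughly like this, sketched in Python rather than the real page code. LazyIndexBuilder, search_page, run_query and build_index are hypothetical names. The lock that keeps two simultaneous searches from both starting a build is something I added for the sketch, not something described above.

```python
import os
import threading

BUILDING_MESSAGE = (
    "The search index is being built. Please wait a few minutes and try again."
)


def index_exists(index_dir):
    """True if the index folder exists and contains at least one file."""
    return os.path.isdir(index_dir) and bool(os.listdir(index_dir))


class LazyIndexBuilder:
    """Starts the index build at most once, on a background thread."""

    def __init__(self, index_dir, build_index):
        self.index_dir = index_dir
        self.build_index = build_index
        self.worker = None
        self._lock = threading.Lock()
        self._started = False

    def ensure_started(self):
        """Start the build if it has not been started; return True if started now."""
        with self._lock:
            if self._started:
                return False
            self._started = True
            self.worker = threading.Thread(
                target=self.build_index, args=(self.index_dir,), daemon=True
            )
            self.worker.start()
            return True


def search_page(term, run_query, builder):
    """Return (results, message); message is set only while the index is missing."""
    results = run_query(term)
    if results or index_exists(builder.index_dir):
        return results, None
    builder.ensure_started()
    return [], BUILDING_MESSAGE
```

An empty result with an existing index is treated as a genuine "no matches", so the build is only triggered when the index folder really has no files in it.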

Seems much better than having buttons that should only be clicked once.

re: Making Good Progress on the Search feature

Monday, July 11, 2005 4:57:23 PM
Sounds great.  Thanks!

--Dean
