Rebuilding the Search Index

The search index in the mojoPortal content management system is based on Lucene.NET. The search index files live in the file system under /Data/Sites/[SiteID]/index.

You can rebuild the search index by deleting all files from the index folder and then running a new search. mojoPortal will detect that the index folder is empty and rebuild the index. It is wise to back up the files before deleting them.
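
The manual procedure of backing up and then emptying the folder can be scripted. The following Python helper is a hypothetical sketch and is not part of mojoPortal. The function name and the timestamped backup layout are assumptions. It copies the index folder to a backup first and only then removes the files inside it. The empty folder is left in place so the next search can trigger a rebuild.

```python
import shutil
import time
from pathlib import Path


def backup_and_clear_index(index_dir: str, backup_root: str) -> Path:
    """Hypothetical helper: back up a mojoPortal index folder, then empty it.

    The folder itself is kept (only its contents are removed), matching the
    "delete all files from the index folder" approach.
    """
    index_path = Path(index_dir)
    if not index_path.is_dir():
        raise FileNotFoundError(f"index folder not found: {index_path}")

    # Copy first: if the backup fails, an exception is raised and nothing is deleted.
    # copytree also creates any missing parent folders of the backup path.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = Path(backup_root) / f"index-{stamp}"
    shutil.copytree(index_path, backup_path)

    # Remove everything inside the index folder, but not the folder itself.
    for entry in index_path.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return backup_path
```

For example, you might call `backup_and_clear_index(r"C:\inetpub\mysite\Data\Sites\1\index", r"C:\index-backups")`. Both paths are placeholders for your own installation and site ID. If a file cannot be deleted, the call raises an error rather than silently continuing.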

There is also a hidden button for rebuilding the search index, which is shown to administrators if you set the following key to true in Web.config or user.config:

<add key="ShowRebuildSearchIndexButtonToAdmins" value="true" />

Rebuilding the search index is not something you should do frequently. Generally the index maintains itself and you don't have to touch it, but if you are having a search-related problem a rebuild can help. On a small site rebuilding the index is fairly trivial. On a large site with a lot of activity it can be very dodgy: it may take a while and it can stall out. It's best to back up your index files first.

Here is how it works. When you delete the index files, either with the button or manually, all site pages are first processed by each of the index builders. Each searchable feature has its own index builder. If a page contains a feature, that feature's index builder processes the content into the mp_IndexingQueue table. An IndexWriter task then runs on a background thread and processes the queue, in sequential order, into the search index.

The process can be dodgy because the task may index all the rows in the table and then quit because the table is empty, while the index builders are still queuing items. The task never saw those items because there were no rows to process when it checked. Any time you edit content, the IndexWriter task is kicked off again, and it should then resume processing the queue. You can monitor running tasks from the Administration > Task Queue page. If the IndexWriter task is still running, be patient: it sleeps periodically to allow time for more rows to arrive in the queue table.

It can be tricky business and something could go wrong. For example, the application pool could be recycled, which would also kill the task on the background thread. In some cases, for large sites, I have resorted to rebuilding the index on a separate machine, such as a development or staging machine. I use a fresh restore of the database so that the machine has the latest content.
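
The IndexWriter race described above can be illustrated with a small, deterministic simulation. This is a simplified Python model and not mojoPortal's actual IndexWriter code. A deque stands in for the mp_IndexingQueue table, a list stands in for the Lucene index, and time advances in discrete ticks. The parameter max_idle_ticks is a hypothetical stand-in for however long the real task sleeps before giving up. On each tick the writer indexes one queued row, or sleeps if the queue is empty. After too many empty checks it quits, even though index builders may still add rows later. The parameter restart_ticks models content edits that kick off the writer again.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple


def simulate_index_writer(
    arrivals: Dict[int, List[str]],
    max_idle_ticks: int,
    total_ticks: int,
    restart_ticks: Iterable[int] = (),
) -> Tuple[List[str], List[str]]:
    """Simplified model: return (items indexed, items still waiting in the queue)."""
    queue: Deque[str] = deque()  # stands in for the mp_IndexingQueue table
    indexed: List[str] = []      # stands in for the Lucene search index
    restarts = set(restart_ticks)
    writer_running = True
    idle = 0

    for tick in range(total_ticks):
        # Index builders keep queuing rows whether or not the writer is running.
        queue.extend(arrivals.get(tick, []))

        # Hypothetical edit event: a content edit kicks off the writer again.
        if tick in restarts and not writer_running:
            writer_running, idle = True, 0

        if not writer_running:
            continue
        if queue:
            indexed.append(queue.popleft())  # process rows in sequential order
            idle = 0
        else:
            idle += 1                        # queue empty: sleep, then recheck
            if idle > max_idle_ticks:
                writer_running = False       # queue looked empty, so the task quits

    return indexed, list(queue)
```

Consider arrivals `{0: ["page1", "page2"], 8: ["page3"]}` with `max_idle_ticks=3` and `total_ticks=12`. The writer quits at tick 5 and page3 is stranded in the queue. Raising `max_idle_ticks` to 10, or passing `restart_ticks={10}` to simulate an edit, gets page3 indexed. This mirrors the advice above: be patient, or edit some content to restart the writer.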
After rebuilding the index on the separate machine, upload the index files to the production server, having first backed up and deleted the existing index there. Of course, there is some risk that new content is added to the site while you are working. That content will not be in the search index until the next time it is edited.
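
The upload step can be sketched as a script as well. This hypothetical Python helper runs on the production server. Its name, arguments, and backup layout are assumptions. It backs up the live index folder, clears it, and copies in the index that was rebuilt elsewhere. It refuses to run if the rebuilt folder is missing or empty, so a bad copy cannot wipe the live index. Because mojoPortal rebuilds when a search finds an empty index folder, a search arriving between the clear and copy steps could presumably start a rebuild, so keep that window short. The dirs_exist_ok argument requires Python 3.8 or later.

```python
import shutil
import time
from pathlib import Path


def install_rebuilt_index(rebuilt_dir: str, live_index_dir: str, backup_root: str) -> Path:
    """Hypothetical helper: back up the live index, clear it, copy in a rebuilt index."""
    rebuilt = Path(rebuilt_dir)
    live = Path(live_index_dir)

    # Refuse to proceed if the rebuilt copy is missing or empty; otherwise we
    # would wipe the live index and leave nothing in its place.
    if not rebuilt.is_dir() or not any(rebuilt.iterdir()):
        raise ValueError(f"rebuilt index is missing or empty: {rebuilt}")

    # 1. Back up the existing live index to a timestamped folder.
    backup = Path(backup_root) / f"index-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.copytree(live, backup)

    # 2. Delete the existing index files, keeping the folder itself.
    for entry in live.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # 3. Copy the rebuilt index files into the now-empty live folder.
    shutil.copytree(rebuilt, live, dirs_exist_ok=True)
    return backup
```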

Removing Duplicates from the Search Index

Occasionally a content item can end up duplicated in the search index. Normally each content item has a unique key within the index, and that key is used to delete the existing item before a new or updated item is added. However, due to timing or an interruption of the indexing process, duplicate keys can end up in the index, because the underlying Lucene index does allow multiple items with the same key.

Rebuilding the search index should generally clear up duplicates, and if you have a lot of them, that is the way to go. If there are only one or a few duplicates and the index is large, you may prefer to remove them more surgically. Editing the content item again should cause it to be re-indexed, which should remove the duplicate: the delete step removes every copy with that key before a single fresh copy is added. Be sure to wait a few minutes after editing to allow time for the indexing to happen before you expect to see the change.

We also have a file named IndexBrowser.aspx in our source code repository. You can grab that file, drop it into your site, and then navigate to it. This page lets you browse the contents of the search index directly and delete items if needed. However, it deletes by key, so if you delete an item that has a duplicate key, both copies will be deleted. You can delete a duplicate this way without losing both copies only if the duplicate somehow has a different key but the same content.
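
The duplicate-key behavior can be modeled in a few lines. In this Python sketch a list of dicts stands in for the Lucene index, and the field name "key" is an assumption. This is not mojoPortal's code. Real Lucene exposes comparable operations on IndexWriter, namely deleteDocuments(Term) and updateDocument(Term, doc), or DeleteDocuments and UpdateDocument in Lucene.NET. The model shows how interleaved updates leave two copies, why re-indexing clears them, and why deleting by key removes both copies.

```python
from collections import Counter
from typing import Dict, List

Doc = Dict[str, str]  # assumed shape, e.g. {"key": "page-1", "content": "..."}


def delete_by_key(index: List[Doc], key: str) -> int:
    """Remove every document with this key; return how many were removed."""
    before = len(index)
    index[:] = [doc for doc in index if doc["key"] != key]
    return before - len(index)


def update_item(index: List[Doc], key: str, content: str) -> None:
    """Re-index an item: delete all copies with the key, then add one fresh copy."""
    delete_by_key(index, key)
    index.append({"key": key, "content": content})


def find_duplicate_keys(index: List[Doc]) -> List[str]:
    """List keys that occur more than once, i.e. candidates for cleanup."""
    counts = Counter(doc["key"] for doc in index)
    return sorted(key for key, n in counts.items() if n > 1)


# Two interleaved updates of the same item: both delete steps run before
# either add step, so two documents with key "page-1" end up in the index.
index: List[Doc] = [{"key": "page-1", "content": "v1"}]
delete_by_key(index, "page-1")
delete_by_key(index, "page-1")
index.append({"key": "page-1", "content": "v2"})
index.append({"key": "page-1", "content": "v2"})
print(find_duplicate_keys(index))  # ['page-1']

# Editing the item again re-indexes it and leaves a single copy.
update_item(index, "page-1", "v3")
print(find_duplicate_keys(index))  # []
print(len(index))                  # 1
```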