Duplicate Search Results Coming from internal search

This is an open forum for any mojoPortal topics that don't fall into the other categories.

This thread is closed to new posts.
1/6/2011 11:12:24 AM
Total Posts 76

Duplicate Search Results Coming from internal search

Guys,

I have rolled out my company's new corporate website (thanks for the past help and a great CMS).

But I noticed that the internal search was returning multiple results for the same page.

I haven't touched any of the search settings. 

We build software that banks and hedge funds use to analyze the risk of trades and portfolios (among other things).

But when I search for the term "risk", the first 4 results are all for the same page.

http://www.quantifisolutions.com/SearchResults.aspx?q=risk

Some results are missing descriptive text. Is that because of missing meta info? Our meta content isn't fully fleshed out yet.

Any help would be appreciated, so I can focus on where to fix things.

I would like to avoid altering the logic that creates/queries the indexes unless I need to.

Thanks in advance!

Warner

1/6/2011 11:18:26 AM
Total Posts 1203
Proud member of the mojoPortal team

Re: Duplicate Search Results Coming from internal search

We had that same problem (I think it was a bug in one of the older versions of mojoPortal that was causing it). We rebuilt the search index, and it took care of it. I also added the key

<add key="ShowRebuildSearchIndexButtonToAdmins" value="true" />

to user.config, so the webmaster can rebuild the search index in the future if things get out of whack.

Jamie

1/6/2011 11:18:39 AM
Total Posts 76

Re: Duplicate Search Results Coming from internal search

Oops, I meant to post this in the developer forum. I will repost there... please respond there.

1/6/2011 11:24:25 AM
Total Posts 76

Re: Duplicate Search Results Coming from internal search

Jamie,

Thanks, but I am running close to the cutting edge of mojoPortal (v2.3.5.8).

I have rebuilt the indexes many times... no luck.

I moved my question to the developer forum: http://www.mojoportal.com/Forums/Thread.aspx?thread=7149&mid=34&pageid=5&ItemID=9&pagenumber=1#post29524

1/6/2011 11:33:29 AM
Total Posts 18439

Re: Duplicate Search Results Coming from internal search

Hi,

The thing to understand is that each instance of html content on the page is indexed separately, so if there are four instances of html content on the page and they all contain the word "risk", the page will show up four times in the results, once for each html item on it.
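
To make the counting concrete, here is a toy model in Python. It is not mojoPortal's code (mojoPortal is .NET and uses Lucene); it only assumes what Joe describes: every HTML instance becomes its own index document carrying the URL of the page it lives on. The `IndexDocument` type, the URLs, and the sample text are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDocument:
    """One entry in the search index: a single feature instance, not a page."""
    module_id: int
    page_url: str
    text: str


def search(index: list[IndexDocument], term: str) -> list[IndexDocument]:
    """Return every document whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [doc for doc in index if needle in doc.text.lower()]


# Hypothetical page with four HTML instances, each indexed separately.
url = "/solutions.aspx"
index = [
    IndexDocument(1, url, "Market risk analytics for portfolios"),
    IndexDocument(2, url, "Counterparty risk and CVA"),
    IndexDocument(3, url, "Credit risk pricing models"),
    IndexDocument(4, url, "Risk reporting for hedge funds"),
    IndexDocument(5, "/about.aspx", "Company history"),
]

hits = search(index, "risk")
print(len(hits))                            # prints 4
print(len({doc.page_url for doc in hits}))  # prints 1
```

Four hits, one distinct page: nothing is broken in the index, it simply has four matching documents that share a URL.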

Really the question should not have been moved to the developer forum because it is not a question about working with source code in visual studio. This forum or the questions about site administration forum are both more appropriate for this question.

In the next release of mojoPortal there will be an option in the settings for each html instance in case you want to exclude it from the search index, but note that this excludes that instance from search completely.

Hope it helps,

Joe

1/6/2011 1:45:03 PM
Total Posts 76

Re: Duplicate Search Results Coming from internal search

Thanks Joe,

Makes sense now. I look forward to that feature.

If I might be so bold, I'd like to request the ability to limit the results so that each page is listed only once in the search results.

1/6/2011 2:17:11 PM
Total Posts 1203
Proud member of the mojoPortal team

Re: Duplicate Search Results Coming from internal search

I just thought of something else. If you add this key, search results will include contextual information, which makes it clearer that the results come from different portions of the same page:

<add key="EnableSearchResultsHighlighting" value="true" />

You can see what this looks like on our web site.
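
Roughly, "contextual information" means each hit shows the fragment of that instance's text around the matched term instead of just the page title, so four hits on one page stop looking identical. The sketch below is a simplified stand-in written for illustration, not mojoPortal's highlighting code; the function name `highlight_snippet`, the `<b>` markup, and the 30-character window are all assumptions.

```python
import re


def highlight_snippet(text: str, term: str, window: int = 30) -> str:
    """Return an excerpt around the first match of `term`, with every match
    inside the excerpt wrapped in <b>...</b>. Returns "" if there is no match."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return ""
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    excerpt = text[start:end]
    # Mark trimmed text at either end so the reader knows it is a fragment.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    highlighted = pattern.sub(lambda m: f"<b>{m.group(0)}</b>", excerpt)
    return prefix + highlighted + suffix


print(highlight_snippet("Risk reporting", "risk"))  # prints <b>Risk</b> reporting
```

Two hits for the same page would then render with different excerpts, one per html instance.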

1/7/2011 12:50:56 PM
Total Posts 76

Re: Duplicate Search Results Coming from internal search

Thanks!

I played with that, but in the end I turned on Bing search, which removed the duplicate entries for the same page...

"The suits" like the results better now.

I didn't feel like paying for Google search (we would have to pay, since we don't want Google ads).

1/7/2011 7:07:10 PM
Total Posts 1203
Proud member of the mojoPortal team

Re: Duplicate Search Results Coming from internal search

I'm glad you came up with a solution that works for you. What's kind of funny is that with our old HTML site we were paying for a search service, so the mojoPortal Lucene search was heaven-sent!

1/11/2011 8:37:08 AM
Total Posts 76

Re: Duplicate Search Results Coming from internal search

I do plan on looking into ways to customize the internal search engine results, since I am now working on a high-security client portal site where external crawlers won't work. But hopefully I can put it off until some of the changes get into source control. :)

Thanks, everyone, for the help!

1/11/2011 1:29:27 PM
Total Posts 18439

Re: Duplicate Search Results Coming from internal search

The internal search engine is a complex system because it has to account for permissions and whether the user can see the page, so that we don't leak any secure data in search results. Basically, we also store the view roles in the search index, and we pass in the user's roles during a search so we can filter the results. That means when view roles change on pages or content, the content has to be re-indexed. Crawler-based search engines have it much easier: they index by specific URL, so there are no duplicates.
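
A minimal sketch of the permission rule described above: each index document carries the roles allowed to view it, the caller passes in the current user's roles, and a hit is kept only if the two sets overlap. This only illustrates the filtering rule, not how or where mojoPortal applies it; the field names are hypothetical, and "All Users" and "Clients" are example role names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecuredDocument:
    """Index entry that stores a copy of its view roles next to the content."""
    page_url: str
    text: str
    view_roles: frozenset[str] = field(default_factory=frozenset)


def can_view(doc: SecuredDocument, user_roles: set[str]) -> bool:
    """A user may see a hit if they hold at least one of the document's view roles."""
    return not doc.view_roles.isdisjoint(user_roles)


def secure_search(index: list[SecuredDocument], term: str,
                  user_roles: set[str]) -> list[SecuredDocument]:
    """Match on text, then drop any hit the user is not allowed to see."""
    needle = term.lower()
    return [
        doc for doc in index
        if needle in doc.text.lower() and can_view(doc, user_roles)
    ]


index = [
    SecuredDocument("/public.aspx", "Public risk overview", frozenset({"All Users"})),
    SecuredDocument("/clients.aspx", "Client risk reports", frozenset({"Clients", "Admins"})),
]

anonymous = secure_search(index, "risk", {"All Users"})
client = secure_search(index, "risk", {"All Users", "Clients"})
```

Because the roles are copied into the index, a stale document keeps its old permissions until the item is re-indexed, which is why a change to view roles has to trigger re-indexing.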

In mojoPortal, content is not indexed by URL and not by a crawler, and the "Page" is not what gets indexed. Instead, the feature instances on the page are indexed: if a feature is searchable, it is responsible for indexing its own content. So the structure of the search index is not the same as the structure of the site. There is a document for each indexable item in the search index, and each matching document is a hit in the search results, even if more than one document points to the same URL. Pages don't know how to index anything, and a page may contain any number of searchable/indexable features, including custom features that developers implement themselves with their own search support.
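
The "features index themselves" design can be sketched as a small interface: the page only holds feature instances and supplies its URL, and each searchable feature turns its own content into index documents. This is an illustrative Python model, not mojoPortal's API; `SearchableFeature`, `HtmlFeature`, `FaqFeature` (standing in for a developer's custom feature), `ImageFeature`, and `build_index` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class IndexDocument:
    module_id: int
    page_url: str
    text: str


@runtime_checkable
class SearchableFeature(Protocol):
    """Anything that can turn its own content into index documents."""

    def index_documents(self, page_url: str) -> list[IndexDocument]: ...


@dataclass
class HtmlFeature:
    module_id: int
    body: str

    def index_documents(self, page_url: str) -> list[IndexDocument]:
        # One Html instance produces exactly one document.
        return [IndexDocument(self.module_id, page_url, self.body)]


@dataclass
class FaqFeature:
    """Hypothetical custom feature that indexes each question as its own document."""
    module_id: int
    questions: list[str]

    def index_documents(self, page_url: str) -> list[IndexDocument]:
        return [IndexDocument(self.module_id, page_url, q) for q in self.questions]


@dataclass
class ImageFeature:
    """A feature with no search support: it has no index_documents method."""
    module_id: int
    src: str


@dataclass
class Page:
    url: str
    features: list[object] = field(default_factory=list)  # the page never indexes itself


def build_index(pages: list[Page]) -> list[IndexDocument]:
    """Ask each searchable feature for its documents; non-searchable ones are skipped."""
    index: list[IndexDocument] = []
    for page in pages:
        for feature in page.features:
            if isinstance(feature, SearchableFeature):
                index.extend(feature.index_documents(page.url))
    return index


page = Page("/solutions.aspx", [
    HtmlFeature(1, "Market risk analytics"),
    HtmlFeature(2, "Credit risk pricing"),
    FaqFeature(3, ["What is CVA?", "How is counterparty risk measured?"]),
    ImageFeature(4, "/images/chart.png"),
])
index = build_index([page])
```

One page yields four documents that all point to `/solutions.aspx`, so the index shape follows the features, not the site map.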

The only URL-based filtering approach I can think of would be done during databinding of the search results, where we "could" keep track of the URL for each result and filter an item out if the previous result had the same URL. The downside of this approach is that it is problematic for paged search results: if the page size is 10 and the first 10 items have the same URL, that kind of filtering would render only 1 row on the first page of results. So, I'm not real keen on that approach.
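
The paging problem is easy to reproduce. Below, `filter_page_by_url` mimics the databinding approach: take one page of results, then drop a row when the previous row had the same URL. For contrast, `collapse_then_page` keeps only the first hit per URL across the whole result set before paging, which keeps pages full but means every hit has to be retrieved and scanned first. Both functions are hypothetical illustrations, not mojoPortal code; hits are modeled as URLs in rank order.

```python
def filter_page_by_url(hits: list[str], page: int, page_size: int) -> list[str]:
    """Page first (1-based), then drop any row whose URL matches the previous row's."""
    start = (page - 1) * page_size
    rows = hits[start:start + page_size]
    rendered: list[str] = []
    previous = None
    for url in rows:
        if url != previous:
            rendered.append(url)
        previous = url
    return rendered


def collapse_then_page(hits: list[str], page: int, page_size: int) -> list[str]:
    """Keep the first (highest-ranked) hit for each URL across all results, then page."""
    seen: set[str] = set()
    unique: list[str] = []
    for url in hits:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    start = (page - 1) * page_size
    return unique[start:start + page_size]


# Ten hits for the same page, followed by hits on fifteen other pages.
hits = ["/solutions.aspx"] * 10 + [f"/page{i}.aspx" for i in range(1, 16)]

print(filter_page_by_url(hits, page=1, page_size=10))       # prints ['/solutions.aspx']
print(len(collapse_then_page(hits, page=1, page_size=10)))  # prints 10
```

The first function shows exactly the one-row first page Joe describes; the second avoids it only by giving up the cheap "fetch just one page of hits" shortcut.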

Suggestions I have to help with this are:

  • If you are creating pages with only Html content features on the page and using the built-in column layout, you can instead use 1 instance of Html and do the layout inside that instance using content templates. This would improve page performance by reducing hits to the database, and it would also mean there is only one indexable html item on the page, and therefore only 1 search result. For example, the home page on this site appears to have 3 columns, but that is really just 1 instance of Html with the column layout done inside the instance.
  • The new setting that allows you to exclude an html instance from search may help with this; it is already implemented in the source code repository.
  • You could make the search results look more distinct, even though they point to the same page, by showing the instance title. Add this setting to user.config:

    <add key="ShowModuleTitleInSearchResultLink" value="true"/>

    I can't remember whether this requires rebuilding the index; if the titles don't show up, try rebuilding it.
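
The three suggestions can be sketched together in the same toy style: consolidating the columns into one instance leaves one index entry, an `exclude_from_search` flag (a hypothetical stand-in for the new per-instance setting) keeps an instance out of the index entirely, and a `show_module_title` option (loosely mirroring `ShowModuleTitleInSearchResultLink`) adds the instance title to the link text. The thread does not say what link format mojoPortal actually produces; the "Page - Module" pattern here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HtmlInstance:
    module_title: str
    body: str
    exclude_from_search: bool = False  # hypothetical per-instance setting


def index_page(page_title: str, url: str, instances: list[HtmlInstance]) -> list[dict]:
    """Build one index entry per instance, skipping excluded instances entirely."""
    return [
        {"page_title": page_title, "module_title": inst.module_title,
         "url": url, "text": inst.body}
        for inst in instances
        if not inst.exclude_from_search
    ]


def link_text(entry: dict, show_module_title: bool) -> str:
    """Assumed link format: "Page - Module" when module titles are shown."""
    if show_module_title:
        return f"{entry['page_title']} - {entry['module_title']}"
    return entry["page_title"]


# Three column instances, one of them excluded from search.
columns = [
    HtmlInstance("Market Risk", "Market risk analytics"),
    HtmlInstance("Credit Risk", "Credit risk pricing"),
    HtmlInstance("Sidebar", "Contact us about risk", exclude_from_search=True),
]
entries = index_page("Solutions", "/solutions.aspx", columns)
titles = [link_text(e, show_module_title=True) for e in entries]

# The same content consolidated into a single instance with an internal layout.
single = index_page("Solutions", "/solutions.aspx",
                    [HtmlInstance("Solutions", "Market risk ... Credit risk ...")])
```

Here the excluded sidebar never reaches the index, the two remaining hits get distinct link text, and the consolidated page produces a single entry.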

Hope it helps,

Joe
