Problem with search in German

This is the place to report bugs and get support. When posting in this forum, please always provide as much detail as possible.

Please do not report problems with a custom build or custom code in this forum. If you are producing your own build from the source code and have problems or questions, ask in the developer forum instead; do not report it as a bug.

This thread is closed to new posts.
9/4/2008 3:39:05 AM
Total Posts 16

Problem with search in German

Hello!

After using DNN for three years and having a heart attack at every update, I came to mojoPortal. It's an awesome project, and I am getting more and more addicted.

I only have two problems:

  • FCKeditor encodes German special characters (and those of other languages, too) as HTML entities (&...;). There seems to be no way to avoid this, and the search engine simply does not find this content. It is only found if I change the HTML directly in FCKeditor's source mode.
  • Titles are not searchable
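A simplified illustration of the first point: when the stored HTML holds the entity form (&uuml;) and the visitor searches for the literal character (ü), a text comparison finds nothing. The sketch uses a plain substring check as a stand-in for the real search index (it is not Lucene.NET), and the sample content is invented.

```javascript
// Simplified stand-in for a search index: a plain substring check (not Lucene.NET).
function naiveSearch(storedHtml, term) {
  return storedHtml.includes(term);
}

// Hypothetical sample content.
const savedByEditor = "<p>Gr&uuml;&szlig;e aus M&uuml;nchen</p>"; // entity-encoded by the editor
const savedFromSource = "<p>Grüße aus München</p>";                 // edited in source mode, stored as typed

const term = "München";
console.log(naiveSearch(savedByEditor, term));   // → false
console.log(naiveSearch(savedFromSource, term)); // → true
```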

Any ideas?

Thanks in advance, Markus

9/4/2008 8:27:54 AM
Total Posts 18439

Re: Problem with search in German

Hi Markus,

As far as I know, FCKeditor does not do that. Are you sure that encoding is really happening? Have you looked directly in the database to confirm it is encoded there?

You are right about the title search. I just tested it here and fixed it. The fix will be in the SVN trunk by tonight for developers and in the next release for everyone else (hopefully very soon).

Hope it helps,

Joe

9/6/2008 11:49:03 AM
Total Posts 16

Re: Problem with search in German

Hello Joe!

Thanks for your incredibly rapid response.

Yes, I've checked the HtmlContent and blog entries in the database. In both cases the content is stored HTML-encoded. As I mentioned before, if I switch FCKeditor to source mode and change the content manually, it is saved without encoding. The next time I change the same content in the normal edit mode, it reverts to HTML encoding.

I've found a file in a subdirectory of FCKeditor that lists the whole encoded character set. You can find it by searching for &uuml;, for example (ü in German; you can type this character in Windows with ALT+0252). Maybe it helps a little.
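For reference, the relationship between the character, its Windows ALT code and its entity forms can be checked in a few lines. ALT+0252 types code point 252 in the Windows-1252 code page, which for this character coincides with Unicode U+00FC (ü). The `toNumericEntity` helper and the four-entry map below are illustrative only; the FCKeditor file mentioned above lists the complete set.

```javascript
// ALT+0252 types the character with code point 252 (U+00FC).
const ch = String.fromCodePoint(252);
console.log(ch); // → ü

// Hypothetical helper: build the numeric character reference for a character.
function toNumericEntity(c) {
  return `&#${c.codePointAt(0)};`;
}

// Small sample of named entities, not the full table.
const namedEntities = { "ä": "&auml;", "ö": "&ouml;", "ü": "&uuml;", "ß": "&szlig;" };

console.log(toNumericEntity(ch)); // → &#252;
console.log(namedEntities[ch]);   // → &uuml;
```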

Sincerely, Markus

9/6/2008 2:21:50 PM
Total Posts 18439

Re: Problem with search in German

Hi Markus,

I am able to reproduce the problem here. I'm pretty sure FCKeditor didn't use to do this; I think it's a bug introduced sometime in the last 12 months.

Clues I found:

Forum post in FCKeditor forums

I tried setting the new HtmlEncodeOutput setting (as described here and here) to false, but the problem persists.

It also looks like they may have closed a bug ticket because they couldn't reproduce it, but the forum post linked above says it can be reproduced right on their demo site.

Maybe you could follow up with the FCKeditor project and try to get some attention to this bug?

Best,

Joe

9/6/2008 2:42:41 PM
Total Posts 18439

Re: Problem with search in German

Hi Markus,

With a little more experimentation I found a way to stop the encoding, in Web/ClientScript/mojofckconfig.js.

Change this to false:

FCKConfig.ProcessHTMLEntities = false ;

Clear your browser cache, then try the editor again.

Unfortunately, the only way to fix the existing content is to enter it again.

I will make the same change here so it will be fixed in the next release of mojoPortal.

Best,

Joe

9/11/2008 3:25:31 AM
Total Posts 16

Re: Problem with search in German

Hi Joe!

The config tweak did the job. Now everything works like a charm. I think you deserve a beer!

Thanks, Markus 

9/11/2008 6:21:01 AM
Total Posts 18439

Re: Problem with search in German

Hi Markus,

Part of me wonders whether it was really the best solution. Maybe FCKeditor encodes those entities for a good reason (probably W3C validation).

I wonder whether it would also have been solved if I simply HTML-encoded the search input. Maybe a search for ü would then find a match, because the search term would be encoded as &uuml; just like the stored content.
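The encode-the-query idea can be sketched like this: encode the search term the same way the editor encodes content, then compare the encoded forms. This is a hypothetical sketch; `encodeEntities` is an illustrative name (not a mojoPortal or FCKeditor function) and its entity map covers only a few German characters, whereas a real version would have to reproduce FCKeditor's full table and its exact choices.

```javascript
// Hypothetical, minimal entity encoder for a few German characters only.
const ENTITY_MAP = {
  "ä": "&auml;", "ö": "&ouml;", "ü": "&uuml;",
  "Ä": "&Auml;", "Ö": "&Ouml;", "Ü": "&Uuml;", "ß": "&szlig;",
};

function encodeEntities(text) {
  // Replace each mapped character; everything else passes through unchanged.
  return text.replace(/[äöüÄÖÜß]/g, (c) => ENTITY_MAP[c]);
}

// Content as stored by the editor (entity-encoded, hypothetical sample).
const stored = "<p>Gr&uuml;&szlig;e aus M&uuml;nchen</p>";

const query = "München";
const encodedQuery = encodeEntities(query); // "M&uuml;nchen"

console.log(stored.includes(query));        // → false
console.log(stored.includes(encodedQuery)); // → true
```

With a real full-text index the result would also depend on how the analyzer tokenizes `&` and `;`, which could split an encoded word into pieces before it is indexed.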

What do you think?

Best,

Joe

9/15/2008 4:02:03 AM
Total Posts 16

Re: Problem with search in German

Hello Joe!

I think it's a matter of browser compatibility. http://www.w3.org/TR/html4/charset.html states that HTML 4.0, which should be the baseline nowadays, uses the ISO 10646 (Unicode) character set, and all conforming user agents must support it. There are, of course, backward-compatibility issues with older browsers.

http://docs.fckeditor.net/FCKeditor_2.x/Developers_Guide/Configuration/Configuration_Options/ProcessHTMLEntities states that the characters are encoded because of W3C standards, but it doesn't explain the reason, which is very annoying. Maybe it's for cross-compatibility with PHP (PHP has very poor, some people say non-existent, Unicode support).

Apart from this, I believe it is right to store the input as "ü" and not as "&uuml;" in the database. That's why modern database engines support Unicode. So when modern databases, HTML 4, all relevant modern browsers, and modern CMSs running on modern runtimes all support Unicode, I see no reason to avoid it.
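Storing the literal character rather than the entity amounts to decoding entities before the text is saved or indexed. A minimal sketch, assuming a small hand-written named-entity table (a real implementation would need the complete HTML entity list or an HTML parser) and leaving the markup-significant characters (&lt;, &gt;, &amp;, &quot;) encoded so the stored HTML stays valid. `decodeEntities` is a hypothetical name.

```javascript
// Hypothetical sample of named entities; a complete decoder needs the full HTML list.
const NAMED = {
  auml: "ä", ouml: "ö", uuml: "ü", Auml: "Ä", Ouml: "Ö", Uuml: "Ü",
  szlig: "ß", nbsp: "\u00A0",
};

// Entities that must stay encoded because they are significant in HTML markup.
const KEEP = new Set(["lt", "gt", "amp", "quot"]);

function decodeEntities(html) {
  return html.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      // Numeric reference: hex (&#xFC;) or decimal (&#252;).
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      if (code > 0x10FFFF) return match; // not a valid code point, leave as-is
      const ch = String.fromCodePoint(code);
      // Keep markup-significant characters encoded, like their named forms.
      return "<>&\"".includes(ch) ? match : ch;
    }
    if (KEEP.has(body)) return match;
    return NAMED[body] ?? match; // unknown names are left untouched
  });
}

console.log(decodeEntities("Gr&uuml;&szlig;e &amp; M&#252;nchen"));
// → Grüße &amp; München
```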

But maybe it would be better to also consult some people from Japan or China, because their problems in these cases are much greater than ours over here in good old Europe.

Just my 2 cents, Markus

9/15/2008 4:09:01 AM
Total Posts 16

Re: Problem with search in German

I checked your list of mojoPortal sites.

http://xna.pl/ can be searched using Polish special characters, but http://community.crmexpert.cz cannot be searched using Czech ones. Maybe the Polish site found another solution?

Markus

9/15/2008 6:22:52 AM
Total Posts 18439

Re: Problem with search in German

Hi Markus,

I guess for now I will keep the new change to disable entity encoding, but I'm only 95% convinced it's the right choice, given that we are using XHTML, not HTML 4.

One thing to keep in mind about the search is that we are using Lucene.NET, and it does not support all languages. It can't index Swedish characters, for example, so searching on those won't work; perhaps it's the same issue with Czech.
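The language-support point can be illustrated with a tokenizer. The sketch below is not Lucene.NET code: it is a hypothetical comparison showing how an ASCII-only word pattern breaks words containing Swedish or Czech letters into fragments, while a Unicode-aware pattern keeps them whole. Whether Lucene.NET's analyzers actually fail in this particular way is an assumption, not something established in this thread.

```javascript
// Hypothetical tokenizers for illustration only (not Lucene.NET's analyzers).
// ASCII-only: \w matches [A-Za-z0-9_], so letters like å, ä, ö, ř, š act as separators.
function asciiTokens(text) {
  return text.toLowerCase().match(/\w+/g) ?? [];
}

// Unicode-aware: \p{L} and \p{N} match letters and digits from any script ("u" flag required).
function unicodeTokens(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

const swedish = "Räksmörgås";
console.log(asciiTokens(swedish));   // → [ 'r', 'ksm', 'rg', 's' ]
console.log(unicodeTokens(swedish)); // → [ 'räksmörgås' ]
```

With the first tokenizer, a search for the whole word can never match, because that word never ends up in the index.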

Best,

Joe

9/15/2008 10:20:03 AM
Total Posts 16

Re: Problem with search in German

Hello Joe!

You're right to be careful in this case. http://www.w3.org/TR/xhtml1/#guidelines is a short overview of what has to be observed when comparing HTML 4 and XHTML 1. Basically, it shows that XHTML 1 is a screwed-up, XML-ified HTML 4.

I think the whole story has two sides:

  • Presentation in the browser
  • Transfer of data from client to server via the HTTP POST method

Point 1 can be answered quickly. If this "ü" is displayed correctly in your browser, everything should be OK. It would be better to also check this on a Japanese PC, for example. I have only had a quick look at this using Japanese in Opera, and it works.

Point 2 is the big question mark.
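On point 2: a form post carries bytes, and how ü is transmitted depends on the page's character encoding. As a sketch, assuming the page (and therefore the form submission) uses UTF-8, an application/x-www-form-urlencoded body carries ü as its two UTF-8 bytes C3 BC, percent-encoded. The field name `content` is hypothetical.

```javascript
// Assumes a UTF-8 page, so the browser submits the form as UTF-8.
// URLSearchParams produces an application/x-www-form-urlencoded body.
const body = new URLSearchParams({ content: "Grüße" }).toString();
console.log(body); // → content=Gr%C3%BC%C3%9Fe

// Decoding on the receiving side with the same encoding restores the characters.
const decoded = new URLSearchParams(body).get("content");
console.log(decoded); // → Grüße

// If the editor had already turned ü into &uuml;, the entity travels as plain ASCII.
console.log(new URLSearchParams({ content: "Gr&uuml;" }).toString()); // → content=Gr%26uuml%3B
```

The risk with raw Unicode is therefore a mismatch between the encoding the browser submits and the one the server assumes; entities avoid that at the cost of the search problem discussed above.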

Anyway, it's time to leave work.

Sincerely, Markus
