This week I had a question from someone who tried to find a PDF based on keywords in the file name, but it did not show up in the results.
The situation
I found out that this is caused by the (in)famous wordbreaker. The wordbreaker is a language-specific component that breaks compound words (obvious!).
For example, when I have the browser language set to English, the word "thumbnail" is broken into "thumb" and "nail". The wordbreaker is designed to improve search results, and when searching content it works as expected: if I search for "thumbnail", it returns results with "thumb", "nail" and "thumbnail".
However, when the search term is not in the content or metadata (only in the filename: not a best practice, I know), the wordbreaker does not work quite as it should.
Examples
I'll explain this with the following examples. I have:
- an image called ladybug.jpg
- an image called lady bug.jpg
- an empty document called ladybug.doc
- an empty document called lady bug.doc
- a document with ladybug in the content called ladybug.doc
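Test 1, browser language set to English, query "ladybug"
- ladybug.jpg: not shown in the results
- lady bug.jpg: shown in the results
- ladybug.doc (empty): not shown in the results
- lady bug.doc (empty): shown in the results
- ladybug.doc (with ladybug in the content): shown in the results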
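Test 2, browser language set to Dutch, query "ladybug"
- ladybug.jpg: shown in the results
- lady bug.jpg: not shown in the results
- ladybug.doc (empty): shown in the results
- lady bug.doc (empty): not shown in the results
- ladybug.doc (with ladybug in the content): shown in the results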
So what I've done here is check whether the wordbreaker is indeed the cause by changing the language. I also added empty documents to rule out the extension / iFilter as the cause.
If the word is put in the content, the document shows up in the results and, funnily enough, the filename / title is highlighted as well.
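One way to read these results is the toy model below. It is only a sketch of the behaviour I observed, not how SharePoint Search is actually implemented: the `COMPOUNDS` dictionary, the function names and the rule "split compounds for English only" are all assumptions made up for this illustration.

```python
import re

# Hypothetical mini-dictionary; a real word breaker knows far more compounds.
COMPOUNDS = {"ladybug": ["lady", "bug"]}

# The five test items from the list above: (file name, content).
FILES = [
    ("ladybug.jpg", ""),
    ("lady bug.jpg", ""),
    ("ladybug.doc", ""),
    ("lady bug.doc", ""),
    ("ladybug.doc", "ladybug"),
]


def tokens(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def break_words(term, language):
    """Split known compounds for English only (an assumption of this model)."""
    term = term.lower()
    if language == "en":
        return COMPOUNDS.get(term, [term])
    return [term]


def is_found(query, language, filename, content=""):
    """Return True if the toy model would list the item for this query."""
    # The compound word itself is matched against the content.
    if query.lower() in tokens(content):
        return True
    # Against the file name, only the broken-up parts are matched.
    name_tokens = tokens(filename)
    return all(part in name_tokens for part in break_words(query, language))


def run_test(language):
    """Evaluate the query "ladybug" against all five test items."""
    return [is_found("ladybug", language, name, content) for name, content in FILES]
```

Calling `run_test("en")` and `run_test("nl")` gives one True/False per test item, in the same order as the lists above, so the model can be compared line by line with both tests.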
Conclusion
It appears that when the term is not found in the content, only the separate words are used to search and not the compound word itself.
Okay, the problem is indeed the wordbreaker, but how do we fix it? I couldn't find any solution on the internet, although I found plenty of people with the same issue.
Adding quotes doesn't work. Maybe I should tell the users to change their browser language?
Nah, c'mon! The wordbreaker helps improve results most of the time, so this is not really an option.
Finally, I did find a workaround that I'm happy to share with you guys: when you add ".*" (dot and asterisk) or ".ext" (dot plus the file extension) to the search term, SharePoint Search does not use the wordbreaker. Pretty lame, but it kinda works.
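If you want to apply the workaround automatically, for instance in a custom search box, a minimal sketch could look like this. The helper name `bypass_wordbreaker` and its parameters are hypothetical. It only builds the query text and does not call any SharePoint API.

```python
def bypass_wordbreaker(term, extension=None):
    """Append the workaround suffix to a search term.

    With an extension, ".ext" is appended (for example "ladybug.jpg");
    without one, ".*" is appended (for example "ladybug.*").
    """
    term = term.strip()
    if extension:
        # Accept both "jpg" and ".jpg" as the extension argument.
        return f"{term}.{extension.lstrip('.')}"
    return f"{term}.*"


print(bypass_wordbreaker("ladybug"))
print(bypass_wordbreaker("ladybug", "jpg"))
```

The ".*" variant is the one to use when the user does not know the file type. The ".ext" variant narrows the search to one extension.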
Hope this helps!