
Re: [tdf-discuss] Need for more compound words for spellcheck dictionary.


RGB ES wrote:
AFAIK, hunspell is used for all dictionaries, so the problem is not the
engine but the dictionaries themselves: they need to be built around the
idea of using compound words, and that huge work (it seems) is not
complete yet.

Ricardo
That makes sense. The spell checker works very well, except that it flags far too many words that are not actually misspelled, just absent from the internal word list. In my experience, the vast majority of these words are compound words.
2011/3/2 Friedrich Strohmaier <damokles4-listen@bits-fritz.de>:
Hi Robert, *,

I'm not very deeply involved in spellchecking, but I'll give it a
shot anyway..

Robert Derman wrote:

RGB ES wrote:
AFAIK, LibO dictionaries are the same dictionaries from OOo.  If you
have a custom dictionary where you added the words you miss, you can
"import" (I mean, copy to the right location) that dictionary into
LibO user profile. See here for more details about the user profile:
http://wiki.documentfoundation.org/UserProfile
2011/2/20 Robert Derman <robert.derman@pressenter.com> :
One of the reasons, perhaps the main reason, I have not upgraded to
LO from OpenOffice 3.1 yet is that I dread having to go through the
process of adding over a thousand compound words to the spellcheck
dictionary.  This dictionary has almost NO compound words in it!
Does anyone know if this problem has been addressed in LO 3.3?  I
am using the U.S. English version.  If this severe shortcoming has
not been addressed yet, I think we should do so before version
3.4.
If I remember correctly, the German dictionary was switched to the
hunspell dictionary engine for that reason. In German and many other
languages a very large share of words are compounds, so that problem
arose from the beginning. Not sure which spellchecking engine is used
for English spellchecking - I guess it's aspell, which has poor support
for compound words.

But that's all guesswork. I don't have enough insight into that topic.

[.. impact of poor spellchecking ..]
I have a sort of technical question here. Is there a way for non-programmers to actually look at the word list that comes with LO? And on a related point, if there is, perhaps a group of volunteers could add the missing words to that list and then send the enhanced word list to the developers, so that it could replace the inadequate word list that is used now.
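
As a rough illustration of what such a volunteer effort could look like: hunspell word lists are plain text (.dic) files, normally with an approximate entry count on the first line and then one word per line, optionally followed by a slash and affix flags. The Python sketch below is only an assumption about how one might check a list of candidate words against such a file; the function names and the sample entries are made up for the example, and it only looks at base entries, so words that hunspell would accept through affix or compounding rules would still be reported as missing.

```python
def parse_dic(text):
    """Return the set of base words found in hunspell .dic content."""
    lines = text.splitlines()
    # The first line of a .dic file is normally an approximate entry count.
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    words = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Keep only the word: drop affix flags after "/" and any
        # tab-separated extra fields.
        word = line.split("\t", 1)[0].split("/", 1)[0]
        words.add(word)
    return words


def missing_words(candidates, dic_words):
    """Return the candidates that have no base entry in the word list."""
    return [w for w in candidates if w not in dic_words]


# Hypothetical sample content, not taken from any real dictionary.
sample = "3\nbook/MS\nshelf/M\nnotebook\n"
known = parse_dic(sample)
print(missing_words(["bookshelf", "notebook"], known))
```

With a real dictionary one would read the .dic file from disk instead of using the sample string.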

I don't know if the situation is as bad in other languages as it is in English, but I do know that German in particular is heavy with compound words. If it is, this might need to be done for a number of languages.

One reason I am concerned about this is that in most areas where OOo/LO differ from MS Word, they can be perceived as just different. One is not necessarily better or worse than the other. But as far as spell checking performance is concerned, Writer will be clearly perceived as inferior when compared with Word! This will not make a good impression on the people in business who decide what software their people should be using. A spell checker that flags many correctly spelled words slows down workflow, and that is a situation most people in business simply will not tolerate.

So if there is not a better word list available that we can just drop in, then we really need to go to work and do it ourselves!


On a related subject, I have spent just a couple of hours adding things to the autocorrect which turned it into a fair quality grammar checker, just little things like change friday to Friday, january to January, etc. The grammar checker in Word is pedantic and obtrusive. I truly believe that with a very small amount of work, we could turn the autocorrect into the kind of grammar checker that many if not most people would be very happy with.
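
To make the idea concrete, here is a small Python sketch of that kind of replacement rule. It is not how the autocorrect in Writer is implemented, just an illustration; the names are invented for the example, and "march" and "may" are left out on purpose because they are also ordinary words.

```python
import re

# Day and month names that are safe to capitalize automatically.
# "March" and "May" are deliberately omitted (ambiguous with common words).
PROPER_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
    "Sunday", "January", "February", "April", "June", "July", "August",
    "September", "October", "November", "December",
]

# Map the all-lowercase spelling to the capitalized form.
REPLACEMENTS = {name.lower(): name for name in PROPER_NAMES}

# Match only whole, all-lowercase words from the table.
PATTERN = re.compile(r"\b(" + "|".join(REPLACEMENTS) + r")\b")


def autocorrect(text):
    """Capitalize lowercase day and month names in the given text."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], text)


print(autocorrect("see you on friday in january"))
```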

One other thing: I make extensive use of the autocomplete function, but it has a couple of annoying traits. How much would it take to either make its internal word list optionally permanent, so that you could turn off the gather words function without losing the autocomplete function completely, or add a few rules such as: don't gather strings containing numerals, all-caps words like those from chapter headings, or inappropriate punctuation marks? With a couple of things like this, I believe many more people would actually choose to use this feature, rather than ask how to turn it off.
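
The gathering rules suggested above could be sketched roughly like this in Python. This is only an illustration of the rules, not the way Writer collects words; the function name and the choice to allow hyphens and apostrophes are assumptions made for the example.

```python
# Punctuation that is allowed inside a gathered word (an assumption).
ALLOWED_PUNCTUATION = {"-", "'"}


def should_collect(word):
    """Return True if the word is worth adding to the autocomplete list."""
    if not word:
        return False
    # Rule 1: skip strings containing numerals.
    if any(ch.isdigit() for ch in word):
        return False
    # Rule 2: skip all-caps words, such as those from chapter headings.
    if len(word) > 1 and word.isupper():
        return False
    # Rule 3: skip strings with punctuation other than the allowed marks.
    if any(not ch.isalpha() and ch not in ALLOWED_PUNCTUATION for ch in word):
        return False
    return True


for w in ["bookshelf", "CHAPTER", "abc123", "foo/bar", "well-known"]:
    print(w, should_collect(w))
```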
