WPTools, Addict and Cyrillic script

  • Hi,

    Can anyone outline the basic steps needed to make the WPTools + Addict pair spell-check Cyrillic text?

    The problem seems to be with the parser not recognizing characters properly. After typing in some Cyrillic text and initiating spell check, Addict's spell dialog appears, but the Not Found edit box contains some weird characters and numbers, and word boundaries are often not set correctly.

    The same can be reproduced with Addict+WP demo: WptAdDemo.exe

    A similar problem occurs when I use Addict with RichEdit98 without setting its Language property to Serbian (Cyrillic). However, once the Language property is set to Serbian, Addict's parser works perfectly!

    How can I achieve the same with WPTools?

    I am using WPTools 5.20.2 and Addict 3.44 with BCB6 Ent on XP Professional.

  • OK, here is a better description, together with my understanding of how it works (please correct me where I'm wrong):

    WPTools works with Unicode internally, while Addict works with AnsiStrings.
    Each time a conversion between the two is performed, the code page must be taken into account; otherwise the result will not be what I expect.

    If I switch to the Serbian (Cyrillic) keyboard and type in some correctly spelled words, WPTools shows them properly, because the UpdateCodePage function is executed for each character. However, Addict does not recognize them and underlines them as misspelled, because of the code page issue.
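
    To illustrate why the code page matters here, the following is a minimal sketch, assuming a plain Delphi console program on Windows. WideToAnsiCP and DumpBytes are hypothetical helpers written for this example; they are not part of WPTools or Addict. The sketch converts the same Cyrillic word to an AnsiString once with code page 1251 and once with code page 1252, using the Windows API function WideCharToMultiByte.

    ```pascal
    program CodePageDemo;
    {$APPTYPE CONSOLE}
    uses
      Windows, SysUtils;

    // Hypothetical helper: converts a WideString to an AnsiString
    // using an explicitly given code page instead of the system default.
    function WideToAnsiCP(const W: WideString; CodePage: UINT): AnsiString;
    var
      Len: Integer;
    begin
      Result := '';
      if W = '' then
        Exit;
      // First call: ask how many bytes the converted text needs.
      Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
        nil, 0, nil, nil);
      if Len <= 0 then
        Exit;
      SetLength(Result, Len);
      // Second call: perform the actual conversion.
      WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
        PAnsiChar(Result), Len, nil, nil);
    end;

    // Hypothetical helper: prints each byte of S in hexadecimal.
    procedure DumpBytes(const S: AnsiString);
    var
      I: Integer;
    begin
      for I := 1 to Length(S) do
        Write(IntToHex(Ord(S[I]), 2), ' ');
      Writeln;
    end;

    var
      W: WideString;
    begin
      // The Serbian word for "word", built from code points
      // U+0440, U+0435, U+0447 so the source file stays ASCII.
      SetLength(W, 3);
      W[1] := WideChar($0440);
      W[2] := WideChar($0435);
      W[3] := WideChar($0447);

      // Code page 1251 has all three letters: F0 E5 F7
      DumpBytes(WideToAnsiCP(W, 1251));

      // Code page 1252 has no Cyrillic letters, so the characters
      // cannot be represented and the word is lost in conversion.
      DumpBytes(WideToAnsiCP(W, 1252));
    end.
    ```

    A dictionary lookup on the second result cannot succeed, which matches the symptom of correctly spelled words being underlined.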

    Here is the solution I found:

    In WPTAddict.pas I located the procedure WPOnSpellCheckWord:

    Code
    {$IFDEF WPTOOLS5}
    procedure TWPTAddictInstance.WPOnSpellCheckWord(Sender: TObject; var Word: WideString;
      var Res: TSpellCheckResult; var Hyphen: TSpellCheckHyphen;
      par: TParagraph; posinpar: Integer);
    begin
      if (Assigned(FAddictSpell)) and FLiveSpell then
      begin
        if (FAddictSpell.WordAcceptable(Word)) then
        ...

    Addict's WordAcceptable function automatically converts the Word parameter using the current code page. Therefore, an explicit conversion to the Cyrillic code page was needed:

    Code
    if (FAddictSpell.WordAcceptable(WideStringToStringEx(Word, 1251))) then
    ...

    Now Addict flags only the words that are actually misspelled.

    This, however, raises two questions:

    1. I have hardcoded the Windows Cyrillic code page 1251, which is bad. Of course, I can use the _CodePage variable, but that would still imply that only one code page can be supported per document: no mixing of English and Serbian text within one document :(. It would be nice to have this code page attribute at the word, style, or at least paragraph level. Or does this already exist?

    2. Addict also needs to generate suggestions and to replace the misspelled word. Can you please tell me which functions I should analyze in order to find where else the conversion is needed?

    • Official post

    Hi,

    >> 1. I have hardcoded the Windows Cyrillic code page 1251, which is bad. Of course, I can use the _CodePage variable, but that would still imply that only one code page can be supported per document: no mixing of English and Serbian text within one document :(. It would be nice to have this code page attribute at the word, style, or at least paragraph level. Or does this already exist? <<

    There is no code page attribute, but there is a charset attribute.

    AttrHelper.CharAttr := par.CharAttr[index];
    if AttrHelper.GetFontCharset(n) then
      codepage := WPGetCodePage(n);
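
    As a rough illustration of the charset-to-code-page step, here is a minimal sketch, assuming a plain Delphi console program on Windows. CharsetToCodePage is a hypothetical stand-in written for this example; it is not the WPTools implementation of WPGetCodePage, whose source is not shown in this thread. It maps a few of the Windows font charset constants to their usual Windows code pages and falls back to the system ANSI code page for anything else.

    ```pascal
    program CharsetDemo;
    {$APPTYPE CONSOLE}
    uses
      Windows;

    // Hypothetical helper, NOT the WPTools WPGetCodePage function:
    // maps a Windows font charset to the matching Windows code page.
    function CharsetToCodePage(Charset: Integer): UINT;
    begin
      case Charset of
        ANSI_CHARSET:       Result := 1252; // Western European
        RUSSIAN_CHARSET:    Result := 1251; // Cyrillic, including Serbian Cyrillic
        EASTEUROPE_CHARSET: Result := 1250; // Central European
        GREEK_CHARSET:      Result := 1253;
        TURKISH_CHARSET:    Result := 1254;
        BALTIC_CHARSET:     Result := 1257;
      else
        // Unknown or default charset: use the system ANSI code page.
        Result := GetACP;
      end;
    end;

    begin
      Writeln(CharsetToCodePage(RUSSIAN_CHARSET)); // prints 1251
      Writeln(CharsetToCodePage(ANSI_CHARSET));    // prints 1252
    end.
    ```

    With a per-character charset lookup like this, the code page passed to the conversion in WPOnSpellCheckWord could follow the font charset at the word's position instead of being hardcoded, which would allow English and Serbian text in the same document.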


    >> 2. Addict also needs to generate suggestions and to replace the misspelled word. Can you please tell me which functions I should analyze in order to find where else the conversion is needed? <<

    Search for par.GetSubText, which is usually used to extract a part of the paragraph as a WideString.

    Julian