Fast replacements of characters

  • I am looking for the fastest way to escape some characters in a WPRichText document so that it can be saved into an XML file. This is what I do now, and it is sluggish:

    while WP.Find('&', False) do WP.SelectionAsString := '@amp;';
    while WP.Find('@amp;', False) do WP.SelectionAsString := '&amp;';
    while WP.Find('"', False) do WP.SelectionAsString := '&quot;';
    while WP.Find('''', False) do WP.SelectionAsString := '&apos;';
    while WP.Find('>', False) do WP.SelectionAsString := '&gt;';
    while WP.Find('<', False) do WP.SelectionAsString := '&lt;';

    It would be much more efficient to scan the whole document once and replace each of the characters ['&','''','"','>','<'] with its corresponding string.

    What would be the quickest way to do those replacements?

  • How about

    tmp := StringReplace(WP.AsANSIString('WPT'), '>', '&gt;', [rfReplaceAll]);
    (ditto for the other characters), then

    WP.AsString := tmp

    It might also be worth investigating the effect of WP.AsANSIString('HTML'). I'm not sure whether it will zap stuff you want left in.
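
For comparison with the chained StringReplace calls, here is a minimal sketch of the single-scan idea from the question. It assumes the text is already available as a plain string; EscapeXml is a hypothetical helper, not part of WPTools, and it has not been benchmarked against the other approaches.

```pascal
program EscapeDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not a WPTools function): walks the input once and
// appends either the character itself or its XML entity to the result.
function EscapeXml(const S: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    case S[i] of
      '&':  Result := Result + '&amp;';
      '<':  Result := Result + '&lt;';
      '>':  Result := Result + '&gt;';
      '"':  Result := Result + '&quot;';
      '''': Result := Result + '&apos;';
    else
      Result := Result + S[i];
    end;
end;

begin
  // Prints: a &lt; b &amp; &quot;c&quot;
  WriteLn(EscapeXml('a < b & "c"'));
end.
```

Because every character is examined exactly once, the '&' inside a freshly written entity is never rescanned, so the '@amp;' placeholder step is not needed in this variant.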

  • Quote

    "it is not optimal - better create the entities in the writer"

    Well! I have no idea what you mean by "create the entities in the writer".

    I need to perform those transformations on thousands of documents, so I need a fast way to do this, and I would be interested to know what you mean.

    • Official post

    Hi,

    Quote

    Well! I have no idea what you mean by "create the entities in the writer".

    I need to perform those transformations on thousands of documents, so I need a fast way to do this, and I would be interested to know what you mean.

    &amp; is an entity. You can write those strings in the text writer class. It is pretty simple to create a text-writing class; see unit WPWrite2.pas. Create a new unit, inherit from the class TWPTextWriter, and override the function WriteChar to create an entity for each special character. Of course, you need to replace fpOut.Write(save_pc^, len) with a loop like

    pc := save_pc;
    while len > 0 do
    begin
      fpOut.Write(pc^, 1);
      Dec(len);
      Inc(pc);
    end;

    so one character is handled at a time.
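
The per-character loop could be extended to emit entities roughly as sketched below. This is only an illustration under assumptions: WriteEscaped is a hypothetical helper, the output is assumed to behave like a TStream (as fpOut.Write suggests), and the actual WriteChar signature in WPWrite2.pas is not shown here, so the wiring into a TWPTextWriter descendant is left out.

```pascal
program WriteEscapedDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: writes Len bytes starting at P to Output, one
// character at a time, replacing XML special characters with entities.
procedure WriteEscaped(Output: TStream; P: PAnsiChar; Len: Integer);
var
  Ent: AnsiString;
begin
  while Len > 0 do
  begin
    case P^ of
      '&':  Ent := '&amp;';
      '<':  Ent := '&lt;';
      '>':  Ent := '&gt;';
      '"':  Ent := '&quot;';
      '''': Ent := '&apos;';
    else
      Ent := '';
    end;
    if Ent <> '' then
      Output.WriteBuffer(Ent[1], Length(Ent))  // special character: write entity
    else
      Output.WriteBuffer(P^, 1);               // ordinary character: copy as-is
    Dec(Len);
    Inc(P);
  end;
end;

var
  MS: TMemoryStream;
  Src, Dst: AnsiString;
begin
  Src := 'a < b & c';
  MS := TMemoryStream.Create;
  try
    WriteEscaped(MS, PAnsiChar(Src), Length(Src));
    SetString(Dst, PAnsiChar(MS.Memory), Integer(MS.Size));
    WriteLn(Dst);  // prints: a &lt; b &amp; c
  finally
    MS.Free;
  end;
end.
```

Doing the escaping while writing means the document itself is never modified, which is presumably why the writer approach was recommended over find-and-replace for thousands of files.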

    Julian