Font size / name + HTML

  • When reading RTF -> writing HTML -> loading HTML, the font switches.
    WPTools seems to behave very strangely, at least when using styles. Or maybe I have changed some setting that confuses the reader and writer classes.

    My RTF document has the following header.

    Code
    {\rtf1\ansi\deff0\uc1\ansicpg1252\deftab254{\fonttbl{\f0\fnil\fcharset1 Verdana;}{\f1\fnil\fcharset1 Verdana;}{\f2\fnil\fcharset1 Times New Roman;}{\f3\fnil\fcharset2 Wingdings;}{\f4\fnil\fcharset2 Symbol;}{\f5\fnil\fcharset2 Webdings;}}{\colortbl\red0\green0\blue0;\red255\green0\blue0;\red0\green128\blue0;\red0\green0\blue255;\red255\green255\blue0;\red255\green0\blue255;\red128\green0\blue128;\red128\green0\blue0;\red0\green255\blue0;\red0\green255\blue255;\red0\green128\blue128;\red0\green0\blue128;\red255\green255\blue255;\red192\green192\blue192;\red128\green128\blue128;\red0\green0\blue0;\red128\green128\blue0;}\wpprheadfoot0\paperw11906\paperh16838\margl1882\margr1882\margt1440\margb1440\headery254\footery254\endnhere\sectdefaultcl{\*\generator WPTools_6.060;}{\info{\*\operator hvli}

    And the following stylesheet

    Code
    {\stylesheet{\s1\li0\fi0\ri0\sb0\sa0\ql\vertalt\f0\fs20 Normal;}{\s2\li0\fi0\ri0\sb0\sa0\ql\vertalt\f1\fs20 Normal;}{\s3\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs20 Default Paragraph Font;}{\s4\li0\fi0\ri0\sb0\sa0\ql\vertalt\f5\fs22\snext3 @font-face;}

    The first 2 paragraphs in the document are empty:

    Code
    {\pard\plain\s4\li0\fi0\ri0\sb0\sa0\ql\vertalt\f1\fs20\par\pard\plain\s4\li0\fi0\ri0\sb0\sa0\ql\vertalt\f1\fs20\cf15\par\plain\s4\li0\fi0\ri0\sb0\sa0\ql\vertalt\f1\fs20\cf15 Vriendelijke groet, \par

    Reading this document works well. The first paragraph has font name "Verdana" (f1) and font size 10 (fs20).
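To make the numbers above easier to check, here is a minimal sketch that pulls the font name and size out of the RTF fragments quoted in this post. It relies on the RTF convention that \fs is given in half-points; the helper names (parse_font_table, font_and_size) are made up for this illustration and are not part of WPTools.

```python
import re

# Excerpts copied from the header, stylesheet, and first paragraph quoted above.
FONT_TABLE = (
    r"{\fonttbl{\f0\fnil\fcharset1 Verdana;}{\f1\fnil\fcharset1 Verdana;}"
    r"{\f2\fnil\fcharset1 Times New Roman;}{\f3\fnil\fcharset2 Wingdings;}"
    r"{\f4\fnil\fcharset2 Symbol;}{\f5\fnil\fcharset2 Webdings;}}"
)
STYLE_S4 = r"{\s4\li0\fi0\ri0\sb0\sa0\ql\vertalt\f5\fs22\snext3 @font-face;}"
FIRST_PAR = r"\pard\plain\s4\li0\fi0\ri0\sb0\sa0\ql\vertalt\f1\fs20\par"


def parse_font_table(rtf):
    """Map each font index in the font table to its font name."""
    entries = re.findall(r"\{\\f(\d+)[^ ]* ([^;]+);\}", rtf)
    return {int(index): name for index, name in entries}


def font_and_size(rtf, fonts):
    """Return (font name, size in points) for the first font and size control words."""
    font = re.search(r"\\f(\d+)", rtf)    # matches \f1 but not \fi0 or \fs20
    size = re.search(r"\\fs(\d+)", rtf)   # RTF stores the size in half-points
    return fonts[int(font.group(1))], int(size.group(1)) / 2


fonts = parse_font_table(FONT_TABLE)
paragraph_font = font_and_size(FIRST_PAR, fonts)  # what the paragraph itself says
style_font = font_and_size(STYLE_S4, fonts)       # what style s4 ("@font-face") says
```

So the paragraph carries Verdana 10 directly, while the style it references says Webdings 11. That is the pair of values that gets mixed up further down.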

    Now when I save it to HTML I get the following:

    Code
    </style></head><body><div class="@font-face" style="text-indent:0.00in;text-align:left;vertical-align:top;margin:0.00in;">&nbsp;</div><div class="@font-face" style="text-indent:0.00in;margin-left:0.00in;margin-right:0.00in;">&nbsp;</div><div class="@font-face" style="text-indent:0.00in;margin-left:0.00in;margin-right:0.00in;"><font face="Verdana" size=2 color="black">Vriendelijke groet, </font></div>

    Now it lists the first 2 paragraphs as having font name 'Webdings' and font size 11.
    However, if I type in the second empty paragraph, the text still shows up as Verdana 10. Text in the first paragraph, however, shows up as Webdings 11.

    So both when storing HTML and when reading HTML, the behaviour makes no sense to me.

    Now if I change the HTML to the following:

    Code
    <div class="@font-face" style="text-indent:0.00in;text-align:left;vertical-align:top;margin:0.00in;">&nbsp;<font face="Verdana" size=3></font></div>
    <div class="@font-face" style="text-indent:0.00in;margin-left:0.00in;margin-right:0.00in;">&nbsp;<font face="Verdana" size=3></font></div>

    My first paragraph is still font size 11 (probably because of the stylesheet),
    but my second empty paragraph is font size 12.
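The jump from 10 to 12 fits the way the legacy HTML font size attribute works: it takes a step from 1 to 7, not a point size. The table below is a commonly cited approximation and is an assumption here; the table WPTools actually uses is not shown in this thread. The function names are hypothetical.

```python
# Commonly cited approximation of <font size=N> in points.
# Assumed for illustration; not taken from the WPTools source.
HTML_FONT_SIZE_TO_PT = {1: 8, 2: 10, 3: 12, 4: 14, 5: 18, 6: 24, 7: 36}


def html_size_to_pt(size):
    """Convert a <font size=N> step to points, clamping N to the range 1..7."""
    clamped = max(1, min(7, size))
    return HTML_FONT_SIZE_TO_PT[clamped]


def pt_to_html_size(points):
    """Pick the <font size=N> step whose point size is closest to the request."""
    return min(
        HTML_FONT_SIZE_TO_PT,
        key=lambda step: abs(HTML_FONT_SIZE_TO_PT[step] - points),
    )
```

Under this assumption size=2 corresponds to 10 pt and size=3 to 12 pt, which matches what I see. It also means a size such as 11 pt has no exact font size step and can only survive a round trip through a CSS font-size value.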

    In my opinion the correct order for choosing the font should be:

    - font setting from the paragraph
    - stylesheet setting from the paragraph
    - default font for the document
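The order listed above can be sketched as a small fallback chain. This is only an illustration of the precedence I would expect, not how WPTools resolves attributes internally; FontAttr and resolve_font are hypothetical names, and the document default size of 10 pt is an assumption (the header only names the default font through \deff0).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FontAttr:
    """A font setting in which any property may be left undefined (None)."""
    name: Optional[str] = None
    size_pt: Optional[float] = None


def resolve_font(par: FontAttr, style: FontAttr, default: FontAttr) -> FontAttr:
    """Take each property from the first level that defines it:
    paragraph attributes, then the paragraph's style, then the document default."""
    def first(*values):
        return next((v for v in values if v is not None), None)

    return FontAttr(
        name=first(par.name, style.name, default.name),
        size_pt=first(par.size_pt, style.size_pt, default.size_pt),
    )


style_s4 = FontAttr("Webdings", 11)    # \f5\fs22 from the stylesheet
doc_default = FontAttr("Verdana", 10)  # \deff0 is Verdana; the size is assumed

# The paragraph carries its own \f1\fs20, so it should win over the style.
with_own_attr = resolve_font(FontAttr("Verdana", 10), style_s4, doc_default)
# If the paragraph attributes get lost, the style takes over.
without_own_attr = resolve_font(FontAttr(), style_s4, doc_default)
```

The second call reproduces the symptom: once the empty paragraph loses its own attributes on the way through HTML, the style's Webdings 11 is all that is left.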

    For some reason empty paragraphs seem to behave differently, as if they look at the next paragraph to find out what the current font is.

    Again when reading RTF, not a problem at all. Add HTML to the mix and you're in big trouble.

    WPTools 6.13.1

  • I have changed the html writer slightly so it adds the font information for empty paragraphs:

    This is almost the same as what the RTF writer does. What is weird, though, is that I'm passing -1 as a parameter to UpdateCharAttrEx. But at least it seems to work.

    Now all that remains is fixing the HTML reader. I have tried forcing all font names and sizes to Verdana 10, but the first empty paragraph always seems to revert to the font in the stylesheet.

    I realise that all this doesn't matter if we are just displaying HTML, but I need to use it as an editor as well. For example, when my users reply to an email and start to type, what used to be the default font no longer applies and the text reverts to whatever font the stylesheet uses.

  • I think I found the problem here.

    The LoadedCharAttr is not set inside the HTML reader. The RTF reader does set this variable.

    I'm not sure if this is correct, but again: it seems to work.

  • I had to change the code a little. It seems LoadedCharAttr was now being set on the wrong paragraph.

    My new solution now also includes a CheckDivPar. I can't really explain how or why this works and I don't know whether it will create other problems, so let's call this code change experimental.

  • If an empty paragraph isn't visible (I don't see why it should not be), then setting LoadedCharAttr shouldn't matter much, I suppose.

    Do you mean that on writing HTML it should only write the style class and no attributes? I don't see why, because that would give the paragraph a different font setting from what it has in the editor.

    I guess I don't really see any downside to handling HTML as if it were RTF. But that may have to do with my lack of understanding of both formats.

    The issue may be that I'm mixing RTF with HTML, so if there is a difference in how the two formats are "handled", this creates inconsistencies for the user. In this case users reported this to me as a bug and I can't blame them for it.

    • Official post

    You can really do that however you need; this is one reason the source for the HTML reader/writer is provided.

    The default reader/writer tries to do it like an internet browser when reading data.

    Quote

    I guess I don't really see any downside to handling HTML as if it were RTF. But that may have to do with my lack of understanding of both formats.

    HTML is a style driven format while RTF is a state driven format. Implementing styles is very complicated in RTF and requires workarounds, such as removing redundant state information.
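As a rough illustration of what "removing redundant state information" means, the sketch below drops every character attribute that merely repeats what the paragraph style already says. The attribute dictionaries and the function name strip_redundant are hypothetical and much simpler than the real CharAttr handling.

```python
def strip_redundant(run_attrs, style_attrs):
    """Keep only the character attributes that differ from the style."""
    return {
        key: value
        for key, value in run_attrs.items()
        if style_attrs.get(key) != value
    }


# A run that only repeats its style needs no explicit state at all ...
same_as_style = strip_redundant(
    {"font": "Verdana", "size": 10},
    {"font": "Verdana", "size": 10},
)
# ... while a run that deviates from the style (Verdana 10 inside a
# Webdings 11 style, as in this thread) must keep its attributes.
differs_from_style = strip_redundant(
    {"font": "Verdana", "size": 10, "color": "black"},
    {"font": "Webdings", "size": 11},
)
```

In the document from this thread the paragraph attributes differ from the style, so nothing about them is redundant and they have to be written out to survive a round trip.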

    Ideally the HTML reader would not use any CharAttr at all but only span objects with a style attached - but since this makes editing pretty hard and conversion to RTF difficult it uses CharAttr by default.

  • It seems anything I try has some bad side effect...

    After the change it writes span styles for the empty paragraphs and includes the &nbsp;. However, if I load this result into WPTools and then save it again as HTML, it no longer contains the &nbsp;. This is because the character count of the paragraph is now 2 (the characters used for opening and closing the span object), so it is no longer treated as empty.
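The kind of adjustment I mean is an emptiness test that ignores the span markers. The sketch below uses a made-up paragraph model (a list in which text characters are strings and span boundaries are marker objects); it is not the WPTools data structure, and all names are hypothetical.

```python
# Hypothetical markers standing in for the span open/close objects.
SPAN_OPEN = object()
SPAN_CLOSE = object()


def visible_char_count(items):
    """Count only real text characters, skipping span open/close markers."""
    return sum(1 for item in items if isinstance(item, str))


def needs_nbsp(items):
    """A paragraph without visible text should still be written with &nbsp;."""
    return visible_char_count(items) == 0


empty_with_span = [SPAN_OPEN, SPAN_CLOSE]   # raw length 2, but no text
with_text = [SPAN_OPEN, "a", SPAN_CLOSE]
```

With a raw length check, empty_with_span looks like a paragraph with 2 characters and the writer skips the &nbsp;. Counting only text characters keeps it empty.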

    This means I have to adjust the code yet again.

    This doesn't feel like a good solution anymore...

    • Official post

    Hi,

    &nbsp; should be converted into #160 on load and saved as &nbsp; again, as I just tested:

    HTML
    <html>
    <head></head><body>
    <p style="font-size:9.00pt;margin:0.00in 0.00in 0.10in 0.00in;"><span style="font-family:'Arial';font-size:9.00pt;">&nbsp;</p>
    <p style="font-size:9.00pt;margin:0.00in 0.00in 0.10in 0.00in;"><span style="font-family:'Arial';font-size:9.00pt;">&nbsp;</p>
    <p style="font-size:9.00pt;margin:0.00in 0.00in 0.10in 0.00in;"><span style="font-family:'Arial';font-size:9.00pt;">&nbsp;</p>
    </body></html>
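For readers who want to check the &nbsp; and #160 equivalence outside WPTools, here is a small sketch using Python's standard html module. The function encode_nbsp is a hypothetical helper written for this example; it only shows the character-level round trip and says nothing about how the WPTools reader and writer implement it.

```python
import html

# Loading: the entity decodes to the no-break space, character code 160.
decoded = html.unescape("&nbsp;")


def encode_nbsp(text):
    """Escape markup characters and write no-break spaces back as &nbsp;."""
    return html.escape(text, quote=False).replace("\xa0", "&nbsp;")


# Saving: the decoded character becomes the entity again.
round_trip = encode_nbsp(decoded)
```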