HTML Entities

Started by mgmino, August 05, 2024, 03:56:54 PM

mgmino

It appears that some HTML entities are not supported. My ePub book contains &mdash;, which causes an error.

kweckwor

Yes, ePub is made up of XHTML files rather than HTML, and as far as I know XHTML doesn't support all named entities. In fact, I think it only really supports a small set of about 5 entities. It should support the decimal version of the entity, though (e.g. &mdash; can be substituted with &#8212;, and I've verified that this works fine).

I think the level of support for named entities varies with the browser and with what is defined in the DTD that applies to the document type. In the case of my app, I'm using the Android WebView component (based on Chromium, which is a lower-level component and not a full browser like Chrome). In any case, the error displayed by the WebView renderer is that the mdash name is not defined. I was able to add the definition to the document using an <!ENTITY> declaration and that works as well, but that's not something I can do dynamically in my app (once the document is rendered, I can no longer inject the definition).
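To illustrate the three cases described above (undeclared named entity, decimal reference, and a name declared with <!ENTITY>), here is a minimal sketch using Python's standard xml.etree.ElementTree parser. It is only a stand-in for a strict XML parser and is not the Android WebView; the helper name try_parse is made up for this example.

```python
import xml.etree.ElementTree as ET


def try_parse(xhtml: str) -> str:
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(xhtml).text
    except ET.ParseError as err:
        return f"error: {err}"


# Named entity with no declaration: a strict XML parser rejects it.
named = try_parse("<p>one&mdash;two</p>")

# Decimal character reference: valid XML without any declaration.
decimal = try_parse("<p>one&#8212;two</p>")

# Same named entity, but declared in an internal DTD subset via <!ENTITY>.
declared = try_parse(
    '<!DOCTYPE p [<!ENTITY mdash "&#8212;">]><p>one&mdash;two</p>'
)

print(named)
print(decimal == declared)
```

The first call reports an undefined-entity error, while the other two yield the same text containing U+2014.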

So if you have control over the ePub content, you can substitute the decimal version of the entity rather than using the named version.
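For anyone preparing their own ePub content, the substitution could be scripted along these lines. This is a rough sketch, not part of the app: it assumes the XHTML is available as a string, uses Python's standard html.entities.name2codepoint table (HTML 4 names only), and leaves the five entities that XML predefines untouched. The function name to_decimal_refs is hypothetical, and the regex is naive (it does not skip CDATA sections or comments).

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser predefines; these are left as-is.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &mdash; (not &#8212; or &#x2014;).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_decimal_refs(xhtml: str) -> str:
    """Replace known HTML named entities with decimal character references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Keep XML's own entities and any name the table does not know.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_RE.sub(replace, xhtml)


print(to_decimal_refs("<p>one&mdash;two &amp; three&nbsp;four</p>"))
```

Unknown names are deliberately left unchanged here so that nothing is silently lost; they would still need to be fixed by hand.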

In the meantime, I'm going to see if I can do that substitution automatically. I don't think it should be a problem, but I don't know what the impact on performance will be, so I'll see how that goes. If all goes well, I might support only a subset of the named entities that I think are most commonly used.

I'll post again with more details once I figure things out better.

Karl

kweckwor

Here is my current game plan for "named/character entity" substitution:
  • Automatically substitute the following entities:
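    • &nbsp;=&#160;
    • &copy;=&#169;
    • &reg;=&#174;
    • &trade;=&#8482;
    • &ndash;=&#8211;
    • &mdash;=&#8212;
    • &lsquo;=&#8216;
    • &rsquo;=&#8217;
    • &ldquo;=&#8220;
    • &rdquo;=&#8221;
    • &hellip;=&#8230;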
  • For any named entity that is not mapped, display a space character. This should prevent the content from being truncated by a parsing error, which doesn't seem like a good user experience to me.
  • If your books have any other named entities that you would like to render, you can add those to the tool by updating a file called entity.txt in the app shared storage folder. I'll include more details in the online documentation for how to accomplish that when I release the updates.
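The plan above can be sketched roughly as follows. This is an illustration in Python, not the app's actual code. The default table uses the standard code points for nbsp, copy, reg, trade, ndash, mdash, lsquo, rsquo, ldquo, rdquo and hellip. Two things are assumptions made only for this example: that entity.txt holds one name=codepoint pair per line, and that XML's five predefined entities (amp, lt, gt, quot, apos) are passed through rather than replaced with a space.

```python
import re

# Default named entities and their Unicode code points.
DEFAULT_ENTITIES = {
    "nbsp": 160, "copy": 169, "reg": 174, "trade": 8482,
    "ndash": 8211, "mdash": 8212, "lsquo": 8216, "rsquo": 8217,
    "ldquo": 8220, "rdquo": 8221, "hellip": 8230,
}

# Assumption: entities predefined by XML are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def load_extra_entities(text: str) -> dict:
    """Parse 'name=codepoint' lines (hypothetical entity.txt format)."""
    extra = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        if value.strip().isdigit():
            extra[name.strip()] = int(value.strip())
    return extra


def substitute_entities(xhtml: str, extra: dict = None) -> str:
    """Map known names to decimal references; unmapped names become a space."""
    table = {**DEFAULT_ENTITIES, **(extra or {})}

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        if name in table:
            return f"&#{table[name]};"
        return " "  # unmapped: show a space instead of failing to parse

    return ENTITY_RE.sub(replace, xhtml)


extra = load_extra_entities("# user additions\ndagger=8224\n")
print(substitute_entities("A&mdash;B&amp;C&dagger;D"))
print(substitute_entities("A&mdash;B&amp;C&dagger;D", extra))
```

Without the extra table, &dagger; falls back to a space; with it, &dagger; becomes &#8224;.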

If you (or anyone else reading this thread) have suggestions for other entities that should be handled by default, let me know and I'll consider adding those as well.

I've already implemented the above, but I want to test it more thoroughly, along with some other updates that are already in progress. I think I'll be able to release in the next 2-3 weeks.

Karl