Foreign object model; round-trip preservation #16

Open
kohlhase opened this issue Oct 2, 2017 · 9 comments

@kohlhase
Member

kohlhase commented Oct 2, 2017

see OpenMath/OM3#148

@davidcarlisle
Member

It's not clear to me that there is any issue to be addressed here. The fact that a string encoded in the binary encoding has < or > in it shouldn't be an issue.

@kohlhase
Member Author

kohlhase commented Oct 5, 2017

I tend to agree with David here. Let's ask @lars-hellstrom whether he still sees a problem.

@lars-hellstrom
Contributor

Yes, I maintain that there is a problem! The problem is that it is not clear from the standard how foreign objects in one encoding correspond to those in another encoding.

This becomes painfully clear if you imagine writing a tool that converts between the binary and XML encodings. Should a character < in the payload of a binary encoding foreign object be turned into < or &lt; in the contents of an XML encoding OMFOREIGN object? If the latter, there is no way to put tags in those contents. If the former, you implicitly say that the payload of a binary encoding foreign object is always XML, which is equally silly.

The root of the problem is probably that the standard is vague about the importance of the encoding attribute, since different types of foreign objects need different translations between the OpenMath encodings. I propose adding wording that imposes the following restrictions:

  • If the encoding is an XML namespace, then the payload of a binary encoding foreign object is XML code (in UTF-8 encoding).
  • If the encoding is not an XML namespace, then the contents of an XML encoding foreign object may only be character data. (I.e., no tags, processing instructions, or the like.)

"XML namespace" should also be considered to include the two historical strings (that I don't have the zeal to look up right now).
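The two proposed restrictions can be read as a transcoding rule. The Python below is only a sketch under stated assumptions: the set `KNOWN_XML_NAMESPACES` and the function name are hypothetical, the two historical strings are left out because they are not spelled out in this thread, and none of this is taken from the OpenMath standard.

```python
from xml.sax.saxutils import escape

# Hypothetical registry of encoding values that count as XML namespaces.
# The two historical strings mentioned above are intentionally not listed.
KNOWN_XML_NAMESPACES = {"http://www.w3.org/1998/Math/MathML"}


def binary_payload_to_xml_content(encoding: str, payload: bytes) -> str:
    """Turn a binary-encoding foreign payload into OMFOREIGN content."""
    text = payload.decode("utf-8")  # character formats are UTF-8 in OM-binary
    if encoding in KNOWN_XML_NAMESPACES:
        # First restriction: the payload is XML code, so embed it as markup.
        return text
    # Second restriction: only character data is allowed, so escape markup.
    return escape(text)


mathml_content = binary_payload_to_xml_content(
    "http://www.w3.org/1998/Math/MathML", b"<mi>x</mi>"
)
latex_content = binary_payload_to_xml_content("text/x-latex", b"a < b")
```

Under this rule the encoding attribute alone decides whether `<` survives as markup or becomes `&lt;`, which is the decision the current wording leaves open.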

@davidcarlisle
Member

davidcarlisle commented Oct 5, 2017 via email

@kohlhase
Member Author

kohlhase commented Oct 6, 2017

I agree with David on this.
The only real way of deciding this issue is probably to implement a round-trip tool that works (then David is right) or die trying (then there is evidence that Lars is).

@lars-hellstrom
Contributor

An OpenMath object is fundamentally the abstract thing described in Chapter 2, which may contain foreign objects. These abstract things then have a number of encodings, as described in the sections of Chapter 3. I have always presumed that in order to qualify as an encoding of OpenMath, a scheme must be able to encode all OpenMath objects, but it seems @davidcarlisle rather says each encoding has a different domain, because they support different sets of foreign objects?

The point about what happens to a PNG when included in an XML format is an interesting one. I had assumed that the octets should just become the characters &#x00; through &#xFF; (looks horrible, but should process OK), although since some of those characters are forbidden in XML, it may be necessary to add a base64-encoding layer on top of the octets instead. But that is a separate problem. My example shows there are problems already within the realm of character-based formats. About the use of these in the binary encoding, the standard explicitly says:

Character based formats (including XML based formats) should be encoded in UTF-8 to produce a stream of bytes to use as the payload of the foreign object.

Thus: we can reliably tell which sequence of characters the payload of an OpenMath-binary foreign object encodes. What is unclear is how those characters should get encoded when transcoding to OpenMath-XML. If < is encoded as &lt; then the transcoded object cannot contain tags, which is wrong for MathML. If < is encoded as < then the character sequence had to be well-formed XML already in the binary encoding, which is wrong for everything not XML.

The two bullet points I proposed are precisely about clarifying when one should do one thing and when one should do the other.
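The dilemma can be reproduced with a few lines of Python. This is a minimal sketch, not a transcoder: the helper names are hypothetical, namespaces and the encoding attribute are omitted for brevity, and the two payload strings are made-up examples.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_escaped(text: str) -> str:
    """Option 1: encode '<' as '&lt;' when building the OMFOREIGN element."""
    return f"<OMFOREIGN>{escape(text)}</OMFOREIGN>"


def wrap_raw(text: str) -> str:
    """Option 2: copy the characters into the OMFOREIGN element unchanged."""
    return f"<OMFOREIGN>{text}</OMFOREIGN>"


def parses(xml_string: str) -> bool:
    """Report whether the string is well-formed XML."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


mathml_payload = "<mi>x</mi>"  # XML-based foreign object
latex_payload = "a < b"        # character-based, but not XML

# Escaping keeps everything parseable but flattens the MathML into text.
escaped_mathml = ET.fromstring(wrap_escaped(mathml_payload))
# Raw copying keeps the MathML element structure.
raw_mathml = ET.fromstring(wrap_raw(mathml_payload))
# Raw copying of a non-XML payload is not well-formed.
raw_latex_ok = parses(wrap_raw(latex_payload))
```

Here `escaped_mathml` has no child elements, `raw_mathml` has one `mi` child, and `raw_latex_ok` is false, so neither option is right for both kinds of payload.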

Perhaps it's easier if I try to phrase it in RNG? Right now we have for OpenMath-XML

# foreign constructor
OMFOREIGN =  element OMFOREIGN {
    compound.attributes, attribute encoding {xsd:string}?,
   (omel|notom)* }

I think that should be changed to (though bear in mind that I'm guessing on RNG syntax)

# foreign constructor
OMFOREIGN =  element OMFOREIGN {
    compound.attributes, attribute encoding {xsd:anyURI}?,
   (omel|notom)* }
  |
  element OMFOREIGN {
    compound.attributes, attribute encoding {xsd:anyMIME}?,
   text }

@davidcarlisle
Member

davidcarlisle commented Oct 6, 2017 via email

@lars-hellstrom
Contributor

@davidcarlisle wrote:

You can put any sequence of bytes in an XML file by base64-encoding the sequence and putting that in, so I assumed that's what you'd do when going from binary to XML.

For PNG I suppose that works OK, provided you change the encoding from image/png (in OM-binary) to maybe image/png; content-transfer-encoding=base64 (in OM-XML).
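A small Python sketch of both options for binary data. The sample octets are a made-up fragment (not a real PNG) that includes a NUL byte, the helper names are hypothetical, and the relabelled encoding value simply follows the suggestion above rather than anything in the standard.

```python
import base64
import xml.etree.ElementTree as ET

# Made-up octets that start like a PNG signature and contain a NUL byte.
octets = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x1A])


def as_char_refs(data: bytes) -> str:
    """Map each octet to a numeric character reference &#x00; .. &#xFF;."""
    return "".join(f"&#x{b:02X};" for b in data)


def char_refs_parse(data: bytes) -> bool:
    """Check whether the character-reference form is well-formed XML."""
    try:
        ET.fromstring(f"<OMFOREIGN>{as_char_refs(data)}</OMFOREIGN>")
        return True
    except ET.ParseError:
        return False


def as_base64_element(data: bytes, mime: str) -> ET.Element:
    """Wrap the octets in base64 and relabel the encoding accordingly."""
    elem = ET.Element(
        "OMFOREIGN", encoding=f"{mime}; content-transfer-encoding=base64"
    )
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


refs_ok = char_refs_parse(octets)  # fails: &#x00; is not a legal XML character
png_elem = ET.fromstring(ET.tostring(as_base64_element(octets, "image/png")))
recovered = base64.b64decode(png_elem.text)
```

The character-reference form is rejected by the parser, while the base64 form survives serialization and parsing and yields the original octets again.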

It does mean that if you start with some XML-encoded foreign object in an XML OpenMath object and map it to the binary encoding and back, you may have a different encoding (base64) of the original, but it's not really ambiguous.

If you do that, then I strongly suspect that the elements in that foreign object will no longer show up as elements in the DOM tree. That's not what is expected when you do a round trip — the conversion operations should be inverses of each other (up to relevant equivalence), not just injective.
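To make the point about the DOM tree concrete, here is a hedged Python sketch of such a round trip. The function names and the `encoding` values are hypothetical, and the "binary payload" is just the UTF-8 serialization of the child elements rather than a real OpenMath-binary object.

```python
import base64
import xml.etree.ElementTree as ET

original = ET.fromstring(
    '<OMFOREIGN encoding="http://www.w3.org/1998/Math/MathML">'
    "<mi>x</mi></OMFOREIGN>"
)


def to_binary_payload(elem: ET.Element) -> bytes:
    """Serialize the child elements as the UTF-8 payload."""
    return b"".join(ET.tostring(child) for child in elem)


def from_binary_via_base64(payload: bytes, encoding: str) -> ET.Element:
    """Go back to XML by the always-safe route: base64-wrap the payload."""
    elem = ET.Element("OMFOREIGN", encoding=f"base64-encoded-{encoding}")
    elem.text = base64.b64encode(payload).decode("ascii")
    return elem


payload = to_binary_payload(original)
round_tripped = from_binary_via_base64(payload, original.get("encoding"))

children_before = len(original)       # the mi element is a tree node
children_after = len(round_tripped)   # only opaque text remains
```

The information is still recoverable by decoding the text, but a consumer walking the tree sees one child element before the round trip and none after it.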

You can't syntactically distinguish a MIME type from a (possibly relative) URI: text/xml could be that MIME type, or it could be a relative URL to text/xml.

Oh, right! I was thinking "absolute URI", not "any URI"; my bad. The idea was to catch XML namespaces — those should be absolute URIs, should they not?
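The ambiguity, and the narrower "absolute URI" test, can be illustrated in Python. The base URI and the function name are hypothetical, and treating "has a scheme" as "is an XML namespace" is only an assumption for illustration.

```python
from urllib.parse import urljoin, urlparse

BASE = "http://example.org/om/"  # hypothetical base URI of some document


def is_absolute_uri(value: str) -> bool:
    """Treat a value as an absolute URI when it carries a scheme."""
    return bool(urlparse(value).scheme)


# As a relative reference, "text/xml" resolves like any other relative path.
resolved = urljoin(BASE, "text/xml")

mime_is_absolute = is_absolute_uri("text/xml")
namespace_is_absolute = is_absolute_uri("http://www.w3.org/1998/Math/MathML")
```

So `text/xml` is a perfectly good relative reference, but it fails the absolute-URI test, whereas a namespace such as the MathML one passes it.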

@davidcarlisle
Member

should be inverses of each other (up to relevant equivalence)

Well, there's the rub. I would consider an OM XML object with a foreign object encoded as XML with encoding="foo" and another OM XML object with the same foreign object encoded with encoding="base64-encoded-foo" to be equivalent if they represent the same OM object. I think you'd consider them different.

Practically, if you do the simplest, safest thing each time, then every time you convert and convert back you will add another layer of base64 encoding. Some systems may special-case known encodings and avoid that (or, in practice, most systems will only handle a very limited range of foreign objects anyway), but that is a matter of software usability and I don't think we need to standardise it.
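The layering effect is easy to see in a toy model. This Python sketch assumes the simplest converter pair (hypothetical names): binary to XML always base64-wraps the payload, and XML to binary always stores the resulting text as UTF-8.

```python
import base64


def to_xml_text(payload: bytes) -> str:
    """Binary to XML: always wrap the payload in base64."""
    return base64.b64encode(payload).decode("ascii")


def to_binary_payload(xml_text: str) -> bytes:
    """XML to binary: store the character data as UTF-8."""
    return xml_text.encode("utf-8")


def round_trips(payload: bytes, n: int) -> tuple[bytes, list[int]]:
    """Convert back and forth n times, recording the payload length."""
    lengths = [len(payload)]
    for _ in range(n):
        payload = to_binary_payload(to_xml_text(payload))
        lengths.append(len(payload))
    return payload, lengths


start = b"x" * 30
wrapped, lengths = round_trips(start, 3)

# Undo the three layers to recover the original payload.
unwrapped = wrapped
for _ in range(3):
    unwrapped = base64.b64decode(unwrapped)
```

Each round trip grows the payload by roughly a third (30, 40, 56, 76 bytes here), and a reader needs to know how many layers to peel off, which is what special-casing known encodings would avoid.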
