Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle unprefixed attributes in Element#attribute correctly #151

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

makenowjust
Copy link
Contributor

Currently, Element#attrubute handles unprefixed attributes as attributes with the default namespace, and handles the default namespace as nil value. However, it is a mis-interpretation of the specification of XML 1.0.

XML 1.0 specification says:
(https://www.w3.org/TR/xml-names/#defaulting)

The namespace name for an unprefixed attribute name always has no value.

Therefore, we need to distinguish nil and the default namespace. Also, unprefixed attributes should not be handled as attributes with the default namespace.

Note that XML 1.0 specification also says:

Default namespace declarations do not apply directly to attribute names; the interpretation of unprefixed attributes is determined by the element on which they appear.

This means that we can know (programmatically) how to handle unprefixed attributes from the element's namespace. It does not mean that the namespace of an unprefixed attribute becomes the default namespace.

This commit changes Element#attribute behavior correctly and updates tests for it.


@kou It breaks backward compatibility, so I'm concerned. What do you think?

Currently, `Element#attrubute` handles unprefixed attributes as
attributes with the default namespace, and handles the default
namespace as `nil` value. However, it is a mis-interpretation of
the specification of XML 1.0.

XML 1.0 specification says:
(https://www.w3.org/TR/xml-names/#defaulting)

> The namespace name for an unprefixed attribute name always has
> no value.

Therefore, we need to distinguish `nil` and the default namespace.
Also, unprefixed attributes should not be handled as attributes
with the default namespace.

Note that XML 1.0 specification also says:

> Default namespace declarations do not apply directly to attribute
> names; the interpretation of unprefixed attributes is determined
> by the element on which they appear.

This means that we can know (programmatically) how to handle
unprefixed attributes from the element's namespace. It does not mean that
the namespace of an unprefixed attribute becomes the default namespace.

This commit changes `Element#attribute` behavior correctly and updates
tests for it.
Comment on lines +1273 to +1276
# xml_string = "<root xmlns:a='http://example.com/a' a:x='a:x' x='x'/>"
# document = REXML::Document.new(xml_string)
# document.root.attribute("x") # => x='x'
# document.root.attribute("x", "a") # => a:x='a:x'
# document.root.attribute("x", "http://example.com/a") # => a:x='a:x'
Copy link
Contributor Author

@makenowjust makenowjust Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also updated here to make it clear that the namespace argument takes a namespace URI.

@kou
Copy link
Member

kou commented Jun 19, 2024

I know the specification (and I know the specification confuses many people) and I can understand this proposal.
But I don't want to introduce incompatibility as much as possible.

In this case, it seems that Element#attribute uses a loose approach. For example, it ignores undefined namespace URI. Based on it, the current behavior (the default namespace is ignored) may be acceptable.

How about adding a new strict version of Element#attribute (and deprecating Element#attribute? Is it too aggressive?) instead? It seems that we have Element#text for text node value and Element#get_text for text node. How about Element#get_attribute? But get_XXX isn't Rubyish. Element#text_node/Element#attribute_node may be better?

@makenowjust
Copy link
Contributor Author

@kou I agree with deprecating Element#attribute. Also, I'm not sure a new strict version is necessary because we have Element#attributes and Attributes#get_attribute.

@kou
Copy link
Member

kou commented Jun 19, 2024

Then how about documenting this instead of changing the current behavior?
Element#attributes and Attributes#get_attribute will be too long as an alternative way. We can revisit an API for this when someone claims this again.

@makenowjust
Copy link
Contributor Author

In many cases, people want to obtain an attribute value, and don't want an Attribute object. Thus, people should use element.attributes[name], and long elemene.attributes.get_attributes(name) is not a problem. (get_attribute seems redundant to me, but it is another problem.)

@kou
Copy link
Member

kou commented Jun 19, 2024

element.attributes[name]

FYI: We can use element[name]. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants