Deactivate underscores when expanding natbib's \bibitem[label] #2385

Draft
wants to merge 2 commits into
base: master


dginev
Collaborator

@dginev dginev commented Aug 6, 2024

This is a minor change avoiding a needless error in natbib's \bibitem.

A minimal motivating example (that I could turn into a test) is:

\documentclass{article}
\usepackage{natbib}
\begin{document}

\begin{thebibliography}{1}

\bibitem[_xy(1899)]{_xyz_1899}
 A name of something. (accessed Nov 01, 1899).

\end{thebibliography}
\end{document}

Note the underscores in the \bibitem use, especially the one in the optional label [] argument. These survive well under pdflatex and, as far as I can observe, are largely ignored, at least in the specific document I am studying that uses this.

With the current latexml master, this example produces two unfortunate errors of the kind:

Error:unexpected:_ Script _ can only appear in math mode at test.tex; line 7 col 0
Error:unexpected:_ Script _ can only appear in math mode at test.tex; line 7 col 0

The PR simply switches the offending argument to Semiverbatim in the natbib parser, deactivating the underscore's math behavior.

@dginev dginev requested a review from brucemiller August 6, 2024 18:38
@dginev
Collaborator Author

dginev commented Aug 6, 2024

This idea may warrant some extra discussion... If we want math mode constructs to still expand in this argument, such as:

\bibitem[Ex$\ddot{a}$mple(1899)]{...}

then Semiverbatim is a bit misplaced - it deactivates the $, but will expand the \ddot in the natbib Expand($label) call.

Maybe I should invent a new parameter type, which only deactivates the underscore? Thoughts welcome. That would look a bit more along the lines of:

DefParameterType('NatbibSemiVerbatim', sub {
    my ($gullet) = @_;
    # Read the argument, then swap each subscript-catcode token for an inert "other" underscore.
    my $arg      = $gullet->readArg;
    my @inactive = map { Equals($_, T_SUB) ? T_OTHER('_') : $_ } $arg->unlist;
    return Tokens(@inactive); });

Edit: a slightly more direct version of a new parameter, which only deactivates underscore. A bit patchy possibly, but it is a little unclear which behavior natbib is aiming for exactly.

@dginev dginev changed the title allow semiverbatim content in natbib \bibitem[label] [WIP] allow semiverbatim content in natbib \bibitem[label] Aug 6, 2024
@dginev dginev force-pushed the natbib-semiverbatim-key-for-bibitem branch from 06e0e91 to b8731d7 Compare August 28, 2024 02:52
@dginev dginev changed the title [WIP] allow semiverbatim content in natbib \bibitem[label] Deactivate underscores when expanding natbib's \bibitem[label] Aug 28, 2024
@dginev
Collaborator Author

dginev commented Aug 28, 2024

The general observation is that when a bare label is used in natbib's \bibitem[label] - but its entry isn't cited - pdflatex won't emit an error. I believe this has to do with writing that data out via \NAT@wrout which won't trigger expansion. Only after the written data is read back in (usually on a next call to pdflatex) could issues with underscore activation come up - and only if \cite used that entry.
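
To make the timing concrete, here is a variant of the motivating example that also cites the entry. This is a sketch based on the description above, not a verified test: the \citep line is the only addition, and the expectation that the first pdflatex run passes while a later run reports a "Missing $ inserted" error at the citation is an assumption that follows from that description.

```latex
\documentclass{article}
\usepackage{natbib}
\begin{document}

% The only change from the original example: the entry is now cited.
% On the first run the citation is still undefined; once the .aux data
% has been read back in, the label "_xy" is typeset outside math mode.
\citep{_xyz_1899}

\begin{thebibliography}{1}

\bibitem[_xy(1899)]{_xyz_1899}
 A name of something. (accessed Nov 01, 1899).

\end{thebibliography}
\end{document}
```

Removing the \citep line should bring back the silent behavior of the original example.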

So, for now, I have decided to not change the parameter types, but instead guard LaTeXML's emulation which uses an explicit Expand() call. Deactivating the underscores beforehand is sufficient.

I also added a test for this kind of tortured use case.

@brucemiller
Owner

Your last observation almost gets it, I think. This label argument is getting expanded before writing to the aux file, but it is not digested until later, and only if the bibitem is cited. So that would mean that undefined macros or # will cause immediate problems during latex's expansion of the label, but tokens that only affect digestion will pass through until they're cited - if ever! So, not just _, but ^, & or even a single $ (or really any sequence that can't be digested) would be ignored by latex if not cited, but (currently) cause problems for LaTeXML. Moreover, _ itself isn't the problem; it's fine inside of, say, \bibitem[foo$a_b$(1999)]{underscore}.
Arguably these documents are in "Error", even if they don't cause errors, so I wonder how deep we should go. But if we were to try to fix it, I think we need to track where the label gets digested and use some kind of error-free digestion(?)
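
A small illustration of the distinction drawn above, as a sketch only. The labels and keys other than the underscore entry are made up for illustration, and the comments describe the behavior one would expect under pdflatex from this explanation, not observed results.

```latex
\documentclass{article}
\usepackage{natbib}
\begin{document}

\begin{thebibliography}{3}

% Expected to pass while uncited: ^ only matters once the label is digested.
\bibitem[a^b(1999)]{caret}
 Superscript character outside math mode.

% Expected to pass while uncited: & is likewise only a digestion problem.
\bibitem[a&b(1999)]{ampersand}
 Alignment character outside a table.

% Expected to be fine even when cited: the underscore sits inside math mode.
\bibitem[foo$a_b$(1999)]{underscore}
 Subscript inside math mode.

\end{thebibliography}
\end{document}
```

None of the three entries is cited here, so under this reading pdflatex should stay quiet, while any processor that digests the labels eagerly would trip on the first two.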

@dginev
Collaborator Author

dginev commented Sep 17, 2024

Good point, we should be approaching this even more generally. Having a dedicated parameter type that "postpones" the errors of certain Digest steps could be tricky... But maybe there is something there.

We have a natural place to anchor such a new parameter, at the DefConstructor for \NAT@@wrout. It may be worth playing around a bit with the example I had concocted. I'll investigate.

@dginev dginev marked this pull request as draft September 17, 2024 13:24