I’d like to know the recommended way to encode metadata that contains inline MathML or TeX code, for instance:
<title-group>
<article-title>Uma Reflexão de Professores sobre Demonstrações Relativas à Irracionalidade de <inline-formula><mml:math display="inline" id="m1"><mml:mrow><mml:msqrt><mml:mn>2</mml:mn></mml:msqrt></mml:mrow></mml:math></inline-formula> </article-title>
<trans-title-group xml:lang="en">
<trans-title>Teachers' Considerations on the Irrationality Proof of <inline-formula><mml:math display="inline" id="m2"><mml:mrow><mml:msqrt><mml:mn>2</mml:mn></mml:msqrt></mml:mrow></mml:math></inline-formula> </trans-title>
</trans-title-group>
</title-group>
Hi @gustavo.fonseca. Apologies for the very delayed response. I believe you may have submitted your question before we had all of our notifications set up for this category of posts, so we simply missed it. I’m very sorry about that!
You may find information about including MathML in your deposits and example XML here:
Take a look and let me know if you have any additional questions,
Isaac
I’m reviving this in order to keep the discussion in one thread.
The examples for encoding inline mathematics use MathML, but it seems that members sometimes report titles and abstracts encoded in TeX. Unfortunately, some members seem to use LaTeX encoding improperly. For example, Springer uses $$ as a delimiter instead of $, but $$ implies display mathematics rather than inline mathematics. MathML seems to be used more commonly by the large publishers, but that’s a bit of a pain because there are millions of authors who write in LaTeX and almost none who write in MathML. This example uses both MathML in the abstract and TeX in the title and titles of references.
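To make the $ versus $$ distinction concrete, here is a rough Python sketch that classifies the math spans in a string and rewrites $$...$$ as $...$ for a title. The function names are made up for illustration, and it assumes only $ and $$ delimiters are in use (no \( \) or \[ \], and no escaped \$ in the text).

```python
import re

# Display math: $$...$$. Inline math: $...$ where neither delimiter
# is adjacent to another $ (so $$...$$ is never counted as inline).
DISPLAY = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
INLINE = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)", re.DOTALL)


def math_spans(text):
    """Return (kind, latex) pairs in order of appearance."""
    spans = []
    for m in DISPLAY.finditer(text):
        spans.append((m.start(), "display", m.group(1)))
    for m in INLINE.finditer(text):
        spans.append((m.start(), "inline", m.group(1)))
    return [(kind, tex) for _, kind, tex in sorted(spans)]


def demote_display_in_title(title):
    """Rewrite $$...$$ as $...$, since a title only needs inline math."""
    # A function is used as the replacement so that backslashes in the
    # LaTeX are not interpreted as regex escape sequences.
    return DISPLAY.sub(lambda m: "$" + m.group(1) + "$", title)


print(demote_display_in_title(r"Irrationality of $$\sqrt{2}$$"))
# prints: Irrationality of $\sqrt{2}$
```

This only normalizes the delimiters; it does not settle whether raw TeX is acceptable in a given metadata field.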
I think the documentation could be improved on this point, and maybe this is why Springer is supplying their titles in the wrong format. Abstracts clearly need JATS encoding, which supports both TeX and MathML. LaTeX encoding in JATS uses the <tex-math> element (its content should be wrapped in a CDATA section, or have the special characters escaped, because it may contain characters like <, >, or &). Titles and references are less clear, and references are almost always given in BibTeX format, so TeX is more natural to use. Abstracts are quite complex and may contain display mathematics. The examples in the documentation appear to use only MathML, but almost no authors supply their information in this format. Conversion from one format to another is likely to introduce errors. In our case we capture the metadata directly from the author’s manuscript, which is in LaTeX.
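As a small illustration of the CDATA point, the following Python sketch wraps a LaTeX string in <inline-formula><tex-math>. The helper name is hypothetical, namespace prefixes are left out for brevity, and this is just one way to do it rather than anything prescribed by the documentation.

```python
import xml.etree.ElementTree as ET


def tex_math_inline(latex):
    """Wrap LaTeX in <inline-formula><tex-math> using a CDATA section."""
    # A literal "]]>" inside the LaTeX would end the CDATA section early,
    # so it is split across two adjacent CDATA sections.
    safe = latex.replace("]]>", "]]]]><![CDATA[>")
    return (
        "<inline-formula><tex-math><![CDATA["
        + safe
        + "]]></tex-math></inline-formula>"
    )


fragment = tex_math_inline(r"a<b \& c>d")
# Parsing the fragment checks that it is well-formed and that the
# LaTeX comes back unchanged.
print(ET.fromstring(fragment).find("tex-math").text)
```

The characters <, >, and & pass through untouched, which is the reason for using CDATA here instead of entity escaping.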
Hello @mccurley! My understanding from speaking with members is that the JATS-formatted abstracts they provide aren’t marked up specifically for Crossref. Members who use JATS natively are just passing along the abstracts as-is, so this is very likely a problem created well upstream of Crossref metadata. We ask that members comply with JATS4R recommendations, but I don’t see anything specific regarding LaTeX or MathML (https://jats4r.niso.org/abstracts/).
That said, I can look into widening the discussion about how to handle both within our metadata and I can add some language about MathML and LaTeX best practice to the documentation as well.
@PFeeney Thanks Patricia for responding on this old thread. I’m mostly trying to comply with the 5.3.1 version of the Crossref schema. There are really separate cases for title and abstract. Titles are clearly described in the schema as only accepting MathML, which is perhaps OK since there is no need for “display mathematics” (what is called “block” in JATS). We have a translator that creates MathML from the author-supplied LaTeX, but it will fail on complicated things. Titles should be mostly OK since they involve only superscripts, subscripts, and mathematical symbols that are mostly representable in UTF-8. There are a few oddities, like a z with a tilde over it, that have no single-character UTF-8 equivalent. Apparently some publishers like Springer are reporting raw LaTeX in this field. That case should probably be cleared up in the documentation (it’s problematic for something like a<b in a title).
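The a<b problem is easy to reproduce. This is a minimal Python sketch using only the standard library; the title string is an invented example, and the bare <title> element here stands in for whatever element the deposit actually uses.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(fragment):
    """Return True if the XML fragment parses, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


raw_title = "A bound for $a<b$"

# The unescaped "<" looks like the start of a tag, so parsing fails.
print(is_well_formed("<title>" + raw_title + "</title>"))          # False

# Escaping turns "<" into "&lt;", which makes the fragment well-formed.
print(is_well_formed("<title>" + escape(raw_title) + "</title>"))  # True
```

Escaping only makes the XML well-formed. It does not answer whether raw LaTeX is meant to be accepted in the title at all.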
Abstracts are often more complicated, because they might contain display mathematics. The 5.3.1 schema says that it accepts JATS, and JATS allows both MathML and LaTeX-encoded mathematics, the latter through the <tex-math> element. Different publishers have different requirements and practices, but anything that comes directly from authors is likely to be encoded in LaTeX rather than MathML. Large publishers will sometimes translate the LaTeX into MathML, but that often requires human editing. We are building a low-touch publishing pipeline to reduce the costs of publishing, so we would like to avoid the need to translate from complicated LaTeX to MathML in the abstract.
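For what it’s worth, keeping the LaTeX as-is in an abstract can be sketched in a few lines of Python. This is an illustration under assumptions, not our actual pipeline: the function names are made up, only $ and $$ delimiters are recognized, the abstract is treated as a single paragraph, and namespace prefixes are omitted.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# $$...$$ is tried first so it is not mistaken for two inline delimiters.
MATH = re.compile(r"\$\$(?P<display>.+?)\$\$|\$(?P<inline>.+?)\$", re.DOTALL)


def cdata(latex):
    """Wrap LaTeX in CDATA, splitting any literal ']]>' it contains."""
    return "<![CDATA[" + latex.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def abstract_to_jats(text):
    """Turn plain text with $...$ and $$...$$ into a JATS-style abstract."""
    parts, pos = [], 0
    for m in MATH.finditer(text):
        # Ordinary text between formulas is entity-escaped.
        parts.append(escape(text[pos:m.start()]))
        if m.group("display") is not None:
            tag, latex = "disp-formula", m.group("display")
        else:
            tag, latex = "inline-formula", m.group("inline")
        parts.append(f"<{tag}><tex-math>{cdata(latex)}</tex-math></{tag}>")
        pos = m.end()
    parts.append(escape(text[pos:]))
    return "<abstract><p>" + "".join(parts) + "</p></abstract>"


print(abstract_to_jats(r"If $a<b$ then $$\sum_i x_i > 0$$ holds."))
```

The LaTeX itself is never parsed or translated, so whatever the author wrote is what ends up inside <tex-math>.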
Journals that use OJS can accept LaTeX-encoded mathematics in both title and abstract, and then display them correctly on their website using the OJS MathJax plugin. An example is “Lower Bounds for the Turán Densities of Daisies” in The Electronic Journal of Combinatorics, which contains display mathematics and some unusual characters to represent set intersections. It looks like there is a need for the Crossref plugin to encode mathematics in titles and abstracts, since their abstract was reported as raw LaTeX that violates the JATS standard. We don’t use OJS, so it’s more of a problem for other small journals. Conversion of LaTeX to MathML could be a big problem in PHP.
The JATS schema is pretty well defined, but it still has some ambiguities in it that JATS4R tries to clear up. Their guidance on mathematics is pretty clear, since they say that both <tex-math> and <mml:math> are acceptable.
Perhaps the biggest ambiguity is whether LaTeX is supported in the Crossref <title> tag. Some members like Springer are clearly trying to use this (I haven’t looked for others).
Using too much mathematical jargon in a title is generally a bad idea™ but it’s sometimes hard to convince authors to change their habits.