Ticket of the month - April 2024 - A Shady Character

Have you ever made a deposit that seemed to go through with nothing out of the ordinary, only to find later that the DOIs from that submission weren’t working and that you had received the following error message?

The Crossref admin tool showing a metadata deposit error message reading "Deposited XML is not well-formed or does not validate: Error on line 14: An invalid XML character (Unicode: 0x2) was found in the element content of the document."

What could this possibly mean?

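What is happening here is that there is a hidden, unsupported character lurking somewhere in the metadata of your deposit and causing the entire deposit to fail. The error message identifies it by its Unicode code point (0x2 in this example, a control character that XML does not allow) and by the line of the XML on which it was found.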
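To make the failure concrete, here is a minimal Python sketch showing that a standard XML parser rejects a document containing the control character U+0002. This is not Crossref's own validation code; the `abstract` element and the `is_well_formed` helper are made up purely for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two versions of the same (made-up) abstract: one clean, one with a hidden U+0002.
clean = "<abstract>post-tuberculosis changes</abstract>"
dirty = "<abstract>post\x02tuberculosis changes</abstract>"

print(is_well_formed(clean))  # True
print(is_well_formed(dirty))  # False
```

A single stray character is enough to make the whole document unparseable, which is why the entire deposit fails rather than just one record.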

This often happens when text is copied and pasted from another application (e.g., a rich-text editor, like Microsoft Word). You cannot see the unsupported character by looking at the deposit, but if you copy and paste the metadata into a plain text editor, you will see extra symbols appear.

Since these characters are hidden from sight, it may be difficult to determine where in your metadata they have sneaked in. We find that the majority of the time, they tend to be found in the abstracts, so this is a good place to start.

On the prowl

So how do you go about locating these sneaky characters if they’re hidden in plain sight?

We recommend copying and pasting your abstract(s) into a plain text editor, where the hidden characters show up as extra symbols. For example, I use an application called BBEdit (it is free). You can also use an online de-coder, such as https://meyerweb.com/eric/tools/dencoder/.

There are many other plain text editors and browser-based de-coder/encoders to choose from. These are just two that we use and that may help.
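If you are comfortable running a script, the search can also be automated. The following is a rough Python sketch, not an official Crossref tool: it scans text for any character outside the ranges permitted by XML 1.0 and reports the line, column, and code point of each one. The function name `find_invalid_xml_chars` and the sample text are assumptions made for this example.

```python
import re

# Matches anything outside the character ranges allowed by XML 1.0:
# tab, line feed, carriage return, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return a list of (line, column, code point) for each invalid character."""
    hits = []
    # Split on "\n" only: str.splitlines() also breaks on some control
    # characters, which would hide exactly the characters we are looking for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            code_point = f"U+{ord(match.group()):04X}"
            hits.append((line_no, match.start() + 1, code_point))
    return hits


# Made-up sample: a hidden U+0002 sits between "post" and "tuberculosis".
sample = "Introduction\npost\x02tuberculosis changes"
for line_no, column, code_point in find_invalid_xml_chars(sample):
    print(f"line {line_no}, column {column}: {code_point}")
# prints: line 2, column 5: U+0002
```

To check a real deposit, you could read the file with `open(path, encoding="utf-8").read()` and pass the result to the function. The reported line numbers should then correspond to the line mentioned in the error message.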

To see what this looks like, let’s look at the example below. It looks like just text.

posttuberculosis changes

But if you copy that into a plain text editor you’ll see this:

Screenshot 2024-05-01 at 3.39.19 PM

or this:

Screenshot 2024-05-01 at 3.39.51 PM

To fix this, you would simply need to delete the hidden character and manually add a space or other appropriate character. We find that unsupported characters sometimes insert themselves through hyphens or dashes, so keep a lookout for those as well.
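For larger files, the same clean-up can be sketched in Python. This is only an illustration under the assumption that every invalid character should be swapped for the same replacement; in practice the right substitute (a space, a hyphen, or nothing at all) is an editorial decision, so review the result before redepositing. The name `replace_invalid_xml_chars` is hypothetical.

```python
import re

# Matches anything outside the character ranges allowed by XML 1.0.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def replace_invalid_xml_chars(text, replacement=" "):
    """Replace every character that XML 1.0 does not allow with `replacement`."""
    return INVALID_XML_CHARS.sub(replacement, text)


# Made-up sample with a hidden U+0002 between "post" and "tuberculosis".
dirty = "post\x02tuberculosis changes"

print(replace_invalid_xml_chars(dirty))       # post tuberculosis changes
print(replace_invalid_xml_chars(dirty, "-"))  # post-tuberculosis changes
```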

How to avoid these hidden characters?

To avoid unintentionally letting these unsupported characters sneak into your metadata, we recommend pasting large pieces of text into a plain text editor first and copying them from there, rather than copying directly from rich-text applications.

Hopefully this has shed some light on a shady subject, and you’ll have a better idea of what to do next time you see one of these mystery error messages.


I actually deal with these ‘shady characters’ on an almost daily basis in my routine editing work. The most frequent cause, if anyone is interested, is unsupported fonts. In my case, I sometimes get them from certain types of punctuation or symbols in East Asian fonts.

In the case of the issue with “posttuberculosis changes”, I suspect that it had to do with an unsupported hyphen that was deleted from the document, but which somehow still remained within the file.

If you have an MS Word document for the manuscript, you can clear up quite a lot using a plain text paste. Copy the text that you want to clean up, then strip out the formatting with a plain text paste. This can be done either by right-clicking and selecting the relevant option, or by pressing Ctrl + Shift + V (Cmd + Shift + V on Mac).

Be warned: If the author has entered symbols using the Symbol font rather than the Symbol browser, the issues can be difficult to spot, as symbols are often transformed to their corresponding alphabet letters.
