Received vs publication date in full public file download

Hi!

I downloaded the full public metadata torrent and I wanted to look at the time between DOIs’ received date and publication date. However, none of the data files had a “received date” column.

I know that not every publisher populates that field, but some definitely do, so how come it isn’t in the dataset? Is it referred to as something else?

Thank you! :sunflower:

Hello, and thanks for your post.

We don’t collect a received date in our core bibliographic metadata.

It can optionally be included in the metadata associated with our Crossmark service, where it is tagged as ‘publication_history’.

For example, 10.5555/12345678 is a test DOI we use for illustration purposes, and it has the following publication history metadata included in its record:

```json
[
  {
    "value": "2012-07-24",
    "order": 0,
    "name": "received",
    "label": "Received",
    "group": {
      "name": "publication_history",
      "label": "Publication History"
    }
  },
  {
    "value": "2012-08-29",
    "order": 1,
    "name": "accepted",
    "label": "Accepted",
    "group": {
      "name": "publication_history",
      "label": "Publication History"
    }
  },
  {
    "value": "2012-09-26",
    "order": 2,
    "name": "published_online",
    "label": "Published Online",
    "group": {
      "name": "publication_history",
      "label": "Publication History"
    }
  },
  {
    "value": "2012-10-27",
    "order": 3,
    "name": "published_print",
    "label": "Published Print",
    "group": {
      "name": "publication_history",
      "label": "Publication History"
    }
  }
]
```
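If you are working with records like this in Python, pulling out the received date is a short loop over the assertion objects. This is only a sketch: it assumes the assertions have already been loaded as a list of dicts shaped like the example above, and that the value is an ISO `YYYY-MM-DD` string, which publishers are not required to use. The function name `received_date` is hypothetical.

```python
import json
from datetime import date

def received_date(assertions):
    """Return the first parseable 'received' date in a list of Crossmark assertions, or None."""
    for assertion in assertions:
        # Match on the machine-readable "name"; case is normalised because
        # assertion names are chosen by publishers, not a fixed vocabulary.
        if str(assertion.get("name", "")).lower() != "received":
            continue
        try:
            # Assumes an ISO 8601 YYYY-MM-DD value, as in the example above.
            return date.fromisoformat(str(assertion.get("value", "")))
        except ValueError:
            continue  # free-text value such as "24 July 2012"; skip it
    return None

sample = json.loads("""[
  {"value": "2012-07-24", "order": 0, "name": "received", "label": "Received",
   "group": {"name": "publication_history", "label": "Publication History"}},
  {"value": "2012-08-29", "order": 1, "name": "accepted", "label": "Accepted",
   "group": {"name": "publication_history", "label": "Publication History"}}
]""")
print(received_date(sample))  # 2012-07-24
```

Values that don't parse are skipped rather than guessed at, so records with free-text dates will simply come back as `None`.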

All of this is optional, though. Only around 15% of registered items have any Crossmark metadata at all, and most of those do not use the publication history assertions.

Cool! Thank you. What metadata retrieval method is best to use to get this Crossmark data?

I understand it’s probably incomplete but I’m keen to get whatever I can.

Where the publishers have provided it, it is available in the public data file records.

There’s no structured taxonomy or vocabulary, as Crossmark assertions are very open-ended, but most of them are tagged as “publication_history”, “ArticleHistory”, or “ChapterHistory”. You can see the full list of names that publishers have given their assertion groups here:
https://api.crossref.org/works?facet=assertion-group:*&rows=0

So, that should give you a starting point of what to look for.
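As a rough illustration of using that facet list, the sketch below picks out the group names that look history-related. It assumes the facet response has the shape `message → facets → assertion-group → values`, mapping each group name to a count; please check that against a live response. Matching on the substring "history" is a heuristic, and `history_groups` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request

FACET_URL = "https://api.crossref.org/works?facet=assertion-group:*&rows=0"

def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def history_groups(facet_response, keyword="history"):
    """Return (group name, count) pairs whose name contains `keyword`, most common first."""
    # Assumed response shape: message -> facets -> assertion-group -> values.
    values = (
        facet_response.get("message", {})
        .get("facets", {})
        .get("assertion-group", {})
        .get("values", {})
    )
    # Case-insensitive match catches both "publication_history" and "ArticleHistory".
    matches = [(name, count) for name, count in values.items()
               if keyword in name.lower()]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Usage (makes a live request):
# for name, count in history_groups(fetch_json(FACET_URL)):
#     print(name, count)
```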


Great, thank you. I have a related question: Can I query the API with multiple DOIs at a time?

Not by listing individual DOIs, but you can retrieve records in bulk by other groupings.

For example, you could request all DOIs for a given journal by filtering the /works route by that journal’s ISSN.

You can find full details of all the available filters in the documentation at api.crossref.org.
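A minimal sketch of that per-journal approach, assuming the `issn:` filter together with cursor-based paging (`cursor=*` on the first request, then the `next-cursor` value from each response until a page comes back empty). The ISSN below is a placeholder, and `journal_works_url` and `iter_journal_items` are hypothetical names. The `fetch` argument is left pluggable; something like `lambda url: json.load(urllib.request.urlopen(url))` would do for real requests.

```python
from urllib.parse import urlencode

API = "https://api.crossref.org/works"

def journal_works_url(issn, cursor="*", rows=1000):
    """Build a /works query filtered by ISSN, using cursor-based deep paging."""
    params = {"filter": f"issn:{issn}", "rows": rows, "cursor": cursor}
    return f"{API}?{urlencode(params)}"

def iter_journal_items(issn, fetch):
    """Yield every work record for one journal, following next-cursor until a page is empty.

    `fetch` is any callable that takes a URL and returns the decoded JSON response.
    """
    cursor = "*"
    while True:
        message = fetch(journal_works_url(issn, cursor))["message"]
        items = message.get("items", [])
        if not items:
            return  # an empty page marks the end of the result set
        yield from items
        cursor = message["next-cursor"]
```

Passing `fetch` in as a parameter keeps the paging logic separate from the HTTP client, which also makes it easy to add retries or rate limiting later.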

Hmm, that doesn’t really work for me, because I don’t know which journals will include the Crossmark metadata I want.

It might be useful if I explain what I am trying to do:

  1. I used the public metadata file to get a list of all DOIs published after 2020
  2. I am calling the API for each DOI and checking if they have Crossmark data and, if so, a received date
  3. If they do, I’m saving that plus the earliest publication date
  4. I’m calculating the time difference between the received date and the publication date, then taking the median of those differences to get a typical delay between receipt and publication.
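Steps 3 and 4 can be sketched roughly as follows. This assumes publication dates come as Crossref-style `date-parts` (for example `[[2012, 9, 26]]`) under `published-print` and `published-online`, and that the received date has already been parsed into a `date`. Dates without a day are skipped rather than guessed. All function names here are hypothetical.

```python
from datetime import date
from statistics import median

def to_date(date_parts):
    """Convert date-parts like [[2012, 9, 26]] to a date; None if missing or incomplete."""
    try:
        year, month, day = date_parts[0]
        return date(year, month, day)
    except (IndexError, TypeError, ValueError):
        return None  # e.g. [[2012, 9]] has no day, so it can't give an exact delay

def earliest_publication(work):
    """Earliest full date among the print and online publication dates, or None."""
    candidates = []
    for field in ("published-print", "published-online"):
        d = to_date(work.get(field, {}).get("date-parts", []))
        if d is not None:
            candidates.append(d)
    return min(candidates, default=None)

def median_delay_days(pairs):
    """Median of (publication - received) in days over (received, published) pairs."""
    delays = [(pub - rec).days for rec, pub in pairs if rec and pub]
    return median(delays) if delays else None
```

For the test DOI above, the received date is 2012-07-24 and the earliest publication date is 2012-09-26, a delay of 64 days. It may be worth checking for negative delays before taking the median, since they usually point to a data entry problem.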

Can you suggest any way of making this faster than querying one by one?

Thank you :slight_smile:

Hello,

I ran your question by some of my colleagues who are more adept with large scale API queries and data analysis. They suggested that your process could be more efficient if you just omitted step 2 and obtained the Crossmark metadata from within the public data file directly.
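To give an idea of what skipping step 2 could look like, here is a rough sketch that streams through the public data file instead of calling the API. It assumes the download has been unpacked into a directory of gzipped JSON files, each holding an object with an `items` list of work records, and that the Crossmark assertions sit under an `assertion` key as they do in REST API responses. Please verify both against your own copy. `iter_received` is a hypothetical name.

```python
import gzip
import json
from pathlib import Path

def iter_received(data_dir):
    """Yield (DOI, received value) for every record that has a 'received' assertion."""
    for path in sorted(Path(data_dir).glob("*.json.gz")):
        # Assumed layout: each file is one JSON object with an "items" list.
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            batch = json.load(fh)
        for work in batch.get("items", []):
            for assertion in work.get("assertion", []):
                if str(assertion.get("name", "")).lower() == "received":
                    yield work.get("DOI"), assertion.get("value")
                    break  # one received date per record is enough
```

Because everything is read locally, this avoids network round trips entirely, and the files can be split across processes if a single pass is still too slow.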

Best,
Shayn


Hi,

Thanks so much for your reply! Sadly, the public metadata file doesn’t contain the received date, so I can’t use it to get what I need. I’ve learnt how to speed up the API calls by writing better code, so I’m making good progress with this project.

I’m glad you found a process that’s working for you!

Just for the sake of others who may come across this forum post, I double-checked with a couple of my colleagues and can confirm that Crossmark metadata in general, and publication history assertions in particular, are present in the public data file.

Any record that had a publication history > received date in the API as of March 31st 2023 will also have that publication history > received date in the public data file.

You should definitely keep doing what’s working for you. But, I just wanted to clarify, so there would be accurate information up here, since it’s a public forum.

Best,
Shayn
