What is the key factor in query performance?

When I query by title, the response is sometimes very slow, even with the ‘mailto’ parameter.

For example (I have hidden my email here):

http://0-api-crossref-org.library.alliant.edu/works?query.bibliographic=%22Faster%20R-CNN:%20Towards%20Real-Time%20Object%20Detection%20with%20Region%20Proposal%20Networks%22&mailto=xx@xx.xx

Usually it takes 3-5s to get the response, but sometimes it takes about 8-10s.
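For anyone who wants to reproduce timings like these, here is a minimal sketch of how the wall-clock latency could be measured on the client side. The helper names `time_calls` and `summarize` are made up for illustration, and `fake_fetch` is only a stand-in for the real HTTP request; this measures total round-trip time and cannot separate network time from server time.

```python
import statistics
import time


def time_calls(fetch, runs=5):
    """Call `fetch()` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to its median and worst case."""
    return {
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_fetch():
    """Hypothetical stand-in for the real request to the /works endpoint."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(time_calls(fake_fetch, runs=3)))
```

Comparing the median with the maximum over several runs gives a rough idea of whether the slow responses are occasional outliers or the norm.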

Could you please explain which part of this query takes the most time? Is it the network, QoS, or the database's internal query time? If it is QoS, could you please share the estimated query time for the Crossref Plus tier?

Best wishes.

Hi @geoffreychen777,

Thanks for your message! What kind of query were you doing that returned results in 3-5s? Were there fewer terms in your query.bibliographic parameter? The more terms included in the query.bibliographic parameter, the greater the likelihood that your query will take more time for us to return results.

This query: http://0-api-crossref-org.library.alliant.edu/works?query.bibliographic="Faster%20R-CNN:%20Towards%20Real-Time%20Object%20Detection%20with%20Region%20Proposal%20Networks"&mailto=xx@xx.xx is querying for each of these words/terms:

Faster, R-CNN:, Towards, Real-Time, Object, Detection, Region, Proposal, and Networks

So, the API is searching the bibliographic metadata for each of the 138 million plus records in our corpus for each of those words and then returning results ordered by the metadata that matches each and all of those search terms most closely (i.e., it’s an expensive query).

That said, we have had some instances of slower response times in the Polite pool this week, so, depending on the timing of this query, the slower response time might also be related to overall traffic on the pool.

If you can contact me at support@crossref.org, we can provide you with some additional details about the Metadata Plus service (and, thus, the Plus pool of the REST API).

My best,
Isaac

Thank you for your answer. I’ve contacted you by email. Looking forward to your reply!

Best,

A note to others reading along:

There are a couple of different options that might be more efficient than the REST API query we discussed above, but it depends on what you need from us (and/or what you are querying against):

  1. you could limit the number of rows you request from us, like this: http://api.crossref.org/works?query.bibliographic=“Faster%20R-CNN:%20Towards%20Real-Time%20Object%20Detection%20with%20Region%20Proposal%20Networks”&rows=5&mailto=xx@xx.xx ; this query returns much faster and provides only the top five results
  2. you could also just take the search directly to search.crossref.org, like this: https://0-search-crossref-org.library.alliant.edu/?q=Faster+R-CNN%3A+Towards+Real-Time+Object+Detection+Region+Proposal+and+Networks&from_ui=yes
  3. Lastly, our Simple Text Query form does a good job of matching DOIs if a fuller citation is provided; you can try it yourself here: Simple Text Query

I’m searching for: Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 39, no. 6 (June 1, 2017): 1137–49.

And getting these results: (screenshot of the Simple Text Query results not preserved)
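Going back to option 1 in the list above, here is a rough sketch of how the row-limited request URL could be built in code. The function name `build_title_query` is made up for illustration, and the optional `select` parameter is included on the assumption that trimming the returned fields is useful for your case; please check the REST API documentation before relying on it.

```python
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/works"  # public REST API endpoint


def build_title_query(title, rows=5, mailto=None, fields=None):
    """Return a /works URL that searches bibliographic metadata for `title`.

    `fields` is an optional list of field names passed through `select`
    (an assumption for this sketch, see the lead-in above).
    """
    params = {"query.bibliographic": title, "rows": rows}
    if fields:
        params["select"] = ",".join(fields)
    if mailto:
        params["mailto"] = mailto  # identifies you for the Polite pool
    # urlencode percent-encodes the title, so no manual quoting is needed
    return f"{API_BASE}?{urlencode(params)}"


url = build_title_query(
    "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks",
    rows=5,
    mailto="xx@xx.xx",
)
print(url)
```

Letting `urlencode` handle the escaping avoids problems with pasted titles that contain typographic quotes, colons, or ampersands.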

Hi, Thanks for your reply.

What I’m trying to do is very simple:

I’m developing an app that retrieves metadata for PDF files of papers. I do not have the full reference text or the DOI for those papers, so I can only parse the PDF files to extract their titles and, where present, their DOIs. If a paper’s PDF contains a DOI, it is easy to get its metadata. But some papers have no DOI in their PDF files, so the only option is to do a query like this:

http://0-api-crossref-org.library.alliant.edu/works?query.bibliographic=“TITLE_OF_PAPER”&rows=5&mailto=xx@xx.xx
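To make the flow concrete, here is a minimal sketch of what I mean: use the DOI route when the PDF has one, otherwise fall back to a title search and keep only a candidate whose title closely matches the parsed title. The names `lookup_url` and `best_match`, the 0.9 threshold, and the sample records with placeholder DOIs are all made up for illustration; the sketch assumes the candidates are the list found under `message` → `items` in the JSON response.

```python
from difflib import SequenceMatcher
from urllib.parse import quote, urlencode

API_BASE = "https://api.crossref.org/works"


def lookup_url(title, doi=None, rows=5, mailto="xx@xx.xx"):
    """Prefer the direct DOI route; fall back to a title search."""
    contact = urlencode({"mailto": mailto})
    if doi:
        return f"{API_BASE}/{quote(doi, safe='/')}?{contact}"
    params = {"query.bibliographic": title, "rows": rows, "mailto": mailto}
    return f"{API_BASE}?{urlencode(params)}"


def best_match(title, items, threshold=0.9):
    """Return the candidate whose title is closest to `title`, or None.

    Each item is assumed to carry a "DOI" string and a "title" list.
    The threshold is an arbitrary choice for this sketch.
    """
    def similarity(item):
        candidate = (item.get("title") or [""])[0]
        return SequenceMatcher(None, title.lower(), candidate.lower()).ratio()

    best = max(items, key=similarity, default=None)
    if best is None or similarity(best) < threshold:
        return None
    return best


# Hypothetical candidates with placeholder DOIs, not real API output.
sample_items = [
    {"DOI": "10.0000/placeholder-a",
     "title": ["Faster R-CNN: Towards Real-Time Object Detection "
               "with Region Proposal Networks"]},
    {"DOI": "10.0000/placeholder-b", "title": ["Fast R-CNN"]},
]
parsed_title = ("Faster R-CNN: Towards Real-Time Object Detection "
                "with Region Proposal Networks")
match = best_match(parsed_title, sample_items)
print(match["DOI"] if match else "no confident match")
```

The similarity check is there because a title-only search can return near misses, so the top result is not accepted blindly.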

I tried limiting the rows, but sometimes I still need to wait as long as 16s for the query response.

Following your suggestion, I tried the Simple Text Query. I found that I need to provide at least the first author, title, and year to get a DOI back. This doesn’t seem feasible in my scenario. Also, according to the API documentation, Simple Text Query is meant for humans, not machines, so I’m not sure I can use it programmatically.

It seems there is no more efficient solution for my query. Anyway, thanks for your suggestions.

Hi @geoffreychen777 ,

I talked with Patrick Polischuk, the product manager responsible for metadata retrieval, and he reminded me that we do have some feature development on the roadmap that should fit this use case a lot better: [CR-555] - Jira. The work is just in the research and planning phase, as you can see from our public roadmap, but we’ll get there in the longer term.

-Isaac

Thanks for your info.

Looking forward to this feature!
