For instance, the DOI 10.1386/dtr_00025_1 has a relevance score of 0.81349975, and the affiliation metadata registered for all of that DOI's contributors is: 0000000092155771Lesley University. I would therefore suggest eliminating it as a viable result (for this specific query), and I would ignore anything with a lower score as well. Note that this is simply an arbitrary example; I do not mean to suggest that this score is the threshold for all queries (or even for this one - a higher cutoff might be a better fit here, but I'll defer to you).
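To make that filtering concrete, here is a minimal sketch of applying a score cutoff to results from the works endpoint (the query string and threshold are placeholders, not recommendations; only the `score` and `DOI` fields come from the API response):

```python
import requests

# Placeholder query and cutoff; 0.81349975 is just the arbitrary
# example score mentioned above, not a recommended threshold.
QUERY = "some bibliographic reference string"
THRESHOLD = 0.81349975

resp = requests.get(
    "https://api.crossref.org/works",
    params={"query.bibliographic": QUERY, "rows": 20},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["message"]["items"]:
    # Each result carries a relevance "score"; keep only those above the cutoff.
    if item["score"] > THRESHOLD:
        print(item["DOI"], item["score"])
```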
Please let me know if you have any additional questions.
Yes, I do understand this now. And a long overdue thanks to Isaac @ifarley.
But sometimes selecting an appropriate threshold value (for the same query on a given country, for example, especially with country names that contain spaces) is essentially a research project in itself.
Regards
contributors’ first and last names, or name of a contributing organization
grant numbers
funder names
For “query.bibliographic”, the following fields are searched (see the example request after the list):
online and print publication year
issue
volume
first and last page
ISSN
ISBN
title
container (journal, conference, etc.) title
contributors’ last names and initials, or name of a contributing organization
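As an illustration, here is a sketch of a lookup against these fields, assuming a made-up citation string (everything in it is hypothetical; the endpoint and the `query.bibliographic` parameter are as discussed above):

```python
import requests

# A made-up citation string; query.bibliographic matches it against the
# fields listed above (title, container title, contributor names, year, etc.).
citation = "Smith 2019 Example title Journal of Examples 12(3) 45-67"

resp = requests.get(
    "https://api.crossref.org/works",
    params={"query.bibliographic": citation, "rows": 5},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["message"]["items"]:
    # Print the relevance score alongside the DOI and title of each candidate.
    print(item["score"], item["DOI"], (item.get("title") or ["(no title)"])[0])
```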
Unfortunately, we doubt this will help with the threshold. The most important reason is that in search engines such as Elasticsearch, scoring is not designed to be meaningful across different queries, i.e. the score is not an objective, global measure of similarity. The number is not scaled to any known range, and it depends heavily on the query itself. Scores are only meant to let us compare the similarity of different indexed documents with the same query, so they only enable us to sort the results for a given query. Our best advice for finding such a threshold is: 1) try normalizing the score by the query length (i.e., divide the score by the number of words in the query, possibly excluding stopwords, to get a score that is a bit more comparable between queries), and 2) find the best threshold by experimenting on a real, representative dataset.
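As a rough illustration of the first suggestion, here is a minimal sketch of length-normalizing a score (the stopword list is a tiny, made-up sample, not a recommendation):

```python
# Minimal sketch: normalize a relevance score by query length.
# The stopword set here is a small, made-up sample.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def normalized_score(score: float, query: str) -> float:
    # Count the query words that are not stopwords; fall back to the
    # full word count (and then to 1) so we never divide by zero.
    words = [w for w in query.lower().split() if w not in STOPWORDS]
    return score / max(len(words) or len(query.split()), 1)

# The normalized score is a bit more comparable across queries than the
# raw Elasticsearch score, though still not a true global scale.
print(normalized_score(45.2, "machine learning for bibliographic reference matching"))
```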
The second one has double the scores. This is what I meant: the scores make sense when you compare them within the context of a given query, but much less so when you compare scores between queries.