How does Elements match publication records to each other?
As new publication records are harvested, Elements checks whether they match any publication records already in the system. Certain bibliographic data fields are processed and stored as 'match attributes'. Elements' matching algorithm uses three types of match attributes: ACCEPT, REJECT and FUZZY REJECT.
ACCEPT attributes are:
DOI (except for chapters; see the note below)
Publication ID (e.g. a PubMed ID, WoS publication ID or Scopus article ID)
REJECT attributes are:
DOI
Pagination
Volume and issue (both volume and issue must be set)
The FUZZY REJECT attribute is:
Title
The algorithm performs these steps when matching publication records:
If any of the ACCEPT attributes match, the records match and the algorithm finishes.
If any REJECT attribute has a value on both records and those values differ, the records don't match and the algorithm finishes.
If the FUZZY REJECT attribute has a value on both records and the difference between the two values is above the reject threshold, the records don't match and the algorithm finishes.
Otherwise, the records are considered a match.
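The steps above can be sketched in Python. This is an illustrative sketch only, not Elements' actual implementation: the `Record` fields, the `records_match` function, the `difflib`-based title distance and the threshold value of 0.3 are all assumptions made for the example. It also assumes that volume and issue are compared as a pair, and only when both are set on a record, and it applies the chapter rule for DOIs described in the note below.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, Optional

# Assumed value for illustration; the real reject threshold is not stated here.
TITLE_REJECT_THRESHOLD = 0.3


@dataclass
class Record:
    """Hypothetical container for the match attributes of one record."""
    doi: Optional[str] = None
    source_ids: Dict[str, str] = field(default_factory=dict)  # e.g. {"pubmed": "12345"}
    pagination: Optional[str] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    title: Optional[str] = None
    is_chapter: bool = False


def differ(a, b) -> bool:
    """True only when both values are present and unequal."""
    return a is not None and b is not None and a != b


def volume_issue(r: Record):
    """Treat volume and issue as one attribute, usable only when both are set."""
    if r.volume is not None and r.issue is not None:
        return (r.volume, r.issue)
    return None


def title_distance(a: str, b: str) -> float:
    """0.0 for identical titles, approaching 1.0 for unrelated ones (assumed metric)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def records_match(new: Record, old: Record) -> bool:
    # Step 1: any matching ACCEPT attribute is an immediate match.
    # For chapters the DOI is not an ACCEPT attribute.
    doi_accepts = not (new.is_chapter or old.is_chapter)
    if doi_accepts and new.doi is not None and new.doi == old.doi:
        return True
    if any(old.source_ids.get(src) == pid for src, pid in new.source_ids.items()):
        return True

    # Step 2: any REJECT attribute present on both records but different.
    if differ(new.doi, old.doi):
        return False
    if differ(new.pagination, old.pagination):
        return False
    if differ(volume_issue(new), volume_issue(old)):
        return False

    # Step 3: FUZZY REJECT on the title, only when both records have one.
    if new.title is not None and old.title is not None:
        if title_distance(new.title, old.title) > TITLE_REJECT_THRESHOLD:
            return False

    # Step 4: nothing rejected the pair, so it counts as a match.
    return True
```

One consequence of this ordering is that a shared DOI or publication ID wins even when the titles differ, and that two records with no conflicting attributes match by default because nothing rejects them.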
Note: We don't match on fields such as Authors or Publication Type, mostly because those fields are not reliable enough to be used for matching (e.g. not all data sources provide us with a publication type). Additionally, for chapters the DOI is a REJECT attribute only, not an ACCEPT attribute. This prevents distinct chapters from the same book, which may share a DOI, from being merged.
Currently, the algorithm doesn't look for a 'best' match. As soon as it finds a record that does match, it stops and merges the records.
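A minimal sketch of this first-match behaviour, in Python, may make the consequence clearer. The function name `find_first_match`, the dictionary records and the toy `same_title` predicate are hypothetical, and the order in which Elements examines existing records is not specified here; the example simply assumes some fixed order.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find_first_match(
    new: T,
    existing: Iterable[T],
    is_match: Callable[[T, T], bool],
) -> Optional[T]:
    """Return the first existing record that matches, without ranking candidates."""
    for candidate in existing:
        if is_match(new, candidate):
            return candidate  # stop here; later candidates are never examined
    return None


# Toy predicate for the example only: compare titles case-insensitively.
def same_title(a: dict, b: dict) -> bool:
    return a["title"].lower() == b["title"].lower()


existing_records = [
    {"id": 1, "title": "A Study of Things"},
    {"id": 2, "title": "Something Else"},
    {"id": 3, "title": "A study of things"},
]
harvested = {"id": None, "title": "A study of things"}

chosen = find_first_match(harvested, existing_records, same_title)
print(chosen["id"])  # 1: record 3 also matches but is never considered
```

Because the search stops at the first hit, which existing record a new record is merged into can depend on the order in which candidates are examined.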
