A researcher working alone – apart from the world and the wider scientific community – is a classic but flawed image. Research is, in reality, built on continuous exchange within the scientific community: first you understand the work of others, and then you share your findings.
Reading and writing articles published in academic journals and presented at conferences is a central part of being a researcher. When researchers write a scientific article, they should cite the work of colleagues to provide context, detail sources of inspiration, and explain differences in approach and results. A positive citation by other researchers is a key measure of visibility for the researcher’s own work.
But what happens when this citation system is manipulated? A recent Journal of the Association for Information Science and Technology (JASIST) article by our team of academic scientists – which includes information scientists, a computer scientist and a mathematician – has revealed a sneaky method to artificially inflate citation counts through metadata manipulation: hidden references.
Hidden manipulation
People are becoming more aware of how scientific publishing works, including its potential flaws. Last year alone, more than 10,000 scientific articles were retracted. The issues surrounding the citation game and the damage it causes to the scientific community, including damage to its credibility, are well documented.
Citations of scientific work adhere to a standardized referencing system: each reference explicitly mentions at least the title, authors’ names, year of publication, name of the journal or conference, and page numbers of the cited publication. These details are stored as metadata – not visible in the article’s text directly, but associated with the publication’s digital object identifier, or DOI, a unique identifier for each scientific publication.
References in a scientific publication allow authors to justify methodological choices or present the results of past studies, emphasizing the iterative and collaborative nature of science.
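To make this concrete, here is a minimal sketch in Python that fetches a publication’s record from the public Crossref REST API and prints the reference list deposited by the publisher. The DOI used is a hypothetical placeholder, not a real article.

```python
import requests

# Minimal sketch: retrieve a publication's Crossref metadata by DOI and
# list the references the publisher deposited alongside it.
DOI = "10.1234/example.doi"  # hypothetical placeholder; substitute a real DOI

resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]

print("Title:", work.get("title", ["<none>"])[0])
print("Cited by (per Crossref):", work.get("is-referenced-by-count", 0))

# The 'reference' field is the reference list as deposited by the publisher,
# i.e. exactly the metadata into which hidden references can be sneaked.
for ref in work.get("reference", []):
    print(" -", ref.get("DOI") or ref.get("unstructured", "<no identifier>"))
```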
However, through a chance encounter we discovered that some unscrupulous actors have added extra references, invisible in the text but present in the articles’ metadata, when submitting the articles to scientific databases. The result? Citation counts for certain researchers or journals have skyrocketed, even though these references never appear in the articles themselves.
Incidental discovery
The investigation began when Guillaume Cabanac, a professor at the University of Toulouse, wrote a post on PubPeer, a website dedicated to post-publication peer review, where scientists discuss and analyze publications. In the post, he detailed how he had noticed a discrepancy: a Hindawi journal article that he suspected was fraudulent because it contained awkwardly worded phrases had far more citations than downloads, which is highly unusual.
The post caught the attention of several sleuths who are now the authors of the JASIST article. We searched for articles citing the original article in several bibliographic systems: Google Scholar found none, but Crossref and Dimensions found some. The difference? Google Scholar likely relies primarily on the article’s main text to extract the references appearing in the bibliography section, while Crossref and Dimensions use metadata provided by publishers.
A new type of fraud
To understand the extent of the manipulation, we examined three scientific journals that were published by the Technoscience Academy, the publisher responsible for the articles that contained questionable citations.
Our investigation consisted of three steps, sketched in code after this list:

- We listed the references explicitly present in the HTML or PDF versions of an article.
- We compared these lists with the metadata recorded by Crossref, revealing additional references added to the metadata but not appearing in the articles.
- We checked Dimensions, a bibliometric platform that uses Crossref as a metadata source, finding further inconsistencies.
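As an illustration of the comparison in the second and third steps, here is a minimal sketch, assuming the DOIs visible in an article’s HTML or PDF bibliography have already been extracted (the extraction itself is not shown, and all DOIs below are hypothetical placeholders). It contrasts that set with the reference DOIs recorded by Crossref.

```python
import requests

def crossref_reference_dois(article_doi: str) -> set:
    """Return the set of reference DOIs deposited with Crossref for an article."""
    resp = requests.get(f"https://api.crossref.org/works/{article_doi}", timeout=30)
    resp.raise_for_status()
    refs = resp.json()["message"].get("reference", [])
    return {r["DOI"].lower() for r in refs if "DOI" in r}

# DOIs visible in the article's bibliography, e.g. parsed from the HTML or PDF.
# These values are hypothetical placeholders.
visible = {"10.1234/ref.a", "10.1234/ref.b"}

metadata = crossref_reference_dois("10.1234/example.article")  # hypothetical DOI

cryptic = metadata - visible   # only in the metadata: candidate hidden references
missing = visible - metadata   # cited in the text but absent from the metadata

print(f"{len(cryptic)} cryptic reference(s):", sorted(cryptic))
print(f"{len(missing)} legitimate reference(s) missing from metadata:", sorted(missing))
```

Run at the scale of whole journals, this kind of set comparison is what surfaces the share of metadata-only references reported below.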
In the journals published by the Technoscience Academy, at least 9% of the recorded references were “cryptic references”. These additional references existed only in the metadata, skewing citation counts and giving certain authors an unfair advantage. Some legitimate references were also missing, meaning they appeared in the articles but not in the metadata.
Moreover, when we analyzed the cryptic references, we found that some researchers benefited greatly from them. For example, a single researcher affiliated with the Technoscience Academy gained more than 3,000 additional illegitimate citations. Several journals from the same publisher received several hundred cryptic citations each.
We wanted our results to be externally validated, so we posted our study as a preprint and informed both Crossref and Dimensions of our findings, providing them with a link to the preprint. Dimensions acknowledged the illegitimate citations and confirmed that its database reflects Crossref’s data. Crossref also confirmed the additional references in Retraction Watch and noted that this was the first time such a problem had been reported in its database. The publisher, based on Crossref’s investigation, has taken steps to fix the problem.
Implications and possible solutions
Why is this discovery important? The number of citations greatly influences research funding, academic promotions, and institutional rankings. Citation manipulation can lead to unfair decisions based on false data. More worryingly, this finding raises questions about the integrity of scientific impact measurement systems, a concern that has been highlighted by researchers for years. These systems can be manipulated to foster unhealthy competition among researchers, tempting them to take shortcuts to publish faster or achieve more citations.
To combat this practice, we suggest several measures:

- Rigorous metadata verification by publishers and agencies like Crossref.
- Independent audits to ensure data reliability.
- Increased transparency in managing references and citations.
This study is the first, to our knowledge, to report this type of metadata manipulation. It also discusses the impact such manipulation may have on the evaluation of researchers. The study underscores, once again, that overreliance on metrics to evaluate researchers, their work, and their impact can be inherently flawed and misguided.
Such overreliance is likely to encourage questionable research practices, including hypothesizing after results are known, or HARKing; splitting a single dataset into several papers, known as salami slicing; data manipulation; and plagiarism. It also hinders the transparency that is essential for more robust and efficient research. Although the problematic citation metadata and cryptic references have now apparently been fixed, the corrections, as is often the case with scholarly corrections, may have come too late.
This article is published in collaboration with Binaire, a blog for understanding digital issues.
This article was originally published in French.