The elephant in the room and issues in scientific publishing.
Recently, I engaged with a LinkedIn post from Dr. Robert Biswas-Diener on the topic of publication quantity, citation counts and the glorious H-Index.
In my brief response to the post, I questioned whether the quantitative metrics of publication and citation counts are increasingly losing their significance in quantifying a researcher’s “impact” and contributing to issues in scientific publishing. Another casualty of Goodhart’s Law.
Dr. Biswas-Diener replied, “The job of an academic is to publish research. Just because the number of publications is a metric of success does not mean it loses its value. Do you see some specific shortcomings occurring in science based on these metrics?” At first, I thought I would respond to the post with how the metrics contribute to shortcomings in science, but then I realized the nuance would be lost in a 1,250-character-limited LinkedIn reply. I found myself pondering this more deeply and decided to explore these thoughts thoroughly during this loooong UA838 flight, especially given my ideological bias that science would benefit from slowing down (Frith, 2019).
The issues I would like to unpack here are a) whether the number of publications is an adequate measure of “productivity” and b) whether citation counts should be taken as a direct measure of “impact” on a field.
I do this by evaluating how these metrics function in practice and the bad practices that recur. Given the precedent of publications and citations serving as the metrics of “scientific value” over the last few decades, there is reason to suspect that these metrics are depreciating. I will not elaborate on the H-index, given that it is a rank-ordered quantitative metric that summarizes publications and citations: h-index = the number of publications with a citation count ≥ h. For example, if an author has an h-index of 11, their top 11 cited papers each have ≥ 11 citations.
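To make the definition concrete, here is a minimal Python sketch of the h-index calculation described above; the citation counts are made up for illustration.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers each have >= h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the rank-th paper still has at least `rank` citations
        else:
            break
    return h

# Hypothetical author: the top 11 papers each have >= 11 citations, so h = 11.
print(h_index([50, 40, 33, 28, 25, 20, 18, 15, 14, 12, 11, 9, 3, 1]))  # -> 11
```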
An important caveat: I don’t want this post to give the impression that I don’t believe in the dissemination of research. Given the topic being discussed, the post will by default take a negative perspective on some practices, but this does not mean I am sour on the entire enterprise or the many researchers involved. Whether dissemination occurs via talks, presentations, books or publications, each venue serves its purpose.
I don’t believe that the job of an academic is to publish research articles in journals. Rather, I believe the job of an academic is to think deeply about issues in complex systems, society and medicine (which scientists need to do more of), and to conduct research that makes progress in these areas. The publication is a by-product that the sciences have settled on to disseminate the work and to use in hiring processes. In the same way, the job of an undergraduate student isn’t to get a high grade-point average (GPA) but to learn and understand the material and apply it to the real world; the GPA just happens to have become the standard measure of ‘success’ in undergrad. Would you prefer a student who performed well by cramming and achieving a 3.89 GPA over a student who thoughtfully engaged with the content and received a 3.1 GPA? The more the output of publications and GPAs become fundamental to the academic’s and the undergrad’s job, the more these metrics fall prey to Goodhart’s and Campbell’s Laws.
Full disclosure: I am not a prominent researcher with a lot of citations, articles or a high h-index. Even my parents don’t know what I do. So maybe I am just a… hater? Furthermore, much of the below has been covered, in one way or another, more eloquently by others in other outlets.
“People with targets and jobs dependent upon meeting them will probably meet the targets – even if they have to destroy the enterprise to do it.” – William Edwards Deming

Data from Desai et al. (2018) and Hanson et al. (2024) show a significant rise in the number of papers published each year. This surge is often accompanied by an increase in “Special Issues” from publishers like MDPI and Frontiers, which some critics argue are predatory (Wikipedia definition).
“I spent 13 years at NIMH really pushing on the neuroscience and genetics of mental disorders, and when I look back on that, I realize that while I think I succeeded at getting lots of really cool papers published by cool scientists at fairly large costs—I think $20 billion—I don’t think we moved the needle in reducing suicide, reducing hospitalizations, or improving recovery for the tens of millions of people who have mental illness. I hold myself accountable for that.” – Thomas Insel, former NIMH director

This statement underscores a significant issue in research and scientific publishing. While the quote is specific to mental health, replication and reproducibility issues exist in psychology (The Atlantic), management research (Hensel, 2021), biomedicine (News-Medical), economics (The Conversation), cancer research (Mullard, 2021), and other fields. Despite thousands of articles being published and billions of dollars spent, more often than not, the impact on society remains marginal. Sure, sometimes we get fun, absurd headlines such as “Is Smelling Farts Healthy? Research Says Maybe” (Healthline) or “‘Hot’ academic study tests Cardi B’s claim that ‘a hoe never gets cold’” (NY Post). But reading the related papers tells you either a) no, the study didn’t say that, or b) the study is flawed in many ways and makes unfounded connections and/or claims. In the most controversial cases, the issues can be extremely costly, such as falsified work related to key theories in Alzheimer’s research (Piller, 2022) or the continued detriment from falsified vaccines-and-autism work (Thou that shalt not be named in The Lancet). When findings don’t replicate or reproduce, authors will point to various explanations.
One presumes that the more publications a researcher has, the more prominent they must be. Is that an accurate representation of this quantitative metric?
As illustrated in the figure above, there are over 100,000 articles published on PubMed every year (Desai et al., 2018). This surge is likely due to increased access to data and a growing number of PhD graduates. According to two separate reports from the American Psychological Association (2016 and 2017), there has been a 40% increase in the number of PhDs awarded between 2004 (n = 4933) and 2017 (n = 6915). This influx of graduates means more papers, more citations, and more competition for awards, grants, and jobs. With fewer faculty positions available, the academic job market has become increasingly competitive.
This high demand and low supply in the job market places immense pressure on trainees to meet the metrics that are perceived to enhance future academic success. The pressure to win grants and publish in top journals further complicates this, encouraging bad practices to meet the goals and expectations of the field.
The systemic pressure to produce “clean” results contributes to a climate where overly manicured work feeds self-fulfilling narratives—“of course the finding makes sense, because…”. Some argue that publications should move away from telling perfect, clean stories. In 1990, Paul Meehl wrote, “I daresay a group of faculty or graduate students could come up with a dozen plausible alternatives to the theory of interest if allowed a morning’s conversation over coffee and Danish”. I daresay this remains true in 2024. Scientific publications are the currency of academic research, representing years of hard work. As the thresholds for awards, grants and positions continue to rise, so does the pressure to publish. Psychology, in particular, has a long history of reporting positive results and confirming theories that researchers believe to be true (Haeffel, 2022).
In the last decade, new approaches to publishing have emerged. Traditionally, a researcher proposes a question, methods and a sample, acquires data, plans and runs analyses, reports results and forms an introduction/conclusion around those components. This report is then submitted to a journal for review, after which it is accepted, returned for revisions or rejected. More recently, researchers have started using registered reports (Chambers et al., 2014). In a registered report, the introduction, methods and analyses (the full plan) are reviewed and “in-principle” accepted by a journal before data are acquired. Comparing traditional publications with registered reports, Scheel et al. (2021) observed that traditional reports support their research questions 96% of the time, compared to only 43% for registered reports. While registered reports and pre-registrations (effectively a registered report that isn’t reviewed/accepted) alone don’t guarantee quality (Lakens et al., 2024), the discrepancy between significant results in traditional and registered reports highlights a meaningful problem in published research: researchers are reporting more positive results than what actually happened warrants, often leading to overly tidy findings and discussions.
So, what may be happening? Is it just a coincidence that traditional reports support their hypotheses more often than registered reports? Probably not…
Some argue that researchers tend to engage in questionable research practices (QRPs), such as p-hacking, when using traditional reports. Stefan & Schönbrodt (2023) reviewed a range of poor practices researchers might use. In fact, some surveyed researchers have admitted to engaging in QRPs (John et al., 2012). This is often related to the flexibility in the workflow that traditional reports allow (Simmons et al., 2011). Sometimes these practices may not initially feel malicious, but they do impact the results. In a registered report, you already have a plan, so any deviations have to be justified or the paper may be rejected. Given that researchers are rewarded for novel findings, clean stories and more publications, there is added opportunity and flexibility to exploit. When you combine these incentives with the internal biases we all hold and the drive from tacit competition, it leads to a self-reinforcing cycle of suboptimal practices and goals. Thus, more publications may reflect creativity in getting published rather than genuine productivity.
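To make concrete how that flexibility inflates positive results, here is a small, hypothetical simulation (not taken from any of the cited papers): two groups are drawn from the same distribution, twenty outcomes are measured, and only the outcome that “works” is reported. Even with no true effect anywhere, a “significant” finding turns up in roughly two-thirds of simulated studies.

```python
# Minimal sketch: outcome shopping as one questionable research practice.
# The numbers (20 outcomes, 30 participants per group, 5,000 simulated studies)
# are illustrative assumptions, not taken from any cited study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n_sims, n_outcomes, n_per_group = 5_000, 20, 30

false_positives = 0
for _ in range(n_sims):
    # Both groups come from the SAME distribution: any "effect" is pure noise.
    group_a = rng.normal(size=(n_outcomes, n_per_group))
    group_b = rng.normal(size=(n_outcomes, n_per_group))
    p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue
    if p_values.min() < 0.05:  # report only the outcome that "worked"
        false_positives += 1

print(f"Studies with at least one p < .05: {false_positives / n_sims:.0%}")
# Expected around 1 - 0.95**20 ≈ 64%, far above the nominal 5% error rate.
```

Pre-specifying a single primary outcome, as a registered report forces you to do, brings that rate back toward the nominal 5%.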
Some may ask, “Well, people can’t be p-hacking that much, can they?” This is a fair question. Others may claim, “It probably happens at bad labs and low-quality programs, not the good ones.” To that I’d respond, that couldn’t be further from the truth. Having worked as a volunteer, research assistant, research coordinator, PhD student, postdoc and consultant on projects in research for almost 15 years, I have personally observed and heard of individuals engaging in all sorts of research practices to either get published or submit to a top journal.
I’ve witnessed and heard of people taking questionable steps to publish. One well-respected researcher encouraged their trainees during a meeting to dredge their dataset to find a significant result. In another example, a researcher converted a p-value of p < .05 to p < .001. In other cases, trainees focused solely on getting into top programs or journals for the prestige, rather than selecting the right outlet for the science. In various settings, students have reported in private how faculty take a flexible approach: re-checking analyses until they get a result, rerunning things until something is significant and/or prioritizing the prestige of journals over the quality of the work. On Twitter/X, I observed PhD students openly celebrating that they hit a target of 20 or 40 publications by the end of their PhD. While publishing is important to a researcher’s success, setting quantity as a target, rather than quality, is troubling. Yet, as is common on social media, this success in hitting an arbitrary whole-number target was applauded. While this has been my experience, I’d hypothesize others have experienced similar or worse.
So, are publications really a sign of productivity, impact or value?
I’d opine that using ‘number of publications’ as a metric is losing its luster. These days, it is not uncommon to see PhD students graduating with 35+ publications and/or tenure-track faculty applicants with 80+ publications. One junior scholar published over 30 papers before starting a PhD – all with their parent. In 2023, José Lorenzo was reported to have published a research paper EVERY TWO DAYS. Based on his Google Scholar profile, he has published over 2,000 items and has over 53,000 citations. He must then be prolific and have won at science, right?
There are countless other controversies in the world of scientific publishing. There are data fabrication issues around a major hypothesis in Alzheimer’s research, data fabrication accusations relating to Amy Cuddy’s “Power Posing” work, data fabrication by the prominent psychologist and behavioral economist Dan Ariely in ‘honesty research’, accusations against Harvard’s Khalid Shah of data fabrication and image manipulation, and against Francesca Gino of data misconduct and manipulation. Marc Tessier-Lavigne, the former president of Stanford, even stepped down after accusations relating to publications with falsified information/images. This is probably only the tip of the iceberg – there are likely issues that some are aware of but have not reported.
I would argue that something is amiss… it’s becoming increasingly easy to run analyses that produce significant results and to use the hundreds of thousands of published works to craft a narrative around that finding. These stories will make for engaging news headlines, but the “number of publications” will continue to be diluted as an indicator of productivity/impact.
The idea is that each citation a paper receives reflects some level of impact directly or indirectly on the field. But how accurate is that?
There are common practices that, while often used behind closed doors, are frowned upon in public-facing venues. Some researchers tend to favor their own or their colleagues’ measures and theories, resulting in increased citations within more connected labs and prominent schools. This phenomenon is sometimes referred to as the “Toothbrush Problem” (Mischel, 2008). During the journal article review process, researchers might recommend a colleague’s paper or even their own. In other scenarios, scientists take advantage of social and news marketing campaigns. For example, researchers use social media platforms like X (formerly Twitter), LinkedIn, BlueSky and Mastodon to disseminate their work to followers, peers and colleagues. Using social media is not a bad thing in itself; it depends on how it is used. Some groups and individuals have mastered the art of marketing their work across platforms. They may combine platforms such as WhatsApp, Slack and email to increase the number of colleagues willing to boost their engagement on social media. Consequently, better-connected individuals and labs often gain greater exposure for their work. (Note, I am privileged to be in a prominent lab and a reasonably sized group, so I have that unfair advantage, too.) Others leverage their institution’s communication departments to disseminate their work through public platforms like news or blog outlets. These are just a few ways to boost engagement with your work, encouraging other researchers to think of it when writing their papers, conducting research or recommending something to a colleague/trainee, rather than citing someone else’s work that may better support their point but is not as well connected. Which, in my opinion, is unfair to some stellar scientists.
There are also more sinister practices used to increase citation counts. In August 2024, a report was published investigating the citation black market, in which researchers pay for citations, artificially driving up their ‘quantitative impact’. Some experts have been accused of demanding citations. For example, Juan Corchado, the president of the University of Salamanca in Spain, organized a citation group to become one of the “best” scientists from Spain in his field. Editors may also engage in self-citation biases, such as the case of an editor at Psychological Science receiving greater citations partly due to an editorial role at a flagship journal.
These are a range of practices, some more sinister than others, that researchers may use to boost their citation counts. The number of citations gives off the illusion of ‘impact’ when sometimes it may just be the result of connectedness, prestige, strategy and unethical practices.
Should we just burn it all down? No.
Should we stop publishing? No.
I still trust and believe in the scientific process. Science is a human endeavor, which makes it messy, and there will be issues in publishing. The way forward is to try to do better science, and to recognize problems and request corrections along the way (like many do). And, for the love of god, stop falling victim to the reward structure and engaging in bean counting. At the end of the day, we are funded by taxpayer dollars and people are counting on us to do good science (uhm, notice I didn’t say perfect). Whether that is via one publication every 4 years, 2 years, 1 year or 6 months, it should suffice as long as we’re moving the needle on the issues and not spinning in circles. Don’t press pause on your integrity just for another publication.
A lot of great initiatives already exist, like registered reports, efforts to estimate errors and reproducibility, Retraction Watch, PubPeer and more, and more will be developed. There is a lot of great research and there are great scientists out there; just don’t judge them by publication and citation counts. Instead, read their stuff and be inspired.
I try to be inspired by great work and not worry about the fluff like winning cool awards, getting papers into Nature or having highly cited work. Maybe I’m a subpar and boring scientist. I don’t know. You be the judge. When I’m dead, no one will care about my H-index. Partly because it’s reeeeally low but mostly because it won’t matter.
With all this being said, this is my personal opinion reflecting my journey. I would love to hear your thoughts.