
Scalene 45: CoCoNUTS / Laziest / Inflection

Humans | AI | Peer review. The triangle is changing.

How are we all doing out there? Survived the onslaught of webinars, events, and blog posts that constituted Peer Review Week? I do hope so, but it is hard to write something new and interesting at the end of PRW. Here are my nuggets from the last week or so, and a few things I've come to accept:

1) AI-mediated evaluation of academic research will not be appropriate in all subject areas (see story 4 below).
2) AI-mediated evaluation of research is not peer review, and I'm fine with not calling it peer review at all.
3) Given the option, quick and thorough human review is still my preferred way of evaluating academic research, but it simply isn't an option for the vast volume of manuscripts we produce, hence the work we describe in this newsletter.

21st September 2025

1//
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection

arXiv.org - 28 August 2025

While LLMs offer valuable assistance for reviewers with language refinement, there is growing concern over their use to generate substantive review content. Existing general AI-generated text detectors are vulnerable to paraphrasing attacks and struggle to distinguish between surface language refinement and substantial content generation, suggesting that they primarily rely on stylistic cues. When applied to peer review, this limitation can result in unfairly suspecting reviews with permissible AI-assisted language enhancement, while failing to catch deceptively humanized AI-generated reviews. To address this, we propose a paradigm shift from style-based to content-based detection. Specifically, we introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews, covering six distinct modes of human-AI collaboration. Furthermore, we develop CoCoDet, an AI review detector via a multi-task learning framework, designed to achieve more accurate and robust detection of AI involvement in review content.
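The multi-task framing in the abstract can be pictured with a toy sketch. This is not the authors' code; CoCoDet's actual architecture, features, and training procedure are in the paper. All names, dimensions, and labels below are illustrative: a shared representation of each review feeds two heads, one predicting the human-AI collaboration mode and one scoring whether AI contributed substantive content rather than surface style.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 8 "reviews" as 16-dim feature vectors, with labels for
# the paper's six human-AI collaboration modes and a binary content flag.
X = rng.normal(size=(8, 16))
y_mode = rng.integers(0, 6, size=8)     # which collaboration mode produced it
y_content = rng.integers(0, 2, size=8)  # 1 = substantive AI content, 0 = style-only help

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# One shared projection ("encoder") plus two task-specific heads.
W_shared = rng.normal(scale=0.1, size=(16, 8))
W_mode = rng.normal(scale=0.1, size=(8, 6))
W_content = rng.normal(scale=0.1, size=(8, 2))

h = np.tanh(X @ W_shared)           # shared encoding of each review
p_mode = softmax(h @ W_mode)        # head 1: collaboration-mode probabilities
p_content = softmax(h @ W_content)  # head 2: content-vs-style probabilities

# A joint loss means both tasks shape the shared representation,
# which is the point of the multi-task setup.
loss = (-np.log(p_mode[np.arange(8), y_mode]).mean()
        - np.log(p_content[np.arange(8), y_content]).mean())
```

The design choice worth noticing: because both heads backpropagate into the same encoder, the detector is pushed toward features that explain content provenance, not just writing style.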

2//
AI and the Future of Academic Peer Review

arXiv.org - 17 Sept 2025

Peer review remains the central quality-control mechanism of science, yet its ability to fulfill this role is increasingly strained. Empirical studies document serious shortcomings: long publication delays, escalating reviewer burden concentrated on a small minority of scholars, inconsistent quality and low inter-reviewer agreement, and systematic biases by gender, language, and institutional prestige. Decades of human-centered reforms have yielded only marginal improvements. Meanwhile, artificial intelligence, especially large language models (LLMs), is being piloted across the peer-review pipeline by journals, funders, and individual reviewers. Early studies suggest that AI assistance can produce reviews comparable in quality to humans, accelerate reviewer selection and feedback, and reduce certain biases, but also raise distinctive concerns about hallucination, confidentiality, gaming, novelty recognition, and loss of trust. In this paper, we map the aims and persistent failure modes of peer review to specific LLM applications and systematically analyze the objections they raise alongside safeguards that could make their use acceptable. Drawing on emerging evidence, we show that targeted, supervised LLM assistance can plausibly improve error detection, timeliness, and reviewer workload without displacing human judgment. We highlight advanced architectures, including fine-tuned, retrieval-augmented, and multi-agent systems, that may enable more reliable, auditable, and interdisciplinary review. We argue that ethical and practical considerations are not peripheral but constitutive: the legitimacy of AI-assisted peer review depends on governance choices as much as technical capacity. The path forward is neither uncritical adoption nor reflexive rejection, but carefully scoped pilots with explicit evaluation metrics, transparency, and accountability.

3//
The world’s laziest peer reviewer

Reese Richardson - 15 Sept 2025

Reese Richardson looked into peer reviews after finding a shallow review in BMC Cancer. He found many nearly identical reviews that pushed citations to Alessandro Rizzo and his colleagues. Some reviews were signed by Rizzo or his coauthors, and others were anonymous but used the same wording. Strong evidence of a review mill, probably aided and abetted by superficial AI reviews.
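The pattern Richardson describes, many reviews sharing the same wording, can be surfaced mechanically. A minimal sketch using Jaccard similarity over word trigrams; the review texts are hypothetical, and whatever tooling Richardson actually used is not described here:

```python
from itertools import combinations

def shingles(text, n=3):
    """Set of word n-grams, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Overlap of two shingle sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a or b else 0.0

# Hypothetical review texts; real input would be a corpus of signed
# and anonymous reviews pulled from a journal's records.
reviews = {
    "r1": "the methodology is sound but the introduction should cite more recent literature on this topic",
    "r2": "the methodology is sound but the introduction should cite more recent literature on this topic and the figures need higher resolution",
    "r3": "this is a well designed cohort study although the limitations section is too brief",
}

flagged = [
    (i, j, round(jaccard(shingles(reviews[i]), shingles(reviews[j])), 2))
    for i, j in combinations(reviews, 2)
    if jaccard(shingles(reviews[i]), shingles(reviews[j])) > 0.5
]
print(flagged)  # → [('r1', 'r2', 0.68)]
```

Shingling catches reviews that share long runs of identical wording even when a sentence has been appended or trimmed, which is exactly the signature of boilerplate reused across a review mill.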

4//
Artificial-intelligence-based peer reviewing: opportunity or threat?

The Lancet Global Health - March 2025

One way to avoid the stories everyone has heard during Peer Review Week is to go back to March, to this editorial highlighted by Zoe Mullan on LinkedIn. It helped reframe some of my more gung-ho attitudes to AI in peer review:

The peer review process is time-consuming, delaying publication and increasing the workload of already overburdened researchers as both reviewers and authors receiving these reviews. But it plays a crucial role in maintaining the integrity of scientific investigation, and it is an opportunity to communicate with the research community. A crucial element of peer review that cannot be replicated by AI is human perspective. Lived experiences of disease and—particularly relevant to work we publish in The Lancet Global Health—the contextual knowledge held only by those with lived experience in the location of research are unlikely to be present in LLM-generated peer review reports.

5//
The Coming Inflection Point: AI as Science’s Gatekeeper

Silverchair - 15 Sept 2025

A pragmatic and forward-looking analysis of how AI is likely to be incorporated into editorial and review workflows. Stuart Leitch makes the excellent point that, in coming years, AI risks becoming the de facto gatekeeper of what counts as science. We can't control the opaque base models, but we can control prompts, workflows, context, and validation, building as transparent a system as possible to protect scientific integrity.

And finally…

I thoroughly enjoyed this Sci-Train webinar looking at four leading AI peer review assistance tools from KnowDyn, Reviewer3, Paper-Wizard, and World Brain Scholar: Which AI Peer Review Tool Is Best For You?

Some self-promotion to attend to as well:

And finally, an erratum of sorts. I missed crediting the authors of a presentation I highlighted in last week's newsletter. This is why you should never write something when severely jet-lagged. Apologies to Dr Leslie McIntosh, VP of Research Integrity at Digital Science, and her co-authors Hélène Draux, Elizabeth Smee, and Cynthia Hudson Vitale. The abstract at this link gives a fuller sense of the problem:
https://peerreviewcongress.org/abstract/how-a-questionable-research-network-manipulated-scholarly-publishing/

One year ago: Scalene 14, 22 Sept 2024

Let’s chat

I’ll be at the STM Frankfurt and Frankfurt Book Fair in mid-October. Wanna meet?

Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.