Scalene 50: NEJM AI / Rigorous / UKRI

Humans | AI | Peer review. The triangle is changing.
Happy 50th to us! I never thought when I started this newsletter that it would become what it has (whatever that is), or that it would allow me to share my (largely optimistic) opinion that AI-assisted reviews are going to be better, quicker, and more constructive than the average human review. After 18 months, I think we may be getting closer to accepting that human-only review can’t scale forever, and that no one should have to wait 5 months for an editorial decision. Some of the stories below highlight that, and also the potential downsides. AI conferences in particular get a rough ride this week.
8th December 2025
1//
AI reviewers are here - we are not ready
Nature - 03 Dec 2025 - 4 min read
Giorgio Gilestro shares a viewpoint article on the status of AI in peer review. He argues that peer review serves two purposes: one that AI can help with, and one where it can’t. It can help with the validation of the routine majority of scientific work, but it cannot recognise genre-changing work that makes us question the principles on which a field is based.
However, even that first role is questionable. Regression to the mean and the lack of human ‘noise’ mean we get very well-written but ‘average’ reviews that may miss newer developments and are susceptible to hyped-up claims in the existing literature.
2//
RIGOROUS
Website - 08 Dec 2025 - 2 min read
2025 has definitely been the year of AI & agentic peer review startups and initiatives. Here’s another one from Robert Jakob and Kevin O’Sullivan of ETH Zurich with some very researcher-centric aims:
During our doctoral studies, we became deeply frustrated by how the formal peer review process at conferences and journals not only slows down the dissemination of scientific knowledge, but also diverts valuable time and resources away from actual scientific discovery.
They go on to explain their vision of augmenting human review, not replacing it:
Most importantly, this project is not about automating judgment—but about scaling support, preserving rigor, and reclaiming researchers' time for creativity, discovery, and deep review.
3//
Accelerating Science with Human + AI Review
NEJM AI - 26 Nov 2025 - 9 min read
Ever since I saw this concept presented at the Peer Review Congress in Chicago in September, I’ve been waiting for the world at large to hear about it too. That happened a couple of weeks ago when this editorial appeared in NEJM AI, detailing the accelerated human + AI review process they have developed to review selected papers within 7 days.
The editorial itself is accessible and full of details, but there are a few things that are noteworthy:
This service has been developed and promoted under the NEJM brand. It is not stretching credulity too far to suggest that it could be rolled out to other journals in the NEJM family soon.
The editors are to be commended for using the latest available LLMs. Sadly, that is not as common as it should be in this field. Ditto benchmarking against human expertise.
The supplementary file, with the prompts, responses, and iterations, is truly mind-blowing if you haven’t kept up to speed with how AI can assist human reviewers. This is where the real ‘magic’ is. I urge you to read both the editorial AND delve into the Supplementary Appendix.
Supplementary Appendix: https://ai.nejm.org/doi/suppl/10.1056/AIe2501175/suppl_file/aie2501175_appendix.pdf
4//
UKRI opens up grant proposal data to explore using AI to smooth peer review
Chemistry World - 03 Dec 2025 - 5 min read
The world of research funding is undergoing the same stresses and strains that conferences and journals are currently experiencing - namely an explosion in submissions that are often of questionable quality. In the case of UKRI, applications are 80% up on 2018 figures.
UKRI are therefore opening up the underlying data on nearly 2,000 grant applications to a team led by Mike Thelwall (see Scalene 46 - AI Peer) to see if AI tools can accurately predict the scores peer reviewers gave the proposals, and the ultimate recommendations to fund or not.
It’s an interesting problem given a) the high stakes of funding, b) the fact that human judgement is not always perfect, and c) the need to identify interesting novel solutions that may go against what is published in the existing literature (and therefore used as training data for LLMs) - a point made in the first story above too. I will be following this one closely.
5//
To Err is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis
arXiv.org - 05 Dec 2025 - 26 min read
An AI Correctness Checker sounds more than a little Orwellian, but it is actually intended to systematically identify academic doublespeak (objective mistakes) in AI conference submissions. There is also a great Guardian article on the problems with AI conference submissions below.
The authors found an average of 4.66 mistakes per paper, ‘with an upward temporal trend’. Human validation suggests an 83% precision rate, but only 60% recall on intentionally injected errors. Furthermore, the system can suggest fixes for three-quarters of all recognised mistakes.
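If you don’t work with detection metrics every day, the headline numbers are easier to read side by side: precision is the share of flagged errors that are genuine, while recall is the share of genuine (here, deliberately injected) errors that get flagged. A minimal sketch of the arithmetic in Python - the counts are hypothetical, chosen only to reproduce the reported rates:

```python
# Sketch of the precision/recall arithmetic behind the paper's headline
# figures. The counts are hypothetical, chosen only to illustrate how an
# 83% precision rate can coexist with 60% recall.

def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Precision: share of flagged errors that are real.
    Recall: share of real errors that were flagged."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical validation run: the checker flags 120 candidate errors, of
# which 100 are genuine, while 67 injected errors go unflagged.
p, r = precision_recall(true_positives=100, false_positives=20, false_negatives=67)
print(f"precision={p:.0%}, recall={r:.0%}")  # precision=83%, recall=60%
```

The gap between those two numbers is the interesting part: as it stands, a checker like this is better at avoiding false alarms than at catching every error.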
The service is quick, cheap, and likely to get better over time. The real interest here (for me) is in extrapolating to having a service like this look at all published literature and identify errors and fixes for the whole corpus of research publications. Can we create a verified, error-free database of research?
And finally…
It’s been a while since I did a round-up of Scalene-adjacent links and stories, so here we go. I’m not sending another Scalene before Christmas, but there may be one between Christmas and New Year if I’ve become an annoyance to my family by then - otherwise we’ll be starting afresh in January. Merry Christmas and Happy New Year to you all!
Artificial intelligence research has a slop problem, academics say: “It’s a mess” - The Guardian - 06 Dec 2025
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models - arXiv - 19 Nov 2025 [despite sounding like something from the Hitchhiker’s Guide to the Galaxy, this is a fascinating insight into how LLMs can be made to override their own guardrails if you ask them to respond in the form of a poem!]
Identity Theft in AI Conference Peer Review - CACM - 24 Nov 2025
SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning - arXiv - 20 Nov 2025 [another initiative seemingly taking aim at Scite’s USP]
Let’s chat
I’ll be at the STM London meeting on both days in early December. Come and say hi.
Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.