Scalene 59: Jagged / Exoskeleton / AAAI-26

Humans | AI | Peer review. The triangle is changing.
Everyone at Scalene Towers has been getting excited for SSP next week, so if you’re there, come and say hello. I’m presenting on Wednesday afternoon at 3pm on the concept of personal peer review, if I get my slide deck together before then. Anyway, lots of non-SSP things happening in the world of AI and peer review, so let’s get stuck in.
17th May 2026
1//
Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis
arXiv.org - 08 May 2026 - 15 min read
I’m always surprised at the range of attitudes to AI(-assisted) peer review: some people think it’s fantastically useful and others hate it for its perceived shortcomings. What I never really considered before is that both can be true depending on the subject matter being analysed. This preprint looks at the jagged nature of AI in peer review, where there are strong ability spikes in some domains and deficiencies in others. Additionally, the authors restrict papers to one domain and consider the usefulness of various Claude agents and prompting strategies. Their conclusions?
AI and human reviewers largely detected different problems.
The addition of skill files tuned jaggedness rather than resolving it.
The human-unique layer was characterized by judgment and context that AI consistently lacked.
2//
Reviewer 2 and the Ministry of Truth
Substack - 13 May 2026 - 5 min read
The subtitle of this blog post gives us some clues as to what’s to be discussed here: Peer review, AI, and the quiet politics of truth. Paul Jones looks at peer review through the lens of Orwell’s 1984. But some of his observations made me stop and think about whether the true value of gold-standard peer review can ever be automated (see also the previous story).
A claim should not become credible simply because it is fluent, confident, popular, or institutionally convenient. Research needs challenge. Methods need scrutiny. Evidence needs testing.
At its best, peer review slows down the seductive speed of certainty. It asks whether the evidence supports the claim, whether the method is appropriate, whether the conclusion has outrun the data, and whether the work contributes something more than another decorative brick in the career wall.
Jones observes that AI can easily mimic academic signals, exposing the gap between something that looks like knowledge and actually having it.
The future of academic knowledge cannot be a retreat into old rituals simply because they feel familiar. Nor can it be a collapse into unfiltered fluency, where everything that sounds plausible is treated as true. The harder task is to decide what kinds of scrutiny deserve trust when the performance of knowledge has become so easy to reproduce.
My feeling: We won’t ever replicate the very best human peer review for some of these reasons, but the other 90%? Much more feasible.
3//
More Versus Better, Part II
Substack - 30 Apr 2026 - 10 min read
Someone kindly sent me part 3 of this series this week - and it definitely is worth going through them all - but it was part 2 that caught my eye in particular. Lamar Pierce opens with a problem I hear a lot at the moment:
…submission volume at Organization Science is up 42% since the launch of ChatGPT in November 2022, driven almost entirely by manuscripts with substantial AI-generated text. These papers are also of a worse quality across linguistic measures and editorial outcomes. They are overwhelmingly desk rejected, and those that make it through to peer review rarely make it to the second screen. In other words, the system is catching these papers, but the burden on editors and reviewers is growing.
And while increased submissions might sound like good news overall, there are not 42% more editors or reviewers, so it is not surprising that more people are looking to AI assistance in evaluating these submissions. Pierce used the Pangram AI detector to look at likely AI usage in peer review reports over the last 5 years.
Nearly 40% of reviews at Organization Science now show some degree of AI-generated text. The fastest-growing segment is reviews in the 30-70% AI range, which suggests meaningful AI involvement in the drafting process. A smaller but growing tail of reviews scores above 70%.

Further analysis looked at 100% AI-generated reviews, and various combinations of AI-generated, human-edited and human-generated, AI-edited reports. It’s one of the best practical guides to identifying AI in peer review reports I’ve read. There is a nagging feeling that in 2 years’ time, this won’t matter - but for now, it’s essential reading for anyone involved in managing peer review processes.
4//
The Collaborative Exoskeleton of AI Science
Tim O’Reilly - 15 May 2026 - 11 min read
I hadn’t heard from Tim O’Reilly for a while, and then as if by magic, he posts this essay on AI’s interactions with the current scholarly infrastructure, or rather the lack of them. LLMs that integrate with and contribute to Crossref, ORCID, RetractionWatch, and arXiv (for example) could reduce some of the errors around hallucinated citations and citations of retracted papers, and he notes how a central citation graph could drive science (and the peer review of that science) to new levels.
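To make the idea concrete: the simplest version of the citation check O’Reilly gestures at is a lookup against the public Crossref REST API (that endpoint is real; the helper function names below are my own). A minimal sketch, assuming a reviewer agent has extracted a DOI from a reference list:

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# Matches the general shape of a DOI, e.g. "10.1000/xyz123".
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def normalize_doi(raw: str) -> str:
    """Strip common URL prefixes and whitespace from a DOI string."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi

def looks_like_doi(doi: str) -> bool:
    """Cheap syntactic check before hitting the network."""
    return bool(DOI_RE.match(doi))

def crossref_lookup(doi: str, timeout: float = 10.0):
    """Ask the public Crossref REST API whether a DOI resolves.

    Returns the work's metadata dict, or None for an unknown DOI --
    a None here is a strong hint the citation is hallucinated."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)["message"]
    except urllib.error.HTTPError:
        return None
```

A retracted-paper check would be a second lookup of the same shape against retraction data (now distributed through Crossref), which is exactly why O’Reilly’s point about shared infrastructure matters: each check is trivial once the graph is openly queryable.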

5//
AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot
arXiv.org - 15 Apr 2026 - 15 min read
Here’s a story we haven’t heard too much about. Nearly 23,000 submissions to the AAAI-26 conference were subjected to an AI review. The reviews were generated in 1 day and were clearly labelled as AI generated. Authors were given these reviews and human reviews and evaluated them side-by-side:
…participants not only found AI reviews useful, but actually preferred them to human reviews on key dimensions such as technical accuracy and research suggestions
… these results show that state-of-the-art AI methods can already make meaningful contributions to scientific peer review at conference scale, opening a path toward the next generation of synergistic human-AI teaming for evaluating research.
And finally…
Let’s chat
If you are attending SSP, please do two things: 1) come to my talk, and 2) come and talk to me. If you want to set up a meeting about anything to do with this newsletter, just reply to this email. Thanks and see you there.
Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.