Scalene 38: I'm busy / ReviewerThree / predicting retractions

Humans | AI | Peer review. The triangle is changing.
A bumper week of news at the intersection of humans and AI and peer review. I don't have space to cover all of the arXiv developments this week, so look out for those next week instead. Ready? Let's go!
15th June 2025
1//
Using AI/LLMs for detecting scientific errors
Metascience 2025 - 17 June 2025
I don’t do this very often, but I’m starting with a recommendation for a virtual symposium ahead of the Metascience 2025 event later this month. This symposium is on Tuesday - so don’t hang about. Link to registration is below. I’m very excited about this one:
With this symposium, we will bring together people from four ongoing projects to discuss how the latest LLMs and other AI tools are being developed, evaluated, and used to detect errors in scientific publications: The Black Spatula Project, RegCheck, SocEnRep, and Psy-RAG. These projects involve people from different countries and academic disciplines, as well as people outside academia.
2//
I’m busy: The search for peer reviewers
Matter - 04 June 2025 - 6 min read
A restrained yet illuminating editorial that serves as a reminder of how hard it is to get a representative sample of experts to peer review manuscripts these days. Some of the remedies here may give the existing volunteer-dependent workflows a bit more life, but it's hard to see a long-term solution here (IMHO):
We don’t just want three names out of a hat, we want three (or more) reviewers to encompass the technical aspects of a given manuscript. Say a paper involves the synthesis of metal-organic frameworks (MOFs) for carbon capture. We seek an expert in MOF synthesis, another in carbon capture. Maybe someone with a background in structural characterization, just to make sure. Not only that, to try to be as unbiased as possible, we have targets for reviewer representation—explicit goals for global distribution, aims for gender representation, and considerations of career level.
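The matching problem the editors describe is essentially constrained set cover: assemble a small panel whose combined expertise spans the manuscript's topics, while nudging toward representation targets. Here's a toy greedy sketch of that idea. Everything in it (the reviewer data, the field names, the scoring) is my own illustration, not anything from Matter's actual systems:

```python
# Toy greedy reviewer matching: cover all required topics while
# using regional diversity as a tiebreaker. All data and field
# names below are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Reviewer:
    name: str
    topics: set   # areas of expertise
    region: str   # used for global-distribution targets

def pick_panel(required_topics, candidates, panel_size=3):
    """Greedily pick reviewers until the panel is full, preferring
    those who cover the most still-uncovered topics."""
    panel, covered, regions = [], set(), set()
    while len(panel) < panel_size and candidates:
        def score(r):
            new_topics = len(r.topics & (required_topics - covered))
            new_region = 0 if r.region in regions else 1
            return (new_topics, new_region)  # topics first, diversity as tiebreak
        best = max(candidates, key=score)
        panel.append(best)
        covered |= best.topics & required_topics
        regions.add(best.region)
        candidates.remove(best)
    return panel, required_topics - covered  # uncovered topics = keep searching

reviewers = [
    Reviewer("A", {"MOF synthesis"}, "EU"),
    Reviewer("B", {"carbon capture", "structural characterization"}, "Asia"),
    Reviewer("C", {"MOF synthesis", "carbon capture"}, "EU"),
]
panel, missing = pick_panel(
    {"MOF synthesis", "carbon capture", "structural characterization"}, reviewers)
print([r.name for r in panel], "still missing:", missing)
```

In practice the hard part isn't the algorithm, of course; it's that the candidate pool keeps declining to serve, which is exactly the editorial's point.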
3//
ReviewerThree.com
Natalie Khalil on LinkedIn - 13 June 2025 - 1 min read
Professors still do peer review for free, journal publishers are getting sued, reviewers are biased, and the entire process can sometimes drag out to years...
Last week we attended the AI Engineer World’s Fair Agents Hackathon and made it to the finalists!
In an intense hour and a half, we ideated, built, and shipped reviewerthree.com, a multi-agent peer review platform that gives feedback on research papers. And yes, we even bought the domain, named after the infamous reviewer number three!!

See the LinkedIn post here, and click below for a feel of what they are trying to do. I found the approach of using three agents fascinating, even though the consolidated feedback was superficial, to say the least (one paragraph from each agent). However, this is an easy fix (and this was a hackathon, remember!), and it's easy to imagine journal-specific agents alongside these more general ones. Impressive first steps.
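The internals of reviewerthree.com aren't public, so the following is purely my guess at the shape of such a system: three LLM "reviewers" with different briefs, each returning a report that gets stitched into one response. The personas, model name, and consolidation step are all my assumptions, sketched against the standard OpenAI chat completions client:

```python
# A minimal sketch of a three-agent review loop (my assumption of how
# a system like reviewerthree.com might be structured; not its actual code).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Three personas, each reviewing the same paper from a different angle.
PERSONAS = {
    "Reviewer 1 (methods)": "Critique the methodology and statistics.",
    "Reviewer 2 (novelty)": "Assess novelty and placement in the literature.",
    "Reviewer 3 (the infamous one)": "Be exacting: find every weakness.",
}

def review(paper_text: str, model: str = "gpt-4o-mini") -> str:
    reports = []
    for name, brief in PERSONAS.items():
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": f"You are {name}. {brief}"},
                {"role": "user", "content": paper_text},
            ],
        )
        reports.append(f"{name}:\n{resp.choices[0].message.content}")
    # Naive consolidation: concatenate the three reports. A journal-specific
    # agent could instead merge them against that journal's review criteria.
    return "\n\n".join(reports)
```

The one-paragraph-per-agent output I saw suggests something close to this naive concatenation; the obvious upgrade is a fourth agent that synthesises and deepens the three reports.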
4//
Predicting retracted research: a dataset and machine learning approaches
Res Integrity & Peer Review - 11 June 2025 - 20 min read
Retracting scientific articles is essential for safeguarding the integrity of the research record, but the growing number of retractions also reveals weaknesses in peer review and editorial oversight.
A strong opening statement from the authors of this research paper, which sets about using machine learning and a dataset of retracted and non-retracted papers to ascertain whether certain characteristics of a paper can predict its future potential for retraction. Spoiler alert - not really, at least not yet.
Experiments showed that, with the exception of the recently released Llama 3.2 base model, traditional feature-based classifiers, such as gradient boosting machines and SVMs, outperformed contextual language models like BERT, BioBERT, and Gemma in terms of precision. The best-performing model achieved a precision of 0.690, indicating that while machine learning techniques hold promise, there remains a need for significant improvement before they can be effectively integrated into the peer review process.
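For a feel of what "traditional feature-based classifiers" means in practice here, a minimal scikit-learn sketch. The features below are hypothetical stand-ins (the paper's own feature set will differ), the labels are random, and the 0.690 precision figure above comes from the paper, not from this toy:

```python
# Minimal sketch of a feature-based retraction classifier, in the spirit of
# the gradient-boosting baselines the paper compares against BERT et al.
# Feature names and labels are hypothetical stand-ins, not the paper's data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-paper features: author count, reference count,
# journal impact proxy, share of self-citations (all scaled to [0, 1]).
X = rng.random((n, 4))
y = rng.integers(0, 2, n)  # 1 = retracted, 0 = not (stand-in labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("precision:", precision_score(y_te, clf.predict(X_te)))
```

On random labels this will hover around chance, which is rather the point: the interesting work in the paper is in the features and the labelled dataset, not the classifier itself.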
5//
Research quality evaluation by AI in the era of Large Language Models: Advantages, disadvantages, and systemic effects
arXiv - 09 June 2025 - 24 min read
Mike Thelwall uses this paper to look specifically at post-publication research quality evaluation using LLMs (the kind you might have to conduct for a research assessment exercise like - ahem - REF2029 in the UK). It’s a great read, from a key figure in the field, and has made me reconsider some of my personal assumptions about LLM deployment. If you’ve got a plane or train journey coming up - use it productively to read this in full.
And finally…
Tangential stories from around the web:
Peer Review Week 2025 is on the theme of Rethinking Peer Review in the AI Era
Automation of Systematic Reviews with Large Language Models. This preprint (describing agent workflows for constructing SRs) appeared at almost the same time as Anthropic's account of how they built their multi-agent research system. One thing that stood out for me was the embedding plot toward the end of that page, which suggests that a primary use of agents so far has been assisting with academic research.
My thoughts on how Editorial Boards could be used as human experts in a hybrid human/AI workflow. Genuinely interested in your opinions on this.
One year ago: Scalene 5, 16 June 2024
I like to finish on a short read that is funny, uplifting, or pithy in some way. This examination of the state of peer review reminds us why we need AI to speed the process up and reduce its biases:
Pee Review: The Enshittification of Science?
Let’s chat
Many of you may know that I work for Cactus Communications in my day job, and one of my responsibilities there is to help publishers speed up their peer review processes. Usually this is in the form of 100% human peer review, delivered in 7 days. However, we are now offering a secure hybrid human/AI service in just 5 days. If you want to chat about bringing review times down with a 100% human service, or you're interested in experimenting with how AI can assist, let's talk: https://calendly.com/chrisle1972/chris-leonard-cactus
Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.