
Scalene 40: PersonaReview / LimitGen / Illusions

Humans | AI | Peer review. The triangle is changing.

There has been so much brouhaha recently about prompt injections planted in manuscripts to steer peer review (covered in Scalene 37) that we have almost forgotten the ancient history of the MIT paper showing that extensive LLM use erodes critical thinking; my timelines and alerts have become a little homogeneous in recent days. However, have no fear: I have ventured deep into the internet and found some (hopefully) new stories for you.

6th July 2025

1//
PersonaReview + CS Paper

via LinkedIn - 3 & 6 July 2025

It was only a matter of time before someone brought agentic technology to peer review workflows, and that person seems to be Gaurav Gulati. Upload a PDF or Word file and you can have it 'reviewed' by up to 20 reviewer agents. There are a few caveats (you need to provide your own Google AI Studio API key, and the use cases are predefined right now), but it's a first step toward authors being able to get multiple viewpoints of feedback on their work before submission.

Related is CSPaper, a new beta tool which simulates computer science conference reviews using tailored AI agents. Upload a PDF or arXiv link and get detailed feedback super-quickly. It is intended for evaluating the likelihood of a paper's acceptance at ICML, NeurIPS, or ICLR, and provides actionable revisions based on the requirements of that conference. The community messageboard is a good read too.

2//
The Illusion of Peer Review

Substack - 05 July 2025 - 5 min read

Desi Ivanova has written two blog posts on how to improve peer review, and suggests something I’ve not seen before:

Here’s a radical idea that could kill three birds with one stone — the submission volume, the reproducibility crisis and the peer review bottleneck: for any paper an author or a team of authors want to submit, they first must reproduce an existing paper in the same field that is currently under review. For example, if a team is planning to submit to NeurIPS (typically due in early May), they would need to replicate work currently under review at ICML (whose review period usually runs from mid-February to mid-March). In this way, replication acts as peer review. The first iteration of this new peer review process would involve replicating work from a previous conference.

3//
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers

arXiv - 03 July 2025 - 53 min read

The basis of this preprint is displayed in this very easy-to-understand graphic outlining the main research questions:

The paper goes on to describe LimitGen, the first comprehensive benchmark specifically designed to evaluate LLMs' ability to identify and address limitations in scientific research papers (in computer science and biomedical sciences). To answer the research questions above in an overly simplistic way:
RQ1: Not great. Even the best LLM identifies only 52% of the limitations that humans consider obvious.
RQ2: Yes. Best for literature reviews and experimental design.
RQ3: Early stage feedback for researchers & a complement to human peer review.

4//
Understanding how Peer Reviewers Use AI Tools

YouTube - 25 June 2025 - 88 min watch

More YouTube recommendations this week as HighWire hosted a webinar looking at how peer reviewers use AI tools. The panelists were Fabio Di Bello, Lucia Steele, and Sven Fund, and all contributors had unique insights. I particularly enjoyed Lucia’s talk from the point of view of a small publisher and their editors. You can see it all in full here:

5//
Bergstrom time x2

Two things worthy of your time this week have Carl Bergstrom as a central player. Carl is a strong defender of the human aspect of peer review, and you can get a sense of that in the two links below. The first is a co-authored editorial piece in Nature reflecting on the time and deep thinking required to pass judgement on research papers, whilst accepting that AI can do some of the more mundane checks.

…writing a good peer review is not rote work. Like any kind of critical analysis, it requires that we triage, rank and organize our unstructured thoughts. We perhaps start with praise, then raise a few of the most pressing issues that need to be addressed. From there, we enumerate quick fixes, minor concerns and points of confusion. The whole process constitutes a negotiation with ourselves over what is important enough to mention, so that we can negotiate with the editors and authors over what should be changed. We might discover that our original impressions were misguided; that some of our comments need to be revised or omitted; or that seemingly minor issues are, in fact, fundamental flaws. The process of summarizing and synthesizing helps us to engage more deeply with the manuscript.

And then there is this thread on BlueSky from a talk Carl gave, attended by Selene Fernandez.

I enjoyed Carl Bergstrom's @carlbergstrom.com talk today @unswbabs.bsky.social on the current 'Peer review meltdown' phenomenon. Misery loves company, so I was glad to find out I'm not the only editor struggling to find reviewers. Carl et al. have narrow it down to a few factors including ->

— Selene FeRNAndez (@selfdz.bsky.social), 23 June 2025

You can even watch the whole presentation here: https://www.youtube.com/watch?v=TZC0-ghtvqE [only the last third is about research evaluation, but the rest (teaching students to think in an AI world) is fascinating too]

And finally…

One year ago: Scalene 7, 7 July 2024

Let’s chat
Many of you may know I work for Cactus Communications in my day job, and one of my responsibilities there is to help publishers speed up their peer review processes. Usually this is in the form of 100% human peer review, delivered in 7 days. However, we are now offering a secure hybrid human/AI service in just 5 days. If you want to chat about how to bring review times down with either a 100% human service, or you’re interested in experimenting with how AI can assist, let’s talk: https://calendly.com/chrisle1972/chris-leonard-cactus

Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.