Scalene 33: Nature / PaperBench / unstoppable

Humans | AI | Peer review. The triangle is changing.
I was at ALPSP Redux in Oxford last week and was struck by how many threats there are to the integrity of scholarly publishing right now. Aside from the ‘normal’ threats of shrinking library budgets, research integrity, peer review backlogs, and the many other things that cause sleepless nights, there is now the additional threat of foreign government interference with DEI programmes outside of its jurisdiction (a comment in a session, not a presentation), with the implied threat of withholding subscriptions if things don’t pan out their way. I wouldn’t relish the job of leading a press right now; it feels like the morals and ethics of research communication are about to be sorely tested.
7th April 2025
1//
AI is transforming peer review — and many scientists are worried
Nature - 26 March 2025 - 14 min read
A review which missed the last newsletter by minutes, and a first for me: a quote in Nature. Despite that, the piece is generally excellent. It’s a balanced view of how peer review could evolve over the coming years alongside other general advances in AI. I say ‘balanced’ as it thankfully looks at the positive aspects that AI could bring to peer review, alongside some of the more traditional concerns. I particularly like this simple graphic, which shows that 40% of respondents already consider AI feedback in peer review to be at least as good as human feedback:

2//
PaperBench: Evaluating AI's Ability to Replicate AI Research
arXiv.org - 02 April 2025 - 64 min read
I’m increasingly of the opinion that automated review of manuscripts is inevitable. And if I’m right, what then? A fascinating area of research that has only recently cropped up is the concept of the AI scientist. Sakana AI had a paper (sort of) accepted to a machine learning conference, but OpenAI have detailed something more impressive in ambition here:
We introduce PaperBench as a challenging benchmark for assessing AI agents’ abilities to replicate cutting-edge machine learning research. Each included paper represents exciting work in a contemporary domain of interest – such as reinforcement learning, robustness, and probabilistic methods – and is evaluated against a rigorous rubric co-developed with the original authors. By requiring AI agents to build entire codebases from scratch, conduct complex experiments, and generate final results, PaperBench offers a demanding real-world test of ML R&D capabilities.

CL: Note that the results are underwhelming for this first attempt: the strongest evaluated agent (Claude 3.5) achieved an average replication score of just 21.0%. However, as a proof of concept it is exciting and paves the way for future research.
3//
The inevitable future of peer review: Human and AI integrated peer review system
PJMS - 20 Mar 2025 - 6 min read
I enjoyed reading this editorial for its quotes and quirky misspellings (evidence that it wasn’t written by AI?). And while there are some great numbers in here, the thing that caught my eye was this sentence:
As peer review is considered as an unpaid, spare time academic activity subject to the good will of researches, a smaller number of researchers are willing to accept review invitations because of increased workload and peer reviewer’s fatigue.
This is something that was the subject of conversation at the ALPSP Redux conference too. Many viewed peer review as an essential part of the job of an academic, and yet it is rarely factored into annual appraisals or promotion decisions. It can’t be both an essential part of the role and unappreciated. One of these things will give before the other.
4//
Is research misconduct becoming unstoppable?
IJME - 13 Feb 2025 - 4 min read
Peer review is but a small part of the current research integrity mix, but this editorial takes a fairly pessimistic tone about our ability to ever get on top of it. I had some great conversations about this in Oxford, however. Tracking the provenance of research through gradual publication (micropubs and pre-registered reports) and open peer review is a start. But the usual issue of universities using unsuitable metrics to determine career paths comes up again and again. How can universities wean themselves off these metrics, and what can replace them? This feels like as big an issue as how to handle AI nowadays.
https://ijme.in/articles/is-research-misconduct-becoming-unstoppable/?galley=html
5//
Guarding against artificial intelligence–hallucinated citations
arXiv.org - 24 Mar 2025 - 4 min read
Sometimes the most high-level problems can be solved with the simplest approach. How to stop authors citing hallucinated references? Make them submit the full text of each source with their journal submission. I’m not sure how much better this is than a DOI-based approach that matches citation text to cited work, but it’s an admirable attempt to solve something without indulging in an escalating technological arms race. One benefit of this approach: authors can’t cite things they haven’t read or don’t have access to, which is another problem altogether.
And finally…
Something different this week. Looking for analogies in the literature for AI’s impact on presses, I came across many stories, two of which I highlight here because they parallel the situation with AI and university presses:
"Gradually, Then Suddenly": The Profound Wisdom of Hemingway's Insight on Change - https://blog.practicaljournal.com/articles/gradually-then-suddenly-the-profound-wisdom-of-hemingways-insight-on-change
“How the National Library of Medicine should evolve in an era of artificial intelligence” - https://doi.org/10.1093/jamia/ocaf041
Finally, a tweet(?), sky post(?) that has been living in my head for the last few weeks, which I’m sharing here so that it can live in yours too:
Not to be a broken record, but AI critics who insist that AI "doesn't work" and is going to just disappear are misleading - that just isn't true, as controlled studies like this one show. There are many issues with AI & many things that need critique, but pretending it is going away is not helpful.
— Ethan Mollick (@emollick.bsky.social), 2025-03-04
Free consultation calls
Many of you may know I work for Cactus Communications in my day job, and one of my responsibilities there is to help publishers speed up their peer review processes. Usually this is in the form of 100% human peer review, delivered in 7 days. However, we are keen to experiment further with subtle AI assistance. If you want to chat about how to bring review times down with either a 100% human service, or you’re interested in experimenting with how AI can assist, let’s talk: https://calendly.com/chrisle1972/chris-leonard-cactus
Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.