Scalene 26: Black spatula / voice reviews / upstream
Humans | AI | Peer review. The triangle is changing.
May I be one of the last people to say Happy New Year and Welcome to 2025! Years really feel like epochs in this AI era, and so much has happened since the last newsletter before Christmas that it was a little overwhelming to write this one. Since the last newsletter we have seen significant new LLMs in the shape of o1 pro, o3, and Gemini’s multimodal chain-of-thought model. They are going to change the scope of what is possible in research evaluation. Indeed, my most bullish forecast looks like being beaten this year: I thought that by the end of 2026 we’d have a way to evaluate manuscripts with AI which is at least as good as the average (not the best) peer review. With some caveats, I now expect that to happen this year. So each week represents significant progress in one way or another. Let’s start here:
12 January 2025
// 1
The Black Spatula Project
Discord/WhatsApp - Dec 2024/ongoing
Not a specific thing to link to, more a collection of things which quickly escalated into a ‘movement’ either side of Christmas. It all started with a blog post by Steve Newman, which you can read (along with his follow-up) on the new project website: https://the-black-spatula-project.github.io
Briefly, a research paper contained an order-of-magnitude error which meant the perceived risk from using black plastic utensils at high temperatures was massively overstated. A subsequent post by Ethan Mollick showed that this error was relatively easily unveiled using o1 and a specific prompt. This led to the logical next question: can we do this for all papers, to highlight critical problems in the published literature?
A large community suddenly swung into action on both WhatsApp and Discord (292 and 266 members respectively at time of writing) - remarkable given this was over the holidays and everyone was volunteering their time and expertise. There are vibrant and informed conversations to follow here, with a bias-to-action group of people who are making this happen before our eyes.
It is clear that this can be applied to unpublished papers as well as to published ones, and the range and depth of suggested prompts and analyses are way beyond what I see in most blog posts and articles. Take a look and lose yourself in the rabbit hole.
https://the-black-spatula-project.github.io
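To make this concrete, here is a minimal sketch of the core move: hand a reasoning model the full paper text together with a prompt that asks it to re-check the numbers. The model name and prompt wording are my own illustration, not the project’s actual pipeline.

```python
# A sketch of the "check a paper for numerical errors" idea.
# Model name and prompt are assumptions, not the project's pipeline.
# Requires: pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ERROR_CHECK_PROMPT = """\
Check this research paper for serious numerical errors.
Focus on unit conversions, order-of-magnitude arithmetic, and
inconsistencies between numbers in the text, tables, and conclusions.
For each suspected error: quote the passage, show your recalculation,
and rate your confidence (low/medium/high).
"""

def check_paper(paper_text: str) -> str:
    """Ask a reasoning model to hunt for numerical errors in a paper."""
    response = client.chat.completions.create(
        model="o1",  # assumption: any strong reasoning model should work
        messages=[{"role": "user",
                   "content": ERROR_CHECK_PROMPT + "\n\n" + paper_text}],
    )
    return response.choices[0].message.content

# Usage: print(check_paper(open("paper.txt").read()))
```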
// 2
What is wrong with peer review?
Think Open - 14 Nov 2024 - 5 min read
This analysis of peer review by a Finnish researcher looks at some secondary criteria which are distorting the peer review process, namely elitism, relevance bias, significance bias, and rule following. He suggests that these secondary criteria (aside from the primary one of validity and correctness) are resulting in sub-standard peer review processes from narrow-minded reviewers with strong biases (his words, not mine).
Read more here: https://blogs.helsinki.fi/thinkopen/what-is-peer-review/
// 3
Gemini Research Reviewer
GitHub - Dec 2024 - 1 min read
A great example from Devin White of how the newest LLMs are opening up new possibilities. This project uses the new Gemini 2.0 Flash model to review research papers, attempting to replicate the feedback given at top conferences in the field of AI research (yes, very meta - or indeed Meta). As well as straightforward feedback, Devin is developing a chat interface for interactive feedback from your AI reviewer. This kind of transparency and iteration may be the thing that drives acceptance of AI as a reviewer. One to watch.
https://github.com/Dev1nW/Gemini_Research_Reviewer
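For the curious, the basic pattern looks something like this. A hedged sketch using the google-generativeai SDK, not Devin’s actual code; the model name, filename, and review prompt here are my assumptions.

```python
# A sketch of "Gemini as conference reviewer", not Devin's actual code.
# Requires: pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # or set GOOGLE_API_KEY

REVIEW_PROMPT = (
    "Act as a reviewer for a top AI conference. Assess novelty, "
    "soundness, clarity, and significance. Give numbered strengths "
    "and weaknesses, then an overall recommendation."
)

model = genai.GenerativeModel("gemini-2.0-flash-exp")  # name may change
paper = genai.upload_file("paper.pdf")  # upload the manuscript via the File API
review = model.generate_content([paper, REVIEW_PROMPT])
print(review.text)
```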
// 4
Instant feedback via Gemini Multimodal
LinkedIn - Dec 2024 - 6 min watch
One of the features of Gemini which we are just starting to exploit is multimodality - i.e. not just processing and understanding text, but also images and video. Images are important in academic research evaluation because of all the figures which are sometimes, ahem, not totally correct (Dicky Mouse). However, this video shows a great use of screen sharing for the purpose of code evaluation via voice conversation.
https://www.linkedin.com/posts/heikohotz_demo-alert-googles-gemini-multimodal-activity-7272890349709058048-AQ9M?utm_source=share&utm_medium=member_desktop
A quick look around Heiko’s YouTube channel shows him also chatting (voice) with a PDF - which starts to get interesting for us and peer review. Would you review more easily via a voice interface?
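The video demos Gemini’s live voice interface, but the simpler text equivalent of chatting with a PDF is easy to sketch with the same SDK as above; the filenames and prompts here are hypothetical.

```python
# A text-only sketch of "chatting with a PDF". The video uses the live
# voice API; this is the plain chat equivalent. Filenames and prompts
# are hypothetical. Requires: pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash-exp")
pdf = genai.upload_file("manuscript.pdf")  # the paper you want to discuss

chat = model.start_chat()
print(chat.send_message([pdf, "Summarise the main claim of this paper."]).text)
print(chat.send_message("Do the figures actually support that claim?").text)
```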
// 5
The Time for AI-generated Peer Reviews is Now
Replication Index - 23 Dec 2024 - 4 min read
A provocative title for sure, and not one I wholeheartedly stand behind yet - but I am including it here for another reason. The author of this piece was annoyed at the perceived lack of quality in a peer review process which led to a rejection decision. So, he decided to review the paper himself using ChatGPT:
“Unhappy, or rather frustrated, I decided to ask ChatGPT for a CRITICAL review of the manuscript and just pasted the manuscript in the dialogue box. Less than a minute later, I had an objective review that showed understanding of the issue, acknowledged strengths, and pointed out several limitations that can be used to strengthen the manuscript. Wow. This is a gamechanger.”
This is a practical example of how AI can empower authors to review their own work before it is submitted to a journal. Indeed, the whole function that peer review solves for may one day happen much further upstream, at the point of authoring. And why not? Who doesn’t want their manuscript to be as good as it can be? An interesting thought experiment is to imagine journal submission systems filtering for ‘quality’ and ‘relevance’ at the point of submission too. Things are changing fast.
https://replicationindex.com/2024/12/23/the-time-for-ai-generated-peer-reviews-is-now/
Kind of related is this: Journals should formalize AI "peer" review as soon as possible — they are getting them anyway. The author shares a prompt template for eliciting multiple reviewer perspectives on a manuscript, which he claims will produce better feedback than 80% of human reviews. I’ll let you be the judge of that.
https://blog.miljko.org/2025/01/08/journals-should-formalize-ai-peer.html
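If you want to try the multi-perspective idea yourself, here is a rough sketch of that shape; the personas and prompt wording are my own illustration, not the author’s published prompt.

```python
# A sketch of multi-persona review: run the same manuscript past several
# reviewer "personas" and collect the views. Personas and prompts are my
# own illustration, not the linked post's prompt. Requires: pip install openai
from openai import OpenAI

client = OpenAI()

PERSONAS = [
    "a methods-focused statistician",
    "a sceptical subject-matter expert",
    "a journal editor judging fit and significance",
]

def multi_view_review(manuscript: str, model: str = "gpt-4o") -> dict[str, str]:
    """Collect one critical review per persona for the same manuscript."""
    reviews = {}
    for persona in PERSONAS:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": f"You are {persona}. Write a critical but "
                            "constructive peer review: strengths, "
                            "weaknesses, and concrete suggestions."},
                {"role": "user", "content": manuscript},
            ],
        )
        reviews[persona] = response.choices[0].message.content
    return reviews
```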
And finally…
Lots and lots of ‘other’ stuff here, as usual. A few articles are behind paywalls this time, so I am only recommending them on the basis of their titles/abstracts.
‘As of my last knowledge update’: How is content generated by ChatGPT infiltrating scientific papers published in premier journals? [this one is open access, thankfully]
Benefits and Challenges of Using AI for Peer Review: A Study on Researchers’ Perceptions
Use of artificial intelligence and the future of peer review [another OA one]
Let's do coffee!
My first outings of 2025 look set to be:
- Researcher 2 Reader conference, London - Feb 25-26
- London Book Fair, London(!) - Mar 11-13
- ALPSP UP Redux, Oxford - April 3-4 [I’m giving the keynote speech on the 3rd]
Let me know if you’re at any of these and we can chat all things Scalene-related.
Free consultation calls
Many of you may know I work for Cactus Communications in my day job, and one of my responsibilities there is to help publishers speed up their peer review processes. Usually this is in the form of 100% human peer review, delivered in 7 days. However, we are keen to experiment further with subtle AI assistance. If you want to chat about bringing review times down with a 100% human service, or you’re interested in experimenting with how AI can assist, let’s talk: https://calendly.com/chrisle1972/chris-leonard-cactus
Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.