Scalene 16: o1 / community review / lessons from radiology

Chris Leonard
October 06, 2024

Humans | AI | Peer review. The triangle is changing.

A couple of stories this week are not specifically about AI & peer review, but about the infrastructure and culture that support it. Firstly, the release of the o1 & o1-mini models from OpenAI has excited many with their reasoning capabilities. We also take a look at lessons from the world of radiology, where AI was brought in to assist doctors, but appears to diagnose better on its own, without human bias. Then there are also a few stories I missed in the whirlwind of Peer Review Week.

I’m in Frankfurt for the whole of 14-18 October. I’d love to meet you if you’re there. Just reply to this email and we’ll find a mutually convenient time. Otherwise I’ll be around the Cactus stand, Halle 4 F64.

6th October 2024

// 1
‘In awe’: scientists impressed by latest ChatGPT model o1
Nature - 01 October 2024 - 7 min read

This very accessible article on the new LLM released by OpenAI shows just how the ‘chain-of-thought’ reasoning model delivers much better answers to PhD-level science questions than previous versions.

https://www.nature.com/articles/d41586-024-03169-9

Very rock-n-roll of me too, but I spent a large part of my Saturday night reading this in-depth analysis of the strengths (and still, weaknesses) of this new kind of LLM. It’s worth skimming over to find the subject areas that interest you and how it performs:
https://arxiv.org/abs/2409.18486

So what - you may be thinking - how does this affect peer review? That aspect only really struck home when I started seeing what ‘regular’ scientists were saying on social media. It’s clear that o1 may result in some acceleration in automated review in the coming months (the replies to the tweet below are also good).

I asked OpenAI o1-preview to read one of our fairly complex papers and come up with 10 follow-up questions. It came up with 10/10 insightful and excellent questions! A couple of them required a very deep understanding and reasoning of the findings only an expert could think of. I… x.com/i/web/status/1…
— Derya Unutmaz, MD (@DeryaTR_)
2:59 AM • Oct 2, 2024

CL: Although (see next story) there are many ways human input is still desirable, the direction of travel, and indeed speed, are pointing to better-than-human research evaluation within a few years. It won’t be peer review, as it’s not done by peers, but it will be a validation/evaluation process that possibly replaces it. A note of caution though - this current o1 (Strawberry) model is relatively slow and very compute-intensive.

// 2
Human-driven community peer review: Stacks
Nature - 01 Oct 2024 - 5 min read

All of this excitement around peer reviewing and AI stems from the very real problem that we can’t get enough peers to agree to review the papers that are submitted to journals. Given we publish 5 million articles per year, maybe this isn’t surprising. But perhaps it’s also because it feels like a chore to the reviewer. They don’t get to interact with other reviewers and frequently don’t even get to know if a paper was accepted or not until it is published.

So it’s refreshing to hear of a human-driven model that is thriving. Stacks Journal was featured in this week’s Nature Index for its community peer review model, which is so successful that they have had to cap the number of reviewers at 7(!).

Now, most journals have two reviewers who assess a manuscript separately. At the Stacks, we bring together communities of reviewers to collaborate. It’s double-blind, to ensure fairness, and reviewers can see each other’s comments and discuss whether they agree. All the peer-review reports, underlying data and code are publicly posted, along with the names of the reviewers.
[…]
We’ve created a ‘credibility score’ for each published article, so readers can quickly get a sense of the reviewer’s feedback. The credibility score is calculated as the percentage of reviewers who voted to accept the article for publication. So, for example, if six out of seven reviewers think an article should be published, its score will be 86%.
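The arithmetic behind that score is simple enough to sketch in a few lines of Python. This is only an illustration of the calculation as quoted above: the function name, the input checks and the rounding to the nearest whole percent are my assumptions, not a description of how the Stacks actually implements it.

```python
def credibility_score(accept_votes: int, total_reviewers: int) -> int:
    """Percentage of reviewers who voted to accept, as a whole number.

    Hypothetical helper: the name and the round-half-up behaviour are
    assumptions made for illustration only.
    """
    if total_reviewers <= 0:
        raise ValueError("total_reviewers must be positive")
    if not 0 <= accept_votes <= total_reviewers:
        raise ValueError("accept_votes must be between 0 and total_reviewers")
    # Integer arithmetic that rounds halves up, avoiding float surprises.
    return (200 * accept_votes + total_reviewers) // (2 * total_reviewers)


# The example from the quote: six of seven reviewers vote to accept.
print(credibility_score(6, 7))  # 86
```

Note that with a cap of seven reviewers, a single dissenting vote moves the score by roughly 14 points, so the number is quite coarse.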

https://www.nature.com/articles/d41586-024-03039-4

Also, a former colleague of mine has written a nice piece on the indispensable role of humans in peer review (hi Ashutosh!):
https://www.csescienceeditor.org/article/human-attention-observation-editorial-peer-review/

// 3
AI and scholarly publishing: unfashionable glimpses of hope
Davidworlock.com - 01 October 2024 - 9 min read

David Worlock has observed the scholarly publishing industry for many years, and his ‘unfashionable’ optimism about how AI will affect it is detailed in this blog post.

Is anyone under any doubt that we will create fully automated peer review systems which operate more successful than human beings? I have been watching this space since the work of UNSILO in Aarhus almost a decade ago, and I cannot now conceive that we will fail in the search for systems that detect plagiarism, copyright theft or papermill inventions that work at a higher percentage of efficiency than human peer reviewers. While the systems will all require human supervision, audit and checking, they will counter the ability of AI to be misused until we come to a further level of technological development which requires a further wave of watchdog development.

https://www.davidworlock.com/2024/10/ai-and-scholarly-publishing-unfashionable-glimpses-of-hope/

// 4
A harm reduction approach to improving peer review by acknowledging its imperfections
FACETS - 18 Sept 2024 - 42 min read

I read this and marked it for the newsletter a few weeks ago, but it somehow slipped through the net. I’m really pleased it was simply delayed though, because this is a wonderful piece of work from authors from Canada, India and Norway.

The authors address each of the shortcomings of the peer review process that they identify, and suggest ways to ameliorate or remove these acknowledged imperfections. While there is no specific mention of how AI might help here, these points are valuable markers for anyone designing an alternative system.

https://www.facetsjournal.com/doi/10.1139/facets-2024-0102

// 5
Radiology & AI: Lessons for peer review?
Various sources

I’ve been seeing anecdotal evidence of the improving quality of AI in medical diagnoses recently, and wanted to pull together a few threads where I couldn’t help but see parallels with peer review. In the case of medical conditions, these can be life-or-death decisions, so at least on a par with the importance of peer review. Are there lessons to learn here for the peer review process? If I may be permitted to answer my own question, the answer is yes. Fast decisions, unbiased diagnoses, high throughput - isn’t that what we want for our authors and journals?

https://www.linkedin.com/posts/emollick_a-preview-of-the-coming-problem-of-working-activity-7247261073634918403-zcgO?utm_source=share&utm_medium=member_desktop

https://www.linkedin.com/pulse/radiology-reports-under-pressure-impact-workload-shortage-ai-driven-uqfwc/

https://x.com/DeryaTR_/status/1834630356286558336

And finally…
Due to the unusual location I found myself in last week, I don’t think I did justice to the PeerArg paper. Thankfully someone has gone to the effort of writing this up in an understandable way:
https://www.azoai.com/news/20240930/PeerArg-System-Enhances-Peer-Review-Transparency-With-Argumentation-and-AI.aspx

Anyone who has had the pleasure of traveling around the south of England on a train will relate to this: https://www.instagram.com/reel/C-FJDTegYDR/?igsh=aDUxdjczNm14bWQ%3D

Let's do coffee!
It’s conference season again. I’m setting up meetings at STM & Frankfurt Book Fair in the middle of October, so please get in touch (reply to this email) if you want to say hi at either of those events.

Curated by Chris Leonard.
If you want to get in touch with me, please simply reply to this email.