Scalene 7: SciMON / Bad incentives / Hybrid approaches
Humans | AI | Peer review. The triangle is changing.
Hi all. Sorry for the absence of a newsletter last week. I was all tired and Covid-y, but thankfully feeling better now. There has also been a little hiatus in related outputs on arXiv, which allows us to catch up on some other aspects of peer review and how it is evolving with AI.
As an aside (and indulge me here for a moment), part of my research for this newsletter involves trawling Twitter/X once a week with variants of ‘peer review AND something’, and it is abundantly clear from the tweets I read there that peer review as it is today is not scaling, is not understood by the public, and has an increasingly negative ‘brand image’ - mainly due to delays. I wonder if we should embrace the dawn of the AI-assisted era as an opportunity to rebrand peer review as a form of quality assurance. Yes, we look at and comment on the content, but we also check the declarations, metadata, ethics and integrity (as far as we can) and publish these alongside the manuscript.
Maybe the validation of academic research needs to be reinvented as more than just peer review?
7th July 2024
// 1
SciMON: Scientific Inspiration Machines Optimized for Novelty
arXiv.org - 03 June 2024 - 23 min read
We take a dramatic departure with a novel setting in which models use as input background contexts (e.g., problems, experimental settings, goals), and output natural language ideas grounded in literature. We present SCIMON, a modeling framework that uses retrieval of “inspirations” from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. Comprehensive evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our methods partially mitigate this issue. Our work represents a first step toward evaluating and developing language models that generate new ideas derived from the scientific literature.
https://arxiv.org/abs/2305.14259
CL - What is this? Well one problem with AI-assisted peer review is placing the work in context of what has preceded it. Is it novel? Is it standing on the shoulders of giants? What is it adding to the field? Whilst this is seemingly unrelated, we can imagine using this as a basis for a reverse procedure which compares findings in a paper to prior literature for a comparison of the novelty of the main claims. Please someone do this!
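The ‘reverse procedure’ suggested above can be sketched in a few lines. This is purely illustrative and not from the SciMON paper: the function names (`novelty_score`, `tokens`) are my own, and a real system would compare claims to prior abstracts with semantic embeddings and retrieval rather than the crude word-overlap used here.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring very short words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def novelty_score(claim: str, prior_abstracts: list[str]) -> float:
    """Novelty = 1 minus the highest Jaccard overlap between the claim
    and any prior abstract. Near 1.0 means lexically unlike prior work;
    near 0.0 means it closely echoes something already published."""
    claim_toks = tokens(claim)
    best = 0.0
    for abstract in prior_abstracts:
        prior_toks = tokens(abstract)
        if claim_toks and prior_toks:
            overlap = len(claim_toks & prior_toks) / len(claim_toks | prior_toks)
            best = max(best, overlap)
    return 1.0 - best


prior = [
    "We propose retrieval of inspirations from past scientific papers.",
    "Language models generate ideas grounded in the literature.",
]
print(novelty_score("We measure novelty by comparing claims to prior abstracts.", prior))
```

Swap the token-overlap measure for embedding cosine similarity over a citation-linked corpus and you have the skeleton of a novelty check a reviewer could actually run against a paper’s main claims.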
// 2
LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?
ACL Anthology - 16 June 2024 - 46 min read
A perfect example of Betteridge’s Law: the short answer is ‘no’. I’ve long been sceptical of AI-text detectors, and this paper shows that once a human edits AI output, it becomes effectively undetectable again. I imagine the same applies to paraphrasing tools.
Why do we fight two losing battles: a) trying to detect AI output, and b) insisting authors declare their use of AI tools?
I was away at a wonderful meeting last week where this very topic came up. We were about to list all the things we should ask authors to do to make the use of LLMs in academic manuscripts acceptable, but stopped when we considered that we don’t do the same for mathematicians and calculators. We can’t perfectly replicate the output from an LLM even with the exact same prompt, so what are we going to do with these declarations in 20 years’ time? It’s time for authors to declare they stand by all elements of the work submitted, and maybe leave it at that.
On a related theme, PAIR (Peer AI Review) aims to provide a repository for human-identified instances of AI-generated text in the published literature. From a post on LinkedIn by Kenneth Hallenbeck:
PAIR is a constantly growing database that stores confirmed cases of GenAI in the scientific literature and notes which content is impacted. AI isn't inherently bad, but it should be flagged. I hope this will be a useful tool as the community navigates rapidly growing access to models that can produce convincing academic language, hallucinate references, and generate images.
Anyone can join PAIR to search, and submit papers they find suspect. We'll validate those reports and update the database.
// 3
The Misplaced Incentives in Academic Publishing
UnDark.org - 04 July 2024 - 5 min read
Surely, scientists’ participation in the process can help their own productivity: Scholars build relationships with editors of journals in which they might publish their own manuscripts, and reading and reviewing manuscripts exposes them to new work. But at best, these benefits are indirect. Plainly speaking, virtually no one in the history of professional science has been promoted or meaningfully rewarded because they provide stellar reviews of others’ work.
CL: Another examination of why current forms of peer review are failing, and where real change could originate (hint: not with publishers). A good read.
https://undark.org/2024/07/04/opinion-misplaced-incentives-academic-publishing/
// 4
Guido Herrmann on AI in the scholarly ecosystem
Spotify - 5 July 2024 - 50 min listen
A few of you may know that my day job is with Cactus Communications (all things peer review, get in touch!) - and that we have a podcast called Insights XChange:
In this episode of Insights XChange Dr. Guido F. Herrmann, Managing Director at Wiley, joins Nikesh to discuss the evolving landscape of academic research, AI, and scientific publishing. Dr. Herrmann shares insights on maintaining research integrity amidst challenges like fraud and predatory journals. The conversation explores how AI is revolutionizing the scholarly ecosystem, enhancing peer review, and ensuring ethical standards. Dr. Herrmann offers advice for early career researchers on conducting high-quality research and using AI tools effectively. He emphasizes the role of publishers in adopting AI to support researchers and foster innovation.
// 5
Hybrid approach to peer review yields best of both worlds
LWW - 09 May 2024 - 4 min read
This accepted author version of a Correspondence to the International Journal of Surgery makes some good points:
Their findings showed that ChatGPT possesses the capacity to significantly transform the functions performed by both peer reviewers and editors within academic publishing. By aiding both parties in the expeditious composition of insightful evaluations or decisive correspondence, ChatGPT serves as an enabler for enhancing the caliber of peer review processes while also mitigating concerns stemming from reviewer scarcity. Another study has compared the assessment results of twenty-one research articles among two humans, ChatGPT 3.5 and ChatGPT 4. Their results show that the subjective review opinions given by human reviewers and ChatGPT, especially ChatGPT 4, are similar.
Therefore, these studies suggest that the use of ChatGPT in the peer review process seems feasible. Reviewers could generate a clear review report by inputting their own notes into ChatGPT, and the opportunity to streamline the review process may be enough to encourage reviewers to accept the invitation.
https://journals.lww.com/international-journal-of-surgery/citation/9900/not_just_disclosure_of_generative_artificial.1458.aspx
CL - By the way, if anyone at LWW is reading this, you can improve your download numbers by making the download link appear in the Safari browser. I spent a good 20 minutes looking for it before I tried a different browser!
And finally…
Sorry to any Portugal fans out there, but the Euro 2024 championships are playing out in Germany right now, and this tickled my fancy (although as an England fan I have no right to be posting it!)
Let's do coffee!
I’m travelling to the following places over the next few weeks. Always happy to meet and discuss anything related to this newsletter. Just reply to this email and we can set something up:
Oxford: 10 July
ALPSP (Manchester): 11-13 September
Curated by Chris Leonard.
If you want to get in touch with me, please simply reply to this email.