Scalene 20: MAMORX / witch hunt / sword
Humans | AI | Peer review. The triangle is changing.
I'm increasingly perplexed by the preciousness of academic publishers when it comes to even experimenting with AI approaches to peer review. Other industries adopt and adapt; we seem reluctant to do either. A conference I attended this week underscored this: our attitudes haven't really moved on, even though the technology most definitely has. If you want to see what's possible today (in parallel with traditional review methods), get in touch.
10th November 2024
// 1
MAMORX: Multi-agent Multi-Modal Scientific Review Generation with External Knowledge
NeurIPS 2024 - 02 Nov 2024 - 9 min read
An AI system that improves scientific review quality by integrating multi-agent, multi-modal analysis with external knowledge sources? Yes please.
MAMORX replicates key aspects of human review by integrating attention to text, figures, and citations, along with access to external knowledge sources. Compared to previous work, it takes advantage of large context windows to significantly reduce the number of agents and the processing time needed. The system relies on structured outputs and function calling to process figures, evaluate novelty, and build general and domain-specific knowledge bases from external scholarly search. To test our system, we conducted an arena-style competition between several baselines and human reviews on diverse papers from general machine learning and NLP fields, calculating an Elo rating for each model based on human preferences. MAMORX has an estimated win rate of 93% against human reviews and outperforms the next-best model, a multi-agent baseline system, losing only 12% of the time and never losing against other prominent models.
CL: This is exciting. A way to measure something that has been missing from AI assessment thus far: novelty. This is a short paper and worth reading in full to get yourself excited about what is possible right now, not at some theoretical future time. It also highlights the shortcomings of human review, which received the lowest scores in a number of tests as judged by other humans.
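For readers unfamiliar with the arena-style evaluation the abstract mentions: Elo ratings update after each human-judged pairwise comparison. A minimal sketch of the standard Elo update is below; the K-factor and ratings are illustrative, not taken from the paper.

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32) -> tuple[float, float]:
    """Standard Elo update after one pairwise comparison.

    The expected score uses the logistic curve on a 400-point scale;
    an upset (low-rated winner) moves ratings more than an expected win.
    """
    expected_winner = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    new_winner = r_winner + k * (1 - expected_winner)
    new_loser = r_loser - k * (1 - expected_winner)
    return new_winner, new_loser

# Two evenly matched reviewers: the winner gains 16 points, the loser drops 16.
print(elo_update(1000, 1000))
```

Run over many human preference judgements, this converges to a ranking like the one MAMORX reports against human and baseline reviews.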
// 2
ChatGPT is transforming peer review — how can we use it responsibly?
Nature - 05 Nov 2024 - 4 min read
The tidal wave of LLM use in academic writing and peer review cannot be stopped. To navigate this transformation, journals and conference venues should establish clear guidelines and put in place systems to enforce them. At the very least, journals should ask reviewers to transparently disclose whether and how they use LLMs during the review process. We also need innovative, interactive peer-review platforms adapted to the age of AI that can automatically constrain the use of LLMs to a limited set of tasks.
CL: It’s fair to say I don’t agree with every aspect of this opinion piece, but this paragraph reminded me that publishers could be developing their own AI platforms for peer review to make the job easier, more interactive, and dare I say, fun? If they don’t, someone else will.
// 3
Machine Learning in Peer Review: Game Changer or Double-Edged Sword?
YouTube - 22 Sept 2024 - 14 min watch
Jeffrey Robens, Head of Community Engagement at Nature Portfolio, explores the transformative impact of machine learning on the peer review process in this interview with Maryam Sayab.
// 4
The great AI witch hunt: Reviewers’ perception and (Mis)conception of generative AI in research writing
ScienceDirect - 05 Nov 2024 - 52 min read
Reviewers consistently struggled to distinguish between human and AI-augmented writing but their judgements remained consistent. They noted the loss of a “human touch” and subjective expressions in AI-augmented writing. Based on our findings, we advocate for reviewer guidelines that promote impartial evaluations of submissions, regardless of any personal biases towards GenAI. The quality of the research itself should remain a priority in reviews, regardless of any preconceived notions about the tools used to create it. We emphasize that researchers must maintain their authorship and control over the writing process, even when using GenAI's assistance.
// 5
Adventures with an inconsistent Gemini
LinkedIn - 16 Oct 2024 - 24 min read
I loved this 'real-world' examination of what playing with LLMs looks like for the average user (no offence Jay!). It's humorous and frustrating in equal measure, but definitely time well spent IMHO.
LLMs work by choosing the most probable next word in a sequence, then the next, and then the next. There is some randomness thrown in, so even the exact same prompt will not always get the same response. In this case, though, prompts aiming for broadly the same outcome, but worded differently, generated responses varying from mind-blowing to 'computer says no'.
But, you learn by trying. I'll keep throwing tasks at various LLMs until I work out what they are good at and where they need help.
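That next-word-plus-randomness behaviour can be sketched in a few lines. This is a toy illustration, not how any particular LLM is implemented: the token probabilities are invented, and real models sample from vocabularies of tens of thousands of tokens.

```python
import random

# Invented next-token probabilities an LLM might assign
# after a prompt like "The peer review was".
next_token_probs = {
    "thorough": 0.40,
    "rigorous": 0.25,
    "late": 0.20,
    "glowing": 0.15,
}

def sample_next_token(probs: dict, temperature: float = 1.0, rng=random) -> str:
    """Sample one token. Lower temperature sharpens the distribution
    (more deterministic); higher temperature flattens it (more varied)."""
    # Re-weight each probability by 1/temperature, then sample proportionally.
    weights = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    threshold = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        threshold -= w
        if threshold <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

# The same "prompt" can yield different continuations on each run.
for _ in range(3):
    print(sample_next_token(next_token_probs))

# At very low temperature, the most probable token almost always wins.
print(sample_next_token(next_token_probs, temperature=0.05))
```

Which is why rephrasing a prompt, or simply re-running it, can swing the output from mind-blowing to 'computer says no'.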
And finally…
I’ve tried to stop going on X/Twitter, especially this week, so there is little in the way of memes to brighten up your day. But that’s not necessarily a bad thing, as I have some ‘other’ links to share with you. They're not all strictly related to peer review, but, like me, you may find something worthwhile in here:
Pearls and Pitfalls for LLMs 2.0 - Radiology - 29 Oct 2024
Be attentive - Tony Alves - 31 Oct 2024
How Islamic teachings approach the use of AI in healthcare - Wired - 07 Nov 2024
Let's do coffee!
I’m in London for the STM meeting on December 4th (± 1 day); let me know if you are too.
Curated by Chris Leonard.
If you want to get in touch with me, please simply reply to this email.