
Scalene 25: NeurIPS / Oath / WithdrarXiv

Humans | AI | Peer review. The triangle is changing.

This is the final Scalene of 2024. I’ll be spending my Sundays in holiday mode for the next few weeks, and hope to have something sitting in your inbox on 5th January. Honestly, that feels like a long time away right now, with the promising Gemini 2 and ChatGPT Pro just out - but they’ll have to wait. We’re going to round out 2024 with a bumper crop of stories for you to digest with your mulled wine.

15 December 2024

// 1
Results of the NeurIPS 2024 Experiment on the Usefulness of LLMs as an Author Checklist Assistant for Scientific Papers
NeurIPS Blog - 11 Dec 2024 - 6 min read

It’s great to get some quantitative and qualitative insight into the use of LLMs for assessing academic research from a cohort of people who know what they’re talking about. Enter the chairs of the Neural Information Processing Systems conference, who ran an experiment at their conference in which authors could opt in to an automated evaluation of their submission to check compliance with submission standards - a fairly low-risk experiment.

What was eye-opening for me was how enthusiastic the submitters were, and how excited they were to see what feedback they would get - and then the subsequent reality check when it didn’t live up to their expectations:

[Figure: Responses to survey questions before and after using checklist verification (n=63 unique responses).]

The reasons for their unhappiness largely stemmed from the LLM being inaccurate or too strict. There is also a fascinating discussion of whether and how this automated system could be gamed (it can, and therefore it cannot replace human judgement right now).

// 2
Near future academic publishing – a speculative social science fiction experiment
Learning, Media and Technology - 08 Dec 2024 - 8 min read

Rather than simply look forward to 2025 in this editorial, the authors extrapolate current shifts in academic publishing to 2030 and 2035. As with all such thought experiments, while I agree with some of the predictions, I think they will all happen much sooner than the authors expect. A doomsday scenario is predicted for the near future, but a positively utopian longer-term vision is outlined for 2035 (see the 2030 and 2035 quotes below).

Journal editors have had to adapt to working with AI. Many editors have become ‘humans in the loop’ ensuring – within the 7-day turnaround now normalized across the industry – that peer reviews are acceptable. The ‘AI-reviewer 2’ controversy remains a cautionary case of automation in peer review: when an automated peer review unleashed a discriminatory attack on an author who had left a personally identifying citation in their manuscript, it led to widespread outrage about the weak ethical and regulatory frameworks governing AI in peer review and publishing.

The publishers’ new public-facing activities have shown the value of (critical) research, including blue skies research with no measurable immediate impact, and have thus strongly contributed to reducing the rift created in the 2020s by some segments of society between universities and the public. The publishing houses agree that the co-design processes they initiated with editors, authors and other stakeholders helped them to shift their priorities, and to find innovative and creative solutions that truly benefit the academic communities that have long found a home for debate and field-building through their journals.

// 3
An Oath of Research Integrity?
AJ Boston - 06 Dec 2024 - 3 min read

I love Arthur’s writing for its style, and there is always something of substance to make it doubly enjoyable. Here he articulates the solution I can see having the most pragmatic likelihood of success in an AI-mediated scholarly communication landscape.

Make the authors (and I would add, reviewers) take an oath, like a doctor’s Hippocratic oath, where they publicly declare they will adhere to the best accepted standards in research.

A stronger and publicly visible declaration of responsibility for your own output may be the only sensible way to absorb AI assistance into research. Sure, use LLMs for your peer review - but sign your name to it and be prepared to stand behind any recommendations therein.

// 4
“Does it feel like a scientific paper?”: A qualitative analysis of preprint servers’ moderation and quality assurance processes
MetaArXiv - 18 Sept 2024 - 16 min read

This is an analysis of the ‘acceptance’ criteria for various preprint servers, as told by the moderators of those services. As you may know, I believe preprints + automated reviews will become an important channel of communication in the near future, so I’m intrigued to see which factors are currently considered grounds for a ‘desk reject’. The tables at the end of the document are particularly valuable for prospective authors.

In conjunction with another piece of work published recently [Mehmani & Malički’s Structured peer review: implementation and checklist development], there is a wealth of granular prompting suggestions someone could use to build a very useful tool for authors.

// 5
WithdrarXiv (+SciFy)
arXiv - 10 Nov 2024 - 17 min read

I love this: a dataset of 14,000 manuscripts withdrawn from arXiv, annotated with their reasons for withdrawal. The authors describe the development of a taxonomy of withdrawal reasons, identifying 10 categories, and a simple yet accurate automatic classifier for them.

[Figure: Distribution of reasons for paper withdrawals on arXiv.]

But the paper becomes even more interesting when it discusses WithdrarXiv-SciFy:

We want to highlight WithdrarXiv-SciFy, an enriched subset of WithdrarXiv that includes scripts for parsed full text PDFs specifically designed to facilitate scientific feasibility studies. The creation of this dataset was motivated by a deep dive into the largest (greater than 40% of our dataset) category of withdrawal reasons — “Factual/methodological/other critical errors in manuscript” — corresponding to 6,018 pre-prints. We clustered comments in this category to understand major themes, and we discovered eight themes:

I won’t steal their thunder by listing them here; you can follow the link below to read them in section 7. But needless to say, it represents an opportunity to enable automated verification of scientific claims and mathematical theorem proving, as well as detection of discrepancies between data and figures/text.

And finally…

Lots and lots of ‘other’ stuff here. Why not open them all in new tabs and read them when time permits over the holidays?

And I couldn’t let the festive period pass without linking to this. Merry Christmas to you all.

Let's do coffee!
I’m at home for the rest of the year. If you’re in Devon at some point, I’d love to meet up (and ask why you’re here) - otherwise it looks like R2R in February is my next official outing.

Free consultation calls
Many of you may know I work for Cactus Communications in my day job, and one of my responsibilities there is helping publishers speed up their peer review processes. Usually this takes the form of 100% human peer review, delivered in 7 days. However, we are keen to experiment further with subtle AI assistance. If you want to chat about bringing review times down with a 100% human service, or you’re interested in experimenting with how AI can assist, let’s talk: https://calendly.com/chrisle1972/chris-leonard-cactus

Curated by me, Chris Leonard.
If you want to get in touch, please simply reply to this email.