RFC111 AI/LLM tool policy: significant revision, limiting drastically their use by rouault · Pull Request #14500 · OSGeo/gdal

rouault · 2026-05-06T16:27:21Z

No description provided.

lnicola

(placeholder comment because GitHub threw a tantrum)

lnicola · 2026-05-06T18:08:56Z

Argh, I can't comment. Regarding the "not copyrightable" part:

I'm not convinced this is true. I suspect it refers to a common misinterpretation of Thaler v. Perlmutter. Code developed by an LLM with substantial input from a human is likely still copyrightable. Thaler specifically tried to disclaim any contribution to the work discussed there.

lnicola · 2026-05-06T18:10:52Z

-Warning
-------
+Commit messages and pull request messages must be fully written by the author,
+besides potential translation to English and typo/grammar fixing. This is


We'll see how this develops, but I suspect it will encourage contributors to post LLM-(re)written walls of text under the pretense of fixing the grammar.

lnicola · 2026-05-06T18:13:08Z

Typo: "The content must be written by a human. Use of AI/LL tool for translation"

lnicola · 2026-05-06T18:15:37Z

with the general principle that there must be a human in the loop

I think this is a bit misleading as part of the summary. If I prompt an LLM to make a focused change, review the code, drop half of it because it's useless, I'll argue that I'm very much in the loop. But this is still disallowed under the new policy.

A different approach I've seen some projects take is to require contributors to understand (or be able to explain) their changes. No opinion on how useful this is in practice.

rouault · 2026-05-06T18:20:13Z

I think this is a bit misleading as part of the summary. If I prompt an LLM to make a focused change, review the code, drop half of it because it's useless, I'll argue that I'm very much in the loop. But this is still disallowed under the new policy.

Please propose a best wording

rouault · 2026-05-06T18:21:08Z

Typo: "The content must be written by a human. Use of AI/LL tool for translation"

fixed

lnicola · 2026-05-06T18:24:31Z

Please propose a best wording

Maybe

courts (in particular the US ones) have not definitely determined whether LLM outputs are derived works of the training data, or whether LLM-written code can even be copyrighted by a human

rouault · 2026-05-06T18:29:46Z

courts (in particular the US ones) have not definitely determined whether LLM outputs are derived works of the training data, or whether LLM-written code can even be copyrighted by a human

thanks, adopted

rouault · 2026-05-07T01:32:13Z

changed "there must be a human in the loop" to "the human must be the (primary) author"

elpaso

I just left a few minor remarks but I fully agree with the proposal.

ldesousa · 2026-05-07T09:08:17Z

If you allow me to intrude into this discussion: the European Directive on the copyright of computer programmes limits protection to the "author’s own intellectual creation". It also states clearly that only a "person or group of people" can hold copyright.

Beyond copyright, if you believe the CRA will ever be enforced, then you should assume distribution of a programme whose source is not understood by a legal entity responsible for its distribution or manufacturing will no longer be legal.

lnicola · 2026-05-07T09:18:43Z

It also states clearly that only a "person or group of people" can hold copyright.

That matches Thaler v. Perlmutter. The human holds copyright, not the LLM. This has been a very popular strawman among anti-AI activists lately.

The European Directive on the copyright of computer programmes limits protection to the "author’s own intellectual creation".

If I implement a merge sort, it's not meaningfully my intellectual creation, but nobody will argue that I've stolen the work of von Neumann.

But if I ask an LLM to implement an external merge sort for my large table, you'll say that I have no intellectual contribution, and it's not copyrightable.

ldesousa · 2026-05-07T09:39:16Z

That matches Thaler v. Perlmutter.

That is going on in the US; I guess they don't care much about European directives over there. Also note that in the US there is the figure of "Public Domain", which does not exist in the EU. When eventually something of the like reaches a European court, the decision will be between assigning copyright to the legal entity responsible for the LLM and the one owning copyright over the training data.

Beyond that, you are conflating copyright of a computer programme with the copyright of an algorithm.

gdt · 2026-05-07T13:31:32Z

That is going on in the US; I guess they don't care much about European directives over there. Also note that in the US there is the figure of "Public Domain", which does not exist in the EU. When eventually something of the like reaches a European court, the decision will be between assigning copyright to the legal entity responsible for the LLM and the one owning copyright over the training data.

Projecting out the observation that different legal jurisdictions have different rules, I interpret your comment as agreeing that the situation is unclear now and that there is no basis for confident predictions about how it will be resolved, if ever. Thus I would expect you would approve of the suggested text from @lnicola

lnicola · 2026-05-08T13:20:30Z

the decision will be between assigning copyright to the legal entity responsible for the LLM and the one owning copyright over the training data

Or to the LLM user. There have been some conflicting statements from the EU, but extrapolating from what I've seen, the people working in those institutions tend to be heavy users of US LLMs. I doubt they'll legislate that LLMs infringe on the copyright of others.

jedbrown · 2026-05-08T14:30:11Z

I doubt they'll legislate that LLMs infringe on the copyright of others.

A German court ruled last fall that OpenAI infringed copyright from training data (and OpenAI tried to throw users under the bus for prompting, which the court rejected). Note that LLMs can emit entire books verbatim, thousand-word passages via commercial models [demo], and with organic prompting. The question is not whether LLMs are capable of infringing copyright of the training data, but who will be liable for that infringement and what due diligence would be necessary to mitigate the risk to levels that a project can tolerate (and I guess, whether the project adopts the smol bean hypothesis that the project and its users are not worth suing).

lnicola · 2026-05-08T14:47:24Z

Note that LLMs can emit entire books verbatim

As long as you put part of the book in the input prompt.

jedbrown · 2026-05-08T15:35:09Z

As long as you put part of the book in the input prompt.

A few words (in that study), but see also the organic prompting study, the German case, and others such as typing //sparse matrix transpose<TAB> and getting a page of near-verbatim code, which is still working its way through the courts. There is no simple procedure to ensure that output is non-infringing.

jerstlouis · 2026-05-15T08:43:27Z

-behavior to play nicely with AI will involve natural selection over many generations,
-be prepared for not getting a very warm welcome if you misuse those tools.
-You have been warned!
+    * Submission of `vibe-coded <https://en.wikipedia.org/wiki/Vibe_coding>`__ contributions is *banned*.


It seems to me that an explicit definition of what constitutes "vibe coding" should be included directly in this policy rather than as a reference to Wikipedia.

There are probably also several degrees of "vibe coding".
A developer might start a work item as "vibe coding" and then spend several hours or days refining the code (with various degrees of AI assistance) to the point that the problems inherent to "vibe coding" may no longer be problems?

I am really concerned that a blanket ban on vibe-coding for open-source projects is problematic, as closed-source solutions would likely allow whatever achieves best development velocity (still taking maintenance cost into consideration).

It seems to me that an explicit definition of what constitutes "vibe coding" should be included directly in this policy rather than as a reference to Wikipedia.

We may argue about the exact definition, but fundamentally using LLMs is an anti-social / anti-ethical behavior in the context of open source software where maintainers are still human. That makes them even more of a contention point than before. As raised by @dbaston (https://lists.osgeo.org/pipermail/gdal-dev/2026-May/061636.html), we review code from others in the hope that a small percentage of them will grow into maintainers and take on a bit of that burden. Reviewing LLM-generated code just helps tech giants improve their models and doesn't grow any new maintainers.

as closed-source solutions would likely allow whatever achieves best development velocity (still taking maintenance cost into consideration).

By that metric, closed-source solutions have always been "better" than us because they can align more developers. Let them do their thing and we'll talk again about the end result they've achieved in 5 years.

Why is velocity so important? Are there so many missing features in GDAL that they need to be rushed?
Code added is more a long-term liability than an asset.

If velocity of development is so important then people can contribute to Oxigdal, and it is written in Rust. Or create their forked GDAL with agents reviewing & automatically merging code.

By the way, it is great to see so many people reviewing this PR. I wish they'd do the same for the more "boring" ones 😜

fundamentally using LLMs is an anti-social / anti-ethical behavior in the context of open source software where maintainers are still human

I strongly disagree with that statement (and indicated as much in the OSGeo charter member survey about LLM usage), but I fully understand where you're coming from as a key maintainer with an unbelievably large number of things to review.

Reviewing LLM-generated code just helps tech giants improve their models and doesn't grow any new maintainers.

While this is true, this should also improve the quality of future PRs.

Isn't maintainers' productivity also improved by LLMs?

Why is velocity so important?

Long term, if a closed-source solution becomes much more capable than an open-source alternative, the recent trend of growing open-source adoption might reverse. It's also about being able to do more with one's time and being able to make more contributions (which can still be of a certain quality).

I wish they'd do the same for the more "boring" ones

I think part of the solution to get more maintainers may actually be through LLM usage for performing human-in-the-loop reviews.

I am not against rejecting contributions where there's clearly no human in the loop, but I think "vibe coding" can actually be a very efficient way to get started on implementing new features in particular (even if careful self-review / refinements should be essential).

Going back to the original question of definitions, I would argue that by the time we come up with a good definition for vibe coding, it will already be obsolete. We should also appreciate how quickly this landscape is evolving and treat any policies we come up with as living documents.

Adopting those new toys under the pressure of a feeling of urgency would be a terrible idea. That's exactly the narrative that tech giants want to instigate in our minds. My personal opinion regarding e.g. adopting a new programming language has been "let's wait 15 years and see if it is still there" (Rust is almost at that point :-))

GDAL has always been a very conservative project in terms of tech adoption (I have to fight each time I want to bump the C++ version!). I guess if it is still there after 28 years, this must be part of the reason for its success.

I see. In that case I suggest dropping the term "vibe coding" altogether and focusing directly on LLM assistance. The wording used by Oracle is a good starting point, making it clear which activities are allowed and which are not.

has been a terrible experience and showed me that it would ultimately lead to burn out if becoming the norm.

I really believe the solution for this should not be banning LLM assistance.

A possible solution is to have parallel tracks of pull requests for different levels of LLM assistance.

Perhaps indicating the amount of human time invested by the contributor(s) in preparing a PR.

Each maintainer could decide how much effort to invest in each of these tracks.

Each maintainer could decide how much effort to invest in each of these tracks.

I haven't heard about any existing GDAL maintainer who was keen on reviewing LLM-generated PRs. As proof of that, notice the 3 flagged as such at the bottom of https://github.com/OSGeo/gdal/pulls that have been sitting there for many weeks. It is disingenuous to tell potential contributors that they may use LLM-assisted coding if there are no maintainers willing to review such PRs.

GDAL is already far too big a beast compared to the size of its maintainer community. We don't need more code coming from people interested in drive-by contributions.

There was an initial version of this policy, and it was confirmed that it was not enough. If I remember correctly it was "softened" after some discussions. Now we have to harden it.

It has been explained many times: the reviewing and maintenance resources are very, very limited. Experience shows that throwing LLM-generated code at the project saturates those resources. So we have to cut this. Immediately. Otherwise the project CANNOT CONTINUE.

GDAL is not in a hurry to implement anything. It is mature, stable and well known (and good quality). And these qualities are valued out there, by open and proprietary code.

For sure in some time we will review this policy. Maybe to make it even harder, or softer. Let's see. But now we have to act to protect the project and its maintainers.

https://dl.acm.org/doi/full/10.1145/3807518 might be interesting to this discussion. They offer their own definition, and explain why, based on current literature, vibe coding in large, long-lived projects can be problematic.

alexgleith · 2026-05-16T12:53:06Z

+Additionally, legal systems across the world (including US and EU) have not
+definitely determined whether LLM outputs are derived works of training data or
+if LLM-written code can even be copyrighted by a human. This is despite it
+being latently extracted and originated from open source software in the first


Is "latently" the right word here? Possibly should be "largely"?

rouault · 2026-05-18T12:04:18Z

Adopted with +1 from PSC members KurtS, NormanB, MikeS, JavierJS, DanB, HowardB, JukkaR, DanielM and EvenR.

rouault marked this pull request as draft May 6, 2026 16:27

melissawm mentioned this pull request May 6, 2026

Add GDAL policy when revision is done melissawm/open-source-ai-contribution-policies#68

Open

rouault force-pushed the rfc111_revision branch from 997094f to f79090c Compare May 6, 2026 17:41

lnicola reviewed May 6, 2026

rouault force-pushed the rfc111_revision branch from f79090c to e487fb0 Compare May 6, 2026 18:20

RFC111 AI/LLM tool policy: significant revision, limiting drastically…

79bb580

… their use Co-authored-by: Laurențiu Nicola <lnicola@dend.ro>

rouault force-pushed the rfc111_revision branch from e487fb0 to 79bb580 Compare May 6, 2026 18:29

gdt reviewed May 7, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

ai_tool_policy.rst: there must be a human in the loop -> the human mu…

83bc436

…st be the (primary) author

kmuehlbauer mentioned this pull request May 7, 2026

DOC: AI Usage Policy (draft for discussion) openradar/xradar#363

Draft

3 tasks

elpaso reviewed May 7, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

Comment thread doc/source/community/ai_tool_policy.rst Outdated

Comment thread doc/source/community/ai_tool_policy.rst Outdated

Update doc/source/community/ai_tool_policy.rst

35b491e

Co-authored-by: Alessandro Pasotti <elpaso@itopen.it>

ai_tool_policy.rst: precise use of AI tools for issue reporting

e834903

choldgraf mentioned this pull request May 9, 2026

AI-assisted code policy: "Literature Review" jupyter/governance#326

Open

adamjstewart reviewed May 12, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

rouault and others added 2 commits May 12, 2026 16:03

Update doc/source/community/ai_tool_policy.rst

4634425

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

hobu takes a swing

a0d34fe

rouault commented May 12, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

Re-introduce the legal aspect, and the ban for vibe-coded contributions

d5ef0bc

rouault marked this pull request as ready for review May 13, 2026 01:48

hobu added 2 commits May 13, 2026 08:34

a few language tweaks and more text about GSP

b5be1c4

small language edits

0d4f988

rouault commented May 13, 2026

Comment thread doc/source/community/ai_tool_policy.rst

Apply suggestion from @rouault

7114cca

adamjstewart reviewed May 13, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

adamjstewart reviewed May 13, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

adamjstewart reviewed May 13, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

rouault and others added 2 commits May 13, 2026 19:03

Apply suggestions from code review

5780ed3

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

Update doc/source/community/ai_tool_policy.rst

67eb787

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

kylebarron reviewed May 14, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

kylebarron reviewed May 14, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

Update doc/source/community/ai_tool_policy.rst

9be3095

Co-authored-by: Kyle Barron <kylebarron2@gmail.com>

rouault commented May 14, 2026

Comment thread doc/source/community/ai_tool_policy.rst Outdated

Apply suggestion from @rouault

e52c8b5

jerstlouis reviewed May 15, 2026

rouault mentioned this pull request May 15, 2026

Optimize Lanczos resampling with Byte source density #14586

Closed

alexgleith reviewed May 16, 2026

RFC111: add voting histoy

81b0d69

rouault merged commit 314bb15 into OSGeo:master May 18, 2026
2 checks passed

rouault added a commit that referenced this pull request May 18, 2026

RFC111 AI/LLM tool policy: significant revision, limiting drastically…

33399bf

… their use (#14500)
