No, I’m not interested in developing a powerful brain. All I’m after is just a mediocre brain, something like the President of the American Telephone and Telegraph Company. – Alan Turing
Bit of an argy-bargy all over the internet: Microsoft grabbed all of the open-source code (and some closed-source too, I presume) from GitHub, used it as training data for a machine-learning process, and then sold the results to anyone who wants Copilot to automatically insert code for them.
There are two main groups, both trying to win worthless internet points, arguing over this, and the two arguments can be summed up as:

- Copilot merely LEARNED from all that code, the same way a human programmer learns by reading code, so no licenses are being violated.
- Copilot is copying and redistributing the code it was trained on, in violation of the licenses attached to that code.
Okay, I’m simplifying a bit here - there’s more nuance to each argument than I presented, but I believe I have the general thrust of it correct.
Muddying the waters, there’s a lawsuit in progress claiming that Copilot is violating all of the licenses of the code used to train it.
The argument “But it’s only LEARNING from the code, not redistributing it” is always countered with evidence that Copilot’s output sometimes literally copies its input.
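The best-known demonstration of this went around in 2021: prompted with little more than the function signature, Copilot reproduced the fast inverse square root routine from Quake III Arena, which is GPL-licensed, near-verbatim, comments and all (one comment lightly censored here):

```c
// The Quake III Arena routine that Copilot was demonstrated reproducing,
// as it appears in the GPL source release (second Newton iteration omitted).
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the f***?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration

    return y;
}
```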
I don’t think, at this point, that anyone is going to claim with a straight face that Copilot doesn’t occasionally copy copyrighted code verbatim. The defence from Microsoft, GitHub (owned completely by Microsoft) and OpenAI (whose largest block of shares is controlled by Microsoft) is that the claim “fails on two intrinsic defects: lack of injury and lack of an otherwise viable claim”.
The important point is that Microsoft, Microsoft (via GitHub) and Microsoft (via its controlling interest in OpenAI) all maintain that Copilot doesn’t emit any of the code it was trained on.
It “learned” how to code, but it isn’t simply spitting out snippets of code it was “trained” on.
Great! Can Microsoft, Microsoft (via GitHub) and Microsoft (via OpenAI) please tell us, the court, and the authors of all that code why Microsoft specifically did not use their own code to train the AI?
Why did Microsoft refuse to allow their AI to look at their own code? They have millions of lines of code, in Windows, in Office, across Azure…
Surely, if Copilot isn’t simply copying much of its input verbatim, it should be safe to let it “learn” from the source code for Microsoft’s cash cows.
Why isn’t the AI learning from the source code for Visual Studio? Or Windows? Or Office? Or any of the millions of lines of code owned by Microsoft?
Is it possible that Microsoft, as a collective of programmers separate from the AI trainers, actually knows that Copilot will happily give away the Microsoft Crown Jewels if prompted to?
So, sure, the AI team (and the lawyers) are claiming that all Copilot does is learn; it doesn’t copy. But the rest of Microsoft doesn’t believe that one bit.
If Microsoft themselves don’t believe their own claim that Copilot isn’t emitting its input verbatim, why should the courts believe it? Why should any of us?
You read this argument here first - the date is on this blog and no doubt there’s an archive somewhere too. Please link to this whenever you are fighting for worthless internet points and want to convince the audience of whichever forum you are in that their support for Copilot is misguided at best.
Also, can someone please forward this to whoever is in charge of the Creative Commons licenses (and all the other open-source licenses too)? Instead of an expensive lawsuit, this could have been solved much more easily by adding an extra restriction to all the popular open-source licenses, forbidding the licensee from using the licensed material as machine-learning input.
Make it opt-in, not (as it is currently) opt-out. If you want your code to be used as input for training AI, then go ahead and remove that clause from the license.
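To make that concrete, here’s a rough sketch of what such a clause might look like. This is my own wording, not reviewed by a lawyer and not part of any real license, so treat it as illustrative only:

> No Machine-Learning Training. The Licensee may not use the licensed material, in whole or in part, as training input for any machine-learning model or automated code-generation system without the express written permission of the copyright holder.

Anyone happy for their code to train an AI just deletes that paragraph before applying the license, and you get the opt-in default for free.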