Malus and the "Death of Opensource"

This is not zig-specific and is probably too off-topic to pass muster; I invite mods to strike it if so.

But I’m personally motivated, as I just got a little personal zig project buttoned up enough to codeberg it, and, for the first time ever, I’m thinking twice about the best way to proceed.

I love opensource, and want to share freely… but only with humans, really. I’m not ignorant of AI training on everything it can find, but the “news” (thus my dubious choice of this topic for the post) of Malus, which I realize now is not so “new”, and some recent reflections on it (like this), about automating clean-rooming… well, it’s got me in a bit of a funk, on principle.

Many developers, and certainly many zig developers, seem to share my leery attitude about AI. This is really the only forum I participate in (that probably sounds crazy… but I don’t even have ANY social media accounts, which probably sounds crazier)… anyway, there’s so much deserved respect here in Ziggit, and there seems to be enough conceptual crossover, so I’ll try…

I’m not interested in starting a fire. But I’m interested in knowing if there are thoughts on how to share code with developers (publicly, openly) without sharing code with non-people. It honestly seems impossible, but I’d love to be schooled.

(Here’s the “Death of Opensource” presentation post from 4 weeks ago that inspired the recent interest, but note that it has a lot of fluff in the middle, and over-loud music throughout.)

Um… just to anticipate, a little, and bound the conversation, maybe: one of the onerous provocateurs is the thought of somebody (or maybe some machine) making quick money by stealing an idea and marketing it as their own. If I’m truly drunk on the opensource kool-aid, then perhaps I shouldn’t care if a million people “steal” something I worked on (after all, I probably stole a million little ideas to make my thing, right?)… but to clean-room away any attribution at all, and make a bunch of money… well, it’s certainly not likely to happen with what I’ve written, but the principle… There’s the onerous rub.

If there really is an answer, is it pencil-and-paper and scrolls in caves, to which only people are invited? Well, what about OCR? :slight_smile:

Or, perhaps your advice will be: just trust Dylan and Mike – “you really don’t want to be on the wire when that massive CVE hits the fan and YOU are on the hotseat to get it fixed yesterday”.

One thing I take comfort in: code can be beautiful, like art. Zig code can be pretty beautiful. I wonder if I would think a clean-room-repro implementation could look as beautiful. I guess I would be a little embarrassed if it looked more beautiful… I think I’d be a little less embarrassed if it was more performant, but some people would flip those. Anyway, I could always take pride in whatever bit of ingenuity I brought to bear, first, even if it amounted to nothing in the material world, and even if attribution records never revealed my name.

The tool doesn’t read the source code, only the docs, so how could this possibly work? Is there an assumption that it will actually read the source code, even though they claim it doesn’t?

Without porting the source code faithfully, the result will not be compatible. Writing two implementations from anything less than a formal, rigorous specification that comes with a very thorough test suite will not result in compatibility.

And then there is the issue of testing. Why would businesses trust this to produce working code?

The spirit of free open source software is to share our creations, and to build off others to create the best thing for all to use (which is subjective, hence so much similar software).

The spirit of for-profit companies is to “win” the market competition, “win” in the sense of being the only one remaining so they can jack prices up to extortionate levels.

Those cannot coexist: you cannot make maximum profit with FOSS, and you cannot “win”, in the perverted for-profit sense, if your competitors have access to your improvements.

No, it is not in the spirit of FOSS to sacrifice our work to the for-profit evils because they are antagonistic to us.

They are already trusting LLMs to do far more than they should, it doesn’t matter if it’s not compatible, it doesn’t matter if it has CVEs.

What matters is it’s cheaper than paying people to make an alternative, or buying an existing one, so they can increase their short term profits to show off to investors.

It doesn’t matter if it comes crashing down on them as long as they made a buck from it.

If there really are companies using this seriously, it is probably more like: they give the generated version to their devs and make them fix it and support it.

Yes, they still save money on the bulk of the work; if they didn’t, they would drop it quickly.

Yes. It’s such a crappy thing overall that it doesn’t worry me.

If you don’t want your code vacuumed up, you shouldn’t release it. Period. The reality is that even proprietary code being hosted is being mined to train AI systems. It shouldn’t be that way, but it is. If you don’t want your code mined, it needs to be private and self-controlled. At some point, there will be a reckoning about licenses and copyright, but it won’t happen anytime soon. Until then, AI companies can and will vacuum up with impunity anything they can get their hands on.

It’s not “Open Source KoolAid™” to believe that the megacorps are breaking a social contribution contract by vacuuming up stuff that people put work into and then making lots of money from it and not giving anything back. This was the whole impetus behind the AGPL license trying to close the “hosting loophole”. AI is simply that loophole but supercharged.

Consequently, your choices are only “Don’t release code” vs “Released code licenses will be completely ignored and the code used for whatever, whenever, however and you have no recourse.” Yes, this sucks. But, for now, that is reality.

If you can’t accept that, then your only option is to not release the code.

I might be wrong, but the spirit of clean-rooming code should certainly allow one bot to inspect the code, every line and letter of it, in order to produce a “spec”. Then another bot simply needs to implement that spec (details left to the bot, as long as it implements the spec). And voila. You’re within the law because two separate entities were involved with an allegedly clean break between them, and so only the “ideas” (which can’t be copyrighted) were … um, borrowed. (Which might be fine with the original author, with proper attribution, but that’s another conversation.)

It sounds like there’s a bit of a loop - “implementer” tries to implement, but stuff doesn’t quite work, so, “spec-writer” is asked to whittle a little more, and improve the “spec”, so that “implementer” can accomplish the task acceptably well. Dylan and Mike even referred to “a little bit of blurring of the lines about what could and couldn’t be shared” between the two as they went back and forth between IBM’s BIOS and what would become Phoenix’s.

I think one of their examples was the Apple II. I think it took them “a whole hour” to clean-room the Apple II (as opposed to 5 minutes for most lib/app projects)… and some such “back and forth” was needed. I’d be curious about an analysis comparing such a bot-clean-roomed product against the original code with respect to code legibility, performance, corner-case bug-proneness, etc.

Yeah, that’s my understanding also. What I said earlier is based on this from their home page: “Our proprietary AI systems have never seen the original source code.” I understand what you’re saying about the two bots, sounds like they’re fudging it.

Ah! I missed that. Yeah. Fishy.

There is a technique for reproducing an executable function by function: take the release binary, pick a function, and write high-level code until its compiled output matches the binary’s version. Rinse and repeat. So the input is a legally obtained binary executable, the output is in C or whatever, and the AI system never saw the source code.

So, pardon my fixation, but, to take a step further, certainly some FOSS developers (of much more import than me) are going to begin thinking carefully about github and codeberg being scraped, when it comes to “sharing” all their best work in the future. Perhaps they’ll hope that bot clean-rooming will become illegal (that’ll be fun to enforce and litigate!), or perhaps they’ll just hope for a miracle… but perhaps they won’t share. I can imagine it. Then what? Ugh, I sound so apocalyptic.

I’m imagining something ridiculous, like an AI-excluded OS, net, and ecosystem. A forum might not be too hard to “close”, allowing newcomers in only by screening. A VCS would have a harder time with a “human” effort like that - it would probably never be economically attainable. Zig is a perfect language, since AI can’t get its mind wrapped around zig in its flux. Anyway, sounds like the stuff of a movie. I wonder if some new trend, called something like “AI firewalling”, might be a possible “thing”…? But it’s senseless… how could such a thing meaningfully co-exist in our world? It wouldn’t co-exist; it would really have to exist separately altogether, if that was even possible.

Well, I’m guessing I’ll wake up in a better mood in the morning.

Remember that this works both ways. Corpos can slopfork GPL code, but nothing stops one clanker from decompiling, reverse engineering, and writing a spec and tests from closed source binaries. This isn’t the death of FOSS, it’s the death of software as a product, something most companies were already moving away from.

Corporations have always profited from the unpaid labor of open source and rarely given anything back. And how many of them are in violation of the GPL without us knowing? Corporations being shitty is nothing new. You won’t be able to change the system, but you can build one parallel to it. If anything, nothing has changed.

I don’t have a direct answer to OP’s question, but my recommendation would be to focus on making software you can love. Make software that you think maximally expresses your ability to create delightful experiences for your users, and leave all the AI stuff to edgy twitter users.

It’s surely reasonable to want to reason about AI both when it comes to trying to understand what is happening / will happen to our job market, and also more in general to keep up with the news / scientific progress. But when it comes to passion projects, where you’re not trying to build a business, but just create software you can love, then all this stuff doesn’t matter much anymore.

AI or no AI, you can’t go wrong when striving to create software you can love is your north star.

Long question it is. And the only answer should be: “I don’t worry about anything”. Like in Groundhog Day.
Philosophically speaking: as humanity, we create problems and then try to solve them.
Why make things complicated?

Yes, but slopforking and reverse engineering both felt, to me, like “dark, shady” operations. I think the slap in the face of Malus is the broad daylight nature of it. Indeed, for me, I have nothing to lose, financially, by posting some hobby code. It is remotely possible that somebody decides it’s worth tons of money, but I’m honestly not very troubled about that, either… so I have nothing “real” to complain about! It’s the principle of broad-daylight theft, with no shame, and plenty of justification-story plus “joking (not-really)” that is jaw-dropping.

This really is the antidote. Love the anticipation, love the process, love the product - at least as much for how it shines as for how it vrooms. Indeed, this is one motivation for using zig - plenty of times, the same “salable product” could be made in many languages, but there’s something more attractive about the zig version: anticipated, crafted, and completed. So, to hell with the daylight-theft world, I know what fun I had doing it, and nobody can stop me from using it and enjoying it myself! (And there will always be people who recognize originality, or even look for it, if attribution or reputation don’t make original creators/contributors obvious.)

(I realize that neglects those for whom financial gain is a necessary ingredient; everybody has to support their families. But in your case, I guess, there’s always the challenge of competition, legit and nefarious, and you always have the challenge of protecting your secrets and crafting your business as much as your software; you’re not in my camp of deciding whether to push to codeberg, and you’re perfectly within your right to keep private. You can still take Loris’ advice and make the software you love and love the software you make.)

Malus is just satire though, I think. Did you read the website? Specifically the customer reviews or the blog? To me, it reads like the person behind that website wants to point out how Open Source devs are being exploited and the licenses are mostly being viewed as obstacles to for-profit companies.

I can’t disagree… I didn’t analyze with my tongue-in-cheek-meter carefully enough. Their experience of bot-clean-rooming many things themselves (in their talk) sounded grounded. Indeed, if I’ve raised my eyebrows a little too high, then… I might just be a few months ahead of myself. :slight_smile: As others have noted, in other ways, I’m many months behind… the transition seems to be from dark-market to daylight-market for practice of this sort. (I hope I don’t regret the temptation to note: it coincides well with recent remarks like those both Altman and Zuckerberg have made, akin to, “thanks for all your hard trailblazing work, humans; it’s nice that we can just auto-theft it now, and take it from here, so we don’t need to rely on beating hearts any more.”)

This particular site is satire, but I suspect the problem it describes will only get more real over time.

(Doh, meant to add more but hit the wrong key and accidentally posted before I was finished.)

Anyway, yes, I also suspect that even after the bubble bursts one of the real leftover effects we’ll see is that disassemblers gain a level. “Create a program that generates this machine code” feels like the sort of problem it would be able to iterate on. I don’t even know that it would legally violate a license!

Kinda. The site definitely has satirical qualities, but they actually have the product to do it and charge actual money for their service. Stops feeling like it’s a joke when you are charging people for a service and starts feeling more like a duck blind.