Fractionally Yours v6: Bitcoin's All Time High, and the Legal Lowdown of Using UGC to Train LLMs
In this edition of "Fractionally Yours" I'm looking at the current state of blockchain regulation in the US and the legality of using User Generated Content to train AI LLMs.
Between AI and the corporate governance challenges that pop up in our current era of ego-driven tech (accidentally) both eating itself and the world, there's been no shortage of fascinating legal developments to write about over the past few weeks. And now, seemingly out of the blue, an old friend is back: Crypto!
In this edition, I’m going to review the current US regulatory landscape for blockchain, especially around securities compliance. Because the AI world is moving fast, and basically making law in real time, I’ll cover that too. Read on!
How Not to Train Your AI on Other People’s Copyrighted Content
It's slowly dawning on the world that Artificial Intelligence is only as good as the dataset it's trained on. Most datasets are messy, and the makers of AI models are working hard to figure out how to most effectively get those models to understand the data and spit it back in a way that makes sense. That produces weird results. For example, since humans are biased, training data is biased, and therefore the AI being trained on it is biased. Understanding this, the folks building AI models overcompensate for that bias, and we get Google’s AI model providing us with non-white 1940s German soldiers. Oops! I don’t think it's such a big deal that we’re getting politically correct, if not historically correct, images, and over time, we will figure this out.
As I've pointed out in the past, the most interesting issue from a legal perspective is that the tech platform is the publisher here and can’t escape that, since those images of gender-and-ethnically-diverse Nazi soldiers are created by Google. No Communications Decency Act Section 230 protection for you! That is why tech companies are trying hard to make sure their publicly available AI products are as benign (boring) as possible.
And that leads us to a super interesting legal question: Because AI companies are very likely “publishers,” are their LLMs allowed to be trained on, and then spit back, content that enjoys copyright protection? That’s essentially the issue in the New York Times v. OpenAI case and its imitators being filed at a pretty decent clip by everyone who's invested time and money creating content. That content is now being used to train the LLM, which, they contend, is essentially violating their copyrights. From a legal standpoint, I think that the New York Times and the content creators have the better argument. From a public policy standpoint, it's sort of a toss-up: making sure that professional newsgathering is a viable profession going forward vs. corporate control of creative output. Argue amongst yourselves.
This raises another intriguing legal issue: what about the huge amount of data on the internet where no one is actually concerned about copyright because it's user-generated content? It is a super interesting issue because the big social media platforms, and the smaller question and answer sites where you might find a great recipe or how to fix your faucet, actually go to great lengths NOT to own the content. For example, Reddit’s terms of service very clearly state “You retain any ownership rights you have in Your Content.” The TOS for the legal question and answer site Justia, which allows lawyers to submit answers to legal questions, states “Justia does not claim ownership of Content you submit or make available for inclusion on the Service.” Almost every platform of that type is going to disclaim ownership of the content that populates the site. I’m not a technical person, but my understanding is that large social networks have sophisticated anti-scraping protections. However, not everyone does, and they are certainly not foolproof. You can be sure that someone can get around them (as the New York Times claims that OpenAI was able to do, by the way).
So, does a platform have a claim under federal law if someone scrapes its user-generated content? Under current law, no. The Computer Fraud and Abuse Act (CFAA), a 1986 law predating the actual consumer internet that, among other things, provides civil remedies (damages) for hacking into a computer network, has been interpreted to prohibit only unauthorized access. So, just taking publicly available information that is not behind a paywall – the goal of scraping – is not covered. That's a reasonable interpretation, I think.
That means there is no federal law prohibiting scraping, and when a website is scraped, the owners of the site are left with common law breach of contract claims because every website’s terms of service has some type of prohibition on scraping. But that common law remedy is limited because the damage to the website is hard to quantify. Really, what are your damages if someone amplifies what you have already made publicly available? Copyright law could protect the website owner but (1) there is pretty good law that the anti-scraping provisions of a TOS are actually preempted by copyright law and therefore invalid, and (2) the website does not actually have the copyright on the user-generated content.
That's not to say that the actual owners of the user-generated content don’t have a copyright claim, assuming that the New York Times (and other content creators) win their AI lawsuits and are able to assert their copyright claims. If so, the creator of the content (but not the website) can send a “DMCA Takedown” notice to the LLM makers, and their content won’t be used to train the LLM. But that's just a drop in the bucket and really won’t matter much.
So yes, go ahead and train your LLM on user-generated content, including this newsletter! Substack won’t have a claim. I might. Fun times.
Crypto is Back, Baby! (Sort of)
If you’ve been on social media, or just generally talked to any of your friends who were super into crypto and had been quiet about their passion for it for the past 18 months (“Crypto Winter”), you will know that Bitcoin recently hit an all-time high of almost $70,000 per coin. And then dropped again – it's a highly volatile asset, after all.
I’m not invested in Bitcoin in any serious way, having bought and sold way before the current highs (never take my investing advice, BTW). But with the “halving” happening in April and the Web3 industry seemingly having cleared out some of its worst players (so long SBF and CZ!), I think Bitcoin is going to stay at elevated levels for a while. That, in turn, is going to bring renewed interest in the underlying blockchain technology that powers Bitcoin and tons of other tokens, Dapps, chains, NFTs, and all the ancillary tech around blockchains and Web3. I like to think that a rising Bitcoin lifts all blockchains.
I spent about 18 months deep in the blockchain rabbit hole, first as just someone fascinated with the technology and the governance of the so-called “distributed ledger” and then as a full-blown in-house lawyer at a crypto project. I saw the boom and then the bust firsthand. Now I have a bunch of clients doing awesome stuff in the Web3 space, and it's nice to see that the world seems to be paying attention again.
My lived experience in blockchain has taught me that the tech is powerful, but it was hyped beyond what it could actually do, which attracted a bunch of scammers who never really appreciated the open-source decentralized ethos of blockchain and used those same principles to fight regulation for as long as possible to make gobs of money, mostly on the back of unsophisticated investors. I’m actually happy regulators in the US are cleaning that up because it's not creating any lasting value, just a bunch of get-rich-quick schemes. Luckily, that era has passed, and the world seems to understand the power of the blockchain, and its limitations. In our new Crypto spring, it's a good time to think about where things are going.
To review where we left things the last time the world was paying attention to crypto (say late 2022): Bitcoin is fine. It's actually “sufficiently decentralized” that US regulators seem to think that it's a commodity and not a security and even, reluctantly, approved Exchange Traded Funds tied to Bitcoin: first funds holding Bitcoin futures contracts and, more recently, funds holding Bitcoin itself. I wrote about the Bitcoin ETF in the third edition of this newsletter. So Bitcoin is here to stay.
The status of the thousands of other tokens out there is not as clear, except to say that under US law, you can’t simply issue a token as a means of raising money for your crypto project, because that would be a security. You would have to either register it as such (basically going through the whole initial public offering process) or get it exempted from registration. If you’re just raising money from accredited investors to fund your project (a so-called private placement to accredited investors under Regulation D for US investors, or Regulation S if it's all foreign investors) and don’t envision any secondary sales, then seeking an exemption is a viable option.
Unfortunately, most investors are going to see value in the resale in the secondary decentralized finance (“DeFi”) market. If there are resales, the issuer is going to have liability for that, and there are tons of ongoing compliance issues to address in those secondary sales, so it's really not a great option. If you’re thinking of creating a token that trades, your token needs to stay completely offshore. This “offshore issuer/onshore development team” model is how many token-issuing crypto projects are structured and has been a real boon to the luxury resorts and golf courses of the Caymans and British Virgin Islands!
Another avenue is the US crowdfunding option, where you're saying you have a security but it's exempt from most SEC registration requirements, a so-called “Regulation A” exemption. Seeking a Regulation A exemption is intriguing because it allows for resale (provided certain disclosure requirements are met). In reality, despite the millions of dollars that have been spent, the SEC is not “qualifying” any Regulation A offerings for crypto projects. Also, assuming that you can get through the Regulation A exemption process, you’ve now admitted that you’re issuing a security, so your token needs to trade on a registered securities exchange. The options for that are limited: most crypto exchanges operate on the principle that they are NOT securities exchanges, so they won’t be excited to list a token that you just admitted is a security, albeit one exempt from registration requirements. A tangled Web3.
That leaves tokens that are not really related to a company and are not raising money for anything, the so-called “meme coins” (the most famous of which is Dogecoin). The jury is still out on those. Certainly, true meme coins that are pure gags that no one would mistake for an investment contract are probably fine. Even a meme coin with some added utility via a (real) Decentralized Autonomous Organization (“DAO”) could be completely exempt, provided that the US-based development team is not pumping it. But obviously, the risk of enforcement exists for the founding team, not to mention the risks to retail investors who are investing in it because, well, it does not have any actual value.
Where do we go from here? One thing I think is certain is that crypto is never going to be generally accepted by US regulators. That is because of the unique place of the US dollar as the world’s reserve currency. My logic goes like this: The people who built the original blockchains after the 2008 financial crisis had good faith skepticism toward centralization and regulation and, of course, the avowed belief that the world needed a different reserve currency, one that was not subject to being debased by foolish government actions. Naturally, the folks who run that reserve currency, and get the vast majority of their power and influence from it, namely the United States government, were not so keen on that. So we’re never seeing the US government really accepting crypto and accelerating its development, as we see with space, chip making, and any number of other industries (although oddly not housing, which should be the subject of a separate post). The best we can hope for in crypto is regulatory clarity and I think we’re going to get that, slowly.
For now, stay tuned and let me know if you’re interested in learning more. I’ll be sure to accommodate by writing more on this fascinating subject!
Keep building, keep thinking,
Jesse
I'm Jesse Strauss, Your Fractional General Counsel. I'm a lawyer with a private practice based in New York City, assisting clients both in the United States and globally with their U.S. legal needs. My expertise covers various areas, including raising funding rounds, addressing employment issues, negotiating master service agreements, managing intellectual property, ensuring compliance, overseeing legal process management, and facilitating dispute resolution. My focus is on founding and nurturing great companies from seed to exit. Discover more at www.yourfractionalgc.com and book a complimentary 30-minute consultation at https://www.yourfractionalgc.com/contact-yourfractionalgc.