Who is Suchir Balaji, ex-OpenAI researcher who worked on ChatGPT and exposed AI’s ‘dark side’?

Suchir Balaji, 26, was found dead in his San Francisco apartment on November 26. He was a researcher at OpenAI for four years before quitting the Microsoft-backed startup over differences of opinion on fair use and copyright policy.

His sudden death has shocked the industry. The news comes three months after Balaji, in a lengthy interview with The New York Times, spoke out against OpenAI’s unabashed use of copyrighted data, arguing that it was illegal and could ultimately harm the Internet.

Balaji revealed that he had quit in August because he couldn’t stand behind OpenAI’s actions. He shared that he intended to pursue his own ambitions in the AI sector. 

During his time at OpenAI, Balaji had also worked on the company’s industry-defining product, ChatGPT, for 1.5 years. 

After graduating from Berkeley, Balaji interned at OpenAI and data labelling startup Scale AI. In 2020, he joined OpenAI full-time. A couple of years later, he started collecting data to pre-train new flagship AI models at OpenAI, including GPT-4. As long as his research was confined to internal use, Balaji was fine with the use of digital data to train AI models.

But his view changed after OpenAI released ChatGPT to the public in November 2022. The AI chatbot struck a chord with users and became a smash hit, pushing OpenAI to pursue a profit-making route.

Finally, after it surfaced that OpenAI had scraped data indiscriminately, prominent news publishers, including The New York Times itself, filed copyright lawsuits.

Then, in October, Balaji reached out to the outlet to talk about his findings. 

In a tweet, Balaji said he initially “didn’t know much about copyright, fair use, etc. but became curious after seeing all the lawsuits filed against GenAI companies. When I tried to understand the issue better, I eventually came to the conclusion that fair use seems like a pretty implausible defense for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they’re trained on.”

Separately, he published a blog post saying companies like OpenAI and Microsoft argue that they can train their AI models using freely available data from the Internet because doing so meets the “fair use” requirement in the U.S. Copyright Act.

According to his blog, even though these AI models did not directly copy the training data, their output would most likely include regurgitated information mixed throughout. Large language models are trained on each data point multiple times, and by the end they tend to memorise some of the text verbatim.

He also explained that in this manner the AI chatbot was simply pulling traffic away from other communities, like Stack Overflow, a public Q&A website for computer programmers, and towards itself. Essentially, this would lead to duplication of data online.

He added that this wasn’t a problem exclusive to ChatGPT or OpenAI, but an issue facing any generative AI product.

The day before San Francisco police found Balaji’s body in his apartment, a court filing had named the researcher in a copyright lawsuit against OpenAI. As a good-faith compromise, OpenAI agreed to search Balaji’s custodial file for documents related to the concerns he had raised.

When Balaji’s interview came out, OpenAI was going through another phase of turmoil. Investors had just poured in $6.6 billion worth of funds and the company was facing an exodus of talent. High-level executives, including chief technology officer Mira Murati, chief research officer Bob McGrew and vice president of research Barret Zoph, had all quit.

A report by Wired said that the departures were due to the company’s change in direction from research to product, a contentious shift in mindset for many employees.

OpenAI has steadily been moving away from its origins as a non-profit. In 2019, it was converted into a “capped profit” structure in which the nonprofit was made the governing entity for a for-profit subsidiary. Then, on September 26, it was reported that OpenAI was in the process of converting into a fully for-profit corporation.

Published - December 14, 2024 03:29 pm IST