Last Month in AI: LLM Privacy, Piracy, Personalization & Open Source

As the sci-fi writer Isaac Asimov once said, "today's science fiction is tomorrow's science fact," and with the ongoing AI arms race, this quote rings truer than ever. Over the last few weeks, we've seen pirated and leaked large language models being tweaked to create personal AIs, Boston Dynamics robodogs being hooked up with ChatGPT, and groundbreaking developments in generative AI amidst a backdrop of growing concerns and regulatory efforts. Let's take a closer look at these developments and their implications for the future of AI.

Software Engineering

As expected, we're seeing continuous improvements within the Enterprise AI software engineering landscape.

Amazon announced new tools for building generative AI applications with the launch of Amazon Bedrock and Amazon Titan models, making it easier than ever for developers to access and integrate powerful Foundation Models from AI21 Labs, Anthropic, Stability AI, and Amazon.

Google also claims that its AI supercomputer is faster and more environmentally friendly than comparable systems built on Nvidia's A100 chip, highlighting the importance of addressing the environmental impact of running these massive models.

Enterprise AI

Google's Bard AI faced criticism for underperforming compared to GPT. The company's CEO acknowledged the shortcomings, comparing Bard to a "souped-up Civic … in a race with more powerful cars" and committed to upgrading it with more capable PaLM models.

Elon Musk reportedly bought thousands of GPUs for a Twitter AI project and announced plans to launch a ChatGPT competitor.

Research startup Anthropic aims to raise as much as $5 billion over the next two years to compete with OpenAI and enter over a dozen major industries, signaling an ambitious plan to establish a strong market presence.

Bloomberg introduced BloombergGPT, a large-scale generative AI model specifically trained on financial data, which outperforms similarly-sized open models on financial NLP tasks without sacrificing performance on general LLM benchmarks.

Meta launched the Segment Anything project to democratize image segmentation, releasing a new model and the largest-ever segmentation dataset, SA-1B, fostering research and broad applications for foundation models in computer vision. Meta also plans to use generative AI for ad creation, with its CTO, Andrew Bosworth, expecting commercialization of the tech this year.

Alibaba and Huawei are set to debut generative AI chatbots to cater to local demand in China following the release of OpenAI's ChatGPT.

These developments highlight the global enthusiasm for AI advancements and the need for solutions tailored to specific regions and industries.

LLMs & Privacy

In light of recent privacy concerns and ethical dilemmas, some of which were highlighted in our March roundup, companies and governments are becoming increasingly vigilant about AI's implications for privacy, especially following Italy's ban on ChatGPT.

At the end of March, Italy's data-protection regulator banned ChatGPT in the country, citing privacy concerns and a lack of age verification, and gave OpenAI 20 days to respond. Meanwhile, other countries and regulatory bodies were considering doing the same. On the back foot, OpenAI swiftly implemented data controls that let users opt out of having their chats used to train models and stop conversations from being saved. It also added functionality to export chats, a pop-up to satisfy the regulator's age-verification requirement where users can click a button stating "I meet OpenAI's age requirements," and clearer links to its privacy policy when signing up to the platform. These changes were enough to get ChatGPT back up and running in Italy.

OpenAI also introduced a Bug Bounty Program to address potential vulnerabilities and improve the safety of its technology, no doubt a response to the March data breach, which allowed some users to see others' chat history and payment details during a nine-hour window. In addition to all of this, the company posted a blog entitled "Our Approach to AI Safety."

For now, OpenAI seems to be out of hot water, but there will likely be more regulatory challenges ahead that will affect OpenAI and other AI companies.

AI Regulation

The United Kingdom, European Union, United States, and China have made strides in AI regulation, with each region taking a distinct approach. The UK government released a white paper outlining a principles-based, adaptive approach to AI regulation. This contrasts with the more prescriptive EU AI Act and aims to ensure the UK remains an innovation-friendly jurisdiction for AI developers. However, the UK's approach may leave gaps in regulation, unlike the more holistic approach of the EU. Meanwhile, the EU has been urged to expand its AI Act to cover general-purpose AI while the legislation makes its way through the European Parliament.

China has announced measures to monitor generative AI products closely, requiring companies planning to launch such products to undergo security assessments by the Cyberspace Administration of China (CAC).

In the United States, Senate Majority Leader Chuck Schumer proposed a framework for AI regulation, which has not yet been drafted into legislation. His proposal would require companies to allow independent experts to review and test AI technologies before public release or updates and grant users access to findings. Schumer cited China's recent release of its own AI regulations as a wake-up call, emphasizing the need for the US to lead in shaping the rules governing AI.

Canada is considering the Artificial Intelligence and Data Act (AIDA) as part of Bill C-27. Following the release of a companion policy to AIDA that sets a two-year strategy for regulation development, critics argue that AIDA gives too much power to the government, including hefty fines of up to $25m or 5% of gross global revenue, especially with the specifics being left to unpublished regulations. Despite this, the bill is closer to becoming law as legislators voted to send it to the Standing Committee on April 24th.

Overall, AI regulation is becoming a global focus, with different countries taking varied approaches to balance innovation, safety, and ethical concerns. This is great for consumers of Enterprise AI, but many of the regulations above don't consider individuals. With AI becoming more democratized by the day, with leaked and open source AI available to anyone, there's another world of complexity to consider.

Piracy, Personalization & Open Source

Remember when 'piracy' meant Limewire, a slow dial-up connection, and the delicate art of burning DVDs? Today, we've cast off into uncharted territories, with AI systems as the new bounty.

On February 24th, Meta launched LLaMA, an open source LLM with no instruction or conversation tuning. The model was intended to be available only to researchers. It was leaked within a week and is now downloadable via torrents, despite Meta's attempts to quell distribution with DMCA requests. Since then, we've seen waves of innovation and experimentation from individuals and research institutions alike.

On March 13th, Stanford released Alpaca, which added instruction tuning to LLaMA. This allowed anyone to fine-tune the model on a beefy laptop, effectively kicking off a race to develop low-budget fine-tuning projects. By March 18th, LLaMA could be run effectively on a MacBook CPU. Ten days later, on the 28th, LLaMA-Adapter was developed with a technique called Parameter-Efficient Fine-Tuning (PEFT), which supercharged personalization and tuning even further. LLaMA-Adapter introduced a mere 1.2M learnable parameters while leaving the rest of the model untouched, turning it into a quick learner that could adapt in less than an hour. This breakthrough not only expedited the fine-tuning process but also democratized it, making high-level AI technology accessible, efficient, and user-friendly even on everyday devices. Also, by this point, open source models built on the GPT-3 architecture were already outperforming existing GPT-3 clones, proving that the community was no longer dependent on LLaMA.
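
To get a feel for how small 1.2M learnable parameters is, here is a rough back-of-the-envelope sketch in Python. It assumes the adapter works by attaching a short learnable prompt to a number of transformer layers while every original weight stays frozen. The specific settings (10 prompt tokens, 30 adapted layers, a hidden size of 4096, and a base model of roughly 7 billion parameters) are our illustrative assumptions rather than figures taken from this roundup, and the sketch ignores any small extras such as gating values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterConfig:
    """Hypothetical adapter settings, chosen for illustration only."""
    prompt_length: int   # learnable prompt tokens added per adapted layer (assumed)
    adapted_layers: int  # number of transformer layers that receive a prompt (assumed)
    hidden_size: int     # width of each token vector in the base model (assumed)


def adapter_parameters(cfg: AdapterConfig) -> int:
    """Count learnable values: one vector of hidden_size per prompt token per layer."""
    return cfg.prompt_length * cfg.adapted_layers * cfg.hidden_size


def trainable_fraction(adapter_params: int, frozen_params: int) -> float:
    """Share of all parameters that are actually updated during fine-tuning."""
    return adapter_params / (adapter_params + frozen_params)


# Assumed configuration: 10 prompt tokens on each of 30 layers, 4096-wide vectors.
cfg = AdapterConfig(prompt_length=10, adapted_layers=30, hidden_size=4096)
learnable = adapter_parameters(cfg)

# Assumed size of the frozen base model (roughly 7 billion parameters).
frozen = 7_000_000_000

print(f"Learnable adapter parameters: {learnable:,}")
print(f"Share of the model being trained: {trainable_fraction(learnable, frozen):.4%}")
```

Under these assumptions, the adapter comes to 1,228,800 learnable values, in line with the 1.2M figure, and well under a tenth of a percent of the whole model is being trained. That tiny footprint is the core idea behind parameter-efficient fine-tuning and goes some way to explaining why adaptation can finish in under an hour.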

This rapid growth in open source and individual LLMs has pressured both Google and OpenAI to reconsider their stance on model control. The leaked internal Google document published by SemiAnalysis argues that Google should establish itself as a leader in the open source community by cooperating with, rather than ignoring, the broader conversation. By embracing open source, LLM companies can continue to drive innovation while adapting to the changing landscape of AI development. If they don't, it seems likely they will be left behind.

In Conclusion

With the lines between science fiction and fact continually blurring, we're seeing the open source community drive innovation at an unprecedented pace, pushing established companies to rethink their strategies and embrace a more collaborative approach. At the same time, Enterprise AI products are still being released in droves while governments across the globe continue to grapple with regulation. That regulatory challenge will only heighten as open source and pirated AI continue democratizing the technology.

As open source veterans who have worked on groundbreaking projects (such as PyTorch, Microsoft PowerToys, and React Native), we're incredibly excited to see and contribute to the growth of open source AI. We're also cognizant of the need to remain cautious and ensure that AI is used safely and responsibly (which is why we're proud to be incubating a first-of-its-kind AI safety start-up, led by our very own open source tech guru Zak Greant, but more on that later). The path ahead may be uncertain, but one thing is clear: the realm of AI is full of potential, and the coming months and years promise to be just as eventful as the last.