Generative and Agentic AI

The technology, the applications, and the potential

Dec 23, 2022

Interest in generative AI models (GPT, DALL-E, Stable Diffusion) is sweeping the Bay Area's startup scene like a tidal wave. Recent advances have spurred a surge of interest from investors and builders. If you haven't played with ChatGPT yet, stop reading right now, click this link, and try asking it the most difficult question you can think of. (Ask it to compose a really thoughtful comment on this blog post.)

Right now, the industry is segmented into a few layers. The base layer is made up of companies like OpenAI (the largest player), Midjourney, and Stability AI, which have built the best models and provide access to them through hosted APIs, essentially offering advanced machine learning as a service. These companies are largely made up of teams of machine learning specialists who are deeply knowledgeable about model architecture, passionate about the potential to build a post-scarcity world, and strongly opinionated about AI risk.

The next layer is made up of builders and innovators exploring how these technologies can be used for content generation. Companies like Jasper and CopyAI have created great writing products out of these models, which can produce first drafts of emails, essays, blog posts, and social media posts. Products like Lensa AI and tools built on DreamBooth have been doing the same for images, giving people the ability to generate amazing portraits, art, and image content with a simple prompt. These products have all been built on the first layer's models, leading some to question whether they'll be vulnerable to being picked off as the API players dig into vertical-specific applications. ChatGPT is an example of this, potentially throwing a wrench in the gears of companies that were hoping to build chat as a core product.

The next layer is the one I'm most excited about, and it's about creating "agentic AI", i.e. giving AI-driven systems the capacity to act in the world. To make this more concrete, take a look at the exchange below, in which I ask ChatGPT to write me a scraper for a basic webpage. Although this is a toy example, this is real code, and if you're able to ground ChatGPT by showing it the actual website you want to scrape, it will adapt its code generation to match. Once this is done, all that's left is to give it the means to execute that code.
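
For a concrete picture of what a scraper like this involves, here is a minimal sketch in Python. It is an illustration only, not the code ChatGPT produced in the exchange: the class name `LinkAndTitleParser`, the helpers `scrape` and `scrape_url`, and the `SAMPLE_PAGE` markup are all assumptions made up for this example, and it sticks to the standard library so it runs without installing anything.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every hyperlink target (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside <title>...</title>.
        if self._in_title:
            self.title += data


def scrape(html):
    """Parse an HTML string and return its title and links."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return {"title": parser.title.strip(), "links": parser.links}


def scrape_url(url):
    """Fetch a page over HTTP and scrape it (requires network access)."""
    with urlopen(url, timeout=10) as response:
        return scrape(response.read().decode("utf-8", errors="replace"))


# Made-up markup standing in for "a basic webpage".
SAMPLE_PAGE = """
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="/about">About</a>
    <a href="https://example.com/posts">Posts</a>
  </body>
</html>
"""

if __name__ == "__main__":
    print(scrape(SAMPLE_PAGE))
```

Running the file prints the title and links found in the sample page. Pointing `scrape_url` at a real site corresponds to the grounding step described above; having a system run that code on its own, without a person copying and pasting it, is the part that makes it agentic.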

Let's look at another example, which embodies the use case that at least two dozen teams are working on right now. I'd really like to be better about reaching out to people whose work I enjoy, especially to express appreciation for really thoughtful essays. This kind of outreach seems like a generally nice thing to do, and it might also lead to interesting outcomes such as conversations and even collaboration with people I admire.

I’ve recently worked with John on a few projects, and he wrote a great post on action-driven AI, which was very much an inspiration for this one. Can ChatGPT help me?

What happens when this agent is sitting in my browser and has access to my email client?

Ends and Means
