This week might go down in history, or at the very least has broken some kind of record, because Tuesday really was something else. For whatever reason, this Tuesday every AI company decided to announce something, and big things at that, making it feel more like a month or a year of progress all culminating in a single day.
In case you missed it, here's what happened, not this week, not this month, but on Tuesday of this week.
✨ Anthropic released a "Computer Use" API - a foundational breakthrough that allows Claude to operate a computer just like a human.

We've built an API that allows Claude to perceive and interact with computer interfaces.
— Anthropic (@AnthropicAI) October 22, 2024
This API enables Claude to translate prompts into computer commands. Developers can use it to automate repetitive tasks, conduct testing and QA, and perform open-ended research.
Introducing Mochi 1 preview. A new SOTA in open-source video generation. Apache 2.0.
— Genmo (@genmoai) October 22, 2024
magnet:?xt=urn:btih:441da1af7a16bcaa4f556964f8028d7113d21cbb&dn=weights&tr=udp://tracker.opentrackr.org:1337/announce
Pro Search is now more powerful. Introducing Reasoning Mode!
— Perplexity (@perplexity_ai) October 22, 2024
Challenge your own curiosity. Ask multi-layered questions. Perplexity will adapt.
Try it yourself (sample queries in thread)👇
Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use.
— Anthropic (@AnthropicAI) October 22, 2024
Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.
Introducing, Act-One. A new way to generate expressive character performances inside Gen-3 Alpha using a single driving video and character image. No motion capture or rigging required.
— Runway (@runwayml) October 22, 2024
Learn more about Act-One below.
Today, we’re introducing Ideogram Canvas, an infinite creative board for organizing, generating, editing, and combining images.
— Ideogram (@ideogram_ai) October 22, 2024
Bring your face or brand visuals to Ideogram Canvas and use industry-leading Magic Fill and Extend to blend them with creative, AI-generated content.
---

I think this week will likely go down in history as a tipping point in the AI world. Yes, I know it sounds a bit dramatic, but honestly, some pretty foundational things happened this week that we'll look back at ten years from now and say, "that was one heck of a week!"
So let's just jam through it all, and as I try to do on here, quickly, so you can spend 3-5 minutes getting up to date and then get back to your day. First, yes, OpenAI made history this week with the biggest fundraising round ever: a $6.6B raise at a $157B valuation.
For those not keeping score, OpenAI is losing about $5B a year, so while this funding round is big, and record-breaking, they need it to keep doing what they're doing. One interesting note on this round: Apple, which was previously expected to participate, dropped out, and OpenAI also told its investors not to back Anthropic, Perplexity, xAI, and a few other AI companies it's going head-to-head with.
And while we're on the topic of OpenAI, they also announced Canvas, a new interface for writing and coding with ChatGPT. This announcement probably wasn't received super well by the team over at Cursor, who have been the front-runners in the AI coding space and now have a new competitor that just raised $6.6B...but, so it goes.
So that's already a lot for one week, but like I said above, I think this week will go down in history as a tipping point, and Meta was able to jump on the train with their announcement of Movie Gen, their solution for creating and editing video with AI.
For anyone who knows me, you know I take wayyyy too many videos, and I never have time to edit them. I was just talking with a friend on a backpacking trip this weekend and he was asking me when I was ever going to have time to edit my videos. I told him, "I won't need time, AI will solve it for me sometime very soon."
Of course, I didn't know it would be this soon, but it's happened, and with Meta making a move like this, my guess is there's going to be a lot more to follow.
So yeah, that was this week in AI. I'll be in SF for tech week next week drinking from the AI firehose, can't wait to soak it all in. For now, thanks for reading and have a great weekend!
---

This week OpenAI released AVM, Advanced Voice Mode, but you already know that. It was the top story about OpenAI until most of the leadership team quit the next day, and then that became the focus of the news cycle. But I'm not here to comment on that, because I don't think anyone who isn't at OpenAI and doesn't know Sam and the team can realistically comment. What the heck do we know?
So let's talk about something I do know about, because I've been playing with it every single day since it came out: AVM. OpenAI's advanced voice mode is as good as they said it was going to be, maybe better - in short, it's insanely useful. Since it came out, people have been sharing all the different ways they've been using it, from tuning a guitar to treating it as a personal assistant.
After using AVM and then getting in my car and having to use Siri with Apple CarPlay, I realized instantly: Siri doesn't just feel outdated, it feels prehistoric in comparison. In all seriousness, the gap between Siri and OpenAI's newest models is so wide that it makes me honestly wonder whether it's now actually impossible for Apple to catch up.
And here's why I think this is important. Three days ago I wrote about a hardware company that Sam Altman and Jony Ive are starting. If this hardware company ends up competing, in any way, with the iPhone, then Apple could be in trouble in a market that is absolutely critical to the company.
While I'm not saying this is happening, it could happen, and I think voice is a sleeping giant that has been overlooked for far too long. We live in a world today where people use a keyboard as the primary input device for their computers. I firmly believe that this week marks the beginning of the end for the keyboard, a true tipping point for voice taking over as the primary input device for computers. And yes, an iPhone is just a computer in your pocket.
If voice becomes the primary input mechanism for all computing devices - your laptop, tablet, and iPhone - then the company with the best version of that new input device has a massive edge. And with AI, this edge is amplified massively, because the company in the lead sees its lead grow exponentially while the companies playing catch-up fall further behind.
I'd encourage you to try an experiment. Take five simple questions, things you know Siri can answer, and ask them to ChatGPT, then ask them to Siri. Don't try to stump Siri; pick real questions it can answer. Compare the results. This is going to be a big deal. AVM changes a lot, and quickly.
---

Today AI exists in a hardware-agnostic space. You can run ChatGPT or Claude on your phone, on your tablet, on your desktop - dealer's choice. And while companies like Apple have released new hardware like the iPhone 16, which is apparently optimized for AI, this is really more of a marketing gimmick than a reality. Sure, AI applications will run better on the new iPhone 16, but they'll also run on the iPhone 15 and the Samsung Galaxy.
Back in April I read this article in The Information which got me thinking about how the hardware world might change over the next few years, and with this change, open the door for new entrants to the market.
A few days ago, Ive sat down with the New York Times and shared more about what he and Sam are up to, including nuggets like this one that I think shows the scale of what the duo is preparing to do:
In February, Mr. Ive found office space for the company. They spent $60 million on a 32,000-square-foot building called the Little Fox Theater that backs up to the LoveFrom courtyard. He has hired about 10 employees, including Tang Tan, who oversaw iPhone product development, and Evans Hankey, who succeeded Mr. Ive in leading design at Apple.

While the company has ten employees now, they're looking to raise up to $1B, so I think we're likely going to see a massive hiring blitz over the next year once that capital hits the bank account.
---

If you're on Twitter/X, there's a very good chance you know who Greg Isenberg and Riley Brown are. Just in case you don't, here's the quick TLDR on both. Greg is a serial entrepreneur who has started and sold quite a few companies; he now runs Late Checkout, a holding company that builds community-based businesses. I could probably write a whole post about all the cool stuff Greg has done over the last decade-plus, but I'll leave it at that for now.
Riley Brown has quickly become one of the top minds in the AI world, not as a traditional software engineer using AI, but as a "software composer" - a term I'm pretty sure Riley coined himself. Riley is on a mission to become a Senior Software Composer without writing a single line of code, and pretty much every video he makes goes viral.
At the beginning of this month, Greg and Riley sat down for what I honestly think will be one of the most watched videos on the new AI stack at this moment in history. In the video (which is just a little over an hour) they build an app, from start to finish, using V0, Cursor, Claude, and Replit. I'm a big believer that the tools Greg and Riley use in this video represent an entirely new tech stack, modernized for the world of AI coding.
Here's a rundown, from Greg's YouTube account, of what they cover in the video:
1) Riley breaks down how to create a functional app using AI tools like V0, Cursor, and Replit - without writing a single line of code! 🛠️
• Key insight: You can build complex apps in 10-15 hours with practice, even without coding experience.
• Using V0 for front-end design
• Cursor for code generation
• Replit for deployment
Pro tip: Screenshot designs you like and describe them to AI - instant working prototype!
4) Major hurdle: Connecting AI features to the app. It's tough, but persistence pays off!
• Riley's advice: "Once you get the aha moment... you realize you're in charge. You don't need to ask anyone."
1. Visualize the app
2. Describe it to AI
3. Troubleshoot errors
4. Repeat until it works
"You will your way to a working app. It's guaranteed." - Riley
• Extracts ideas from transcripts
• Categorizes as "SIP" or "SPIT"
• Saves ideas to user profiles
7) The power of AI coding tools:
• Recreate complex apps (like Notion) in hours
• Customize existing apps with new features
• Rapid prototyping and iteration
"To me, it's a no-brainer." - Riley on the cost-effectiveness of AI tools vs. hiring designers
8) Riley's closing thoughts: Composing code with AI is about persistence and creativity.
• Riley's hack: If you're not getting errors, ask AI to "add error logs" to your code.
• They help pinpoint issues.
• Makes troubleshooting WAY easier!
• "I love the act of learning things and diving deep into rabbit holes."
---

While we still aren't even a week into OpenAI's release of o1, devs around the world are already pumping out some pretty interesting stuff. As happens every time a new model is released, a lot of it is code you might not actually use but that was built impressively fast, like the Flappy Bird clone built in o1 in less than five minutes that I shared in yesterday's post.
Of course examples like this show how quickly you can code using AI, but they don't necessarily show you how you can use AI to solve real problems that you, as a software engineer, might be trying to solve right now.
So lately I've been on a quest to identify people who are going deeper - building and sharing things that we can all use to do things better, faster, and/or more efficiently. What I saw today on Twitter/X from Eric Ciarla is one of the best examples I've seen yet of something super useful, built in o1, and shared with the world.
Like the title of the post says, Eric built a web crawler in o1. He did this using FireCrawl (firecrawl.dev), a pretty handy tool for turning websites into LLM-ready data. To use Eric's crawler, all you need to do is state an objective and it will navigate the site and return the requested data in a JSON schema. Pretty neat, isn't it?
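I haven't picked apart Eric's actual code, so here's just a minimal sketch of the core pattern as I understand it: scrape a page into LLM-ready markdown, then have the model pull out only what the objective asks for. It assumes Firecrawl's v0 scrape endpoint and the OpenAI Python SDK; the objective, URL, and model choice are placeholders, and Eric's real crawler goes further by having the model decide which links to follow next:

```python
import json
import os

import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def scrape(url: str) -> str:
    """Fetch a page as LLM-ready markdown via Firecrawl's v0 scrape endpoint."""
    resp = requests.post(
        "https://api.firecrawl.dev/v0/scrape",
        headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
        json={"url": url},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"]["markdown"]

def extract(objective: str, page_markdown: str) -> dict:
    """Ask the model to return only the data matching the objective, as JSON."""
    completion = client.chat.completions.create(
        model="o1-mini",  # o1 models take a single user message, no system prompt
        messages=[{
            "role": "user",
            "content": (
                f"Objective: {objective}\n\n"
                "From the page content below, return ONLY a JSON object with "
                "the requested data. No prose, no code fences.\n\n"
                f"{page_markdown}"
            ),
        }],
    )
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    page = scrape("https://news.ycombinator.com")  # example target, swap in your own
    print(extract("Get the titles of the top 5 stories", page))
```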
You can check out Eric's o1 crawler by hopping over to this tweet - https://x.com/ericciarla/status/1835775368407461904.
Thanks for building and sharing Eric, you rock!
---

I've now had a few days to play around with OpenAI's latest models, o1-preview and o1-mini, and like most people, I'm pretty blown away by what a major step forward this is. A couple of weeks ago I thought Claude 3.5 Sonnet was quickly becoming the go-to model for software engineers, but I can tell you, o1-mini has instantly taken its place.
First, for everyone who might still be playing catch-up, here's the TLDR in a few bullet points:
• o1-preview and o1-mini are OpenAI's new "reasoning" models - they think through a chain of thought before answering instead of responding immediately.
• They're a major step up on math, science, and coding problems compared to GPT-4o.
• o1-mini is the smaller, faster, much cheaper of the two, and it's especially strong at coding.
• Both are available in ChatGPT and through the API, with tighter rate limits than you might be used to.
Over the last few days I've played around with both models, put o1-mini to the test writing some Python and Node.js code, and used o1-preview to do some European travel planning. Both are far and away the best LLMs out there today, and while I feel a bit bad saying this, I haven't used Claude 3.5 Sonnet since they were released.
Cursor also pretty much immediately announced support for both models, so you can now use them there just as you would Claude 3.5 Sonnet, which had been my go-to for coding until now.
If you haven't read OpenAI's official write-up about the new models, don't sit here listening to me - they explain it much better. You can read it all here - Introducing OpenAI o1-preview.
Lately, one of my favorite things to do when I wake up in the morning is hop on Twitter/X and see what people are building with AI. Given what a massive update o1-preview is, people have been building like crazy and releasing some pretty wild stuff. Below are three that I think are worth checking out:
1. Mckay's o1 AI Playground - https://github.com/mckaywrigley/o1-ai-playground
2. Full weather iPhone app in under ten minutes - https://x.com/ammaar/status/1834348042637521031
3. Flappy Bird built in 3-4 minutes - https://x.com/slow_developer/status/1834614755153350809
There's still a lot more to uncover since it hasn't even been a week since OpenAI released the o1 preview. Right now my goal is to start costing out some of the stuff I want to build using o1-mini through the API and make sure the volume I'm planning on sending it isn't going to cost me a small fortune. Measure twice, cut once, right? Except in 1's and 0's.
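Here's the kind of back-of-envelope math I mean, as a quick sketch. It uses o1-mini's launch pricing of $3 per million input tokens and $12 per million output tokens (check current rates before trusting this), and the workload numbers are completely made up. One gotcha: o1's hidden reasoning tokens bill as output tokens, so pad the output estimate generously.

```python
# o1-mini pricing per 1M tokens as listed at launch - verify current rates.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 12.00

def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int) -> float:
    """Rough monthly API cost for a steady daily request volume."""
    per_request = (in_tokens / 1e6) * INPUT_PER_M + (out_tokens / 1e6) * OUTPUT_PER_M
    return per_request * requests_per_day * 30

# Hypothetical workload: 2,000 requests/day, ~1,500 tokens in, ~4,000 tokens out
# (output padded to cover reasoning tokens) comes to about $3,150/month.
print(f"${monthly_cost(2000, 1500, 4000):,.0f}/month")
```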
---

Hello world, this is the start of Linton.ai, my new blog about AI. I have no course to sell you, no monthly subscription, and no set schedule. Come and go as you please, I'll be here.