
Real-Time Feedback: AI Game Changers and Flops: Emerging Insights for Transformative Product Growth
Hosts:
Brian Balfour & Fareed Mosavat
Topics:
AI Game Changers and Flops, AI Reflections, AI Product Strategy
This week, we held a “Real-Time Feedback Session” where we focused on AI while we’re eagerly awaiting the start of Season 2.
Check out our key takeaways below, or feel free to watch the conversation on YouTube.
There's a disconnect between user expectations of AI and its capabilities.
Retention rates for new products are dropping compared to last year. Why?
As our friend Nabeel Hyatt says, there are about 2 million people who will try everything new in AI. They are the dream beta users, the early adopters. But they are quick to flee.
Is AI really ready for the rest of the 7 billion out in the world?
And if it isn’t, what is the right strategy to capitalize on today and prepare for tomorrow?
Before we get there, let’s talk about where AI stands today, and where it’s heading.
🪐 How well do we understand what LLMs are capable of? 🪐
We don’t fully understand how they work or what they are capable of. Jeff Bezos has a good analogy for where we are with LLMs today: the LLM is a discovery, not an invention. The telescope was an invention; the moons of Saturn were discovered using the telescope. LLMs are like those moons: we are still figuring out a ton about what exists there, how they work, and what they are capable of.
🤔 Can LLMs “think”? 🤔
Sort of. Currently, LLMs rely on system 1 thinking, which means they deliver quick responses based on instinct.
Example: If I ask you what 2+2 is, you don't need to do the math; you simply know the answer. It's like an instinct.
However, LLMs struggle with system 2 thinking, which requires a conscious effort to think through multiple steps to arrive at an answer.
Example: If I ask you what 17 X 24 X 102 is, you don’t have that answer at the tip of your tongue. You are going to work it out in multiple steps.
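To make the multi-step working concrete, here is a minimal Python sketch of how you might work that product out by hand. The particular breakdown (multiply the first two numbers, then split 102 into 100 + 2) and the variable names are our own illustration, just one of several ways to do it.

```python
# "System 2"-style working for 17 x 24 x 102: break the problem into
# smaller steps and carry each intermediate result forward.
first_product = 17 * 24                 # step 1: multiply the first two numbers
times_hundred = first_product * 100     # step 2: multiply by the "100" part of 102
times_two = first_product * 2           # step 3: multiply by the "2" part of 102
answer = times_hundred + times_two      # step 4: add the two partial results

print(first_product, times_hundred, times_two, answer)
```

Each line depends on the one before it, which is exactly the kind of deliberate, sequential effort that an instinctive one-shot answer skips.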
Due to this limitation, LLMs' accuracy tends to decrease as tasks become more complex. That's why you sometimes receive answers that seem superficial when dealing with intricate questions. The model isn’t reasoning its way to the nuance.
This is where research on reasoning techniques is evolving, with the aim of increasing accuracy on tasks that require complex reasoning.
🪜 The main challenge for autonomous AI lies in compounding error rates. 🪜
When there are multiple steps to achieve a goal, AI agents break it down into several sub-steps. Let's say there are 12 steps. If each step succeeds 95% of the time (which is quite impressive) and a failure at any step sinks the whole task, the chance that all 12 succeed is 0.95^12, or only about 54%. That's why techniques like improved reasoning, memory, knowledge bases, and reinforcement data are vital for raising accuracy rates.
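The compounding effect is easy to check yourself. The sketch below is a simplification that assumes every step succeeds or fails independently of the others, which real agents won't strictly obey. The function names are hypothetical, and the 90% target in the second half is just an illustrative number we picked.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps are independent."""
    return per_step_success ** steps


def required_per_step_success(target_overall: float, steps: int) -> float:
    """Per-step success rate needed to hit a target overall rate (same assumption)."""
    return target_overall ** (1 / steps)


if __name__ == "__main__":
    # The example from the text: 12 steps at 95% each.
    print(f"12 steps at 95% each: {chain_success_rate(0.95, 12):.0%} overall")

    # Flip the question: how good must each step be for the chain to hit 90%?
    needed = required_per_step_success(0.90, 12)
    print(f"Per-step rate needed for 90% overall: {needed:.2%}")
```

The second function is the part worth sitting with: to get a 12-step agent to a respectable overall success rate, each individual step has to be far better than 95%.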
🎩 Users desire “System 2,” but AI is currently limited to “System 1.” 🎩
The bottom line: mass-market users won’t understand this distinction and will expect AI to perform System 2 thinking, leaving most of them disappointed and churning.
🤖 What is being done to improve accuracy? 🤖
Roughly speaking, people are trying to improve LLMs along two vectors:
Capabilities - What types of tasks an LLM is able to do, like writing code, solving a math equation, or grabbing something from the internet.
Accuracy - How accurate the LLM’s output is for a given input, especially as the complexity of the input increases. Accuracy is improving through things like additional knowledge, better reasoning techniques, longer context, and longer memory.
There are roughly four areas people are working on to get gains in capabilities and accuracy. They are:
Knowledge - How to add proprietary/specialty knowledge databases to the LLM.
Reasoning - Giving the LLM a better way to “think” through a task, especially as the complexity of the input/task increases.
Memory - Getting the LLM to remember previous interactions so that it can use them to influence and improve future interactions.
Tools - Giving the LLM access to various tools to interact with the outside world and execute things it’s not capable of on its own.
When developing AI products, it's crucial to grasp these four conceptual areas. Depending on the product type, the specific area that requires attention and focus may vary.
Beyond the capabilities of LLMs, we’re also seeing a product development shortfall:
🌐 AI Should Explore A New World, Rather Than Replicate The Old World 🌐
Too many products try to replicate human tasks. We should be asking, "How might we do things differently because of this technology?" rather than just copying and pasting existing processes into an AI format.
For example, remember when digital textbooks were just scanned versions of their physical counterparts, complete with page-turning animations? It was a direct replication of the offline experience. But as technology evolved, we moved from digital textbooks to online courses - a completely different, more interactive, and engaging learning experience.
🚀 Phase One, Two, Three Framework... 🚀
We suggest a three-phase framework to think about the AI products we’re seeing in the market today.
Phase 1 is replicating the old thing using the new thing.
Phase 2 is creating a slightly more native experience, but still similar to the old one.
Phase 3 is when we start to see the real magic - AI helping us do things that were previously impossible.
We're largely still in Phase 1, headed toward Phase 2. What we really want is to get to Phase 3.
Given what we know about the above, here are our thoughts on the products we’re playing with in the market today. But first, a commercial break:
AI is undoubtedly the new frontier, and here at Reforge, we've been making some exciting moves:
We've launched multiple AI courses, including Design and Build Conversational AI starting February 19 from Polly Allen & Rupa Chaturvedi.
We've integrated AI case studies into our flagship courses, Artifacts, Guides, and more.
We're even incorporating AI into our own product experience (with more to come soon).
And we’re back…
↔️ Horizontal vs. ↕️ Vertical is emerging again in AI, just like it did in SaaS, with a bunch of landmines in the middle.
Vertical is more interesting, from our perspective. Here is why…
↔️ Horizontal Use Case Products = Incremental Gains * Lots of Tasks ↔️
Think Microsoft Copilot, ChatGPT, and other products that offer incremental gains with each engagement. These products are all about low input friction and easy output utilization. They attach to existing programs or behaviors, making them super user-friendly.
For instance, GitHub Copilot has transformed the coding experience by making autocomplete smarter. You're already writing code, and now hitting the Tab key does something more magical. It's like having a coding buddy who's always ready to help. 🧙‍♂️
But they have the same hurdles as horizontal SaaS products. They can be used for so many different things, which is both a pro and a con. The con shows up in activation: with no single obvious use case, it is hard to activate users and get them to build a habit.
For these companies, it feels like a race on accuracy and price, and the barrier to entry is pretty high at this point. The space might even be over-explored.
↕️ Vertical AI Products = 10X Gain * A Specific Task ↕️
Products like Fluint and EvenUp are game-changers in their respective fields. Fluint helps account executives generate business cases for enterprise sales, while EvenUp assists personal injury lawyers in generating their demand letters.
These products are not about daily use but about solving super painful, time-consuming tasks that are crucial to the outcome of a sale or a case. They're about saving you tens of hours of work and making you more successful. Both of these provide 10X experiences on a very specific task. 💼
There are some common themes between these products:
Moderate Frequency - An injury lawyer, for instance, has to prepare a demand package for every injury case, while an enterprise sales rep must create a business case for most buying processes.
Tangible Connection To $$$ - In the case of EvenUp, it's the settlement amount of the case, whereas for Fluint, it's the value of the enterprise deal.
High-Value Asset - The value associated with both companies’ outputs is typically high, often reaching five to six figures.
High Friction To Creation - The pre-AI manual creation process for both involves high levels of friction and is time-consuming to execute effectively.
Narrow and Specific - Companies sometimes make the mistake of thinking they are vertical without focusing on a specific task or subset of the market. It's easy to fall into a trap where you're not actually providing the value you think you are. That's why we appreciate Fluint and EvenUp: they started off very narrow and seem to have nailed it.
Data Set To Draw From - Both EvenUp and Fluint have access to (or are building) data sets that they can draw from to do something that wasn't possible before. EvenUp leverages millions of public records on medical documents and case files, enabling them to tap into an entire knowledge base that would be challenging or impossible for a human to replicate. Fluint, on the other hand, utilizes various data sources, including internal business cases and information gathered from sales calls, to customize their approach for each customer.
We feel these are going to be great wedges into the broader workflow of their customers. These narrow use cases appear under-explored at the moment. 🤔 💡
☠️ The Middle Ground = A Death Zone? ☠️
Products that live in the middle of these two use cases are in trouble. They offer neither incremental gains across lots of tasks nor a 10X gain on a specific, painful problem.
That typically looks like a product providing incremental gains across a low volume of tasks.
There are many products, like new note-taking and presentation tools, that fit this category. They make creation slightly better, but users have to replace a deeply ingrained workflow to get those incremental gains.
If you found this interesting:
Join the conversation on LinkedIn
Subscribe to Unsolicited Feedback where we’ll be covering topics like this weekly starting Feb 20.
Enroll in Design and Build Conversational AI
In this 4-week course, you'll learn how to identify the right use cases, design a delightful user experience, and determine return on investment for Conversational AI products. This winter session includes new material around building with custom GPTs and the latest advances in local tools such as Chat Bot Builder AI and Voice Flow. See a Free Preview.