Evaluating AI Models, Unlocking Unstructured Data, and Achieving Reliability w/ Ben Kus

May 23, 2024

Hosts:

Fareed Mosavat

Topics:

AI Models, Foundational Models, Technical Strategy, Product Strategy, Box

Today on "Unsolicited Feedback," Fareed Mosavat dives deep into the evolving landscape of artificial intelligence with tech expert and Box CTO, Ben Kus. As AI continues to reshape industries, understanding its core mechanisms and potential applications has never been more critical. Ben sheds light on how AI can enhance productivity and decision-making within organizations while also sharing technical strategies he’s implemented to ensure reliability for Box’s enterprise customers.

⚠️ Understanding AI's Non-Deterministic Nature

A core challenge in building with AI is its non-deterministic nature, meaning that the same input can yield different outputs each time. This unpredictability complicates development and application. Ben even said, "We’ve gotten to the point where if we add a period at the end of a prompt versus not, it’ll change the answer." This variability requires developers to be meticulous in their approach, constantly testing and refining their models to achieve consistent performance.

🔍 Fine-Tuning AI Interactions for Better Outcomes

One way to manage AI's unpredictability is to tune how you interact with the model. There are many highly technical ways to do this, but Ben suggests starting simple. A key lever here is the "temperature" setting, which controls the randomness of the AI's responses. Temperature typically ranges from 0 to 1, though some APIs accept higher values:

Temperature 0: The AI provides the same, or nearly the same, response for the same input every time, aiming for precision and consistency. For example, if you set the temperature to zero and ask the AI to tell you a joke, it would repeat the same joke over and over.
Temperature 1: The AI generates more diverse responses, introducing creativity but reducing consistency. If you want the AI to create something new or imaginative, setting a higher temperature encourages more varied outputs.

Ben emphasizes the importance of experimenting with different temperature settings to find the optimal balance for specific use cases. This iterative testing helps in understanding an AI's limitations and capabilities, significantly improving its utility in practical scenarios.
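
To make the temperature idea concrete, here is a minimal sketch of temperature-scaled sampling. It is a toy illustration and not how any particular vendor implements it: the joke list, the scores, and the function names are all made up for the example. The scores are divided by the temperature before being turned into probabilities, so a low temperature concentrates the choice on the top option and a higher one spreads it out.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all weight on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(options, logits, temperature, rng):
    """Pick one option at random according to the scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(options, weights=probs, k=1)[0]


# Hypothetical candidates and scores, purely for illustration.
jokes = ["chicken joke", "knock-knock joke", "pun"]
scores = [2.0, 1.5, 0.5]
rng = random.Random(0)

always_same = {sample(jokes, scores, 0, rng) for _ in range(20)}
varied = {sample(jokes, scores, 1.0, rng) for _ in range(200)}
print(always_same)  # a single joke at temperature 0
print(varied)       # more than one joke at temperature 1
```

Running the same comparison at a few temperatures on your own prompts is one simple way to do the experimentation Ben describes.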

📌 Tailor the Suggested Prompts Per Model

One of the significant challenges discussed was the necessity of customizing prompts for different AI models. Ben pointed out, "We have to customize prompts per model, and we have to then manage the version history and control on those prompts." This customization is crucial because varying even a small detail can drastically change the output. He finds that some types of prompts work better with certain models than with others, and unfortunately, the only way to find this out is through trial and error.
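
One way to picture "customize prompts per model" plus version history is a small registry keyed by task and model. The sketch below is an assumption about how such a thing could look, not a description of Box's system; `PromptRegistry`, the task name, and the model names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    # Maps (task, model) -> list of prompt templates, oldest first.
    _store: dict = field(default_factory=dict)

    def register(self, task, model, template):
        """Add a new prompt version and return its 1-based version number."""
        versions = self._store.setdefault((task, model), [])
        versions.append(template)
        return len(versions)

    def get(self, task, model, version=None):
        """Return the latest prompt for a model, or a specific older version."""
        versions = self._store[(task, model)]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"no version {version} for {(task, model)}")
        return versions[version - 1]


registry = PromptRegistry()
registry.register("summarize", "model-a", "Summarize this document: {text}")
registry.register("summarize", "model-a", "Summarize this document in 3 bullets: {text}")
registry.register("summarize", "model-b", "You are a careful analyst. Summarize: {text}")

latest_a = registry.get("summarize", "model-a")
first_a = registry.get("summarize", "model-a", version=1)
print(latest_a.format(text="..."))
```

Keeping older versions around makes it possible to roll back when a small wording change turns out to shift the answers.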

🧠 Understand the Why and How of AI for Strategic Advantage

Ben has read over 30 research papers on AI, but he believes not everyone needs to delve that deep. For those leading technology strategies, it's crucial to know what changed with these models and why they make a difference.

Transformers, which form the basis of models like GPT (Generative Pre-trained Transformer), are a type of deep learning model designed to handle sequential data. They revolutionized AI by using attention to relate every word in a passage to every other word, instead of reading strictly one word at a time, which made handling context and long-range dependencies in language more effective.

Understanding these fundamentals helps predict future tech trends and prepares us for shifts in digital landscapes.

🔄 Choosing the Right Model for Enterprise Companies

Managing AI models in an enterprise setting involves multiple layers of complexity. Here are a few approaches that can help you decide which models to use:

Categorize Models: There’s a sea of them out there. Grouping can help narrow the field.
Premium Models: These are the top-tier models that offer the highest performance, accuracy, and capabilities. They are ideal for critical tasks that require the best results, such as natural language understanding, complex data analysis, and high-stakes decision-making processes.
Standard Models: These models provide good performance at a lower cost. They are suitable for less critical tasks where some compromise on accuracy and capability is acceptable. Standard models can be used for everyday tasks, basic data processing, and initial development phases.
Look at Model Attributes:
Consider key attributes like where a model is hosted, whether the platform is trusted, how safe the model is, and whether it is open source. Ben notes that where a model is hosted is often overlooked but critically important for reliability.
Utilize Elo Scores:
Elo-style ratings, built from head-to-head comparisons, let you grade models against each other, which helps cluster them into performance categories.
Use Multiple Models:
Provide the flexibility to choose or switch models as needed to accommodate diverse requirements. As Ben put it, "We provide our customers with the chance to either pick their model or bring their own model." While this ensures flexibility for customers, it also introduces a high level of complexity. Ben advises smaller companies to start with one model to avoid premature optimization, which can complicate operations.
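
For readers curious how Elo-style grading works mechanically, here is a minimal sketch of the standard Elo update applied to pairwise model comparisons. The model names, the starting rating of 1000, the K-factor of 32, and the outcomes are all assumptions chosen for illustration; the episode does not say which settings Box uses.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return new ratings after one comparison (score_a: 1 win, 0.5 tie, 0 loss)."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical models, all starting from the same rating.
ratings = {"model-a": 1000.0, "model-b": 1000.0, "model-c": 1000.0}

# (winner, loser) pairs from made-up side-by-side comparisons.
outcomes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
for winner, loser in outcomes:
    ratings[winner], ratings[loser] = update_elo(ratings[winner], ratings[loser], 1.0)

ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

With enough comparisons, the ratings tend to separate into bands, which is what makes them useful for sorting models into tiers.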

💡 How to Pick a Model if You're a Startup

Ben's guidance for startups is focused on practicality and cost-effectiveness:

Start Simple:
Avoid training your own models initially to save on infrastructure costs.
Utilize pre-existing models to quickly test and iterate on your product ideas.
Use Cloud Providers:
Leverage big cloud providers like GCP, AWS, or Azure for hosted models.
Utilize their infrastructure to simplify switching models through configuration changes.
Delay Optimization:
Focus first on finding product-market fit and understanding your costs.
Optimize for infrastructure only when scale necessitates it, ensuring you don't waste resources prematurely.
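
As a rough illustration of "switching models through configuration changes," the sketch below keeps the model choice in one settings table so application code never hard-codes it. The provider names, model names, and the `build_request` helper are placeholders invented for this example, not any cloud provider's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    model_name: str
    temperature: float = 0.0


# Hypothetical profiles; swap a model by editing this table only.
CONFIGS = {
    "default": ModelConfig(provider="cloud-a", model_name="standard-model"),
    "premium": ModelConfig(provider="cloud-b", model_name="premium-model", temperature=0.2),
}


def build_request(prompt, profile="default"):
    """Assemble a provider-agnostic request from the selected profile."""
    cfg = CONFIGS[profile]
    return {
        "provider": cfg.provider,
        "model": cfg.model_name,
        "temperature": cfg.temperature,
        "prompt": prompt,
    }


request = build_request("Summarize this contract.")
premium_request = build_request("Summarize this contract.", profile="premium")
print(request["model"], premium_request["model"])
```

Starting with a single profile fits Ben's advice to begin with one model; the table only grows when there is a real reason to add another.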

💬 How to Leverage Human Behavior to Improve AI Results

As humans, when we’re given a task, we don’t produce a perfect version in 30 seconds. We think about it, draft it, refine it, get feedback on it, and refine it some more. Ben talks about how at Box he’s working to put protocols in place that mimic human behavior, where tasks are iterated upon and refined through feedback. This strategy involves:

Use AI Feedback Loops:
AI operations (AI Ops) and observability tools play a crucial role in managing and maintaining AI systems. Ben revealed an intriguing technique where an AI is used to check if another AI provides a good answer: "You get an AI to tell you if another AI did a good job." This recursive validation helps in maintaining higher accuracy and reliability.
Implement Iterative Processes:
Allow AI to iterate on tasks, refining outputs with each pass to achieve better results.
Often, the third try is the best one.
Adopt a workflow where AI tools perform tasks similarly to how humans would, asking follow-up questions and iterating on the work.
Lean on Agents Over Chats:
Agents can achieve complex goals through iterative processes and plans. Unlike traditional AI queries, which are one-shot instructions, agents can make a plan and execute it iteratively: "This is a really important insight... usually, the agents have this iterative set of capabilities."
Despite the early hype around AI agents, the technology is still evolving. Ben noted, "If you're not careful, that little iteration it does amplifies the randomness." However, the continuous improvement in this area promises to make AI agents more reliable and useful in the future.
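
The "AI checks another AI" loop can be sketched in a few lines. This is an assumed shape, not Box's implementation: `generate` and `judge` stand in for two model calls, and the fake versions below exist only so the example runs without any external service. The cap on attempts is one simple guard against the iteration amplifying randomness.

```python
def refine_with_judge(task, generate, judge, max_attempts=3, threshold=0.8):
    """Generate an answer, score it with a second model, and retry with feedback."""
    best_answer, best_score = None, float("-inf")
    feedback = ""
    attempts_used = 0
    for attempt in range(1, max_attempts + 1):
        attempts_used = attempt
        answer = generate(task, feedback)
        score, feedback = judge(task, answer)
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= threshold:
            break  # good enough, stop iterating
    return best_answer, best_score, attempts_used


def fake_generate(task, feedback):
    # Stand-in for a model call: revises only when it receives feedback.
    return task + (" (revised)" if feedback else "")


def fake_judge(task, answer):
    # Stand-in for a second model grading the first one's answer.
    if "(revised)" in answer:
        return 1.0, ""
    return 0.4, "Add more detail."


result = refine_with_judge("Summarize the contract", fake_generate, fake_judge)
print(result)
```

In a real system the judge's score and feedback would come from a separate prompt, and the best-scoring draft is kept even if no attempt clears the threshold.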

🌐 Ben is Using AI to Turn Unstructured Data into Structured Data

AI's potential to handle unstructured data is revolutionary, according to Ben. AI can process and draw insights from unstructured data like documents, images, and videos. From there, you can use AI to turn that unstructured data into structured data, making it more usable for traditional analytics. And the biggest unlock: AI can create metadata, helping to organize and analyze data more efficiently.
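
A small sketch of what "structuring unstructured data" can look like in practice: ask a model to reply in JSON, then validate the reply into typed fields before it enters an analytics pipeline. The `ContractMetadata` fields and the sample reply are invented for illustration, and the model call itself is replaced by a hard-coded string.

```python
import json
from dataclasses import dataclass


@dataclass
class ContractMetadata:
    counterparty: str
    effective_date: str
    total_value_usd: float


def parse_metadata(model_output):
    """Validate a model's JSON reply before treating it as structured data."""
    data = json.loads(model_output)  # raises ValueError on malformed JSON
    return ContractMetadata(
        counterparty=str(data["counterparty"]),
        effective_date=str(data["effective_date"]),
        total_value_usd=float(data["total_value_usd"]),
    )


# Stand-in for what a model might return after reading a contract.
reply = '{"counterparty": "Acme Corp", "effective_date": "2024-01-15", "total_value_usd": 125000}'
record = parse_metadata(reply)
print(record)
```

Because model output is not guaranteed to be well formed, failing loudly on a missing or malformed field is usually safer than storing a partial record.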

What tricks are you using to get reliability in AI? How have you chosen the right model? Let us know!