Conversation with Groq CEO Jonathan Ross

Social Capital · 2-minute read

Jonathan has a unique background in Silicon Valley, and his company Groq surpassed Nvidia's developer sign-up numbers within 30 days. Groq focuses on providing low-cost AI inference to rival Nvidia's offerings, reflecting the industry's shift toward inference computing.

Insights

  • Groq, led by Jonathan, quickly gained traction, reaching 75,000 developers in just 30 days, whereas Nvidia took seven years to reach 100,000, showcasing its rapid growth and potential impact in the tech industry.
  • The shift from training to inference computing is a pivotal trend with significant implications for system design and performance, underscoring the importance of optimizing for inference and the need for solutions like Groq's that serve evolving compute demands efficiently.


Recent questions

  • How did Jonathan's entrepreneurial journey begin?

    Jonathan's entrepreneurial journey started with dropping out of high school, followed by taking classes at universities and eventually working at Google.

  • What was the inspiration behind the creation of TPU at Google?

    The creation of TPU at Google was driven by the need to afford machine learning models like speech recognition, which were outperforming humans but too costly to implement.

  • What is the key to Groq's success in the AI industry?

    Groq's success in the AI industry lies in its scaled-inference approach, inspired by AlphaGo's performance on TPUs, using interconnects for better performance.

  • How does Groq's current generation compare to Nvidia's B200?

    Groq's current generation outperforms Nvidia's B200 by 4X in speed at one-tenth the cost per token, providing a cost-effective alternative.

  • Why is achieving low latency in AI models crucial?

    Achieving low latency in AI models is crucial for user experience: every 100-millisecond reduction in response time measurably increases engagement, emphasizing the need for rapid response times.


Summary

00:00

Jonathan's Groq: A Silicon Valley Success

  • Jonathan has a unique origin story, having spent 25 years in Silicon Valley, and his company Groq is often compared to Nvidia.
  • Groq gained 75,000 developers within 30 days of launching its developer console; Nvidia took seven years to reach 100,000.
  • Jonathan's entrepreneurial journey involved dropping out of high school, taking classes at universities, and eventually working at Google.
  • At Google, Jonathan worked on building test systems for ads, including a live testing system for ads queries.
  • The creation of TPU at Google stemmed from the need to afford machine learning models like speech recognition, which were outperforming humans but too costly to implement.
  • TPU's success was due to focusing on accelerating matrix multiplication, using a systolic array approach, different from traditional methods.
  • Jonathan left Google to join the Google X team for more innovative projects but desired to work on a concept from start to finish.
  • Groq initially focused on software because programming chips is complex, prioritizing its compiler to speed development.
  • Groq's success lies in its scaled-inference approach, inspired by AlphaGo's performance on TPUs, using interconnects for better performance.
  • Nvidia excels in software and vertical integration, moving up the stack to compete with its customers, particularly strong in training models and forward integration.
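The systolic-array idea behind the TPU, mentioned above, can be sketched in a few lines. This is an illustrative simulation, not TPU code: instead of computing each output element with its own dot product, the array performs one rank-1 update per clock cycle, with operands streaming past a grid of multiply-accumulate cells whose partial sums grow in lockstep.

```python
def systolic_matmul(A, B):
    """Multiply A (n x k) by B (k x m) in the systolic-array style:
    one rank-1 update per 'cycle' flows through a grid of
    multiply-accumulate cells, so partial sums build up in place."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for t in range(k):            # one cycle per shared-dimension index
        for i in range(n):        # A[i][t] streams in from the left
            for j in range(m):    # B[t][j] streams in from above
                C[i][j] += A[i][t] * B[t][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

The payoff of this layout in hardware is that operands are reused as they flow through the grid, so the chip spends its time multiplying rather than fetching from memory.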

14:45

"AI Inference Computing: Cost-Effective Solutions"

  • In real-world applications, the shift from training to inference computing is significant, with a 5-10% training and 90-95% inference split, leading to a substantial increase in inference compute needs.
  • HBM (High Bandwidth Memory) is crucial for performance in applications, but its limited supply, along with other components like interposers and cables, is monopolized by Nvidia, impacting the cost and complexity of systems.
  • To compete with leading solutions like Nvidia's B200, a different approach was taken: using an older 14-nanometer process and avoiding reliance on the same supply chain to provide a cost-effective alternative.
  • Comparisons between Groq's solutions and GPUs show Groq's current generation outperforming the B200 by 4X in speed at one-tenth the cost per token.
  • The aim is to reduce costs for startups using AI by providing a low-cost alternative, avoiding excessive spending on compute resources that would otherwise flow to tech giants like Google, Amazon, and Facebook.
  • Nvidia's B200, though complex and engineered with a $10 billion investment, falls short of the claimed 30X performance increase, a claim whose credibility even Nvidia engineers have questioned.
  • Achieving low latency in AI models is crucial for user experience: every 100-millisecond reduction in response time measurably increases engagement, emphasizing the need for rapid response times.
  • Training and inference in AI differ significantly, with training focusing on monthly token volumes, while inference requires generating tokens per millisecond, leading to the need for new chip architectures and systems.
  • The shift towards inference computing is evident, with Nvidia's latest earnings showing a 40% revenue from inference, expected to rise to 90-95% in the future due to the availability of open-source models.
  • In the inference market, the ability to adapt quickly to new models is essential, with the need to swap efficiently between models like Llama, Mistral, and Anthropic's to keep up with evolving technologies and quality improvements.

30:02

AI Talent Competition and Collaborations Reshaping Industry

  • In Silicon Valley, recruiting top AI talent is challenging due to high competition from companies like Tesla, Google, and OpenAI offering lucrative pay packages, viewing the market as winner-takes-all. To build a successful team, it's advised to hire experienced engineers capable of shipping products efficiently and teach them AI, rather than solely focusing on young AI researchers.
  • Collaborations in the AI industry, such as the partnership between Groq and Saudi Aramco, aim to deploy computing power at a scale surpassing even major tech companies like Meta. These deals are not about competition but about complementing existing capabilities, with the potential to exceed hyperscalers in computing resources.
  • The future of AI prompts questions about its impact on jobs and society. Drawing a parallel to Galileo's telescope revealing humanity's place in the universe, large language models are likened to a "telescope for the mind," expanding our understanding of intelligence. While initially daunting, embracing the vastness of intelligence will lead to appreciation and acceptance of AI's role in our world.
