Snorkel AI Accelerates Foundation Model Adoption with Data-centric AI

Snorkel AI, the data-centric AI platform company, today introduced Data-centric Foundation Model Development for enterprises to unlock complex, performance-critical use cases with GPT-3, RoBERTa, T5, and other foundation models. With this launch, enterprise data science and machine learning teams can overcome adaptation and deployment challenges by creating large, domain-specific datasets to fine-tune foundation models and using them to build smaller, specialized models deployable within governance and cost constraints. New capabilities for Data-centric Foundation Model Development are available in preview within Snorkel Flow, the company’s flagship platform. A virtual launch event takes place at 10 AM PT on November 22, 2022.

Foundation models such as GPT-3, DALL-E 2, and Stable Diffusion show considerable promise for generative, creative, and exploratory tasks. But enterprises are still nowhere close to deploying foundation models in production for complex, performance-critical NLP and other automation use cases. Enterprises need large volumes of domain- and task-specific labeled training data to adapt foundation models for domain-specific use. Creating these high-quality training datasets with traditional, manual data labeling approaches is painfully slow and expensive. Moreover, foundation models are costly to develop and maintain, and they raise governance concerns when deployed in production.

These challenges must be addressed before enterprises can reap the benefits of foundation models. Snorkel Flow’s Data-centric Foundation Model Development is a new paradigm for enterprise AI/ML teams to overcome the adaptation and deployment challenges currently blocking them from using foundation models to accelerate AI development.

Using early versions of Data-centric Foundation Model Development, AI/ML teams have built and deployed highly accurate NLP applications in days:

A top US bank improved accuracy from 25.5% to 91.5% when extracting information from complex, multi-hundred-page contracts.
A global home goods e-commerce company improved accuracy by 7-22% when classifying products from descriptions and reduced development time from four weeks to one day.
Pixability distilled knowledge from foundation models and built smaller classification models with more than 90% accuracy in days.
The Snorkel AI research team and partners from Stanford University and Brown University matched the quality of a fine-tuned GPT-3 model with a model more than 1,000x smaller on LEDGAR, a complex 100-class legal benchmark task.

“With over 500 hours of content created on YouTube every minute, we need to constantly and accurately categorize billions of videos to make sure we fully understand the context of videos so that advertisers can be sure they are running their ads on brand suitable content,” said Jackie Swansburg Paulino, Chief Product Officer at Pixability. “With Snorkel Flow, we can apply data-centric workflows to distill knowledge from foundation models and build high-cardinality classification models with more than 90% accuracy in days.”

Data-centric Foundation Model Development features include:

Foundation Model Fine-tuning to create large, domain-specific training datasets to fine-tune and adapt foundation models for enterprise use cases with production-grade accuracy.
Foundation Model Warm Start to auto-label training data at the push of a button, using foundation models and state-of-the-art zero- and few-shot learning, and to train deployable models on the result.
Foundation Model Prompt Builder to develop, evaluate, and combine prompts to tune and correct the output of foundation models to precisely label datasets and train deployable models.
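
As a rough illustration of the auto-labeling idea behind these features, the sketch below combines several labelers into one training label per example. It is not Snorkel Flow's API, and the release does not describe how Snorkel Flow combines prompt outputs; simple majority voting is used here as a stand-in. The labeler functions, label names, and product descriptions are all hypothetical, and keyword rules take the place of real foundation model calls so the example runs on its own.

```python
from collections import Counter

ABSTAIN = None  # a labeler returns this when it has no opinion

# Hypothetical stand-ins for prompt-based labelers. In practice each one
# would wrap a foundation model call; keyword rules keep this runnable.
def prompt_furniture(text):
    words = ("sofa", "chair", "table")
    return "furniture" if any(w in text.lower() for w in words) else ABSTAIN

def prompt_lighting(text):
    words = ("lamp", "bulb")
    return "lighting" if any(w in text.lower() for w in words) else ABSTAIN

def prompt_seating(text):
    return "furniture" if "seat" in text.lower() else ABSTAIN

LABELERS = [prompt_furniture, prompt_lighting, prompt_seating]

def majority_label(text, labelers=LABELERS):
    """Combine labeler votes by simple majority; ties and empty votes abstain."""
    votes = [v for v in (f(text) for f in labelers) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting labelers, no clear winner
    return ranked[0][0]

descriptions = [
    "Three-seat sofa with oak legs",
    "LED desk lamp with dimmer",
    "Chair with built-in reading lamp",
    "Cotton bath towel set",
]

# Keep only examples that received a label; these would become the
# training set for a smaller, deployable classifier.
labeled = [(d, majority_label(d)) for d in descriptions]
training_set = [(d, y) for d, y in labeled if y is not ABSTAIN]
```

In this sketch, examples where the labelers disagree evenly or all abstain are dropped rather than guessed, so the resulting training set contains only labels that at least one labeler supports without an equally strong contradiction.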

“Enterprises have struggled to harness the power of foundation models like GPT-3 and DALL-E due to fundamental adaptation and deployment challenges. To work in real enterprise use cases, foundation models need to be adapted using task-specific training data and need to clear major deployment challenges around cost and governance,” said Alex Ratner, CEO and co-founder at Snorkel AI. “Snorkel Flow’s unique data-centric approach provides the necessary bridge between foundation models and enterprise AI, solving the adaptation and deployment challenges so enterprises can achieve real value from foundation models.”

About Snorkel AI
Founded by a team spun out of the Stanford AI Lab, Snorkel AI makes AI application development fast and practical by unlocking the power of machine learning without the bottleneck of manually labeled training data. Snorkel Flow is the first data-centric AI platform powered by programmatic labeling. Backed by Addition, Greylock, GV, In-Q-Tel, Lightspeed Venture Partners, and funds and accounts managed by BlackRock, the company is based in Palo Alto.

For more information on Snorkel AI, please visit: https://www.snorkel.ai/ or follow @SnorkelAI.