Snorkel Flow update makes it easier to use enterprise data with Llama 3, Gemini AI

A diver wears a snorkel covered with computer code.
Credit: VentureBeat made with Midjourney V6


Snorkel AI, a four-year-old startup that grew out of the Stanford AI Lab, has announced a major update to its signature product, Snorkel Flow, a platform for data labeling, filtering, curation and AI fine-tuning. The platform can now integrate directly with Google’s Gemini AI model family and Meta’s brand-new Llama 3.

Snorkel Flow initially launched in March 2022 with the aim of drastically improving the process by which enterprises develop and deploy custom AI solutions. It lets them take their data, whether structured or unstructured, and automatically label, annotate, and organize it so that it can serve as a source of truth and information for their AI apps and services.

“Enterprises are hitting a wall in terms of what they’re able to accomplish with off-the-shelf LLMs that have been trained on the public internet for general purpose usage,” said Alex Ratner, co-founder and CEO of Snorkel AI, in a video call interview with VentureBeat conducted yesterday ahead of the announcement. “They haven’t been customized for [enterprises’] use cases, their data sets, their settings. That’s where we come in with our platform Snorkel Flow for doing data labeling, data development, and doing it radically more efficiently.”

For example, if your company wanted to build a chatbot for employees that answered their questions about internal policies and company holidays, Snorkel Flow could ensure those documents were labeled appropriately so the chatbot could find them and extract the relevant information when asked. Or, if you wanted to create a customer service chatbot that recognized when a customer mentioned a specific product name, Snorkel Flow could help you tune the model to understand that the name referred to a specific SKU.

According to Ratner, the company’s specialty is “AI data development,” which includes “all of the labeling, but also the curation, picking the right data, filtering for high quality data, choosing the right mix of data, maybe augmenting it with synthetic data, all of that data development that gets a finely curated data set to actually tune [AI] models on.”

“All the platform vendors, the cloud vendors, are announcing APIs for tuning models, but they don’t help you with actually getting the data ready to dump into that API, which is the entire hard part,” Ratner added.

At the time it launched, Snorkel Flow included features such as programmatic data labeling and collaborative AI development, which were adopted by major enterprises such as Memorial Sloan Kettering Cancer Center and Chubb to improve AI model accuracy and development efficiency by 10 to 100 times, according to Snorkel. Seven of the top 10 U.S. banks are reportedly customers of the company, and Ratner said Snorkel was able to help some of them automatically label data for regulatory compliance, reducing time spent on manual labeling from “six months to 24 hours.”

“As base LLMs become pervasive, including powerful open-source options like Llama 3, the speed and accuracy with which data is continuously labeled and curated for fine-tuning and aligning LLMs becomes the key differentiator,” Ratner added in a press release sent to VentureBeat.
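For readers unfamiliar with the term, programmatic data labeling generally means writing small rules, often called labeling functions, that vote on a label for each document instead of having people tag documents one by one. The sketch below is a minimal illustration of that idea using the internal-policy chatbot scenario described earlier. It is not Snorkel Flow’s API: the function names, label names and sample documents are all hypothetical, and a simple majority vote stands in for whatever aggregation a production system would use.

```python
import re
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply

# Hypothetical labeling functions: each encodes one simple heuristic.
def lf_holiday_keyword(text):
    return "HOLIDAY_POLICY" if re.search(r"\bholidays?\b", text, re.I) else ABSTAIN

def lf_expense_keyword(text):
    pattern = r"\b(expenses?|reimbursements?)\b"
    return "EXPENSE_POLICY" if re.search(pattern, text, re.I) else ABSTAIN

def lf_office_closed(text):
    pattern = r"\boffice (is|will be) closed\b"
    return "HOLIDAY_POLICY" if re.search(pattern, text, re.I) else ABSTAIN

LABELING_FUNCTIONS = [lf_holiday_keyword, lf_expense_keyword, lf_office_closed]

def label_document(text):
    """Combine the labeling functions' votes with a simple majority vote."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN  # no rule fired, so leave the document unlabeled
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tied vote, so leave the document for human review
    return ranked[0][0]

# Hypothetical sample documents.
docs = [
    "The office will be closed on the following company holidays.",
    "Submit expense reports within 30 days for reimbursement.",
    "Welcome to the team!",
]
labels = [label_document(d) for d in docs]
```

The appeal of this approach is that editing one rule relabels every affected document at once, which is the kind of leverage behind speed-up claims like the ones quoted above.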

A brand new Flow

In addition to allowing users to tap their enterprise data — organized and labeled by Snorkel’s AI — as a source of truth and information through tuned versions of Google Gemini and Llama 3, the Snorkel Flow platform received a boatload of additional upgrades today.

It now integrates with Databricks Unity Catalog, Vertex AI, and Microsoft Azure Machine Learning, platforms that enterprises may already be using, or may wish to use, to organize and control access to their data.

Moreover, Snorkel Flow now supports programmatic labeling of multimodal data including images.

This is particularly timely, as enterprises are increasingly looking to leverage AI across various data modalities to gain more comprehensive insights and enhance decision-making processes.

One Snorkel customer that has already taken advantage of the image data labeling functionality is Wayfair. Ratner said the goal was to provide the same “speed up” in labeling as with text documents, going from months or weeks to days or hours.

More secure Snorkel

Snorkel has added new role-based access controls (RBAC) for Snorkel Flow account administrators, letting them control access to an enterprise’s data and where it can be used and accessed for AI development and responses.

The new RBAC features also enable administrators to control who can even upload data to an enterprise’s Snorkel Flow development account in the first place, and regulate access to any connected services or apps.
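As a rough illustration of how role-based access control works in general, the sketch below maps roles to permitted actions and checks a user’s roles before allowing an action such as uploading data. The role names, action names and `User` class are hypothetical and are not drawn from Snorkel Flow’s actual configuration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of roles to the actions they permit.
ROLE_PERMISSIONS = {
    "admin": {"upload_data", "read_data", "manage_integrations"},
    "developer": {"upload_data", "read_data"},
    "reviewer": {"read_data"},
}

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

def is_allowed(user, action):
    """Return True if any of the user's roles grants the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)

# Hypothetical users.
alice = User("alice", {"admin"})
bob = User("bob", {"reviewer"})

can_alice_upload = is_allowed(alice, "upload_data")
can_bob_upload = is_allowed(bob, "upload_data")
```

Because permissions attach to roles rather than to individuals, an administrator can change what every reviewer is allowed to do by editing a single entry.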

This is similar in some ways to the new Projects feature unveiled by OpenAI yesterday for its proprietary in-house models — but of course, Snorkel allows for enterprises to control access to multiple models from multiple vendors.

Additionally, Snorkel Flow now supports on-premises and air-gapped access to foundation models, allowing for greater compliance and data security.

This release also complements Snorkel’s recently launched enterprise AI accelerator, Snorkel Custom, which provides structured engagements to support enterprises from the initial evaluation phase through the tuning and optimization of AI models.

From flashy demo to production value added

In general, Snorkel aims to help its enterprise customers realize the potential of generative AI by making their data as useful as possible, as fast as possible, both for fine-tuning AI models to their specific needs and use cases and for building AI-powered applications and services, ideally atop fine-tuned models.

“It’s a pretty broad trend that we’re seeing,” Ratner told VentureBeat. “People had massive pressure to get some kind of [AI] demo out. The demo looked really good. Expectations were sky high, even compared to most hype cycles and just in tech history. And now there’s massive pressure to cross that chasm from flashy chat bot demo to production value.”

Snorkel Flow and Snorkel Custom are now generally available, with pricing dependent on the specific use case.