Cloud-based data warehouse company Snowflake on Tuesday at its annual Snowflake summit showed off a new set of tools and integrations to take on rival companies like Teradata and services like Google BigQuery and Amazon Redshift.
The new features, which include data access tools and support for Python on the company’s Snowpark application development system, are aimed at data scientists, data engineers and developers, with the intent of accelerating their machine learning work and, in turn, application development.
Snowpark, launched a year ago, is a dataframe-style development environment designed to let developers deploy their preferred tools in a serverless manner on Snowflake’s virtual warehouse compute engine. Python support is in public preview.
“Python is probably the most requested feature we hear from our customers,” said Christian Kleinerman, senior vice president of product at Snowflake.
The demand for Python makes sense, as it’s a language of choice for data scientists, analysts say.
“Snowflake is catching up on this front as rivals such as Teradata, Google BigQuery and Vertica already have Python support,” said Doug Henschen, principal analyst at Constellation Research.
In one of the updates announced at the summit, the company said it was adding a Streamlit integration for app development and iteration. Streamlit, an open source Python application framework that helps machine learning and data science teams visualize, modify and share data, was acquired by Snowflake in March.
The integration will allow users to remain within the Snowflake environment not only to access, secure and govern data, but also to develop data science applications to model and analyze data, said Tony Baer, principal analyst at dbInsights.
Snowflake launches Python-related integrations
Some of the other Python-related integrations include Snowflake Worksheets for Python, Large Memory Warehouses, and SQL Machine Learning.
Snowflake Worksheets for Python, which is in private preview, is designed to enable companies to develop pipelines, machine learning models and applications in the company’s web interface, dubbed Snowsight, the company said, adding that it had capabilities such as code completion and custom logic generation.
To help data scientists and development teams perform memory-intensive operations such as feature engineering and model training on large datasets, the company said it is working on a feature called Large Memory Warehouses.
Currently in development, Large Memory Warehouses will provide support for Python libraries through integration with the Anaconda data science platform, the company added.
“Several rivals are configurable to support large memory warehouses as well as Python functions and language support, so this is Snowflake keeping up with market demands,” Henschen said.
Snowflake also offers SQL Machine Learning, starting with time series data, in private preview. The service will help companies integrate machine learning-based predictions and analytics into business intelligence applications and dashboards, the company said.
Many analytical database vendors, according to Henschen, have been building machine learning models that run in the database.
“The rationale behind Snowflake starting with the analysis of time series data is [that it is] among the most popular machine learning analytics because it is about predicting future values based on previously observed values,” Henschen said, adding that time series analysis has many use cases in the financial sector.
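As a rough illustration of what “predicting future values based on previously observed values” can mean in practice, the sketch below uses simple exponential smoothing, one of the most basic time series techniques. It is not Snowflake’s SQL Machine Learning implementation, whose internals the company did not describe; the function name, the `alpha` parameter and the sample sales figures are all hypothetical.

```python
from typing import Sequence


def exponential_smoothing_forecast(values: Sequence[float], alpha: float = 0.5) -> float:
    """Return a one-step-ahead forecast using simple exponential smoothing.

    Hypothetical helper for illustration only; not part of any Snowflake API.
    """
    if not values:
        raise ValueError("values must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in the interval (0, 1]")

    # Start from the first observation, then blend each new observation
    # with the running estimate. A higher alpha weights recent data more.
    level = float(values[0])
    for observed in values[1:]:
        level = alpha * observed + (1.0 - alpha) * level
    return level


# Made-up daily sales figures, used only to exercise the function.
daily_sales = [100.0, 110.0, 105.0, 120.0]
forecast = exponential_smoothing_forecast(daily_sales, alpha=0.5)
print(f"Forecast for next period: {forecast:.2f}")
```

A production forecasting service would typically also account for trend and seasonality, which this sketch deliberately leaves out.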
Snowflake updates enable better access to data
On the logic that faster data access could lead to faster application development, Snowflake also introduced new features on Tuesday, including support for streaming data, Apache Iceberg tables in Snowflake, and external tables for on-premises storage.
Support for streaming data, which is in private preview, will help break down the boundaries between streaming and batch pipelines with Snowpipe Streaming. Snowpipe is the company’s continuous data ingestion service.
The rationale for launching the feature, according to Henschen, is the high interest in low-latency options, including near real-time and real-time streaming; most vendors in this market have already ticked the streaming box.
“The feature gives engineering teams a built-in way to analyze the stream alongside historical data, so data engineers don’t have to tinker with something themselves. It’s a time saver,” Henschen said.
In order to meet the demand for more open source table formats, the company said it is developing Apache Iceberg Tables to work in its environment.
“Apache Iceberg is a very popular open source table format and it is rapidly gaining traction for data analytics platforms. Table formats such as Iceberg provide metadata that helps with consistency and scalable performance. Iceberg was also recently adopted by Google for its BigLake offering,” Henschen said.
Meanwhile, in an effort to keep its on-premises customers engaged while trying to get them to adopt its cloud data platform, Snowflake is introducing external tables for on-premises storage. Currently in private preview, the tool allows users to access their data in on-premises storage systems from companies such as Dell Technologies and Pure Storage, the company said.
“Snowflake had a ‘cloud only’ policy for some time, so they clearly had large, important customers who wanted a way to bring on-premises data into analytics without moving everything into Snowflake,” said Henschen.
Additionally, Henschen said rivals including Teradata, Vertica and Yellowbrick offer on-premises deployment as well as hybrid and multicloud deployment.
Copyright © 2022 IDG Communications, Inc.