Table of Contents
What is Caching?
Caching in snowflake is one of the query performance improvement techniques which is achieved by keeping or maintaining frequently accessed data in the memory. Since data can be accessed quickly from memory or SSD, it truly provides a performance improvement for data access.
Quick Walkthrough of Snowflake Architecture should be understood before understanding the snowflake caching and how it works at different levels.
Snowflake Architecture (Source : Snowflake Website)
As we know, Snowflake has three layers with which it functions, so caching works at each layer. Let’s deep dive.
Caching in Snowflake – 3 Different Layers
Caching in Snowflake (source: Snowflake Community Page)
Snowflake employs various cache layers to optimize data retrieval and query performance. These layers include:
- Query Result Cache: This cache stores the results of queries executed within the last 24 hours. Results are accessible across virtual warehouses, enabling subsequent users to benefit from previously executed queries, provided the underlying data remains unchanged.
- Local Disk Cache / Virtual Warehouse Cache: Data utilized by SQL queries is cached in SSD and memory from the Remote Disk storage. When data is required for a query, it’s retrieved from this cache layer, enhancing query performance by minimizing disk reads.
- Remote Disk: This level serves as the long-term storage, responsible for maintaining data resilience. In platforms like Amazon Web Services, it ensures 99.999999999% durability, even in scenarios of complete data center failures.
- Metadata Cache: Count, min max of int columns, etc. and other stats of table are maintained in Metadata cache (which sits in compute service layer). Virtual warehouse is not spinned in this case.
Important Note:
Local disk cache gets wiped out when the virtual warehouse is suspended and hence if you run the query again and virtual warehouse starts, then it takes time to run the query and again new cache copy is saved until next suspension of virtual warehouse.
Summary:
- Count, min max of int columns, etc. and other stats of table are maintained in Metadata cache(which sits in compute service layer). Virtual warehouse is not spinned in this case.
- If we run a query, a query result cache is created in (compute service layer), which can be accessed across other virtual warehouses and for that query should be exactly same.
- When we run a query, a local disk cache or virtual warehouse cache is created which is local to virtual warehouse (other warehouses cannot access this). This is maintained in virtual warehouse or query processing layer. Whenever the same query or similar kind of query is ran, snowflake determines whether it can use local disk or virtual warehouse cache and then makes use of it to reduce overall processing time.