The Snowflake Data Platform uses a partitioning scheme called micro-partitioning. It delivers the benefits of static partitioning while avoiding its well-known limitations, and it adds further advantages of its own.
What is Micro-partitioning?
In Snowflake, all table data is stored in micro-partitions, which are contiguous units of storage. Each micro-partition holds between 50 MB and 500 MB of uncompressed data (the actual storage footprint is smaller, because Snowflake always stores data compressed). Rows in a table are grouped into micro-partitions and organized within each partition in a columnar fashion.
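To make that layout concrete, here is a minimal, purely illustrative Python sketch (not Snowflake code) that groups a stream of rows into fixed-size "micro-partitions" and stores each one column-wise. The 500-row cap merely stands in for Snowflake's 50–500 MB uncompressed size target, and the function name `make_micro_partitions` is hypothetical.

```python
from itertools import islice

# Hypothetical illustration only: batch rows into fixed-size groups
# ("micro-partitions") and pivot each batch into columnar form, mirroring
# how rows are grouped into micro-partitions and stored by column.
def make_micro_partitions(rows, columns, rows_per_partition=500):
    it = iter(rows)
    while True:
        batch = list(islice(it, rows_per_partition))
        if not batch:
            break
        # Columnar layout: one list of values per column.
        yield {col: [row[i] for row in batch] for i, col in enumerate(columns)}

# Tiny synthetic table to show the grouping.
rows = [(i, f"customer_{i % 7}", i * 10) for i in range(1200)]
partitions = list(make_micro_partitions(rows, ["id", "name", "amount"]))
print(len(partitions))            # 3 partitions of up to 500 rows each
print(partitions[0]["name"][:3])  # only the 'name' column needs to be read
```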
Snowflake maintains metadata about each micro-partition, including:
- the range of values for each column,
- the number of distinct values, and
- additional properties used for optimization and efficient query processing.
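As a conceptual model of the kind of per-partition statistics this implies, the sketch below computes a min/max value range and a distinct-value count for every column of a columnar micro-partition. It is illustrative only and does not reflect Snowflake's internal metadata format.

```python
def partition_metadata(partition):
    """Conceptual model of per-micro-partition column statistics: the
    range (min/max) of values and the number of distinct values per column.
    Illustrative only; not Snowflake's actual metadata structure."""
    return {
        col: {"min": min(values), "max": max(values), "distinct": len(set(values))}
        for col, values in partition.items()
    }

# A tiny columnar "micro-partition": one list of values per column.
partition = {
    "order_date": ["2024-01-02", "2024-01-05", "2024-01-03"],
    "amount": [120, 85, 120],
}
print(partition_metadata(partition))
# {'order_date': {'min': '2024-01-02', 'max': '2024-01-05', 'distinct': 3},
#  'amount': {'min': 85, 'max': 120, 'distinct': 2}}
```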
Benefits of Micro-partitioning
Snowflake’s approach to partitioning table data offers several advantages:
- Snowflake’s micro-partitions are generated automatically, eliminating the need for upfront definition or manual maintenance by users.
- Micro-partitions are small (50 MB to 500 MB of uncompressed data), enabling highly efficient Data Manipulation Language (DML) operations and fine-grained pruning for faster queries.
- Overlapping ranges of values among micro-partitions, coupled with their consistently small size, mitigate skew issues, enhancing overall data distribution.
- Within micro-partitions, columns are stored independently using columnar storage techniques. This enables swift scanning of individual columns, with only the columns referenced in a query being scanned.
- Additionally, columns within micro-partitions are compressed independently; Snowflake automatically determines the most efficient compression algorithm for each column, optimizing storage.
- Query pruning: when a query includes a filter predicate on a range of values, Snowflake ideally scans only the fraction of micro-partitions whose value ranges match the predicate, in proportion to the values actually accessed (see the sketch after this list).
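To illustrate why the range metadata enables pruning, here is a hedged Python sketch (a conceptual model, not Snowflake's internal implementation) that skips any micro-partition whose recorded min/max range for the filtered column cannot contain the predicate value.

```python
def prune_partitions(partition_metadata_list, column, value):
    """Return the indexes of micro-partitions that might contain `value`
    in `column`, using only per-partition min/max metadata. Partitions
    whose range cannot match are skipped (pruned) without reading data.
    Conceptual sketch only; not Snowflake's actual pruning logic."""
    survivors = []
    for idx, meta in enumerate(partition_metadata_list):
        col_range = meta[column]
        if col_range["min"] <= value <= col_range["max"]:
            survivors.append(idx)
    return survivors

# Metadata for three hypothetical micro-partitions of an `amount` column.
metadata = [
    {"amount": {"min": 0,   "max": 499}},
    {"amount": {"min": 500, "max": 999}},
    {"amount": {"min": 950, "max": 1400}},  # ranges may overlap across partitions
]
print(prune_partitions(metadata, "amount", 975))  # [1, 2]: partition 0 is pruned
```

Because the metadata alone decides which partitions can be skipped, a selective filter touches only a small fraction of the table's micro-partitions, which is what makes pruning cheap.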
Reference: Snowflake website.
We’ll take a deep dive into data clustering in the next article.