@b09dan
Created March 25, 2025 15:01
Too frequent SHOW OBJECTS queries in dbt

When running against Snowflake, dbt issues a separate SHOW OBJECTS IN <database>.<schema> query for each schema it works with. This behavior is a recognized concern, especially in projects with a large number of schemas. It can introduce significant overhead because the commands execute sequentially, each requiring a separate connection to Snowflake.

Optimization Strategies:

  1. Aggregate Object Retrieval:

    • Current Behavior: dbt executes individual SHOW OBJECTS commands for each schema, which can be time-consuming in environments with many schemas.
    • Proposed Improvement: Modify dbt to retrieve all objects at the database level using a single SHOW OBJECTS IN DATABASE command, then filter the results for the relevant schemas locally. This method reduces the number of queries sent to Snowflake, thereby decreasing latency and computational overhead.
    • Considerations: SHOW OBJECTS returns at most 10,000 records per call. In a database with more objects than that, a database-level query needs additional handling, such as paginating the results or narrowing the scope of the command.
  2. Adjusting Pagination Settings:

    • Background: By default, dbt pages through object results in chunks of 10,000 and handles up to 100,000 objects in total (10 pages).
    • Optimization: Where your dbt version lets you configure pagination, align the page size and page limit with your actual schema sizes. A schema with fewer objects than one page already needs only one query, so this adjustment reduces the number of queries only when schemas span multiple pages.
  3. Disabling Unnecessary Features:

    • Feature in Question: The persist_docs feature in dbt, when enabled, can lead to additional queries that may impact performance.
    • Action: If documentation persistence is not critical for your workflow, consider disabling the persist_docs feature to enhance performance.
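
The aggregate retrieval and pagination ideas in items 1 and 2 can be sketched in Python. This is a minimal illustration under stated assumptions, not dbt's actual implementation: fetch_page is a hypothetical stand-in for whatever runs the database-level SHOW OBJECTS query and returns one page of rows, and the function names are invented for this example. Each row is assumed to be a dict with name, schema_name, and kind keys, mirroring columns of the SHOW OBJECTS output.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]

PAGE_SIZE = 10_000  # per-call cap on SHOW OBJECTS results
MAX_PAGES = 10      # 10 pages x 10,000 rows = 100,000 objects


def fetch_all_objects(
    fetch_page: Callable[[int, int], List[Row]],
    page_size: int = PAGE_SIZE,
    max_pages: int = MAX_PAGES,
) -> List[Row]:
    """Collect rows page by page; stop at the first page that is not full.

    fetch_page(page_number, page_size) is a hypothetical callable that
    returns the rows of one page (page_number starts at 0).
    """
    rows: List[Row] = []
    for page_number in range(max_pages):
        page = fetch_page(page_number, page_size)
        rows.extend(page)
        if len(page) < page_size:
            return rows
    # Every page was full, so more objects may exist beyond the limit.
    raise RuntimeError(
        f"reached {max_pages} full pages of {page_size} rows; "
        "results may be truncated, narrow the scope"
    )


def group_by_schema(rows: Iterable[Row], wanted: Iterable[str]) -> Dict[str, List[Row]]:
    """Keep only rows from the wanted schemas, grouped by schema name.

    Matching is exact and case-sensitive; this sketch does not attempt
    to normalize identifier case.
    """
    grouped: Dict[str, List[Row]] = {schema: [] for schema in wanted}
    for row in rows:
        bucket = grouped.get(row["schema_name"])
        if bucket is not None:
            bucket.append(row)
    return grouped


if __name__ == "__main__":
    # In-memory stand-in for a database-level result set.
    fake_rows = [
        {"name": "ORDERS", "schema_name": "ANALYTICS", "kind": "TABLE"},
        {"name": "RAW_ORDERS", "schema_name": "STAGING", "kind": "TABLE"},
        {"name": "AUDIT_LOG", "schema_name": "INTERNAL", "kind": "TABLE"},
        {"name": "CUSTOMERS", "schema_name": "ANALYTICS", "kind": "VIEW"},
    ]

    def fake_fetch(page_number: int, page_size: int) -> List[Row]:
        start = page_number * page_size
        return fake_rows[start:start + page_size]

    all_rows = fetch_all_objects(fake_fetch, page_size=3, max_pages=5)
    for schema, objs in group_by_schema(all_rows, ["ANALYTICS", "STAGING"]).items():
        print(schema, [o["name"] for o in objs])
```

The trade-off is that one paged, database-level pass replaces one query per schema, at the cost of transferring rows for schemas the project does not use. The sketch raises an error rather than returning a possibly truncated list when every page comes back full, since a silently incomplete object list would be harder to diagnose than a failed run.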

Implementation Steps:

  • Stay Updated: Ensure you're using the latest version of dbt, as performance enhancements and optimizations are continually integrated.
  • Community Engagement: Participate in dbt's community forums and GitHub discussions to stay informed about ongoing developments and to contribute to potential solutions.

By adopting these strategies, you can mitigate the computational costs associated with dbt's interaction with Snowflake, leading to more efficient and cost-effective data transformations.
