Source
id: caching
namespace: company.team
tasks:
  - id: transactions
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/transactions.csv
  - id: products
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/products.csv
    description: This task pulls the full product catalog once per day. Because the
      catalog changes infrequently and contains over 200k rows, running it only
      daily avoids unnecessary strain on that production DB, while ensuring
      downstream joins always use up-to-date reference data.
    taskCache:
      enabled: true
      ttl: PT24H
  - id: duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    store: true
    inputFiles:
      products.csv: "{{ outputs.products.uri }}"
      transactions.csv: "{{ outputs.transactions.uri }}"
    sql: |-
      SELECT
        t.transaction_id,
        t.timestamp,
        t.quantity,
        t.sale_price,
        p.product_name,
        p.category,
        p.cost_price,
        p.supplier_id,
        (t.sale_price - p.cost_price) * t.quantity AS profit
      FROM
        read_csv_auto('transactions.csv') AS t
      JOIN
        read_csv_auto('products.csv') AS p
      USING (product_id);
About this blueprint
Tags: SQL, Kestra, Database
This flow illustrates Kestra's taskCache feature by caching the output of a task that extracts a large product catalog, reducing load on the source system.
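The TTL semantics behind taskCache can be sketched as a minimal time-to-live cache in Python. This is a conceptual illustration only, not Kestra's implementation: re-running a cached task within the TTL returns the stored output instead of executing the task again.

```python
# Conceptual sketch of TTL-based task caching (not Kestra's actual code):
# within the TTL, a repeated run returns the cached output without
# re-executing the task.
import time

class TtlCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # task_id -> (cached_at, output)

    def run(self, task_id, task_fn):
        now = time.monotonic()
        hit = self.store.get(task_id)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # still fresh: skip execution
        output = task_fn()           # missing or expired: execute and cache
        self.store[task_id] = (now, output)
        return output

cache = TtlCache(ttl_seconds=24 * 3600)  # equivalent of ttl: PT24H
calls = []

def download():
    # Stand-in for the expensive catalog download.
    calls.append(1)
    return "catalog.csv"

cache.run("products", download)
cache.run("products", download)   # served from cache, no second download
print(len(calls))  # 1
```

After the TTL elapses, the next run would execute the task again and refresh the cached output, which matches the "once per day" behavior described above.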
- The `transactions` task downloads recent transaction data without caching.
- The `products` task downloads the full product catalog and caches the result for 24 hours using the `taskCache` property, so downstream tasks use reasonably fresh reference data while avoiding repeated downloads within the TTL.
- The `duckdb` task joins the transaction and product data with DuckDB SQL, calculates the profit per transaction, and stores the result.
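For readers who want to try the join logic locally without running the flow, here is a minimal sketch of the same profit calculation. It uses Python's built-in sqlite3 instead of DuckDB (an assumption made for portability) and a few hypothetical rows in place of the blueprint's CSV files.

```python
# Minimal local sketch of the profit query from the duckdb task, using
# sqlite3 (not DuckDB) and tiny made-up rows instead of the real datasets.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE transactions (
        transaction_id INTEGER, timestamp TEXT,
        quantity INTEGER, sale_price REAL, product_id INTEGER
    );
    CREATE TABLE products (
        product_id INTEGER, product_name TEXT, category TEXT,
        cost_price REAL, supplier_id TEXT
    );
    INSERT INTO transactions VALUES
        (1, '2024-01-01', 2, 15.0, 101),
        (2, '2024-01-02', 1, 40.0, 102);
    INSERT INTO products VALUES
        (101, 'Widget', 'Tools', 10.0, 'S1'),
        (102, 'Gadget', 'Toys', 25.0, 'S2');
""")

# Same join shape as the flow's SQL: per-transaction profit is
# (sale_price - cost_price) * quantity.
rows = con.execute("""
    SELECT t.transaction_id, p.product_name,
           (t.sale_price - p.cost_price) * t.quantity AS profit
    FROM transactions AS t
    JOIN products AS p USING (product_id)
    ORDER BY t.transaction_id
""").fetchall()

print(rows)  # [(1, 'Widget', 10.0), (2, 'Gadget', 15.0)]
```

The flow's version differs only in that DuckDB reads the two CSV files via `read_csv_auto` and Kestra stores the query result because `store: true` is set.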