Lineage Visualization

DJ provides powerful lineage visualization at both model and column levels, helping you understand data flow and dependencies across your dbt project.

Data Explorer

The Data Explorer visualizes your dbt models as an interactive lineage graph, showing how models connect and depend on each other.

Note: The Data Explorer displays model-level lineage. For column-level lineage, see the Column Lineage section below.

Opening Data Explorer

Method 1: Command Palette

Press Cmd+Shift+P / Ctrl+Shift+P
Type "DJ: Data Explorer"
Press Enter

Method 2: Keyboard Shortcut

Mac: Cmd+Shift+L
Windows/Linux: Ctrl+Shift+L

Method 3: Model Context Menu

Right-click a model file (.model.json, .sql, or .yml) in the explorer
Select "DJ: Data Explorer"

Opening Data Explorer, navigating/zooming the graph, expanding nodes, and running models from the graph

Understanding the Model Lineage Graph

The Data Explorer shows your dbt models as an interactive directed acyclic graph (DAG):

Visual Elements:

Nodes: Each model is represented as a colored node
Edges: Arrows show dependencies (data flow)
Layers: Models organized by type (staging → intermediate → mart)

Example from jaffle_shop:

The lineage graph shows how int__sales__orders__enriched depends on:

stg__sales__orders__standardized (staging layer)
stg__customers__profiles__clean (staging layer)
stg__sales__stores__locations (staging layer)

And how mart__sales__reporting__revenue depends on int__sales__orders__enriched.

Navigation Controls

Pan and Zoom:

Pan: Click and drag canvas
Zoom: Scroll wheel or pinch gesture
Fit View: Click "Fit to View" button

Node Selection:

Click node to highlight upstream/downstream
Click to open model file and switch to the model's lineage
Hover for quick info tooltip

Toolbar Features

The Data Explorer toolbar provides additional controls for customizing your view:

Auto-Sync Toggle:

Enable "Auto-sync" to automatically update the graph when you switch between model files
When enabled, opening a different .model.json, .sql, or .yml file updates the graph to show that model's lineage
Disabled by default to prevent unwanted graph changes
Can also be configured via dj.dataExplorer.autoRefresh setting

Split View Mode:

Click the split view icon to enable side-by-side layout
Shows the lineage graph alongside query results or compilation logs
Useful for comparing model structure with query output
Click again to return to single-panel view

Refresh Button:

Manually refresh the lineage graph to reflect latest changes
Useful after running dbt parse or modifying model dependencies
Icon animates while loading

Exploring the Graph

By default, the Data Explorer displays a model with its immediate upstream (parent) and downstream (child) models. You can progressively explore the lineage by expanding individual nodes.

Expanding Nodes:

Each node displays expand buttons (+ icons) when it has additional upstream or downstream models not currently shown:

Upstream expand button (left side): Shows models that feed into this model
Downstream expand button (right side): Shows models that depend on this model
Click the + button to reveal additional levels of lineage
Expansion happens per-node, allowing you to explore specific branches

Example: Starting from mart__sales__reporting__revenue:

Initially shows immediate upstream: int__sales__orders__enriched
Click upstream expand on int__sales__orders__enriched to reveal:
- stg__sales__orders__standardized
- stg__customers__profiles__clean
- stg__sales__stores__locations
Click upstream expand on staging models to reveal source tables:
- raw_orders, raw_customers, raw_stores

This progressive expansion lets you focus on relevant parts of the lineage without overwhelming the view.

Model Actions

Each model node in the Data Explorer displays icon buttons for quick actions:

View Columns:

Click the columns icon (grid icon) on any node
Opens Column Lineage panel for that model
Shows column-level dependencies and transformations

Compile Model:

Click the gear icon on a model node
Compiles the model and displays compilation logs
Shows generated SQL in the results panel
Note: Source nodes don't have compile/run buttons

Run Query:

Click the play circle icon on a model node
Executes a query against the compiled model
If model needs compilation, automatically compiles first
Shows query results with data preview and SQL view

Expand Lineage:

Click the + icon on nodes that have additional upstream/downstream models
Progressively reveals more of the lineage graph
Left side expands upstream dependencies
Right side expands downstream dependents

Select Model:

Click on a node to select it and view details
Opens the model file in the editor
Model becomes the active context for the graph

Results and Compilation Panels

The Data Explorer includes panels for viewing query results and compilation logs:

Query Results Panel:

Appears when you run a query from a model node
Data View: Shows query results in a tabular format
- Displays column names and row data
- Shows row count and execution time
- Supports scrolling through results
SQL View: Displays the compiled SQL that was executed
- Syntax-highlighted SQL
- Shows the exact query sent to the database
- Useful for debugging and understanding generated SQL
Toggle between Data and SQL views using the tab buttons
Maximize/restore the panel for better visibility

Compilation Logs Panel:

Shows real-time compilation output when compiling a model
Displays dbt compilation messages and any errors
Automatically appears when compilation starts
Shows success/failure status with visual indicators
Can be closed after reviewing the output

Column Lineage

Opening Column Lineage

Method 1: From Model File

Open any .model.json file
Click "Column Lineage" button in editor toolbar

Method 2: Command Palette

With .model.json file active
Run "DJ: Column Lineage"

Method 3: From Data Explorer

Click the columns icon button on a model node
Column Lineage opens for that model

Opening Column Lineage, understanding transformation types (raw/passthrough/renamed/derived), and tracing column origins

Understanding Column Flow

Column Lineage traces individual columns through transformations:

Transformation Types:

Raw (⬜ Gray)
- Source columns from raw tables
- Not yet transformed
- Example: Columns in raw_orders, raw_customers
Passthrough (🟢 Green)
- Column passes unchanged through transformation
- Example: customer_id → customer_id
Renamed (🟡 Yellow)
- Column renamed but values unchanged
- Example: id → customer_id
Derived (🟠 Orange)
- New column from expression, calculation, or aggregation
- Examples:
  - Calculation: first_name || ' ' || last_name → full_name
  - Aggregation: SUM(amount) → total_revenue
  - Date extraction: EXTRACT(YEAR FROM order_date) → order_year

Example from jaffle_shop:

In mart__sales__reporting__revenue:

order_id is passthrough from int__sales__orders__enriched
order_year is derived from EXTRACT(YEAR FROM order_date)
total_dollars is passthrough (but originally derived in intermediate layer)

Tracing Column Origins

Backward Tracing:

Click a column in the target model
Graph highlights the column and its immediate upstream connections
For derived columns: Hover over the derived badge area to see the transformation
A tooltip displays the SQL expression or transformation logic
Follow the highlighted path backwards to source

Viewing Transformations:

Derived columns with expressions show a tooltip when you hover over the badge area
The tooltip displays:
- SQL expression used to create the column
- Aggregation functions (SUM, COUNT, AVG, etc.)
- Calculations and formulas
- Date extractions and transformations
Simple mappings (raw, passthrough, renamed) don't show transformation tooltips since the relationship is direct

Example: Tracing order_total_dollars in int__sales__orders__enriched:

Shows it's derived from: CAST(stg__sales__orders__standardized.order_total_cents AS DECIMAL(10,2)) / 100.0
Which comes from order_total_cents in staging
Which comes from order_total in the source table raw_orders

Forward Tracing:

Select column
See all downstream uses
Identify impacted models

Example: Tracing order_date forward shows it's used in:

mart__sales__reporting__revenue (for year/month/quarter extraction)
int__sales__analytics__monthly_revenue (for rollup)
int__sales__analytics__rolling_30day (for lookback window)

Column Lineage Graph

Visual Elements:

Column Nodes: Individual columns displayed as separate nodes with color-coded badges
Model Information: Each node shows the model name that contains the column
Transformation Edges: Lines connecting columns across models
Color Coding: Badges show transformation type (raw/passthrough/renamed/derived)
Expression Tooltips: Hover over derived column badges to see transformation details

Viewing Details:

Hover column node: Highlights the column and its immediate connections
Hover derived badge: Shows SQL expression for derived columns in a tooltip
Hover edge: Highlights immediately connected nodes only

Bulk Select Expansion

DJ automatically expands bulk column selects:

all_from_model

{ "type": "all_from_model", "model": "stg__sales__orders__standardized" }

Expands to all columns from specified model

dims_from_model

{ "type": "dims_from_model", "model": "stg__sales__orders__standardized" }

Expands to dimension columns only

fcts_from_model

{ "type": "fcts_from_model", "model": "stg__sales__orders__standardized" }

Expands to fact/measure columns only

Example from jaffle_shop:

In int__products__analytics__product_popularity:

{
  "type": "dims_from_model",
  "model": "stg__sales__items__order_details",
  "include": ["product_sku", "product_type", "item_id"]
}

Column Lineage shows this expands to three dimension columns and traces each back to its source.

Finding Column Origin

From any file:

Place cursor on column name in SQL or .model.json
Run "DJ: Find Column Origin"
Column Lineage opens with column highlighted
Trace back to source table

Use Cases

Impact Analysis:

"If I change this source column, what breaks?"
Trace downstream to find affected models

Example: Changing ordered_at in raw_orders:

Affects order_date in stg__sales__orders__standardized
Which affects order_year, order_month, order_quarter in mart__sales__reporting__revenue
Which affects multiple dashboard queries

Debugging:

"Where does this value come from?"
Trace upstream to source table

Example: Investigating incorrect revenue_tier classification:

Trace to expression: CASE WHEN order_total_dollars >= 15.00 THEN 'High'...
Trace order_total_dollars to intermediate layer
Trace to cents-to-dollars conversion in staging
Trace to order_total in source

Documentation:

"What transformations are applied?"
Export lineage graph for documentation

Optimization:

"Is this column used downstream?"
Identify unused columns to remove

Example: Check if tax_paid_cents is used:

Trace forward through all models
Find it's only used in mart__sales__reporting__profitability
Safe to remove from other intermediate models

Lineage Configuration

Auto-Refresh Settings

Both Data Explorer and Column Lineage support auto-refresh on editor tab changes.

How it works:

Switch to a .model.json, .sql, or .yml file
The active lineage view (Data Explorer or Column Lineage) detects the file change
Graph updates to show the current model's context

Configuration:

{
  // Auto-refresh Data Explorer on tab change
  "dj.dataExplorer.autoRefresh": true,

  // Auto-refresh Column Lineage on tab change
  "dj.columnLineage.autoRefresh": true
}

Defaults:

dj.dataExplorer.autoRefresh: false (disabled by default)
dj.columnLineage.autoRefresh: true (enabled by default)

Performance Tips

Large Projects (100+ models):

Use progressive node expansion to explore specific branches
Avoid expanding all nodes at once
Disable auto-refresh if graph updates are slow

Deep Lineages (10+ levels):

Expand nodes progressively rather than all at once
Zoom in on specific sections of interest
Focus on critical paths through your pipeline

Troubleshooting

Lineage Not Showing

Check manifest.json:

Ensure dbt has been run: dbt parse in terminal
Verify target/manifest.json exists in your dbt project
Check for compilation errors in Output panel

Column Lineage Empty

Ensure model is compiled:

Open .model.json file
Run "DJ: Compile Model" from Command Palette
Check for validation errors in editor

Check dependencies:

Verify upstream models exist
Ensure manifest includes dependencies
Run dbt parse in terminal to rebuild manifest

Integration with dbt

DJ lineage graphs use dbt's manifest.json:

Parses target/manifest.json for dependencies
Respects dbt's ref() and source() relationships
Updates automatically when manifest changes
Compatible with all dbt features (packages, macros, etc.)

Next Steps

Visual Editor Guide - Create models visually
Tutorial - Build a complete pipeline
Model Types - Understand model dependencies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lineage Visualization

Data Explorer

Opening Data Explorer

Method 1: Command Palette

Method 2: Keyboard Shortcut

Method 3: Model Context Menu

Understanding the Model Lineage Graph

Navigation Controls

Toolbar Features

Exploring the Graph

Model Actions

Results and Compilation Panels

Column Lineage

Opening Column Lineage

Method 1: From Model File

Method 2: Command Palette

Method 3: From Data Explorer

Understanding Column Flow

Tracing Column Origins

Column Lineage Graph

Bulk Select Expansion

Finding Column Origin

Use Cases

Lineage Configuration

Auto-Refresh Settings

Performance Tips

Troubleshooting

Lineage Not Showing

Column Lineage Empty

Integration with dbt

Next Steps

FilesExpand file tree

LINEAGE.md

Latest commit

History

LINEAGE.md

File metadata and controls

Lineage Visualization

Data Explorer

Opening Data Explorer

Method 1: Command Palette

Method 2: Keyboard Shortcut

Method 3: Model Context Menu

Understanding the Model Lineage Graph

Navigation Controls

Toolbar Features

Exploring the Graph

Model Actions

Results and Compilation Panels

Column Lineage

Opening Column Lineage

Method 1: From Model File

Method 2: Command Palette

Method 3: From Data Explorer

Understanding Column Flow

Tracing Column Origins

Column Lineage Graph

Bulk Select Expansion

Finding Column Origin

Use Cases

Lineage Configuration

Auto-Refresh Settings

Performance Tips

Troubleshooting

Lineage Not Showing

Column Lineage Empty

Integration with dbt

Next Steps