
[CUDA] Extend Pad support through opset 25 with wrap mode #27774

Draft
Copilot wants to merge 3 commits into main from copilot/evaluate-cuda-kernels-wrap-mode

Conversation


Copilot AI commented Mar 19, 2026

Description

This PR consolidates PRs #27416 and #27708 to extend CUDA Pad kernel support through opset 25, including a wrap mode implementation.

Motivation and Context

The CUDA execution provider (EP) previously registered the Pad kernel only up to opset 18 and did not implement wrap mode. When an ONNX model exported with opset 19+ ran on the CUDA EP, the Pad node fell back to the CPU execution provider, which degraded performance significantly. This PR aligns CUDA Pad registration with the evolution of the ONNX Pad schema through opset 25 and provides a correct wrap mode implementation.

Related issues: #26393
Related PRs: #27416, #27708

Summary of Changes

Kernel registration and opset coverage

| File | Change |
| --- | --- |
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Adds CUDA Pad kernel registrations for opset ranges 18, 19-20, 21-22, 23, 24, and 25. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Registers the new Pad kernel versions in the CUDA EP registry under the existing per-opset sections. |
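
The effect of splitting the registration into version ranges can be illustrated with a small lookup. This is only a sketch: `OpsetRange`, `PadRanges`, and `FindPadRange` are hypothetical names, not ONNX Runtime types or macros, and the final range is treated as open-ended (25+) to match the breakdown given for docs/OperatorKernels.md. Registrations for opsets below 18 already exist in the CUDA EP and are omitted here.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical stand-in for one versioned kernel registration.
struct OpsetRange {
  int since;  // first opset served by this registration
  int until;  // last opset served (inclusive)
};

// Ranges listed in this PR: 18, 19-20, 21-22, 23, 24, and 25+.
// Assumption: the last entry is open-ended, as in the docs breakdown.
inline const std::vector<OpsetRange>& PadRanges() {
  static const std::vector<OpsetRange> ranges = {
      {18, 18}, {19, 20}, {21, 22}, {23, 23}, {24, 24},
      {25, std::numeric_limits<int>::max()}};
  return ranges;
}

// Returns the registration that serves `opset`, or nullopt if this
// sketch has no entry for it (earlier opsets are deliberately left out).
inline std::optional<OpsetRange> FindPadRange(int opset) {
  for (const OpsetRange& r : PadRanges()) {
    if (opset >= r.since && opset <= r.until) return r;
  }
  return std::nullopt;
}
```

With ranges like these, a model exported at opset 20 resolves to the 19-20 registration instead of finding no CUDA kernel, which is the situation that previously forced the CPU fallback.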

CUDA Pad implementation

| File | Change |
| --- | --- |
| `onnxruntime/core/providers/cuda/tensor/pad_impl.h` | Extends the Pad kernel interface to pass effective sliced extents and per-axis input offsets. |
| `onnxruntime/core/providers/cuda/tensor/pad_impl.cu` | Adds CUDA wrap mode using a `WrapCoordinate` device helper with `if constexpr` compile-time specialization. Removes dead wrap code from the NCHW-specialized kernel path. |
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Computes effective sliced input extents/offsets for wrap behavior with negative pads. Bypasses the NCHW fast-path for wrap mode and routes through the generic implementation. |
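
To make the interaction between the sliced extents, the per-axis offsets, and the wrap mapping concrete, here is a minimal host-side C++17 sketch for a single axis. It is not the code from this PR: the signature of `WrapCoordinate` is not shown in the description, so `AxisWindow`, `SliceAxis`, and `MapCoordinate` are hypothetical names. It also assumes that negative pads first slice the input and that wrapping then repeats only the surviving window. In a CUDA kernel the mapping function would additionally be marked `__device__`.

```cpp
#include <cassert>
#include <cstdint>

enum class PadMode { Edge, Wrap };

// Effective input window along one axis after negative pads are
// applied as a slice (assumed semantics, see the lead-in).
struct AxisWindow {
  int64_t offset;  // first input index that survives slicing
  int64_t extent;  // number of surviving input elements
};

inline AxisWindow SliceAxis(int64_t input_dim, int64_t pad_begin,
                            int64_t pad_end) {
  const int64_t cut_begin = pad_begin < 0 ? -pad_begin : 0;
  const int64_t cut_end = pad_end < 0 ? -pad_end : 0;
  return {cut_begin, input_dim - cut_begin - cut_end};
}

// Maps an output index to an input index along one axis.
// `if constexpr` selects the branch at compile time, so each
// instantiation contains only the code for its own mode.
template <PadMode mode>
inline int64_t MapCoordinate(int64_t out_idx, int64_t pad_begin,
                             AxisWindow w) {
  const int64_t lead = pad_begin > 0 ? pad_begin : 0;
  const int64_t rel = out_idx - lead;  // position relative to the window
  if constexpr (mode == PadMode::Wrap) {
    int64_t m = rel % w.extent;  // C++ remainder may be negative
    if (m < 0) m += w.extent;    // shift into [0, extent)
    return w.offset + m;
  } else {
    // Edge: clamp to the first or last element of the window.
    const int64_t clamped =
        rel < 0 ? 0 : (rel >= w.extent ? w.extent - 1 : rel);
    return w.offset + clamped;
  }
}
```

For an axis of length 5 with pads (2, 1), the wrap mapping reads input indices 3, 4, 0, 1, 2, 3, 4, 0 for output indices 0 through 7. With pads (-1, 2), the window becomes input indices 1 through 4, and under the stated assumption the two padded positions read indices 1 and 2 rather than 0 and 1.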

Documentation

| File | Change |
| --- | --- |
| `docs/OperatorKernels.md` | Updates the CUDA Pad kernel opset coverage to reflect the new version splits (25+, 24, 23, [21,22], [19,20], 18) up to opset 25. |

Test coverage

| File | Change |
| --- | --- |
| `onnxruntime/test/providers/cpu/tensor/pad_test.cc` | Adds CUDA-only Pad coverage for edge mode across opsets 18-25 and wrap mode across opsets 19-25. Updates the existing wrap test comment. |

Checklist

  • Tests added/updated
  • No breaking changes


ShirasawaSama and others added 2 commits March 19, 2026 19:37
…wrap impl, CUDA-only tests

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@ShirasawaSama

Thanks!

@tianleiwu

@copilot, please update docs/OperatorKernels.md. You can follow PR #27708.

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

Copilot AI commented Mar 19, 2026

> @copilot, please update docs/OperatorKernels.md. You can follow PR #27708.

In commit f5e748c, updated docs/OperatorKernels.md with the CUDA Pad opset breakdown (25+, 24, 23, [21,22], [19,20], 18), following the pattern used in PR #27708.

@tianleiwu

/azp run Windows GPU Doc Gen CI Pipeline, Python format

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

3 participants