Tentatively eliminate graph break overhead #3741
Conversation
narendasan left a comment:
Can you include similar changes to the C++ runtime as well?
```python
self._caller_stream: Optional[torch.cuda.Stream] = None
self._engine_stream: Optional[torch.cuda.Stream] = None
self.output_tensors: Optional[List[torch.Tensor]] = None
self.sync_stream = True
```
Just inherit stream from PyTorch / input tensors
```diff
  # For shape tensors, we use CPU pointers and for data tensors, we use GPU pointers
  # as per TensorRT requirements
- if self.engine.is_shape_inference_io(input_name):
+ if self.is_shape_inference_io[i]:
```
Probably better to make this a dictionary keyed on input names, instead of implicitly relying on the input order staying the same over time.
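The suggestion above could look something like the following sketch. All names here (`build_shape_io_map`, `fake_is_shape_io`) are illustrative stand-ins, not the actual torch-tensorrt code: the point is that keying per-input metadata on the input name makes the lookup robust to input reordering, whereas a positional list silently breaks if the order changes.

```python
# Hypothetical sketch: cache the shape-inference flag per input *name*
# rather than per positional index.

def build_shape_io_map(input_names, is_shape_io_fn):
    """Build {input_name: bool} once at engine setup time."""
    return {name: is_shape_io_fn(name) for name in input_names}

# Stand-in for engine.is_shape_inference_io(name); in the real runtime this
# would be a TensorRT engine query.
def fake_is_shape_io(name):
    return name.startswith("shape_")

names = ["shape_dims", "data_input"]
shape_io = build_shape_io_map(names, fake_is_shape_io)

# Lookup by name is order-independent:
assert shape_io["shape_dims"] is True
assert shape_io["data_input"] is False
```

With this, the hot path becomes `if shape_io[input_name]:` and no longer depends on the index `i` matching the order used at setup time.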
```python
    input_name, tuple(contiguous_inputs[i].shape)
)
if shape_changed:
    self.context.set_input_shape(
```
Can we safely assume execution context holds shape between inference calls?
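If the execution context does hold shapes between calls, the change above amounts to a shape cache like the following sketch (`ShapeCache` is a hypothetical name, not the actual torch-tensorrt class): `set_input_shape` is only invoked when the shape recorded for that input actually differs from the previous call.

```python
# Hypothetical sketch: skip context.set_input_shape() when the input shape
# is unchanged since the last inference call.

class ShapeCache:
    def __init__(self):
        self._shapes = {}  # {input_name: tuple(shape)}

    def shape_changed(self, input_name, shape):
        """Return True (and update the cache) if `shape` differs from the
        last shape recorded for `input_name`."""
        shape = tuple(shape)
        if self._shapes.get(input_name) != shape:
            self._shapes[input_name] = shape
            return True
        return False

cache = ShapeCache()
assert cache.shape_changed("x", (1, 3, 224, 224)) is True   # first call: must set shape
assert cache.shape_changed("x", (1, 3, 224, 224)) is False  # unchanged: skip set_input_shape
assert cache.shape_changed("x", (2, 3, 224, 224)) is True   # batch changed: set again
```

Note this is only safe if nothing else (e.g. another module sharing the context) mutates the context's input shapes between calls, which is exactly the assumption the reviewer is questioning.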
```python
# Only set the requires_unique_output flag for the last TRT Module when user has access to the output tensor
if trt_module and settings.use_python_runtime:
    trt_module.set_requires_unique_output(True)
```
How is this going to work with serialization in C++?
Also, make the name clearer, e.g. trt_module.module_is_output_operator or trt_module.requires_unowned_output_tensor.
Yeah, once we confirm all the changes in PyTorch are valid, I can make changes accordingly.
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: