Insights: pytorch/pytorch
Overview
1 Release published by 1 person
-
v2.5.1 PyTorch 2.5.1: bug fix release
published
Oct 29, 2024
2 Pull requests merged by 1 person
-
Bump rexml from 3.3.3 to 3.3.9 in /ios/TestApp
#139088 merged
Oct 28, 2024 -
Bump certifi from 2024.2.2 to 2024.7.4 in /tools/build/bazel
#130173 merged
Oct 28, 2024
213 Pull requests opened by 112 people
-
[hop free symbols] replace ctx.save_for_backward to support symints/ints
#138737 opened
Oct 23, 2024 -
Switch back to the default checkout action
#138739 opened
Oct 23, 2024 -
[inductor] Allow unbacked symint fallback in stride reordering
#138741 opened
Oct 23, 2024 -
Raise error for int64 and bool dtypes in nanmean, even for empty tensors
#138745 opened
Oct 23, 2024 -
Fix custom obj being treated as input in export
#138749 opened
Oct 23, 2024 -
[SymmetricMemory] expose signal_pads as tensors in Python
#138754 opened
Oct 23, 2024 -
[SymmetricMemory] introduce a binding for cuMemset32Async
#138755 opened
Oct 23, 2024 -
[inductor] don't fuse two nodes if likely increase peak memory
#138756 opened
Oct 23, 2024 -
Create ciflow/inductor-periodic
#138763 opened
Oct 23, 2024 -
[export] Add support for symbool to make it usable for torch.cond
#138765 opened
Oct 23, 2024 -
config: create Config objects with JK support
#138766 opened
Oct 23, 2024 -
justknobs: Remove JustKnobsConfig and justknobs_feature
#138767 opened
Oct 23, 2024 -
[ONNX] Use TracedONNXFunction op signature to promote inputs to tensors
#138770 opened
Oct 23, 2024 -
[dynamo] add SymNode bitwise and/or
#138777 opened
Oct 24, 2024 -
AOTI Minifier demo
#138780 opened
Oct 24, 2024 -
Add BFloat16 support and use a new pack method for flash attention forward kernel
#138783 opened
Oct 24, 2024 -
[pytorch][inductor] Adding tf32x3 to matmul
#138785 opened
Oct 24, 2024 -
Added option to control number of kernel options displayed
#138788 opened
Oct 24, 2024 -
Add SwiGLU Activation class
#138790 opened
Oct 24, 2024 -
[Docs] Optimize parameter description to declare allowed type (3/N)
#138798 opened
Oct 24, 2024 -
Use clang18 image for ASAN workflows
#138801 opened
Oct 24, 2024 -
[Intel GPU] Add device guard for XPU structured operator in torchgen
#138802 opened
Oct 24, 2024 -
Fix max_unpool2d op signature with stride and padding inputs
#138805 opened
Oct 24, 2024 -
[inductor] support linear+binary foldinig for freezing path
#138807 opened
Oct 24, 2024 -
Fix bug of torch.nn.functional.kl_div when broadcast happened
#138810 opened
Oct 24, 2024 -
dynamo: guard on FSDP module parameters
#138819 opened
Oct 24, 2024 -
[Testing] update XLA pin
#138825 opened
Oct 24, 2024 -
[Testing] Update torchbench pin
#138826 opened
Oct 24, 2024 -
Prioritize building with libgomp over libomp
#138834 opened
Oct 24, 2024 -
[cond] make cond not throw warnings on constant pred in eager mode
#138837 opened
Oct 24, 2024 -
Move inner loop of _create_symbolic_sizes_strides_storage_offset into its own method
#138843 opened
Oct 24, 2024 -
Simplify _compute_symbolic_stride()
#138844 opened
Oct 24, 2024 -
[export] preserve & respect dynamic decorators on input tensors
#138850 opened
Oct 24, 2024 -
Improvements for associative_scan - slicing of xs
#138858 opened
Oct 24, 2024 -
(wip) [CUTLASS] pull out static shapes as kernel args
#138859 opened
Oct 24, 2024 -
Changed calls to yaml.load to explicitly use the safe loader.
#138861 opened
Oct 24, 2024 -
[Pipelining] Relax multi-stage constraint
#138862 opened
Oct 24, 2024 -
Debug test failure for separate I, W execution
#138863 opened
Oct 24, 2024 -
Update tensorify pass to specialize symfloats we didn't tensorify away
#138868 opened
Oct 25, 2024 -
[Dynamo] destroy old extra state
#138870 opened
Oct 25, 2024 -
[AOTInductor] Fix memory access in update_constant_buffer on CPU
#138871 opened
Oct 25, 2024 -
[Pipelining] Update schedules to use I, B actions.
#138886 opened
Oct 25, 2024 -
[Test][Do not merge] Add BRGEMM API versioning to be compatible with different oneDNN versions
#138887 opened
Oct 25, 2024 -
[Test][Do not merge] Upgrade oneDNN to v3.6
#138889 opened
Oct 25, 2024 -
[Dynamo] Add explaination for not support usecase to fix todo in dynamo
#138890 opened
Oct 25, 2024 -
Add some error messages for flexattention
#138891 opened
Oct 25, 2024 -
Add CUDA 12.6 to Binaries Matrix
#138899 opened
Oct 25, 2024 -
rocm CI fixes
#138900 opened
Oct 25, 2024 -
[Intel GPU][Windows] Remove hardcoded SYCL runtime version
#138904 opened
Oct 25, 2024 -
Disable c10::optional macros
#138912 opened
Oct 25, 2024 -
Allow PyTorch to be built with USE_DISTRIBUTED=0
#138913 opened
Oct 25, 2024 -
only unspecialize float during eager
#138915 opened
Oct 25, 2024 -
unspecialize float for all torch.compile callers
#138918 opened
Oct 25, 2024 -
[aoti] Create dir when using aoti_compile and package if it isn't already there
#138919 opened
Oct 25, 2024 -
always unspecialize float
#138922 opened
Oct 25, 2024 -
Move pippy to training IR
#138923 opened
Oct 25, 2024 -
[Pipelining] Optimize ready_to_schedule logic
#138924 opened
Oct 25, 2024 -
[Pipelining] Remove unused special case from simulator
#138928 opened
Oct 25, 2024 -
[export] add mark_unbacked to export dynamic_shapes API
#138929 opened
Oct 25, 2024 -
TEST IGNORE: async compile
#138930 opened
Oct 25, 2024 -
Gaussian nll loss scalar variance support
#138931 opened
Oct 25, 2024 -
Fix weights_only for BUILD instructions for user allowlisted objects with __slots__
#138936 opened
Oct 25, 2024 -
[RFC] Add caching heuristic for skipping fast graphs
#138938 opened
Oct 25, 2024 -
[Inductor] refactor the require_stride_order logic in GraphLowering
#138946 opened
Oct 25, 2024 -
[ROCm] CK Flash Attention Backend
#138947 opened
Oct 25, 2024 -
[TF32][Inductor] Account for TF32 in `test_inductor_layout_optimization_input_mutations`
#138948 opened
Oct 25, 2024 -
Enable py dispatcher in FakeTensorMode.__enter__
#138953 opened
Oct 25, 2024 -
config: Add env_name_default and env_name_force to Config
#138956 opened
Oct 25, 2024 -
Turn off PRINT_AUTOTUNE by default
#138963 opened
Oct 26, 2024 -
Tensor .cuda() very slow with specific array sizes
#138964 opened
Oct 26, 2024 -
Use -Weverything
#138966 opened
Oct 26, 2024 -
Don't uselessly recompute axiom dict every static eval call
#138967 opened
Oct 26, 2024 -
support nesting of suppress_guards, suppress guards when generated compiled autograd graph
#138968 opened
Oct 26, 2024 -
[BE] typing for decorators - library
#138969 opened
Oct 26, 2024 -
I fixed the conflict between Qt and libtorch
#138982 opened
Oct 26, 2024 -
Caomengxuan666 patch 2
#138983 opened
Oct 26, 2024 -
Fix `USE_STATIC_MKL` lost functionality
#138996 opened
Oct 26, 2024 -
[c10d] Remove ProcessGroupGloo + CUDA tests
#138998 opened
Oct 26, 2024 -
Unify shallow_copy_and_detach overloads by passing c10::VariableVersion
#138999 opened
Oct 27, 2024 -
automatic_dynamic_local_pgo
#139001 opened
Oct 27, 2024 -
[PGNCCL] Make sure we do not use split for P2P comm creation
#139013 opened
Oct 27, 2024 -
Revert "[dynamo] Simplify creation of VariableTrackers (#135714)"
#139014 opened
Oct 27, 2024 -
removed stale use_lazy_graph_module
#139015 opened
Oct 27, 2024 -
Check for valid inputs to legacy _remove_batch_dim()
#139016 opened
Oct 27, 2024 -
Add missing description for input parameters
#139020 opened
Oct 27, 2024 -
[Intel GPU] Current libtorch_xpu.so double linked libtorch_xpu_ops.a,
#139024 opened
Oct 28, 2024 -
[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tree XPU structured GEMM OPs.
#139025 opened
Oct 28, 2024 -
[WIP][AOTI XPU] Update torch-xpu-ops to extend c_shim_xpu layer with out-of-tree ATen OPs
#139026 opened
Oct 28, 2024 -
Fix torch.std_mean return Nan mean value, dismatch with torch.mean
#139035 opened
Oct 28, 2024 -
[rfc][dynamo] "skip_guard_eval" stance for power users
#139038 opened
Oct 28, 2024 -
Update torch-xpu-ops commit pin
#139041 opened
Oct 28, 2024 -
fix collective name of RECORD_PARAM_COMMS_DATA
#139042 opened
Oct 28, 2024 -
enable concat linear with mkldnn linear by flag
#139048 opened
Oct 28, 2024 -
Update slow tests
#139051 opened
Oct 28, 2024 -
[AOTI] Use `len(serialized_weights)` when calculating `consts_size`
#139054 opened
Oct 28, 2024 -
[aotd] Fuse tangents subclasses runtime traversals
#139068 opened
Oct 28, 2024 -
[Flex Attention] Dynamic shapes fix
#139069 opened
Oct 28, 2024 -
Type check delta value in HuberLoss
#139070 opened
Oct 28, 2024 -
[aoti] Add masked_select to cshim
#139071 opened
Oct 28, 2024 -
[PyTorch] Move bf16_gemv_trans to ReducedPrecisionFloatGemvFastPathKernel
#139081 opened
Oct 28, 2024 -
[PyTorch] Add efficient isnan for NEON float
#139082 opened
Oct 28, 2024 -
[PyTorch] Add efficient isnan for NEON half
#139083 opened
Oct 28, 2024 -
[PyTorch] Extract value_type-generic NEON Vectorized<Half> functions to CRTP base class
#139084 opened
Oct 28, 2024 -
[ROCm] Fix largeIndexBlockSize
#139087 opened
Oct 28, 2024 -
[PyTorch] Add Vectorized<c10::BFloat16> specialization for ARM
#139090 opened
Oct 28, 2024 -
[DO NOT LAND][EXPERIMENT][dynamo] Remove all instances in `dynamo_expected_failures` that are from `test/dynamo`
#139091 opened
Oct 28, 2024 -
[AOTI][refactor] Move stack allocation related configs
#139093 opened
Oct 28, 2024 -
[invoke_subgraph][aot_autograd_cache] Cache AC HOP
#139094 opened
Oct 28, 2024 -
[aotd] coerce_same_metadata_as_tangent with expected_type for e.g.AsyncCollectiveTensor
#139095 opened
Oct 28, 2024 -
[WIP] functional autograd + compiled autograd
#139098 opened
Oct 28, 2024 -
Fix and update the Quantization docs
#139100 opened
Oct 28, 2024 -
[Dynamo changes] Invoke Quant
#139101 opened
Oct 28, 2024 -
[Inductor changes] Invoke Quant
#139102 opened
Oct 28, 2024 -
TESTING
#139105 opened
Oct 28, 2024 -
Add utility to get all unsafe globals in checkpoint
#139106 opened
Oct 28, 2024 -
Add a hook for when AC caches a saved tensor
#139109 opened
Oct 28, 2024 -
Update linter targets to Python-3.9
#139119 opened
Oct 28, 2024 -
[compiled autograd][functional] partially inline into accumulate grad
#139121 opened
Oct 28, 2024 -
[draft] add a c10d/PG restart UT
#139123 opened
Oct 28, 2024 -
[export] Modify swap to take in lambda
#139127 opened
Oct 28, 2024 -
update onnx==1.17.0
#139128 opened
Oct 28, 2024 -
[invoke_subgraph] Generate fake_inputs correctly
#139130 opened
Oct 28, 2024 -
Enable cppcoreguidelines-special-member-functions
#139132 opened
Oct 29, 2024 -
[executorch hash update] update the pinned executorch hash
#139133 opened
Oct 29, 2024 -
[profiler] Annotate triton kernels with kernel hash
#139135 opened
Oct 29, 2024 -
[inductor] patterns to remove pointless view/permute pairs
#139136 opened
Oct 29, 2024 -
TunableOp assume dense inputs for size calculations
#139137 opened
Oct 29, 2024 -
Added aten.bernoulli.p and aten.bernoulli.default decompositions
#139141 opened
Oct 29, 2024 -
TEST CHANGE TO RVALUE REF
#139146 opened
Oct 29, 2024 -
Relax python linter requirements
#139148 opened
Oct 29, 2024 -
[AOTI] Ignore .o files in package_aoti
#139153 opened
Oct 29, 2024 -
[AOTI] Switch OSS dashboard to use aoti_compile_and_package
#139154 opened
Oct 29, 2024 -
Remove const fromDLPack overload
#139156 opened
Oct 29, 2024 -
[Autotune Inductor] Some clean up and dataclassing
#139157 opened
Oct 29, 2024 -
[PyTorch] Migrate bf16 gemv fast path kernel from intrinsics to vec::Vectorized
#139159 opened
Oct 29, 2024 -
[invoke_subgraph] User facing API to support arbitrary args and kwargs
#139162 opened
Oct 29, 2024 -
Use std::string_view in get_fully_qualified_type_name
#139164 opened
Oct 29, 2024 -
[Test] Refactor using make_tensor replace generate input logic
#139168 opened
Oct 29, 2024 -
Use the device interface for detecting Triton availability
#139171 opened
Oct 29, 2024 -
[Inductor][CPU] Enable the oneDNN Linear fusion for special case
#139172 opened
Oct 29, 2024 -
Use static_assert to detect get_type_index used in device code
#139173 opened
Oct 29, 2024 -
Update grad_scaler.py(Just for this bug)
#139174 opened
Oct 29, 2024 -
Use cached dnnl::stream in GpuStreamManager
#139176 opened
Oct 29, 2024 -
Export XPU oneDNN header to the public
#139177 opened
Oct 29, 2024 -
[dim_order] raised runtime error when tensor has ambiguous dim order
#139180 opened
Oct 29, 2024 -
Tests Generelization for multiple accelerator devices
#139184 opened
Oct 29, 2024 -
[Tests]: replace deprecated `pkg_resources` dependency with `importlib` in stdlib
#139186 opened
Oct 29, 2024 -
Bring back `io.BytesIO` type annotation for the output destination of `torch.onnx.export`
#139187 opened
Oct 29, 2024 -
Revert D65030974
#139194 opened
Oct 29, 2024 -
Support view_as() on NJT; allow nested int swapping
#139196 opened
Oct 29, 2024 -
Add check for unsupported sprase layout to resolve false INTERNAL ASSERT FAILED
#139198 opened
Oct 29, 2024 -
[Inductor][CI] Add numpy-2.X shard
#139199 opened
Oct 29, 2024 -
turn off USE_MIMALLOC_ON_MKL temporary.
#139204 opened
Oct 29, 2024 -
Add Gaudi support to benchmarks/dynamo
#139205 opened
Oct 29, 2024 -
[triton] Update pin for PyTorch 2.6/Triton 3.2
#139206 opened
Oct 29, 2024 -
[PyTorch] Build bf16 gemv fast path & entry points for non-ARM architectures too
#139208 opened
Oct 29, 2024 -
Fix custom obj being input
#139209 opened
Oct 29, 2024 -
Add Autograd Fallback for MTIA
#139211 opened
Oct 29, 2024 -
[real_tensor_prop] Infer Fake kernels during real tensor prop
#139213 opened
Oct 29, 2024 -
ModuleTracker: Add explicit garbage collection
#139214 opened
Oct 29, 2024 -
[dynamo] ignore False/None callback in fail_on_recompile/force_backend stances
#139215 opened
Oct 29, 2024 -
[invoke_subgraph] Re-enable fake tensor model in the fake tensor impl
#139216 opened
Oct 29, 2024 -
[mergebot] Add ci-no-td label on revert
#139218 opened
Oct 29, 2024 -
[export] deregister hooks and see what breaks
#139219 opened
Oct 29, 2024 -
Hook up bf16_gemv_trans to x86 bf16 GEMM
#139220 opened
Oct 29, 2024 -
Add utility to get all unsafe globals in checkpoint (no pickletools dependency)
#139221 opened
Oct 29, 2024 -
[export] Update min_val and max_val to Optional[int] in serialization.
#139223 opened
Oct 29, 2024 -
[AOTI] Update zero size computation in clone_preserve_strides
#139224 opened
Oct 29, 2024 -
Update cpuinfo to 8df44962
#139225 opened
Oct 29, 2024 -
[experimental] async-tp impl with cutlass-based, progress aware kernel
#139227 opened
Oct 29, 2024 -
[inductor] Enable AMD cooperative reduction tests
#139230 opened
Oct 29, 2024 -
[PyTorch] Fix lack of alias annotations for dropout variants
#139231 opened
Oct 29, 2024 -
Move pippy to training IR
#139233 opened
Oct 29, 2024 -
Fx graph always return tuple in fuse_as_graphmodule
#139236 opened
Oct 29, 2024 -
Fix existing lint issues in ir.py
#139237 opened
Oct 29, 2024 -
typing ir.py - Disallow untyped defs for ir.py
#139238 opened
Oct 29, 2024 -
Classify miss-inplaced tensors in logs.
#139240 opened
Oct 30, 2024 -
Do not print "we strongly recommend" for CPU builds
#139243 opened
Oct 30, 2024 -
[do not review] saving things for NJT metadata cache
#139247 opened
Oct 30, 2024 -
[3/N] Fix clang-tidy warnings in python_variable_methods.cpp
#139248 opened
Oct 30, 2024 -
Add conjugate method on SymFloat
#139249 opened
Oct 30, 2024 -
Don't set replacement if lhs is in the free symbols of the rhs
#139250 opened
Oct 30, 2024 -
[DCP] Unit Test to validate the stateful and non-stateful loads
#139251 opened
Oct 30, 2024 -
[4/N] Fix Wextra-semi warning
#139256 opened
Oct 30, 2024 -
[WIP] Add SYCL version control in cmake to keep BC
#139258 opened
Oct 30, 2024 -
[logging] Add new utilities to record and log compilation metrics
#139259 opened
Oct 30, 2024 -
[logging] Superficial use of new metrics_context in CompilationMetrics logging
#139260 opened
Oct 30, 2024 -
Add sv starts/ends_with
#139261 opened
Oct 30, 2024 -
[logging] Use metrics_context.timed to log CompilationMetrics time fields
#139263 opened
Oct 30, 2024 -
Fix index_reduce mean error when dtype is bfloat16 (#139242)
#139264 opened
Oct 30, 2024 -
[4/N] Don't skip ASAN on some tests
#139265 opened
Oct 30, 2024 -
[Intel GPU] Support RegisterSparseXPU.cpp codegen.
#139267 opened
Oct 30, 2024 -
[Dynamo] Fix graph break when `tensor.split()` is called within a device context manager
#139270 opened
Oct 30, 2024 -
[Quant][Inductor] modify QConv/QLinear + broadcast add fusion
#139271 opened
Oct 30, 2024 -
Lsan2
#139273 opened
Oct 30, 2024 -
[fx] split_module subgraph should always have an output node
#139275 opened
Oct 30, 2024 -
Add rvalue overload of THPVariable_Wrap
#139278 opened
Oct 30, 2024 -
[dyanmo] fix `deque.maxlen` support when extending elements from left
#139279 opened
Oct 30, 2024 -
Block more keys from config serialization
#139285 opened
Oct 30, 2024 -
[9/N] Fix extra warnings brought by clang-tidy-17
#139286 opened
Oct 30, 2024 -
[easy] Add start event metadata to collected metadata for PT2 Compile Events
#139289 opened
Oct 30, 2024 -
[Easy] GraphTransformObserver Refactoring
#139292 opened
Oct 30, 2024 -
[Easy] Refactor post grad application of passes
#139293 opened
Oct 30, 2024 -
[Easy] Add joint graph passes, fallback_random to bisector
#139295 opened
Oct 30, 2024 -
[ch] Move close_nonexistent_disable_issues.py queries to CH
#139296 opened
Oct 30, 2024 -
[ROCM] Increase hipBLASLt default workspace size
#139300 opened
Oct 30, 2024 -
[BE] Change _marked_safe_globals_list to set
#139303 opened
Oct 30, 2024 -
Add bfloat16 support for per tensor/channel cpu/cuda fake quantize ops
#139306 opened
Oct 30, 2024 -
export graph for memory simulator
#139308 opened
Oct 30, 2024 -
Add option to dynamo_timed and chromium_event_logger for logging pt2 compile events
#139309 opened
Oct 30, 2024 -
[Profiler] Create Auto-Trace Frontend for Trace ID
#139310 opened
Oct 30, 2024 -
Modified the curl|bash code to make it palatable to security-checking…
#139311 opened
Oct 30, 2024
174 Issues closed by 46 people
-
[Upstream Triton] attrs_description `KeyError: 'cls'`
#139179 closed
Oct 30, 2024 -
convert torch.jit.script model to ONNX get wrong result
#88072 closed
Oct 30, 2024 -
Torch 1.13 - 2.3 Onnx Scope name not correct!
#90439 closed
Oct 30, 2024 -
[Enhancement] Allow Dict[str, Any] format for `transforms` argument in `torchvision.transforms.v2.Compose`
#139291 closed
Oct 30, 2024 -
[MPS] Typo in error message for supported autocast type
#139190 closed
Oct 30, 2024 -
Adjust install_user script for Ubuntu 24.04 support
#138812 closed
Oct 30, 2024 -
DISABLED test_device_mode_ops_sparse_mm_reduce_cpu_float32 (__main__.TestDeviceUtilsCPU)
#132552 closed
Oct 30, 2024 -
DISABLED test_pre_dispatch_export_auto_functionalize_simple_cuda_float32 (__main__.TestHOPCUDA)
#123267 closed
Oct 30, 2024 -
Fix
#139276 closed
Oct 30, 2024 -
Cannot build flash-attention with torch==2.5.0
#139067 closed
Oct 30, 2024 -
Bug in the description of RMSNorm
#139274 closed
Oct 30, 2024 -
[dynamo] Format string with __class__
#118675 closed
Oct 30, 2024 -
[DEBUG] Strange behavior observed with PyTorch 2.4.0 + Windows + CPU inference
#131958 closed
Oct 30, 2024 -
DISABLED [WORKFLOW_NAME] / [PLATFORM_NAME] / [JOB_NAME]
#139277 closed
Oct 30, 2024 -
DISABLED test_kineto_profiler_with_environment_variable (__main__.TestProfiler)
#107383 closed
Oct 30, 2024 -
RuntimeError: Storage size calculation overflowed with sizes=[1, 4605674770382112385, 4128, 128]
#139181 closed
Oct 30, 2024 -
[Compiled Autograd] Usage of AOTAutograd cause all hook functions to get pushed to the end of the graph
#138538 closed
Oct 30, 2024 -
Distributed Tensor raises error with torch 2.5
#138742 closed
Oct 30, 2024 -
Program does not exit when using `torch.distributed.tensor.distribute_module`.
#139060 closed
Oct 30, 2024 -
[FSDP2] hsdp2 sometimes does not free grad immeidately after opt.step
#139234 closed
Oct 30, 2024 -
`max_unpool2d` returns a tensor with negative dimension
#73154 closed
Oct 30, 2024 -
[ONNX] Test _building.py
#138761 closed
Oct 29, 2024 -
Vision Dataset : FGVC_AirCraft not a callable module in torchvision.datasets
#139226 closed
Oct 29, 2024 -
Release 2.5.1 validations checklist and cherry-picks
#138876 closed
Oct 29, 2024 -
MacOS runners queue again
#138724 closed
Oct 29, 2024 -
Automatic linter fixing for PRs
#133715 closed
Oct 29, 2024 -
[aarch64 linux ci] missing sccache binaries for aarch64 linux platform
#121559 closed
Oct 29, 2024 -
Release 2.5.0 validations checklist and cherry-picks
#137492 closed
Oct 29, 2024 -
NJT Embedding backward
#138352 closed
Oct 29, 2024 -
DISABLED test_fsdp_unsupported_module_cls (__main__.TestFSDPMiscMultiThread)
#137948 closed
Oct 29, 2024 -
xpu: huggingface levit test_retain_grad_hidden_states_attentions test hangs on exit on PVC
#136007 closed
Oct 29, 2024 -
Output tensor of torch.split_with_sizes_copy has version 0 on CUDA while version 1 on CPU/XPU.
#136303 closed
Oct 29, 2024 -
DISABLED test_unbacked_bindings_for_divisible_u_symint (__main__.TestExport)
#138586 closed
Oct 29, 2024 -
DISABLED test_arange2_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#127343 closed
Oct 29, 2024 -
Multiprocessing in Dataloaders reduces variaty
#138989 closed
Oct 29, 2024 -
Bug in Maxpool2d Operator Exhibited During Metamorphic Testing
#139078 closed
Oct 29, 2024 -
[ROCm] MI300X Tunable ops causes 100GB of Memory Leak leading to OOM
#138532 closed
Oct 29, 2024 -
some function in torch._utils was skipped
#138897 closed
Oct 28, 2024 -
Dynamo CI Shard naming proposal
#118127 closed
Oct 28, 2024 -
RReLU doc doesn't specify the eval mode behaving just like LeakyReLU
#82677 closed
Oct 28, 2024 -
Should `_native_batch_norm_legit_functional` be in native_functions.yaml?
#113483 closed
Oct 28, 2024 -
Incorrect Type for `devices` Parameter in `benchmark_utilization` Function
#136697 closed
Oct 28, 2024 -
DISABLED test_execution_trace_with_kineto_cpu (__main__.TestExecutionTraceCPU)
#138071 closed
Oct 28, 2024 -
DISABLED test_source_multithreaded_close_in_scope_work_in_main_thread_False (__main__.TestProfiler)
#119364 closed
Oct 28, 2024 -
DISABLED test_source_multithreaded_multiple_preexisting_work_in_main_thread_False (__main__.TestProfiler)
#119526 closed
Oct 28, 2024 -
DISABLED test_source_multithreaded_complex_work_in_main_thread_False (__main__.TestProfiler)
#119490 closed
Oct 28, 2024 -
DISABLED test_source_multithreaded_open_in_scope_work_in_main_thread_False (__main__.TestProfiler)
#119537 closed
Oct 28, 2024 -
Missing Torch 2.5 wheels for Python 3.8
#138979 closed
Oct 28, 2024 -
tlparse: aot_compile doesn't setup compile ids
#123759 closed
Oct 28, 2024 -
Attempting to use hipBLASLt on a unsupported architecture!
#138067 closed
Oct 28, 2024 -
Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
#139052 closed
Oct 28, 2024 -
arange.start is annotated with scalar on inputs but got torch.SymInt
#138827 closed
Oct 28, 2024 -
AttributeError: Can't pickle local object 'make_opaque_unary_fn.<locals>.OpaqueUnaryFn'
#138070 closed
Oct 28, 2024 -
Changes in type annotations break downstream code
#138478 closed
Oct 28, 2024 -
DISABLED test_non_blocking_with_eager_init (__main__.ProcessGroupNCCLGroupTest)
#118827 closed
Oct 28, 2024 -
[c10d] ProcessGroupNCCL support setting custom stream for communication
#138074 closed
Oct 28, 2024 -
RuntimeError: mat1 and mat2 shapes cannot be multiplied
#139037 closed
Oct 28, 2024 -
pytorch 2.5.0 can not import torch
#138991 closed
Oct 28, 2024 -
Assertion error when running example in Pippy repo
#139028 closed
Oct 28, 2024 -
ValueError: Expected input batch_size (60) to match target batch_size (10)
#139031 closed
Oct 28, 2024 -
DISABLED test_reorder_compute_for_overlap (__main__.TestComputeCommReorderingMultiProc)
#113249 closed
Oct 28, 2024 -
DISABLED test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn)
#106361 closed
Oct 28, 2024 -
[RFC] Allow DeviceMesh to use ncclCommSplit
#137017 closed
Oct 27, 2024 -
[PGNCCL][BUG] mutex acquired in recursive way may deadlock
#138995 closed
Oct 27, 2024 -
Weights only load numpy.float64 fails in torch 2.4.0
#138985 closed
Oct 27, 2024 -
Illegal Memory Access w/ efficient Attention + compile
#138772 closed
Oct 27, 2024 -
torch.cond should support omitting arguments to pass in when it is empty
#138150 closed
Oct 26, 2024 -
DISABLED test_autograd_cpp_node_data_dependent (__main__.TestCompiledAutograd)
#125579 closed
Oct 26, 2024 -
torch.cond would fail on eager mode
#138664 closed
Oct 26, 2024 -
Dynamo fails to propagate updates to nonlocal variables across different functions
#138112 closed
Oct 26, 2024 -
[dynamo] Speculation log diverges when Lazy Module is used in a certain way
#138489 closed
Oct 26, 2024 -
Leverage DLPack-based construction of PyTorch tensors to avoid costly element-by-element copy
#120614 closed
Oct 26, 2024 -
torch._int_mm accuracy issue on AMD CPU
#136746 closed
Oct 26, 2024 -
2.5.0 PyPI arm64 distribution logs `cpuinfo` error on import
#138333 closed
Oct 25, 2024 -
2.5.0 stable + cuda 12.4 broken; unable to to find cuda libs (works 2.4.1)
#138324 closed
Oct 25, 2024 -
[inductor][cpu]drq performance failure in 2024-09-30 nightly release
#137686 closed
Oct 25, 2024 -
[FlexAttention] Using FlexAttention with DDP complains about a "higher order optimizer"
#137481 closed
Oct 25, 2024 -
Mergebot often fails to fetch DrCI status
#138057 closed
Oct 25, 2024 -
[PTD][Doc] Review model parallel tutorial
#138835 closed
Oct 25, 2024 -
[PTD][DDP] Review the DDP Video Tutorials and modify the scripts so it can be run
#138833 closed
Oct 25, 2024 -
Pipelining zero bubble and activation checkpointing bug
#136766 closed
Oct 25, 2024 -
DISABLED test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn)
#137761 closed
Oct 25, 2024 -
DISABLED test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn)
#137766 closed
Oct 25, 2024 -
DISABLED test_metadata_consistency_check (__main__.DTensorMeshTest)
#131598 closed
Oct 25, 2024 -
DISABLED test_scatter_uneven (__main__.DeviceMeshCollectiveTest)
#96282 closed
Oct 25, 2024 -
DISABLED test_non_blocking_init (__main__.ProcessGroupNCCLGroupTest)
#131203 closed
Oct 25, 2024 -
Illegal instruction (core dumped) in pytorch 1.10 version
#136973 closed
Oct 25, 2024 -
DISABLED test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn)
#77992 closed
Oct 25, 2024 -
weights_only=True allow set as well as dict
#138851 closed
Oct 25, 2024 -
DISABLED test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn)
#98508 closed
Oct 25, 2024 -
DISABLED test_barrier_group_cuda (__main__.TestDistBackendWithSpawn)
#137751 closed
Oct 25, 2024 -
torch.compile + triton.jit: NameError: name 'constexpr' is not defined
#138509 closed
Oct 25, 2024 -
Compiling flex attention with batch dependent block mask and dynamic shapes
#136196 closed
Oct 25, 2024 -
[RFC] Allow lazy global init + eager subgroup init
#137018 closed
Oct 24, 2024 -
ObserverTest.TestMultipleNetBase intermittently segfaults
#9137 closed
Oct 24, 2024 -
Caffe2 has undeclared dependencies
#54497 closed
Oct 24, 2024 -
caffe2 load onnx model error:IndexError: Input 475 is undefined!
#52666 closed
Oct 24, 2024 -
[ONNX] BUG for Upsample operator export re-used by caffe2
#22070 closed
Oct 24, 2024 -
Export torch.cat to ONNX with Dynamic shape does not work on GPU
#24899 closed
Oct 24, 2024 -
[Caffe2/ONNX] ONNX LSTM Loading
#21843 closed
Oct 24, 2024 -
convert Onnx Slice operator to caffe2 failed
#20858 closed
Oct 24, 2024 -
convert the model from pytorch to onnx to caffe2, but get a lower accuracy than before
#20154 closed
Oct 24, 2024 -
The latest version of onnx-caffe2 does not support “pow” ?
#18475 closed
Oct 24, 2024 -
Error in converting pytorch model to caffe2 using onnx framework
#17154 closed
Oct 24, 2024 -
Undefined GPU Reference when importing torch with caffe2.onnx.backend for cpu-only pytorch
#16500 closed
Oct 24, 2024 -
No Schema registered for ConstantOfShape with domain_version of 9
#16363 closed
Oct 24, 2024 -
caffe2::onnx::OnnxExporter::Caffe2OpToOnnxNodes failure
#16298 closed
Oct 24, 2024 -
caffe2::onnx::OnnxExporter::CreateGemmNodes fails on bvlc_alexnet
#16296 closed
Oct 24, 2024 -
Assert In function importUnsqueeze
#16248 closed
Oct 24, 2024 -
[ONNX CI] TestCaffe2End2End.test_squeezenet occasional error
#14670 closed
Oct 24, 2024 -
[onnx][caffe2] Is there schedule to support onnxwhile in onnx exporter?
#13122 closed
Oct 24, 2024 -
onnx_graph_to_caffe2_net takes a model, not a graph
#13726 closed
Oct 24, 2024 -
[Caffe2] Failed to load ONNX model with python 3.6
#15998 closed
Oct 24, 2024 -
How to run a pytorch-onnx-caffe2 model on GPU?
#12702 closed
Oct 24, 2024 -
Error occuring while converting mnist or cifar model from caffe2 to onnx
#12704 closed
Oct 24, 2024 -
[Caffe2] Error importing ConvTranspose2d to Caffe2 with ONNX
#10667 closed
Oct 24, 2024 -
pytorch does not compatible with caffe2
#10249 closed
Oct 24, 2024 -
Does this mean onnx do not support spatialBN operator?
#10062 closed
Oct 24, 2024 -
[onnx-caffe2] Caffe2 only supports padding 2D Tensor
#9411 closed
Oct 24, 2024 -
Meet “No module named'tools. setup_helpers‘“ when installing caffe2
#134640 closed
Oct 24, 2024 -
LINK : fatal error LNK1102: out of memory ninja: build stopped: subcommand failed. "Caffe2 building failed"
#131562 closed
Oct 24, 2024 -
Caffe2 usage of cuDNN RNNv6 API blocks upgrade to cuDNN v9+
#124790 closed
Oct 24, 2024 -
can not find the caffe2::Threads
#124038 closed
Oct 24, 2024 -
No module named 'caffe2' when using `add_scalar` with string
#119195 closed
Oct 24, 2024 -
The derivation of swish activation function is wrong.
#110815 closed
Oct 24, 2024 -
Pytorch - cpu only & caffe2 build failing
#105655 closed
Oct 24, 2024 -
Issues building with caffe2 enabled
#100960 closed
Oct 24, 2024 -
[Bug] Circular Import
#83710 closed
Oct 24, 2024 -
[caffee2] Windows build / 'metanet_pb2' (a circular import) Anaconda
#83379 closed
Oct 24, 2024 -
Error occurred , when compile source code setting BUILD_CAFFE2=ON
#78034 closed
Oct 24, 2024 -
Failed to build `convert_and_benchmark.cc` due to missing `net_observer_reporter_print.h`.
#75186 closed
Oct 24, 2024 -
Multiple new caffe2-related build failures.
#73074 closed
Oct 24, 2024 -
Caffe2 uses FFMPEG functions that are deprecated in FFMPEG 4.0 and gone in 5.0
#72254 closed
Oct 24, 2024 -
Remove Caffe2
#72536 closed
Oct 24, 2024 -
gloo_test test_close_connection not working as intended due to unwanted comma
#70609 closed
Oct 24, 2024 -
model trace error
#43196 closed
Oct 24, 2024 -
[Caffe2] Missing CMAKE_CUDA_COMPILE_WHOLE_COMPILATION
#18524 closed
Oct 24, 2024 -
cannot build with tensorrt
#60228 closed
Oct 24, 2024 -
Caffe2 building failure when turning off USE_OPENMP
#17853 closed
Oct 24, 2024 -
Unicode support for the MS Windows platform
#13565 closed
Oct 24, 2024 -
"No module named 'tools.setup_helpers'" when use pip install caffe2
#10155 closed
Oct 24, 2024 -
[caffe2] Run resnet50_trainer.py error between 2 machines using GLOO/Redis and ibverbs
#6422 closed
Oct 24, 2024 -
[Caffe2] [feature request] How to freeze a layer ? (Selective Backward Propagation)
#6581 closed
Oct 24, 2024 -
DISABLED test_sink_waits (__main__.TestComputeCommReorderingMultiProc)
#121236 closed
Oct 24, 2024 -
DISABLED test_sink_waits_raise_comms (__main__.TestComputeCommReorderingMultiProc)
#121316 closed
Oct 24, 2024 -
DISABLED test_raise_comms (__main__.TestComputeCommReorderingMultiProc)
#120946 closed
Oct 24, 2024 -
sympy.ccode doesn't work with FloorDiv in torch.utils._sympy.functions
#138523 closed
Oct 24, 2024 -
[PRE-EMPTIVE] Experimenting with new runners `linux.aws.a100` on `inductor-perf-compare.yml`
#138708 closed
Oct 24, 2024 -
not working in site
#138808 closed
Oct 24, 2024 -
[Break XPU] C10_UNUSED change and newly add Inductor UTs break XPU CI.
#138577 closed
Oct 24, 2024 -
Flexattention: CUDA error: an illegal memory access was encountered
#134852 closed
Oct 24, 2024 -
`slow` workflow has been broken for 4+ weeks
#136694 closed
Oct 24, 2024 -
flex attention backward pass vmap error only when tensordict imported?
#134004 closed
Oct 24, 2024 -
[FlexAttention] Compiled `flex_attention' crashes when training with mixed precision on RTX 3090
#135723 closed
Oct 24, 2024 -
Flexattention: compilation fails when using block mask
#135206 closed
Oct 24, 2024 -
[Flex attention] RuntimeError with vmap when using torch.compile in create_mask
#136427 closed
Oct 24, 2024 -
FlexAttention errors with dynamic shapes when closing over a symbolic shape
#136914 closed
Oct 24, 2024 -
[dynamo] Dynamo triggering on list comprehension <listcomp> frames and graph breaking
#138654 closed
Oct 24, 2024 -
_functional_collectives.all_gather_into_tensor cannot compile in aot_module_simplified
#112009 closed
Oct 24, 2024 -
The calculation process of key_padding_mask does not match the document description
#137887 closed
Oct 24, 2024 -
[cuSPARSELt] `test_sp24_compile` appears broken on sm80/sm90
#138769 closed
Oct 23, 2024 -
DISABLED test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn)
#137325 closed
Oct 23, 2024 -
The calculation process of key_padding_mask does not match the document description
#138566 closed
Oct 23, 2024
140 Issues opened by 99 people
-
Plan to support “discrete” dynamic dimension on torch.export
#139307 opened
Oct 30, 2024 -
torch.export torchaudio kaldi module
#139305 opened
Oct 30, 2024 -
[RFC][Pipelining] RNG state communication, avoid seed checkpoint
#139304 opened
Oct 30, 2024 -
[ONNX][RFC] Migrate torchlib from onnxscript (torch 2.6/2.7)
#139301 opened
Oct 30, 2024 -
CUDNN sdp attention causes loss explosion
#139298 opened
Oct 30, 2024 -
`torch.package` warning -- `TypedStorage` is deprecated
#139297 opened
Oct 30, 2024 -
Using xpu module in the cuda version Pytorch
#139290 opened
Oct 30, 2024 -
No macOS 12 wheels available, cannot use torch binary wheels on GitHub Actions
#139288 opened
Oct 30, 2024 -
FSDP1 SHARD_GRAD_OP broken after torch upgrade to 2.4.1 and flash_attn upgrade
#139287 opened
Oct 30, 2024 -
test/export/test_retraceability.py fails locally, likely flaky
#139284 opened
Oct 30, 2024 -
Draft-mode export: ep.run_decompositions() doesn't run with real tensor prop
#139283 opened
Oct 30, 2024 -
Calculate speed
#139282 opened
Oct 30, 2024 -
`torch._numpy.ndarray.astype()` does not accept Numpy Dtypes correctly
#139281 opened
Oct 30, 2024 -
Possible bug of tools::flight_recorder
#139280 opened
Oct 30, 2024 -
Dynamo ignores non-inductor backend graph freezing option
#139272 opened
Oct 30, 2024 -
TensorBoard images loading error
#139269 opened
Oct 30, 2024 -
Torch Inductor should have a way for new backend to provide build options
#139268 opened
Oct 30, 2024 -
Error compiling the torch.library.custom_op with input mutations with set_
#139257 opened
Oct 30, 2024 -
DISABLED test_not_implemented_error (__main__.ExcTests)
#139255 opened
Oct 30, 2024 -
DISABLED test_dynamo_error (__main__.LoggingTests)
#139254 opened
Oct 30, 2024 -
DISABLED test_inductor_error (__main__.LoggingTests)
#139253 opened
Oct 30, 2024 -
Ban relative imports in test/
#139252 opened
Oct 30, 2024 -
export onnx error with sfft
#139246 opened
Oct 30, 2024 -
ncclInternalError: Internal check failed
#139245 opened
Oct 30, 2024 -
Tensor.index_reduce produces incorrect result
#139242 opened
Oct 30, 2024 -
Please implement batching rule for torch.nn.functional.multi_margin_loss
#139241 opened
Oct 30, 2024 -
TORCH_COMPILE_CPROFILE=1 doesn't work on python 3.12
#139232 opened
Oct 29, 2024 -
torch.compile'ing individual linears for torchtitan debug model + FSDP2 leads to errors
#139222 opened
Oct 29, 2024 -
[Runtime Error] Build PyTorch with cuda12.2 on Jetson AGX Orin with jetpack 5.1.4
#139210 opened
Oct 29, 2024 -
Major perf regression with `BatchNorm2d` + `torch.compile` with `reduce-overhead` + DDP
#139207 opened
Oct 29, 2024 -
fail_on_recompile stance is unusable because it doesn't handle skipped frames
#139202 opened
Oct 29, 2024 -
First run lint on just the changes in the PR before running it over the entire PR
#139201 opened
Oct 29, 2024 -
find_spec does something weird to Python Path when loading modules
#139200 opened
Oct 29, 2024 -
PCH build fail with sccache-v0.8.2
#139188 opened
Oct 29, 2024 -
Operators being traced as method calls in torch.fx
#139185 opened
Oct 29, 2024 -
compile fails when split is called within a device cm
#139183 opened
Oct 29, 2024 -
[ROCm] [Upstream Triton] num_stages=0 deprecation with stream pipeliner v2
#139182 opened
Oct 29, 2024 -
[v2.6] Inductor issue tracker for Triton release/3.2
#139175 opened
Oct 29, 2024 -
`pkg_resources` module is deprecated by setuptools
#139170 opened
Oct 29, 2024 -
Deepwise convolution with `torch.compile` has inconsistent numerical precision with eager mode
#139169 opened
Oct 29, 2024 -
[Dynamo] TypeError: `list` object is not callable
#139167 opened
Oct 29, 2024 -
SubgraphMatcher may fail to match when the matching pattern having call_module IR
#139163 opened
Oct 29, 2024 -
[ERROR]: fullgraph compilation failed: 'FakeRootModule' object has no attribute 'self___attr_list_0'
#139160 opened
Oct 29, 2024 -
Dynamo capture of tensor.data assignment doesn't identical to eager call of tensor.data assignment
#139152 opened
Oct 29, 2024 -
DISABLED TCPStoreTest.testMultiTenantStoresUV (__main__.TCPStoreTest)
#139150 opened
Oct 29, 2024 -
DISABLED test_like_channels_last_cpu (__main__.CpuTritonTests)
#139149 opened
Oct 29, 2024 -
Some files in sccache are owned by `hostmaster+pytorch`
#139143 opened
Oct 29, 2024 -
torch.nn.InstanceNorm2d throws "mixed dtype" error with track_running_stats set to True
#139140 opened
Oct 29, 2024 -
[FlexAttention] Update the way we generate kernel options
#139131 opened
Oct 28, 2024 -
[pgnccl] unstable restarting of PGs
#139129 opened
Oct 28, 2024 -
shift operators not supporting integral types
#139124 opened
Oct 28, 2024 -
[Profiler/Kineto] Add group_trace_id and trace_id support to auto-trace python frontend
#139118 opened
Oct 28, 2024 -
`_adjust_num_blocks_and_indices` gives wrong adjusted block mask
#139117 opened
Oct 28, 2024 -
[ROCm] PyTorch TunableOps results in Memory Access Fault
#139116 opened
Oct 28, 2024 -
torch.nn.AvgPool2d works with long on cpu but not gpu
#139115 opened
Oct 28, 2024 -
`torch.compile` errors when inputs memory overlap.
#139111 opened
Oct 28, 2024 -
torch.compile + FSDP1 CPU offloading + PT lightning validation loop throws an error
#139110 opened
Oct 28, 2024 -
LibTorch build error on Windows for CUDA version (debug/release)
#139108 opened
Oct 28, 2024 -
test/functorch/test_ops.py:test_extremal_numerics_l1_loss_cpu segfaults when wrapped with Dynamo
#139103 opened
Oct 28, 2024 -
[inductor][rocm] Cooperative reductions on AMD GPUs
#139099 opened
Oct 28, 2024 -
`tensor` not a `FakeTensor` under `FakeTensorMode` and `device('meta')`
#139092 opened
Oct 28, 2024 -
[dynamo] `dynamo_expected_failures` is silent on certain tests that ended up passing
#139080 opened
Oct 28, 2024 -
Add support for optional tensor kwargs in inductor
#139077 opened
Oct 28, 2024 -
DISABLED test_aot_export_scan_simple_cuda_float32 (__main__.TestHOPCUDA)
#139075 opened
Oct 28, 2024 -
DISABLED test_retrace_export_scan_simple_cuda_float32 (__main__.TestHOPCUDA)
#139074 opened
Oct 28, 2024 -
DISABLED test_pre_dispatch_export_scan_simple_cuda_float32 (__main__.TestHOPCUDA)
#139072 opened
Oct 28, 2024 -
DISABLED test_serialize_export_scan_simple_cuda_float32 (__main__.TestHOPCUDA)
#139073 opened
Oct 28, 2024 -
Covert Conv3d to Conv2d, But output different
#139066 opened
Oct 28, 2024 -
Cannot compile with latest LLVM-19
#139065 opened
Oct 28, 2024 -
[Flex Attention] Cannot determine truth value of Relational
#139064 opened
Oct 28, 2024 -
Quantization API examples in the docs are outdated
#139063 opened
Oct 28, 2024 -
Triton pre_hooks are ignored by torch.compile
#139059 opened
Oct 28, 2024 -
DISABLED test_where_broadcast_cpu (__main__.CpuTests)
#139057 opened
Oct 28, 2024 -
DISABLED test_wrapper_subclass_aliasing_conv2d_cpu (__main__.TestWrapperSubclassAliasingCPU)
#139056 opened
Oct 28, 2024 -
CI Failure for Instruction Count doesn't say what file to update
#139034 opened
Oct 28, 2024 -
Stabilizing the Kumaraswamy Distribution
#139019 opened
Oct 27, 2024 -
DISABLED test_extra_cuda_context (__main__.ProcessGroupNCCLGroupTest)
#139011 opened
Oct 27, 2024 -
Dependency management corrupted in pytorch 2.5.0
#139005 opened
Oct 27, 2024 -
fatal error C1083: Cannot open include file: “cstddef”: No such file or directory error on windows
#139002 opened
Oct 27, 2024 -
Build failed when use shared MKL library
#138994 opened
Oct 26, 2024 -
Support `attn_mask` in `jagged_scaled_dot_product_attention`
#138993 opened
Oct 26, 2024 -
Error loading ".venv\Lib\site-packages\torch\lib\c10_xpu.dll" or one of its dependencies
#138986 opened
Oct 26, 2024 -
The error compiling problem of Qt and libtorch(my solution to eliminate the bug)
#138981 opened
Oct 26, 2024 -
compile_fx inplace modifly the input graph
#138980 opened
Oct 26, 2024 -
What version or package name will be used in aarch64 release for 2.6 on pypi?
#138971 opened
Oct 26, 2024 -
CompiledFxGraph.current_callable is not thread-safe
#138961 opened
Oct 26, 2024 -
Write a script to convert PT2 compiler benchmark results to the standardize format for OSS benchmark
#138960 opened
Oct 26, 2024 -
`DataLoader` probably shouldn't use `fork` by default
#138957 opened
Oct 25, 2024 -
[Feature request] Flag to prevent certain parameters from being typecast by nn.module.to()
#138952 opened
Oct 25, 2024 -
cudagraphed compiled module slows down when called recursively
#138949 opened
Oct 25, 2024 -
DISABLED test_dtype_aware_codegen_load_upcast_to_fp32_False_bfloat16 (__main__.TritonCodeGenTests)
#138944 opened
Oct 25, 2024 -
DISABLED test_dtype_aware_codegen_load_upcast_to_fp32_False_float16 (__main__.TritonCodeGenTests)
#138943 opened
Oct 25, 2024 -
Libtorch torch::Tensor default constructor does not initialize on CPU like in python
#138940 opened
Oct 25, 2024 -
torch.cond complains two tensors have different metadata
#138926 opened
Oct 25, 2024 -
compiled autograd can specialize symints from backward graph without guarding
#138920 opened
Oct 25, 2024 -
DISABLED test_byte_mask (__main__.TestAdvancedIndexing)
#138911 opened
Oct 25, 2024 -
Replace magma-cuda dependency on `pytorch/conda-builder` Docker image
#138909 opened
Oct 25, 2024 -
Output mismatch between torch.compile and eager mode
#138908 opened
Oct 25, 2024 -
Select SDPA backend smartly by `sdp_params` and benchmark results
#138907 opened
Oct 25, 2024 -
RuntimeError: "addmm_cuda" not implemented for 'ComplexHalf'
#138906 opened
Oct 25, 2024 -
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 opened
Oct 25, 2024 -
Runtime error: retains_grad_hooks not implemented for compiled autograd
#138901 opened
Oct 25, 2024 -
cpu faster than mps for both GRUCell and LSTMCell on apple silicon M3 max: MPS Metal
#138898 opened
Oct 25, 2024 -
Inductor needs to be updated to work with upstream triton's AttrsDescriptor
#138895 opened
Oct 25, 2024 -
How to Implement multi-card parallel Inference by torchrun?
#138888 opened
Oct 25, 2024 -
DISABLED test_file_reader_no_memory_leak (__main__.TestScript)
#138885 opened
Oct 25, 2024 -
DISABLED test_slice_with_floordiv_serdes_non_strict (__main__.SerDesExportNonStrictTestExport)
#138884 opened
Oct 25, 2024 -
[ROCm] "No available kernel" when running EFFICIENT_ATTENTION sdpa
#138864 opened
Oct 24, 2024 -
Using int(shape) in export would result in silent specialization
#138853 opened
Oct 24, 2024 -
[torch.export] round is not an allowed operator type
#138852 opened
Oct 24, 2024 -
Using shape dependent conditionals with export resulted in weird KeyError
#138847 opened
Oct 24, 2024 -
`torch.distributed._state_dict_utils._broadcast_tensors` does not properly support CPU tensors.
#138842 opened
Oct 24, 2024 -
[export] `run_decompositions` fails with pytree error on `Llama-3.2-vision`
#138839 opened
Oct 24, 2024 -
[PTD][RPC] Verify RPC Tutorials contents and scripts
#138832 opened
Oct 24, 2024 -
Add mean and var operation for Nested Tensors
#138831 opened
Oct 24, 2024 -
[ONNX] Support for `aten.istft`
#138830 opened
Oct 24, 2024 -
torch.dot has inconsistent support for int64 (long)
#138829 opened
Oct 24, 2024 -
RuntimeError: HalfTensor is not supported
#138820 opened
Oct 24, 2024 -
autograd.Function HOP (maybe others) have name conflict when lifting freevars
#138817 opened
Oct 24, 2024 -
Strange recompilations on torch 2.5 + FSDP + UNet
#138813 opened
Oct 24, 2024 -
same data all reduce on H20, but results are different
#138811 opened
Oct 24, 2024 -
Batching rule not defined for `aten::_make_dual`.
#138800 opened
Oct 24, 2024 -
compile + allgather with group will fail for stack-style allgather
#138795 opened
Oct 24, 2024 -
Support nn.Module arguments for the function to torch.compiler.allow_in_graph
#138786 opened
Oct 24, 2024 -
torch.cond works, but would still print a misleading warning
#138782 opened
Oct 24, 2024 -
[Inductor] Regression in test_comprehensive_nn_functional_max_pool2d_cuda from triton
#138775 opened
Oct 24, 2024 -
[funcol] functional collectives are 67% slower than torch.distributed collectives
#138773 opened
Oct 24, 2024 -
Accuracy issues (NANs) in torch.sdpa backward on ROCm
#138764 opened
Oct 23, 2024 -
CI/CD: Figure out what to do with split build
#138750 opened
Oct 23, 2024 -
Speed up gaussian_nll_loss when variance is always the same scalar value
#138747 opened
Oct 23, 2024 -
Cleanup the scaling logic in runtime.triton_heuristics.triton_config
#138743 opened
Oct 23, 2024
446 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Consolidate Triton cache into Inductor cache
#138239 commented on
Oct 30, 2024 • 52 new comments -
[pt2, docs] Add new PT2 troubleshooting doc
#138620 commented on
Oct 30, 2024 • 43 new comments -
Generalization of distributed test cases for non-CUDA devices
#138216 commented on
Oct 28, 2024 • 29 new comments -
Relax mutation to aten.to in export
#138606 commented on
Oct 30, 2024 • 28 new comments -
[hop free symbols] lift free symbols in example_value when create_graph_input
#138363 commented on
Oct 30, 2024 • 20 new comments -
[Inductor][CPP] Cache weight tiles in L1D for AMX int8 WoQ GEMM
#136688 commented on
Oct 30, 2024 • 20 new comments -
[Intel GPU] qconv at XPU backend
#133080 commented on
Oct 30, 2024 • 16 new comments -
[hierarchical-compilation][invoke-subgraph] Add fake tensor caching
#137808 commented on
Oct 29, 2024 • 11 new comments -
make it clearer (in docs) one can double decorate with torch.library.impl_* APIs
#137608 commented on
Oct 30, 2024 • 11 new comments -
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512.
#138388 commented on
Oct 30, 2024 • 10 new comments -
[aotd] Unwrap unseen AsyncCollectiveTensor tangents
#138731 commented on
Oct 28, 2024 • 8 new comments -
Add TORCHDYNAMO_EXTENDED_ADVICE (#137159)
#137196 commented on
Oct 29, 2024 • 7 new comments -
Make c10::string_view an alias of std::string_view
#130417 commented on
Oct 30, 2024 • 6 new comments -
Support tensor betas in Adam and AdamW
#134171 commented on
Oct 29, 2024 • 6 new comments -
Allow inplacing buffer when other users are inconsequential
#138383 commented on
Oct 30, 2024 • 6 new comments -
[Dynamo] Allow `filter()` to handle infinite iterator
#138305 commented on
Oct 29, 2024 • 5 new comments -
[aotd] capture rrelu_with_noise noise mutation
#138503 commented on
Oct 30, 2024 • 5 new comments -
Add deterministic path for CUDA `cumsum`
#136224 commented on
Oct 30, 2024 • 5 new comments -
[AOTI] Introduce an extensibility mechanism for the c shim codegen to make it easy to produce c shims for out-of-tree OP kernels as well. Add c_shim for XPU.
#136742 commented on
Oct 30, 2024 • 5 new comments -
Add UTs for accelerator device-agnostic runtime APIs
#133572 commented on
Oct 30, 2024 • 4 new comments -
Initial work on "cuda graphs with arguments".
#137318 commented on
Oct 27, 2024 • 4 new comments -
[inductor] fix the unligned variable ranges issue in fuse node
#138568 commented on
Oct 29, 2024 • 4 new comments -
[PyTorch] Hook up fp16_gemv_trans to x86 fp16 GEMM
#137918 commented on
Oct 30, 2024 • 4 new comments -
init kineto after torch module initialized
#131448 commented on
Oct 30, 2024 • 4 new comments -
[hop free symbols][refactor] make create_graph_input always take example_value
#138428 commented on
Oct 29, 2024 • 4 new comments -
[fx graph cache] Support freezing with FX graph caching
#136505 commented on
Oct 29, 2024 • 4 new comments -
[cpu] add int8 sdpa api
#138688 commented on
Oct 25, 2024 • 4 new comments -
make equation behind torch.isclose element-wise
#138459 commented on
Oct 29, 2024 • 3 new comments -
Add BRGEMM API versioning to be compatible with different oneDNN versions
#138184 commented on
Oct 30, 2024 • 3 new comments -
FlopCounterMode: Decompose ops for inference mode
#138508 commented on
Oct 28, 2024 • 3 new comments -
typing ir.py - part 2
#131846 commented on
Oct 30, 2024 • 3 new comments -
[Environment Variable][5/N] Use thread-safe getenv functions
#138438 commented on
Oct 24, 2024 • 3 new comments -
[AOTI] add C shim for QConvPointWise
#138540 commented on
Oct 29, 2024 • 3 new comments -
[Inductor][CPP] Add oneDNN BRGEMM config for Half cpp gemm template
#136255 commented on
Oct 30, 2024 • 3 new comments -
[Pipelining] Runtime support and optimization for separate dI / dW
#131762 commented on
Oct 25, 2024 • 3 new comments -
[ROCm] Add support for SymmetricMemory and Intra Node Comm
#134817 commented on
Oct 24, 2024 • 3 new comments -
Update pin memory related APIs to not pass 'device' argument
#131858 commented on
Oct 30, 2024 • 2 new comments -
Add clamp function for bool tensor
#138618 commented on
Oct 29, 2024 • 2 new comments -
[MPS] Compile kernels into Metallib
#138636 commented on
Oct 27, 2024 • 2 new comments -
[hop free symbols][refactor] make bound_symbols a dictionary
#138345 commented on
Oct 29, 2024 • 2 new comments -
[Quant][Inductor] support QConv/QLinear + broadcast add fusion
#138185 commented on
Oct 30, 2024 • 2 new comments -
softshrink nan fixes
#138421 commented on
Oct 28, 2024 • 2 new comments -
[Partitioner] Enumerate partitions by iterating partition ids
#136598 commented on
Oct 26, 2024 • 2 new comments -
Add float-to-int conversion errror checks
#138592 commented on
Oct 25, 2024 • 2 new comments -
[inductor] make requires_stride_order more unbacked-symint-aware
#137063 commented on
Oct 30, 2024 • 2 new comments -
unravel_index check if all elements in `indices` are within the acceptable range
#138223 commented on
Oct 29, 2024 • 2 new comments -
[FSDP] Add support for `FSDPCommContext` on PrivateUse1 backend
#138687 commented on
Oct 29, 2024 • 2 new comments -
fix dynamo tracking numpy 2 ops
#138686 commented on
Oct 30, 2024 • 2 new comments -
[Partitioner] Speed up the update of partition map
#136616 commented on
Oct 25, 2024 • 2 new comments -
Build transformed distributions within compiled code
#135001 commented on
Oct 25, 2024 • 2 new comments -
Make IPC features extendable on third-party devices
#133222 commented on
Oct 30, 2024 • 2 new comments -
[hop free symbols][refactor] lift freevar to parent graph before lifting to subgraph
#138559 commented on
Oct 30, 2024 • 2 new comments -
`NO_DEPRECATED` cmake switch & C++ macro prevents compilation of deprecated code
#138405 commented on
Oct 24, 2024 • 2 new comments -
Avoid args being parsed when common_utils imported
#134592 commented on
Oct 28, 2024 • 2 new comments -
Link with MKL::MKL instead of MKL_LIBRARIES
#128195 commented on
Oct 24, 2024 • 1 new comment -
fix sampler - force cpu device for .tolist tensors
#135990 commented on
Oct 30, 2024 • 1 new comment -
Fix lazy_property compatibility with dynamo
#138080 commented on
Oct 26, 2024 • 1 new comment -
[PT2D][DDP][WIP] Automatically detect whether to turn on CompiledDDP
#138560 commented on
Oct 26, 2024 • 1 new comment -
check fake/real mismatches during real tensor prop
#137747 commented on
Oct 28, 2024 • 1 new comment -
Use Manylinux2_28 for wheel builds
#138732 commented on
Oct 24, 2024 • 0 new comments -
[WIP][Inductor] auto-chunker
#136702 commented on
Oct 30, 2024 • 0 new comments -
[Partitioner] Reduce time consuming of partitions merger
#136614 commented on
Oct 25, 2024 • 0 new comments -
[Partitioner] Remove unnecessary upstream nodes in dependency viewer
#136608 commented on
Oct 23, 2024 • 0 new comments -
Make Context to be Device-agnostic Step by Step (2/N)
#136526 commented on
Oct 26, 2024 • 0 new comments -
[Reland] Remove outdated Caffe2 code from build scripts
#135104 commented on
Oct 28, 2024 • 0 new comments -
[experiment] xdist
#137533 commented on
Oct 29, 2024 • 0 new comments -
Clean up accelerator APIs
#137493 commented on
Oct 30, 2024 • 0 new comments -
[FlexAttention] add support for learnable biases in Inductor
#137452 commented on
Oct 30, 2024 • 0 new comments -
[sparse] add search for optimal alg_id to torch.compile
#137427 commented on
Oct 29, 2024 • 0 new comments -
[WIP] Added swizzle searching, disabled fp16 accum, and enabled ping-ong for cutlass
#137410 commented on
Oct 25, 2024 • 0 new comments -
Add XuehaiPan to CODEOWNERS for C++ PyTree utilities
#137408 commented on
Oct 30, 2024 • 0 new comments -
[pytree][4/N] make `torch.utils.pytree` as public API
#137400 commented on
Oct 30, 2024 • 0 new comments -
[dynamo][pytree][3/N] make CXX pytree traceable: `tree_map` / `tree_map_`
#137399 commented on
Oct 30, 2024 • 0 new comments -
[dynamo][pytree][2/N] make CXX pytree traceable: `tree_flatten` / `tree_unflatten` / `tree_structure`
#137398 commented on
Oct 30, 2024 • 0 new comments -
[dynamo][pytree][1/N] make CXX pytree traceable: `tree_iter` / `tree_leaves`
#137397 commented on
Oct 30, 2024 • 0 new comments -
[wip][compiled autograd] always lift python autograd function bwd
#137359 commented on
Oct 29, 2024 • 0 new comments -
Implement HPUHooksInterface
#137338 commented on
Oct 30, 2024 • 0 new comments -
[wip][compiled autograd] automatic compiled autograd
#137326 commented on
Oct 29, 2024 • 0 new comments -
[ROCm][Draft] Enable finding HIP and ROCm libraries on Windows
#137279 commented on
Oct 29, 2024 • 0 new comments -
[Inductor] Support tiling reduction dimensions
#137243 commented on
Oct 30, 2024 • 0 new comments -
[WIP] Fix test_out when run on CPU with CUDA available
#137140 commented on
Oct 29, 2024 • 0 new comments -
[reland] Flip triton kernel default layout constraint to "needs_fixed_stride_order"
#137064 commented on
Oct 30, 2024 • 0 new comments -
Update FlexAttention benchmarks
#137036 commented on
Oct 30, 2024 • 0 new comments -
Improvements for associative_scan - Autograd
#136966 commented on
Oct 23, 2024 • 0 new comments -
Avoid reorder in mkldnn_to_dense when output is already in a public format
#136859 commented on
Oct 24, 2024 • 0 new comments -
Enable XNNPACK for quantized add
#136850 commented on
Oct 25, 2024 • 0 new comments -
Add generator parameter to rand*_like functions
#136780 commented on
Oct 28, 2024 • 0 new comments -
[Intel GPU] qlinear.pointwise with mixed dtype support
#136753 commented on
Oct 30, 2024 • 0 new comments -
Wrap torch_python with torch_compile_options
#136743 commented on
Oct 28, 2024 • 0 new comments -
[dont-review][dynamo][dicts] Prototype - Guard on dict keys lazily
#134887 commented on
Oct 30, 2024 • 0 new comments -
Adds Error Handling Bindings to Cudart
#134869 commented on
Oct 29, 2024 • 0 new comments -
dumy test : add dumy ci test
#134814 commented on
Oct 29, 2024 • 0 new comments -
Fix the bug to pass correct sequence id into PyTorch exection trace
#134753 commented on
Oct 28, 2024 • 0 new comments -
Bisect PR-134552 changes breaking pytorch github unit test [Not for land] 1
#134751 commented on
Oct 28, 2024 • 0 new comments -
Bisect PR-134552 changes breaking pytorch github unit test [Not for land] 2
#134750 commented on
Oct 28, 2024 • 0 new comments -
[Inductor][Optimus] Fix group fusion stride layout
#134696 commented on
Oct 28, 2024 • 0 new comments -
Make dot and vdot structured ops (#64)
#134671 commented on
Oct 29, 2024 • 0 new comments -
Add TORCHINDUCTOR_VEC_ISA_OK env var for vec_isa_ok
#134667 commented on
Oct 29, 2024 • 0 new comments -
Fix unused Python variables in test/[a-d]*
#134665 commented on
Oct 29, 2024 • 0 new comments -
[wip] TORCH_FAKE_PROCESS_GROUP to run distributed on single gpu
#134634 commented on
Oct 27, 2024 • 0 new comments -
Switch XNNPack tests to use _export_for_training (#4867)
#134615 commented on
Oct 28, 2024 • 0 new comments -
Add functional checkpoint API to torch.func.checkpoint
#134584 commented on
Oct 26, 2024 • 0 new comments -
Disable AMP when propagating fake tensors
#134583 commented on
Oct 29, 2024 • 0 new comments -
[Release only] turn off failing CI tests
#134575 commented on
Oct 26, 2024 • 0 new comments -
implement `torch._foreach_rsqrt`
#134574 commented on
Oct 30, 2024 • 0 new comments -
[ONNX] Allow opset 21 for ONNXRuntime==1.18 compatibility (#127167)
#134571 commented on
Oct 28, 2024 • 0 new comments -
[Inductor][FlexAttention] Temporarily disable non-divisible support until fixing all issues.
#134505 commented on
Oct 26, 2024 • 0 new comments -
[CMake] Turn libshm into an object lib
#134437 commented on
Oct 30, 2024 • 0 new comments -
[CI] Enable CI UT on windows xpu
#134382 commented on
Oct 27, 2024 • 0 new comments -
[aot_inductor] add commit info for triton cache file
#134340 commented on
Oct 30, 2024 • 0 new comments -
Support narrow() on batch dim for NJT
#136444 commented on
Oct 29, 2024 • 0 new comments -
[inductor] modify the heuristic for disabling vectorization
#136422 commented on
Oct 30, 2024 • 0 new comments -
Add `truediv` support in export serializer
#136364 commented on
Oct 24, 2024 • 0 new comments -
Fix unused Python variables outside torch/ and test/
#136359 commented on
Oct 30, 2024 • 0 new comments -
Add a new distributed backend (XCCL) for Intel GPUs
#136343 commented on
Oct 30, 2024 • 0 new comments -
Pass rounding_mode for div reference inputs through kwargs
#136308 commented on
Oct 23, 2024 • 0 new comments -
Set output num_float_feature to have dynamic dimension
#136268 commented on
Oct 29, 2024 • 0 new comments -
Add determinmistic kernel for reflection2d
#136241 commented on
Oct 29, 2024 • 0 new comments -
Add support for `@contextmanager` in Dynamo
#136033 commented on
Oct 30, 2024 • 0 new comments -
Bump `nn.functional.conv3d` tolerances for `test_comprehensive`
#135719 commented on
Oct 24, 2024 • 0 new comments -
[scan] support jit inductor
#135603 commented on
Oct 24, 2024 • 0 new comments -
vec: support RVV
#135570 commented on
Oct 29, 2024 • 0 new comments -
[Reland] Fix tensor.data_ptr() representation overflow
#135567 commented on
Oct 30, 2024 • 0 new comments -
Fix the use of fsspec transactions
#135541 commented on
Oct 26, 2024 • 0 new comments -
Add BFloat16 support for BRGEMM flash attention forward kernel
#135473 commented on
Oct 24, 2024 • 0 new comments -
[Intel GPU] qconv.pointwise with mixed dtype XPU support
#135465 commented on
Oct 30, 2024 • 0 new comments -
[ONNX] New registration API
#135403 commented on
Oct 25, 2024 • 0 new comments -
add supports_coalescing property in c10d::Backend to determine whether backend supports coalescing
#135338 commented on
Oct 30, 2024 • 0 new comments -
[Intel GPU] qlinear_pointwise.binary[_tensor] XPU support
#135337 commented on
Oct 30, 2024 • 0 new comments -
[AOTI XPU] Rename test_cuda_cpp_wrapper.py to test_gpu_cpp_wrapper.py,
#135320 commented on
Oct 30, 2024 • 0 new comments -
[WIP][AOTI XPU] Enable Cpp wraper for Intel GPU.
#135318 commented on
Oct 30, 2024 • 0 new comments -
[Intel GPU] qconv_pointwise.binary XPU support
#135189 commented on
Oct 30, 2024 • 0 new comments -
[hop free symbols][refactor] make map's save_for_backward to handle int
#138558 commented on
Oct 29, 2024 • 0 new comments -
[Draft][CUDA][CI][cusparselt] Only CUDA 11.8 ships the libcusparseLt.so.0, CUDA 12 would use PYPI libcusparselt
#138547 commented on
Oct 30, 2024 • 0 new comments -
[Prototype] Adding lowering to persistent-tma device kernel for _scaled_mm
#138536 commented on
Oct 30, 2024 • 0 new comments -
`addmm`: error on output dtype mismatch.
#138520 commented on
Oct 30, 2024 • 0 new comments -
[Inductor] introduce comm buffer planning
#138519 commented on
Oct 26, 2024 • 0 new comments -
[c10d] user to explicitly specify whether split semantics should be used in new_group
#138518 commented on
Oct 26, 2024 • 0 new comments -
Add test for consistency between meta and CPU devices.
#138515 commented on
Oct 30, 2024 • 0 new comments -
Always produce XML
#138513 commented on
Oct 30, 2024 • 0 new comments -
torch/nn/modules/linear.py: docs: improvements
#138484 commented on
Oct 26, 2024 • 0 new comments -
[inductor] Remove an unused local function in select_algorithm.py
#138475 commented on
Oct 26, 2024 • 0 new comments -
[inductor] Remove an unused variable in ir.py
#138474 commented on
Oct 25, 2024 • 0 new comments -
[inductor] Replace set by OrderedSet
#138466 commented on
Oct 29, 2024 • 0 new comments -
Update test_function_base.py for Numpy 2.0 +
#138463 commented on
Oct 28, 2024 • 0 new comments -
[draft] set_linter finds and replaces builtin set in Python code
#138454 commented on
Oct 29, 2024 • 0 new comments -
Openreg: Add RNG Generator
#138449 commented on
Oct 29, 2024 • 0 new comments -
Add docs page for `torch.inf` and `torch.nan`
#138430 commented on
Oct 30, 2024 • 0 new comments -
[eazy][inductor] report peak GPU memory usage in the wrapper
#138429 commented on
Oct 30, 2024 • 0 new comments -
NJT OpInfo tests v2
#138370 commented on
Oct 29, 2024 • 0 new comments -
[cuDNN][SDPA] Match `query`'s memory layout ordering for `output` in cuDNN SDPA
#138354 commented on
Oct 25, 2024 • 0 new comments -
[logging] Use aot_graph_name for the metrics key for backward compilation.
#138339 commented on
Oct 23, 2024 • 0 new comments -
No-gil cherry picks for 2.5.X
#138335 commented on
Oct 25, 2024 • 0 new comments -
Add missing operator and corresponding unittest
#138309 commented on
Oct 29, 2024 • 0 new comments -
[Distributed] Fix todo relative with docs optimization
#138302 commented on
Oct 29, 2024 • 0 new comments -
Add documentation for torch.default_generator
#138295 commented on
Oct 29, 2024 • 0 new comments -
[Windows XPU] SYCL version compatibility
#138728 commented on
Oct 28, 2024 • 0 new comments -
[Windows XPU] Fix MSVC ambiguous symbol error
#138727 commented on
Oct 30, 2024 • 0 new comments -
cudagraph explicit sync on ROCm only after capture_begin()
#138722 commented on
Oct 23, 2024 • 0 new comments -
[wip] fix meta_registrations + alpha
#138703 commented on
Oct 25, 2024 • 0 new comments -
[ROCm] Enable e5m2 x e5m2 unit test in test_float8_basics
#138699 commented on
Oct 23, 2024 • 0 new comments -
Add overflow check for integer division
#138684 commented on
Oct 29, 2024 • 0 new comments -
[fx graph cache] Refactor FxGraphCachePickler, step 2
#138683 commented on
Oct 29, 2024 • 0 new comments -
[fx graph cache] Refactor FxGraphCachePickler
#138682 commented on
Oct 29, 2024 • 0 new comments -
Add device-agnostic runtime Device/Stream C++ API
#138677 commented on
Oct 29, 2024 • 0 new comments -
Move reduce to template parameter in vectorized_reduction
#138672 commented on
Oct 30, 2024 • 0 new comments -
only unspecialize float during aot_eager
#138666 commented on
Oct 26, 2024 • 0 new comments -
Make test_torchbind.py training IR compatible
#138658 commented on
Oct 29, 2024 • 0 new comments -
Solve issue 138464
#138653 commented on
Oct 25, 2024 • 0 new comments -
[Inductor][ROCm][CK] Enable lowering conv2d instances in CK Inductor backend
#138643 commented on
Oct 29, 2024 • 0 new comments -
Add COW support for MPS backend.
#138640 commented on
Oct 23, 2024 • 0 new comments -
ILP for auto FSDP wrapping
#138635 commented on
Oct 25, 2024 • 0 new comments -
Upgrade to fbscribelogger 0.1.7
#138634 commented on
Oct 24, 2024 • 0 new comments -
torch/nn/utils/rnn.py: docs: improvements
#138628 commented on
Oct 26, 2024 • 0 new comments -
Fix global namespace pollution in ATen/Dispatch.h (#138622)
#138626 commented on
Oct 30, 2024 • 0 new comments -
(wip)[CUTLASS] Support re-running with different shapes
#138611 commented on
Oct 24, 2024 • 0 new comments -
[ONNX] Bypass mark_static_address when converting a model to onnx
#138580 commented on
Oct 23, 2024 • 0 new comments -
[1/N] Apply py39 ruff fixes
#138578 commented on
Oct 29, 2024 • 0 new comments -
[1/N] Use isinstance in python code
#138573 commented on
Oct 29, 2024 • 0 new comments -
add CUDA 12.6 to docker for sbsa wheel
#138562 commented on
Oct 28, 2024 • 0 new comments -
Add getDeviceAllocator device agnostic API in accelerator hooks
#137972 commented on
Oct 24, 2024 • 0 new comments -
[WIP] avoid atomic add for XPU device in scatter_add by deterministic mode
#137966 commented on
Oct 24, 2024 • 0 new comments -
[FlexAttention] Add sm86 config
#137959 commented on
Oct 29, 2024 • 0 new comments -
[Dynamo] allow dynamic callables on tensor variables
#137940 commented on
Oct 29, 2024 • 0 new comments -
Add more DSA
#137934 commented on
Oct 24, 2024 • 0 new comments -
update _unsafe_set_version_counter to accept lists of tensors
#137921 commented on
Oct 29, 2024 • 0 new comments -
Dont decompose aten.baddmm in inductor
#137904 commented on
Oct 30, 2024 • 0 new comments -
[WIP] Update triton xpu commit pin
#137886 commented on
Oct 30, 2024 • 0 new comments -
[POC][functorch] use public PyTree API in `torch.func`
#137884 commented on
Oct 30, 2024 • 0 new comments -
[BE]: Try turning on LTO in CMake in CI
#137866 commented on
Oct 30, 2024 • 0 new comments -
Turned Views, Layouts, and Loops into proper dataclasses
#137861 commented on
Oct 28, 2024 • 0 new comments -
Add Intel GPU info collection to the collect env script
#137846 commented on
Oct 29, 2024 • 0 new comments -
Fix for gcc10 torch.compile compiler error when march=aarch64+sve
#137795 commented on
Oct 30, 2024 • 0 new comments -
Refactor FxGraphDrawer to use HTML-like labels
#137726 commented on
Oct 25, 2024 • 0 new comments -
Pass all arguments when quantizing embedding bag from float
#137697 commented on
Oct 29, 2024 • 0 new comments -
[FlexAttention] Skip Calculating KV Gradient if Possible
#137658 commented on
Oct 25, 2024 • 0 new comments -
Globally enable Python dispatcher for all of Inductor compilation
#137621 commented on
Oct 30, 2024 • 0 new comments -
Flip default on weights_only
#137602 commented on
Oct 30, 2024 • 0 new comments -
Make Context to be Device-agnostic Step by Step (4/N)
#137580 commented on
Oct 25, 2024 • 0 new comments -
Make Context to be Device-agnostic Step by Step (3/N)
#137578 commented on
Oct 25, 2024 • 0 new comments -
[Intel GPU] allow_tf32 context at XPU backend
#137570 commented on
Oct 29, 2024 • 0 new comments -
Remove deprecated alias macro(3/3)
#137562 commented on
Oct 26, 2024 • 0 new comments -
Remove deprecated alias macro(2/3)
#137559 commented on
Oct 26, 2024 • 0 new comments -
[inductor] modify the heuristic for loop split optimization
#137550 commented on
Oct 30, 2024 • 0 new comments -
[CUDAGraph] Add multithreading support for cudagraph trees
#138282 commented on
Oct 27, 2024 • 0 new comments -
Small fix to Python rendering in documentation.
#138281 commented on
Oct 25, 2024 • 0 new comments -
[PyTorch] Support non-zero beta in fp16_gemv_trans
#138275 commented on
Oct 29, 2024 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
Oct 30, 2024 • 0 new comments -
[FX][export][dynamo] use `tuple` instead of `list` exported code signature
#138213 commented on
Oct 25, 2024 • 0 new comments -
Rewrite _reparametrize_module to use `@contextmanager`
#138203 commented on
Oct 30, 2024 • 0 new comments -
[POC][FX][pytree] cleanup fx pytree implementation
#138202 commented on
Oct 30, 2024 • 0 new comments -
Fuse partition lists reorder
#138176 commented on
Oct 29, 2024 • 0 new comments -
Update cusparselt to 0.6.3.2 version (0.6.3 for short).
#138175 commented on
Oct 24, 2024 • 0 new comments -
fix trace nn.parameters()
#138149 commented on
Oct 28, 2024 • 0 new comments -
[Pipelining] add schedule simulator and chrometrace dump
#138134 commented on
Oct 25, 2024 • 0 new comments -
fix test_float_to_int_conversion_nonfinite for NumPy 2
#138131 commented on
Oct 30, 2024 • 0 new comments -
[Pipelining] Support V-schedules in IR and Runtime
#138125 commented on
Oct 25, 2024 • 0 new comments -
[wip] "Python compiled autograd II"
#138101 commented on
Oct 29, 2024 • 0 new comments -
Propagate NJT lengths through op calls
#138098 commented on
Oct 25, 2024 • 0 new comments -
[DRAFT] do not look--testing cpp_extension no python
#138088 commented on
Oct 24, 2024 • 0 new comments -
[BE]: Improve typing storage
#138084 commented on
Oct 29, 2024 • 0 new comments -
handle more devices in method_type method of TensorVariable
#138078 commented on
Oct 30, 2024 • 0 new comments -
Use clang18 asan
#138066 commented on
Oct 24, 2024 • 0 new comments -
[POC][pytree] test C++ pytree in `torch.utils.pytree` by default
#138056 commented on
Oct 30, 2024 • 0 new comments -
[PyTorch] Hook up fp16_gemv_trans to gemv fast path for non-aarch64 architectures
#138005 commented on
Oct 29, 2024 • 0 new comments -
[CI] Move mkl install logic to `requirements-ci.txt`
#137995 commented on
Oct 29, 2024 • 0 new comments -
[BE]: Update CUDNN for Unix OSS to 9.5.1.17
#137978 commented on
Oct 27, 2024 • 0 new comments -
Fix layout for SetSourceTensorKernel
#137973 commented on
Oct 30, 2024 • 0 new comments -
[ONNX] Export Phi3.5 onnx graph multiple slice nodes missing starts or ends attribute
#138637 commented on
Oct 29, 2024 • 0 new comments -
FakeMode should not fakify non persistent buffer
#107879 commented on
Oct 28, 2024 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
Oct 28, 2024 • 0 new comments -
CVE-2024-5480 reported by security analyzers
#129228 commented on
Oct 28, 2024 • 0 new comments -
Flex attention with mask depending on queries and keys lengths (or how to implement `causal_lower_right` masking)
#137779 commented on
Oct 28, 2024 • 0 new comments -
Problem with building PyTorch from source for ROCm on Ubuntu 24.04
#137858 commented on
Oct 28, 2024 • 0 new comments -
[ROCm] PyTorch Profiler Seg Fault on PyTorch Nightly
#138360 commented on
Oct 28, 2024 • 0 new comments -
FlopCounterMode doesn't support HOP
#134385 commented on
Oct 28, 2024 • 0 new comments -
Distributed Weighted Sampler.
#77154 commented on
Oct 28, 2024 • 0 new comments -
[ONNX] dynamo: support conditional op `cond`
#117655 commented on
Oct 28, 2024 • 0 new comments -
Add support for Flash Attention for AMD/ROCm
#112997 commented on
Oct 28, 2024 • 0 new comments -
Torch Dynamo support for Flux Transformer model
#138195 commented on
Oct 28, 2024 • 0 new comments -
DISABLED test_aot_export_cond_simple_cuda_float32 (__main__.TestHOPCUDA)
#123096 commented on
Oct 28, 2024 • 0 new comments -
[v2.5.1] Release Tracker
#138613 commented on
Oct 28, 2024 • 0 new comments -
[FSDP2] Eager-Mode Execution Tracker
#120003 commented on
Oct 28, 2024 • 0 new comments -
DISABLED test_save_with_without_initializer_dont_include_initializer_no_fake_mode_no_exported_program (__main__.TestFxToOnnx)
#125020 commented on
Oct 28, 2024 • 0 new comments -
AOT compilation fails for nested models using torchvision V2 API
#137743 commented on
Oct 28, 2024 • 0 new comments -
Compiling a module leads to `AssertionError: expected size 64==64, stride 1==49 at dim=1`
#136837 commented on
Oct 28, 2024 • 0 new comments -
torch._inductor.exc.LoweringException in `torch._export.aot_compile`
#138254 commented on
Oct 28, 2024 • 0 new comments -
Build errors while building PyTorch with BLIS
#134399 commented on
Oct 28, 2024 • 0 new comments -
Infinite cycle on BSC tensor: base -> values -> base
#122089 commented on
Oct 28, 2024 • 0 new comments -
[Dynamo] Eager fallback caused by graph breaks in module hooks
#135410 commented on
Oct 28, 2024 • 0 new comments -
[PTD BE DAY]Burn Down Distributed Disabled Tests!!
#132845 commented on
Oct 28, 2024 • 0 new comments -
The latest version of pytorch cannot export onnx with loop operator
#122588 commented on
Oct 27, 2024 • 0 new comments -
Obscure error: Expected a value of type 'List[int]' for argument 'sizes' but instead found type 'immutable_list'
#122129 commented on
Oct 27, 2024 • 0 new comments -
_tensors_definitely_do_not_overlap guard explosion
#118214 commented on
Oct 27, 2024 • 0 new comments -
[torch.export] Detect internal constraints
#136216 commented on
Oct 27, 2024 • 0 new comments -
Pruning step for users variable in Scheduler
#138721 commented on
Oct 27, 2024 • 0 new comments -
[RFC] Cuda support matrix for Release 2.6
#138609 commented on
Oct 29, 2024 • 0 new comments -
[torch.compile] Integers stored on nn.Modules as dynamic causing errors
#133166 commented on
Oct 29, 2024 • 0 new comments -
vmap, jacrev, jacfwd, hessian, etc., in libTorch
#106455 commented on
Oct 29, 2024 • 0 new comments -
torch._dynamo.exc.Unsupported: Unexpected type in sourceless builder torch.Tensor when running Mamba models in vLLM
#136497 commented on
Oct 29, 2024 • 0 new comments -
`torch.onnx.export` doesn't correctly constfold constants and DequantLinear
#123628 commented on
Oct 29, 2024 • 0 new comments -
[Intel GPU] Lower FLOPs and Bandwidth on Arc 770
#136342 commented on
Oct 29, 2024 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
Oct 29, 2024 • 0 new comments -
Using AOTInductor in C++ crashes if the model uses torch.linalg.eigh with CUDA
#138601 commented on
Oct 29, 2024 • 0 new comments -
[Inductor] unaligned variable ranges during node fusion
#138550 commented on
Oct 29, 2024 • 0 new comments -
[RFC] Per-Parameter-Sharding FSDP
#114299 commented on
Oct 29, 2024 • 0 new comments -
`maybe_mark_dynamic` causes max recursion error when used with compile during tensordict consolidation
#138729 commented on
Oct 29, 2024 • 0 new comments -
dynamo(eval_frame.py) failed on Windows UT.
#132561 commented on
Oct 29, 2024 • 0 new comments -
dynamo IndexError: list index out of range on Windows UT
#132569 commented on
Oct 29, 2024 • 0 new comments -
`unbind_copy` gives unexpected results on 1-dimensional inputs, or 0-dimensional outputs
#130829 commented on
Oct 29, 2024 • 0 new comments -
Microsoft Visual C++ Redistributable is not installed, this may lead to the DLL load failure.
#126507 commented on
Oct 29, 2024 • 0 new comments -
torch.onnx.dynamo_export fails to convert torchaudio.transforms.MFCC to onnx
#125375 commented on
Oct 29, 2024 • 0 new comments -
Missing librt.so.1, libdl.so.2, libz.so.1 and libpthread.so.0 in PyTorch Source Build (aarch64)
#137078 commented on
Oct 29, 2024 • 0 new comments -
Direct Implementation of K-Nearest neighbor (KNN) in pytorch
#71386 commented on
Oct 29, 2024 • 0 new comments -
Add support for immutable tensors in torch.export
#136642 commented on
Oct 29, 2024 • 0 new comments -
`_amp_foreach_non_finite_check_and_unscale_` can be torch.compiled inside torch.amp, but not in identical code outside it
#138412 commented on
Oct 29, 2024 • 0 new comments -
ROCm: /usr/bin/ld: failed to convert GOTPCREL relocation
#138427 commented on
Oct 29, 2024 • 0 new comments -
DISABLED test_constant_insertion (__main__.TestJit)
#96894 commented on
Oct 29, 2024 • 0 new comments -
Support for `uint16`, `uint32`, and `uint64`
#58734 commented on
Oct 28, 2024 • 0 new comments -
torch.compile(create_block_mask) errors in certain cases and hangs in others
#138514 commented on
Oct 28, 2024 • 0 new comments -
getting "handle_foreach_pow_scalar() got multiple values for argument 'self'" since PT2.5
#138698 commented on
Oct 28, 2024 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Oct 28, 2024 • 0 new comments -
[feature request]: Update max onnx opset to 21 for onnxruntime==1.18 compatibility
#127167 commented on
Oct 28, 2024 • 0 new comments -
DISABLED test_serialize_export_auto_functionalize_simple_cuda_float32 (__main__.TestHOPCUDA)
#123563 commented on
Oct 24, 2024 • 0 new comments -
Investigate potential cost savings for inductor workflows
#138476 commented on
Oct 24, 2024 • 0 new comments -
[export] Cannot view a tensor with shape torch.Size([1, 512, 32, 128]) and strides (2097152, 128, 65536, 1) as a tensor with shape (1, 512, 4096)
#136543 commented on
Oct 24, 2024 • 0 new comments -
RFC: Integration of KleidiAI 4-Bit MatMul Kernels into PyTorch
#137830 commented on
Oct 24, 2024 • 0 new comments -
torch.compile+cudagraphs asserts in multithreaded context
#123177 commented on
Oct 24, 2024 • 0 new comments -
More comprehensive debug logging of intermediate FX graphs for Inductor passes
#118123 commented on
Oct 24, 2024 • 0 new comments -
inductor error with PT Lightning + FSDP + torchao.float8 + torch.compile
#138715 commented on
Oct 24, 2024 • 0 new comments -
with torch.cuda.amp.autocast() get out of memory error when using with torch.no_grad() during validation
#45910 commented on
Oct 24, 2024 • 0 new comments -
Performance regression in torch.compile
#136254 commented on
Oct 24, 2024 • 0 new comments -
`torch.bernoulli` and `torch.rand` to support `dtype=torch.bool` kwarg (and support operation of `tensor.uniform_` on bool tensors)
#35527 commented on
Oct 24, 2024 • 0 new comments -
DataLoader num_workers > 0 causes CPU memory from parent process to be replicated in all worker processes
#13246 commented on
Oct 24, 2024 • 0 new comments -
[CI] Continuous spam of PyTorch CI build status notifications in fork
#138564 commented on
Oct 24, 2024 • 0 new comments -
[inductor] cpp gemm autotune doesn't work on AMD EPYC
#138718 commented on
Oct 24, 2024 • 0 new comments -
`export()` fails for `full((n,), v)` but succeeds for `ones((n,)) * v` where `v` is dynamic
#138073 commented on
Oct 24, 2024 • 0 new comments -
DISABLED test_dropout_deterministic_cuda (__main__.GPUTests)
#133025 commented on
Oct 24, 2024 • 0 new comments -
NCCL watchdog thread terminated with exception
#113128 commented on
Oct 24, 2024 • 0 new comments -
dynamic control flow failed
#138689 commented on
Oct 24, 2024 • 0 new comments -
Less restrictive guards when using int(log2(sz))
#137220 commented on
Oct 23, 2024 • 0 new comments -
[ONNX]: Fail to export onnx when GroupNorm input shape rank=2
#130108 commented on
Oct 23, 2024 • 0 new comments -
dead code in test that can be removed
#138673 commented on
Oct 23, 2024 • 0 new comments -
The default setting of PyTorch should show warning always.
#138552 commented on
Oct 23, 2024 • 0 new comments -
[CI/CD] Deprecating PyTorch’s official Anaconda channel
#138696 commented on
Oct 23, 2024 • 0 new comments -
Nightly job `linux-binary-manywheel / manywheel-py3_10-xpu-build / build` is currently broken
#138695 commented on
Oct 23, 2024 • 0 new comments -
Flex attention underperforms SDPA (cuDNN), constructing T5 attention bias via embedding weights
#138493 commented on
Oct 23, 2024 • 0 new comments -
torch.compile silently treats None argument passed to Triton kernel as a `*i8`
#138546 commented on
Oct 23, 2024 • 0 new comments -
[Validation] pypi binaries with slimmed dependencies are usable in standard AWS containers (amazonlinux:2023 regression in 1.13 and 2.5)
#138482 commented on
Oct 23, 2024 • 0 new comments -
Fusion causes peak memory increase
#138685 commented on
Oct 23, 2024 • 0 new comments -
[feature request] Varlen indexing function for lookup and concat of varlen BPE tokens from a tensor vocab (i.e. `detokenize(...)` and arrays of strings)
#135704 commented on
Oct 27, 2024 • 0 new comments -
load_state_dict unexpectedly does not load Tensor to buffers that currently have None value
#8104 commented on
Oct 27, 2024 • 0 new comments -
Tensor loses `bool` method during scripting
#70544 commented on
Oct 27, 2024 • 0 new comments -
Investigate torch.compile Windows support.
#122094 commented on
Oct 26, 2024 • 0 new comments -
[Performance] [CuDNN-Attention] CuDNN backend should return the output in the same stride order as input Query
#138340 commented on
Oct 26, 2024 • 0 new comments -
Cleanup stale Dynamo feature flags
#136862 commented on
Oct 26, 2024 • 0 new comments -
The doc of `linalg.vector_norm()` should not say `ord` parameter accepts the `str` value `fro` or `nuc`
#136563 commented on
Oct 26, 2024 • 0 new comments -
torch 2.5 slower than 2.4.1 ?
#138386 commented on
Oct 26, 2024 • 0 new comments -
DISABLED test_save_load_checkpoint (__main__.DistributedDataParallelTest)
#137771 commented on
Oct 25, 2024 • 0 new comments -
Enable CUDA 12.6.2
#138440 commented on
Oct 25, 2024 • 0 new comments -
DISABLED test_device_mode_ops_sparse_mm_reduce_cpu_bfloat16 (__main__.TestDeviceUtilsCPU)
#132494 commented on
Oct 25, 2024 • 0 new comments -
[RFC] Add new CPP builder for inductor on pytorch Windows
#124245 commented on
Oct 25, 2024 • 0 new comments -
SDPA 2.5 Issue tracking
#138649 commented on
Oct 25, 2024 • 0 new comments -
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on
Oct 25, 2024 • 0 new comments -
[dynamo] Restart mechanism for unboxing cell optimization can hide speculation log divergence
#138491 commented on
Oct 25, 2024 • 0 new comments -
Batching rule for aten::bincount.
#105912 commented on
Oct 25, 2024 • 0 new comments -
get_model_state_dict failed after FSDP_model.to(dtype)
#138467 commented on
Oct 25, 2024 • 0 new comments -
[RFC] Intel GPU Distributed Support in PyTorch
#135791 commented on
Oct 25, 2024 • 0 new comments -
Support writing tensorboard traces to AWS S3 (and other cloud storage services) in profiler
#73131 commented on
Oct 25, 2024 • 0 new comments -
Libtorch C++ register_module raise "read access violation error"
#116568 commented on
Oct 25, 2024 • 0 new comments -
Building project using libtorch results in "Failed to find nvToolsExt"
#116242 commented on
Oct 24, 2024 • 0 new comments -
AOT inductor can't handle dynamic shape 2*s1
#137748 commented on
Oct 24, 2024 • 0 new comments -
Implement missing torch.nan* operators
#61474 commented on
Oct 24, 2024 • 0 new comments -
[export] Torch custom class export issue
#138344 commented on
Oct 24, 2024 • 0 new comments -
DISABLED test_assigning_back_deleter_fns_to_tensor (__main__.TestBlockStateAbsorption)
#134810 commented on
Oct 24, 2024 • 0 new comments -
DISABLED test_retrace_export_auto_functionalize_simple_cuda_float32 (__main__.TestHOPCUDA)
#123449 commented on
Oct 24, 2024 • 0 new comments -
foreach global norm
#134327 commented on
Oct 29, 2024 • 0 new comments -
Temp disable MKL in DistributionKernels.cpp
#132532 commented on
Oct 26, 2024 • 0 new comments -
[WIP] Refactor distributed code via device-agnostic API
#132371 commented on
Oct 30, 2024 • 0 new comments -
Add pybind support for is_hpu property of Tensor object
#132228 commented on
Oct 24, 2024 • 0 new comments -
[torch.special] Adding betainc with backward operation
#132135 commented on
Oct 27, 2024 • 0 new comments -
Add Weighted Loss Functions to PyTorch : WMSE, WMAE, and Weighted Huber Loss
#132049 commented on
Oct 30, 2024 • 0 new comments -
[xla hash update] update the pinned xla hash
#132021 commented on
Oct 28, 2024 • 0 new comments -
Added missing `__all__` in `__init__` files.
#131800 commented on
Oct 25, 2024 • 0 new comments -
track number of cpp->python exceptions thrown in torch.compile benchmark suite
#131481 commented on
Oct 29, 2024 • 0 new comments -
BE: reset dynamo before each test in test_ops_gradients.py
#131397 commented on
Oct 28, 2024 • 0 new comments -
Test module importability in subprocess
#131317 commented on
Oct 25, 2024 • 0 new comments -
[CI] enable operator benchmark on CPU
#131305 commented on
Oct 25, 2024 • 0 new comments -
cmake looks for FP16_PATH environment variable
#130865 commented on
Oct 28, 2024 • 0 new comments -
[1/N] Fix sign comparison errors in CUDA code
#130718 commented on
Oct 27, 2024 • 0 new comments -
LOG(INFO) -> VLOG(2) in ProcessGroupNCCL
#130696 commented on
Oct 30, 2024 • 0 new comments -
[ROCm] Unskip functorch efficient attention tests
#130481 commented on
Oct 28, 2024 • 0 new comments -
[pytree] implement key path APIs for CXX pytree
#130141 commented on
Oct 30, 2024 • 0 new comments -
[pytree] preserve `dict` keys in insertion order in CXX pytree
#130140 commented on
Oct 30, 2024 • 0 new comments -
[CI] Run `lintrunner` on generated `.pyi` stub files in CI
#129887 commented on
Oct 24, 2024 • 0 new comments -
[BE][Easy] Fix `PYI034`: non-self-return-type in tensor method hints
#129886 commented on
Oct 24, 2024 • 0 new comments -
Refactor `torch/utils/data/datapipes/gen_pyi.py` with `torchgen`
#129873 commented on
Oct 24, 2024 • 0 new comments -
Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi`
#129872 commented on
Oct 24, 2024 • 0 new comments -
[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path`
#129871 commented on
Oct 24, 2024 • 0 new comments -
[PT2E Quantization] Fix RecursionError when prepare_pt2e graph with concat of the same node
#129567 commented on
Oct 30, 2024 • 0 new comments -
Use Generic TypeAlias (PEP 585) and Union Type (PEP 604) in generated `.pyi` stub files
#129420 commented on
Oct 25, 2024 • 0 new comments -
Use absolute path `path.resolve()` -> `path.absolute()`
#129409 commented on
Oct 25, 2024 • 0 new comments -
[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir`
#129374 commented on
Oct 25, 2024 • 0 new comments -
Fix numerical instability for norm
#129352 commented on
Oct 24, 2024 • 0 new comments -
Fix unbind_copy and add its decomposition
#134319 commented on
Oct 29, 2024 • 0 new comments -
[UT] Add test about PrivateUse1, its module must be registered before calling device-related APIs
#134318 commented on
Oct 26, 2024 • 0 new comments -
[profiler][UT] instantiate profiler UTs for devices and enable UTs for xpu profiler
#134316 commented on
Oct 29, 2024 • 0 new comments -
[CI][dashboard][experiment][not for landing] Change aarch64 to measure fp16
#134282 commented on
Oct 25, 2024 • 0 new comments -
[export] enumerate unsupported sympy.Functions
#134271 commented on
Oct 26, 2024 • 0 new comments -
[Dynamo][autograd.Function] Support ctx.setup_context properly and compose with vmap
#134256 commented on
Oct 28, 2024 • 0 new comments -
[DO NOT LAND] DTensor + GradScaler: DTensor dispatch without changing amp
#134185 commented on
Oct 25, 2024 • 0 new comments -
Enable c10d ops for XPU device using MPI backend
#134132 commented on
Oct 26, 2024 • 0 new comments -
cpuinfo git branch fix (main instead of master)
#134101 commented on
Oct 26, 2024 • 0 new comments -
[export] don't set complex replacements on hybrid symints
#133920 commented on
Oct 23, 2024 • 0 new comments -
[dynamo] refactor `builtins.zip` to use polyfill
#133895 commented on
Oct 24, 2024 • 0 new comments -
[dynamo][itertools] refactor `itertools.count` to use polyfill
#133875 commented on
Oct 24, 2024 • 0 new comments -
[cpu] Modify inductor opt flag --- ffast-math
#133842 commented on
Oct 29, 2024 • 0 new comments -
[dynamo] support `functools.cmp_to_key`
#133805 commented on
Oct 30, 2024 • 0 new comments -
Ensure SWA boundary conditions w.r.t. definition
#133773 commented on
Oct 28, 2024 • 0 new comments -
Allow mp.start_processes to create processes in parallel
#133707 commented on
Oct 26, 2024 • 0 new comments -
Implements user buffer registration using MemPool
#133603 commented on
Oct 29, 2024 • 0 new comments -
[BE]: Update NCCL submodule to 2.22.3
#133593 commented on
Oct 29, 2024 • 0 new comments -
Bijection and Jacobian for lower Cholesky and positive definite transforms
#133500 commented on
Oct 27, 2024 • 0 new comments -
Remove unused Python variables in torch/[_-a]*
#133492 commented on
Oct 26, 2024 • 0 new comments -
[Inductor][Do NOT Review] Inplacing with Donated Buffer
#133368 commented on
Oct 29, 2024 • 0 new comments -
[Intel GPU] qlinear at XPU backend
#133307 commented on
Oct 30, 2024 • 0 new comments -
[BE] enable `ruff` rule series `PIE` from `flake8-pie`
#133202 commented on
Oct 25, 2024 • 0 new comments -
[BE][Easy] enable `ruff` rule `PIE808`: unnecessary `start` argument in `range(...)`
#133201 commented on
Oct 25, 2024 • 0 new comments -
autograd codegen: bump VC properly for mutable ops with no returns
#133044 commented on
Oct 29, 2024 • 0 new comments -
Improvements for associative_scan - vmap fixes
#133013 commented on
Oct 27, 2024 • 0 new comments -
S390x update builder image
#132983 commented on
Oct 29, 2024 • 0 new comments -
`(*bias): last dimension must be contiguous` when running compiled SDPA on length 1 tensors
#138317 commented on
Oct 30, 2024 • 0 new comments -
Contribution Proposal: ScatterND Implementation for PyTorch
#138502 commented on
Oct 30, 2024 • 0 new comments -
FlexAttention result deviates with torch.compile() and torch.set_float32_matmul_precision('high')
#138556 commented on
Oct 30, 2024 • 0 new comments -
[export] Failed to save the model using torch.export.save
#136897 commented on
Oct 30, 2024 • 0 new comments -
[export] Dynamic shape torch-trt models fail on torch.export.load
#137365 commented on
Oct 30, 2024 • 0 new comments -
DISABLED test_comprehensive__unsafe_masked_index_cuda_bool (__main__.TestInductorOpInfoCUDA)
#131748 commented on
Oct 30, 2024 • 0 new comments -
Easier way of seeing dynamic shape specializations in C++?
#137657 commented on
Oct 30, 2024 • 0 new comments -
tensor.triu_(1) not working properly with large matrix
#136611 commented on
Oct 30, 2024 • 0 new comments -
FSDP.forward() fails "_is_root should not have been set" error after saving a distributed checkpoint
#113496 commented on
Oct 30, 2024 • 0 new comments -
Sequence Ops usage when exporting embedding bag into onnx
#138485 commented on
Oct 30, 2024 • 0 new comments -
ninja: build stopped: subcommand failed.
#108209 commented on
Oct 30, 2024 • 0 new comments -
Increased memory usage with AMP
#61173 commented on
Oct 30, 2024 • 0 new comments -
DISABLED test_comprehensive_fft_ifft_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#127344 commented on
Oct 30, 2024 • 0 new comments -
[MPS] Inconsistent performance issues
#136003 commented on
Oct 30, 2024 • 0 new comments -
torch._dynamo.exc.Unsupported: builtin: bool [<class 'torch._dynamo.variables.tensor.SymNodeVariable'>] False
#136075 commented on
Oct 30, 2024 • 0 new comments -
take_along_dim or gather unstable results on cpu with stride 1
#129093 commented on
Oct 30, 2024 • 0 new comments -
TORCH_COMPILE_CPROFILE does not work for python 3.12
#137869 commented on
Oct 30, 2024 • 0 new comments -
Python 3.13 support for PyTorch
#130249 commented on
Oct 29, 2024 • 0 new comments -
[Tracker] Nested tensor op coverage requests
#118107 commented on
Oct 29, 2024 • 0 new comments -
AOT eager accuracy regression in Segformer in 2.5.0 release
#138652 commented on
Oct 29, 2024 • 0 new comments -
Diffusers import change breaking newly built (non-cached) test-infra dockers.
#138591 commented on
Oct 29, 2024 • 0 new comments -
`Calculate Docker Image` step times out if run on LF runners
#138042 commented on
Oct 29, 2024 • 0 new comments -
vmap with torch.autograd.grad does not work on output of compiled function
#138422 commented on
Oct 29, 2024 • 0 new comments -
`Check label` failures are not very informative
#137927 commented on
Oct 29, 2024 • 0 new comments -
[CI][LF] Docker builds to new ECR fails
#137361 commented on
Oct 29, 2024 • 0 new comments -
There is a performance drop because we have not yet implemented the batching rule for aten::native_dropout_backward. Please file us an issue on GitHub so that we can prioritize its implementation
#122432 commented on
Oct 29, 2024 • 0 new comments -
torch.stft sometimes raises RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR on low free memory
#119420 commented on
Oct 29, 2024 • 0 new comments -
Provide a method to unregister privateuse1
#129056 commented on
Oct 29, 2024 • 0 new comments -
[inductor] refine loop split logic
#128812 commented on
Oct 25, 2024 • 0 new comments -
[Inductor] support masked vectorization for the tail_loop for integer datatypes and bool datatype
#128802 commented on
Oct 30, 2024 • 0 new comments -
Remove caffe2 namespace from c10/macros/Macros.h
#128672 commented on
Oct 29, 2024 • 0 new comments -
Add MaskedTensor support to *_like API
#128637 commented on
Oct 28, 2024 • 0 new comments -
Adds support for accelerated sorting with x86-simd-sort
#127936 commented on
Oct 23, 2024 • 0 new comments -
Reuse UT for Intel GPU backend [Part1]
#127602 commented on
Oct 29, 2024 • 0 new comments -
[Intel GPU]Enable fp64 double GEMM
#127508 commented on
Oct 26, 2024 • 0 new comments -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on
Oct 30, 2024 • 0 new comments -
Enable Leak Sanitizer
#127171 commented on
Oct 30, 2024 • 0 new comments -
[inductor] online softmax
#127011 commented on
Oct 28, 2024 • 0 new comments -
Add NHWC support for group normalization
#126635 commented on
Oct 30, 2024 • 0 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
Oct 30, 2024 • 0 new comments -
Fix hardcoded rocm warp size
#125433 commented on
Oct 28, 2024 • 0 new comments -
S390x ci periodic tests
#125401 commented on
Oct 24, 2024 • 0 new comments -
[DO NOT MERGE] Test new ROCm CI nodes
#124424 commented on
Oct 25, 2024 • 0 new comments -
Improve decomposition for constant_pad_nd
#123661 commented on
Oct 30, 2024 • 0 new comments -
New improved Conv3D implementation for MPS and support for ConvTranspose3D
#116580 commented on
Oct 24, 2024 • 0 new comments -
Conversions between strided and jagged layouts for Nested Tensors
#115749 commented on
Oct 29, 2024 • 0 new comments -
[triton hash update] update the pinned triton hash
#115529 commented on
Oct 28, 2024 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Oct 30, 2024 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
Oct 30, 2024 • 0 new comments -
[pytree] add APIs to determine a class is a namedtuple or PyStructSequence
#113257 commented on
Oct 30, 2024 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
Oct 29, 2024 • 0 new comments -
DISABLED test_device_mode_ops_sparse_mm_reduce_cpu_float64 (__main__.TestDeviceUtilsCPU)
#132619 commented on
Oct 30, 2024 • 0 new comments -
DISABLED test_input_mutation2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135295 commented on
Oct 30, 2024 • 0 new comments -
ROCm loses some supported GPUs by requiring hipblaslt
#119081 commented on
Oct 30, 2024 • 0 new comments