Tags: anfedotoff/pytorch

ciflow/all/76481

Update on "Remove pow and float_power TestGradient Skips"

[ghstack-poisoned]

ciflow/all/75663

fixed kwargs update

ciflow/all/72710

modify test_basic

ciflow/all/67833

Update on "Add linalg.lu"


This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fusing the subtraction by `-1` with `unpack_pivots_stub`
- Defining tensors of the correct types to avoid copies
- Porting `lu_unpack` to a structured kernel so that its `_out` version does not incur extra copies

Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the derivatives
of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.

This new function and `lu_unpack` come with everything they can:
forward and backward AD, decent docs, correctness tests, OpInfos, complex support,
support for meta tensors, and support for vmap and vmap over the gradients.

I really hope we don't continue adding more features.
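
For reference, a minimal usage sketch (not part of the PR itself; it assumes the documented signatures of `torch.linalg.lu`, `torch.linalg.lu_factor`, and `torch.lu_unpack`):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64, requires_grad=True)

# Direct factorization: A == P @ L @ U
P, L, U = torch.linalg.lu(A)
assert torch.allclose(P @ L @ U, A)

# Equivalent two-step path: factor into packed form, then unpack
LU, pivots = torch.linalg.lu_factor(A)
P2, L2, U2 = torch.lu_unpack(LU, pivots)

# Backward AD is supported, as listed above
(L.sum() + U.sum()).backward()
```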

This PR also avoids saving some tensors that were previously saved
unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward`, and makes some other general improvements here and there
to the forward and backward AD formulae of other related functions.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

[ghstack-poisoned]

ciflow/trunk/76556

[FSDP] Relax exec order validation to forward pass only

[ghstack-poisoned]

ciflow/trunk/76499

[ROCm] default tests use 1 GPU, distributed tests use 2 GPUs

ciflow/trunk/76480

update

ciflow/trunk/76194

Add optional timeout argument for RpcAgent join() (pytorch#76194)

Summary:
This PR was created to resolve the issue brought up in https://fb.workplace.com/groups/319878845696681/permalink/741428653541696/

Changes:
- Add a timeout argument to RpcAgent.join()
- Add an optional timeout argument to ThriftRpcAgent barrier()
- During shutdown, ThriftRpcAgent join() calls the barrier; the agent uses the timeout passed to shutdown and forwards it to join().
- Update API.py to also fix a bug (missing timeout for the signal)
- Change the default shutdown timeout to 0 (no timeout). The existing behavior of _all_gather is unchanged: it waits indefinitely for the signal if no timeout is set. With the new functionality, the user specifies a timeout for both the signal and the RPC calls (a usage sketch follows below).
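
A minimal sketch of how the new timeout might be used from the public Python API (hypothetical values; assumes `rpc.shutdown` forwards its timeout to the agent's join/barrier as described above):

```python
import torch.distributed.rpc as rpc

# Hypothetical two-worker setup; rank and world_size would come from the launcher.
rpc.init_rpc("worker0", rank=0, world_size=2)

# ... issue rpc.rpc_sync / rpc.rpc_async calls here ...

# Graceful shutdown, waiting at most 30 seconds for the shutdown signal and
# the underlying RPC calls; timeout=0 (the new default) waits indefinitely.
rpc.shutdown(graceful=True, timeout=30)
```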

Pull Request resolved: pytorch#76194

Test Plan:
Modified barrier test

buck test torch/fb/distributed/thriftRpcBackend/test:ThriftRpcAgentTest -- BarrierTest

Differential Revision: D35825382

fbshipit-source-id: 2195a4350c07accceec52905ef1f7990534d0ec8

ciflow/trunk/75917

Update on "[NVFuser] OpInfos for extremal values"

Added slow tests comparing eager and fused outputs on the given extremal inputs.
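
An illustrative sketch of the test pattern (hypothetical, not the PR's code; assumes fusion is driven through `torch.jit.script` with the NVFuser executor enabled on CUDA):

```python
import torch

def f(x):
    return torch.sigmoid(x) * x

# Extremal values that tend to expose divergence between eager and fused kernels
extremal = torch.tensor([float("inf"), float("-inf"), float("nan"), 0.0])

scripted = torch.jit.script(f)  # may be fused on CUDA with NVFuser enabled
torch.testing.assert_close(scripted(extremal), f(extremal), equal_nan=True)
```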

[ghstack-poisoned]

ciflow/cuda/71216

Rebase and fix merge conflicts on "cuDNN/miopen: Use per-operator headers"

Differential Revision: [D33949898](https://our.internmc.facebook.com/intern/diff/D33949898)

[ghstack-poisoned]