Conversation

@dvruette (Contributor) commented on Aug 2, 2025

Collection of various improvements:

  • Attention logits in model output: the model output now exposes the attention logits, which is useful for experiment tracking and debugging.
  • Attention softcap: tanh-based logit soft-capping, as used by Gemma, is now also supported in vanilla attention (see the sketch after this list).
  • DP sharding of the batch: the batch is now correctly sharded along the DP axis (also sketched below). Depends on a breaking change in eformer ([fix] Apply with_sharding_constraint recursively to pytrees eformer#4).
  • Minor change to the check_interval dtype for memory tracking.
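
For context, here is a minimal, hypothetical sketch of tanh-based logit soft-capping in vanilla attention; the function name, `softcap` argument, and `return_logits` flag are illustrative and not this repository's actual API:

```python
# Hypothetical sketch of tanh soft-capping on attention logits (as popularized
# by Gemma). Names are illustrative, not the actual API of this repo.
import jax
import jax.numpy as jnp


def softcap_attention(q, k, v, softcap=50.0, return_logits=False):
    """Vanilla dot-product attention with optional logit soft-capping.

    q, k, v: [batch, heads, seq, head_dim]
    softcap: logits are smoothly squashed into (-softcap, softcap)
    return_logits: also return the (capped) logits for tracking/debugging
    """
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum("bhqd,bhkd->bhqk", q, k) * scale
    if softcap is not None:
        # Soft cap: a smooth, bounded alternative to hard clipping.
        logits = softcap * jnp.tanh(logits / softcap)
    probs = jax.nn.softmax(logits, axis=-1)
    out = jnp.einsum("bhqk,bhkd->bhqd", probs, v)
    return (out, logits) if return_logits else out


# Example usage on random inputs (self-attention with q = k = v).
q = jax.random.normal(jax.random.PRNGKey(0), (2, 4, 16, 32))
out, logits = softcap_attention(q, q, q, softcap=30.0, return_logits=True)
print(out.shape, logits.shape)  # (2, 4, 16, 32) (2, 4, 16, 16)
```

And a hedged sketch of what sharding the batch pytree along the DP axis can look like, assuming a mesh with a `dp` axis and relying on the recursive `with_sharding_constraint` behavior from the linked eformer change; the helper and axis names are illustrative:

```python
# Hypothetical sketch: constrain every leaf of a batch pytree to be sharded
# along the data-parallel ("dp") mesh axis. Mesh/axis names are illustrative.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.devices()), axis_names=("dp",))


@jax.jit
def shard_batch_along_dp(batch):
    # Apply the constraint recursively to every array in the pytree,
    # sharding the leading (batch) dimension across the dp axis.
    spec = NamedSharding(mesh, P("dp"))
    return jax.tree_util.tree_map(
        lambda x: jax.lax.with_sharding_constraint(x, spec), batch
    )


bs = 4 * jax.device_count()  # keep the batch divisible by the dp axis size
batch = {
    "input_ids": jnp.zeros((bs, 128), jnp.int32),
    "attention_mask": jnp.ones((bs, 128), jnp.int32),
}
batch = shard_batch_along_dp(batch)
```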
