Tri Dao e45a46a5b7 [Rotary] Implement GPT-J style (interleaved) rotary 2 years ago
layers e45a46a5b7 [Rotary] Implement GPT-J style (interleaved) rotary 2 years ago
losses c6ecd40a59 Tweak CrossEntropyLoss to take process_group in init 2 years ago
models 78b7a1dc18 [OPT] Load fp16 weights on CPU before moving to GPU 2 years ago
modules 88173a1aaf [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP 2 years ago
ops eb33e587e9 [LayerNorm] Rename x1 -> residual 2 years ago
utils 78b7a1dc18 [OPT] Load fp16 weights on CPU before moving to GPU 2 years ago
__init__.py af4a9ce024 Add missing __init__.py 2 years ago
bert_padding.py 4e38df059e remove numpy dependency 2 years ago
flash_attention.py 41cb909741 Change default dropout value in documentation 2 years ago
flash_attn_interface.py 88c4e5dbf6 Fix the case when dout is not contiguous 2 years ago
flash_attn_triton.py 6b5f271c6d [Triton] Avoid einops repeat by using Tensor.expand 2 years ago
flash_attn_triton_og.py b0c0db81f6 Implement FlashAttention in Triton 2 years ago
flash_blocksparse_attention.py 5a61cb7729 Rename src -> flash_attn 2 years ago
flash_blocksparse_attn_interface.py 5a61cb7729 Rename src -> flash_attn 2 years ago
fused_softmax.py ed553e9238 Add Megatron attention implementation for benchmarking 2 years ago
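The listing above is the top level of the flash_attn Python package. As a rough orientation, the sketch below shows one plausible way two of the listed files fit together for variable-length batches: the padding helpers in bert_padding.py strip pad tokens before calling the fused kernel exposed by flash_attn_interface.py, then scatter the output back. This is a minimal sketch, not the library's documented usage; the names unpad_input, pad_input, and flash_attn_unpadded_qkvpacked_func and their argument order are assumptions based on the v1-era API and may differ between releases.

    # Hedged sketch: padding-aware FlashAttention call (assumed v1-era names/signatures).
    import torch
    from flash_attn.bert_padding import unpad_input, pad_input
    from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func

    batch, seqlen, nheads, headdim = 2, 128, 8, 64
    device, dtype = "cuda", torch.float16

    # Packed QKV in padded layout plus a boolean mask marking real (non-pad) tokens.
    qkv = torch.randn(batch, seqlen, 3, nheads, headdim, device=device, dtype=dtype)
    key_padding_mask = torch.ones(batch, seqlen, dtype=torch.bool, device=device)
    key_padding_mask[1, 96:] = False  # pretend the second sequence is shorter

    # Drop padding: (batch, seqlen, ...) -> (total_tokens, ...), and get the
    # cumulative sequence lengths the fused kernel consumes.
    qkv_unpad, indices, cu_seqlens, max_seqlen = unpad_input(
        qkv.reshape(batch, seqlen, -1), key_padding_mask
    )
    qkv_unpad = qkv_unpad.reshape(-1, 3, nheads, headdim)

    # Fused attention over the unpadded token stream.
    out_unpad = flash_attn_unpadded_qkvpacked_func(
        qkv_unpad, cu_seqlens, max_seqlen, dropout_p=0.0, causal=True
    )

    # Scatter results back to the padded (batch, seqlen, ...) layout.
    out = pad_input(out_unpad.reshape(-1, nheads * headdim), indices, batch, seqlen)
    out = out.reshape(batch, seqlen, nheads, headdim)

flash_attention.py wraps roughly this flow in an nn.Module, and the Triton and blocksparse files provide alternative kernel backends with their own interfaces.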