| Name | Last commit | Commit message | Age |
|------|-------------|-----------------|-----|
| layers | e45a46a5b7 | [Rotary] Implement GPT-J style (interleaved) rotary | 2 years ago |
| losses | c6ecd40a59 | Tweak CrossEntropyLoss to take process_group in init | 2 years ago |
| models | 78b7a1dc18 | [OPT] Load fp16 weights on CPU before moving to GPU | 2 years ago |
| modules | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 2 years ago |
| ops | eb33e587e9 | [LayerNorm] Rename x1 -> residual | 2 years ago |
| utils | 78b7a1dc18 | [OPT] Load fp16 weights on CPU before moving to GPU | 2 years ago |
| __init__.py | af4a9ce024 | Add missing __init__.py | 2 years ago |
| bert_padding.py | 4e38df059e | remove numpy dependency | 2 years ago |
| flash_attention.py | 41cb909741 | Change default dropout value in documentation | 2 years ago |
| flash_attn_interface.py | 88c4e5dbf6 | Fix the case when dout is not contiguous | 2 years ago |
| flash_attn_triton.py | 6b5f271c6d | [Triton] Avoid einops repeat by using Tensor.expand | 2 years ago |
| flash_attn_triton_og.py | b0c0db81f6 | Implement FlashAttention in Triton | 2 years ago |
| flash_blocksparse_attention.py | 5a61cb7729 | Rename src -> flash_attn | 2 years ago |
| flash_blocksparse_attn_interface.py | 5a61cb7729 | Rename src -> flash_attn | 2 years ago |
| fused_softmax.py | ed553e9238 | Add Megatron attention implementation for benchmarking | 2 years ago |