david/flash-attention

Author	SHA1 Message	Date
Tri Dao	e0fbaa7016 [Gen] Simplify decode_speculative	1 year ago
Tri Dao	e6a8026489 [Gen] Rename max_sequence_len->max_seqlen, sequence_len_offset->seqlen_offset	1 year ago
Tri Dao	dfe29f5e2b [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead	1 year ago
Tri Dao	ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache	1 year ago
Tri Dao	a86442f0f3 [Gen] Use flash_attn_with_kvcache in generation	1 year ago
Tri Dao	fd20f16a4e Support cache_seqlens being integer	1 year ago
Tri Dao	913922cac5 [Gen] Refactor decoding function	1 year ago
dan_the_3rd	011ec323d6 Support MQA + MP for decoding (#490)	1 year ago
Tri Dao	9f42cb6e7a [Gen] Clone logits before returning when cg=True	1 year ago
Tri Dao	f8aea6ead0 [GPT] Generalize last_token_only arg to num_last_tokens	1 year ago
Tri Dao	371e20658c [GPT] Test generation when passing in multiple tokens	1 year ago
Tri Dao	c000c3a2c0 [GPT] Move more tests to test_gpt.py	1 year ago
Tri Dao	9b713872ea [GPT] Move GPT and OPT generation tests to test_{gpt,opt}.py	1 year ago
Tri Dao	0e8c46ae08 Run isort and black on test files	1 year ago
Tri Dao	4d87e4d875 Implement GPT-J	1 year ago
Tri Dao	88173a1aaf [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP	2 years ago
Tri Dao	ff34123bd4 Reorder LN in Block, support OPT	2 years ago
Tri Dao	63670fd84a Implement generation for GPT	2 years ago
Tri Dao	9d797d8848 Support loading GPT2 weights from Huggingface	2 years ago