Commit history

Author        SHA1        Message                                                              Date
Tri Dao       0705d2718d  [Llama] Fix some tests, add tests for Llama 2 and CodeLlama          1 year ago
Kevin Hu      42832575d4  Fix Llama GQA/MQA (#546)                                             1 year ago
Tri Dao       dfe29f5e2b  [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead    1 year ago
Tri Dao       8a733cbd53  [Gen] Fix calling update_graph_cache in tests                        1 year ago
Tri Dao       913922cac5  [Gen] Refactor decoding function                                     1 year ago
Tri Dao       0e8c46ae08  Run isort and black on test files                                    1 year ago
Xuechen Li    7fcd3e6a04  map custom model state_dict back to huggingface format (#465)        1 year ago
Xuechen Li    bb4cded17b  support when num_heads is not divisible by world_size; resolves #459 (#461)  1 year ago
Xuechen Li    0f7853c6a1  enable loading hf llama checkpoints for training (#446)              1 year ago
Tri Dao       184b992dcb  [GPT] Implement parallel LLaMa                                       1 year ago
Tri Dao       56ccaff126  [GPT] Add LLaMa-13B to test                                          1 year ago
Tri Dao       8e9820a55b  [Rotary] Fix tests when loading state dict with rotary inv_freqs     1 year ago
Tri Dao       62e9814466  [Rotary] Make sure frequency calculation is in fp32                  1 year ago
Tri Dao       a9a4b4e4f2  [LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm      1 year ago
Tri Dao       96d10f6545  Implement LLaMa                                                      1 year ago