david/flash-attention

Author	SHA1 Message	Date
JDKWangGuan	0d810cfb73 Fix KeyError handling for non-existing key in state_dict.pop() (#898)	6 months ago
Tri Dao	ef0ed10622 Add window_size option to MHA and GPT	11 months ago
Tri Dao	abbc131173 [LayerNorm] Switch from CUDA to Triton implementation	1 year ago
Tri Dao	73df3be7d5 Add test for BTLM init	1 year ago
Tri Dao	2e29dacf0c Implement muParam	1 year ago
Tri Dao	2c7d7b7396 Implement norm head for Baichuan2	1 year ago
Tri Dao	c3b2196652 Add Alibi to MHA, test with Baichuan-13B	1 year ago
Yuchao Dai	187c2a0635 Fix E1136 (#563)	1 year ago
Tri Dao	d0032700d1 Add tests for Pythia, GPT-JT, and RedPajama models	1 year ago
Kevin Hu	07005806ff Add BigCode converters (#532)	1 year ago
Tri Dao	798858f9f1 Fix test_baichuan	1 year ago
Tri Dao	7b33743a72 [Gen] Add back num_last_tokens in gpt.py	1 year ago
dan_the_3rd	011ec323d6 Support MQA + MP for decoding (#490)	1 year ago
Tri Dao	f8aea6ead0 [GPT] Generalize last_token_only arg to num_last_tokens	1 year ago
Aman Gupta Karmani	e0b09891c6 add llama support to GPTPreTrainedModel.from_pretrained (#479)	1 year ago
Xuechen Li	25d6b1dbcb handle uneven heads across ranks when combining state_dicts; resolves #467 (#468)	1 year ago
Xuechen Li	7fcd3e6a04 map custom model state_dict back to huggingface format (#465)	1 year ago
Tri Dao	f1a73d0740 Run isort and black on python files	1 year ago
Xuechen Li	bb4cded17b support when num_heads is not divisible by world_size; resolves #459 (#461)	1 year ago
Tri Dao	4b661a569d [GPT] Run black on gpt.py	1 year ago
Tri Dao	184b992dcb [GPT] Implement parallel LLaMa	1 year ago
Haodong Lyu	8ee62efca3 Implement ParallelGatedMlp (#251)	1 year ago
Tri Dao	d38357dd2f [GPT] Implement Falcon	1 year ago
Tri Dao	425dbcb6c6 [MHA] Implement MQA/GQA	1 year ago
Tri Dao	ec9f74ab9a [Rotary] Don't store inv_freq in state_dict	1 year ago
Tri Dao	75e334d407 [MLP] Add ParallelMLP	1 year ago
Tri Dao	48bc6eacd6 [Gen] Add rotary base as an argument to FT attention kernel	1 year ago
Federico Berto	69f5f7d0a2 [BugFix] cannot unpack non-iterable NoneType object	1 year ago
Tri Dao	a9a4b4e4f2 [LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm	1 year ago
Tri Dao	ba2fe7f378 [Gen] Move allocate_inference_cache to within the model	1 year ago

