david/flash-attention

Author	SHA1 Message	Date
JDKWangGuan	0d810cfb73 Fix KeyError handling for non-existing key in state_dict.pop() (#898)	5 months ago
Tri Dao	ef0ed10622 Add window_size option to MHA and GPT	10 months ago
Tri Dao	abbc131173 [LayerNorm] Switch from CUDA to Triton implementation	11 months ago
Tri Dao	73df3be7d5 Add test for BTLM init	11 months ago
Tri Dao	2e29dacf0c Implement muParam	11 months ago
Tri Dao	2c7d7b7396 Implement norm head for Baichuan2	1 year ago
Tri Dao	c3b2196652 Add Alibi to MHA, test with Baichuan-13B	1 year ago
Yuchao Dai	187c2a0635 Fix E1136 (#563)	1 year ago
Tri Dao	d0032700d1 Add tests for Pythia, GPT-JT, and RedPajama models	1 year ago
Kevin Hu	07005806ff Add BigCode converters (#532)	1 year ago
Tri Dao	798858f9f1 Fix test_baichuan	1 year ago
Tri Dao	7b33743a72 [Gen] Add back num_last_tokens in gpt.py	1 year ago
dan_the_3rd	011ec323d6 Support MQA + MP for decoding (#490)	1 year ago
Tri Dao	f8aea6ead0 [GPT] Generalize last_token_only arg to num_last_tokens	1 year ago
Aman Gupta Karmani	e0b09891c6 add llama support to GPTPreTrainedModel.from_pretrained (#479)	1 year ago
Xuechen Li	25d6b1dbcb handle uneven heads across ranks when combining state_dicts; resolves #467 (#468)	1 year ago
Xuechen Li	7fcd3e6a04 map custom model state_dict back to huggingface format (#465)	1 year ago
Tri Dao	f1a73d0740 Run isort and black on python files	1 year ago
Xuechen Li	bb4cded17b support when num_heads is not divisible by world_size; resolves #459 (#461)	1 year ago
Tri Dao	4b661a569d [GPT] Run black on gpt.py	1 year ago
Tri Dao	184b992dcb [GPT] Implement parallel LLaMa	1 year ago
Haodong Lyu	8ee62efca3 Implement ParallelGatedMlp (#251)	1 year ago
Tri Dao	d38357dd2f [GPT] Implement Falcon	1 year ago
Tri Dao	425dbcb6c6 [MHA] Implement MQA/GQA	1 year ago
Tri Dao	ec9f74ab9a [Rotary] Don't store inv_freq in state_dict	1 year ago
Tri Dao	75e334d407 [MLP] Add ParallelMLP	1 year ago
Tri Dao	48bc6eacd6 [Gen] Add rotary base as an argument to FT attention kernel	1 year ago
Federico Berto	69f5f7d0a2 [BugFix] cannot unpack non-iterable NoneType object	1 year ago
Tri Dao	a9a4b4e4f2 [LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm	1 year ago
Tri Dao	ba2fe7f378 [Gen] Move allocate_inference_cache to within the model	1 year ago

Commit History