JDKWangGuan | 0d810cfb73 | Fix KeyError handling for non-existing key in state_dict.pop() (#898) | 5 months ago
Tri Dao | ef0ed10622 | Add window_size option to MHA and GPT | 10 months ago
Tri Dao | abbc131173 | [LayerNorm] Switch from CUDA to Triton implementation | 11 months ago
Tri Dao | 73df3be7d5 | Add test for BTLM init | 11 months ago
Tri Dao | 2e29dacf0c | Implement muParam | 11 months ago
Tri Dao | 2c7d7b7396 | Implement norm head for Baichuan2 | 1 year ago
Tri Dao | c3b2196652 | Add Alibi to MHA, test with Baichuan-13B | 1 year ago
Yuchao Dai | 187c2a0635 | Fix E1136 (#563) | 1 year ago
Tri Dao | d0032700d1 | Add tests for Pythia, GPT-JT, and RedPajama models | 1 year ago
Kevin Hu | 07005806ff | Add BigCode converters (#532) | 1 year ago
Tri Dao | 798858f9f1 | Fix test_baichuan | 1 year ago
Tri Dao | 7b33743a72 | [Gen] Add back num_last_tokens in gpt.py | 1 year ago
dan_the_3rd | 011ec323d6 | Support MQA + MP for decoding (#490) | 1 year ago
Tri Dao | f8aea6ead0 | [GPT] Generalize last_token_only arg to num_last_tokens | 1 year ago
Aman Gupta Karmani | e0b09891c6 | add llama support to GPTPreTrainedModel.from_pretrained (#479) | 1 year ago
Xuechen Li | 25d6b1dbcb | handle uneven heads across ranks when combining state_dicts; resolves #467 (#468) | 1 year ago
Xuechen Li | 7fcd3e6a04 | map custom model state_dict back to huggingface format (#465) | 1 year ago
Tri Dao | f1a73d0740 | Run isort and black on python files | 1 year ago
Xuechen Li | bb4cded17b | support when num_heads is not divisible by world_size; resolves #459 (#461) | 1 year ago
Tri Dao | 4b661a569d | [GPT] Run black on gpt.py | 1 year ago
Tri Dao | 184b992dcb | [GPT] Implement parallel LLaMa | 1 year ago
Haodong Lyu | 8ee62efca3 | Implement ParallelGatedMlp (#251) | 1 year ago
Tri Dao | d38357dd2f | [GPT] Implement Falcon | 1 year ago
Tri Dao | 425dbcb6c6 | [MHA] Implement MQA/GQA | 1 year ago
Tri Dao | ec9f74ab9a | [Rotary] Don't store inv_freq in state_dict | 1 year ago
Tri Dao | 75e334d407 | [MLP] Add ParallelMLP | 1 year ago
Tri Dao | 48bc6eacd6 | [Gen] Add rotary base as an argument to FT attention kernel | 1 year ago
Federico Berto | 69f5f7d0a2 | [BugFix] cannot unpack non-iterable NoneType object | 1 year ago
Tri Dao | a9a4b4e4f2 | [LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm | 1 year ago
Tri Dao | ba2fe7f378 | [Gen] Move allocate_inference_cache to within the model | 1 year ago