AlpinDale fca911ee0a vLLM Upstream Sync (#526) 6 months ago
..
E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=14336,device_name=AMD_Instinct_MI300X.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=1792,device_name=AMD_Instinct_MI300X.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=3584,device_name=AMD_Instinct_MI300X.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=7168,device_name=AMD_Instinct_MI300X.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json fca911ee0a vLLM Upstream Sync (#526) 6 months ago
README f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago

README

This directory contains tuned configurations for the fused_moe kernel.
Each JSON file is named after the settings it was tuned for:
- E (number of experts)
- N (intermediate size)
- device_name (as returned by torch.cuda.get_device_name())
Some file names also carry a dtype field (for example dtype=float8).
Each file contains a mapping from M (batch size) to the configuration chosen for that batch size.
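
The naming scheme and the M-to-configuration mapping can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names config_file_name and pick_config are hypothetical, the replacement of spaces with underscores is inferred from the file names in this directory, the "closest M" lookup is one plausible selection rule rather than a confirmed description of the kernel's behaviour, and the BLOCK_SIZE_M values are made up for the example.

```python
import json


def config_file_name(E, N, device_name, dtype=None):
    """Build a file name following the pattern seen in this directory.

    Hypothetical helper. device_name is expected to be the string returned
    by torch.cuda.get_device_name(); spaces are replaced with underscores,
    which is an assumption inferred from the existing file names.
    """
    device = device_name.replace(" ", "_")
    dtype_part = f",dtype={dtype}" if dtype else ""
    return f"E={E},N={N},device_name={device}{dtype_part}.json"


def pick_config(configs, M):
    """Return the entry whose batch-size key is closest to M.

    Assumption: JSON keys are batch sizes stored as strings, and the
    nearest key is an acceptable choice when M has no exact entry.
    """
    by_m = {int(k): v for k, v in configs.items()}
    closest = min(by_m, key=lambda m: abs(m - M))
    return by_m[closest]


# Illustrative mapping only; real files hold tuned kernel parameters.
example = json.loads('{"1": {"BLOCK_SIZE_M": 16}, "64": {"BLOCK_SIZE_M": 64}}')

name = config_file_name(8, 7168, "NVIDIA H100 80GB HBM3", dtype="float8")
chosen = pick_config(example, 48)
```

Keeping the file-name construction separate from the lookup makes it easy to check whether a tuned file exists for a given setup before falling back to a default configuration.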

Mixtral has intermediate size N = 14336. With tensor parallelism (TP) the
intermediate dimension is split evenly across the ranks, so for TP2 we have
N = 7168 and for TP4 we have N = 3584.
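
The arithmetic behind these values is a plain division of the intermediate size by the tensor-parallel degree. A minimal sketch, assuming an even split (the function name sharded_intermediate_size is hypothetical):

```python
def sharded_intermediate_size(full_size, tp_size):
    """Per-rank intermediate size N when full_size is split over tp_size ranks.

    Assumes the split is even; raises if full_size is not divisible.
    """
    if tp_size <= 0 or full_size % tp_size != 0:
        raise ValueError("full_size must be divisible by a positive tp_size")
    return full_size // tp_size


MIXTRAL_INTERMEDIATE_SIZE = 14336  # value stated in this README

tp2 = sharded_intermediate_size(MIXTRAL_INTERMEDIATE_SIZE, 2)  # 7168
tp4 = sharded_intermediate_size(MIXTRAL_INTERMEDIATE_SIZE, 4)  # 3584
```

The resulting value is the N to look for in the file names of this directory.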