- [Apple to Apple][INC Reference] Gaudi perf issue
- [Expert Consultation][No reference] SageAttention accuracy issue
- https://arxiv.org/pdf/2505.11594 delta_s
- [Human Steer] task.md cross-project workspace:
  - ar - vllm - omni
  - ct -> llmc -> vllm: quant primitive -> quant model -> inference model
- setup driver (as root) -> user_install_cmd.sh https://github.com/yiliu30/torch-xpu-setup
```shell
model_path="/dataset/auto-round/qwen_moe/"
taskname=gsm8k
taskname=longbench_hotpotqa
timestamp=$(date +%Y%m%d_%H%M%S)
model_path="/storage/yiliu7/meta-llama/Llama-3.1-8B-Instruct"
output_log_file_name="${taskname}_${timestamp}"
MAX_MODEL_LEN=40960
max_length=${MAX_MODEL_LEN}
taskname=gsm8k
```
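The variables above compose a timestamped log-file name from the task name; a minimal sketch of that pattern (the task name is taken from the snippet, the echo is illustrative):

```shell
# Compose a timestamped log-file name: <taskname>_<YYYYmmdd_HHMMSS>.
taskname=gsm8k
timestamp=$(date +%Y%m%d_%H%M%S)
output_log_file_name="${taskname}_${timestamp}"
echo "${output_log_file_name}"
```

Reassigning `taskname` before `output_log_file_name` is computed (as the snippet does) means only the last assignment takes effect.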
```python
import os
from functools import wraps

# from vllm import envs
from loguru import logger


def with_thread_limits():
    """
    Decorator to temporarily set OMP_NUM_THREADS and PyTorch threads,
    and restore them after the function call.
    """
```
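The snippet above is cut off after the docstring. A self-contained sketch of how such a save/set/restore decorator can be completed is below; it handles only `OMP_NUM_THREADS` to stay dependency-free (the original presumably also calls PyTorch's `torch.get_num_threads` / `torch.set_num_threads`), and the `num_threads` parameter is an assumption:

```python
import os
from functools import wraps


def with_thread_limits(num_threads=1):
    """Temporarily set OMP_NUM_THREADS around the wrapped call, then restore it.

    num_threads is an illustrative parameter; the torch thread calls from the
    original snippet are omitted to keep this sketch stdlib-only.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            prev = os.environ.get("OMP_NUM_THREADS")  # remember old value
            os.environ["OMP_NUM_THREADS"] = str(num_threads)
            try:
                return func(*args, **kwargs)
            finally:
                # Restore (or unset) the variable even if func raises.
                if prev is None:
                    os.environ.pop("OMP_NUM_THREADS", None)
                else:
                    os.environ["OMP_NUM_THREADS"] = prev
        return wrapper
    return decorator


@with_thread_limits(num_threads=2)
def limited():
    return os.environ["OMP_NUM_THREADS"]
```

The `try`/`finally` is the important part: the previous thread setting is restored even when the wrapped function raises.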
```shell
## install uv on OS
curl -LsSf https://astral.sh/uv/install.sh | sh

## create new project
uv init myproj

## install packages
uv add django requests "pandas>=2.3"

## remove package
```
```diff
diff --git a/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py b/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py
index 69c03d8efb8..f3668018c43 100755
--- a/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py
+++ b/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py
@@ -930,6 +930,22 @@ class PatchedVllmMixtureOfExpertsOp(PatchedModuleBase):
                  router_weights,
                  permuted_weights=True,
                  activation="silu"):
+        enable_moe_chunk = hasattr(self.orig_mod, "enable_moe_chunk") and self.orig_mod.enable_moe_chunk
+        if not enable_moe_chunk:
```
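The patch above gates a chunked execution path for the MoE op. The underlying idea, processing tokens in fixed-size chunks to bound peak activation memory instead of dispatching them to the experts all at once, can be sketched in plain Python (`chunk_apply`, `fn`, and `chunk_size` are illustrative names, not INC APIs):

```python
def chunk_apply(fn, tokens, chunk_size):
    """Apply fn to fixed-size slices of tokens and concatenate the results.

    Mirrors the memory-saving idea behind enable_moe_chunk: rather than
    running the expert computation over every token at once, process
    chunk_size tokens at a time so peak memory stays bounded.
    """
    out = []
    for start in range(0, len(tokens), chunk_size):
        out.extend(fn(tokens[start:start + chunk_size]))
    return out
```

The result is identical to applying `fn` to the whole batch whenever `fn` is elementwise over tokens, which is what makes the optimization safe to gate behind a flag.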
```shell
model_path=inc-res/quantized_model_ds_mxfp8/
# VLLM_ENABLE_AR_EXT=1 \
# VLLM_AR_MXFP4_MODULAR_MOE=1 \
# VLLM_ENABLE_AR_EXT=1 \
# VLLM_MXFP4_PRE_UNPACK_TO_FP8=0 \
# VLLM_ENABLE_STATIC_MOE=0 \
# VLLM_MXFP4_PRE_UNPACK_WEIGHTS=0 \
# VLLM_USE_DEEP_GEMM=0 \
# VLLM_ENABLE_V1_MULTIPROCESSING=1 \
```
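Toggles like these are meant to be uncommented and prepended to the launch command. A `VAR=value` prefix exports the variable only for that one command, which is what makes per-run experiments cheap; a minimal demo of the mechanism (the flag name comes from the list above, the inner command is illustrative):

```shell
# An assignment prefix scopes the variable to the single command it precedes.
out=$(VLLM_USE_DEEP_GEMM=0 bash -c 'echo "${VLLM_USE_DEEP_GEMM}"')
echo "inside=${out}"
echo "after=${VLLM_USE_DEEP_GEMM:-unset}"
```

The variable is visible inside the child command but is not set in the calling shell afterwards.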
```shell
#!/bin/bash
#
# https://docs.docker.com/build/buildkit/
# https://github.com/docker/buildx/releases/
# https://github.com/docker/buildx
## docker builder prune --all
## docker buildx du --verbose
## For Ubuntu 24.04 try: sudo apt install docker-buildx
```
```diff
diff --git a/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py b/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py
index 69c03d8efb8..f3668018c43 100755
--- a/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py
+++ b/neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py
@@ -930,6 +930,22 @@ class PatchedVllmMixtureOfExpertsOp(PatchedModuleBase):
                  router_weights,
                  permuted_weights=True,
                  activation="silu"):
+        enable_moe_chunk = hasattr(self.orig_mod, "enable_moe_chunk") and self.orig_mod.enable_moe_chunk
+        if enable_moe_chunk:
```
```shell
#!/bin/bash
# Check if a model name is passed as an argument, otherwise use the default model path
if [ -z "$1" ]; then
    model_path="Meta-Llama-3-8B-Instruct-W4A16-G128-AutoRound"
else
    model_path="$1"
fi
tp_size=1
model_name=$(basename "${model_path}")
```
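The if/else default above can also be written in one line with Bash parameter expansion; a sketch using the same default value as the script:

```shell
# ${1:-default} substitutes the default when $1 is unset or empty,
# replacing the four-line if/else block.
model_path="${1:-Meta-Llama-3-8B-Instruct-W4A16-G128-AutoRound}"
model_name=$(basename "${model_path}")
echo "${model_name}"
```

`basename` strips any leading directory components, so passing a full path like `/storage/.../Llama-3.1-8B-Instruct` still yields a clean model name.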