- [Apple to Apple][INC Reference] Gaudi perf issue
- [Expert Consultation][No reference] Sage attention acc issue
  - https://arxiv.org/pdf/2505.11594 delta_s
- [Human Steer] task.md, cross-project workspace:
  - ar - vllm - omni
  - ct -> llmc -> vllm: quant primitive -> quant model -> inference model
  - setup driver (root) -> user_install_cmd.sh https://github.com/yiliu30/torch-xpu-setup
- Tools (Agent): web, VS Code, Copilot CLI, VS Code (Claude agent), ...
- Skills: how to create skills and examples
- Others
Boundary, Steer
mxfp4-decompress task file
Address: vllm-project/compressed-tensors#680
In addition to the unit test, we need an end-to-end verification on the llm-compressor side: run /home/yiliu7/workspace/llm-compressor/experimental/mxfp4/qwen3_mxfp4.py, then load the result with transformers and run a short generation example. The output must be reasonable, even though it is a small model.
Local Dev
compressed-tensors: /home/yiliu7/workspace/venvs/ct/bin/
vllm: /home/yiliu7/workspace/venvs/vllm/bin/vllm
Note:
compressed-tensors is a standalone repository that provides quantization primitives for llm-compressor.
Ref:
llm-compressor: /home/yiliu7/workspace/llm-compressor/
compressed-tensors: /home/yiliu7/workspace/compressed-tensors
vllm: /home/yiliu7/workspace/
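
The end-to-end check in the task file (load with transformers, run a short generation, confirm the output is reasonable) could be sketched roughly as follows. This is only an illustration, not the content of qwen3_mxfp4.py: `generate_sample` and `looks_reasonable` are hypothetical helper names, the saved-model directory is a placeholder that the notes do not specify, and the word-count and repetition thresholds are arbitrary assumptions standing in for a human judgment of "reasonable".

```python
from collections import Counter


def generate_sample(model_dir: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Load a saved checkpoint with transformers and greedily generate a short continuation.

    model_dir is a placeholder for wherever the quantized model was saved.
    """
    # Imported lazily so looks_reasonable() can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding keeps the check deterministic across runs.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def looks_reasonable(
    text: str,
    prompt: str = "",
    min_words: int = 3,
    max_repeat_ratio: float = 0.5,
) -> bool:
    """Hypothetical heuristic: reject empty, very short, or heavily repetitive output.

    The thresholds are assumptions; a small model can still pass this check
    and produce text that a human reader would reject.
    """
    # Judge only the newly generated part when the prompt is echoed back.
    continuation = text[len(prompt):] if prompt and text.startswith(prompt) else text
    words = continuation.split()
    if len(words) < min_words:
        return False
    # Broken dequantization often shows up as one token repeated over and over.
    most_common_count = Counter(words).most_common(1)[0][1]
    return most_common_count / len(words) <= max_repeat_ratio
```

A typical use would be `text = generate_sample(saved_dir, prompt)` followed by `looks_reasonable(text, prompt)`, with the generated text also printed so a person can read it; the heuristic only catches obvious failures such as empty or degenerate output.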