vllm.model_executor.layers.quantization.utils.nvfp4_emulation_utils
dequantize_to_dtype
dequantize_to_dtype(
tensor_fp4: Tensor,
tensor_sf: Tensor,
global_scale: Tensor,
dtype: dtype,
block_size: int = 16,
swizzle: bool | None = True,
)
Dequantize the packed FP4 tensor back to high precision (the requested dtype).
Supports both 2D and 3D inputs, where k = 2 * packed_k because each byte holds two FP4 values:
- 2D: [m, packed_k] -> [m, k]
- 3D: [dim0, m, packed_k] -> [dim0, m, k]
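The NumPy sketch below illustrates what this dequantization involves. It rests on several assumptions rather than on vLLM's code. Two E2M1 (FP4) values are packed per uint8 with the low nibble first. There is one scale per block_size elements along k, and each output value is fp4_value * block_scale / global_scale, following the usual NVFP4 two-level scaling. The block scales are taken as already decoded from FP8 E4M3 to float and stored in a linear layout, which is the swizzle=False case. unpack_fp4 and dequantize_sketch are hypothetical helpers, not part of vLLM's API.

```python
import numpy as np

# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)


def unpack_fp4(packed: np.ndarray) -> np.ndarray:
    """Unpack uint8 bytes [..., packed_k] into FP4 values [..., 2 * packed_k].

    Assumption: the low nibble holds the first element of each pair.
    """
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    # Interleave (low, high) so element order along k is preserved.
    codes = np.stack((low, high), axis=-1).reshape(*packed.shape[:-1], -1)
    magnitude = E2M1_VALUES[codes & 0x07]
    sign = np.where(codes & 0x08, -1.0, 1.0).astype(np.float32)
    return magnitude * sign


def dequantize_sketch(
    packed: np.ndarray,
    block_scales: np.ndarray,
    global_scale: float,
    block_size: int = 16,
) -> np.ndarray:
    """Hypothetical sketch of dequantize_to_dtype with a linear (unswizzled) scale layout.

    packed:       uint8 array [m, packed_k] or [dim0, m, packed_k]
    block_scales: float array [..., m, k // block_size], already decoded from FP8
    global_scale: scalar used to undo the global scaling of the block scales
    """
    values = unpack_fp4(packed)  # [..., m, k]
    k = values.shape[-1]
    blocks = values.reshape(*values.shape[:-1], k // block_size, block_size)
    scales = block_scales.astype(np.float32) / np.float32(global_scale)
    # Broadcast one scale over every element of its block, then restore [..., m, k].
    return (blocks * scales[..., None]).reshape(values.shape)
```

The sketch always returns float32. Applying .astype(...) to the result plays the role of the dtype argument. The swizzled scale layout that the real function handles by default (swizzle=True) is not modeled here.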
ref_nvfp4_quant_dequant
NVFP4 quantize-dequantize operation.
global_scale is expected to have a single element.
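A quantize-dequantize round trip ("fake quantization") can be sketched the same way. This is a hypothetical NumPy illustration, not vLLM's implementation. Each block of block_size values is scaled so that its largest magnitude maps to 6.0, the E2M1 maximum. The scaled values are then rounded to the nearest E2M1 level, with ties going to the even code, and scaled back. Following the docstring, global_scale is read as a single element. For brevity, the per-block scale stays in float32 and is only clamped to the FP8 E4M3 range, not rounded to FP8 as real NVFP4 does. As a result, its error can be somewhat lower than true NVFP4 emulation. nvfp4_quant_dequant_sketch, round_to_e2m1 and safe_reciprocal are illustrative names.

```python
import numpy as np

FP4_MAX = 6.0         # largest E2M1 magnitude
FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)


def safe_reciprocal(s: np.ndarray) -> np.ndarray:
    """Return 1 / s, with 0 where s == 0 (e.g. all-zero blocks)."""
    return np.divide(1.0, s, out=np.zeros_like(s), where=s != 0)


def round_to_e2m1(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest signed E2M1 level, ties to the even code."""
    dist = np.abs(np.abs(x)[..., None] - E2M1_VALUES)
    is_best = dist == dist.min(axis=-1, keepdims=True)
    # A tie is always between two adjacent codes, exactly one of which is even.
    even_best = is_best & (np.arange(8) % 2 == 0)
    idx = np.where(even_best.any(axis=-1), even_best.argmax(axis=-1), is_best.argmax(axis=-1))
    return np.sign(x) * E2M1_VALUES[idx]


def nvfp4_quant_dequant_sketch(x: np.ndarray, global_scale, block_size: int = 16) -> np.ndarray:
    """Hypothetical fake-quant sketch; x is [m, k] with k divisible by block_size."""
    gs = np.float32(np.asarray(global_scale).reshape(-1)[0])  # single element
    m, k = x.shape
    blocks = x.astype(np.float32).reshape(m, k // block_size, block_size)
    block_max = np.abs(blocks).max(axis=-1, keepdims=True)
    # Per-block scale in the globally scaled domain; FP8 rounding is skipped here.
    scale = np.clip(gs * block_max / FP4_MAX, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    dequant_scale = scale / gs  # maps E2M1 levels back to the range of x
    scaled = np.clip(blocks * safe_reciprocal(dequant_scale), -FP4_MAX, FP4_MAX)
    return (round_to_e2m1(scaled) * dequant_scale).reshape(m, k)
```

Without the FP8 rounding step, global_scale only affects the result when the clamp to 448 kicks in. In real NVFP4, global_scale is chosen so that the block scales make good use of the FP8 E4M3 range before they are rounded.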