Qwix: A JAX-native Quantization Library
Qwix is a JAX quantization library for both research and production.
Why Qwix
Qwix is the only JAX quantization solution that supports QT, PTQ on TPU/GPU, and ODML quantization in one library.
Qwix has well-tested static-range quantization (SRQ) support.
In ODML quantization mode, Qwix can quantize every op in the model and export a full-integer network.
Qwix integrates with models without requiring changes to the model code.
Qwix is open and extensible. Building new algorithms on Qwix is easy.
Features
Supported schemas:
Weight-only quantization.
Dynamic-range quantization.
Static-range quantization.
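The three schemas differ mainly in what gets quantized and when the activation scale is chosen. The sketch below illustrates the usual meaning of these terms on a single dot product, in plain Python rather than JAX so the arithmetic stays visible. It is not Qwix's API: the helper names, the absmax scaling, and the calibrated range `CALIBRATED_ACT_ABSMAX` are all assumptions made for illustration, and Qwix's own behavior may differ in detail.

```python
def absmax_quantize(values, bits=8):
    """Symmetric quantization: returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


weights = [0.5, -1.0, 0.25, 0.75]
activations = [2.0, 1.0, -4.0, 0.5]
reference = dot(weights, activations)  # full-precision result

w_q, w_scale = absmax_quantize(weights)

# Weight-only: weights are stored as integers and dequantized,
# while the activations and the arithmetic stay in floating point.
weight_only = dot([q * w_scale for q in w_q], activations)

# Dynamic-range: the activation scale is computed from the live input,
# so the dot product itself runs on integers.
a_q, a_scale = absmax_quantize(activations)
dynamic_range = dot(w_q, a_q) * w_scale * a_scale

# Static-range: the activation scale is fixed ahead of time.
# CALIBRATED_ACT_ABSMAX is a hypothetical value from a calibration run.
CALIBRATED_ACT_ABSMAX = 6.0
static_scale = CALIBRATED_ACT_ABSMAX / 127
a_q_static = [max(-127, min(127, round(a / static_scale))) for a in activations]
static_range = dot(w_q, a_q_static) * w_scale * static_scale

print(reference, weight_only, dynamic_range, static_range)
```

All three results stay close to the full-precision value. The static-range variant avoids computing a scale at run time, at the cost of depending on how well the calibrated range matches real inputs.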
Supported modes:
QT: this mode allows for training with quantization, with support for both fake quantization (QAT) and true quantized training with quantized backpropagation (QT).
PTQ: this mode achieves the best serving performance on XLA devices such as TPU and GPU.
ODML: this mode adds the annotations the LiteRT converter needs to produce full-integer models.
LoRA/QLoRA: this mode enables LoRA and QLoRA on a model.
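As a rough illustration of the fake quantization mentioned under QT, the sketch below rounds values onto an integer grid and immediately maps them back to floats, so the rest of the computation still runs in floating point but sees the quantization error. It is plain Python with a hypothetical `fake_quantize` helper and assumes symmetric absmax scaling; it is not Qwix's implementation, and it leaves out the gradient handling that training needs.

```python
def fake_quantize(values, bits=8):
    """Quantize then dequantize; the outputs are floats on an integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


weights = [0.30, -0.71, 0.05, 1.20]

fq4 = fake_quantize(weights, bits=4)  # coarse grid, 15 levels
fq8 = fake_quantize(weights, bits=8)  # fine grid, 255 levels

# Largest rounding error introduced at each bit width.
err4 = max(abs(w - q) for w, q in zip(weights, fq4))
err8 = max(abs(w - q) for w, q in zip(weights, fq8))

print(fq4, err4)
print(fq8, err8)
```

The 4-bit grid introduces a visibly larger error than the 8-bit one, which is the effect a model trained with fake quantization learns to tolerate.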
Supported numerics:
Native: int4, int8, fp8.
Emulated: int1 to int7, nf4.
Supported array calibration methods:
absmax: symmetric quantization using maximum absolute value.
minmax: asymmetric quantization using minimum and maximum values.
rms: symmetric quantization using root mean square, also known as “MSE Quant”.
fixed: fixed range.
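To make the symmetric versus asymmetric distinction concrete, here is a minimal sketch of how absmax and minmax calibration could derive int8 parameters from the same data. The function names and the zero-point convention are assumptions for illustration, written in plain Python, and do not reflect Qwix's internals. The rms method is left out because the text above does not specify how the root mean square maps to a scale.

```python
def int_range(bits):
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def absmax_params(values, bits=8):
    """Symmetric: the scale comes from the largest magnitude, zero point is 0."""
    _, qmax = int_range(bits)
    return max(abs(v) for v in values) / qmax, 0


def minmax_params(values, bits=8):
    """Asymmetric: [min, max] is mapped onto the full integer range."""
    qmin, qmax = int_range(bits)
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin)  # assumes hi > lo
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, bits=8):
    qmin, qmax = int_range(bits)
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    return [(q - zero_point) * scale for q in codes]


# One-sided data, such as activations after a ReLU.
activations = [0.0, 0.5, 1.0, 3.0, 6.0]

sym_scale, sym_zp = absmax_params(activations)
asym_scale, asym_zp = minmax_params(activations)

sym_roundtrip = dequantize(quantize(activations, sym_scale, sym_zp), sym_scale, sym_zp)
asym_roundtrip = dequantize(quantize(activations, asym_scale, asym_zp), asym_scale, asym_zp)

print(sym_scale, sym_zp, sym_roundtrip)
print(asym_scale, asym_zp, asym_roundtrip)
```

For one-sided data like this, minmax spends all 256 codes on the observed range, so its step size is about half that of absmax, which leaves the negative codes unused.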
Supported JAX ops and their quantization granularity:
XLA:
conv_general_dilated: per-channel.
dot_general and einsum: per-channel and sub-channel.
LiteRT:
conv, matmul, and fully_connected: per-channel.
Other ops available in LiteRT: per-tensor.
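The granularities above differ in how many values share one scale. The following sketch computes absmax scales for a small weight matrix at per-tensor, per-channel, and sub-channel granularity. It is plain Python under stated assumptions: rows are treated as channels, and the sub-channel tile size `TILE` is a hypothetical choice, not a Qwix default.

```python
def absmax_scale(values, qmax=127):
    return max(abs(v) for v in values) / qmax


# A 2 x 4 weight matrix; each row is treated as one channel (an assumption).
weights = [
    [0.02, -0.01, 0.04, 0.03],  # small-magnitude channel
    [1.50, -2.00, 0.10, 0.05],  # large-magnitude channel
]

# Per-tensor: a single scale for the whole array.
per_tensor = absmax_scale([v for row in weights for v in row])

# Per-channel: one scale per row.
per_channel = [absmax_scale(row) for row in weights]

# Sub-channel: each row is split into tiles, with one scale per tile.
TILE = 2  # hypothetical tile size
sub_channel = [
    [absmax_scale(row[i:i + TILE]) for i in range(0, len(row), TILE)]
    for row in weights
]

print(per_tensor)
print(per_channel)
print(sub_channel)
```

With a single per-tensor scale, the small-magnitude channel is quantized with a step sized for the large one and loses most of its resolution. Finer granularity gives each group a scale that fits its own range, at the cost of storing more scales.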
Integration with any Flax Linen or NNX models via a single function call.
Relation with AQT
The design of Qwix was inspired by AQT and borrowed many great ideas from it. Here’s a brief list of the similarities and the differences.
Qwix’s QArray is similar to AQT’s QTensor; both support sub-channel quantization.
Both AQT and Qwix support quantized training. Qwix supports both Quantization-Aware Training (QAT) with fake quantization and Quantized Training (QT) with quantized forward and backward passes.
AQT provides drop-in replacements for einsum and dot_general, each of which has to be configured separately. Qwix provides additional mechanisms to integrate with a whole model implicitly.
Applying static-range quantization is easier in Qwix because it has deeper integration with Flax.
Contents
Core API
Providers API