Qwix: A JAX-native Quantization Library

Qwix is a JAX quantization library for both research and production.

Why Qwix

Qwix is the only JAX quantization solution that supports quantized training (QT), post-training quantization (PTQ) on TPU/GPU, and on-device ML (ODML) quantization in one library.
Qwix has well-tested static-range quantization (SRQ) support.
In ODML quantization mode, Qwix can quantize every op in the model and export a full-integer network.
Qwix integrates with models without requiring changes to the model code.
Qwix is open and extensible. Building new algorithms on Qwix is easy.

Supported schemas:
- Weight-only quantization.
- Dynamic-range quantization.
- Static-range quantization.
Supported modes:
- QT: this mode allows for training with quantization, with support for both fake quantization (QAT) and true quantized training with quantized backpropagation (QT).
- PTQ: this mode achieves the best serving performance on XLA devices such as TPU and GPU.
- ODML: this mode adds the proper annotations to the model so that the LiteRT converter can produce full-integer models.
- LoRA/QLoRA: this mode enables LoRA and QLoRA on a model.
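
To make the "fake quantization" mentioned under the QT mode concrete, here is a minimal sketch in plain JAX. It does not use or reproduce the Qwix API; the function name `fake_quant_int8` and the per-tensor absmax scale are assumptions chosen for illustration. The forward pass snaps values to int8 levels and dequantizes them immediately, while a straight-through estimator lets gradients flow as if no rounding had happened.

```python
import jax
import jax.numpy as jnp


def fake_quant_int8(x):
    """Quantize to int8 levels and dequantize right away (fake quantization)."""
    # Symmetric per-tensor scale from the maximum absolute value (an assumption).
    scale = jnp.maximum(jnp.max(jnp.abs(x)), 1e-8) / 127.0
    dequantized = jnp.clip(jnp.round(x / scale), -127, 127) * scale
    # Straight-through estimator: the forward value is `dequantized`, while the
    # backward pass treats the whole quantization step as the identity.
    return x + jax.lax.stop_gradient(dequantized - x)


x = jnp.array([0.1, -0.5, 0.33, 1.0])
y = fake_quant_int8(x)  # values snapped to the int8 grid, still float32
grad = jax.grad(lambda v: jnp.sum(fake_quant_int8(v)))(x)  # all ones
```

True quantized training differs in that the matrix multiplications themselves run on quantized operands in both the forward and backward passes; that part is not sketched here.
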
Supported algorithms:
- GPTQ
- AWQ (Activation-aware Weight Quantization)
- SmoothQuant
Supported numerics:
- Native: int4, int8, fp8.
- Emulated: int1 to int7, nf4.
Supported array calibration methods:
- absmax: symmetric quantization using maximum absolute value.
- minmax: asymmetric quantization using minimum and maximum values.
- rms: symmetric quantization using root mean square, also known as “MSE Quant”.
- fixed: fixed range.
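
The difference between the absmax and minmax calibration methods can be sketched in a few lines of JAX. This is an illustrative sketch, not Qwix's implementation: the function names, the int8 target, and the per-tensor granularity are all assumptions. absmax derives a single scale from the largest magnitude, so zero always maps to integer 0; minmax derives a scale and a zero point from the observed range, which uses the integer range more fully on skewed data.

```python
import jax.numpy as jnp


def absmax_quantize(x):
    """Symmetric int8 quantization: scale only, zero maps to integer 0."""
    scale = jnp.maximum(jnp.max(jnp.abs(x)), 1e-8) / 127.0
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale


def minmax_quantize(x):
    """Asymmetric int8 quantization: scale plus zero point over [min, max]."""
    qmin, qmax = -128, 127
    lo, hi = jnp.min(x), jnp.max(x)
    scale = jnp.maximum(hi - lo, 1e-8) / (qmax - qmin)
    zero_point = qmin - jnp.round(lo / scale)  # integer that represents `lo`
    q = jnp.clip(jnp.round(x / scale) + zero_point, qmin, qmax).astype(jnp.int8)
    return q, scale, zero_point


# A skewed, non-negative tensor: absmax spends half of its range on negatives.
x = jnp.array([0.0, 0.5, 1.0, 3.0])

q_sym, s_sym = absmax_quantize(x)
x_sym = q_sym.astype(jnp.float32) * s_sym

q_asym, s_asym, zp = minmax_quantize(x)
x_asym = (q_asym.astype(jnp.float32) - zp) * s_asym
```

On this input the minmax step size `s_asym` is smaller than the absmax step size `s_sym`, so the round-trip error bound is tighter, at the cost of carrying a zero point.
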
Supported JAX ops and their quantization granularity:
- XLA:
  - conv_general_dilated: per-channel.
  - dot_general and einsum: per-channel and sub-channel.
- LiteRT:
  - conv, matmul, and fully_connected: per-channel.
  - Other ops available in LiteRT: per-tensor.
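
The per-channel and sub-channel granularities listed for dot_general and einsum can be illustrated with a small JAX sketch. It is not the Qwix API: the helper `quantize_dequantize`, its `tile_size` parameter, and the `(in_features, out_features)` weight layout are hypothetical. Per-channel keeps one scale per output channel; sub-channel splits the contraction axis into tiles and keeps one scale per tile, which confines the damage of an outlier to its own tile.

```python
import jax.numpy as jnp


def quantize_dequantize(w, tile_size=None):
    """int8 absmax round trip for w of shape (in_features, out_features).

    tile_size=None -> per-channel: one scale per output channel.
    tile_size=k    -> sub-channel: one scale per k input rows per output channel.
    """
    in_features, out_features = w.shape
    k = in_features if tile_size is None else tile_size
    assert in_features % k == 0, "tile_size must divide in_features"
    tiles = w.reshape(in_features // k, k, out_features)
    scales = jnp.max(jnp.abs(tiles), axis=1, keepdims=True) / 127.0
    scales = jnp.maximum(scales, 1e-8)  # guard against all-zero tiles
    q = jnp.clip(jnp.round(tiles / scales), -127, 127).astype(jnp.int8)
    return (q * scales).reshape(w.shape), scales


# Small weights plus a single large outlier in row 0, column 0.
w = jnp.linspace(-1.0, 1.0, 16).reshape(8, 2).at[0, 0].set(50.0)

w_pc, s_pc = quantize_dequantize(w)               # scales: (1, 1, 2)
w_sc, s_sc = quantize_dequantize(w, tile_size=4)  # scales: (2, 1, 2)

# Compare the round-trip error on rows 4..7, which do not contain the outlier.
err_pc = jnp.max(jnp.abs(w - w_pc)[4:])
err_sc = jnp.max(jnp.abs(w - w_sc)[4:])
```

With per-channel scales the outlier coarsens the whole first column, while with sub-channel scales only the tile that contains it is affected, so `err_sc` is smaller than `err_pc`. The trade-off is that more scales have to be stored and applied.
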
Supported offline quantization modes:
- NNX:
  - QT: int4, int8, fp8
  - PTQ: int4, int8, fp8
Integration with any Flax Linen or NNX models via a single function call.

The design of Qwix was inspired by AQT and borrowed many great ideas from it. Here’s a brief list of the similarities and the differences.

Qwix’s QArray is similar to AQT’s QTensor, both supporting sub-channel quantization.
Both AQT and Qwix support quantized training. Qwix supports both Quantization-Aware Training (QAT) with fake quantization and Quantized Training (QT) with quantized forward and backward passes.
AQT provides drop-in replacements for einsum and dot_general, each of which must be configured separately. Qwix provides additional mechanisms to integrate with a whole model implicitly.
Applying static-range quantization is easier in Qwix because it integrates more deeply with Flax.

Contents

Core API

Providers API

Supported Algorithms