Estimators
Hespas supports multiple compute estimator backends. While the main hespas_chakra_gen
tool selects the estimator from the configuration file, each estimator can also be run
independently on a StableHLO MLIR file. This is useful for quick experiments, debugging,
or comparing estimator results without running the full trace generation flow.
All estimators accept a --config-file flag for loading parameters from a JSON configuration, plus individual command-line flags that override specific settings. They also support a --cache-dir flag for caching results across runs.
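As a sketch, a roofline configuration file might look like the following. The key names and values here are assumptions for illustration only (see the shipped fixtures, such as tests/fixtures/configs/config_roofline_a100.json, for the authoritative schema); each key mirrors one of the command-line flags documented below.

```json
{
  "peak_flops": 3.12e14,
  "memory_bandwidth": 2.0e12,
  "cache_dir": "hespas_cache",
  "num_npus": 8
}
```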
Roofline Estimator
Analytical performance estimator based on the roofline model. Estimates execution time from peak FLOPS and memory bandwidth — no hardware required.
python -m hespas.estimator.roofline_estimator [OPTIONS]
Key options:
--config-file CONFIG_FILE - Configuration file path
--mlir-file MLIR_FILE - MLIR file to estimate
--peak-flops PEAK_FLOPS - Peak FLOPS/s of the target hardware
--memory-bandwidth MEMORY_BANDWIDTH - Peak memory bandwidth in bytes/s
--per-datatype-flops PER_DATATYPE_FLOPS - Peak FLOPS/s dict for each datatype
--cache-dir CACHE_DIR - Cache directory (default: hespas_cache)
--disable-cache - Disable the estimator cache
--in-memory-only-cache - Use in-memory cache only, no files
--num-npus - Number of NPUs to simulate
--error-on-unknown-type - Error on unrecognized datatypes
--warn-on-unknown-type - Warn on unrecognized datatypes
Example:
python -m hespas.estimator.roofline_estimator \
--config-file tests/fixtures/configs/config_roofline_a100.json
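Conceptually, a roofline estimate reduces to taking the larger of the compute-bound and memory-bound times for each operator. A minimal sketch of that arithmetic (illustrative only, not Hespas's actual implementation):

```python
def roofline_time(flop_count: float, bytes_moved: float,
                  peak_flops: float, mem_bandwidth: float) -> float:
    """Roofline estimate: an op takes at least as long as its compute
    time (FLOPs / peak FLOPS) and its memory time (bytes moved / peak
    bandwidth), whichever dominates."""
    return max(flop_count / peak_flops, bytes_moved / mem_bandwidth)

# Example: a 2 TFLOP matmul moving 4 GB on hardware with roughly
# 312 TFLOPS/s peak compute and 2 TB/s memory bandwidth. Here the
# compute term dominates, so the op is compute-bound.
t = roofline_time(2e12, 4e9, 312e12, 2e12)
```

The --per-datatype-flops option refines the peak_flops term per datatype, since real hardware has different peak throughput for, e.g., fp16 versus fp32.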
XLA Estimator
Profiling-based estimator using XLA’s HLO runner. Translates StableHLO to HLO and executes on GPU hardware for measured timings. Use this for the most accurate results.
python -m hespas.estimator.xla_estimator [OPTIONS]
Key options:
--config-file CONFIG_FILE - Configuration file path
--mlir-file MLIR_FILE - MLIR file to estimate
--cache-dir CACHE_DIR - Cache directory (default: hespas_cache)
--disable-cache - Disable the estimator cache
--in-memory-only-cache - Use in-memory cache only, no files
--num-npus - Number of NPUs to simulate
--translate TRANSLATE - Translation options
--hlo-runner-main-path PATH - Path to the hlo_runner_main binary (default: hlo_runner_main)
--xla-translate-path PATH - Path to the xla-translate binary (default: xla-translate)
--sample - Enable sampling mode
Note
Requires an NVIDIA GPU with CUDA, and the xla-translate and hlo_runner_main
binaries (typically built from XLA source or available in a JAX-Toolbox Docker image).
Docker Setup
The easiest way to get all XLA estimator dependencies is via Docker. The provided
Dockerfile builds an image with xla-translate, hlo_runner_main, and GPU support
pre-configured.
Build the image:
docker build -t hespas .
Run with GPU access:
docker run --gpus all -it --rm hespas
Mount your local workspace:
docker run --gpus all -it --rm -v $(pwd):/workspace -w /workspace hespas
IREE Estimator
Compilation-and-profiling estimator using the IREE compiler and runtime. Compiles StableHLO modules and benchmarks them on real hardware (CPU or GPU).
Requires the iree optional dependency:
pip install ".[iree]"
python -m hespas.estimator.iree_estimator [OPTIONS]
Key options:
--config-file CONFIG_FILE - Configuration file path
--mlir-file MLIR_FILE - MLIR file to estimate
--cache-dir CACHE_DIR - Cache directory (default: hespas_cache)
--disable-cache - Disable the estimator cache
--in-memory-only-cache - Use in-memory cache only, no files
--num-npus - Number of NPUs to simulate
Note
Requires iree-compile and iree-benchmark-module binaries to be available on
the system PATH.
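A quick preflight check for those binaries can be written with Python's standard shutil.which; this snippet is an illustration, not part of Hespas:

```python
import shutil

def missing_binaries(names):
    """Return the names that cannot be resolved on the current PATH."""
    return [name for name in names if shutil.which(name) is None]

missing = missing_binaries(["iree-compile", "iree-benchmark-module"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
```

The same check applies to the XLA estimator's xla-translate and hlo_runner_main binaries.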