Estimators

Hespas supports multiple compute estimator backends. While the main hespas_chakra_gen tool selects the estimator from the configuration file, each estimator can also be run independently on a StableHLO MLIR file. This is useful for quick experiments, debugging, or comparing estimator results without running the full trace generation flow.
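Running estimators side by side is easiest from a small script. The sketch below assumes each estimator gets its own configuration file, since their options differ. It captures each run's raw standard output because the output format is not described here. The helper names estimator_command and run_estimators are hypothetical and not part of Hespas. Only the module paths and the --config-file and --mlir-file flags come from this page.

```python
import subprocess
import sys
from typing import Dict, List, Optional

# Module paths of the standalone estimators documented on this page.
ESTIMATOR_MODULES = {
    "roofline": "hespas.estimator.roofline_estimator",
    "xla": "hespas.estimator.xla_estimator",
    "iree": "hespas.estimator.iree_estimator",
}


def estimator_command(name: str, config_file: str,
                      mlir_file: Optional[str] = None) -> List[str]:
    """Build the `python -m ...` command line for one estimator (hypothetical helper)."""
    cmd = [sys.executable, "-m", ESTIMATOR_MODULES[name], "--config-file", config_file]
    if mlir_file is not None:
        # Command-line flags override settings from the config file.
        cmd += ["--mlir-file", mlir_file]
    return cmd


def run_estimators(configs: Dict[str, str],
                   mlir_file: Optional[str] = None) -> Dict[str, str]:
    """Run each estimator on the same MLIR file and collect its raw stdout."""
    results = {}
    for name, config_file in configs.items():
        proc = subprocess.run(estimator_command(name, config_file, mlir_file),
                              capture_output=True, text=True, check=True)
        results[name] = proc.stdout
    return results
```

For example, run_estimators({"roofline": "roofline.json", "iree": "iree.json"}, "module.mlir") would return one stdout string per estimator. The file names here are placeholders.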

All estimators accept --config-file to load parameters from a JSON configuration file. Individual command-line flags can supplement or override specific settings. They also accept --cache-dir to cache results across runs.
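A common argparse pattern produces this precedence: load the config file first, then apply any flags given explicitly on the command line. The following is a generic sketch of that pattern, not Hespas's actual loader. It makes a hypothetical assumption that JSON keys match the flag names with dashes turned into underscores, for example peak_flops.

```python
import argparse
import json
from typing import Any, Dict, List


def load_settings(argv: List[str]) -> Dict[str, Any]:
    """Merge a JSON config file with command-line overrides (generic sketch).

    Assumes, hypothetically, that config keys equal argparse destination
    names such as "peak_flops"; the real Hespas schema may differ.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file")
    # default=None distinguishes "flag not given" from an explicit value.
    parser.add_argument("--peak-flops", type=float, default=None)
    parser.add_argument("--memory-bandwidth", type=float, default=None)
    parser.add_argument("--cache-dir", default=None)
    args = parser.parse_args(argv)

    settings: Dict[str, Any] = {"cache_dir": "hespas_cache"}  # documented default
    if args.config_file:
        with open(args.config_file) as f:
            settings.update(json.load(f))
    # Flags given explicitly on the command line take precedence.
    for key, value in vars(args).items():
        if key != "config_file" and value is not None:
            settings[key] = value
    return settings
```

With this ordering, a config file can hold a baseline hardware description while a single flag such as --peak-flops tweaks one value for an experiment.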

Roofline Estimator

Analytical performance estimator based on the roofline model. Estimates execution time from peak FLOP/s and memory bandwidth, so no target hardware is required.
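Under the textbook roofline bound, an operation can finish no faster than the slower of two times: its compute time at peak FLOP/s and its data-movement time at peak bandwidth. The sketch below applies that bound to a single operation. This page does not say how Hespas counts FLOPs and bytes per StableHLO op or how it combines per-op times. Treat this as an illustration of the model rather than the estimator's exact formula. The hardware numbers are hypothetical.

```python
def roofline_time(flops: float, bytes_moved: float,
                  peak_flops: float, memory_bandwidth: float) -> float:
    """Lower bound on execution time (seconds) under the roofline model."""
    compute_time = flops / peak_flops              # time if purely compute-bound
    memory_time = bytes_moved / memory_bandwidth   # time if purely bandwidth-bound
    # The slower of the two limits the operation.
    return max(compute_time, memory_time)


# Example: a 1024x1024x1024 f32 matmul on hypothetical hardware with
# 100 TFLOP/s peak compute and 2 TB/s memory bandwidth.
n = 1024
flops = 2 * n**3                 # one multiply and one add per inner-product term
bytes_moved = 3 * n * n * 4      # read A and B, write C once each, 4 bytes per f32
t = roofline_time(flops, bytes_moved, peak_flops=100e12, memory_bandwidth=2e12)
print(f"{t * 1e6:.2f} us")       # compute-bound: prints 21.47 us
```

Here the compute time (about 21.5 us) exceeds the memory time (about 6.3 us), so the matmul is compute-bound. A small elementwise op would typically land on the bandwidth side instead.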

python -m hespas.estimator.roofline_estimator [OPTIONS]

Key options:

--config-file CONFIG_FILE — Configuration file path
--mlir-file MLIR_FILE — MLIR file to estimate
--peak-flops PEAK_FLOPS — Peak FLOP/s of the target hardware
--memory-bandwidth MEMORY_BANDWIDTH — Peak memory bandwidth in bytes/s
--per-datatype-flops PER_DATATYPE_FLOPS — Peak FLOP/s dict for each datatype
--cache-dir CACHE_DIR — Cache directory (default: hespas_cache)
--disable-cache — Disable the estimator cache
--in-memory-only-cache — Use in-memory cache only, no files
--num-npus — Number of NPUs to simulate
--error-on-unknown-type — Error on unrecognized datatypes
--warn-on-unknown-type — Warn on unrecognized datatypes
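The --per-datatype-flops option suggests a lookup from element type to peak FLOP/s. The unknown-type flags would then decide what happens on a miss. The sketch below is one hypothetical reading of that behavior. Several parts are assumptions, not documented behavior: the dict keys (StableHLO element-type names such as bf16 and f32), the fallback to the single --peak-flops value, and the exact semantics of the two flags.

```python
import warnings
from typing import Dict


def peak_flops_for(dtype: str, per_datatype_flops: Dict[str, float],
                   default_peak: float, on_unknown: str = "ignore") -> float:
    """Return peak FLOP/s for an element type (hypothetical helper).

    on_unknown mimics --error-on-unknown-type ("error") and
    --warn-on-unknown-type ("warn"); any other value falls back silently.
    """
    if dtype in per_datatype_flops:
        return per_datatype_flops[dtype]
    if on_unknown == "error":
        raise ValueError(f"unrecognized datatype: {dtype}")
    if on_unknown == "warn":
        warnings.warn(f"unrecognized datatype {dtype}; using default peak FLOP/s")
    # Fall back to the single --peak-flops value.
    return default_peak


# Hypothetical per-datatype table: lower-precision types run faster than f32.
table = {"bf16": 300e12, "f32": 20e12}
print(peak_flops_for("bf16", table, default_peak=20e12))   # 300000000000000.0
```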

Example:

python -m hespas.estimator.roofline_estimator \
    --config-file tests/fixtures/configs/config_roofline_a100.json

XLA Estimator

Profiling-based estimator using XLA’s HLO runner. Translates StableHLO to HLO and executes it on GPU hardware to obtain measured timings. Because timings are measured rather than modeled, this estimator generally gives the most accurate results.

python -m hespas.estimator.xla_estimator [OPTIONS]

Key options:

--config-file CONFIG_FILE — Configuration file path
--mlir-file MLIR_FILE — MLIR file to estimate
--cache-dir CACHE_DIR — Cache directory (default: hespas_cache)
--disable-cache — Disable the estimator cache
--in-memory-only-cache — Use in-memory cache only, no files
--num-npus — Number of NPUs to simulate
--translate TRANSLATE — Translation options
--hlo-runner-main-path PATH — Path to hlo_runner_main binary (default: hlo_runner_main)
--xla-translate-path PATH — Path to xla-translate binary (default: xla-translate)
--sample — Enable sampling mode

Note

Requires an NVIDIA GPU with CUDA, and the xla-translate and hlo_runner_main binaries (typically built from XLA source or available in a JAX-Toolbox Docker image).
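Before launching the XLA estimator, you can confirm that both binaries resolve. This sketch uses Python's shutil.which, which searches PATH for bare names and checks explicit paths directly. That lets it mirror the --xla-translate-path and --hlo-runner-main-path overrides. It does not check for a GPU or a CUDA installation. The helper name missing_xla_tools is hypothetical.

```python
import shutil
from typing import List


def missing_xla_tools(xla_translate_path: str = "xla-translate",
                      hlo_runner_main_path: str = "hlo_runner_main") -> List[str]:
    """Return the XLA estimator binaries that cannot be found as executables."""
    # Defaults match the documented defaults of the two path flags.
    return [tool for tool in (xla_translate_path, hlo_runner_main_path)
            if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_xla_tools()
    if missing:
        print("Missing XLA tools:", ", ".join(missing))
    else:
        print("xla-translate and hlo_runner_main found")
```

Pass the same values you would give the command-line flags, for example missing_xla_tools(hlo_runner_main_path="/opt/xla/bin/hlo_runner_main"). That path is a placeholder.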

Docker Setup

The easiest way to get all XLA estimator dependencies is via Docker. The provided Dockerfile builds an image with xla-translate, hlo_runner_main, and GPU support pre-configured.

Build the image:

docker build -t hespas .

Run with GPU access:

docker run --gpus all -it --rm hespas

Mount your local workspace:

docker run --gpus all -it --rm -v "$(pwd)":/workspace -w /workspace hespas

IREE Estimator

Compilation-and-profiling estimator using the IREE compiler and runtime. Compiles StableHLO modules and benchmarks them on real hardware (CPU or GPU).

Requires the iree optional dependency:

pip install ".[iree]"

python -m hespas.estimator.iree_estimator [OPTIONS]

Key options:

--config-file CONFIG_FILE — Configuration file path
--mlir-file MLIR_FILE — MLIR file to estimate
--cache-dir CACHE_DIR — Cache directory (default: hespas_cache)
--disable-cache — Disable the estimator cache
--in-memory-only-cache — Use in-memory cache only, no files
--num-npus — Number of NPUs to simulate

Note

Requires the iree-compile and iree-benchmark-module binaries to be available on the system PATH.