Usage

blackboxml provides three tracking modes: a decorator for generator-based loops, a context manager for imperative logging, and a Keras callback for model.fit(). All three produce the same JSON output. This page covers each mode and shows how to use them with PyTorch, Keras, scikit-learn, and plain Python.

@track decorator

The @track decorator wraps a generator function. Each yield becomes a logged step.

from blackboxml import track, MetricStore

@track(name="resnet_cifar10", tags=["pytorch", "cifar10"])
def train():
    metrics = MetricStore()
    for epoch in range(10):
        metrics.reset()
        for batch in dataloader:
            loss, acc = train_step(batch)
            metrics.update({"loss": loss, "acc": acc}, n=len(batch))
        yield metrics.compute()

results = train()

Parameters

Parameter	Type	Default	Description
`name`	`str`	required	Name for the run. Used in the directory name and stored in the JSON.
`tags`	`list[str] \\| None`	`None`	Tags to associate with the run. Useful for filtering runs later.
`log_dir`	`str`	`"blackboxml_logs"`	Base directory for saving run data.

How it works

Calls your function to get a generator.
Opens a Run context manager internally.
Iterates over yielded dicts, logging each one as a step.
On exit, saves the full run to disk.
Returns a list of all yielded metric dicts.

If the decorated function is not a generator (doesn't yield), blackboxml logs a warning and returns an empty list.

Yielded values

Each yielded value should be a dict[str, float]. Common patterns:

# Simple - yield a dict directly
yield {"loss": 0.42, "acc": 0.91}

# With MetricStore - yield weighted epoch averages
yield metrics.compute()

# With validation - yield both in one dict
yield {"train_loss": t_loss, "val_loss": v_loss, "val_acc": v_acc}

Run context manager

For more control, use Run directly. Call run.log() each time you have metrics to record.

from blackboxml import Run

with Run(name="resnet_cifar10", tags=["pytorch"]) as run:
    for epoch in range(10):
        loss = train_one_epoch()
        val_loss = validate()
        run.log({"loss": loss, "val_loss": val_loss, "epoch": epoch})

Parameters

Parameter	Type	Default	Description
`name`	`str`	required	Run name.
`tags`	`list[str] \\| None`	`None`	Optional tags.
`log_dir`	`str`	`"blackboxml_logs"`	Base directory for logs.

Methods

`run.log(metrics: dict[str, float]) -> None`

Appends a metric dict as a new step. Call once per epoch or at whatever granularity you want.

run.log({"loss": 0.42, "acc": 0.91})
run.log({"loss": 0.38, "acc": 0.93})

Lifecycle

__enter__ records start time and captures environment metadata (git, Python, frameworks, hostname).
__exit__ records end time, computes duration, and saves the run as JSON.
Exceptions are not suppressed. If your training code raises, the context manager exits cleanly and re-raises.

MetricStore

MetricStore accumulates batch-level metrics into epoch-level weighted averages. It's pure Python with no framework dependencies.

from blackboxml import MetricStore

metrics = MetricStore()

for epoch in range(10):
    metrics.reset()
    for batch in dataloader:
        loss, acc = train_step(batch)
        metrics.update({"loss": loss, "acc": acc}, n=len(batch))

    epoch_avg = metrics.compute()
    # {"loss": 0.42, "acc": 0.91}  - weighted by batch size

Methods

`update(metrics: dict[str, float], n: int = 1) -> None`

Accumulate a batch of metrics weighted by batch size n.

# Each call adds value * n to the running weighted sum
metrics.update({"loss": 0.5, "acc": 0.8}, n=32)
metrics.update({"loss": 0.4, "acc": 0.9}, n=32)

The n parameter matters when batch sizes vary (e.g. the last batch in an epoch is smaller). Without it, small batches would be overweighted.

`compute() -> dict[str, float]`

Returns the weighted average for each metric since the last reset.

Raises ValueError if called before any update() call.

`reset() -> None`

Zeros all accumulators. Call at the start of each epoch.

Typical pattern

metrics = MetricStore()
for epoch in range(num_epochs):
    metrics.reset()                          # clear for this epoch
    for batch in dataloader:
        loss = train_step(batch)
        metrics.update({"loss": loss}, n=len(batch))  # accumulate
    yield metrics.compute()                  # epoch average

Keras callback

BlackBoxCallback plugs into model.fit() and logs per-epoch metrics automatically.

from blackboxml.callback import BlackBoxCallback

callback = BlackBoxCallback(
    name="lstm_nlp",
    tags=["keras", "nlp"],
    log_dir="blackboxml_logs"
)

model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10,
    callbacks=[callback]
)

Parameters

Parameter	Type	Default	Description
`name`	`str`	required	Run name.
`tags`	`list[str] \\| None`	`None`	Optional tags.
`log_dir`	`str`	`"blackboxml_logs"`	Base directory for logs.

What it captures

Per-epoch metrics including loss, accuracy, val_loss, val_accuracy, and any custom metrics from model.compile()
Model metadata including model name, parameter count, optimizer name, and learning rate
Environment including Python version and TensorFlow version
Timing including start time, end time, and duration

Requirements

Requires TensorFlow. Install with:

pip install blackboxml[keras]

Raises ImportError at instantiation if TensorFlow is not installed.

scikit-learn

scikit-learn doesn't have an epoch-based training loop, but you can log cross-validation scores, grid search results, or any per-configuration metrics. Use the Run context manager:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from blackboxml import Run

with Run(name="rf_grid_search", tags=["sklearn", "random-forest"]) as run:
    for n_trees in [50, 100, 200, 500]:
        clf = RandomForestClassifier(n_estimators=n_trees)
        scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
        run.log({
            "n_estimators": n_trees,
            "mean_accuracy": scores.mean(),
            "std_accuracy": scores.std()
        })

Or use @track with a generator to log each configuration as a step:

from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from blackboxml import track

@track(name="svm_kernel_search", tags=["sklearn", "svm"])
def search():
    for kernel in ["linear", "rbf", "poly"]:
        clf = SVC(kernel=kernel)
        scores = cross_val_score(clf, X, y, cv=5)
        yield {"kernel_acc": scores.mean()}

search()

Plain Python

blackboxml has no framework dependencies for core tracking. A hand-written gradient descent loop works the same way:

from blackboxml import Run

with Run(name="gradient_descent", tags=["scratch"]) as run:
    w = 0.0
    for step in range(100):
        grad = compute_gradient(w)
        w -= 0.01 * grad
        if step % 10 == 0:
            run.log({"step": step, "loss": compute_loss(w), "weight": w})

Custom log directory

All three tracking modes accept a log_dir parameter:

@track(name="my_run", log_dir="/data/experiments")
def train(): ...

with Run(name="my_run", log_dir="/data/experiments") as run: ...

BlackBoxCallback(name="my_run", log_dir="/data/experiments")

Runs are saved to <log_dir>/<name>_<timestamp>/run.json.