Utilities
pywatson.utils
PyWatson utilities: path management and HDF5 data handling.
Provides DrWatson.jl-inspired helpers for:
- Path management: `datadir()`, `plotsdir()`, `savename()`, …
- HDF5 data I/O: `save_data()`, `tagsave()`, `load_data()`, `load_selective()`, …
- Smart caching: `produce_or_load()`
Key design choices
- `save_data`: git info is opt-in (`include_git=False` by default). Pass `include_git=True` to embed the commit hash, branch, and dirty flag in the file's metadata.
- `tagsave`: thin alias that always captures git state; equivalent to `save_data(..., include_git=True)`. Use this when you want every saved file to be traceable to an exact commit.
Functions
collect_results
collect_results(
folder_path: str | None = None,
subdir: str | None = None,
recursive: bool = True,
as_dataframe: bool = False,
) -> list[dict[str, Any]] | pd.DataFrame
Collect all results from .h5 files in a folder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `folder_path` | `str \| None` | Explicit path to the folder. Defaults to … | `None` |
| `subdir` | `str \| None` | Subdirectory within … | `None` |
| `recursive` | `bool` | Whether to search subdirectories recursively. | `True` |
| `as_dataframe` | `bool` | Return a … | `False` |
Returns:
| Type | Description |
|---|---|
| `list[dict[str, Any]] \| DataFrame` | List of data dicts, or a … is … |
Source code in src/pywatson/utils.py
current_git_commit
data_info
Get information about a data file without loading all data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Name of the file (with or without `.h5` extension). | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with file information. |
datadir
datafile
dict_list
Expand parameter dictionaries into every combination (Cartesian product).
List-valued entries are expanded; scalar entries are broadcast. Accepts multiple dicts that are first merged left-to-right.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*dicts` | `dict` | One or more parameter dictionaries. Later dicts override earlier keys. List values are expanded; scalars are treated as single-element lists. | `()` |
Returns:
| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of flat parameter dicts, one per combination. |
Example
dict_list({"alpha": [0.1, 0.5], "N": [100, 1000]})
# [{'alpha': 0.1, 'N': 100}, {'alpha': 0.1, 'N': 1000},
# {'alpha': 0.5, 'N': 100}, {'alpha': 0.5, 'N': 1000}]
dict_list({"model": "euler"}, {"dt": [0.01, 0.001], "T": 10})
# [{'model': 'euler', 'dt': 0.01, 'T': 10},
# {'model': 'euler', 'dt': 0.001, 'T': 10}]
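The expansion described above can be sketched in a few lines. This is a hypothetical re-implementation, not PyWatson's source: `dict_list_sketch` is an illustrative name, and the sketch assumes that only `list` values are expanded and that keys keep their merged insertion order.

```python
from __future__ import annotations

from itertools import product
from typing import Any


def dict_list_sketch(*dicts: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the Cartesian-product expansion."""
    merged: dict[str, Any] = {}
    for d in dicts:
        merged.update(d)  # later dicts override earlier keys

    keys = list(merged)
    # Scalars are wrapped so that they behave like single-element lists.
    value_lists = [v if isinstance(v, list) else [v] for v in merged.values()]

    # One flat dict per element of the Cartesian product.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


combos = dict_list_sketch({"model": "euler"}, {"dt": [0.01, 0.001], "T": 10})
print(combos)
```

Using `itertools.product` keeps the first key as the slowest-varying one, which matches the ordering shown in the example above.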
docsdir
find_project_root
Find the project root directory by looking for pyproject.toml or .git.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_path` | `str \| Path \| None` | Starting directory to search from. Defaults to the current directory. | `None` |
Returns:
| Type | Description |
|---|---|
| `Path \| None` | Path to the project root, or `None` if not found. |
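
A minimal sketch of the marker-file search, assuming the lookup walks from the starting directory up through its parents and stops at the first directory containing `pyproject.toml` or `.git`. `find_project_root_sketch` is a hypothetical name; the real function may differ in details such as symlink handling.

```python
from __future__ import annotations

from pathlib import Path


def find_project_root_sketch(start_path: str | Path | None = None) -> Path | None:
    """Walk upwards until a directory holding pyproject.toml or .git is found."""
    start = Path(start_path) if start_path is not None else Path.cwd()
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "pyproject.toml").exists() or (candidate / ".git").exists():
            return candidate
    return None  # reached the filesystem root without finding a marker


print(find_project_root_sketch())
```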
get_project_dir
Get path to a project directory (data, plots, scripts, etc.) with optional subdirectories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `directory` | `str` | Directory name (e.g. `'data'`, `'plots'`, `'scripts'`, `'notebooks'`). | required |
| `*subdirs` | `str` | Optional subdirectories to append. | `()` |
| `create` | `bool` | Whether to create the directory if it doesn't exist. | `True` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the requested directory. |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If the project root cannot be found. |
git_status_clean
list_data_files
List all HDF5 data files in the data directory.
load_array
Convenience function to load a single numpy array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Name of the file. | required |
| `array_name` | `str \| None` | Name of the array in the file (if `None`, loads the first array found). | `None` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | NumPy array. |
load_data
Load data from an HDF5 file in the data directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Name of the file (with or without `.h5` extension). | required |
| `keys` | `list \| None` | Optional list of dataset keys to load. If `None`, loads all datasets. Metadata is always loaded regardless of this parameter. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary containing the loaded data and metadata. |
load_npz
Load a NumPy .npz archive from the data directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Filename with or without … | required |
| `subdir` | `str \| None` | Subdirectory within … | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary of arrays plus … |
load_selective
Load only specific keys from an HDF5 file (convenience wrapper for `load_data()`). Metadata is always loaded automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Name of the file (with or without `.h5` extension). | required |
| `keys` | `list` | List of dataset keys to load. | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary containing the loaded data and metadata. |
load_zarr
Load arrays from a Zarr store in the data directory.
Requires the zarr package.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Directory name of the Zarr store (with or without …). | required |
| `keys` | `list \| None` | Optional list of dataset keys to load. | `None` |
| `subdir` | `str \| None` | Subdirectory within … | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary of arrays plus … |
notebookfile
notebooksdir
Get path to notebooks directory, optionally with subdirectories.
parse_savename
Parse a filename produced by `savename()` back into a parameter dict.
Performs best-effort type coercion: integer strings become `int`, other numeric strings become `float`, and everything else stays `str`. Segments not in `key=value` form (e.g. a bare project name prefix) are silently ignored.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Filename or path string, e.g. … | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary of parameter key→value pairs. |
Example
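A minimal sketch of the parsing rules described above, not PyWatson's own code. `parse_savename_sketch` is a hypothetical name, and the sketch assumes the default `"_"` connector and `".h5"` suffix of `savename()`; keys or values that themselves contain the connector would not survive this simple split.

```python
from __future__ import annotations

from typing import Any


def parse_savename_sketch(
    filename: str, suffix: str = ".h5", connector: str = "_"
) -> dict[str, Any]:
    """Hypothetical parser for names such as 'N=100_alpha=0.5_model=euler.h5'."""
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]  # drop any directory part
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]

    params: dict[str, Any] = {}
    for part in name.split(connector):
        key, sep, raw = part.partition("=")
        if not sep:
            continue  # not key=value (e.g. a bare prefix): silently ignored
        for cast in (int, float):
            try:
                params[key] = cast(raw)
                break
            except ValueError:
                continue
        else:
            params[key] = raw  # neither int nor float: keep the string
    return params


parsed = parse_savename_sketch("data/sim_N=100_alpha=0.5_model=euler.h5")
print(parsed)
```

Trying `int` before `float` is what keeps `"100"` an integer rather than `100.0`.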
plotfile
plotsdir
produce_or_load
produce_or_load(
filename: str, producing_function: Any, *args: Any, subdir: str | None = None, **kwargs: Any
) -> tuple[dict[str, Any], Path]
Load existing data or produce and save new data (DrWatson.jl-style smart cache).
On the first call the producing function is executed and its result is saved via `tagsave()` (git info is always captured). On every subsequent call the file is loaded directly and the producing function is not called.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Name of the cache file to load from or save to. | required |
| `producing_function` | `Any` | Function that returns a … | required |
| `*args` | `Any` | Positional arguments forwarded to … | `()` |
| `subdir` | `str \| None` | Optional subdirectory within `data/` (keyword-only). | `None` |
| `**kwargs` | `Any` | Keyword arguments forwarded to … | `{}` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Tuple of … |
| `Path` | class: … |
Raises:
| Type | Description |
|---|---|
| `TypeError` | If … |
Example
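A minimal sketch of the load-or-produce pattern, under stated assumptions: JSON stands in for the HDF5 file, `produce_or_load_sketch` and `simulate` are hypothetical names, and the `TypeError` check for a non-dict result is a guess, since the documented condition is not shown here.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def produce_or_load_sketch(
    path: Path, producing_function: Callable[..., dict[str, Any]], *args: Any, **kwargs: Any
) -> tuple[dict[str, Any], Path]:
    """Hypothetical cache helper: load `path` if present, else produce and save."""
    if path.exists():
        return json.loads(path.read_text()), path  # cache hit: nothing is recomputed

    data = producing_function(*args, **kwargs)
    if not isinstance(data, dict):
        # Assumption: the condition behind the documented TypeError is not shown.
        raise TypeError("producing_function must return a dict")
    path.write_text(json.dumps(data))
    return data, path


calls: list[int] = []


def simulate(n: int) -> dict[str, Any]:
    calls.append(n)  # record each real execution
    return {"n": n, "squares": [i * i for i in range(n)]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "sim_n=4.json"
    first, _ = produce_or_load_sketch(cache, simulate, 4)   # runs simulate
    second, _ = produce_or_load_sketch(cache, simulate, 4)  # loads the file
    print(first == second, len(calls))
```

The second call returns the same data without invoking `simulate` again, which is the property the cache relies on.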
projectdir
Get path to project root directory.
safesave
safesave(
filename: str,
data: dict[str, Any],
metadata: dict[str, Any] | None = None,
compression: str | None = "gzip",
include_git: bool = False,
subdir: str | None = None,
) -> Path
Atomically save data to an HDF5 file, preventing partial-write corruption.
Writes to a temporary file in the same directory, then renames it to the final destination. If the write fails the original file (if any) is untouched.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Target filename (without extension). | required |
| `data` | `dict[str, Any]` | Data dictionary (same contract as …). | required |
| `metadata` | `dict[str, Any] \| None` | Optional metadata dictionary. | `None` |
| `compression` | `str \| None` | HDF5 compression algorithm. | `'gzip'` |
| `include_git` | `bool` | Embed git state in metadata. | `False` |
| `subdir` | `str \| None` | Subdirectory within … | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the saved file. |
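
The write-then-rename approach described above can be sketched with the standard library. This is an illustration under assumptions, not PyWatson's implementation: `atomic_save_sketch` is a hypothetical name and JSON stands in for the HDF5 write.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_save_sketch(target: Path, data: dict[str, Any]) -> Path:
    """Write to a temp file next to `target`, then rename it into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle)  # JSON stands in for the HDF5 write
        os.replace(tmp_name, target)  # atomic when both paths share a filesystem
    except BaseException:
        # Failed write: remove the partial temp file; `target` was never touched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = atomic_save_sketch(Path(tmp) / "results.json", {"energy": [1.0, 0.5]})
    saved = json.loads(out.read_text())
    try:
        atomic_save_sketch(out, {"bad": object()})  # not JSON-serializable
    except TypeError:
        pass
    after_failure = json.loads(out.read_text())
    leftovers = [p.name for p in Path(tmp).iterdir()]
```

Creating the temporary file in the destination directory matters: a rename is only atomic within one filesystem, so a temp file elsewhere could fall back to a copy.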
save_array
Convenience function to save a single numpy array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `array` | `ndarray` | NumPy array to save. | required |
| `name` | `str` | Name for the array (used as filename). | required |
| `metadata` | `dict[str, Any] \| None` | Optional metadata. | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the saved file. |
save_data
save_data(
data: dict[str, Any],
filename: str,
metadata: dict[str, Any] | None = None,
compression: str | None = "gzip",
include_git: bool = False,
subdir: str | None = None,
) -> Path
Save data to an HDF5 file in the data directory, with metadata.
Git information is opt-in: pass `include_git=True` to embed the current commit hash, branch, and dirty-state flag in the file metadata. Use `tagsave()` instead if you always want git tracking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary of data to save (keys become HDF5 groups/datasets). Values may be NumPy arrays, scalars, strings, lists, dicts, or … | required |
| `filename` | `str` | Name of the file (without extension). | required |
| `metadata` | `dict[str, Any] \| None` | Optional metadata dictionary. | `None` |
| `compression` | `str \| None` | Compression method (`'gzip'`, `'lzf'`, `'szip'`, or `None`). | `'gzip'` |
| `include_git` | `bool` | Whether to include git information in metadata (opt-in). | `False` |
| `subdir` | `str \| None` | Optional subdirectory within `data/` to save the file in. Created automatically if it does not exist. | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the saved file. |
save_npz
save_npz(
data: dict[str, Any],
filename: str,
metadata: dict[str, Any] | None = None,
compressed: bool = True,
subdir: str | None = None,
) -> Path
Save arrays to a NumPy .npz archive in the data directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary of arrays (values are passed to …). | required |
| `filename` | `str` | Filename without extension. | required |
| `metadata` | `dict[str, Any] \| None` | Metadata dict stored as a … | `None` |
| `compressed` | `bool` | Use … | `True` |
| `subdir` | `str \| None` | Subdirectory within … | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the saved … |
save_zarr
save_zarr(
data: dict[str, Any],
filename: str,
metadata: dict[str, Any] | None = None,
compression: str = "blosc",
subdir: str | None = None,
) -> Path
Save arrays to a Zarr store in the data directory.
Requires the zarr package (uv add zarr or pip install zarr).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary of arrays. | required |
| `filename` | `str` | Directory name for the Zarr store (without extension). | required |
| `metadata` | `dict[str, Any] \| None` | Metadata dict stored in the Zarr store's … | `None` |
| `compression` | `str` | Zarr compressor name (…). | `'blosc'` |
| `subdir` | `str \| None` | Subdirectory within … | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the Zarr store directory. |
savename
savename(
d: dict,
suffix: str = ".h5",
connector: str = "_",
access: Any | None = None,
digits: int = 3,
ignore_keys: list | None = None,
) -> str
Create a filename from a dictionary, similar to DrWatson's savename.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `d` | `dict` | Dictionary with parameter values. | required |
| `suffix` | `str` | File suffix to be appended. | `'.h5'` |
| `connector` | `str` | String used to join key-value pairs. | `'_'` |
| `access` | `Any \| None` | Function to access specific properties of values. | `None` |
| `digits` | `int` | Number of significant digits for floats. | `3` |
| `ignore_keys` | `list \| None` | List of keys to exclude from the filename. | `None` |
Returns:
| Type | Description |
|---|---|
| `str` | Formatted filename. |
Example
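A minimal sketch of how such a filename could be built, with several assumptions made explicit: `savename_sketch` is a hypothetical name, keys are sorted (as DrWatson.jl does by default; whether PyWatson sorts is not stated here), floats are formatted with `digits` significant digits via the `g` format, and the `access` parameter is left out.

```python
from __future__ import annotations

from typing import Any


def savename_sketch(
    d: dict[str, Any],
    suffix: str = ".h5",
    connector: str = "_",
    digits: int = 3,
    ignore_keys: list[str] | None = None,
) -> str:
    """Hypothetical formatter; key order and float format are assumptions."""
    skip = set(ignore_keys or [])
    parts = []
    for key in sorted(d):  # assumption: keys are sorted
        if key in skip:
            continue
        value = d[key]
        if isinstance(value, float):
            value = f"{value:.{digits}g}"  # `digits` significant digits
        parts.append(f"{key}={value}")
    return connector.join(parts) + suffix


name = savename_sketch({"alpha": 0.12345, "N": 100, "debug": True}, ignore_keys=["debug"])
print(name)  # N=100_alpha=0.123.h5
```

Uppercase keys sort before lowercase ones under Python's default string ordering, which is why `N` precedes `alpha` in this output.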
scriptfile
scriptsdir
set_random_seed
Set random seeds for reproducibility and return a metadata-ready dict.
Sets seeds for Python's built-in `random` module and NumPy. If PyTorch is installed, its seed is set too.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `seed` | `int` | Integer seed value. | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, int]` | Dictionary … |
Example
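A minimal sketch of the seeding behaviour, assuming a hypothetical helper `set_random_seed_sketch`. Unlike the documented function, it treats NumPy as optional so that it runs with the standard library alone, and the returned `{"seed": ...}` shape is a guess because the documented keys are not shown here.

```python
from __future__ import annotations

import random


def set_random_seed_sketch(seed: int) -> dict[str, int]:
    """Seed the stdlib RNG, plus NumPy and PyTorch when they are importable."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)  # legacy global NumPy RNG
    except ImportError:
        pass  # NumPy not installed: skipped in this sketch
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional
    return {"seed": seed}  # hypothetical return shape


meta = set_random_seed_sketch(42)
first = random.random()
set_random_seed_sketch(42)
second = random.random()
print(first == second)
```

Returning the seed as a dict makes it easy to merge into the metadata passed to a save call, so the value used for a run is stored with its results.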
snapshot_environment
Capture the current Python environment for reproducibility.
Returns a dictionary with Python version, platform, and installed packages (as reported by `pip list`). Safe to embed in HDF5 metadata.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with keys … |
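
A minimal sketch of an environment snapshot. Note the substitution: instead of calling `pip list` as described above, it reads installed distributions through `importlib.metadata`, which avoids a subprocess. The function name and the key names (`python_version`, `platform`, `packages`) are hypothetical, since the documented keys are not shown here.

```python
from __future__ import annotations

import platform
import sys
from importlib import metadata
from typing import Any


def snapshot_environment_sketch() -> dict[str, Any]:
    """Collect interpreter, platform, and installed-package versions."""
    packages: dict[str, str] = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            if name:
                packages[str(name)] = str(dist.version)
        except Exception:
            continue  # skip distributions with unreadable metadata
    return {
        "python_version": sys.version,  # hypothetical key names
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


snapshot = snapshot_environment_sketch()
print(snapshot["python_version"])
```

All values are plain strings or dicts of strings, so the result can be serialized into file metadata without extra conversion.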
srcdir
tagsave
Save data with git state and custom tags (DrWatson.jl-style alias).
Equivalent to `save_data(data, filename, metadata=tags, include_git=True)`. Use this whenever you want every file to carry an exact git commit hash, branch name, and dirty-state flag, e.g. for parameter sweeps where reproducibility is critical.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Name of the file (with or without `.h5` extension). | required |
| `data` | `dict[str, Any]` | Data dictionary to save. | required |
| `tags` | `dict[str, Any] \| None` | Additional tags to include in metadata (merged with git info). | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the saved file. |
Example
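A minimal sketch of how tags and git state might be combined into one metadata dict. It illustrates the opt-in versus always-on distinction only: the helper names, the `"git"` key, and its sub-keys are hypothetical, and no file is written. Git values fall back to `None` when git is missing or the directory is not a repository.

```python
from __future__ import annotations

import subprocess
from typing import Any


def _git(*args: str) -> str | None:
    """Run a git command; return stripped stdout, or None if it fails."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_metadata_sketch(
    tags: dict[str, Any] | None = None, include_git: bool = False
) -> dict[str, Any]:
    """Hypothetical metadata builder: git state is added only on request."""
    metadata = dict(tags or {})
    if include_git:
        status = _git("status", "--porcelain")
        metadata["git"] = {  # hypothetical key names
            "commit": _git("rev-parse", "HEAD"),
            "branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
            "dirty": None if status is None else bool(status),
        }
    return metadata


def tagsave_metadata_sketch(tags: dict[str, Any] | None = None) -> dict[str, Any]:
    """Thin alias that always requests git state."""
    return build_metadata_sketch(tags, include_git=True)


plain = build_metadata_sketch({"run": 1})
tagged = tagsave_metadata_sketch({"run": 1})
print(sorted(tagged))
```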
testsdir
tmpsave
tmpsave(
data: dict[str, Any], suffix: str = ".h5", compression: str | None = "gzip"
) -> Generator[Path, None, None]
Context manager: save data to a temporary file, yield its path, then delete it.
Useful for testing or one-off intermediate results that should not persist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Data dictionary. | required |
| `suffix` | `str` | File suffix (default …). | `'.h5'` |
| `compression` | `str \| None` | HDF5 compression. | `'gzip'` |
Yields:
| Type | Description |
|---|---|
| `Path` | class: … |
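
A minimal sketch of the save, yield, delete lifecycle using `contextlib.contextmanager`. `tmpsave_sketch` is a hypothetical name, and JSON stands in for the HDF5 write so the example runs with the standard library alone.

```python
from __future__ import annotations

import json
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator


@contextmanager
def tmpsave_sketch(data: dict[str, Any], suffix: str = ".json") -> Iterator[Path]:
    """Write `data` to a temp file, yield its path, delete it on exit."""
    fd, name = tempfile.mkstemp(suffix=suffix)
    path = Path(name)
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle)  # JSON stands in for the HDF5 write
        yield path
    finally:
        path.unlink(missing_ok=True)  # runs even if the with-body raises


with tmpsave_sketch({"x": [1, 2, 3]}) as tmp_path:
    existed_inside = tmp_path.exists()
    loaded = json.loads(tmp_path.read_text())
exists_after = tmp_path.exists()
print(existed_inside, exists_after)
```

Putting the cleanup in `finally` means the file is removed whether the body completes or raises, which is what makes the pattern safe in tests.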