API

The tracker module contains the Tracker class, which can alternatively be imported directly from the gpu_tracker package.

class gpu_tracker.tracker.Tracker(sleep_time: float = 1.0, ram_unit: str = 'gigabytes', gpu_ram_unit: str = 'gigabytes', time_unit: str = 'hours', n_expected_cores: int = None, gpu_uuids: set[str] = None, disable_logs: bool = False, process_id: int = None, resource_usage_file: str | None = None, n_join_attempts: int = 5, join_timeout: float = 10.0, gpu_brand: str | None = None)

Runs a sub-process that tracks the computational resources of the calling process, including the compute time, maximum CPU utilization, mean CPU utilization, maximum RAM, and maximum GPU RAM used, either within a context manager or between explicit calls to the start() and stop() methods. Calculated quantities are scaled according to the units chosen for them (e.g. megabytes vs. gigabytes, hours vs. days).

Variables:

resource_usage (ResourceUsage) – Data class containing the computational resource usage data collected by the tracking process.

Parameters:
  • sleep_time (float) – The number of seconds to sleep in between usage-collection iterations.

  • ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.

  • gpu_ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.

  • time_unit (str) – One of ‘seconds’, ‘minutes’, ‘hours’, or ‘days’.

  • n_expected_cores (int) – The number of cores expected to be used during tracking (e.g. the number of processes spawned or the number of parallelized threads). Used as the denominator when calculating the hardware percentages of the CPU utilization, except for system-wide CPU utilization, which always divides by all the cores in the system. Defaults to all the cores in the system.

  • gpu_uuids (set[str]) – The UUIDs of the GPUs to track utilization for. The length of this set is used as the denominator when calculating the hardware percentages of the GPU utilization (i.e. n_expected_gpus). Defaults to all the GPUs in the system.

  • disable_logs (bool) – If True, warnings are suppressed during tracking. Otherwise, the Tracker logs warnings as usual.

  • process_id (int) – The ID of the process to track. Defaults to the current process.

  • resource_usage_file (str | None) – The path to the pickle file containing the resource_usage attribute. If tracking completes successfully, this file is automatically deleted and the resource_usage attribute is set in memory. If tracking is interrupted, the tracking information is saved in this file as a backup. Defaults to a randomly generated file name in the current working directory of the format .gpu-tracker_<random UUID>.pkl.

  • n_join_attempts (int) – The number of times the tracker attempts to join its underlying sub-process.

  • join_timeout (float) – The amount of time the tracker waits for its underlying sub-process to join.

  • gpu_brand (str | None) – The brand of GPU to profile. Valid values are “nvidia” and “amd”. Defaults to the brand of GPU detected in the system, checking Nvidia first.

Raises:

ValueError – Raised if invalid units are provided.
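The unit options above determine how raw measurements are scaled. The following is a minimal sketch of that scaling and of the documented ValueError, not the library's implementation. The helper name scale_ram is hypothetical, and the sketch assumes decimal (SI) multiples of 1000 bytes, which may differ from the factors gpu_tracker actually uses.

```python
# Hypothetical helper illustrating unit scaling; NOT part of gpu_tracker.
# Assumption: decimal (SI) factors of 1000 between successive RAM units.
RAM_UNIT_FACTORS = {
    'bytes': 1,
    'kilobytes': 1e3,
    'megabytes': 1e6,
    'gigabytes': 1e9,
    'terabytes': 1e12,
}


def scale_ram(n_bytes: float, unit: str) -> float:
    """Convert a byte count into the requested RAM unit."""
    if unit not in RAM_UNIT_FACTORS:
        # Mirrors the documented behavior: invalid units raise ValueError.
        raise ValueError(
            f'"{unit}" is not a valid RAM unit. Valid values are '
            f'{", ".join(RAM_UNIT_FACTORS)}.'
        )
    return n_bytes / RAM_UNIT_FACTORS[unit]


print(scale_ram(2_500_000_000, 'gigabytes'))  # 2.5
```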

class State(*values)

The state of the Tracker.

NEW = 0
STARTED = 1
STOPPED = 2
start()

Begins tracking, which continues until stop() is called. Equivalent to entering the context manager.

stop()

Stops tracking. Equivalent to exiting the context manager.

__str__() → str

Constructs a string representation of the computational-resource-usage measurements and their units.

Return type:

str

to_json() → dict[str, dict]

Constructs a dictionary of the computational-resource-usage measurements and their units.

Return type:

dict[str, dict]
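Because start() and stop() are documented as equivalent to entering and exiting the context manager, the two usage styles are interchangeable. The sketch below shows that relationship with a stand-in class, SketchTracker, which is hypothetical: it only records transitions through a State enum shaped like Tracker.State and does no resource tracking. The guards against invalid transitions are an assumption of this sketch, not documented Tracker behavior.

```python
import enum


class State(enum.Enum):
    """Stand-in mirroring the documented Tracker.State values."""
    NEW = 0
    STARTED = 1
    STOPPED = 2


class SketchTracker:
    """Hypothetical stand-in for Tracker; records state transitions only."""

    def __init__(self) -> None:
        self.state = State.NEW

    def start(self) -> None:
        # Assumed guard: only a NEW tracker can be started.
        if self.state is not State.NEW:
            raise RuntimeError(f'Cannot start a tracker in state {self.state.name}.')
        self.state = State.STARTED

    def stop(self) -> None:
        # Assumed guard: only a STARTED tracker can be stopped.
        if self.state is not State.STARTED:
            raise RuntimeError(f'Cannot stop a tracker in state {self.state.name}.')
        self.state = State.STOPPED

    def __enter__(self) -> 'SketchTracker':
        # Entering the context manager is equivalent to calling start().
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Exiting (normally or via an exception) is equivalent to stop().
        self.stop()


# Context-manager style.
with SketchTracker() as tracker:
    pass  # Code to profile would go here.
print(tracker.state)  # State.STOPPED

# Explicit style.
explicit = SketchTracker()
explicit.start()
explicit.stop()
print(explicit.state)  # State.STOPPED
```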

class gpu_tracker.tracker.RSSValues(total_rss: float = 0.0, private_rss: float = 0.0, shared_rss: float = 0.0)

The resident set size (RSS), i.e. the RAM used by one or more processes.

Parameters:
  • total_rss (float) – The sum of private_rss and shared_rss.

  • private_rss (float) – The RAM usage exclusive to a process.

  • shared_rss (float) – The RAM usage of a process shared with at least one other process.

total_rss: float = 0.0
private_rss: float = 0.0
shared_rss: float = 0.0
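Since total_rss is documented as the sum of private_rss and shared_rss, one way to keep that relationship consistent is to derive it when building the record. This is a minimal sketch using a stand-in data class with the same field names; the from_parts constructor is hypothetical and not part of gpu_tracker.

```python
import dataclasses


@dataclasses.dataclass
class RSSValues:
    """Stand-in with the same fields as gpu_tracker.tracker.RSSValues."""
    total_rss: float = 0.0
    private_rss: float = 0.0
    shared_rss: float = 0.0

    @classmethod
    def from_parts(cls, private_rss: float, shared_rss: float) -> 'RSSValues':
        # Hypothetical constructor: total_rss is derived, never set by hand.
        return cls(
            total_rss=private_rss + shared_rss,
            private_rss=private_rss,
            shared_rss=shared_rss,
        )


rss = RSSValues.from_parts(private_rss=1.5, shared_rss=0.25)
print(rss.total_rss)  # 1.75
```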
class gpu_tracker.tracker.MaxRAM(unit: str, system_capacity: float, system: float = 0.0, main: RSSValues = <factory>, descendants: RSSValues = <factory>, combined: RSSValues = <factory>)

Information related to RAM including the maximum RAM used over a period of time.

Parameters:
  • unit (str) – The unit of measurement for RAM e.g. gigabytes.

  • system_capacity (float) – A constant value for the RAM capacity of the entire operating system.

  • system (float) – The RAM usage across the entire operating system.

  • main (RSSValues) – The RAM usage of the main process.

  • descendants (RSSValues) – The summed RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).

  • combined (RSSValues) – The summed RAM usage of both the main process and any descendant processes it may have.

unit: str
system_capacity: float
system: float = 0.0
main: RSSValues
descendants: RSSValues
combined: RSSValues
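Because each field holds a maximum over the tracking period, a natural reading of combined is the peak of the per-sample sums of main and descendants, which can be smaller than adding their separate peaks when those occur at different times. The sketch below illustrates that distinction with made-up total_rss samples in gigabytes; the sampling loop and the values are hypothetical, and gpu_tracker's exact aggregation may differ.

```python
# Hypothetical total RSS samples (in gigabytes), one per sleep_time interval.
# These numbers are illustrative, not real measurements.
main_samples = [1.0, 3.0, 1.5]
descendant_samples = [2.0, 0.5, 2.5]

max_main = 0.0
max_descendants = 0.0
max_combined = 0.0
for main, descendants in zip(main_samples, descendant_samples):
    # Each maximum is updated independently at every sample.
    max_main = max(max_main, main)
    max_descendants = max(max_descendants, descendants)
    # The combined value sums usage at the same instant, then maximizes.
    max_combined = max(max_combined, main + descendants)

print(max_main, max_descendants, max_combined)  # 3.0 2.5 4.0
print(max_main + max_descendants)               # 5.5
```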
class gpu_tracker.tracker.MaxGPURAM(unit: str, system_capacity: float, system: float = 0.0, main: float = 0.0, descendants: float = 0.0, combined: float = 0.0)

Information related to GPU RAM including the maximum GPU RAM used over a period of time.

Parameters:
  • unit (str) – The unit of measurement for GPU RAM e.g. gigabytes.

  • system_capacity (float) – A constant value for the GPU RAM capacity of all the GPUs in the system.

  • system (float) – The GPU RAM usage of all the GPUs in the system.

  • main (float) – The GPU RAM usage of the main process.

  • descendants (float) – The summed GPU RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).

  • combined (float) – The summed GPU RAM usage of both the main process and any descendant processes it may have.

unit: str
system_capacity: float
system: float = 0.0
main: float = 0.0
descendants: float = 0.0
combined: float = 0.0
class gpu_tracker.tracker.ProcessingUnitPercentages(max_sum_percent: float = 0.0, max_hardware_percent: float = 0.0, mean_sum_percent: float = 0.0, mean_hardware_percent: float = 0.0)

Utilization percentages of one or more processing units (i.e. GPUs or CPU cores). Max refers to the highest value measured over a duration of time. Mean refers to the average of the measured values during this time. Sum refers to the sum of the percentages of the processing units involved. If there is only one unit in question, this is the percentage of just that unit. Hardware refers to this sum divided by the number of units involved. If there is only one unit in question, this is the same as the sum.

Parameters:
  • max_sum_percent (float) – The maximum sum of utilization percentages of the processing units at any given time.

  • max_hardware_percent (float) – The maximum utilization percentage of the group of units as a whole (i.e. max_sum_percent divided by the number of units involved).

  • mean_sum_percent (float) – The mean sum of utilization percentages of the processing units used by the process(es) over time.

  • mean_hardware_percent (float) – The mean utilization percentage of the group of units as a whole (i.e. mean_sum_percent divided by the number of units involved).

max_sum_percent: float = 0.0
max_hardware_percent: float = 0.0
mean_sum_percent: float = 0.0
mean_hardware_percent: float = 0.0
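The definitions above can be worked through with a small example. This sketch assumes per-unit utilization samples (here, two CPU cores sampled three times, with made-up values) and computes the four fields as described: sums across units at each instant, hardware as the sum divided by the unit count, and max and mean taken over the samples. The function name summarize is hypothetical.

```python
import statistics


def summarize(samples: list[list[float]], n_units: int) -> dict[str, float]:
    """Compute ProcessingUnitPercentages-style fields from per-unit samples.

    Each inner list holds one utilization percentage per unit at one instant.
    """
    # Sum percent: add up the units' percentages at each instant.
    sums = [sum(sample) for sample in samples]
    max_sum = max(sums)
    mean_sum = statistics.mean(sums)
    return {
        'max_sum_percent': max_sum,
        # Hardware percent: the sum divided by the number of units involved.
        'max_hardware_percent': max_sum / n_units,
        'mean_sum_percent': mean_sum,
        'mean_hardware_percent': mean_sum / n_units,
    }


# Two cores, three samples; a fully busy core reports 100.0.
samples = [[100.0, 50.0], [80.0, 20.0], [60.0, 40.0]]
print(summarize(samples, n_units=2))
```

Here the per-sample sums are 150, 100, and 100, so max_sum_percent is 150 and max_hardware_percent is 75, while the means are 350/3 and 175/3.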
class gpu_tracker.tracker.CPUUtilization(system_core_count: int, n_expected_cores: int, system: ProcessingUnitPercentages = <factory>, main: ProcessingUnitPercentages = <factory>, descendants: ProcessingUnitPercentages = <factory>, combined: ProcessingUnitPercentages = <factory>, main_n_threads: int = 0, descendants_n_threads: int = 0, combined_n_threads: int = 0)

Information related to CPU usage, including the core utilization percentages of the main process and any descendant processes it may have, as well as system-wide utilization. The system hardware utilization percentages are always divided by the total number of cores in the system, while those of the main, descendant, and combined processes are divided by the expected number of cores used in a task (n_expected_cores).

Parameters:
  • system_core_count (int) – The number of cores available to the entire operating system.

  • n_expected_cores (int) – The number of cores expected to be used by the main process and/or any descendant processes it may have.

  • system (ProcessingUnitPercentages) – The utilization percentages of all the cores in the entire operating system.

  • main (ProcessingUnitPercentages) – The utilization percentages of the cores used by the main process.

  • descendants (ProcessingUnitPercentages) – The utilization percentages summed across descendant processes (i.e. child processes, grandchild processes, etc.).

  • combined (ProcessingUnitPercentages) – The utilization percentages summed across both the descendant processes and the main process.

  • main_n_threads (int) – The maximum detected number of threads used by the main process at any time.

  • descendants_n_threads (int) – The maximum sum of threads used across the descendant processes at any time.

  • combined_n_threads (int) – The maximum sum of threads used by both the main and descendant processes at any time.

system_core_count: int
n_expected_cores: int
system: ProcessingUnitPercentages
main: ProcessingUnitPercentages
descendants: ProcessingUnitPercentages
combined: ProcessingUnitPercentages
main_n_threads: int = 0
descendants_n_threads: int = 0
combined_n_threads: int = 0
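The choice of denominator can change the hardware percentage considerably. As a worked example with made-up numbers, suppose the main and descendant processes together reach a summed utilization of 150% on an 8-core system. The helper below is a hypothetical illustration of the arithmetic, not a gpu_tracker function.

```python
def hardware_percent(sum_percent: float, n_cores: int) -> float:
    """Divide a summed core utilization by the number of cores it spans."""
    if n_cores <= 0:
        raise ValueError('n_cores must be positive.')
    return sum_percent / n_cores


system_core_count = 8
combined_sum_percent = 150.0  # Made-up value, e.g. two busy workers.

# Main/descendant/combined percentages use n_expected_cores as the denominator.
print(hardware_percent(combined_sum_percent, n_cores=2))  # 75.0
# By default, n_expected_cores equals all the cores in the system.
print(hardware_percent(combined_sum_percent, n_cores=system_core_count))  # 18.75
```

Setting n_expected_cores to the number of workers actually used makes the percentage reflect how busy those workers were, rather than how much of the whole machine they occupied.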
class gpu_tracker.tracker.GPUUtilization(system_gpu_count: int, n_expected_gpus: int, gpu_percentages: ProcessingUnitPercentages = <factory>)

Utilization percentages of one or more GPUs being tracked. Hardware percentages are the summed percentages divided by the number of GPUs being tracked.

Parameters:
  • system_gpu_count (int) – The number of GPUs in the system.

  • n_expected_gpus (int) – The number of GPUs to be tracked (e.g. only the GPUs actually in use, even if the system contains other GPUs).

  • gpu_percentages (ProcessingUnitPercentages) – The utilization percentages of the GPU(s) being tracked.

system_gpu_count: int
n_expected_gpus: int
gpu_percentages: ProcessingUnitPercentages
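Since n_expected_gpus corresponds to the number of tracked GPUs (the length of the Tracker's gpu_uuids set), only the selected GPUs should contribute to the summed percentage. This sketch uses hypothetical UUID strings and made-up per-GPU readings for a single sample to show the filtering and the division; it is an illustration, not gpu_tracker's code.

```python
# Made-up utilization readings for one sample, keyed by hypothetical GPU UUIDs.
readings = {
    'GPU-aaaa': 90.0,
    'GPU-bbbb': 30.0,
    'GPU-cccc': 0.0,  # Present in the system but not tracked below.
}
gpu_uuids = {'GPU-aaaa', 'GPU-bbbb'}  # The GPUs selected for tracking.

system_gpu_count = len(readings)
n_expected_gpus = len(gpu_uuids)

# Only the tracked GPUs contribute to the summed percentage.
sum_percent = sum(pct for uuid, pct in readings.items() if uuid in gpu_uuids)
hardware_percent = sum_percent / n_expected_gpus

print(system_gpu_count, n_expected_gpus)  # 3 2
print(sum_percent, hardware_percent)      # 120.0 60.0
```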
class gpu_tracker.tracker.ComputeTime(unit: str, time: float = 0.0)

The time it takes for a task to complete.

Parameters:
  • unit (str) – The unit of measurement for compute time e.g. hours.

  • time (float) – The real compute time.

unit: str
time: float = 0.0
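The compute time is expressed in the Tracker's time_unit. A minimal sketch of measuring real (wall-clock) elapsed time and scaling it to one of the documented units follows; the use of time.perf_counter and the helper name elapsed_in are assumptions for illustration, not a description of how gpu_tracker measures time.

```python
import time

# Seconds per documented time unit.
TIME_UNIT_SECONDS = {
    'seconds': 1,
    'minutes': 60,
    'hours': 60 * 60,
    'days': 24 * 60 * 60,
}


def elapsed_in(seconds: float, unit: str) -> float:
    """Scale an elapsed duration in seconds to the requested time unit."""
    # An unknown unit raises KeyError here; the real Tracker raises ValueError.
    return seconds / TIME_UNIT_SECONDS[unit]


start = time.perf_counter()
time.sleep(0.1)  # Stand-in for the work being timed.
elapsed_seconds = time.perf_counter() - start

print(elapsed_in(elapsed_seconds, 'minutes'))
print(elapsed_in(5400, 'hours'))  # 1.5
```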
class gpu_tracker.tracker.ResourceUsage(max_ram: MaxRAM, max_gpu_ram: MaxGPURAM, cpu_utilization: CPUUtilization, gpu_utilization: GPUUtilization, compute_time: ComputeTime)

Contains data for computational resource usage.

Parameters:
  • max_ram (MaxRAM) – The maximum RAM used at any point while tracking.

  • max_gpu_ram (MaxGPURAM) – The maximum GPU RAM used at any point while tracking.

  • cpu_utilization (CPUUtilization) – Core counts, core utilization percentages, and the maximum number of threads used while tracking.

  • gpu_utilization (GPUUtilization) – GPU counts and utilization percentages of the GPU(s).

  • compute_time (ComputeTime) – The real time spent tracking.

max_ram: MaxRAM
max_gpu_ram: MaxGPURAM
cpu_utilization: CPUUtilization
gpu_utilization: GPUUtilization
compute_time: ComputeTime
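Because ResourceUsage is a data class whose fields are themselves data classes, its contents can be flattened into nested dictionaries with the standard library. The sketch below builds a trimmed, hypothetical stand-in hierarchy (only ComputeTime and a reduced MaxGPURAM, with made-up values) and converts it with dataclasses.asdict. The result illustrates the nesting only and is not guaranteed to match the format returned by Tracker.to_json().

```python
import dataclasses
import json


@dataclasses.dataclass
class ComputeTime:
    """Stand-in with the same fields as gpu_tracker.tracker.ComputeTime."""
    unit: str
    time: float = 0.0


@dataclasses.dataclass
class MaxGPURAM:
    """Trimmed stand-in: only some of the documented MaxGPURAM fields."""
    unit: str
    system_capacity: float
    combined: float = 0.0


@dataclasses.dataclass
class TrimmedResourceUsage:
    """Hypothetical, reduced version of ResourceUsage for illustration."""
    max_gpu_ram: MaxGPURAM
    compute_time: ComputeTime


usage = TrimmedResourceUsage(
    max_gpu_ram=MaxGPURAM(unit='gigabytes', system_capacity=16.0, combined=2.5),
    compute_time=ComputeTime(unit='hours', time=0.5),
)

# asdict recurses into nested data classes, producing plain dicts.
as_dict = dataclasses.asdict(usage)
print(json.dumps(as_dict, indent=2))
```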