API
The tracker module contains the Tracker class, which can alternatively be imported directly from the gpu_tracker package.
- class gpu_tracker.tracker.Tracker(sleep_time: float = 1.0, ram_unit: str = 'gigabytes', gpu_ram_unit: str = 'gigabytes', time_unit: str = 'hours', n_expected_cores: int = None, gpu_uuids: set[str] = None, disable_logs: bool = False, process_id: int = None, resource_usage_file: str | None = None, n_join_attempts: int = 5, join_timeout: float = 10.0, gpu_brand: str | None = None)
Runs a sub-process that tracks the computational resources of the calling process, including the compute time, maximum and mean CPU utilization, maximum and mean GPU utilization, maximum RAM, and maximum GPU RAM used within a context manager or between explicit calls to the start() and stop() methods. Calculated quantities are scaled depending on the units chosen for them (e.g. megabytes vs. gigabytes, hours vs. days, etc.).
- Variables:
resource_usage (ResourceUsage) – Data class containing the computational resource usage data collected by the tracking process.
- Parameters:
sleep_time (float) – The number of seconds to sleep in between usage-collection iterations.
ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.
gpu_ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.
time_unit (str) – One of ‘seconds’, ‘minutes’, ‘hours’, or ‘days’.
n_expected_cores (int) – The number of cores expected to be used during tracking (e.g. number of processes spawned, number of parallelized threads, etc.). Used as the denominator when calculating the hardware percentages of the CPU utilization (except for system-wide CPU utilization, which always divides by all the cores in the system). Defaults to all the cores in the system.
gpu_uuids (set[str]) – The UUIDs of the GPUs to track utilization for. The length of this set is used as the denominator when calculating the hardware percentages of the GPU utilization (i.e. n_expected_gpus). Defaults to all the GPUs in the system.
disable_logs (bool) – If set, warnings are suppressed during tracking. Otherwise, the Tracker logs warnings as usual.
process_id (int) – The ID of the process to track. Defaults to the current process.
resource_usage_file (str | None) – The file path to the pickle file containing the resource_usage attribute. This file is automatically deleted and the resource_usage attribute is set in memory if the tracking completes successfully. If the tracking is interrupted, however, the tracking information is saved in this file as a backup. Defaults to a randomly generated file name in the current working directory of the format .gpu-tracker_<random UUID>.pkl.
n_join_attempts (int) – The number of times the tracker attempts to join its underlying sub-process.
join_timeout (float) – The amount of time the tracker waits for its underlying sub-process to join.
gpu_brand (str | None) – The brand of GPU to profile. Valid values are “nvidia” and “amd”. Defaults to the brand of GPU detected in the system, checking Nvidia first.
- Raises:
ValueError – Raised if invalid units are provided.
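The unit parameters above determine how raw measurements are scaled, and an unrecognized unit raises ValueError. A minimal sketch of that scaling follows. The helper scale and the tables RAM_UNITS and TIME_UNITS are hypothetical, and the sketch assumes decimal (power-of-1000) multiples for RAM units; gpu_tracker's internal conversion and byte base may differ.

```python
from __future__ import annotations

# Hypothetical conversion tables; decimal (power-of-1000) RAM multiples are an assumption.
RAM_UNITS = {
    'bytes': 1,
    'kilobytes': 1e3,
    'megabytes': 1e6,
    'gigabytes': 1e9,
    'terabytes': 1e12,
}
TIME_UNITS = {
    'seconds': 1,
    'minutes': 60,
    'hours': 3600,
    'days': 86400,
}


def scale(value: float, unit: str, table: dict[str, float]) -> float:
    """Convert a value in base units (bytes or seconds) into `unit`."""
    if unit not in table:
        # Mirrors the documented behavior: invalid units raise ValueError.
        raise ValueError(f'{unit!r} is not one of {sorted(table)}')
    return value / table[unit]


print(scale(3_500_000_000, 'gigabytes', RAM_UNITS))  # 3.5
print(scale(5400, 'hours', TIME_UNITS))  # 1.5
```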
- start()
Begins tracking until stop() is called. Equivalent to entering the context manager.
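The equivalence between explicit start()/stop() calls and the context manager can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not gpu_tracker's implementation: SamplingTracker and sample_fn are hypothetical names, and the sketch samples on a background thread for brevity, whereas the documented Tracker runs a separate sub-process.

```python
import threading
import time
from typing import Callable, Optional


class SamplingTracker:
    """Hypothetical stand-in for illustration; not gpu_tracker's implementation.

    Samples `sample_fn` every `sleep_time` seconds on a background thread
    (the real Tracker uses a sub-process) and records the maximum value seen.
    """

    def __init__(self, sample_fn: Callable[[], float], sleep_time: float = 1.0):
        self.sample_fn = sample_fn
        self.sleep_time = sleep_time
        self.max_value = float('-inf')
        self.n_samples = 0
        self.elapsed = 0.0
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._start_time = 0.0

    def _loop(self) -> None:
        while True:
            # Always take at least one sample before checking for stop().
            self.max_value = max(self.max_value, self.sample_fn())
            self.n_samples += 1
            # Event.wait() returns True as soon as stop() sets the event.
            if self._stop_event.wait(self.sleep_time):
                break

    def start(self) -> None:
        self._stop_event.clear()
        self._start_time = time.perf_counter()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()
        self.elapsed = time.perf_counter() - self._start_time

    # The context-manager protocol simply delegates to start() and stop().
    def __enter__(self) -> 'SamplingTracker':
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.stop()


# Explicit calls...
tracker = SamplingTracker(lambda: 5.0, sleep_time=0.01)
tracker.start()
time.sleep(0.05)
tracker.stop()

# ...and the equivalent context-manager form.
with SamplingTracker(lambda: 5.0, sleep_time=0.01) as other:
    time.sleep(0.05)

print(tracker.max_value, other.max_value)  # 5.0 5.0
```

Using __exit__ to call stop() means tracking also ends cleanly if the tracked block raises an exception.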
- class gpu_tracker.tracker.RSSValues(total_rss: float = 0.0, private_rss: float = 0.0, shared_rss: float = 0.0)
The resident set size (RSS), i.e. the memory used by a process or processes.
- Parameters:
- class gpu_tracker.tracker.MaxRAM(unit: str, system_capacity: float, system: float = 0.0, main: gpu_tracker.tracker.RSSValues = <factory>, descendants: gpu_tracker.tracker.RSSValues = <factory>, combined: gpu_tracker.tracker.RSSValues = <factory>)
Information related to RAM including the maximum RAM used over a period of time.
- Parameters:
unit (str) – The unit of measurement for RAM e.g. gigabytes.
system_capacity (float) – A constant value for the RAM capacity of the entire operating system.
system (float) – The RAM usage across the entire operating system.
main (RSSValues) – The RAM usage of the main process.
descendants (RSSValues) – The summed RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).
combined (RSSValues) – The summed RAM usage of both the main process and any descendant processes it may have.
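Because these are maxima over a period of time, the combined value need not equal the main maximum plus the descendants maximum. Assuming, as the description suggests, that combined is the peak of the per-sample sum, the sketch below shows the difference with made-up readings (in gigabytes).

```python
# Hypothetical per-sample RAM readings in gigabytes, for illustration only.
main_samples = [1.0, 3.0, 2.0]
descendant_samples = [2.0, 0.5, 2.5]

max_main = max(main_samples)
max_descendants = max(descendant_samples)
# The combined maximum is taken over per-sample sums, not by summing the maxima.
max_combined = max(m + d for m, d in zip(main_samples, descendant_samples))

print(max_main, max_descendants, max_combined)  # 3.0 2.5 4.5
```

Here the main and descendant peaks occur at different samples, so the combined peak (4.5) is smaller than the sum of the individual peaks (5.5).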
- class gpu_tracker.tracker.MaxGPURAM(unit: str, system_capacity: float, system: float = 0.0, main: float = 0.0, descendants: float = 0.0, combined: float = 0.0)
Information related to GPU RAM including the maximum GPU RAM used over a period of time.
- Parameters:
unit (str) – The unit of measurement for GPU RAM e.g. gigabytes.
system_capacity (float) – A constant value for the GPU RAM capacity of all the GPUs in the system.
system (float) – The GPU RAM usage of all the GPUs in the system.
main (float) – The GPU RAM usage of the main process.
descendants (float) – The summed GPU RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).
combined (float) – The summed GPU RAM usage of both the main process and any descendant processes it may have.
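For a single sampling iteration, the per-process GPU RAM fields can be thought of as sums over per-process rows filtered by process ID. The sketch below assumes rows of (pid, used_megabytes), one per process per GPU, as a GPU query tool might report them; split_gpu_ram and the sample data are hypothetical, and the maximum over time would then be taken as in the RAM example.

```python
from __future__ import annotations


def split_gpu_ram(rows: list[tuple[int, float]], main_pid: int,
                  descendant_pids: set[int]) -> dict[str, float]:
    """Sum one sample's per-process GPU RAM into main, descendants, and combined."""
    # A process using several GPUs appears in several rows, so its usage is summed.
    main = sum(used for pid, used in rows if pid == main_pid)
    descendants = sum(used for pid, used in rows if pid in descendant_pids)
    return {'main': main, 'descendants': descendants, 'combined': main + descendants}


# PID 100 uses two GPUs; PID 999 is an unrelated process and is ignored.
rows = [(100, 512.0), (100, 256.0), (101, 128.0), (999, 1024.0)]
print(split_gpu_ram(rows, main_pid=100, descendant_pids={101, 102}))
# {'main': 768.0, 'descendants': 128.0, 'combined': 896.0}
```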
- class gpu_tracker.tracker.ProcessingUnitPercentages(max_sum_percent: float = 0.0, max_hardware_percent: float = 0.0, mean_sum_percent: float = 0.0, mean_hardware_percent: float = 0.0)
Utilization percentages of one or more processing units (i.e. GPUs or CPU cores). Max refers to the highest value measured over a duration of time. Mean refers to the average of the measured values during this time. Sum refers to the sum of the percentages of the processing units involved. If there is only one unit in question, this is the percentage of just that unit. Hardware refers to this sum divided by the number of units involved. If there is only one unit in question, this is the same as the sum.
- Parameters:
max_sum_percent (float) – The maximum sum of utilization percentages of the processing units at any given time.
max_hardware_percent (float) – The maximum utilization percentage of the group of units as a whole (i.e. max_sum_percent divided by the number of units involved).
mean_sum_percent (float) – The mean sum of utilization percentages of the processing units used by the process(es) over time.
mean_hardware_percent (float) – The mean utilization percentage of the group of units as a whole (i.e. mean_sum_percent divided by the number of units involved).
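A minimal sketch of how the four quantities relate, assuming the per-unit percentages are recorded at each sampling iteration. unit_percentages is a hypothetical helper, not part of gpu_tracker; n_units plays the role of the number of units involved (e.g. n_expected_cores or n_expected_gpus).

```python
from __future__ import annotations

from statistics import mean


def unit_percentages(samples: list[list[float]], n_units: int) -> dict[str, float]:
    """Compute the four documented quantities from per-sample, per-unit percentages.

    samples[i][j] is the utilization percentage of unit j at sample i.
    """
    sums = [sum(sample) for sample in samples]
    return {
        'max_sum_percent': max(sums),
        # Hardware percentages divide the sums by the number of units involved.
        'max_hardware_percent': max(sums) / n_units,
        'mean_sum_percent': mean(sums),
        'mean_hardware_percent': mean(sums) / n_units,
    }


# Two units sampled three times: the per-sample sums are 150, 100, and 50.
print(unit_percentages([[100.0, 50.0], [80.0, 20.0], [30.0, 20.0]], n_units=2))
# {'max_sum_percent': 150.0, 'max_hardware_percent': 75.0,
#  'mean_sum_percent': 100.0, 'mean_hardware_percent': 50.0}
```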
- class gpu_tracker.tracker.CPUUtilization(system_core_count: int, n_expected_cores: int, system: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, main: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, descendants: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, combined: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, main_n_threads: int = 0, descendants_n_threads: int = 0, combined_n_threads: int = 0)
Information related to CPU usage, including core utilization percentages of the main process and any descendant processes it may have as well as system-wide utilization. The system hardware utilization percentages are strictly divided by the total number of cores in the system, while those of the main, descendant, and combined processes are divided by the expected number of cores used in a task (n_expected_cores).
- Parameters:
system_core_count (int) – The number of cores available to the entire operating system.
n_expected_cores (int) – The number of cores expected to be used by the main process and/or any descendant processes it may have.
system (ProcessingUnitPercentages) – The utilization percentages of all the cores in the entire operating system.
main (ProcessingUnitPercentages) – The utilization percentages of the cores used by the main process.
descendants (ProcessingUnitPercentages) – The utilization percentages summed across descendant processes (i.e. child processes, grandchild processes, etc.).
combined (ProcessingUnitPercentages) – The utilization percentages summed across both the descendant processes and the main process.
main_n_threads (int) – The maximum detected number of threads used by the main process at any time.
descendants_n_threads (int) – The maximum sum of threads used across the descendant processes at any time.
combined_n_threads (int) – The maximum sum of threads used by both the main and descendant processes.
- system: ProcessingUnitPercentages
- descendants: ProcessingUnitPercentages
- combined: ProcessingUnitPercentages
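The choice of denominator can be written out as a worked example with hypothetical numbers: suppose the main and descendant processes together reach a summed utilization of 350% on a 16-core machine, with n_expected_cores set to 4. The combined hardware percentage is computed against the 4 expected cores, whereas dividing the same 350% by all 16 cores, as is always done for system, gives a much smaller figure.

```latex
\[
\text{hardware}\,\% = \frac{\text{sum}\,\%}{n},
\qquad
n =
\begin{cases}
\texttt{system\_core\_count} & \text{for } \texttt{system},\\
\texttt{n\_expected\_cores} & \text{for } \texttt{main},\ \texttt{descendants},\ \texttt{combined}.
\end{cases}
\]
\[
\frac{350\,\%}{4} = 87.5\,\%
\qquad \text{vs.} \qquad
\frac{350\,\%}{16} = 21.875\,\%
\]
```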
- class gpu_tracker.tracker.GPUUtilization(system_gpu_count: int, n_expected_gpus: int, gpu_percentages: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>)
Utilization percentages of one or more GPUs being tracked. Hardware percentages are the summed percentages divided by the number of GPUs being tracked.
- Parameters:
system_gpu_count (int) – The number of GPUs in the system.
n_expected_gpus (int) – The number of GPUs to be tracked (e.g. GPUs actually used while there may be other GPUs in the system).
gpu_percentages (ProcessingUnitPercentages) – The utilization percentages of the GPU(s) being tracked.
- gpu_percentages: ProcessingUnitPercentages
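Restricting tracking to a subset of GPUs changes the denominator of the hardware percentage, since n_expected_gpus is the size of the tracked set. A sketch for one sample, assuming utilization readings keyed by GPU UUID; gpu_percentages and the placeholder UUIDs ('GPU-a' and so on) are hypothetical, and passing None tracks every GPU, matching the documented default of gpu_uuids.

```python
from __future__ import annotations

from typing import Optional


def gpu_percentages(utilization_by_uuid: dict[str, float],
                    gpu_uuids: Optional[set[str]] = None) -> tuple[float, float]:
    """Return (sum_percent, hardware_percent) for the tracked GPUs at one sample."""
    # Default to every GPU in the system when no UUIDs are given.
    tracked = set(utilization_by_uuid) if gpu_uuids is None else gpu_uuids
    sum_percent = sum(utilization_by_uuid[uuid] for uuid in tracked)
    # n_expected_gpus is the number of tracked GPUs.
    return sum_percent, sum_percent / len(tracked)


readings = {'GPU-a': 90.0, 'GPU-b': 30.0, 'GPU-c': 0.0, 'GPU-d': 0.0}
print(gpu_percentages(readings))                      # (120.0, 30.0)
print(gpu_percentages(readings, {'GPU-a', 'GPU-b'}))  # (120.0, 60.0)
```

Tracking only the GPUs a task actually uses keeps idle GPUs from diluting its hardware percentage.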
- class gpu_tracker.tracker.ComputeTime(unit: str, time: float = 0.0)
The time it takes for a task to complete.
- Parameters:
- class gpu_tracker.tracker.ResourceUsage(max_ram: MaxRAM, max_gpu_ram: MaxGPURAM, cpu_utilization: CPUUtilization, gpu_utilization: GPUUtilization, compute_time: ComputeTime)
Contains data for computational resource usage.
- Parameters:
max_ram (MaxRAM) – The maximum RAM used at any point while tracking.
max_gpu_ram (MaxGPURAM) – The maximum GPU RAM used at any point while tracking.
cpu_utilization (CPUUtilization) – Core counts, utilization percentages of cores and maximum number of threads used while tracking.
gpu_utilization (GPUUtilization) – GPU counts and utilization percentages of the GPU(s).
compute_time (ComputeTime) – The real time spent tracking.
- cpu_utilization: CPUUtilization
- gpu_utilization: GPUUtilization
- compute_time: ComputeTime
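ResourceUsage is described as a data class, and the <factory> defaults in the signatures above are how Python's dataclasses module renders fields declared with field(default_factory=...). Assuming the nested classes are dataclasses too, the standard dataclasses.asdict() should turn a result into plain nested dictionaries for logging or export. The sketch below runs without gpu_tracker installed by declaring local stand-ins that mirror the documented RSSValues and MaxRAM signatures; flatten is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


# Local stand-ins mirroring two documented signatures, for illustration only;
# in practice these objects would come from gpu_tracker's resource_usage.
@dataclass
class RSSValues:
    total_rss: float = 0.0
    private_rss: float = 0.0
    shared_rss: float = 0.0


@dataclass
class MaxRAM:
    unit: str
    system_capacity: float
    system: float = 0.0
    main: RSSValues = field(default_factory=RSSValues)
    descendants: RSSValues = field(default_factory=RSSValues)
    combined: RSSValues = field(default_factory=RSSValues)


def flatten(data: dict, prefix: str = '') -> dict[str, object]:
    """Flatten nested dicts into dotted keys, e.g. 'combined.total_rss'."""
    flat: dict[str, object] = {}
    for key, value in data.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f'{name}.'))
        else:
            flat[name] = value
    return flat


max_ram = MaxRAM(unit='gigabytes', system_capacity=64.0, main=RSSValues(total_rss=1.5))
print(flatten(asdict(max_ram))['main.total_rss'])  # 1.5
```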