API

gpu_tracker.tracker

The tracker module contains the Tracker class which can alternatively be imported directly from the gpu_tracker package.

class gpu_tracker.tracker.Tracker(sleep_time: float = 1.0, ram_unit: str = 'gigabytes', gpu_ram_unit: str = 'gigabytes', time_unit: str = 'hours', n_expected_cores: int = None, gpu_uuids: set[str] = None, disable_logs: bool = False, process_id: int = None, resource_usage_file: str | None = None, n_join_attempts: int = 5, join_timeout: float = 10.0, gpu_brand: str | None = None, tracking_file: str | None = None, overwrite: bool = False)[source]

Runs a sub-process that tracks the computational resources of the calling process (or of the process given by process_id), including the compute time, maximum CPU utilization, mean CPU utilization, maximum RAM, and maximum GPU RAM used within a context manager or between explicit calls to the start() and stop() methods. Calculated quantities are scaled according to the units chosen for them (e.g. megabytes vs. gigabytes, hours vs. days, etc.).

Variables:

resource_usage (ResourceUsage) – Data class containing the computational resource usage data collected by the tracking process.

Parameters:

sleep_time (float) – The number of seconds to sleep in between usage-collection iterations.
ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.
gpu_ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.
time_unit (str) – One of ‘seconds’, ‘minutes’, ‘hours’, or ‘days’.
n_expected_cores (int) – The number of cores expected to be used during tracking (e.g. number of processes spawned, number of parallelized threads, etc.). Used as the denominator when calculating the hardware percentages of the CPU utilization (except for system-wide CPU utilization which always divides by all the cores in the system). Defaults to all the cores in the system.
gpu_uuids (set[str]) – The UUIDs of the GPUs to track utilization for. The length of this set is used as the denominator when calculating the hardware percentages of the GPU utilization (i.e. n_expected_gpus). Defaults to all the GPUs in the system.
disable_logs (bool) – If set, warnings are suppressed during tracking. Otherwise, the Tracker logs warnings as usual.
process_id (int) – The ID of the process to track. Defaults to the current process.
resource_usage_file (str | None) – The file path to the pickle file containing the resource_usage attribute. This file is automatically deleted and the resource_usage attribute is set in memory if the tracking successfully completes. But if the tracking is interrupted, the tracking information will be saved in this file as a backup. Defaults to a randomly generated file name in the current working directory of the format .gpu-tracker_<random UUID>.pkl.
n_join_attempts (int) – The number of times the tracker attempts to join its underlying sub-process.
join_timeout (float) – The amount of time the tracker waits for its underlying sub-process to join.
gpu_brand (str | None) – The brand of GPU to profile. Valid values are “nvidia” and “amd”. Defaults to the brand of GPU detected in the system, checking Nvidia first.
tracking_file (str | None) – If specified, stores the individual resource usage measurements at each iteration. Valid file formats are CSV (.csv) and SQLite (.sqlite) where the SQLite file format stores the data in a table called “data” and allows for more efficient querying.
overwrite (bool) – Whether to overwrite the tracking_file if it already existed before the beginning of this tracking session.

Raises:

ValueError – Raised if invalid arguments are provided.
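
As a rough illustration of how a measured quantity could be scaled to the chosen unit, the sketch below converts a duration in seconds to one of the documented time_unit values and raises ValueError for anything else. The helper name scale_seconds and its lookup table are hypothetical and not part of gpu_tracker; the library's internal conversion and error message may differ.

```python
# Hypothetical helper, not part of gpu_tracker: scale seconds to a documented time unit.
SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}


def scale_seconds(seconds: float, time_unit: str = "hours") -> float:
    """Convert a duration in seconds to the requested unit."""
    if time_unit not in SECONDS_PER_UNIT:
        valid = ", ".join(SECONDS_PER_UNIT)
        raise ValueError(f"'{time_unit}' is not a valid time unit. Valid values are: {valid}")
    return seconds / SECONDS_PER_UNIT[time_unit]


print(scale_seconds(5400.0))             # 1.5
print(scale_seconds(5400.0, "minutes"))  # 90.0
```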

class State(*values)[source]

The state of the Tracker.

NEW = 0

STARTED = 1

STOPPED = 2

start()[source]: Begins tracking for the duration of time until stop() is called. Equivalent to entering the context manager.

stop()[source]: Stops tracking. Equivalent to exiting the context manager.
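
The equivalence between start()/stop() and the context manager can be sketched with a small stand-in class. ToyTracker below is hypothetical: it only measures wall-clock time, and its state attribute is an assumed name used to show the NEW, STARTED, and STOPPED values. It is not the library's implementation.

```python
import enum
import time


class State(enum.Enum):
    NEW = 0
    STARTED = 1
    STOPPED = 2


class ToyTracker:
    """Stand-in for illustration only; it measures wall-clock time and nothing else."""

    def __init__(self) -> None:
        self.state = State.NEW  # assumed attribute name
        self.elapsed_seconds = 0.0

    def start(self) -> None:
        self._start = time.perf_counter()
        self.state = State.STARTED

    def stop(self) -> None:
        self.elapsed_seconds = time.perf_counter() - self._start
        self.state = State.STOPPED

    def __enter__(self) -> "ToyTracker":
        self.start()  # entering the context is the same as calling start()
        return self

    def __exit__(self, *exc_info) -> None:
        self.stop()  # exiting the context is the same as calling stop()


# Explicit calls to start() and stop()
explicit = ToyTracker()
explicit.start()
sum(range(100_000))
explicit.stop()

# The same session expressed as a context manager
with ToyTracker() as managed:
    sum(range(100_000))

print(explicit.state, managed.state)
```

The explicit form can be convenient when the tracked region does not fit in a single indented block, for example when it spans several notebook cells.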

__str__() → str[source]

Constructs a string representation of the computational-resource-usage measurements and their units.

Return type: str

to_json() → dict[str, dict][source]

Constructs a dictionary of the computational-resource-usage measurements and their units.

Return type: dict[str, dict]

class gpu_tracker.tracker.RSSValues(total_rss: float = 0.0, private_rss: float = 0.0, shared_rss: float = 0.0)[source]

The resident set size (RSS), i.e. the memory used by a process or processes.

Parameters:

total_rss (float) – The sum of private_rss and shared_rss.
private_rss (float) – The RAM usage exclusive to a process.
shared_rss (float) – The RAM usage of a process shared with at least one other process.

total_rss: float = 0.0

private_rss: float = 0.0

shared_rss: float = 0.0

class gpu_tracker.tracker.MaxRAM(unit: str, system_capacity: float, system: float = 0.0, main: ~gpu_tracker.tracker.RSSValues = <factory>, descendants: ~gpu_tracker.tracker.RSSValues = <factory>, combined: ~gpu_tracker.tracker.RSSValues = <factory>)[source]

Information related to RAM including the maximum RAM used over a period of time.

Parameters:

unit (str) – The unit of measurement for RAM e.g. gigabytes.
system_capacity (float) – A constant value for the RAM capacity of the entire operating system.
system (float) – The RAM usage across the entire operating system.
main (RSSValues) – The RAM usage of the main process.
descendants (RSSValues) – The summed RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).
combined (RSSValues) – The summed RAM usage of both the main process and any descendant processes it may have.

unit: str

system_capacity: float

system: float = 0.0

main: RSSValues

descendants: RSSValues

combined: RSSValues

class gpu_tracker.tracker.MaxGPURAM(unit: str, system_capacity: float, system: float = 0.0, main: float = 0.0, descendants: float = 0.0, combined: float = 0.0)[source]

Information related to GPU RAM including the maximum GPU RAM used over a period of time.

Parameters:

unit (str) – The unit of measurement for GPU RAM e.g. gigabytes.
system_capacity (float) – A constant value for the GPU RAM capacity of all the GPUs in the system.
system (float) – The GPU RAM usage of all the GPUs in the system.
main (float) – The GPU RAM usage of the main process.
descendants (float) – The summed GPU RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).
combined (float) – The summed GPU RAM usage of both the main process and any descendant processes it may have.

unit: str

system_capacity: float

system: float = 0.0

main: float = 0.0

descendants: float = 0.0

combined: float = 0.0

class gpu_tracker.tracker.ProcessingUnitPercentages(max_sum_percent: float = 0.0, max_hardware_percent: float = 0.0, mean_sum_percent: float = 0.0, mean_hardware_percent: float = 0.0)[source]

Utilization percentages of one or more processing units (i.e. GPUs or CPU cores). Max refers to the highest value measured over a duration of time. Mean refers to the average of the measured values during this time. Sum refers to the sum of the percentages of the processing units involved. If there is only one unit in question, this is the percentage of just that unit. Hardware refers to this sum divided by the number of units involved. If there is only one unit in question, this is the same as the sum.

Parameters:

max_sum_percent (float) – The maximum sum of utilization percentages of the processing units at any given time.
max_hardware_percent (float) – The maximum utilization percentage of the group of units as a whole (i.e. max_sum_percent divided by the number of units involved).
mean_sum_percent (float) – The mean sum of utilization percentages of the processing units used by the process(es) over time.
mean_hardware_percent (float) – The mean utilization percentage of the group of units as a whole (i.e. mean_sum_percent divided by the number of units involved).

max_sum_percent: float = 0.0

max_hardware_percent: float = 0.0

mean_sum_percent: float = 0.0

mean_hardware_percent: float = 0.0
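
The relationship between the four fields can be sketched from a series of measurements. The Percentages class and the summarize function below are hypothetical stand-ins that follow the definitions above; they are not the library's code, and the sample values are made up.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Percentages:
    """Stand-in mirroring the documented ProcessingUnitPercentages fields."""
    max_sum_percent: float = 0.0
    max_hardware_percent: float = 0.0
    mean_sum_percent: float = 0.0
    mean_hardware_percent: float = 0.0


def summarize(samples: list[list[float]], n_units: int) -> Percentages:
    """Each sample holds one utilization percentage per processing unit."""
    sums = [sum(sample) for sample in samples]
    max_sum, mean_sum = max(sums), fmean(sums)
    # "Hardware" percentages are the summed percentages divided by the unit count.
    return Percentages(max_sum, max_sum / n_units, mean_sum, mean_sum / n_units)


# Three measurements of two GPUs (percent utilization of each GPU).
result = summarize([[50.0, 30.0], [100.0, 60.0], [0.0, 0.0]], n_units=2)
print(result)
```

Here the sums are 80, 160, and 0, so the maximum sum is 160 (80 per GPU) and the mean sum is 80 (40 per GPU).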

class gpu_tracker.tracker.CPUUtilization(system_core_count: int, n_expected_cores: int, system: ~gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, main: ~gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, descendants: ~gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, combined: ~gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, main_n_threads: int = 0, descendants_n_threads: int = 0, combined_n_threads: int = 0)[source]

Information related to CPU usage, including core utilization percentages of the main process and any descendant processes it may have as well as system-wide utilization. The system hardware utilization percentages are strictly divided by the total number of cores in the system while that of the main, descendant, and combined processes can be divided by the expected number of cores used in a task.

Parameters:

system_core_count (int) – The number of cores available to the entire operating system.
n_expected_cores (int) – The number of cores expected to be used by the main process and/or any descendant processes it may have.
system (ProcessingUnitPercentages) – The utilization percentages of all the cores in the entire operating system.
main (ProcessingUnitPercentages) – The utilization percentages of the cores used by the main process.
descendants (ProcessingUnitPercentages) – The utilization percentages summed across descendant processes (i.e. child processes, grandchild processes, etc.).
combined (ProcessingUnitPercentages) – The utilization percentages summed across both the descendant processes and the main process.
main_n_threads (int) – The maximum detected number of threads used by the main process at any time.
descendants_n_threads (int) – The maximum sum of threads used across the descendant processes at any time.
combined_n_threads (int) – The maximum sum of threads used by both the main and descendant processes.

system_core_count: int

n_expected_cores: int

system: ProcessingUnitPercentages

main: ProcessingUnitPercentages

descendants: ProcessingUnitPercentages

combined: ProcessingUnitPercentages

main_n_threads: int = 0

descendants_n_threads: int = 0

combined_n_threads: int = 0
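
A short worked example may help show why the denominator matters. The numbers and the hardware_percent helper below are hypothetical; the sketch only applies the division described above and makes no claim about how the library gathers the percentages.

```python
def hardware_percent(sum_percent: float, n_cores: int) -> float:
    """Summed per-core utilization divided by the number of cores considered."""
    return sum_percent / n_cores


system_core_count = 16  # cores available to the whole machine
n_expected_cores = 4    # cores the task is expected to use
process_sum = 360.0     # e.g. four workers each using 90% of one core
system_sum = 480.0      # everything running on the machine

# Main/descendant/combined percentages divide by the expected core count.
print(hardware_percent(process_sum, n_expected_cores))   # 90.0
# System-wide percentages always divide by every core in the system.
print(hardware_percent(system_sum, system_core_count))   # 30.0
# Leaving n_expected_cores at its default (all cores) gives a lower figure.
print(hardware_percent(process_sum, system_core_count))  # 22.5
```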

class gpu_tracker.tracker.GPUUtilization(system_gpu_count: int, n_expected_gpus: int, gpu_percentages: ~gpu_tracker.tracker.ProcessingUnitPercentages = <factory>)[source]

Utilization percentages of one or more GPUs being tracked. Hardware percentages are the summed percentages divided by the number of GPUs being tracked.

Parameters:

system_gpu_count (int) – The number of GPUs in the system.
n_expected_gpus (int) – The number of GPUs to be tracked (e.g. GPUs actually used while there may be other GPUs in the system).
gpu_percentages (ProcessingUnitPercentages) – The utilization percentages of the GPU(s) being tracked.

system_gpu_count: int

n_expected_gpus: int

gpu_percentages: ProcessingUnitPercentages

class gpu_tracker.tracker.ComputeTime(unit: str, time: float = 0.0)[source]

The time it takes for a task to complete.

Parameters:

unit (str) – The unit of measurement for compute time e.g. hours.
time (float) – The real compute time.

unit: str

time: float = 0.0

class gpu_tracker.tracker.ResourceUsage(max_ram: MaxRAM, max_gpu_ram: MaxGPURAM, cpu_utilization: CPUUtilization, gpu_utilization: GPUUtilization, compute_time: ComputeTime)[source]

Contains data for computational resource usage.

Parameters:

max_ram (MaxRAM) – The maximum RAM used at any point while tracking.
max_gpu_ram (MaxGPURAM) – The maximum GPU RAM used at any point while tracking.
cpu_utilization (CPUUtilization) – Core counts, utilization percentages of cores and maximum number of threads used while tracking.
gpu_utilization (GPUUtilization) – GPU counts and utilization percentages of the GPU(s).
compute_time (ComputeTime) – The real time spent tracking.

max_ram: MaxRAM

max_gpu_ram: MaxGPURAM

cpu_utilization: CPUUtilization

gpu_utilization: GPUUtilization

compute_time: ComputeTime

gpu_tracker.sub_tracker

The sub_tracker module contains the SubTracker class which can alternatively be imported directly from the gpu_tracker package. Additionally, it contains the SubTrackingAnalyzer class which generates the SubTrackingResults from the data produced by the SubTracker and finally the TrackingComparison which generates the ComparisonResults comparing the SubTrackingResults of multiple tracking sessions.

class gpu_tracker.sub_tracker.SubTracker(code_block_name: str | None = None, code_block_attribute: str | None = None, sub_tracking_file: str | None = None, overwrite: bool = False)[source]

Context manager that logs to a file in order to sub-track a code block using the timestamps at which the code block begins and ends. Entering the context manager marks the beginning of the code block and exiting it marks the end. At the beginning and end of the code block, the SubTracker logs a row to a tabular file (“.csv” or “.sqlite”) that includes the timestamp along with a name for the code block and an indication of whether it is the start or end of the code block. The resulting file can be used alongside a tracking file created by a Tracker object for more granular analysis of specific code blocks.

Variables:

code_block_name (str) – The name of the code block being sub-tracked.
sub_tracking_file (str) – The path to the file where the sub-tracking info is logged.

Parameters:

code_block_name (str | None) – The name of the code block within a Tracker context that is being sub-tracked. Defaults to the file path where the SubTracker context is started followed by a colon followed by the code_block_attribute.
code_block_attribute (str | None) – Only used if code_block_name is None. Defaults to the line number where the SubTracker context is started.
sub_tracking_file (str | None) – The path to the file to log the timestamps of the code block being sub-tracked. To avoid file lock errors when a sub-tracking file is created in multiple different processes (i.e. multiple processes attempting to access the same file at the same time), the sub-tracking file of each process must have a unique name, for example the ID of the process where the SubTracker context is created. Defaults to this process ID as the file name, in CSV format. These files can be combined into one using the SubTrackingAnalyzer.combine_sub_tracking_files method.
overwrite (bool) – Whether to overwrite the sub_tracking_file if it already existed before the beginning of this tracking session.
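
The logging behavior described above can be sketched with the standard library alone. toy_sub_tracker below is a hypothetical stand-in, and the column names and the START/STOP labels are assumptions rather than the format the library actually writes.

```python
import contextlib
import csv
import os
import tempfile
import time


@contextlib.contextmanager
def toy_sub_tracker(code_block_name: str, sub_tracking_file: str):
    """Append a START row on entry and a STOP row on exit (stand-in only)."""

    def log(position: str) -> None:
        is_new = not os.path.exists(sub_tracking_file)
        with open(sub_tracking_file, "a", newline="") as file:
            writer = csv.writer(file)
            if is_new:
                # Assumed column names, written once when the file is created.
                writer.writerow(["code_block_name", "position", "timestamp"])
            writer.writerow([code_block_name, position, time.time()])

    log("START")
    try:
        yield
    finally:
        log("STOP")  # logged even if the code block raises


with tempfile.TemporaryDirectory() as directory:
    # One file per process, named after the process ID to keep it unique.
    path = os.path.join(directory, f"{os.getpid()}.csv")
    for _ in range(2):
        with toy_sub_tracker("load-data", path):
            sum(range(10_000))
    with open(path, newline="") as file:
        rows = list(csv.DictReader(file))

print([row["position"] for row in rows])  # ['START', 'STOP', 'START', 'STOP']
```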

gpu_tracker.sub_tracker.sub_track(code_block_name: str | None = None, code_block_attribute: str | None = None, sub_tracking_file: str | None = None, overwrite: bool = False)[source]

Decorator for sub tracking calls to a specified function. Creates a SubTracker context that wraps the function call.

Parameters:

code_block_name (str | None) – The code_block_name argument passed to the SubTracker. Defaults to the file path where the decorated function is defined followed by a colon followed by the code_block_attribute.
code_block_attribute (str | None) – The code_block_attribute argument passed to the SubTracker. Defaults to the name of the decorated function.
sub_tracking_file (str | None) – The sub_tracking_file argument passed to the SubTracker. Same default as the SubTracker constructor. When the decorated function is used with multiprocessing and the file should be named after the ID of a child process for uniqueness, you may need to set the start method to “spawn”, e.g. multiprocessing.set_start_method('spawn').
overwrite (bool) – The overwrite argument passed to the SubTracker.
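
The decorator pattern described here, including the default name built from the file path and the function name, might look roughly like the following. toy_sub_track is a hypothetical stand-in that records to an in-memory list instead of a sub-tracking file; it is not the library's decorator.

```python
import functools

calls = []  # stand-in for the sub-tracking file


def toy_sub_track(code_block_name=None):
    """Decorator factory; the default name is '<file path>:<function name>'."""

    def decorator(func):
        name = code_block_name or f"{func.__code__.co_filename}:{func.__name__}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append((name, "START"))
            try:
                return func(*args, **kwargs)
            finally:
                calls.append((name, "STOP"))  # recorded even if func raises

        return wrapper

    return decorator


@toy_sub_track()
def square(x):
    return x * x


@toy_sub_track(code_block_name="cube")
def cube(x):
    return x ** 3


results = (square(3), cube(2))
print(results)                                # (9, 8)
print([position for _, position in calls])    # ['START', 'STOP', 'START', 'STOP']
```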

class gpu_tracker.sub_tracker.SubTrackingAnalyzer(tracking_file: str | None, sub_tracking_file: str)[source]

Analyzes the per-code block tracking data using a tracking file and sub tracking file in order to produce summary statistics of resource usage for each individual code block.

Parameters:

tracking_file (str | None) – Path to the file containing the resource usage at each timepoint collected by a Tracker object.
sub_tracking_file (str) – Path to the file containing the start/stop timestamps of each call to a code block collected by a SubTracker object.

read_static_data() → Series[source]

Reads the static data from the tracking file, including the resource units of measurement and system capacities.

Returns: The static data.
Return type: Series

load_code_block_names() → list[str][source]

Loads the list of the names of the code blocks that were sub-tracked.

Returns: The code block names.
Return type: list[str]

combine_sub_tracking_files(files: list[str])[source]

Combines multiple sub-tracking files, perhaps that came from multiple processes running simultaneously, into a single sub-tracking file.

Parameters: files (list[str]) – The list of sub-tracking files to combine. All must end in the same file extension i.e. either “.csv” or “.sqlite”.

load_timestamp_pairs(code_block_name: str) → list[tuple[float, float]][source]

Loads the pairs of start and stop timestamps for each call to a code block that was sub-tracked.

Parameters: code_block_name (str) – The name of the code block to get timestamp pairs for.
Returns: List of timestamp pairs.
Return type: list[tuple[float, float]]

load_timepoints(timestamp_pairs: list[tuple[float, float]]) → DataFrame[source]

Loads the resource usage measurements at each timepoint tracked within the timestamp pairs of a given code block.

Parameters: timestamp_pairs (list[tuple[float, float]]) – The list of start and stop timestamp pairs of the code block.
Returns: The timepoint measurements.
Return type: DataFrame
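
Conceptually, this selection keeps the measurements whose timestamps fall between a start and a stop timestamp. The sketch below uses plain tuples instead of a DataFrame and assumes inclusive bounds; timepoints_within and the sample data are hypothetical.

```python
def timepoints_within(timepoints, timestamp_pairs):
    """Keep measurements whose timestamp falls inside any (start, stop) pair."""
    return [
        (timestamp, value)
        for timestamp, value in timepoints
        if any(start <= timestamp <= stop for start, stop in timestamp_pairs)
    ]


# (timestamp, RAM measurement) collected once per second by a tracker
timepoints = [(0.0, 1.0), (1.0, 1.2), (2.0, 3.5), (3.0, 3.4), (4.0, 1.1), (5.0, 2.0)]
# Two calls to the same code block
pairs = [(0.5, 2.5), (4.5, 5.5)]

selected = timepoints_within(timepoints, pairs)
print(selected)  # [(1.0, 1.2), (2.0, 3.5), (5.0, 2.0)]
```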

overall_timepoint_results() → DataFrame[source]

Computes summary statistics for resource measurements across all tracked timepoints, as opposed to only those within an individual sub-tracked code block.

Returns: Summary statistics across all timepoints.
Return type: DataFrame

sub_tracking_results() → SubTrackingResults[source]

Generates a detailed report including summary statistics for the overall resource usage across all timepoints as well as that of each code block that was sub-tracked.

Returns: A data object containing the overall summary statistics, summary statistics for each code block, the static data, etc.
Return type: SubTrackingResults

class gpu_tracker.sub_tracker.TrackingComparison(file_path_map: dict[str, str])[source]

Compares multiple tracking sessions to determine differences in computational resource usage by loading sub-tracking results given their file paths. Sub-tracking results files must be in pickle format, e.g. produced by calling the SubTrackingAnalyzer.sub_tracking_results method and storing the returned SubTrackingResults in a pickle file. If code block results are not included in the sub-tracking files (i.e. no code blocks were sub-tracked), then only overall results are compared. Code blocks are compared by their name. If their names differ only by line number (i.e. they are of the form <file-path:line-number>), then it’s assumed that the code blocks appear in the same order even if the line numbers are different. This is useful to determine how resource usage changes based on differences in implementation, input data, etc.

Variables: results_map (dict[str, SubTrackingResults]) – Mapping of the name of each tracking session to the SubTrackingResults of the corresponding tracking sessions. Can be used for a user-defined custom comparison.
Parameters: file_path_map (dict[str, str]) – Mapping of the name of each tracking session to the path of the pickle file containing the SubTrackingResults of the corresponding tracking sessions. Used to construct the results_map attribute.
Raises: ValueError – Raised if the code block results of each tracking session don’t match.
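
One way to picture the name matching described above is to strip a trailing line number from each name and then pair the code blocks by position. The function below is a hypothetical sketch under that assumption and is not the library's matching logic.

```python
import re

# Names of the form "<file-path>:<line-number>"
LINE_NUMBER_NAME = re.compile(r"^(?P<path>.+):(?P<line>\d+)$")


def match_code_blocks(names_a: list[str], names_b: list[str]) -> list[tuple[str, str]]:
    """Pair the code blocks of two sessions by order, ignoring trailing line numbers."""

    def key(name: str) -> str:
        match = LINE_NUMBER_NAME.match(name)
        return match["path"] if match else name

    if [key(name) for name in names_a] != [key(name) for name in names_b]:
        raise ValueError("The code blocks of the tracking sessions don't match.")
    return list(zip(names_a, names_b))


session_a = ["train.py:10", "train.py:42", "evaluate"]
session_b = ["train.py:12", "train.py:57", "evaluate"]  # same blocks after an edit
pairs = match_code_blocks(session_a, session_b)
print(pairs)
```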

compare(statistic: str = 'mean') → ComparisonResults[source]

Performs the comparison between tracking sessions, comparing both the code block results and the overall results.

Parameters: statistic (str) – The summary statistic of the measurements to compare. One of ‘min’, ‘max’, ‘mean’, or ‘std’.
Returns: The results of the comparison including the overall resource usage, the resource usage of the code blocks, and the compute time of the code blocks for each tracking session.
Return type: ComparisonResults
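
To illustrate what comparing one measurement across sessions by a chosen statistic means, the sketch below computes the statistic from raw values with the standard library. compare_sessions, the session names, and the data are hypothetical, and treating ‘std’ as the sample standard deviation is an assumption; the library itself works from the summary statistics stored in each SubTrackingResults.

```python
import statistics

# Assumed mapping from the documented statistic names to stdlib functions.
STATISTICS = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "std": statistics.stdev,  # sample standard deviation (an assumption)
}


def compare_sessions(measurements: dict[str, list[float]], statistic: str = "mean") -> dict[str, float]:
    """Summarize one measurement per session with the chosen statistic."""
    if statistic not in STATISTICS:
        raise ValueError(f"Invalid statistic: {statistic}")
    summarize = STATISTICS[statistic]
    return {session: summarize(values) for session, values in measurements.items()}


ram_gigabytes = {"baseline": [2.0, 4.0, 6.0], "optimized": [1.0, 2.0, 3.0]}
print(compare_sessions(ram_gigabytes))         # {'baseline': 4.0, 'optimized': 2.0}
print(compare_sessions(ram_gigabytes, "max"))  # {'baseline': 6.0, 'optimized': 3.0}
print(compare_sessions(ram_gigabytes, "std"))  # {'baseline': 2.0, 'optimized': 1.0}
```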

class gpu_tracker.sub_tracker.CodeBlockResults(name: str, num_timepoints: int, num_calls: int, num_non_empty_calls: int, compute_time: Series, resource_usage: DataFrame)[source]

Results of a particular code block that was sub-tracked.

Parameters:

name (str) – The name of the code block.
num_timepoints (int) – The number of timepoints tracked across all calls to the code block.
num_calls (int) – The number of times the code block was called / executed.
num_non_empty_calls (int) – The number of code block calls with at least one timepoint tracked within the start / stop time.
compute_time (Series) – Compute time measurements for the code block including the total time spent running this code block, the average time between the start / stop time, etc.
resource_usage (DataFrame) – Summary statistics for the resource usage during the times the code block was called, i.e. in between all its start / stop times.

name: str

num_timepoints: int

num_calls: int

num_non_empty_calls: int

compute_time: Series

resource_usage: DataFrame
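
The difference between num_calls, num_non_empty_calls, and num_timepoints is easiest to see with a call that is too short to contain any tracked timepoint. The sketch below is a hypothetical illustration using plain Python values; summarize_calls and the key names total_time and mean_time are assumptions, not the library's output format.

```python
def summarize_calls(timestamps, timestamp_pairs):
    """Count calls and tracked timepoints for one code block."""
    # Number of tracked timepoints that fall inside each call (inclusive bounds assumed).
    per_call = [sum(start <= t <= stop for t in timestamps) for start, stop in timestamp_pairs]
    durations = [stop - start for start, stop in timestamp_pairs]
    return {
        "num_calls": len(timestamp_pairs),
        "num_non_empty_calls": sum(count > 0 for count in per_call),
        "num_timepoints": sum(per_call),
        "total_time": sum(durations),
        "mean_time": sum(durations) / len(durations),
    }


tracked = [1.0, 2.0, 5.0]                        # timestamps recorded by the tracker
calls = [(0.5, 2.5), (3.0, 3.5), (4.5, 5.0)]     # the second call contains no timepoint
summary = summarize_calls(tracked, calls)
print(summary)
```

A short sleep_time on the Tracker makes empty calls less likely, at the cost of more frequent measurements.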

class gpu_tracker.sub_tracker.SubTrackingResults(overall: DataFrame, static_data: Series, code_block_results: list[CodeBlockResults])[source]

Comprehensive results for a tracking session including resource usage measurements for individual code blocks.

Parameters:

overall (DataFrame) – The overall summary statistics across all timepoints tracked.
static_data (Series) – The static data measured during a tracking session.
code_block_results (list[CodeBlockResults]) – Results for individual code blocks including summary statistics for the timepoints within each code block.

overall: DataFrame

static_data: Series

code_block_results: list[CodeBlockResults]

to_json() → dict[source]

Converts the sub-tracking results into JSON format.

Returns: The JSON version of the sub-tracking results.
Return type: dict

__str__() → str[source]

Converts the sub-tracking results to text format.

Returns: The string representation of the sub-tracking results.
Return type: str

class gpu_tracker.sub_tracker.ComparisonResults(overall_resource_usage: dict[str, Series], code_block_resource_usage: dict[str, dict[str, Series]], code_block_compute_time: dict[str, Series])[source]

Contains the comparison of the measurements of multiple tracking sessions provided by the TrackingComparison class’s compare method.

Parameters:

overall_resource_usage (dict[str, Series]) – For each measurement, compares the resource usage across tracking sessions.
code_block_resource_usage (dict[str, dict[str, Series]]) – For each measurement and for each code block, compares the resource usage of the code block across tracking sessions.
code_block_compute_time (dict[str, Series]) – For each code block, compares the compute time of the code block across tracking sessions.

overall_resource_usage: dict[str, Series]

code_block_resource_usage: dict[str, dict[str, Series]]

code_block_compute_time: dict[str, Series]

to_json() → dict[source]

Converts the tracking comparison results into JSON format.

Returns: The JSON version of the comparison results.
Return type: dict

__str__() → str[source]

Converts the tracking comparison results to text format.

Returns: The string representation of the comparison results.
Return type: str