API
gpu_tracker.tracker
The tracker module contains the Tracker class, which can alternatively be imported directly from the gpu_tracker package.
- class gpu_tracker.tracker.Tracker(sleep_time: float = 1.0, ram_unit: str = 'gigabytes', gpu_ram_unit: str = 'gigabytes', time_unit: str = 'hours', n_expected_cores: int = None, gpu_uuids: set[str] = None, disable_logs: bool = False, process_id: int = None, resource_usage_file: str | None = None, n_join_attempts: int = 5, join_timeout: float = 10.0, gpu_brand: str | None = None, tracking_file: str | None = None, overwrite: bool = False)
Runs a sub-process that tracks the computational resources of the calling process, including the compute time, maximum CPU utilization, mean CPU utilization, maximum RAM, and maximum GPU RAM used within a context manager or between explicit calls to the start() and stop() methods. Calculated quantities are scaled depending on the units chosen for them (e.g. megabytes vs. gigabytes, hours vs. days, etc.).
- Variables:
resource_usage (ResourceUsage) – Data class containing the computational resource usage data collected by the tracking process.
- Parameters:
sleep_time (float) – The number of seconds to sleep in between usage-collection iterations.
ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.
gpu_ram_unit (str) – One of ‘bytes’, ‘kilobytes’, ‘megabytes’, ‘gigabytes’, or ‘terabytes’.
time_unit (str) – One of ‘seconds’, ‘minutes’, ‘hours’, or ‘days’.
n_expected_cores (int) – The number of cores expected to be used during tracking (e.g. number of processes spawned, number of parallelized threads, etc.). Used as the denominator when calculating the hardware percentages of the CPU utilization (except for system-wide CPU utilization which always divides by all the cores in the system). Defaults to all the cores in the system.
gpu_uuids (set[str]) – The UUIDs of the GPUs to track utilization for. The length of this set is used as the denominator when calculating the hardware percentages of the GPU utilization (i.e. n_expected_gpus). Defaults to all the GPUs in the system.
disable_logs (bool) – If set, warnings are suppressed during tracking. Otherwise, the Tracker logs warnings as usual.
process_id (int) – The ID of the process to track. Defaults to the current process.
resource_usage_file (str | None) – The file path to the pickle file containing the resource_usage attribute. If the tracking completes successfully, this file is automatically deleted and the resource_usage attribute is set in memory. If the tracking is interrupted, the tracking information is saved in this file as a backup. Defaults to a randomly generated file name in the current working directory, of the format .gpu-tracker_<random UUID>.pkl.
n_join_attempts (int) – The number of times the tracker attempts to join its underlying sub-process.
join_timeout (float) – The amount of time the tracker waits for its underlying sub-process to join.
gpu_brand (str | None) – The brand of GPU to profile. Valid values are “nvidia” and “amd”. Defaults to the brand of GPU detected in the system, checking Nvidia first.
tracking_file (str | None) – If specified, stores the individual resource usage measurements at each iteration. Valid file formats are CSV (.csv) and SQLite (.sqlite) where the SQLite file format stores the data in a table called “data” and allows for more efficient querying.
overwrite (bool) – Whether to overwrite the tracking_file if it already exists before this tracking session begins.
- Raises:
ValueError – Raised if invalid arguments are provided.
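The unit parameters only change how quantities are reported. As a rough sketch of that scaling, the hypothetical helper below converts raw bytes and seconds into a chosen unit and raises ValueError for an unknown unit. The byte factors assume decimal (SI) prefixes (1 kilobyte = 1000 bytes); whether gpu_tracker uses decimal or binary prefixes is not stated here, so treat those factors as an assumption.

```python
# Hypothetical conversion tables; the byte factors ASSUME decimal (SI) prefixes.
RAM_UNITS = {
    'bytes': 1,
    'kilobytes': 1e3,
    'megabytes': 1e6,
    'gigabytes': 1e9,
    'terabytes': 1e12,
}
TIME_UNITS = {
    'seconds': 1,
    'minutes': 60,
    'hours': 3600,
    'days': 86400,
}


def scale(value: float, unit: str, table: dict[str, float]) -> float:
    """Convert a raw value (bytes or seconds) into the requested unit."""
    if unit not in table:
        # Invalid units are rejected, in the spirit of the documented ValueError.
        raise ValueError(f'"{unit}" is not a valid unit. Valid values are {", ".join(table)}.')
    return value / table[unit]


print(scale(3_500_000_000, 'gigabytes', RAM_UNITS))  # 3.5
print(scale(5400, 'hours', TIME_UNITS))  # 1.5
```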
- start()
Begins tracking, which continues until stop() is called. Equivalent to entering the context manager.
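The equivalence between start()/stop() and the context manager follows the standard Python protocol, in which __enter__ calls start() and __exit__ calls stop(). The ToyTracker class below is a hypothetical stand-in, not gpu_tracker's implementation; it only records wall-clock time so that the pattern runs without the library installed.

```python
import time


class ToyTracker:
    """Hypothetical stand-in illustrating start()/stop() vs. the with-statement."""

    def __init__(self):
        self.elapsed = None
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        self.elapsed = time.perf_counter() - self._start

    def __enter__(self):
        # Entering the context is equivalent to calling start().
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Exiting the context (even via an exception) is equivalent to calling stop().
        self.stop()
        return False  # do not suppress exceptions


# Style 1: explicit calls.
tracker = ToyTracker()
tracker.start()
sum(range(100_000))
tracker.stop()

# Style 2: context manager.
with ToyTracker() as tracker2:
    sum(range(100_000))

print(tracker.elapsed >= 0, tracker2.elapsed >= 0)  # True True
```

With the real Tracker, results are read from the resource_usage attribute, which is set in memory once tracking completes successfully.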
- class gpu_tracker.tracker.RSSValues(total_rss: float = 0.0, private_rss: float = 0.0, shared_rss: float = 0.0)
The resident set size (RSS), i.e. the memory used by a process or processes.
- Parameters:
total_rss (float)
private_rss (float)
shared_rss (float)
- class gpu_tracker.tracker.MaxRAM(unit: str, system_capacity: float, system: float = 0.0, main: gpu_tracker.tracker.RSSValues = <factory>, descendants: gpu_tracker.tracker.RSSValues = <factory>, combined: gpu_tracker.tracker.RSSValues = <factory>)
Information related to RAM including the maximum RAM used over a period of time.
- Parameters:
unit (str) – The unit of measurement for RAM e.g. gigabytes.
system_capacity (float) – A constant value for the RAM capacity of the entire operating system.
system (float) – The RAM usage across the entire operating system.
main (RSSValues) – The RAM usage of the main process.
descendants (RSSValues) – The summed RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).
combined (RSSValues) – The summed RAM usage of both the main process and any descendant processes it may have.
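Because each field holds a maximum over the tracking period, the combined maximum is generally not the sum of the main and descendant maxima: their peaks may occur at different times. The sketch below illustrates this with made-up RSS samples, assuming each field is kept as its own running maximum across iterations (an assumption about how the fields are updated, not something stated above).

```python
# Hypothetical per-iteration RSS samples (in gigabytes) for the main process and
# for the sum of its descendants; the values are illustrative only.
main_samples = [1.0, 3.0, 1.5]
descendant_samples = [2.5, 0.5, 2.0]

max_main = 0.0
max_descendants = 0.0
max_combined = 0.0
for main, descendants in zip(main_samples, descendant_samples):
    # Each field is updated as an independent running maximum.
    max_main = max(max_main, main)
    max_descendants = max(max_descendants, descendants)
    max_combined = max(max_combined, main + descendants)

print(max_main, max_descendants, max_combined)  # 3.0 2.5 3.5
```

Here max_main + max_descendants would be 5.5, while the combined peak is only 3.5.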
- class gpu_tracker.tracker.MaxGPURAM(unit: str, system_capacity: float, system: float = 0.0, main: float = 0.0, descendants: float = 0.0, combined: float = 0.0)
Information related to GPU RAM including the maximum GPU RAM used over a period of time.
- Parameters:
unit (str) – The unit of measurement for GPU RAM e.g. gigabytes.
system_capacity (float) – A constant value for the GPU RAM capacity of all the GPUs in the system.
system (float) – The GPU RAM usage of all the GPUs in the system.
main (float) – The GPU RAM usage of the main process.
descendants (float) – The summed GPU RAM usage of the descendant processes (i.e. child processes, grandchild processes, etc.).
combined (float) – The summed GPU RAM usage of both the main process and any descendant processes it may have.
- class gpu_tracker.tracker.ProcessingUnitPercentages(max_sum_percent: float = 0.0, max_hardware_percent: float = 0.0, mean_sum_percent: float = 0.0, mean_hardware_percent: float = 0.0)
Utilization percentages of one or more processing units (i.e. GPUs or CPU cores). Max refers to the highest value measured over a duration of time. Mean refers to the average of the measured values during this time. Sum refers to the sum of the percentages of the processing units involved. If there is only one unit in question, this is the percentage of just that unit. Hardware refers to this sum divided by the number of units involved. If there is only one unit in question, this is the same as the sum.
- Parameters:
max_sum_percent (float) – The maximum sum of utilization percentages of the processing units at any given time.
max_hardware_percent (float) – The maximum utilization percentage of the group of units as a whole (i.e. max_sum_percent divided by the number of units involved).
mean_sum_percent (float) – The mean sum of utilization percentages of the processing units used by the process(es) over time.
mean_hardware_percent (float) – The mean utilization percentage of the group of units as a whole (i.e. mean_sum_percent divided by the number of units involved).
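The four fields can be derived from per-unit percentage samples taken at each iteration. The sketch below follows the definitions above: at each timepoint the per-unit percentages are summed, and the hardware percentage divides that sum by the number of units. The function name and the input layout (one list of per-unit percentages per timepoint) are hypothetical.

```python
from statistics import mean


def summarize(samples: list[list[float]], n_units: int) -> dict[str, float]:
    """Compute max/mean sum and hardware percentages from per-unit samples.

    samples: one inner list per timepoint, holding the utilization percentage
        of each processing unit at that timepoint.
    n_units: denominator for hardware percentages (e.g. the expected core count).
    """
    sums = [sum(timepoint) for timepoint in samples]
    hardware = [s / n_units for s in sums]
    return {
        'max_sum_percent': max(sums),
        'max_hardware_percent': max(hardware),
        'mean_sum_percent': mean(sums),
        'mean_hardware_percent': mean(hardware),
    }


# Two cores sampled at three timepoints (illustrative values).
print(summarize([[100.0, 50.0], [80.0, 20.0], [60.0, 20.0]], n_units=2))
# {'max_sum_percent': 150.0, 'max_hardware_percent': 75.0, 'mean_sum_percent': 110.0, 'mean_hardware_percent': 55.0}
```

With n_units=1, the hardware and sum percentages coincide, matching the single-unit case described above.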
- class gpu_tracker.tracker.CPUUtilization(system_core_count: int, n_expected_cores: int, system: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, main: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, descendants: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, combined: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>, main_n_threads: int = 0, descendants_n_threads: int = 0, combined_n_threads: int = 0)
Information related to CPU usage, including core utilization percentages of the main process and any descendant processes it may have, as well as system-wide utilization. The system hardware utilization percentages are always divided by the total number of cores in the system, while those of the main, descendant, and combined processes can be divided by the expected number of cores used in a task.
- Parameters:
system_core_count (int) – The number of cores available to the entire operating system.
n_expected_cores (int) – The number of cores expected to be used by the main process and/or any descendant processes it may have.
system (ProcessingUnitPercentages) – The utilization percentages of all the cores in the entire operating system.
main (ProcessingUnitPercentages) – The utilization percentages of the cores used by the main process.
descendants (ProcessingUnitPercentages) – The utilization percentages summed across descendant processes (i.e. child processes, grandchild processes, etc.).
combined (ProcessingUnitPercentages) – The utilization percentages summed across both the descendant processes and the main process.
main_n_threads (int) – The maximum detected number of threads used by the main process at any time.
descendants_n_threads (int) – The maximum sum of threads used across the descendant processes at any time.
combined_n_threads (int) – The maximum sum of threads used by both the main and descendant processes.
- system: ProcessingUnitPercentages
- descendants: ProcessingUnitPercentages
- combined: ProcessingUnitPercentages
- class gpu_tracker.tracker.GPUUtilization(system_gpu_count: int, n_expected_gpus: int, gpu_percentages: gpu_tracker.tracker.ProcessingUnitPercentages = <factory>)
Utilization percentages of one or more GPUs being tracked. Hardware percentages are the summed percentages divided by the number of GPUs being tracked.
- Parameters:
system_gpu_count (int) – The number of GPUs in the system.
n_expected_gpus (int) – The number of GPUs to be tracked (e.g. only the GPUs actually used, even if there are other GPUs in the system).
gpu_percentages (ProcessingUnitPercentages) – The utilization percentages of the GPU(s) being tracked.
- gpu_percentages: ProcessingUnitPercentages
- class gpu_tracker.tracker.ComputeTime(unit: str, time: float = 0.0)
The time it takes for a task to complete.
- Parameters:
unit (str)
time (float)
- class gpu_tracker.tracker.ResourceUsage(max_ram: MaxRAM, max_gpu_ram: MaxGPURAM, cpu_utilization: CPUUtilization, gpu_utilization: GPUUtilization, compute_time: ComputeTime)
Contains data for computational resource usage.
- Parameters:
max_ram (MaxRAM) – The maximum RAM used at any point while tracking.
max_gpu_ram (MaxGPURAM) – The maximum GPU RAM used at any point while tracking.
cpu_utilization (CPUUtilization) – Core counts, utilization percentages of cores and maximum number of threads used while tracking.
gpu_utilization (GPUUtilization) – GPU counts and utilization percentages of the GPU(s).
compute_time (ComputeTime) – The real time spent tracking.
- cpu_utilization: CPUUtilization
- gpu_utilization: GPUUtilization
- compute_time: ComputeTime
gpu_tracker.sub_tracker
The sub_tracker module contains the SubTracker class, which can alternatively be imported directly from the gpu_tracker package. Additionally, it contains the SubTrackingAnalyzer class, which generates the SubTrackingResults from the data produced by the SubTracker, and finally the TrackingComparison class, which generates the ComparisonResults comparing the SubTrackingResults of multiple tracking sessions.
- class gpu_tracker.sub_tracker.SubTracker(code_block_name: str | None = None, code_block_attribute: str | None = None, sub_tracking_file: str | None = None, overwrite: bool = False)
Context manager that logs to a file for the purpose of sub-tracking a code block using the timestamps at which the code block begins and ends. Entering the context manager marks the beginning of the code block, and exiting the context manager marks the end of the code block. At the beginning and at the end of the code block, the SubTracker logs a row to a tabular file (“.csv” or “.sqlite”) that includes the timestamp along with a name for the code block and an indication of whether it is the start or end of the code block. The resulting file can be used alongside a tracking file created by a Tracker object for more granular analysis of specific code blocks.
- Parameters:
code_block_name (str | None) – The name of the code block within a Tracker context that is being sub-tracked. Defaults to the file path where the SubTracker context is started, followed by a colon, followed by the code_block_attribute.
code_block_attribute (str | None) – Only used if code_block_name is None. Defaults to the line number where the SubTracker context is started.
sub_tracking_file (str | None) – The path to the file in which to log the timestamps of the code block being sub-tracked. To avoid file lock errors when sub-tracking files are created in multiple processes (i.e. multiple processes attempting to access the same file at the same time), the sub-tracking file of each process must have a unique name, for example the ID of the process where the SubTracker context is created. Defaults to this process ID as the file name, in CSV format. These files can be combined into one using the SubTrackingAnalyzer.combine_sub_tracking_files method.
overwrite (bool) – Whether to overwrite the sub_tracking_file if it already exists before this tracking session begins.
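The start/stop logging can be pictured as a context manager that appends one row on entry and one on exit. The sketch below is hypothetical: the class name ToySubTracker, the column names (position, code_block_name, timestamp) and the CSV layout are assumptions for illustration, not the file format gpu_tracker actually writes.

```python
import csv
import os
import tempfile
import time


class ToySubTracker:
    """Hypothetical sketch: log the start/stop timestamps of a code block to a CSV file."""

    def __init__(self, code_block_name: str, sub_tracking_file: str, overwrite: bool = False):
        self.code_block_name = code_block_name
        self.sub_tracking_file = sub_tracking_file
        if overwrite and os.path.exists(sub_tracking_file):
            os.remove(sub_tracking_file)

    def _log(self, position: str):
        new_file = not os.path.exists(self.sub_tracking_file)
        with open(self.sub_tracking_file, 'a', newline='') as file:
            writer = csv.writer(file)
            if new_file:
                writer.writerow(['position', 'code_block_name', 'timestamp'])
            writer.writerow([position, self.code_block_name, time.time()])

    def __enter__(self):
        self._log('START')  # marks the beginning of the code block
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._log('STOP')  # marks the end of the code block
        return False


# Naming the file after the process ID keeps each process writing to its own file.
path = os.path.join(tempfile.mkdtemp(), f'{os.getpid()}.csv')
for _ in range(2):
    with ToySubTracker('my-block', path):
        sum(range(10_000))

with open(path, newline='') as file:
    rows = list(csv.DictReader(file))
print([row['position'] for row in rows])  # ['START', 'STOP', 'START', 'STOP']
```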
- gpu_tracker.sub_tracker.sub_track(code_block_name: str | None = None, code_block_attribute: str | None = None, sub_tracking_file: str | None = None, overwrite: bool = False)
Decorator for sub-tracking calls to a specified function. Creates a SubTracker context that wraps the function call.
- Parameters:
code_block_name (str | None) – The code_block_name argument passed to the SubTracker. Defaults to the file path where the decorated function is defined, followed by a colon, followed by the code_block_attribute.
code_block_attribute (str | None) – The code_block_attribute argument passed to the SubTracker. Defaults to the name of the decorated function.
sub_tracking_file (str | None) – The sub_tracking_file argument passed to the SubTracker. Same default as the SubTracker constructor. If the decorated function is used with multiprocessing and you want the file to be named after the ID of the child process for uniqueness, you may need to set the start method to “spawn”, like so: multiprocessing.set_start_method('spawn').
overwrite (bool) – The overwrite argument passed to the SubTracker.
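A decorator that wraps every call in a context manager can be built with functools.wraps. In the hypothetical sketch below, an in-memory LOG list stands in for the sub-tracking file, and the default name mimics the documented pattern of the defining file's path, a colon, and the function name. The names toy_sub_tracker and toy_sub_track are illustrative and not part of gpu_tracker.

```python
import functools
import inspect
import time
from contextlib import contextmanager
from typing import Optional

# In-memory stand-in for a sub-tracking file: (position, code_block_name, timestamp) rows.
LOG: list[tuple[str, str, float]] = []


@contextmanager
def toy_sub_tracker(code_block_name: str):
    """Hypothetical context manager that records a START row and a STOP row."""
    LOG.append(('START', code_block_name, time.time()))
    try:
        yield
    finally:
        LOG.append(('STOP', code_block_name, time.time()))


def toy_sub_track(code_block_name: Optional[str] = None, code_block_attribute: Optional[str] = None):
    """Hypothetical decorator that wraps every call of the function in toy_sub_tracker."""
    def decorator(func):
        # Defaults: attribute = function name; name = "<defining file>:<attribute>".
        attribute = code_block_attribute if code_block_attribute is not None else func.__name__
        name = code_block_name if code_block_name is not None else f'{inspect.getfile(func)}:{attribute}'

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with toy_sub_tracker(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


@toy_sub_track()
def square(x: int) -> int:
    return x * x


print(square(4))  # 16
print([(position, name.endswith(':square')) for position, name, _ in LOG])
# [('START', True), ('STOP', True)]
```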
- class gpu_tracker.sub_tracker.SubTrackingAnalyzer(tracking_file: str | None, sub_tracking_file: str)
Analyzes the per-code-block tracking data using a tracking file and a sub-tracking file in order to produce summary statistics of resource usage for each individual code block.
- Parameters:
tracking_file (str | None)
sub_tracking_file (str)
- read_static_data() → Series
Reads the static data from the tracking file, including the resource units of measurement and system capacities.
- Returns:
The static data.
- Return type:
Series
- load_code_block_names() → list[str]
Loads the list of the names of the code blocks that were sub-tracked.
- combine_sub_tracking_files(files: list[str])
Combines multiple sub-tracking files, such as those produced by multiple processes running simultaneously, into a single sub-tracking file.
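Combining per-process files amounts to concatenating their rows under a shared header. The hypothetical toy_combine sketch below assumes CSV files with identical headers that include a timestamp column, and sorts the merged rows by timestamp; whether gpu_tracker sorts the rows, and how it handles SQLite files, is not specified here.

```python
import csv
import os
import tempfile


def toy_combine(files: list[str], output_file: str) -> None:
    """Hypothetical sketch: merge CSV sub-tracking files into one, ordered by timestamp."""
    rows = []
    fieldnames = None
    for path in files:
        with open(path, newline='') as file:
            reader = csv.DictReader(file)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f'{path} has a different header than {files[0]}.')
            rows.extend(reader)
    rows.sort(key=lambda row: float(row['timestamp']))
    with open(output_file, 'w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Two per-process files with interleaved timestamps (illustrative values).
directory = tempfile.mkdtemp()
contents = {
    '101.csv': [('START', 'a', 1.0), ('STOP', 'a', 3.0)],
    '102.csv': [('START', 'b', 2.0), ('STOP', 'b', 4.0)],
}
paths = []
for name, file_rows in contents.items():
    path = os.path.join(directory, name)
    with open(path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['position', 'code_block_name', 'timestamp'])
        writer.writerows(file_rows)
    paths.append(path)

combined = os.path.join(directory, 'combined.csv')
toy_combine(paths, combined)
with open(combined, newline='') as file:
    print([row['code_block_name'] for row in csv.DictReader(file)])  # ['a', 'b', 'a', 'b']
```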
- load_timestamp_pairs(code_block_name: str) → list[tuple[float, float]]
Loads the pairs of start and stop timestamps for each call to a code block that was sub-tracked.
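Pairing start and stop rows can be sketched as a walk over one code block's rows in timestamp order, matching each stop with the most recent unmatched start. The row layout (position, name, timestamp tuples) is a hypothetical stand-in for the sub-tracking file, and the matching rule is an assumption that holds for calls that do not overlap across processes.

```python
def toy_timestamp_pairs(rows: list[tuple[str, str, float]], code_block_name: str) -> list[tuple[float, float]]:
    """Hypothetical sketch: pair the START/STOP timestamps of one code block."""
    open_starts: list[float] = []
    pairs: list[tuple[float, float]] = []
    block_rows = sorted((row for row in rows if row[1] == code_block_name), key=lambda row: row[2])
    for position, _, timestamp in block_rows:
        if position == 'START':
            open_starts.append(timestamp)
        elif position == 'STOP':
            if not open_starts:
                raise ValueError(f'STOP without a matching START at {timestamp}.')
            # Match the stop with the most recent unmatched start.
            pairs.append((open_starts.pop(), timestamp))
    if open_starts:
        raise ValueError(f'{len(open_starts)} START row(s) without a matching STOP.')
    return sorted(pairs)


rows = [
    ('START', 'load', 0.0), ('STOP', 'load', 2.0),
    ('START', 'train', 2.5), ('STOP', 'train', 9.0),
    ('START', 'load', 10.0), ('STOP', 'load', 11.5),
]
print(toy_timestamp_pairs(rows, 'load'))  # [(0.0, 2.0), (10.0, 11.5)]
```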
- load_timepoints(timestamp_pairs: list[tuple[float, float]]) → DataFrame
Loads the resource usage measurements at each timepoint tracked within the timestamp pairs of a given code block.
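Selecting the timepoints of a code block reduces to keeping the measurements whose timestamp falls inside any start/stop pair. The sketch below uses a plain list of timestamps in place of the DataFrame the method returns and treats both interval ends as inclusive; the helper name and the inclusivity are assumptions.

```python
def toy_timepoints(timestamps: list[float], timestamp_pairs: list[tuple[float, float]]) -> list[float]:
    """Hypothetical sketch: keep the measurement timestamps within any start/stop pair."""
    return [
        t for t in timestamps
        if any(start <= t <= stop for start, stop in timestamp_pairs)
    ]


# Measurements taken every second (illustrative) and two calls of one code block.
timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
pairs = [(0.5, 2.0), (4.2, 4.8)]
print(toy_timepoints(timestamps, pairs))  # [1.0, 2.0]
```

The second call contains no measurement at all, which is the situation the num_non_empty_calls field of CodeBlockResults accounts for.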
- overall_timepoint_results() → DataFrame
Computes summary statistics for resource measurements across all tracked timepoints as compared to an individual sub-tracked code block.
- Returns:
Summary statistics across all timepoints.
- Return type:
DataFrame
- sub_tracking_results() → SubTrackingResults
Generates a detailed report including summary statistics for the overall resource usage across all timepoints as well as that of each code block that was sub-tracked.
- Returns:
A data object containing the overall summary statistics, summary statistics for each code block, the static data, etc.
- Return type:
SubTrackingResults
- class gpu_tracker.sub_tracker.TrackingComparison(file_path_map: dict[str, str])
Compares multiple tracking sessions to determine differences in computational resource usage by loading sub-tracking results given their file paths. Sub-tracking results files must be in pickle format, e.g. created by calling the SubTrackingAnalyzer.sub_tracking_results method and storing the returned SubTrackingResults in a pickle file. If code block results are not included in the sub-tracking files (i.e. no code blocks were sub-tracked), then only the overall results are compared. Code blocks are compared by their name. If their names differ only by line number (i.e. their name is of the form <file-path:line-number>), then the code blocks are assumed to occur in the same order even if the line numbers are different. This is useful to determine how resource usage changes based on differences in implementation, input data, etc.
- Variables:
results_map (dict[str, SubTrackingResults]) – Mapping of the name of each tracking session to the SubTrackingResults of the corresponding tracking session. Can be used for a user-defined custom comparison.
- Parameters:
file_path_map (dict[str, str]) – Mapping of the name of each tracking session to the path of the pickle file containing the SubTrackingResults of the corresponding tracking session. Used to construct the results_map attribute.
- Raises:
ValueError – Raised if the code block results of each tracking session don’t match.
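The name-matching rule can be sketched as follows: explicitly named blocks are matched by name, while names of the form <file-path:line-number> are keyed by their order within the file, so the n-th such block of a file in one session matches the n-th in another. The key scheme and the helper names below are a hypothetical illustration of that rule, not the library's code.

```python
import re
from collections import defaultdict

# Names of the form <file-path>:<line-number>.
LINE_NUMBER_NAME = re.compile(r'^(?P<path>.+):(?P<line>\d+)$')


def match_keys(names: list[str]) -> dict[object, str]:
    """Hypothetical sketch: key each code block name for cross-session matching."""
    keys: dict[object, str] = {}
    by_path: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for name in names:
        match = LINE_NUMBER_NAME.match(name)
        if match:
            by_path[match['path']].append((int(match['line']), name))
        else:
            keys[name] = name  # explicitly named blocks are matched by name
    for path, entries in by_path.items():
        # Line-numbered blocks are matched by their order within the file.
        for order, (_, name) in enumerate(sorted(entries)):
            keys[(path, order)] = name
    return keys


def toy_match(names_a: list[str], names_b: list[str]) -> list[tuple[str, str]]:
    keys_a, keys_b = match_keys(names_a), match_keys(names_b)
    if keys_a.keys() != keys_b.keys():
        raise ValueError('The code blocks of the two tracking sessions do not match.')
    return [(keys_a[key], keys_b[key]) for key in keys_a]


print(toy_match(['main.py:10', 'main.py:25', 'load'], ['main.py:12', 'main.py:30', 'load']))
# [('load', 'load'), ('main.py:10', 'main.py:12'), ('main.py:25', 'main.py:30')]
```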
- compare(statistic: str = 'mean') → ComparisonResults
Performs the comparison between tracking sessions, comparing both the code block results and the overall results.
- Parameters:
statistic (str) – The summary statistic of the measurements to compare. One of ‘min’, ‘max’, ‘mean’, or ‘std’.
- Returns:
The results of the comparison, including the overall resource usage, the resource usage of the code blocks, and the compute time of the code blocks for each tracking session.
- Return type:
ComparisonResults
- class gpu_tracker.sub_tracker.CodeBlockResults(name: str, num_timepoints: int, num_calls: int, num_non_empty_calls: int, compute_time: Series, resource_usage: DataFrame)
Results of a particular code block that was sub-tracked.
- Parameters:
name (str) – The name of the code block.
num_timepoints (int) – The number of timepoints tracked across all calls to the code block.
num_calls (int) – The number of times the code block was called / executed.
num_non_empty_calls (int) – The number of code block calls with at least one timepoint tracked within the start / stop time.
compute_time (Series) – Compute time measurements for the code block, including the total time spent running this code block, the average time between the start / stop time, etc.
resource_usage (DataFrame) – Summary statistics for the resource usage during the times the code block was called, i.e. in between all its start / stop times.
- compute_time: Series
- resource_usage: DataFrame
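The counting fields relate to the timestamp pairs and timepoints roughly as sketched below: num_calls counts start/stop pairs, num_non_empty_calls counts pairs containing at least one measurement, and num_timepoints counts the measurements across all calls. The function name, the inclusive interval bounds, and computing the total time as the sum of stop minus start are assumptions for illustration.

```python
def toy_code_block_counts(timestamps: list[float], timestamp_pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Hypothetical sketch of the counting fields of CodeBlockResults."""
    # Number of measurements falling within each call's start/stop interval.
    per_call = [
        sum(1 for t in timestamps if start <= t <= stop)
        for start, stop in timestamp_pairs
    ]
    return {
        'num_timepoints': sum(per_call),
        'num_calls': len(timestamp_pairs),
        'num_non_empty_calls': sum(1 for n in per_call if n > 0),
        'total_time': sum(stop - start for start, stop in timestamp_pairs),
    }


timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(toy_code_block_counts(timestamps, [(0.5, 2.0), (4.25, 4.75), (2.5, 5.0)]))
# {'num_timepoints': 5, 'num_calls': 3, 'num_non_empty_calls': 2, 'total_time': 4.5}
```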
- class gpu_tracker.sub_tracker.SubTrackingResults(overall: DataFrame, static_data: Series, code_block_results: list[CodeBlockResults])
Comprehensive results for a tracking session including resource usage measurements for individual code blocks.
- Parameters:
overall (DataFrame) – The overall summary statistics across all timepoints tracked.
static_data (Series) – The static data measured during a tracking session.
code_block_results (list[CodeBlockResults]) – Results for individual code blocks including summary statistics for the timepoints within each code block.
- overall: DataFrame
- static_data: Series
- code_block_results: list[CodeBlockResults]
- class gpu_tracker.sub_tracker.ComparisonResults(overall_resource_usage: dict[str, Series], code_block_resource_usage: dict[str, dict[str, Series]], code_block_compute_time: dict[str, Series])
Contains the comparison of the measurements of multiple tracking sessions provided by the TrackingComparison class’s compare method.
- Parameters:
overall_resource_usage (dict[str, Series]) – For each measurement, compares the resource usage across tracking sessions.
code_block_resource_usage (dict[str, dict[str, Series]]) – For each measurement and for each code block, compares the resource usage of the code block across tracking sessions.
code_block_compute_time (dict[str, Series]) – For each code block, compares the compute time of the code block across tracking sessions.