Tutorial

API

Tracking

Basics

The gpu_tracker package provides the Tracker class which uses a subprocess to measure computational resource usage, namely the compute time, maximum CPU utilization, mean CPU utilization, maximum RAM used, maximum GPU utilization, mean GPU utilization, and maximum GPU RAM used. It supports both NVIDIA and AMD GPUs. The start() method starts this process which tracks usage in the background. The Tracker class can be used as a context manager. Upon entering the context, one can write the code for which resource usage is measured. The compute time will be the time from entering the context to exiting the context and the RAM, GPU RAM, CPU utilization, and GPU utilization quantities will be the respective computational resources used by the code that’s within the context.

import gpu_tracker as gput
from example_module import example_function
with gput.Tracker(n_expected_cores=1, sleep_time=0.1) as tracker:
    example_function()

The Tracker class implements the __str__ method so it can be printed as a string with the values and units of each computational resource formatted.

print(tracker)
Max RAM:
   Unit: gigabytes
   System capacity: 67.254
   System: 4.307
   Main:
      Total RSS: 0.924
      Private RSS: 0.755
      Shared RSS: 0.171
   Descendants:
      Total RSS: 0.0
      Private RSS: 0.0
      Shared RSS: 0.0
   Combined:
      Total RSS: 0.924
      Private RSS: 0.755
      Shared RSS: 0.171
Max GPU RAM:
   Unit: gigabytes
   System capacity: 16.376
   System: 0.535
   Main: 0.314
   Descendants: 0.0
   Combined: 0.314
CPU utilization:
   System core count: 12
   Number of expected cores: 1
   System:
      Max sum percent: 222.6
      Max hardware percent: 18.55
      Mean sum percent: 149.285
      Mean hardware percent: 12.44
   Main:
      Max sum percent: 103.3
      Max hardware percent: 103.3
      Mean sum percent: 94.285
      Mean hardware percent: 94.285
   Descendants:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
   Combined:
      Max sum percent: 103.3
      Max hardware percent: 103.3
      Mean sum percent: 94.285
      Mean hardware percent: 94.285
   Main number of threads: 15
   Descendants number of threads: 0
   Combined number of threads: 15
GPU utilization:
   System GPU count: 1
   Number of expected GPUs: 1
   GPU percentages:
      Max sum percent: 5.0
      Max hardware percent: 5.0
      Mean sum percent: 0.385
      Mean hardware percent: 0.385
Compute time:
   Unit: hours
   Time: 0.001

The output is organized by computational resource, each followed by information specific to that resource. The system capacity is a constant giving the total capacity across the entire operating system; there is a system capacity field for both RAM and GPU RAM. This is not to be confused with the system field, which measures the maximum RAM / GPU RAM (operating-system wide) that was actually used over the duration of the computational-resource tracking. Both the RAM and GPU RAM have 3 additional fields: the usage of the main process itself, the summed usage of any descendant processes it may have (i.e. child processes, grandchild processes, etc.), and the combined usage, which is the sum of the main process and its descendants. RAM is divided further into the private RSS (RAM usage unique to the process), shared RSS (RAM that’s shared by a process and at least one other process), and total RSS (the sum of private and shared RSS). The private and shared RSS values are only available on Linux distributions, so on non-Linux operating systems the private and shared RSS remain 0 and only the total RSS is reported. Theoretically, the combined total RSS would never exceed the overall system RAM usage, but inaccuracies resulting from shared RSS can cause this to happen, especially on non-Linux operating systems (see note below).

The Tracker assumes that GPU memory is not shared across multiple processes and if it is, the reported GPU RAM of “descendant” and “combined” may be an overestimation.

The CPU utilization includes the system core count field which is the total number of cores available system-wide. Utilization is measured for the main process, its descendants, the main process and its descendants combined, and CPU utilization across the entire system. The sum percent is the sum of the percentages of all the cores being used. The hardware percent is that divided by the expected number of cores being used i.e. the optional n_expected_cores parameter (defaults to the number of cores in the entire system) for the main, descendants, and combined measurements. For the system measurements, hardware percent is divided by the total number of cores in the system regardless of the value of n_expected_cores. The max percent is the highest percentage detected through the duration of tracking while the mean percent is the average of all the percentages detected over that duration. The CPU utilization concludes with the maximum number of threads used at any time for the main process and the sum of the threads used across its descendant processes and combined.
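
The relationship between the sum percent and the hardware percent can be checked by hand. The following is a minimal sketch of that arithmetic, not the package's implementation; the function name hardware_percent is hypothetical, and the inputs are taken from the first example output above (12 system cores, n_expected_cores=1).

```python
def hardware_percent(sum_percent: float, n_cores: int) -> float:
    """Divide a summed per-core percentage by the number of cores it is spread over."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return sum_percent / n_cores


# System measurements always divide by the system core count (12 in the example).
system_max = hardware_percent(222.6, n_cores=12)
# Main, descendants, and combined divide by n_expected_cores (1 in the example).
main_max = hardware_percent(103.3, n_cores=1)

print(round(system_max, 3), round(main_max, 3))
```

A hardware percent above 100 for the main, descendants, or combined measurements suggests that more cores were in use than n_expected_cores anticipated.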

The GPU utilization is similar to the CPU utilization but rather than being based on utilization of processes, it can only measure the utilization percentages of the GPUs themselves, regardless of what processes are using them. To ameliorate this limitation, the optional gpu_uuids parameter can be set to specify which GPUs to measure utilization for (defaults to all the GPUs in the system). The system GPU count is the total number of GPUs in the system. The sum percent is the sum of all the percentages of these GPUs and the hardware percent is that divided by the expected number of GPUs being used (i.e. len(gpu_uuids)). As with CPU utilization, the max and mean of both the sum and hardware percentages are provided.
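
To illustrate how restricting the measurement to particular GPUs changes the percentages, here is a small sketch under stated assumptions. The function gpu_percentages, the dictionary of readings, and the UUID strings are all hypothetical; only the arithmetic (sum of the selected GPUs, divided by the number of selected GPUs) comes from the description above.

```python
from typing import Optional


def gpu_percentages(utilization_by_uuid: dict, gpu_uuids: Optional[set] = None) -> tuple:
    """Return (sum percent, hardware percent) for the selected GPUs."""
    # Default to every GPU in the system when no UUIDs are given.
    selected = set(utilization_by_uuid) if gpu_uuids is None else gpu_uuids
    sum_percent = sum(utilization_by_uuid[uuid] for uuid in selected)
    return sum_percent, sum_percent / len(selected)


# Hypothetical single reading from a two-GPU system.
reading = {"GPU-aaaa": 40.0, "GPU-bbbb": 10.0}
all_gpus = gpu_percentages(reading)
one_gpu = gpu_percentages(reading, gpu_uuids={"GPU-aaaa"})
print(all_gpus, one_gpu)
```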

The compute time is the real time that the computational-resource tracking lasted (as compared to CPU time).

NOTE The keywords “descendants” and “combined” in the output above indicate a sum of the RSS used by multiple processes. It’s important to keep in mind that on non-Linux operating systems, this sum does not take into account shared memory but rather adds up the total RSS of all processes, which can lead to an overestimation. For Linux distributions, however, pieces of shared memory are only counted once.
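
The difference between the two ways of summing can be seen with a simplified model. This sketch is only an illustration: the ProcessMemory class, the region names, and the sizes are hypothetical, and how the package actually identifies shared memory is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessMemory:
    """Hypothetical per-process memory snapshot (gigabytes)."""
    private: float
    shared_regions: dict = field(default_factory=dict)  # region name -> size

    @property
    def total(self) -> float:
        return self.private + sum(self.shared_regions.values())


def naive_combined_total(processes):
    """Add up the total RSS of each process; a shared region is counted once per process."""
    return sum(p.total for p in processes)


def deduplicated_combined_total(processes):
    """Count each shared region once, regardless of how many processes map it."""
    private = sum(p.private for p in processes)
    shared = {}
    for p in processes:
        shared.update(p.shared_regions)
    return private + sum(shared.values())


main = ProcessMemory(private=0.5, shared_regions={"libA": 0.25})
child = ProcessMemory(private=0.75, shared_regions={"libA": 0.25})
naive = naive_combined_total([main, child])
deduplicated = deduplicated_combined_total([main, child])
print(naive, deduplicated)
```

In this model the naive sum exceeds the deduplicated sum by the size of the region that both processes share, which is the kind of overestimation the note describes.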

The Tracker can alternatively be used by explicitly calling its start() and stop() methods which behave the same as entering and exiting the context manager respectively.

tracker = gput.Tracker()
tracker.start()
example_function()
tracker.stop()
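
When start() and stop() are called explicitly, an exception raised by the tracked code would skip the stop() call unless it is guarded. The sketch below shows one way to guard it with try/finally. It uses a hypothetical stand-in class (StandInTracker) that only records which methods ran, so it says nothing about the internals of the real Tracker.

```python
class StandInTracker:
    """Hypothetical stand-in for illustration; it only records which methods ran."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()
        return False  # do not suppress exceptions


def work():
    raise RuntimeError("failure inside tracked code")


# Explicit calls: try/finally guarantees stop() runs even if the tracked code raises.
explicit = StandInTracker()
try:
    explicit.start()
    work()
except RuntimeError:
    pass
finally:
    explicit.stop()

# Context manager: __exit__ provides the same guarantee.
managed = StandInTracker()
try:
    with managed:
        work()
except RuntimeError:
    pass

print(explicit.events, managed.events)
```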

Arguments and Attributes

The units of the computational resources can be modified as desired. The following example measures the RAM in megabytes, the GPU RAM in megabytes, and the compute time in seconds.

with gput.Tracker(ram_unit='megabytes', gpu_ram_unit='megabytes', time_unit='seconds', sleep_time=0.1) as tracker:
    example_function()
print(tracker)
Max RAM:
   Unit: megabytes
   System capacity: 67254.166
   System: 1984.791
   Main:
      Total RSS: 873.853
      Private RSS: 638.353
      Shared RSS: 235.68
   Descendants:
      Total RSS: 0.0
      Private RSS: 0.0
      Shared RSS: 0.0
   Combined:
      Total RSS: 873.853
      Private RSS: 638.353
      Shared RSS: 235.68
Max GPU RAM:
   Unit: megabytes
   System capacity: 16376.0
   System: 728.0
   Main: 506.0
   Descendants: 0.0
   Combined: 506.0
CPU utilization:
   System core count: 12
   Number of expected cores: 12
   System:
      Max sum percent: 161.6
      Max hardware percent: 13.467
      Mean sum percent: 145.517
      Mean hardware percent: 12.126
   Main:
      Max sum percent: 101.5
      Max hardware percent: 8.458
      Mean sum percent: 98.683
      Mean hardware percent: 8.224
   Descendants:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
   Combined:
      Max sum percent: 101.5
      Max hardware percent: 8.458
      Mean sum percent: 98.683
      Mean hardware percent: 8.224
   Main number of threads: 15
   Descendants number of threads: 0
   Combined number of threads: 15
GPU utilization:
   System GPU count: 1
   Number of expected GPUs: 1
   GPU percentages:
      Max sum percent: 3.0
      Max hardware percent: 3.0
      Mean sum percent: 0.25
      Mean hardware percent: 0.25
Compute time:
   Unit: seconds
   Time: 2.729
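
The values printed in different units are consistent with decimal factors (1 gigabyte = 1000 megabytes): the same system capacity appears as 67.254 gigabytes in the first example and 67254.166 megabytes here. The sketch below converts between the units used in this tutorial. The convert helper and the factor tables are assumptions made for illustration, and they list only the units that appear above.

```python
# Decimal factors, assumed because they reproduce the values in the outputs above.
BYTES_PER_UNIT = {"megabytes": 1e6, "gigabytes": 1e9}
SECONDS_PER_UNIT = {"seconds": 1, "hours": 3600}


def convert(value, from_unit, to_unit, factors):
    """Convert a value between two units that share a table of base-unit factors."""
    return value * factors[from_unit] / factors[to_unit]


capacity_gb = convert(67254.165504, "megabytes", "gigabytes", BYTES_PER_UNIT)
time_hours = convert(2.728560209274292, "seconds", "hours", SECONDS_PER_UNIT)
print(round(capacity_gb, 3), round(time_hours, 3))
```

Short runs round to very small values when reported in hours, so seconds are usually the more readable choice for quick experiments.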

The same information as the text format can be provided as a dictionary via the to_json() method of the Tracker.

import json
print(json.dumps(tracker.to_json(), indent=1))
{
 "max_ram": {
  "unit": "megabytes",
  "system_capacity": 67254.165504,
  "system": 1984.790528,
  "main": {
   "total_rss": 873.8529279999999,
   "private_rss": 638.353408,
   "shared_rss": 235.679744
  },
  "descendants": {
   "total_rss": 0.0,
   "private_rss": 0.0,
   "shared_rss": 0.0
  },
  "combined": {
   "total_rss": 873.8529279999999,
   "private_rss": 638.353408,
   "shared_rss": 235.679744
  }
 },
 "max_gpu_ram": {
  "unit": "megabytes",
  "system_capacity": 16376.0,
  "system": 728.0,
  "main": 506.0,
  "descendants": 0.0,
  "combined": 506.0
 },
 "cpu_utilization": {
  "system_core_count": 12,
  "n_expected_cores": 12,
  "system": {
   "max_sum_percent": 161.60000000000002,
   "max_hardware_percent": 13.466666666666669,
   "mean_sum_percent": 145.51666666666668,
   "mean_hardware_percent": 12.12638888888889
  },
  "main": {
   "max_sum_percent": 101.5,
   "max_hardware_percent": 8.458333333333334,
   "mean_sum_percent": 98.68333333333334,
   "mean_hardware_percent": 8.22361111111111
  },
  "descendants": {
   "max_sum_percent": 0.0,
   "max_hardware_percent": 0.0,
   "mean_sum_percent": 0.0,
   "mean_hardware_percent": 0.0
  },
  "combined": {
   "max_sum_percent": 101.5,
   "max_hardware_percent": 8.458333333333334,
   "mean_sum_percent": 98.68333333333334,
   "mean_hardware_percent": 8.22361111111111
  },
  "main_n_threads": 15,
  "descendants_n_threads": 0,
  "combined_n_threads": 15
 },
 "gpu_utilization": {
  "system_gpu_count": 1,
  "n_expected_gpus": 1,
  "gpu_percentages": {
   "max_sum_percent": 3.0,
   "max_hardware_percent": 3.0,
   "mean_sum_percent": 0.25,
   "mean_hardware_percent": 0.25
  }
 },
 "compute_time": {
  "unit": "seconds",
  "time": 2.728560209274292
 }
}
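
Because the result is a plain nested dictionary, individual values can be looked up by key and the whole structure can be serialized with the standard json module. The sketch below works on a literal copied from part of the output above rather than calling to_json() itself; the describe helper is hypothetical.

```python
import json

# Subset of the to_json() output shown above, copied as a literal for illustration.
usage = {
    "max_ram": {"unit": "megabytes", "main": {"total_rss": 873.8529279999999}},
    "compute_time": {"unit": "seconds", "time": 2.728560209274292},
}


def describe(usage, resource, *path):
    """Walk nested keys and return the value together with the resource's unit."""
    value = usage[resource]
    for key in path:
        value = value[key]
    return value, usage[resource]["unit"]


serialized = json.dumps(usage)  # the dictionary holds only JSON-compatible types
restored = json.loads(serialized)
print(describe(restored, "max_ram", "main", "total_rss"))
print(describe(restored, "compute_time", "time"))
```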

Using Python data classes, the Tracker class additionally has a resource_usage attribute containing fields that provide the usage information for each individual computational resource.

tracker.resource_usage.max_ram
MaxRAM(unit='megabytes', system_capacity=67254.165504, system=1984.790528, main=RSSValues(total_rss=873.8529279999999, private_rss=638.353408, shared_rss=235.679744), descendants=RSSValues(total_rss=0.0, private_rss=0.0, shared_rss=0.0), combined=RSSValues(total_rss=873.8529279999999, private_rss=638.353408, shared_rss=235.679744))
tracker.resource_usage.max_ram.unit
'megabytes'
tracker.resource_usage.max_ram.main
RSSValues(total_rss=873.8529279999999, private_rss=638.353408, shared_rss=235.679744)
tracker.resource_usage.max_ram.main.total_rss
873.8529279999999
tracker.resource_usage.max_gpu_ram
MaxGPURAM(unit='megabytes', system_capacity=16376.0, system=728.0, main=506.0, descendants=0.0, combined=506.0)
tracker.resource_usage.compute_time
ComputeTime(unit='seconds', time=2.728560209274292)
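
Since these objects are Python data classes, the helpers in the standard dataclasses module should apply to them. The sketch below demonstrates fields() and asdict() on a stand-in class that mirrors the field names in the RSSValues repr above; it is not the package's own definition.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class RSSValues:  # stand-in mirroring the repr above, not the package's definition
    total_rss: float = 0.0
    private_rss: float = 0.0
    shared_rss: float = 0.0


main = RSSValues(total_rss=873.8529279999999, private_rss=638.353408, shared_rss=235.679744)
field_names = [f.name for f in fields(main)]
as_dictionary = asdict(main)  # asdict also converts nested data classes recursively
print(field_names)
print(as_dictionary)
```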

Below is an example of using a child process. Notice the descendants fields are now non-zero.

import multiprocessing as mp
ctx = mp.get_context(method='spawn')
child_process = ctx.Process(target=example_function)
with gput.Tracker(n_expected_cores=2, sleep_time=0.4) as tracker:
    child_process.start()
    example_function()
    child_process.join()
child_process.close()
print(tracker)
Max RAM:
   Unit: gigabytes
   System capacity: 67.254
   System: 2.388
   Main:
      Total RSS: 0.849
      Private RSS: 0.528
      Shared RSS: 0.325
   Descendants:
      Total RSS: 0.845
      Private RSS: 0.734
      Shared RSS: 0.112
   Combined:
      Total RSS: 1.371
      Private RSS: 1.05
      Shared RSS: 0.325
Max GPU RAM:
   Unit: gigabytes
   System capacity: 16.376
   System: 1.236
   Main: 0.506
   Descendants: 0.506
   Combined: 1.012
CPU utilization:
   System core count: 12
   Number of expected cores: 2
   System:
      Max sum percent: 338.0
      Max hardware percent: 28.167
      Mean sum percent: 183.644
      Mean hardware percent: 15.304
   Main:
      Max sum percent: 101.0
      Max hardware percent: 50.5
      Mean sum percent: 60.178
      Mean hardware percent: 30.089
   Descendants:
      Max sum percent: 354.1
      Max hardware percent: 177.05
      Mean sum percent: 109.033
      Mean hardware percent: 54.517
   Combined:
      Max sum percent: 452.2
      Max hardware percent: 226.1
      Mean sum percent: 169.211
      Mean hardware percent: 84.606
   Main number of threads: 15
   Descendants number of threads: 13
   Combined number of threads: 28
GPU utilization:
   System GPU count: 1
   Number of expected GPUs: 1
   GPU percentages:
      Max sum percent: 5.0
      Max hardware percent: 5.0
      Mean sum percent: 0.556
      Mean hardware percent: 0.556
Compute time:
   Unit: hours
   Time: 0.001

Sometimes the code can fail. In order to collect the resource usage up to the point of failure, use a try/except block like so:

try:
    with gput.Tracker() as tracker:
        example_function()
        raise RuntimeError('AN ERROR')
except Exception as error:
    print(f'The following error occurred while tracking: {error}')
finally:
    print(tracker.resource_usage.max_gpu_ram.main)
The following error occurred while tracking: AN ERROR
0.506

If you do not catch the error in your code or if tracking is otherwise interrupted (e.g. you are debugging your code and you stop partway), the resource_usage attribute will not be set and that information cannot be obtained in memory. In such a case, the resource usage will be stored in a hidden pickle file in the working directory with a randomly generated name. Its file path can optionally be overridden with the resource_usage_file parameter.

tracker = gput.Tracker(resource_usage_file='path/to/my-file.pkl')
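
A pickle file can be read back with the standard pickle module. The following is a minimal sketch: load_resource_usage is a hypothetical helper, and the demonstration pickles a stand-in dictionary to a temporary file instead of a real resource_usage object. Loading a file written by the Tracker would presumably also require gpu_tracker to be importable, since pickle restores objects through their classes.

```python
import os
import pickle
import tempfile


def load_resource_usage(path):
    """Load whatever object was pickled at `path`. Only unpickle files you trust."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Demonstration with a stand-in dictionary rather than a real resource_usage object.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "my-file.pkl")
    with open(path, "wb") as file:
        pickle.dump({"max_gpu_ram": {"main": 0.506}}, file)
    recovered = load_resource_usage(path)

print(recovered)
```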

While the Tracker class automatically detects which brand of GPU is installed (either NVIDIA or AMD), one can explicitly choose the GPU brand with the gpu_brand parameter.

tracker = gput.Tracker(gpu_brand='nvidia')

While the Tracker by default stores aggregates of the computational resource usage across the timepoints, one can store the individual measured values at every timepoint in a file, in either CSV or SQLite format, using the tracking_file parameter. NOTE: for the CSV format, the static data (e.g. RAM system capacity, number of cores in the OS, etc.) is stored on the first two rows, with the headers on the first row followed by the static data on the second row. The headers of the timepoint data are on the third row, followed by the timepoint data on the remaining rows. The SQLite file, however, stores the static data and timepoint data in different tables: “static_data” and “data” respectively.

tracker = gput.Tracker(tracking_file='my-file.csv')
tracker = gput.Tracker(tracking_file='my-file.sqlite')
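
Because the CSV file holds two tables stacked on top of each other, it cannot be read as a single table without first splitting it. The sketch below splits such a layout with the standard csv module. The function read_tracking_csv and the sample contents are hypothetical (a real tracking file has more columns, and the timestamp column name is an assumption); only the row layout follows the description above.

```python
import csv
import io


def read_tracking_csv(text):
    """Split a tracking CSV into (static data dict, list of timepoint dicts)."""
    rows = list(csv.reader(io.StringIO(text)))
    static_data = dict(zip(rows[0], rows[1]))  # rows 1-2: static headers and values
    timepoint_headers = rows[2]                # row 3: timepoint headers
    timepoints = [dict(zip(timepoint_headers, row)) for row in rows[3:]]
    return static_data, timepoints


# Hypothetical file contents; the real column set is larger.
sample = (
    "ram_unit,system_core_count\n"
    "gigabytes,12\n"
    "timestamp,main_ram\n"
    "100.0,0.34\n"
    "100.5,0.91\n"
)
static_data, timepoints = read_tracking_csv(sample)
print(static_data, len(timepoints))
```

All values come back as strings with this approach, so numeric columns need an explicit conversion before any arithmetic.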

Sub-tracking

Logging Code Block Timestamps

While the Tracker object by itself can track a block of code, there are cases where one might want to track one code block and a smaller code block within it, or track multiple code blocks at a time, without creating several tracking processes simultaneously. This is especially relevant when tracking a code block that is called within multi-processing or a code block that is called several times. Similarly, one might want to track the resource usage of a particular function whenever it is called. Whether a function or some other specified code block, the SubTracker class can determine the computational resources used between the start times and stop times of a given code block. This includes the mean resources used during the times the code block is called, the mean time taken to complete the code block each time it is called, the number of times it is called, etc. Sub-tracking uses the tracking file specified by the tracking_file parameter of the Tracker object alongside a sub-tracking file which contains the start and stop times of each code block one desires to sub-track. The sub-tracking file can be created in Python using the SubTracker class, a context manager around the desired code block. Setting the overwrite parameter (default False) of the Tracker and SubTracker to True overwrites the tracking_file or sub_tracking_file respectively if a file at that path already exists. Keep this parameter at False to avoid losing data that is still needed.

tracker = gput.Tracker(sleep_time=0.5, tracking_file='tracking.csv', overwrite=False)
tracker.start()
# Perform other computation here
for _ in range(5):
    with gput.SubTracker(code_block_name='my-code-block', sub_tracking_file='sub-tracking.csv', overwrite=False):
        example_function()
# Perform other computation here

In the above example, a tracking session is started with the Tracker object, whose tracking file is ‘tracking.csv’. Then we have a for loop wherein a function is called 5 times. Other computation might be performed before or after this for loop, but if the computational resource usage of the contents of the for loop is of interest in particular, that code block can be sub-tracked by wrapping it within the context of the SubTracker object whose sub-tracking file is ‘sub-tracking.csv’. Alternatively, SQLite (.sqlite) files can be used to speed up querying in the case of very long tracking sessions. The name of the code block is ‘my-code-block’, given to distinguish it from other code blocks being sub-tracked.

If one wants to sub-track all calls to a particular function, the sub_track function decorator can be used instead of wrapping the function call with a SubTracker context every time it is called:

@gput.sub_track(code_block_name='my-function', sub_tracking_file='sub-tracking.csv', overwrite=False)
def my_function(*args, **kwargs):
    example_function()

for _ in range(3):
    my_function()
tracker.stop()

When sub-tracking a code block using the SubTracker context, the default code_block_name is the relative path of the Python file followed by a colon followed by the line number where the SubTracker context is initialized. When sub-tracking a function, the default code_block_name is the relative path of the Python file followed by a colon followed by the name of the function.

Analysis

Once a tracking file and at least one sub-tracking file have been created, the results can be analyzed using the SubTrackingAnalyzer class, instantiated by passing in the path to the tracking file and the path to the sub-tracking file.

analyzer = gput.SubTrackingAnalyzer(tracking_file='tracking.csv', sub_tracking_file='sub-tracking.csv')

When sub-tracking a code block within a function that’s part of multi-processing (i.e. called within one of multiple sub-processes), the sub-tracking file must be unique to that process, which is why the default sub_tracking_file is the process ID followed by “.csv”. One way or another, a different sub-tracking file must be created per worker to prevent multiple processes from logging to the same file. The SubTrackingAnalyzer has a combine_sub_tracking_files method that can combine these multiple sub-tracking files into a single sub-tracking file whose path is specified by the sub_tracking_file parameter above. Once a sub-tracking file is created from a single process or combined from multiple, the results can be obtained via the sub_tracking_results method.
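
To make the per-process naming and the idea of combining files concrete, here is a sketch that uses only the standard library. It is not the implementation of combine_sub_tracking_files: the helper names and the column names in the demonstration files are hypothetical, and the sketch simply concatenates CSV files that share a header.

```python
import csv
import os
import tempfile


def default_sub_tracking_file():
    """Name pattern described above: the process ID followed by '.csv'."""
    return f"{os.getpid()}.csv"


def combine_csv_files(paths, combined_path):
    """Concatenate CSV files that share a header into one file (header written once)."""
    header_written = False
    with open(combined_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                rows = list(csv.reader(in_file))
            if not rows:
                continue
            if not header_written:
                writer.writerow(rows[0])
                header_written = True
            writer.writerows(rows[1:])


with tempfile.TemporaryDirectory() as directory:
    paths = []
    # Two hypothetical worker files, named after made-up process IDs.
    for pid, timestamp in [(101, "1.0"), (102, "2.0")]:
        path = os.path.join(directory, f"{pid}.csv")
        with open(path, "w", newline="") as file:
            csv.writer(file).writerows([["code_block_name", "timestamp"], ["my-code-block", timestamp]])
        paths.append(path)
    combined_path = os.path.join(directory, "sub-tracking.csv")
    combine_csv_files(paths, combined_path)
    with open(combined_path, newline="") as file:
        combined_rows = list(csv.reader(file))

print(default_sub_tracking_file(), combined_rows)
```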

results = analyzer.sub_tracking_results()
type(results)
gpu_tracker.sub_tracker.SubTrackingResults

The sub_tracking_results method returns a SubTrackingResults object which contains summary statistics of the overall resource usage (all time points in the tracking file) and the per code block resource usage (the timepoints within calls to a code block i.e. the start/stop times) as DataFrame or Series objects from the pandas package.

results.overall
                                                    min         max        mean        std
main_ram                                       0.341217    0.920560    0.861921   0.100084
descendants_ram                                0.000000    0.000000    0.000000   0.000000
combined_ram                                   0.341217    0.920560    0.861921   0.100084
system_ram                                     4.602618    5.701517    5.281926   0.220270
main_gpu_ram                                   0.000000    0.506000    0.448364   0.151267
descendants_gpu_ram                            0.000000    0.000000    0.000000   0.000000
combined_gpu_ram                               0.000000    0.506000    0.448364   0.151267
system_gpu_ram                                 0.215000    0.727000    0.668909   0.152657
gpu_sum_utilization_percent                    0.000000    0.000000    0.000000   0.000000
gpu_hardware_utilization_percent               0.000000    0.000000    0.000000   0.000000
main_n_threads                                12.000000   15.000000   14.757576   0.791766
descendants_n_threads                          0.000000    0.000000    0.000000   0.000000
combined_n_threads                            12.000000   15.000000   14.757576   0.791766
cpu_system_sum_utilization_percent            15.400000  138.400000  121.918182  19.484617
cpu_system_hardware_utilization_percent        1.283333   11.533333   10.159848   1.623718
cpu_main_sum_utilization_percent              91.400000  103.300000   99.060606   2.571228
cpu_main_hardware_utilization_percent          7.616667    8.608333    8.255051   0.214269
cpu_descendants_sum_utilization_percent        0.000000    0.000000    0.000000   0.000000
cpu_descendants_hardware_utilization_percent   0.000000    0.000000    0.000000   0.000000
cpu_combined_sum_utilization_percent          91.400000  103.300000   99.060606   2.571228
cpu_combined_hardware_utilization_percent      7.616667    8.608333    8.255051   0.214269

The SubTrackingResults class additionally contains the static data i.e. the information that remains constant throughout the tracking session.

results.static_data
ram_unit                   gigabytes
gpu_ram_unit               gigabytes
time_unit                      hours
ram_system_capacity        67.254166
gpu_ram_system_capacity       16.376
system_core_count                 12
n_expected_cores                  12
system_gpu_count                   1
n_expected_gpus                    1
Name: 0, dtype: object

The code_block_results attribute of the SubTrackingResults class is a list of CodeBlockResults objects, containing the resource usage and compute time summary statistics. In this case, there are two CodeBlockResults objects in the list since there were two code blocks sub-tracked in this tracking session.

[my_code_block_results, my_function_results] = results.code_block_results
type(my_code_block_results)
gpu_tracker.sub_tracker.CodeBlockResults

The compute_time attribute of the CodeBlockResults class contains summary statistics for the time spent on the code block, where total is the total amount of time spent within the code block during the tracking session, mean is the average time taken on each call to the code block, etc. The resource_usage attribute provides summary statistics for the computational resources used during calls to the code block i.e. within the start/stop times.

my_code_block_results.compute_time
min       2.630907
max       2.869182
mean      2.685580
std       0.102789
total    13.427902
dtype: float64
my_code_block_results.resource_usage
                                                    min         max        mean        std
main_ram                                       0.341217    0.912278    0.846999   0.122948
descendants_ram                                0.000000    0.000000    0.000000   0.000000
combined_ram                                   0.341217    0.912278    0.846999   0.122948
system_ram                                     4.602618    5.261357    5.170665   0.147118
main_gpu_ram                                   0.000000    0.506000    0.415429   0.182971
descendants_gpu_ram                            0.000000    0.000000    0.000000   0.000000
combined_gpu_ram                               0.000000    0.506000    0.415429   0.182971
system_gpu_ram                                 0.215000    0.727000    0.635714   0.184676
gpu_sum_utilization_percent                    0.000000    0.000000    0.000000   0.000000
gpu_hardware_utilization_percent               0.000000    0.000000    0.000000   0.000000
main_n_threads                                12.000000   15.000000   14.619048   0.973457
descendants_n_threads                          0.000000    0.000000    0.000000   0.000000
combined_n_threads                            12.000000   15.000000   14.619048   0.973457
cpu_system_sum_utilization_percent            15.400000  138.400000  120.142857  24.347907
cpu_system_hardware_utilization_percent        1.283333   11.533333   10.011905   2.028992
cpu_main_sum_utilization_percent              91.400000  103.300000   98.652381   2.733243
cpu_main_hardware_utilization_percent          7.616667    8.608333    8.221032   0.227770
cpu_descendants_sum_utilization_percent        0.000000    0.000000    0.000000   0.000000
cpu_descendants_hardware_utilization_percent   0.000000    0.000000    0.000000   0.000000
cpu_combined_sum_utilization_percent          91.400000  103.300000   98.652381   2.733243
cpu_combined_hardware_utilization_percent      7.616667    8.608333    8.221032   0.227770
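
The compute_time statistics above can be understood as a summary over the duration of each call; for instance, the total of 13.427902 divided by the 5 calls in the for loop gives the mean of 2.685580. The sketch below computes the same five statistics from start and stop times with the standard statistics module. The function compute_time_summary and the sample times are hypothetical, and this is not the package's own code.

```python
import statistics


def compute_time_summary(calls):
    """Summarize call durations from (start, stop) pairs."""
    durations = [stop - start for start, stop in calls]
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        # Sample standard deviation (n - 1), which is also the pandas default.
        "std": statistics.stdev(durations) if len(durations) > 1 else float("nan"),
        "total": sum(durations),
    }


# Hypothetical start and stop times of three calls to one code block.
calls = [(0.0, 2.0), (3.0, 6.0), (7.0, 11.0)]
summary = compute_time_summary(calls)
print(summary)
```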

The CodeBlockResults class also has attributes for the name of the code block, the number of times it was called during the tracking session, the number of calls that included at least one timepoint, and the total number of timepoints measured within all calls to the code block.

my_code_block_results.name, my_code_block_results.num_calls, my_code_block_results.num_non_empty_calls, my_code_block_results.num_timepoints
('my-code-block', 5, 5, 21)
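
A call can be "empty" when it starts and finishes between two consecutive measurements, so no timepoint falls inside it. The sketch below counts the three quantities from call intervals and measurement timestamps. The function count_timepoints and the sample values are hypothetical, and treating both interval bounds as inclusive is an assumption.

```python
def count_timepoints(calls, timestamps):
    """Return (num_calls, num_non_empty_calls, num_timepoints) for one code block."""
    # Number of measurement timestamps that fall inside each call (bounds inclusive).
    per_call = [sum(start <= t <= stop for t in timestamps) for start, stop in calls]
    return len(calls), sum(1 for n in per_call if n > 0), sum(per_call)


timestamps = [0.5, 1.0, 1.5, 2.0, 2.5]          # measurements every 0.5 time units
calls = [(0.4, 1.1), (1.2, 1.3), (1.9, 2.6)]    # the second call is too short to be sampled
counts = count_timepoints(calls, timestamps)
print(counts)
```

If many calls turn out empty, lowering the Tracker's sleep_time should yield more timepoints per call, at the cost of more frequent measurements.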

The analysis results can also be printed in their entirety. Alternatively, the to_json method can provide this comprehensive information in JSON format.

print(results)
Overall:
                                                        min         max        mean        std
    main_ram                                       0.341860    0.944374    0.856037   0.125014
    descendants_ram                                0.000000    0.000000    0.000000   0.000000
    combined_ram                                   0.341860    0.944374    0.856037   0.125014
    system_ram                                     4.859711    5.553644    5.253445   0.134081
    main_gpu_ram                                   0.000000    0.506000    0.429920   0.170432
    descendants_gpu_ram                            0.000000    0.000000    0.000000   0.000000
    combined_gpu_ram                               0.000000    0.506000    0.429920   0.170432
    system_gpu_ram                                 0.215000    0.727000    0.650320   0.172010
    gpu_sum_utilization_percent                    0.000000    3.000000    0.120000   0.600000
    gpu_hardware_utilization_percent               0.000000    3.000000    0.120000   0.600000
    main_n_threads                                12.000000   15.000000   14.720000   0.842615
    descendants_n_threads                          0.000000    0.000000    0.000000   0.000000
    combined_n_threads                            12.000000   15.000000   14.720000   0.842615
    cpu_system_sum_utilization_percent            11.900000  133.400000  119.212000  22.741909
    cpu_system_hardware_utilization_percent        0.991667   11.116667    9.934333   1.895159
    cpu_main_sum_utilization_percent              78.000000  103.200000   96.924000   6.390767
    cpu_main_hardware_utilization_percent          6.500000    8.600000    8.077000   0.532564
    cpu_descendants_sum_utilization_percent        0.000000    0.000000    0.000000   0.000000
    cpu_descendants_hardware_utilization_percent   0.000000    0.000000    0.000000   0.000000
    cpu_combined_sum_utilization_percent          78.000000  103.200000   96.924000   6.390767
    cpu_combined_hardware_utilization_percent      6.500000    8.600000    8.077000   0.532564
Static Data:
       ram_unit gpu_ram_unit time_unit ram_system_capacity gpu_ram_system_capacity system_core_count n_expected_cores system_gpu_count n_expected_gpus
      gigabytes    gigabytes     hours           67.254166                  16.376                12               12                1               1
Code Block Results:
    Name:                my-code-block
    Num Timepoints:      12
    Num Calls:           3
    Num Non Empty Calls: 3
    Compute Time:
                   min       max      mean       std     total
              2.580433  2.789909  2.651185  0.120147  7.953554
    Resource Usage:
                                                                min         max        mean        std
            main_ram                                       0.341860    0.936559    0.808736   0.167663
            descendants_ram                                0.000000    0.000000    0.000000   0.000000
            combined_ram                                   0.341860    0.936559    0.808736   0.167663
            system_ram                                     4.859711    5.553644    5.231854   0.191567
            main_gpu_ram                                   0.000000    0.506000    0.363500   0.225892
            descendants_gpu_ram                            0.000000    0.000000    0.000000   0.000000
            combined_gpu_ram                               0.000000    0.506000    0.363500   0.225892
            system_gpu_ram                                 0.215000    0.727000    0.583250   0.228088
            gpu_sum_utilization_percent                    0.000000    0.000000    0.000000   0.000000
            gpu_hardware_utilization_percent               0.000000    0.000000    0.000000   0.000000
            main_n_threads                                12.000000   15.000000   14.416667   1.164500
            descendants_n_threads                          0.000000    0.000000    0.000000   0.000000
            combined_n_threads                            12.000000   15.000000   14.416667   1.164500
            cpu_system_sum_utilization_percent            11.900000  130.800000  113.641667  32.352363
            cpu_system_hardware_utilization_percent        0.991667   10.900000    9.470139   2.696030
            cpu_main_sum_utilization_percent              79.600000  103.100000   96.583333   6.726587
            cpu_main_hardware_utilization_percent          6.633333    8.591667    8.048611   0.560549
            cpu_descendants_sum_utilization_percent        0.000000    0.000000    0.000000   0.000000
            cpu_descendants_hardware_utilization_percent   0.000000    0.000000    0.000000   0.000000
            cpu_combined_sum_utilization_percent          79.600000  103.100000   96.583333   6.726587
            cpu_combined_hardware_utilization_percent      6.633333    8.591667    8.048611   0.560549

    Name:                my-function
    Num Timepoints:      12
    Num Calls:           3
    Num Non Empty Calls: 3
    Compute Time:
                   min       max      mean       std     total
              2.538011  2.577679  2.553176  0.021419  7.659528
    Resource Usage:
                                                                 min         max        mean       std
            main_ram                                        0.864592    0.944374    0.896998  0.034505
            descendants_ram                                 0.000000    0.000000    0.000000  0.000000
            combined_ram                                    0.864592    0.944374    0.896998  0.034505
            system_ram                                      5.203415    5.315219    5.271566  0.038751
            main_gpu_ram                                    0.314000    0.506000    0.490000  0.055426
            descendants_gpu_ram                             0.000000    0.000000    0.000000  0.000000
            combined_gpu_ram                                0.314000    0.506000    0.490000  0.055426
            system_gpu_ram                                  0.535000    0.727000    0.711000  0.055426
            gpu_sum_utilization_percent                     0.000000    3.000000    0.250000  0.866025
            gpu_hardware_utilization_percent                0.000000    3.000000    0.250000  0.866025
            main_n_threads                                 15.000000   15.000000   15.000000  0.000000
            descendants_n_threads                           0.000000    0.000000    0.000000  0.000000
            combined_n_threads                             15.000000   15.000000   15.000000  0.000000
            cpu_system_sum_utilization_percent            120.300000  133.400000  124.566667  4.001439
            cpu_system_hardware_utilization_percent        10.025000   11.116667   10.380556  0.333453
            cpu_main_sum_utilization_percent               94.700000  103.200000   98.841667  2.677332
            cpu_main_hardware_utilization_percent           7.891667    8.600000    8.236806  0.223111
            cpu_descendants_sum_utilization_percent         0.000000    0.000000    0.000000  0.000000
            cpu_descendants_hardware_utilization_percent    0.000000    0.000000    0.000000  0.000000
            cpu_combined_sum_utilization_percent           94.700000  103.200000   98.841667  2.677332
            cpu_combined_hardware_utilization_percent       7.891667    8.600000    8.236806  0.223111

Comparison

The TrackingComparison class compares the resource usage of multiple tracking sessions, both the overall usage of each session and the usage of any code blocks that were sub-tracked. This is helpful for seeing how changes to a process, such as changes to its implementation or input data, affect its computational efficiency. To do this, TrackingComparison takes a mapping from the given name of each tracking session to the file path where its SubTrackingResults object is stored in pickle format. Say we have two tracking sessions that we want to compare. First, we store the results of the first tracking session in a pickle file. Once that data has been saved in ‘results.pkl’, we can safely re-use the same names for the tracking_file and sub_tracking_file in the second tracking session by setting the overwrite argument to True.

import pickle as pkl

# `results` holds the sub-tracking results of the first tracking session.
with open('results.pkl', 'wb') as file:
    pkl.dump(results, file)

Once the results of the first tracking session are saved, we can start a new tracking session in another run of the program being profiled. Say we made some code changes and want to compare the two implementations. We can populate a new tracking_file and sub_tracking_file with data from the new tracking session.

import gpu_tracker as gput
from example_module import example_function
import pickle as pkl

@gput.sub_track(code_block_name='my-function', sub_tracking_file='sub-tracking.csv', overwrite=True)
def my_function(*args, **kwargs):
    example_function()

with gput.Tracker(sleep_time=0.5, tracking_file='tracking.csv', overwrite=True):
    for _ in range(3):
        with gput.SubTracker(code_block_name='my-code-block', sub_tracking_file='sub-tracking.csv', overwrite=True):
            example_function()
        my_function()
results2 = gput.SubTrackingAnalyzer(tracking_file='tracking.csv', sub_tracking_file='sub-tracking.csv').sub_tracking_results()
with open('results2.pkl', 'wb') as file:
    pkl.dump(results2, file)

The first tracking session stored its results in ‘results.pkl’ while the second tracking session stored its results in ‘results2.pkl’. Say we decided to call the first session ‘A’ and the second session ‘B’. The TrackingComparison object would be initialized like so:

comparison = gput.TrackingComparison(file_path_map={'A': 'results.pkl', 'B': 'results2.pkl'})

Once the TrackingComparison is created, its compare method generates a ComparisonResults object that compares the computational resource usage measured in each tracking session to that of the other tracking sessions. The statistic parameter determines which summary statistic of the measurements to compare, defaulting to ‘mean’. In this example, we will compare the maximum measurements by setting statistic to ‘max’.

results = comparison.compare(statistic='max')
type(results)
gpu_tracker.sub_tracker.ComparisonResults

The overall_resource_usage attribute of the ComparisonResults class is a dictionary mapping each measurement to a Series comparing that measurement across all timepoints in one tracking session to another.

results.overall_resource_usage.keys()
dict_keys(['main_ram', 'descendants_ram', 'combined_ram', 'system_ram', 'main_gpu_ram', 'descendants_gpu_ram', 'combined_gpu_ram', 'system_gpu_ram', 'gpu_sum_utilization_percent', 'gpu_hardware_utilization_percent', 'main_n_threads', 'descendants_n_threads', 'combined_n_threads', 'cpu_system_sum_utilization_percent', 'cpu_system_hardware_utilization_percent', 'cpu_main_sum_utilization_percent', 'cpu_main_hardware_utilization_percent', 'cpu_descendants_sum_utilization_percent', 'cpu_descendants_hardware_utilization_percent', 'cpu_combined_sum_utilization_percent', 'cpu_combined_hardware_utilization_percent'])

For example, we can compare the overall maximum ‘main_ram’ of tracking session ‘A’ to tracking session ‘B’.

results.overall_resource_usage['main_ram']
A    0.920560
B    0.944374
dtype: float64
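
To put a number on the difference between two sessions, the compared values can be turned into a relative change. The following is only an illustrative sketch: it assumes the pandas Series has first been converted to a plain dictionary (for example with `Series.to_dict()`), and `percent_change` is a hypothetical helper that is not part of gpu_tracker.

```python
def percent_change(usage: dict[str, float], baseline: str, candidate: str) -> float:
    """Relative change of `candidate` with respect to `baseline`, in percent."""
    base = usage[baseline]
    if base == 0:
        # A zero baseline (e.g. no descendant processes) has no meaningful ratio.
        raise ValueError(f"baseline session {baseline!r} has zero usage")
    return (usage[candidate] - base) / base * 100


# Values copied from the 'main_ram' comparison shown above.
main_ram = {'A': 0.920560, 'B': 0.944374}
change = percent_change(main_ram, baseline='A', candidate='B')
print(f"Session B used {change:+.2f}% main RAM relative to session A")
```

The zero check matters in practice because measurements such as ‘descendants_ram’ are 0.0 in both sessions of this example.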

The code_block_resource_usage attribute is a dictionary that compares the same resource usage but for each code block rather than overall.

results.code_block_resource_usage.keys()
dict_keys(['main_ram', 'descendants_ram', 'combined_ram', 'system_ram', 'main_gpu_ram', 'descendants_gpu_ram', 'combined_gpu_ram', 'system_gpu_ram', 'gpu_sum_utilization_percent', 'gpu_hardware_utilization_percent', 'main_n_threads', 'descendants_n_threads', 'combined_n_threads', 'cpu_system_sum_utilization_percent', 'cpu_system_hardware_utilization_percent', 'cpu_main_sum_utilization_percent', 'cpu_main_hardware_utilization_percent', 'cpu_descendants_sum_utilization_percent', 'cpu_descendants_hardware_utilization_percent', 'cpu_combined_sum_utilization_percent', 'cpu_combined_hardware_utilization_percent'])

Each measurement is a dictionary mapping each code block name to the resources used across tracking sessions in that code block.

results.code_block_resource_usage['main_ram'].keys()
dict_keys(['my-code-block', 'my-function'])

For example, the maximum ‘main_ram’ used by ‘my-code-block’ in tracking session ‘A’ can be compared to that of tracking session ‘B’.

results.code_block_resource_usage['main_ram']['my-code-block']
A    0.912278
B    0.936559
dtype: float64

Finally, the code_block_compute_time attribute is a dictionary mapping each code block name to a Series that compares the chosen compute time summary statistic of that code block across tracking sessions.

results.code_block_compute_time.keys()
dict_keys(['my-code-block', 'my-function'])

For example, we can compare the maximum compute time of ‘my-code-block’ in tracking session ‘A’ to that of tracking session ‘B’.

results.code_block_compute_time['my-code-block']
B    2.789909
A    2.869182
dtype: float64

The comparison results can also be printed in their entirety. Alternatively, the to_json method can provide this comprehensive information in JSON format.

print(results)
Overall Resource Usage:
    Main Ram:
                    A         B
              0.92056  0.944374
    Descendants Ram:
                A    B
              0.0  0.0
    Combined Ram:
                    A         B
              0.92056  0.944374
    System Ram:
                     B         A
              5.553644  5.701517
    Main Gpu Ram:
                  A      B
              0.506  0.506
    Descendants Gpu Ram:
                A    B
              0.0  0.0
    Combined Gpu Ram:
                  A      B
              0.506  0.506
    System Gpu Ram:
                  A      B
              0.727  0.727
    Gpu Sum Utilization Percent:
                A    B
              0.0  3.0
    Gpu Hardware Utilization Percent:
                A    B
              0.0  3.0
    Main N Threads:
                 A     B
              15.0  15.0
    Descendants N Threads:
                A    B
              0.0  0.0
    Combined N Threads:
                 A     B
              15.0  15.0
    Cpu System Sum Utilization Percent:
                  B      A
              133.4  138.4
    Cpu System Hardware Utilization Percent:
                      B          A
              11.116667  11.533333
    Cpu Main Sum Utilization Percent:
                  B      A
              103.2  103.3
    Cpu Main Hardware Utilization Percent:
                B         A
              8.6  8.608333
    Cpu Descendants Sum Utilization Percent:
                A    B
              0.0  0.0
    Cpu Descendants Hardware Utilization Percent:
                A    B
              0.0  0.0
    Cpu Combined Sum Utilization Percent:
                  B      A
              103.2  103.3
    Cpu Combined Hardware Utilization Percent:
                B         A
              8.6  8.608333
Code Block Resource Usage:
    Main Ram:
            my-code-block:
                             A         B
                      0.912278  0.936559
            my-function:
                            A         B
                      0.92056  0.944374
    Descendants Ram:
            my-code-block:
                        A    B
                      0.0  0.0
            my-function:
                        A    B
                      0.0  0.0
    Combined Ram:
            my-code-block:
                             A         B
                      0.912278  0.936559
            my-function:
                            A         B
                      0.92056  0.944374
    System Ram:
            my-code-block:
                             A         B
                      5.261357  5.553644
            my-function:
                             B         A
                      5.315219  5.701517
    Main Gpu Ram:
            my-code-block:
                          A      B
                      0.506  0.506
            my-function:
                          A      B
                      0.506  0.506
    Descendants Gpu Ram:
            my-code-block:
                        A    B
                      0.0  0.0
            my-function:
                        A    B
                      0.0  0.0
    Combined Gpu Ram:
            my-code-block:
                          A      B
                      0.506  0.506
            my-function:
                          A      B
                      0.506  0.506
    System Gpu Ram:
            my-code-block:
                          A      B
                      0.727  0.727
            my-function:
                          A      B
                      0.727  0.727
    Gpu Sum Utilization Percent:
            my-code-block:
                        A    B
                      0.0  0.0
            my-function:
                        A    B
                      0.0  3.0
    Gpu Hardware Utilization Percent:
            my-code-block:
                        A    B
                      0.0  0.0
            my-function:
                        A    B
                      0.0  3.0
    Main N Threads:
            my-code-block:
                         A     B
                      15.0  15.0
            my-function:
                         A     B
                      15.0  15.0
    Descendants N Threads:
            my-code-block:
                        A    B
                      0.0  0.0
            my-function:
                        A    B
                      0.0  0.0
    Combined N Threads:
            my-code-block:
                         A     B
                      15.0  15.0
            my-function:
                         A     B
                      15.0  15.0
    Cpu System Sum Utilization Percent:
            my-code-block:
                          B      A
                      130.8  138.4
            my-function:
                          A      B
                      131.1  133.4
    Cpu System Hardware Utilization Percent:
            my-code-block:
                         B          A
                      10.9  11.533333
            my-function:
                           A          B
                      10.925  11.116667
    Cpu Main Sum Utilization Percent:
            my-code-block:
                          B      A
                      103.1  103.3
            my-function:
                          A      B
                      102.1  103.2
    Cpu Main Hardware Utilization Percent:
            my-code-block:
                             B         A
                      8.591667  8.608333
            my-function:
                             A    B
                      8.508333  8.6
    Cpu Descendants Sum Utilization Percent:
            my-code-block:
                        A    B
                      0.0  0.0
            my-function:
                        A    B
                      0.0  0.0
    Cpu Descendants Hardware Utilization Percent:
            my-code-block:
                        A    B
                      0.0  0.0
            my-function:
                        A    B
                      0.0  0.0
    Cpu Combined Sum Utilization Percent:
            my-code-block:
                          B      A
                      103.1  103.3
            my-function:
                          A      B
                      102.1  103.2
    Cpu Combined Hardware Utilization Percent:
            my-code-block:
                             B         A
                      8.591667  8.608333
            my-function:
                             A    B
                      8.508333  8.6
Code Block Compute Time:
    my-code-block:
                     B         A
              2.789909  2.869182
    my-function:
                     A         B
              2.570437  2.577679

CLI

Tracking

Basics

The gpu-tracker package also comes with a commandline interface that can track the computational-resource-usage of any shell command, not just Python code. Entering gpu-tracker -h in a shell will show the help message.

$ gpu-tracker -h
Tracks the computational resource usage (RAM, GPU RAM, CPU utilization, GPU utilization, and compute time) of a process corresponding to a given shell command.

Usage:
    gpu-tracker -h | --help
    gpu-tracker -v | --version
    gpu-tracker --execute=<command> [--output=<output>] [--format=<format>] [--tconfig=<config-file>] [--st=<sleep-time>] [--ru=<ram-unit>] [--gru=<gpu-ram-unit>] [--tu=<time-unit>] [--nec=<num-cores>] [--guuids=<gpu-uuids>] [--disable-logs] [--gb=<gpu-brand>] [--tf=<tracking-file>] [--overwrite]
    gpu-tracker sub-track combine --stf=<sub-track-file> [-p <file-path>]...
    gpu-tracker sub-track analyze --tf=<tracking-file> --stf=<sub-track-file> [--output=<output>] [--format=<format>]
    gpu-tracker sub-track compare [--output=<output>] [--format=<format>] [--cconfig=<config-file>] [-m <name>=<file-path>...] [--stat=<statistic>]

Options:
    -h --help               Show this help message and exit.
    -v --version            Show package version and exit.
    -e --execute=<command>  The command to run along with its arguments all within quotes e.g. "ls -l -a".
    -o --output=<output>    File path to store the computational-resource-usage measurements in the case of tracking or the analysis report in the case of sub-tracking. If not set, prints to the screen.
    -f --format=<format>    File format of the output. Either 'json', 'text', or 'pickle'. Defaults to 'text'.
    --tconfig=<config-file> JSON config file containing the key word arguments to the ``Tracker`` class (see API) to be optionally used instead of the corresponding commandline options. If any commandline options are set, they will override the corresponding arguments provided by the config file.
    --st=<sleep-time>       The number of seconds to sleep in between usage-collection iterations.
    --ru=<ram-unit>         One of 'bytes', 'kilobytes', 'megabytes', 'gigabytes', or 'terabytes'.
    --gru=<gpu-ram-unit>    One of 'bytes', 'kilobytes', 'megabytes', 'gigabytes', or 'terabytes'.
    --tu=<time-unit>        One of 'seconds', 'minutes', 'hours', or 'days'.
    --nec=<num-cores>       The number of cores expected to be used. Defaults to the number of cores in the entire operating system.
    --guuids=<gpu-uuids>    Comma separated list of the UUIDs of the GPUs for which to track utilization e.g. gpu-uuid1,gpu-uuid2,etc. Defaults to all the GPUs in the system.
    --disable-logs          If set, warnings are suppressed during tracking. Otherwise, the Tracker logs warnings as usual.
    --gb=<gpu-brand>        The brand of GPU to profile. Valid values are nvidia and amd. Defaults to the brand of GPU detected in the system, checking NVIDIA first.
    --tf=<tracking-file>    If specified, stores the individual resource usage measurements at each iteration. Valid file formats are CSV (.csv) and SQLite (.sqlite) where the SQLite file format stores the data in a table called "data" and allows for more efficient querying.
    --overwrite             Whether to overwrite the tracking file if it already existed before the beginning of this tracking session. Do not set if the data in the existing tracking file is still needed.
    sub-track               Perform sub-tracking related commands.
    combine                 Combines multiple sub-tracking files into one. This is usually a result of sub-tracking a code block that is called in multiple simultaneous processes.
    --stf=<sub-track-file>  The path to the sub-tracking file used to specify the timestamps of specific code-blocks. If not generated by the gpu-tracker API, must be either a CSV or SQLite file (where the SQLite file contains a table called "data") where the headers are precisely process_id, code_block_name, position, and timestamp. The process_id is the ID of the process where the code block is called. code_block_name is the name of the code block. position is whether it is the start or the stopping point of the code block where 0 represents start and 1 represents stop. And timestamp is the timestamp where the code block starts or where it stops.
    -p <file-path>          Paths to the sub-tracking files to combine. Must all be the same file format and the same file format as the resulting sub-tracking file (either .csv or .sqlite). If only one path is provided, it is interpreted as a path to a directory and all the files in this directory are combined.
    analyze                 Generate the sub-tracking analysis report using the tracking file and sub-tracking file for resource usage of specific code blocks.
    compare                 Compares multiple tracking sessions to determine differences in computational resource usage by loading sub-tracking results given their file paths. Sub-tracking results files must be in pickle format e.g. running the ``sub-track analyze`` command and specifying a file path for ``--output`` and 'pickle' for the ``--format`` option. If code block results are not included in the sub-tracking files (i.e. no code blocks were sub-tracked), then only overall results are compared.
    --cconfig=<config-file> JSON config file containing the ``file_path_map`` argument for the ``TrackerComparison`` class and ``statistic`` argument for its ``compare`` method (see API) that can be used instead of the corresponding ``-m <name>=<path>`` and ``--stat=<statistic>`` commandline options respectively. If additional ``-m <name>=<path>`` options are added on the commandline in addition to a config file, they will be added to the ``file_path_map`` in the config file. If a ``--stat`` option is provided on the commandline, it will override the ``statistic`` in the config file.
    -m <name>=<file-path>   Mapping of tracking session names to the path of the file containing the sub-tracking results of said tracking session. Must be in pickle format.
    --stat=<statistic>      The summary statistic of the measurements to compare. One of 'min', 'max', 'mean', or 'std'. Defaults to 'mean'.

The -e or --execute option is required. It provides the desired shell command, with both the command and the arguments that follow it surrounded by quotes. Below is an example of running the bash command with an argument of example-script.sh. When the command completes, its status code is reported.

$ gpu-tracker -e "bash example-script.sh" --st=0.3
Resource tracking complete. Process completed with status code: 0
Max RAM:
   Unit: gigabytes
   System capacity: 67.254
   System: 5.61
   Main:
      Total RSS: 0.003
      Private RSS: 0.0
      Shared RSS: 0.003
   Descendants:
      Total RSS: 0.879
      Private RSS: 0.76
      Shared RSS: 0.119
   Combined:
      Total RSS: 0.881
      Private RSS: 0.761
      Shared RSS: 0.12
Max GPU RAM:
   Unit: gigabytes
   System capacity: 16.376
   System: 1.043
   Main: 0.0
   Descendants: 0.314
   Combined: 0.314
CPU utilization:
   System core count: 12
   Number of expected cores: 12
   System:
      Max sum percent: 324.8
      Max hardware percent: 27.067
      Mean sum percent: 152.109
      Mean hardware percent: 12.676
   Main:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
   Descendants:
      Max sum percent: 201.8
      Max hardware percent: 16.817
      Mean sum percent: 102.245
      Mean hardware percent: 8.52
   Combined:
      Max sum percent: 201.8
      Max hardware percent: 16.817
      Mean sum percent: 102.245
      Mean hardware percent: 8.52
   Main number of threads: 1
   Descendants number of threads: 12
   Combined number of threads: 13
GPU utilization:
   System GPU count: 1
   Number of expected GPUs: 1
   GPU percentages:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
Compute time:
   Unit: hours
   Time: 0.001

Notice that the RAM and GPU RAM usage primarily takes place in the descendant processes since, in this example, the bash command itself calls the commands relevant to resource usage.

Options

The units of the computational resources can be modified. Here, --tu stands for time-unit, --gru stands for gpu-ram-unit, and --ru stands for ram-unit.

$ gpu-tracker -e 'bash example-script.sh' --tu=seconds --gru=megabytes --ru=megabytes --st=0.2
Resource tracking complete. Process completed with status code: 0
Max RAM:
   Unit: megabytes
   System capacity: 67254.17
   System: 2420.457
   Main:
      Total RSS: 3.109
      Private RSS: 0.319
      Shared RSS: 2.789
   Descendants:
      Total RSS: 849.125
      Private RSS: 731.435
      Shared RSS: 118.125
   Combined:
      Total RSS: 850.338
      Private RSS: 731.754
      Shared RSS: 119.017
Max GPU RAM:
   Unit: megabytes
   System capacity: 16376.0
   System: 1235.0
   Main: 0.0
   Descendants: 506.0
   Combined: 506.0
CPU utilization:
   System core count: 12
   Number of expected cores: 12
   System:
      Max sum percent: 316.4
      Max hardware percent: 26.367
      Mean sum percent: 168.077
      Mean hardware percent: 14.006
   Main:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
   Descendants:
      Max sum percent: 517.3
      Max hardware percent: 43.108
      Mean sum percent: 130.623
      Mean hardware percent: 10.885
   Combined:
      Max sum percent: 517.3
      Max hardware percent: 43.108
      Mean sum percent: 130.623
      Mean hardware percent: 10.885
   Main number of threads: 1
   Descendants number of threads: 12
   Combined number of threads: 13
GPU utilization:
   System GPU count: 1
   Number of expected GPUs: 1
   GPU percentages:
      Max sum percent: 5.0
      Max hardware percent: 5.0
      Mean sum percent: 0.462
      Mean hardware percent: 0.462
Compute time:
   Unit: seconds
   Time: 3.995

By default, the computational-resource-usage statistics are printed to the screen. The -o or --output option can be specified to store that same content in a file.

$ gpu-tracker -e 'bash example-script.sh' -o out.txt --st=0.2
Resource tracking complete. Process completed with status code: 0
$ cat out.txt
Max RAM:
   Unit: gigabytes
   System capacity: 67.254
   System: 2.43
   Main:
      Total RSS: 0.003
      Private RSS: 0.0
      Shared RSS: 0.003
   Descendants:
      Total RSS: 0.884
      Private RSS: 0.766
      Shared RSS: 0.118
   Combined:
      Total RSS: 0.885
      Private RSS: 0.766
      Shared RSS: 0.119
Max GPU RAM:
   Unit: gigabytes
   System capacity: 16.376
   System: 1.043
   Main: 0.0
   Descendants: 0.314
   Combined: 0.314
CPU utilization:
   System core count: 12
   Number of expected cores: 12
   System:
      Max sum percent: 405.0
      Max hardware percent: 33.75
      Mean sum percent: 165.357
      Mean hardware percent: 13.78
   Main:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
   Descendants:
      Max sum percent: 573.7
      Max hardware percent: 47.808
      Mean sum percent: 124.871
      Mean hardware percent: 10.406
   Combined:
      Max sum percent: 573.7
      Max hardware percent: 47.808
      Mean sum percent: 124.871
      Mean hardware percent: 10.406
   Main number of threads: 1
   Descendants number of threads: 12
   Combined number of threads: 13
GPU utilization:
   System GPU count: 1
   Number of expected GPUs: 1
   GPU percentages:
      Max sum percent: 5.0
      Max hardware percent: 5.0
      Mean sum percent: 0.357
      Mean hardware percent: 0.357
Compute time:
   Unit: hours
   Time: 0.001

By default, the format of the output is “text”. The -f or --format option can specify the format to be “json” instead.

$ gpu-tracker -e 'bash example-script.sh' -f json --st=0.2
Resource tracking complete. Process completed with status code: 0
{
 "max_ram": {
  "unit": "gigabytes",
  "system_capacity": 67.2541696,
  "system": 2.5132195840000002,
  "main": {
   "total_rss": 0.00311296,
   "private_rss": 0.000323584,
   "shared_rss": 0.002789376
  },
  "descendants": {
   "total_rss": 0.8446238720000001,
   "private_rss": 0.7268597760000001,
   "shared_rss": 0.11776409600000001
  },
  "combined": {
   "total_rss": 0.8458403840000001,
   "private_rss": 0.7271833600000001,
   "shared_rss": 0.11865702400000001
  }
 },
 "max_gpu_ram": {
  "unit": "gigabytes",
  "system_capacity": 16.376,
  "system": 1.235,
  "main": 0.0,
  "descendants": 0.506,
  "combined": 0.506
 },
 "cpu_utilization": {
  "system_core_count": 12,
  "n_expected_cores": 12,
  "system": {
   "max_sum_percent": 316.3,
   "max_hardware_percent": 26.358333333333334,
   "mean_sum_percent": 167.90769230769232,
   "mean_hardware_percent": 13.992307692307692
  },
  "main": {
   "max_sum_percent": 0.0,
   "max_hardware_percent": 0.0,
   "mean_sum_percent": 0.0,
   "mean_hardware_percent": 0.0
  },
  "descendants": {
   "max_sum_percent": 527.1,
   "max_hardware_percent": 43.925000000000004,
   "mean_sum_percent": 130.81538461538463,
   "mean_hardware_percent": 10.90128205128205
  },
  "combined": {
   "max_sum_percent": 527.1,
   "max_hardware_percent": 43.925000000000004,
   "mean_sum_percent": 130.81538461538463,
   "mean_hardware_percent": 10.90128205128205
  },
  "main_n_threads": 1,
  "descendants_n_threads": 12,
  "combined_n_threads": 13
 },
 "gpu_utilization": {
  "system_gpu_count": 1,
  "n_expected_gpus": 1,
  "gpu_percentages": {
   "max_sum_percent": 5.0,
   "max_hardware_percent": 5.0,
   "mean_sum_percent": 0.38461538461538464,
   "mean_hardware_percent": 0.38461538461538464
  }
 },
 "compute_time": {
  "unit": "hours",
  "time": 0.0010899075534608628
 }
}
$ gpu-tracker -e 'bash example-script.sh' -f json -o out.json --st=0.3
Resource tracking complete. Process completed with status code: 0
$ cat out.json
{
 "max_ram": {
  "unit": "gigabytes",
  "system_capacity": 67.2541696,
  "system": 2.325712896,
  "main": {
   "total_rss": 0.0031088640000000002,
   "private_rss": 0.00031948800000000004,
   "shared_rss": 0.002789376
  },
  "descendants": {
   "total_rss": 0.822874112,
   "private_rss": 0.705110016,
   "shared_rss": 0.11776409600000001
  },
  "combined": {
   "total_rss": 0.824086528,
   "private_rss": 0.705429504,
   "shared_rss": 0.11865702400000001
  }
 },
 "max_gpu_ram": {
  "unit": "gigabytes",
  "system_capacity": 16.376,
  "system": 1.235,
  "main": 0.0,
  "descendants": 0.392,
  "combined": 0.392
 },
 "cpu_utilization": {
  "system_core_count": 12,
  "n_expected_cores": 12,
  "system": {
   "max_sum_percent": 332.1,
   "max_hardware_percent": 27.675,
   "mean_sum_percent": 166.07,
   "mean_hardware_percent": 13.839166666666666
  },
  "main": {
   "max_sum_percent": 0.0,
   "max_hardware_percent": 0.0,
   "mean_sum_percent": 0.0,
   "mean_hardware_percent": 0.0
  },
  "descendants": {
   "max_sum_percent": 104.1,
   "max_hardware_percent": 8.674999999999999,
   "mean_sum_percent": 99.77000000000001,
   "mean_hardware_percent": 8.314166666666665
  },
  "combined": {
   "max_sum_percent": 104.1,
   "max_hardware_percent": 8.674999999999999,
   "mean_sum_percent": 99.77000000000001,
   "mean_hardware_percent": 8.314166666666665
  },
  "main_n_threads": 1,
  "descendants_n_threads": 12,
  "combined_n_threads": 13
 },
 "gpu_utilization": {
  "system_gpu_count": 1,
  "n_expected_gpus": 1,
  "gpu_percentages": {
   "max_sum_percent": 5.0,
   "max_hardware_percent": 5.0,
   "mean_sum_percent": 0.5,
   "mean_hardware_percent": 0.5
  }
 },
 "compute_time": {
  "unit": "hours",
  "time": 0.0010636144214206272
 }
}
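
Because the JSON report is machine-readable, it can be post-processed by a script. The sketch below is a minimal illustration that uses only the keys visible in the output above. The embedded `report_text` is an abbreviated stand-in for the contents of ‘out.json’, and `compute_time_seconds` is a hypothetical helper, not part of gpu-tracker.

```python
import json

# Abbreviated stand-in for out.json; a real report contains more keys.
# To read the real file instead: with open('out.json') as f: report = json.load(f)
report_text = """
{
 "max_ram": {"unit": "gigabytes", "combined": {"total_rss": 0.824086528}},
 "max_gpu_ram": {"unit": "gigabytes", "combined": 0.392},
 "compute_time": {"unit": "hours", "time": 0.0010636144214206272}
}
"""
report = json.loads(report_text)

# The time units accepted by the --tu option, expressed in seconds.
SECONDS_PER_UNIT = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400}


def compute_time_seconds(report: dict) -> float:
    """Convert the reported compute time to seconds, whatever its unit."""
    compute_time = report['compute_time']
    return compute_time['time'] * SECONDS_PER_UNIT[compute_time['unit']]


combined_ram = report['max_ram']['combined']['total_rss']
ram_unit = report['max_ram']['unit']
seconds = compute_time_seconds(report)
print(f"Combined max RAM: {combined_ram:.3f} {ram_unit}")
print(f"Compute time: {seconds:.3f} seconds")
```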

As an alternative to typing out the tracking configuration via commandline options, one can specify a JSON config file via the --tconfig option.

$ cat config.json
{
  "sleep_time": 0.5,
  "ram_unit": "megabytes",
  "gpu_ram_unit": "megabytes",
  "time_unit": "seconds"
}
$ gpu-tracker -e 'bash example-script.sh' --tconfig=config.json
Resource tracking complete. Process completed with status code: 0
Max RAM:
   Unit: megabytes
   System capacity: 67254.166
   System: 4511.437
   Main:
      Total RSS: 2.957
      Private RSS: 0.319
      Shared RSS: 2.638
   Descendants:
      Total RSS: 894.923
      Private RSS: 781.222
      Shared RSS: 113.701
   Combined:
      Total RSS: 896.135
      Private RSS: 781.541
      Shared RSS: 114.594
Max GPU RAM:
   Unit: megabytes
   System capacity: 16376.0
   System: 727.0
   Main: 0.0
   Descendants: 314.0
   Combined: 314.0
CPU utilization:
   System core count: 12
   Number of expected cores: 12
   System:
      Max sum percent: 259.3
      Max hardware percent: 21.608
      Mean sum percent: 160.9
      Mean hardware percent: 13.408
   Main:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
   Descendants:
      Max sum percent: 102.8
      Max hardware percent: 8.567
      Mean sum percent: 96.529
      Mean hardware percent: 8.044
   Combined:
      Max sum percent: 102.8
      Max hardware percent: 8.567
      Mean sum percent: 96.529
      Mean hardware percent: 8.044
   Main number of threads: 1
   Descendants number of threads: 12
   Combined number of threads: 13
GPU utilization:
   System GPU count: 1
   Number of expected GPUs: 1
   GPU percentages:
      Max sum percent: 0.0
      Max hardware percent: 0.0
      Mean sum percent: 0.0
      Mean hardware percent: 0.0
Compute time:
   Unit: seconds
   Time: 3.913
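
When many runs share a configuration, the config file can also be generated by a script rather than written by hand. The following sketch writes the same ‘config.json’ shown above and assembles the matching command as a string. It only builds the command and does not run it. The variable names are illustrative.

```python
import json
import shlex
from pathlib import Path

# Keys mirror the config.json shown above, i.e. key word arguments of Tracker.
tracker_config = {
    'sleep_time': 0.5,
    'ram_unit': 'megabytes',
    'gpu_ram_unit': 'megabytes',
    'time_unit': 'seconds',
}

config_path = Path('config.json')
config_path.write_text(json.dumps(tracker_config, indent=2) + '\n')

# shlex.join quotes the shell command so it is passed to -e as one argument.
command = shlex.join(
    ['gpu-tracker', '-e', 'bash example-script.sh', f'--tconfig={config_path}']
)
print(command)
```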

Sub-tracking

Basics

The sub-track subcommand introduces functionality related to sub-tracking, i.e. analyzing computational resource usage for individual code blocks rather than the entire process. This requires a tracking file and a sub-tracking file. The tracking file can be created by specifying the --tf option when profiling a process using --execute. The sub-tracking file can be created using the gpu-tracker API, i.e. the SubTracker class. If the process being profiled is not a Python script, the sub-tracking file can be generated in any programming language as long as it has the following format:

It is either a CSV or SQLite file where the headers are process_id,code_block_name,position,timestamp. The process_id column is the ID (integer) of the process where the code block was called. The code_block_name is the given name (string) of the code block to distinguish it from other code blocks being sub-tracked. The position is an integer of either the value 0 or 1, where 0 indicates the start of the code block and 1 indicates the stopping point of the code block. Finally, timestamp (float) is the timestamp when the code block either starts (where position is 0) or stops (where position is 1). Both a start timestamp and a stop timestamp must be logged for every call to the code block of interest. If using an SQLite file for more efficient querying of longer tracking sessions, the name of the table must be ‘data’.
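
As an illustration of the CSV layout described above, here is a minimal sketch of a hand-rolled logger that writes rows in that format. It is written in Python only for consistency with the rest of this tutorial, and the same rows could be produced from any language. The names `log_code_block` and ‘manual-sub-tracking.csv’ are hypothetical, and the use of seconds since the epoch for timestamp is an assumption.

```python
import csv
import os
import time
from contextlib import contextmanager

HEADERS = ['process_id', 'code_block_name', 'position', 'timestamp']
START, STOP = 0, 1


def _append_row(path: str, code_block_name: str, position: int) -> None:
    # Write the header row only when the file is new or empty.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as file:
        writer = csv.writer(file)
        if needs_header:
            writer.writerow(HEADERS)
        # Assumption: timestamps are seconds since the epoch (time.time()).
        writer.writerow([os.getpid(), code_block_name, position, time.time()])


@contextmanager
def log_code_block(path: str, code_block_name: str):
    """Log a start row on entry and a stop row on exit."""
    _append_row(path, code_block_name, START)
    try:
        yield
    finally:
        # Always log the stop so that every start has a matching stop.
        _append_row(path, code_block_name, STOP)


with log_code_block('manual-sub-tracking.csv', 'my-code-block'):
    sum(range(1_000_000))  # stand-in for the work being profiled
```

Writing the stop row in a `finally` clause keeps the start and stop rows paired even if the profiled code raises an exception.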

If sub-tracking a code block that is called in multiple processes, the sub-tracking file for that code block must be unique to each process. For convenience, the sub-track combine subcommand combines these into a single sub-tracking file that can be used for downstream analysis. If the -p option is used only once, it is interpreted as the path to a directory containing the sub-tracking files to combine rather than as a list of files. The following example combines ‘sub-tracking1.csv’ and ‘sub-tracking2.csv’ into a single sub-tracking file named ‘combined-file.csv’.

$ gpu-tracker sub-track combine --stf=combined-file.csv -p sub-tracking1.csv -p sub-tracking2.csv
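
For CSV files, the effect of combining can be pictured as concatenating the rows of the per-process files under a single header. The sketch below illustrates that idea only; it is not the tool's implementation, the function name combine_csv is hypothetical, and whether the subcommand reorders rows is not stated here.

```python
import csv

HEADERS = ["process_id", "code_block_name", "position", "timestamp"]


def combine_csv(paths, combined_path):
    """Concatenate per-process CSV sub-tracking files into one file with a single header."""
    with open(combined_path, "w", newline="") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=HEADERS)
        writer.writeheader()
        for path in paths:
            with open(path, newline="") as in_file:
                # Rows are copied in file order; no sorting or de-duplication is attempted.
                for row in csv.DictReader(in_file):
                    writer.writerow(row)
```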

Analysis

Once a tracking file and a sub-tracking file are available, the sub-track analyze subcommand can generate the sub-tracking results. These can be stored in JSON, text, or pickle format, where the pickle format is the same as the SubTrackingResults object from the API. If the --output option is specified, the content is stored at the given file path. By default, the content is printed to the screen in text format.

$ gpu-tracker sub-track analyze --tf=tracking.csv --stf=sub-tracking.csv
Overall:
                                                        min         max        mean        std
    main_ram                                       0.341860    0.944374    0.856037   0.125014
    descendants_ram                                0.000000    0.000000    0.000000   0.000000
    combined_ram                                   0.341860    0.944374    0.856037   0.125014
    system_ram                                     4.859711    5.553644    5.253445   0.134081
    main_gpu_ram                                   0.000000    0.506000    0.429920   0.170432
    descendants_gpu_ram                            0.000000    0.000000    0.000000   0.000000
    combined_gpu_ram                               0.000000    0.506000    0.429920   0.170432
    system_gpu_ram                                 0.215000    0.727000    0.650320   0.172010
    gpu_sum_utilization_percent                    0.000000    3.000000    0.120000   0.600000
    gpu_hardware_utilization_percent               0.000000    3.000000    0.120000   0.600000
    main_n_threads                                12.000000   15.000000   14.720000   0.842615
    descendants_n_threads                          0.000000    0.000000    0.000000   0.000000
    combined_n_threads                            12.000000   15.000000   14.720000   0.842615
    cpu_system_sum_utilization_percent            11.900000  133.400000  119.212000  22.741909
    cpu_system_hardware_utilization_percent        0.991667   11.116667    9.934333   1.895159
    cpu_main_sum_utilization_percent              78.000000  103.200000   96.924000   6.390767
    cpu_main_hardware_utilization_percent          6.500000    8.600000    8.077000   0.532564
    cpu_descendants_sum_utilization_percent        0.000000    0.000000    0.000000   0.000000
    cpu_descendants_hardware_utilization_percent   0.000000    0.000000    0.000000   0.000000
    cpu_combined_sum_utilization_percent          78.000000  103.200000   96.924000   6.390767
    cpu_combined_hardware_utilization_percent      6.500000    8.600000    8.077000   0.532564
Static Data:
       ram_unit gpu_ram_unit time_unit ram_system_capacity gpu_ram_system_capacity system_core_count n_expected_cores system_gpu_count n_expected_gpus
      gigabytes    gigabytes     hours           67.254166                  16.376                12               12                1               1
Code Block Results:
    Name:                my-code-block
    Num Timepoints:      12
    Num Calls:           3
    Num Non Empty Calls: 3
    Compute Time:
                   min       max      mean       std     total
              2.580433  2.789909  2.651185  0.120147  7.953554
    Resource Usage:
                                                                min         max        mean        std
            main_ram                                       0.341860    0.936559    0.808736   0.167663
            descendants_ram                                0.000000    0.000000    0.000000   0.000000
            combined_ram                                   0.341860    0.936559    0.808736   0.167663
            system_ram                                     4.859711    5.553644    5.231854   0.191567
            main_gpu_ram                                   0.000000    0.506000    0.363500   0.225892
            descendants_gpu_ram                            0.000000    0.000000    0.000000   0.000000
            combined_gpu_ram                               0.000000    0.506000    0.363500   0.225892
            system_gpu_ram                                 0.215000    0.727000    0.583250   0.228088
            gpu_sum_utilization_percent                    0.000000    0.000000    0.000000   0.000000
            gpu_hardware_utilization_percent               0.000000    0.000000    0.000000   0.000000
            main_n_threads                                12.000000   15.000000   14.416667   1.164500
            descendants_n_threads                          0.000000    0.000000    0.000000   0.000000
            combined_n_threads                            12.000000   15.000000   14.416667   1.164500
            cpu_system_sum_utilization_percent            11.900000  130.800000  113.641667  32.352363
            cpu_system_hardware_utilization_percent        0.991667   10.900000    9.470139   2.696030
            cpu_main_sum_utilization_percent              79.600000  103.100000   96.583333   6.726587
            cpu_main_hardware_utilization_percent          6.633333    8.591667    8.048611   0.560549
            cpu_descendants_sum_utilization_percent        0.000000    0.000000    0.000000   0.000000
            cpu_descendants_hardware_utilization_percent   0.000000    0.000000    0.000000   0.000000
            cpu_combined_sum_utilization_percent          79.600000  103.100000   96.583333   6.726587
            cpu_combined_hardware_utilization_percent      6.633333    8.591667    8.048611   0.560549

    Name:                my-function
    Num Timepoints:      12
    Num Calls:           3
    Num Non Empty Calls: 3
    Compute Time:
                   min       max      mean       std     total
              2.538011  2.577679  2.553176  0.021419  7.659528
    Resource Usage:
                                                                 min         max        mean       std
            main_ram                                        0.864592    0.944374    0.896998  0.034505
            descendants_ram                                 0.000000    0.000000    0.000000  0.000000
            combined_ram                                    0.864592    0.944374    0.896998  0.034505
            system_ram                                      5.203415    5.315219    5.271566  0.038751
            main_gpu_ram                                    0.314000    0.506000    0.490000  0.055426
            descendants_gpu_ram                             0.000000    0.000000    0.000000  0.000000
            combined_gpu_ram                                0.314000    0.506000    0.490000  0.055426
            system_gpu_ram                                  0.535000    0.727000    0.711000  0.055426
            gpu_sum_utilization_percent                     0.000000    3.000000    0.250000  0.866025
            gpu_hardware_utilization_percent                0.000000    3.000000    0.250000  0.866025
            main_n_threads                                 15.000000   15.000000   15.000000  0.000000
            descendants_n_threads                           0.000000    0.000000    0.000000  0.000000
            combined_n_threads                             15.000000   15.000000   15.000000  0.000000
            cpu_system_sum_utilization_percent            120.300000  133.400000  124.566667  4.001439
            cpu_system_hardware_utilization_percent        10.025000   11.116667   10.380556  0.333453
            cpu_main_sum_utilization_percent               94.700000  103.200000   98.841667  2.677332
            cpu_main_hardware_utilization_percent           7.891667    8.600000    8.236806  0.223111
            cpu_descendants_sum_utilization_percent         0.000000    0.000000    0.000000  0.000000
            cpu_descendants_hardware_utilization_percent    0.000000    0.000000    0.000000  0.000000
            cpu_combined_sum_utilization_percent           94.700000  103.200000   98.841667  2.677332
            cpu_combined_hardware_utilization_percent       7.891667    8.600000    8.236806  0.223111

The overall resource usage of the tracking session is provided as well as its static data. This is followed by the compute time and resource usage of each code block.
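
To make the Compute Time table concrete, the sketch below pairs the start and stop rows of a sub-tracking file into per-call durations and summarizes them. It is an illustration under stated assumptions rather than the package's implementation: the function names are hypothetical, rows are assumed to be already parsed into (process_id, code_block_name, position, timestamp) tuples, and calls to the same code block are assumed not to overlap within a process.

```python
import statistics
from collections import defaultdict


def call_durations(rows):
    """Pair each start row (position 0) with the following stop row (position 1)
    of the same process and code block, returning the durations per code block."""
    open_starts = {}
    durations = defaultdict(list)
    for process_id, name, position, timestamp in sorted(rows, key=lambda row: row[3]):
        key = (process_id, name)
        if position == 0:
            open_starts[key] = timestamp
        else:
            # Every stop must have a logged start, as the file format requires.
            durations[name].append(timestamp - open_starts.pop(key))
    return dict(durations)


def summarize(values):
    """Summary statistics with the same columns as the Compute Time table."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); needs at least two values.
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "total": sum(values),
    }
```

The std column in the output above is consistent with the sample standard deviation: for my-code-block, the three call durations implied by the min, max, and total (2.580433, 2.789909, and roughly 2.583212) have a sample standard deviation of about 0.120147.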

Comparison

Storing the results of the sub-tracking analysis in a pickle file allows for one tracking session to be compared to another.

$ gpu-tracker sub-track analyze --tf=tracking.csv --stf=sub-tracking.csv --format=pickle --output=my-results.pkl
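
Since the pickle format holds the same object as the API's SubTrackingResults, the file can also be loaded back into Python with the standard pickle module. This is a minimal sketch: the helper name load_results is hypothetical, no attributes of the loaded object are assumed, and the gpu_tracker package must be importable in the loading environment so that pickle can reconstruct the object.

```python
import pickle


def load_results(path):
    """Load sub-tracking results that were written with --format=pickle."""
    with open(path, "rb") as file:
        return pickle.load(file)
```

For example, load_results("my-results.pkl") would return the results object stored by the command above.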

The sub-track compare subcommand compares the computational resource usage of multiple tracking sessions. This is useful for determining how a change, such as different input data or an alternative implementation, affects the computational efficiency of your process. The -m option maps the given name of a tracking session to the file path where its sub-tracking results are stored in pickle format. For example, the following command names one tracking session ‘A’ and a second tracking session ‘B’, where the results of session ‘A’ are stored in ‘results.pkl’ and those of session ‘B’ are stored in ‘results2.pkl’.

$ gpu-tracker sub-track compare -m A=results.pkl -m B=results2.pkl
Overall Resource Usage:
    Main Ram:
                     B         A
              0.856037  0.861921
    Descendants Ram:
                A    B
              0.0  0.0
    Combined Ram:
                     B         A
              0.856037  0.861921
    System Ram:
                     B         A
              5.253445  5.281926
    Main Gpu Ram:
                    B         A
              0.42992  0.448364
    Descendants Gpu Ram:
                A    B
              0.0  0.0
    Combined Gpu Ram:
                    B         A
              0.42992  0.448364
    System Gpu Ram:
                    B         A
              0.65032  0.668909
    Gpu Sum Utilization Percent:
                A     B
              0.0  0.12
    Gpu Hardware Utilization Percent:
                A     B
              0.0  0.12
    Main N Threads:
                  B          A
              14.72  14.757576
    Descendants N Threads:
                A    B
              0.0  0.0
    Combined N Threads:
                  B          A
              14.72  14.757576
    Cpu System Sum Utilization Percent:
                    B           A
              119.212  121.918182
    Cpu System Hardware Utilization Percent:
                     B          A
              9.934333  10.159848
    Cpu Main Sum Utilization Percent:
                   B          A
              96.924  99.060606
    Cpu Main Hardware Utilization Percent:
                  B         A
              8.077  8.255051
    Cpu Descendants Sum Utilization Percent:
                A    B
              0.0  0.0
    Cpu Descendants Hardware Utilization Percent:
                A    B
              0.0  0.0
    Cpu Combined Sum Utilization Percent:
                   B          A
              96.924  99.060606
    Cpu Combined Hardware Utilization Percent:
                  B         A
              8.077  8.255051
Code Block Resource Usage:
    Main Ram:
my-code-block:
                             B         A
                      0.808736  0.846999
my-function:
                             A         B
                      0.888034  0.896998
    Descendants Ram:
my-code-block:
                        A    B
                      0.0  0.0
my-function:
                        A    B
                      0.0  0.0
    Combined Ram:
my-code-block:
                             B         A
                      0.808736  0.846999
my-function:
                             A         B
                      0.888034  0.896998
    System Ram:
my-code-block:
                             A         B
                      5.170665  5.231854
my-function:
                             B         A
                      5.271566  5.476632
    Main Gpu Ram:
my-code-block:
                           B         A
                      0.3635  0.415429
my-function:
                         B      A
                      0.49  0.506
    Descendants Gpu Ram:
my-code-block:
                        A    B
                      0.0  0.0
my-function:
                        A    B
                      0.0  0.0
    Combined Gpu Ram:
my-code-block:
                           B         A
                      0.3635  0.415429
my-function:
                         B      A
                      0.49  0.506
    System Gpu Ram:
my-code-block:
                            B         A
                      0.58325  0.635714
my-function:
                          B      A
                      0.711  0.727
    Gpu Sum Utilization Percent:
my-code-block:
                        A    B
                      0.0  0.0
my-function:
                        A     B
                      0.0  0.25
    Gpu Hardware Utilization Percent:
my-code-block:
                        A    B
                      0.0  0.0
my-function:
                        A     B
                      0.0  0.25
    Main N Threads:
my-code-block:
                              B          A
                      14.416667  14.619048
my-function:
                         A     B
                      15.0  15.0
    Descendants N Threads:
my-code-block:
                        A    B
                      0.0  0.0
my-function:
                        A    B
                      0.0  0.0
    Combined N Threads:
my-code-block:
                              B          A
                      14.416667  14.619048
my-function:
                         A     B
                      15.0  15.0
    Cpu System Sum Utilization Percent:
my-code-block:
                               B           A
                      113.641667  120.142857
my-function:
                               B        A
                      124.566667  125.025
    Cpu System Hardware Utilization Percent:
my-code-block:
                             B          A
                      9.470139  10.011905
my-function:
                              B         A
                      10.380556  10.41875
    Cpu Main Sum Utilization Percent:
my-code-block:
                              B          A
                      96.583333  98.652381
my-function:
                              B       A
                      98.841667  99.775
    Cpu Main Hardware Utilization Percent:
my-code-block:
                             B         A
                      8.048611  8.221032
my-function:
                             B         A
                      8.236806  8.314583
    Cpu Descendants Sum Utilization Percent:
my-code-block:
                        A    B
                      0.0  0.0
my-function:
                        A    B
                      0.0  0.0
    Cpu Descendants Hardware Utilization Percent:
my-code-block:
                        A    B
                      0.0  0.0
my-function:
                        A    B
                      0.0  0.0
    Cpu Combined Sum Utilization Percent:
my-code-block:
                              B          A
                      96.583333  98.652381
my-function:
                              B       A
                      98.841667  99.775
    Cpu Combined Hardware Utilization Percent:
my-code-block:
                             B         A
                      8.048611  8.221032
my-function:
                             B         A
                      8.236806  8.314583
Code Block Compute Time:
my-code-block:
                     B        A
              2.651185  2.68558
my-function:
                     B         A
              2.553176  2.559218

Both the overall usage and the usage per code block are compared. The default format is text and the default output is printing to the console. The --format and --output options can be configured as in the sub-track analyze subcommand. By default, the ‘mean’ of the measurements is compared. Alternatively, the --stat option can be set to ‘min’, ‘max’, or ‘std’ to compare a different summary statistic.