CLI

Note: Many KEGG entry IDs contain colons and kegg_pull saves KEGG entry files with their ID in the file name. When running on Windows, all file names with colons will have their colons replaced with underscores.
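
To make the note above concrete, here is a minimal Python sketch of the renaming rule. The helper name expected_file_stem is hypothetical and is not part of kegg_pull; the file extension is left out because it is not stated here.

```python
import sys
from typing import Optional


def expected_file_stem(entry_id: str, on_windows: Optional[bool] = None) -> str:
    """Return the entry ID as it would appear in a saved file name (hypothetical helper)."""
    if on_windows is None:
        # Detect the platform only when the caller does not specify it.
        on_windows = sys.platform.startswith("win")
    # On Windows, colons in the ID are replaced with underscores.
    return entry_id.replace(":", "_") if on_windows else entry_id


print(expected_file_stem("cpd:C00001", on_windows=True))
print(expected_file_stem("cpd:C00001", on_windows=False))
```

A script that looks up saved entries by ID can apply the same replacement so that it finds the files on either platform.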

kegg_pull Commandline Interface

Top-level commandline interface.

Usage:
    kegg_pull -h | --help           Show this help message.
    kegg_pull -v | --version        Show the package version.
    kegg_pull --full-help           Show the help message of all sub commands.
    kegg_pull pull ...              Pull, separate, and store an arbitrary number of KEGG entries to the local file system.
    kegg_pull entry-ids ...         Obtain a list of KEGG entry IDs.
    kegg_pull map ...               Obtain a mapping of entry IDs (KEGG or outside databases) to the IDs of related entries.
    kegg_pull pathway-organizer ... Create a flattened version of a pathways Brite hierarchy.
    kegg_pull rest ...              Execute one of the KEGG REST API operations.

Pulling, Parsing, and Saving KEGG Entries

Provides commandline functionality for pulling KEGG entries from the KEGG REST API, parsing them, and saving the entries as files.

A JSON file, called pull-results.json, is saved, describing the results of the pull. Below is the interpretation of each of the fields:

percent-success: The percentage of the requested entries that were successfully pulled and saved in a file.

pull-minutes: The number of minutes that the pull took to complete.

num-successful: The number of entries that were successfully pulled and saved in a file.

num-failed: The number of entries that failed to be pulled.

num-timed-out: The number of entries that timed out when requested.

num-total: The number of total entry IDs requested.

successful-entry-ids: The list of successful entry IDs.

failed-entry-ids: The list of failed entry IDs.

timed-out-entry-ids: The list of timed out entry IDs.
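
As an illustration of how the fields above might be consumed, the following Python sketch parses a pull-results.json document and collects the unsuccessful IDs. The JSON content is made up for the example, and the check that the three outcome counts add up to num-total is an assumption rather than something stated above.

```python
import json

# Hypothetical pull-results.json content; the numbers and IDs are made up.
EXAMPLE = """
{
  "percent-success": 50.0,
  "pull-minutes": 0.1,
  "num-successful": 2,
  "num-failed": 1,
  "num-timed-out": 1,
  "num-total": 4,
  "successful-entry-ids": ["cpd:C00001", "cpd:C00002"],
  "failed-entry-ids": ["cpd:C00003"],
  "timed-out-entry-ids": ["cpd:C00004"]
}
"""


def unsuccessful_ids(results: dict) -> list:
    """Collect the entry IDs that failed or timed out, e.g. for a later retry."""
    return results["failed-entry-ids"] + results["timed-out-entry-ids"]


def counts_are_consistent(results: dict) -> bool:
    """Assumption: successful + failed + timed-out equals num-total."""
    outcome_total = (results["num-successful"] + results["num-failed"]
                     + results["num-timed-out"])
    return outcome_total == results["num-total"]


results = json.loads(EXAMPLE)
print(unsuccessful_ids(results))
print(counts_are_consistent(results))
```

In practice the string would come from reading the pull-results.json file written by the pull.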

If the --unsuccessful-threshold option is set and surpassed, an aborted-pull-results.json file is instead output with the following fields:

num-remaining-entry-ids: The number of requested entries remaining after the process aborted. The process aborted before kegg_pull could even try to pull these entries.

num-successful: The number of entries that were successfully pulled before the process aborted.

num-failed: The number of entries that failed by the time the process aborted.

num-timed-out: The number of entries that timed out by the time the process aborted.

remaining-entry-ids: The IDs of the remaining entries.

successful-entry-ids: The IDs of the successful entries.

failed-entry-ids: The IDs of the failed entries.

timed-out-entry-ids: The IDs of the timed out entries.
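
One possible use of aborted-pull-results.json is to resume a pull. The Python sketch below turns the remaining (and optionally the unsuccessful) IDs into one-ID-per-line text, which is the standard input format that "kegg_pull pull entry-ids -" reads. The helper name retry_input and the example data are hypothetical.

```python
# Hypothetical aborted-pull-results.json content, already parsed into a dict.
ABORTED = {
    "num-remaining-entry-ids": 2,
    "num-successful": 1,
    "num-failed": 1,
    "num-timed-out": 0,
    "remaining-entry-ids": ["cpd:C00003", "cpd:C00004"],
    "successful-entry-ids": ["cpd:C00001"],
    "failed-entry-ids": ["cpd:C00002"],
    "timed-out-entry-ids": [],
}


def retry_input(aborted: dict, include_unsuccessful: bool = True) -> str:
    """Build standard input text (one entry ID per line) for a follow-up pull."""
    ids = list(aborted["remaining-entry-ids"])
    if include_unsuccessful:
        # Optionally retry the entries that failed or timed out as well.
        ids += aborted["failed-entry-ids"] + aborted["timed-out-entry-ids"]
    return "\n".join(ids) + "\n"


print(retry_input(ABORTED), end="")
```

The resulting text could be written to a file and piped in, e.g. cat retry.txt | kegg_pull pull entry-ids - ...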

Usage:
    kegg_pull pull -h | --help
    kegg_pull pull database <database> [--force-single-entry] [--multi-process] [--n-workers=<n-workers>] [--output=<output>] [--print] [--sep=<print-separator>] [--entry-field=<entry-field>] [--n-tries=<n-tries>] [--time-out=<time-out>] [--sleep-time=<sleep-time>] [--ut=<unsuccessful-threshold>]
    kegg_pull pull entry-ids <entry-ids> [--force-single-entry] [--multi-process] [--n-workers=<n-workers>] [--output=<output>] [--print] [--sep=<print-separator>] [--entry-field=<entry-field>] [--n-tries=<n-tries>] [--time-out=<time-out>] [--sleep-time=<sleep-time>] [--ut=<unsuccessful-threshold>]

Options:
    -h --help                       Show this help message.
    database                        Pulls all the entries in a KEGG database.
    <database>                      The KEGG database from which to pull entries.
    --force-single-entry            Forces pulling only one entry at a time for every request to the KEGG web API. This flag is automatically set if <database> is "brite".
    --multi-process                 If set, the entries are pulled across multiple processes to increase speed. Otherwise, the entries are pulled sequentially in a single process.
    --n-workers=<n-workers>         The number of sub-processes to create when pulling. Defaults to the number of cores available. Ignored if --multi-process is not set.
    --output=<output>               The directory where the pulled KEGG entries will be stored. Defaults to the current working directory. If the path ends in ".zip", entries are saved to a ZIP archive instead of a directory. Ignored if --print is set.
    --print                         If set, prints the entries to the screen rather than saving them to the file system. Separates entries by the --sep option if set.
    --sep=<print-separator>         The string that separates the entries which are printed to the screen when the --print option is set. Ignored if the --print option is not set. Defaults to printing the entry id, followed by the entry, followed by a newline.
    --entry-field=<entry-field>     Optional field to extract from the entries pulled rather than the standard flat file format (or "htext" in the case of brite entries).
    --n-tries=<n-tries>             The number of times to attempt a KEGG request before marking it as timed out or failed. Defaults to 3.
    --time-out=<time-out>           The number of seconds to wait for a KEGG request before marking it as timed out. Defaults to 60.
    --sleep-time=<sleep-time>       The amount of time to wait after a KEGG request times out (or is potentially blacklisted, as indicated by a 403 error code) before attempting it again. Defaults to 5.0.
    --ut=<unsuccessful-threshold>   If set, the ratio of unsuccessful entry IDs (failed or timed out) to total entry IDs at which kegg_pull quits. Valid values are between 0.0 and 1.0 non-inclusive.
    entry-ids                       Pulls entries specified by a comma separated list or read from standard input, one entry ID per line; press CTRL+D to finalize input or pipe (e.g. cat file.txt | kegg_pull pull entry-ids - ...).
    <entry-ids>                     Comma separated list of entry IDs to pull (e.g. id1,id2,id3 etc.). Or if equal to "-", entry IDs are read from standard input. Will likely need to set --force-single-entry if any of the entries are from the brite database.
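
The options above can be combined from a script. The following Python sketch assembles a "kegg_pull pull entry-ids" command line and checks the --ut range before anything is run. The function build_pull_command is a hypothetical helper, not part of kegg_pull; it only uses flags listed above and does not execute the command.

```python
import shlex
from typing import List, Optional


def build_pull_command(entry_ids: List[str],
                       output: Optional[str] = None,
                       multi_process: bool = False,
                       n_workers: Optional[int] = None,
                       unsuccessful_threshold: Optional[float] = None) -> List[str]:
    """Assemble an argument list for "kegg_pull pull entry-ids" (hypothetical helper)."""
    if not entry_ids:
        raise ValueError("at least one entry ID is required")
    command = ["kegg_pull", "pull", "entry-ids", ",".join(entry_ids)]
    if multi_process:
        command.append("--multi-process")
        # --n-workers is ignored unless --multi-process is set, so only add it here.
        if n_workers is not None:
            command.append(f"--n-workers={n_workers}")
    if output is not None:
        command.append(f"--output={output}")
    if unsuccessful_threshold is not None:
        # Valid values lie between 0.0 and 1.0, non-inclusive.
        if not 0.0 < unsuccessful_threshold < 1.0:
            raise ValueError("--ut must be strictly between 0.0 and 1.0")
        command.append(f"--ut={unsuccessful_threshold}")
    return command


command = build_pull_command(["cpd:C00001", "cpd:C00002"], output="entries.zip",
                             multi_process=True, n_workers=2,
                             unsuccessful_threshold=0.5)
print(shlex.join(command))
```

The returned list can be passed to subprocess.run as-is; keeping it as a list avoids shell quoting issues.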

Pulling Lists of KEGG Entry IDs

Provides commandline functionality for pulling lists of KEGG entry IDs from the KEGG REST API.

Usage:
    kegg_pull entry-ids -h | --help
    kegg_pull entry-ids database <database> [--output=<output>]
    kegg_pull entry-ids keywords <database> <keywords> [--output=<output>]
    kegg_pull entry-ids molec-attr <database> (--formula=<formula>|--em=<exact-mass>...|--mw=<molecular-weight>...) [--output=<output>]

Options:
    -h --help               Show this help message.
    database                Pulls all the entry IDs within a given database.
    <database>              The KEGG database from which to pull a list of entry IDs.
    --output=<output>       Path to the file (either in a directory or ZIP archive) to store the output (1 entry ID per line). Prints to the console if not specified. If a ZIP archive, the file path must be in the form of /path/to/zip-archive.zip:/path/to/file (e.g. ./archive.zip:file.txt).
    keywords                Searches for entries within a database based on provided keywords.
    <keywords>              Comma separated list of keywords to search entries with (e.g. kw1,kw2,kw3 etc.). Or if equal to "-", keywords are read from standard input, one keyword per line; press CTRL+D to finalize input or pipe (e.g. cat file.txt | kegg_pull entry-ids keywords brite - ...).
    molec-attr              Searches a database of molecule-type KEGG entries by molecular attributes.
    --formula=<formula>     Sequence of atoms in a chemical formula format to search for (e.g. "O5C7" searches for molecule entries containing 5 oxygen atoms and/or 7 carbon atoms).
    --em=<exact-mass>       Either a single number (e.g. "--em=155.5") or two numbers (e.g. "--em=155.5 --em=244.4"). If a single number, searches for molecule entries with an exact mass equal to that value rounded by the last decimal point. If two numbers, searches for molecule entries with an exact mass within the two values (a range).
    --mw=<molecular-weight> Same as "--em=<exact-mass>" but searches based on the molecular weight.
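
The single-number form of --em and --mw can be read as a small range search. The Python sketch below assumes that "rounded by the last decimal point" means the usual half-open rounding interval around the given value (for example, 155.5 would match values from 155.45 up to but not including 155.55). That interpretation is an assumption, and the helper names are hypothetical.

```python
from decimal import Decimal
from typing import Tuple


def single_value_range(value: str) -> Tuple[Decimal, Decimal]:
    """Return the (lower, upper) bounds implied by a single --em/--mw value.

    Assumption: matching masses round to the given value at its last decimal place.
    """
    number = Decimal(value)
    exponent = number.as_tuple().exponent      # e.g. -1 for "155.5"
    half_step = Decimal(5).scaleb(exponent - 1)  # e.g. 0.05 for "155.5"
    return number - half_step, number + half_step


def matches(mass: str, value: str) -> bool:
    """Check whether a mass falls in the half-open interval for the given value."""
    lower, upper = single_value_range(value)
    return lower <= Decimal(mass) < upper


print(single_value_range("155.5"))
print(matches("155.52", "155.5"))
```

Passing the value as a string keeps the number of decimal places the user typed, which a float would lose.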

Constructing Mappings From KEGG “link” And “conv” Operations

Provides commandline functionality for converting the output from the KEGG “link” or “conv” REST operations into mappings of the entry IDs from one database to the IDs of related entries.

Usage:
    kegg_pull map -h | --help
    kegg_pull map conv <kegg-database> <outside-database> [--reverse] [--output=<output>]
    kegg_pull map link <source-database> <target-database> [--deduplicate] [--add-glycans] [--add-drugs] [--output=<output>]
    kegg_pull map (link|conv) entry-ids <entry-ids> <target-database> [--reverse] [--output=<output>]
    kegg_pull map link <source-database> <intermediate-database> <target-database> [--deduplicate] [--add-glycans] [--add-drugs] [--output=<output>]

Options:
    -h --help               Show this help message.
    conv                    Converts the output of the KEGG "conv" operation into a JSON mapping.
    <kegg-database>         The name of the KEGG database with entry IDs mapped to the outside database.
    <outside-database>      The name of the outside database with entry IDs mapped from the KEGG database.
    --reverse               Reverses the mapping with the target becoming the source and the source becoming the target.
    --output=<output>       The location (either a directory or ZIP archive) of the JSON file to store the mapping. If not set, prints a JSON representation of the mapping to the console. If a ZIP archive, the file path must be in the form of /path/to/zip-archive.zip:/path/to/file (e.g. ./archive.zip:mapping.json).
    link                    Converts the output of the KEGG "link" operation into a JSON mapping.
    <source-database>       The name of the database with entry IDs mapped to the target database.
    <target-database>       The name of the database with entry IDs mapped from the source database.
    --deduplicate           Some mappings including pathway entry IDs result in half beginning with the normal "path:map" prefix but the other half with a different prefix. If set, removes the IDs corresponding to identical entries but with a different prefix. Raises an exception if neither the source nor the target database is "pathway".
    --add-glycans           Whether to add the corresponding compound IDs of equivalent glycan entries. Logs a warning if neither the source nor the target database is "compound".
    --add-drugs             Whether to add the corresponding compound IDs of equivalent drug entries. Logs a warning if neither the source nor the target database is "compound".
    entry-ids               Create a mapping to a target database from a list of specific entry IDs.
    <entry-ids>             Comma separated list of entry IDs (e.g. Id1,Id2,Id3 etc.). Or if equal to "-", entry IDs are read from standard input, one entry ID per line; press CTRL+D to finalize input or pipe (e.g. cat file.txt | kegg_pull map link entry-ids - drug ...).
    <intermediate-database> The name of an intermediate KEGG database with which to find cross-references to cross-references e.g. "kegg_pull map link ko reaction compound" creates a mapping from ko-to-compound via ko-to-reaction cross-references connected to reaction-to-compound cross-references.
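
The idea behind <intermediate-database> can be sketched as composing two mappings. The Python example below is an illustration of that idea only, not kegg_pull's implementation; it assumes a mapping is a dictionary from an entry ID to a set of related IDs, and the IDs shown are placeholders rather than real KEGG entries.

```python
from typing import Dict, Set

Mapping = Dict[str, Set[str]]


def compose(source_to_mid: Mapping, mid_to_target: Mapping) -> Mapping:
    """Connect source-to-intermediate and intermediate-to-target cross-references."""
    composed: Mapping = {}
    for source_id, mid_ids in source_to_mid.items():
        targets: Set[str] = set()
        for mid_id in mid_ids:
            # Follow each intermediate ID to its targets, if it has any.
            targets |= mid_to_target.get(mid_id, set())
        if targets:
            composed[source_id] = targets
    return composed


# Placeholder IDs, standing in for ko-to-reaction and reaction-to-compound mappings.
ko_to_reaction = {"ko:K1": {"rn:R1", "rn:R2"}, "ko:K2": {"rn:R3"}}
reaction_to_compound = {"rn:R1": {"cpd:C1"}, "rn:R2": {"cpd:C1", "cpd:C2"}}

print(compose(ko_to_reaction, reaction_to_compound))
```

In this sketch, a source ID whose intermediate IDs lead nowhere is omitted from the result; whether kegg_pull does the same is not stated here.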

Flattening A Pathways Brite Hierarchy

Provides commandline functionality for flattening a pathways Brite hierarchy (ID: ‘br:br08901’) into a collection of its nodes, mapping a node ID to information about it, enabling combinations with other KEGG data.

Usage:
    kegg_pull pathway-organizer [--tln=<top-level-nodes>] [--fn=<filter-nodes>] [--output=<output>]

Options:
    -h --help               Show this help message.
    --tln=<top-level-nodes> Node names in the highest level of the hierarchy to select from. If not set, all top level nodes are traversed to create the mapping of node key to node info. Either a comma separated list (e.g. node1,node2,node3 etc.) or if equal to "-", read from standard input one node per line; Press CTRL+D to finalize input or pipe (e.g. cat nodes.txt | kegg_pull pathway-organizer --tln=- ...). If both "--tln" and "--fn" are set as "-", one of the lines must be the delimiter "---" without quotes in order to distinguish the input, with the top level nodes first and filter nodes second.
    --fn=<filter-nodes>     Names (not keys) of nodes to exclude from the mapping of node key to node info. Neither these nodes nor any of their children will be included. If not set, no nodes will be excluded. Either a comma separated list (e.g. node1,node2,node3 etc.) or if equal to "-", read from standard input one node per line; Press CTRL+D to finalize input or pipe (e.g. cat nodes.txt | kegg_pull pathway-organizer --fn=- ...). If both "--tln" and "--fn" are set as "-", one of the lines must be the delimiter "---" without quotes in order to distinguish the input, with the top level nodes first and filter nodes second.
    --output=<output>       The file to store the flattened Brite hierarchy as a JSON structure with node keys mapping to node info, either a JSON file or ZIP archive. Prints to the console if not set. If saving to a ZIP archive, the file path must be in the form of /path/to/zip-archive.zip:/path/to/file (e.g. ./archive.zip:mapping.json).
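
When both --tln and --fn are set to "-", the standard input must carry both lists separated by a "---" line. The Python sketch below builds that input; the helper name organizer_stdin is hypothetical and the node names are only illustrative.

```python
from typing import Iterable


def organizer_stdin(top_level_nodes: Iterable[str], filter_nodes: Iterable[str]) -> str:
    """Build standard input for "kegg_pull pathway-organizer --tln=- --fn=-"."""
    # Top level nodes come first, then the "---" delimiter, then the filter nodes.
    lines = list(top_level_nodes) + ["---"] + list(filter_nodes)
    return "\n".join(lines) + "\n"


# Illustrative node names; substitute the names from the hierarchy you are using.
text = organizer_stdin(["Metabolism"], ["Global and overview maps"])
print(text, end="")
```

The text could then be supplied to the command, for example via the input argument of subprocess.run with text=True.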

KEGG REST API Operations

Provides commandline functionality for accessing the KEGG REST API including all its operations.

Usage:
    kegg_pull rest -h | --help
    kegg_pull rest info <database> [--test] [--output=<output>]
    kegg_pull rest list <database> [--test] [--output=<output>]
    kegg_pull rest get <entry-ids> [--entry-field=<entry-field>] [--test] [--output=<output>]
    kegg_pull rest find <database> <keywords> [--test] [--output=<output>]
    kegg_pull rest find <database> (--formula=<formula>|--em=<exact-mass>...|--mw=<molecular-weight>...) [--test] [--output=<output>]
    kegg_pull rest conv <kegg-database> <outside-database> [--test] [--output=<output>]
    kegg_pull rest conv entry-ids <entry-ids> <target-database> [--test] [--output=<output>]
    kegg_pull rest link <target-database> <source-database> [--test] [--output=<output>]
    kegg_pull rest link entry-ids <entry-ids> <target-database> [--test] [--output=<output>]
    kegg_pull rest ddi <drug-entry-ids> [--test] [--output=<output>]

Options:
    -h --help                   Show this help message.
    info                        Executes the "info" KEGG API operation, pulling information about a KEGG database.
    <database>                  The name of the database to pull information about or entry IDs from.
    --test                      If set, tests the request to ensure it works rather than sending it. Prints True if the request would succeed and False if the request would fail. The --output option is ignored if it is set along with --test.
    --output=<output>           Path to the file (either in a directory or ZIP archive) to store the response body from the KEGG web API operation. Prints to the console if not specified. If a ZIP archive, the file path must be in the form of /path/to/zip-archive.zip:/path/to/file (e.g. ./archive.zip:file.txt).
    list                        Executes the "list" KEGG API operation, pulling the entry IDs of the provided database.
    get                         Executes the "get" KEGG API operation, pulling the entries of the provided entry IDs.
    <entry-ids>                 Comma separated list of entry IDs (e.g. id1,id2,id3 etc.). Or if equal to "-", entry IDs are read from standard input, one entry ID per line; Press CTRL+D to finalize input or pipe (e.g. cat file.txt | kegg_pull rest get - ...).
    --entry-field=<entry-field> Optional field to extract from an entry instead of the default entry info (i.e. flat file or htext in the case of brite entries).
    find                        Executes the "find" KEGG API operation, finding entry IDs based on provided queries.
    <keywords>                  Comma separated list of keywords to search entries with (e.g. kw1,kw2,kw3 etc.). Or if equal to "-", keywords are read from standard input, one keyword per line; Press CTRL+D to finalize input or pipe (e.g. cat file.txt | kegg_pull rest find brite - ...).
    --formula=<formula>         Sequence of atoms in a chemical formula format to search for (e.g. "O5C7" searches for molecule entries containing 5 oxygen atoms and/or 7 carbon atoms).
    --em=<exact-mass>           Either a single number (e.g. --em=155.5) or two numbers (e.g. --em=155.5 --em=244.4). If a single number, searches for molecule entries with an exact mass equal to that value rounded by the last decimal point. If two numbers, searches for molecule entries with an exact mass within the two values (a range).
    --mw=<molecular-weight>     Same as --em but searches based on the molecular weight.
    conv                        Executes the "conv" KEGG API operation, converting entry IDs from an outside database to those of a KEGG database and vice versa.
    <kegg-database>             The name of the KEGG database from which to view equivalent outside database entry IDs.
    <outside-database>          The name of the non-KEGG database from which to view equivalent KEGG database entry IDs.
    entry-ids                   Perform the "conv" or "link" operation of the form that maps specific provided entry IDs to a target database.
    link                        Executes the "link" KEGG API operation, showing the IDs of entries that are connected/related to entries of other databases.
    <target-database>           The name of the database that the entry IDs of the source database or provided entry IDs are mapped to.
    <source-database>           The name of the database from which cross-references are found in the target database.
    ddi                         Executes the "ddi" KEGG API operation, searching for drug-drug interactions. Providing one entry ID reports all known interactions, while providing multiple checks if any drug pair in a given set of drugs is CI (contraindication) or P (precaution). If providing multiple, all entries must belong to the same database.
    <drug-entry-ids>            Comma separated list of drug entry IDs from the following databases: drug, ndc, or yj (e.g. id1,id2,id3 etc.). Or if equal to "-", entry IDs are read from standard input, one entry ID per line; Press CTRL+D to finalize input or pipe (e.g. cat file.txt | kegg_pull rest ddi - ...).
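
Several --output options above accept a ZIP location of the form /path/to/zip-archive.zip:/path/to/file. The Python sketch below shows one way a script could split such a value into the archive path and the file inside it. It is a hypothetical helper and does not claim to match how kegg_pull parses the option.

```python
from typing import Optional, Tuple


def split_output(output: str) -> Tuple[str, Optional[str]]:
    """Split an --output value into (path, file inside ZIP or None)."""
    marker = ".zip:"
    index = output.find(marker)
    if index == -1:
        # No ZIP archive involved: the value is an ordinary file path.
        return output, None
    zip_path = output[: index + len(".zip")]
    member = output[index + len(marker):]
    return zip_path, member


print(split_output("./archive.zip:file.txt"))
print(split_output("results/file.txt"))
```

Searching for ".zip:" rather than the first colon keeps the sketch from misreading other colons in a path, such as a Windows drive letter.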