CLI Reference

The recount3 command-line tool implements a discover -> manifest -> materialize workflow for the recount3 data mirror.

Summary

Use recount3 to:

  • ids - Emit unique sample and project IDs.

  • search - Discover resources and print a machine-readable manifest (JSONL or TSV).

  • download - Materialize resources from a manifest (dir or .zip).

  • bundle - Operate on multiple resources (e.g., stack count matrices).

  • smoke-test - Small connectivity test for CI / local validation.

Quick start

Discover a handful of gene-level count files, save a manifest, and download:

$ recount3 search gene-exon \
    organism=human data_source=sra genomic_unit=gene project=SRP012345 \
    --format=jsonl > manifest.jsonl

$ recount3 download --from=manifest.jsonl --dest=./downloads --jobs=8

Or stream directly, without an intermediate file:

$ recount3 search annotations \
    organism=human genomic_unit=gene annotation_extension=G026 \
    --format=jsonl | \
    recount3 download --from=- --dest=./annots
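The streaming pipeline above can also be driven from a script. The following is a minimal sketch, assuming the recount3 executable is on PATH. The helper names build_pipeline and run_pipeline are illustrative and not part of recount3; only flags shown in this section are used.

```python
import subprocess


def build_pipeline(filters, dest, mode="annotations"):
    """Return (search_argv, download_argv) for a search | download pipeline.

    Illustrative helper; not part of recount3.
    """
    search = ["recount3", "search", mode]
    search += [f"{key}={value}" for key, value in filters.items()]
    search.append("--format=jsonl")
    download = ["recount3", "download", "--from=-", f"--dest={dest}"]
    return search, download


def run_pipeline(filters, dest, mode="annotations"):
    """Pipe search output into download and return both exit codes."""
    search_argv, download_argv = build_pipeline(filters, dest, mode)
    search = subprocess.Popen(search_argv, stdout=subprocess.PIPE)
    download = subprocess.Popen(download_argv, stdin=search.stdout)
    # Close our copy of the pipe so search sees a broken pipe if download exits.
    search.stdout.close()
    download_rc = download.wait()
    search_rc = search.wait()
    return search_rc, download_rc


# Build (but do not run) the commands from the example above.
search_argv, download_argv = build_pipeline(
    {"organism": "human", "genomic_unit": "gene", "annotation_extension": "G026"},
    dest="./annots",
)
print(" ".join(search_argv))
print(" ".join(download_argv))
```

Calling run_pipeline with the same arguments would start both processes; checking both return codes matters because a failed search can otherwise look like an empty but successful download.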

Commands

ids

Emit unique ID lists. By default prints to stdout.

Flags:

--organism=human|mouse|""

Empty means all organisms.

--samples-out=<file>

Write samples to plain text file (else stdout).

--projects-out=<file>

Write projects to plain text file (else stdout).

search

Discover resources and print a manifest (JSONL or TSV). Filters are passed as space-separated key=value tokens.

Output:

By default results are written to stdout (pipe-friendly). Use --output <file> to write a specific file, or --outdir <dir> to create a timestamped filename in that directory.

Modes and required filters:

annotations   organism, genomic_unit, annotation_extension
gene-exon     organism, data_source, genomic_unit, project
              (optional: annotation_extension; default G026)
junctions     organism, data_source, project
              (optional: junction_type=ALL, junction_extension=MM)
metadata      organism, data_source, table_name, project
bigwig        organism, data_source, project, sample
project       organism, data_source, project
              (optional: genomic_unit=gene,exon;
              annotation=default|all|gencode_v26,gencode_v29
                  Human-readable name or convenience alias.
                  'default' → primary annotation (G026 for human,
                  M023 for mouse). 'all' → every available annotation.
                  A comma list of names (e.g. gencode_v26) or raw
                  extension codes (e.g. G026) also works.
              annotation_extension=G026,G029
                  Raw annotation file-extension codes. When set,
                  overrides 'annotation' completely. Use this when you
                  already know the exact code(s) you need.
              junction_type=ALL; junction_extension=MM,RR,ID;
              include_metadata=true|false; include_bigwig=true|false)
sources       organism
source-meta   organism, data_source

Example:
$ recount3 search junctions \
    organism=human data_source=sra project=SRP000000 \
    junction_type=ALL junction_extension=MM --format=tsv
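To make the key=value filter convention and the per-mode requirements concrete, here is a small sketch that parses filter tokens and reports which required filters are missing. The required-filter table is copied from the list above. The function names are hypothetical, and treating an empty value as missing is an assumption, not documented recount3 behavior.

```python
# Required filters per search mode, copied from "Modes and required filters".
REQUIRED_FILTERS = {
    "annotations": ("organism", "genomic_unit", "annotation_extension"),
    "gene-exon": ("organism", "data_source", "genomic_unit", "project"),
    "junctions": ("organism", "data_source", "project"),
    "metadata": ("organism", "data_source", "table_name", "project"),
    "bigwig": ("organism", "data_source", "project", "sample"),
    "project": ("organism", "data_source", "project"),
    "sources": ("organism",),
    "source-meta": ("organism", "data_source"),
}


def parse_filters(tokens):
    """Turn ['organism=human', ...] into a dict (hypothetical helper)."""
    filters = {}
    for token in tokens:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        filters[key] = value
    return filters


def missing_filters(mode, filters):
    """List required filters for `mode` that are absent or empty (assumption)."""
    return [name for name in REQUIRED_FILTERS[mode] if not filters.get(name)]


filters = parse_filters(["organism=human", "data_source=sra", "junction_type=ALL"])
print(missing_filters("junctions", filters))  # ['project']
```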

download

Materialize resources from a manifest file or one inline JSON object. Writes one JSONL progress event per resource to stdout.

Source:

--from=<path>|-

Read JSONL manifest from file or stdin ('-').

--inline='<json>'

One JSON object for a single resource.

Destination:
--dest=<dir-or-zip>

Directory or .zip file path.

--overwrite

Overwrite existing files (dir mode only).

Behavior:
--jobs=<n>

Max parallel downloads (default 4).

--cache=MODE

Cache behavior (default: enable). MODE is one of: enable - use cache; disable - bypass cache; update - force re-download then cache.

bundle stack-counts

Concatenate compatible count matrices (gene/exon or junctions).

Required:
--from=<manifest>

JSONL manifest (or '-' for stdin).

--out=<path>

Output file (.csv, .tsv, .tsv.gz, or .parquet).

Options:

--compat=family|feature

Compatibility mode (default: family).

--join=inner|outer

Pandas join type (default: inner).

--axis=0|1

Concatenate rows (0) or columns (1).

--verify-integrity

Fail on duplicate index after concat.
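The difference between --join=inner and --join=outer is easiest to see on a tiny example. The sketch below imitates a column-wise stack (--axis=1) with plain dictionaries instead of pandas, purely to illustrate the join semantics. The stack_columns function and the toy data are hypothetical and say nothing about how recount3 stores or combines matrices internally.

```python
def stack_columns(matrices, join="inner"):
    """Column-wise stack of {feature: {sample: count}} matrices.

    Illustrates inner vs outer join semantics only (hypothetical helper).
    """
    if join not in ("inner", "outer"):
        raise ValueError(f"unknown join: {join!r}")
    if not matrices:
        return {}
    feature_sets = [set(matrix) for matrix in matrices]
    if join == "inner":
        features = set.intersection(*feature_sets)  # features in every matrix
    else:
        features = set.union(*feature_sets)  # features in any matrix
    stacked = {}
    for feature in sorted(features):
        row = {}
        for matrix in matrices:
            samples = sorted({s for counts in matrix.values() for s in counts})
            for sample in samples:
                # Outer join: a feature absent from this matrix becomes None.
                row[sample] = matrix.get(feature, {}).get(sample)
        stacked[feature] = row
    return stacked


a = {"gene1": {"S1": 5}, "gene2": {"S1": 0}}
b = {"gene2": {"S2": 7}, "gene3": {"S2": 1}}
print(stack_columns([a, b], join="inner"))
print(stack_columns([a, b], join="outer"))
```

With the inner join only gene2 survives, because it is the only feature present in both inputs; the outer join keeps all three features and fills the gaps with None.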

smoke-test

Download a few tiny files to verify connectivity and configuration.

Options:
--dest=<dir>

Destination directory (default ./recount3-smoke).

--limit=<n>

Number of resources to attempt (default 1).

Input and output formats

JSONL (a.k.a. NDJSON)

One JSON object per line. Great for streaming, grepping, and piping.

  • Search output / Download input (manifest): Each line contains all resource description fields plus two convenience keys:

    • url: the fully qualified HTTP URL.

    • arcname: the destination path inside a .zip archive.

    Example (one line, wrapped for readability):

    {"resource_type":"gene_exon_counts",
     "organism":"human","data_source":"sra","genomic_unit":"gene",
     "project":"SRP012345","sample":"SRR999000","table_name":"gene",
     "url":"https://…/gene/SRR999000.gz",
     "arcname":"gene/SRR999000.gz"}

  • Download progress events (stdout): One event per resource:

    {"url":"…","status":"ok","dest":"/path/to/file"}
    {"url":"…","status":"skipped","dest":"/existing/file"}
    {"url":"…","status":"error","dest":null,"error":"<repr>"}
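Because download emits one JSON object per resource, its output is straightforward to post-process. The sketch below tallies progress events by status and collects the failures. It relies only on the url, status, and error keys shown above; the function name, the example.org URLs, and the error text are placeholders.

```python
import collections
import json


def summarize_events(lines):
    """Count download progress events by status and collect failures."""
    counts = collections.Counter()
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        counts[event["status"]] += 1
        if event["status"] == "error":
            failures.append((event["url"], event.get("error")))
    return counts, failures


# Placeholder events in the documented shape.
events = [
    '{"url":"https://example.org/a.gz","status":"ok","dest":"/tmp/a.gz"}',
    '{"url":"https://example.org/b.gz","status":"skipped","dest":"/tmp/b.gz"}',
    '{"url":"https://example.org/c.gz","status":"error","dest":null,"error":"TimeoutError()"}',
]
counts, failures = summarize_events(events)
print(dict(counts))
print(failures)
```

In a real pipeline the lines argument could be sys.stdin, so events are processed as they arrive rather than after the whole run finishes.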

TSV

Tab-separated text for quick human scanning or spreadsheet import. TSV is available for search --format=tsv only; download expects JSONL.

Configuration

Configuration is centralized in Config. Values come from:

  1. CLI flags (highest precedence)

  2. Environment variables

  3. Library defaults (lowest precedence)

Relevant environment variables (if set):

RECOUNT3_URL            Base URL (trailing slash added automatically)
RECOUNT3_CACHE_DIR      Directory for on-disk cache
RECOUNT3_CACHE_DISABLE  "1" disables cache, anything else enables
RECOUNT3_HTTP_TIMEOUT   HTTP timeout in seconds (int)
RECOUNT3_MAX_RETRIES    Max retry attempts for transient errors (int)
RECOUNT3_INSECURE_SSL   "1" to disable TLS verification (unsafe)
RECOUNT3_USER_AGENT     Custom HTTP User-Agent string

Global flags mirror these settings:

--base-url, --cache-dir, --timeout, --retries, --insecure-ssl, --user-agent, --chunk-size
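The precedence order (CLI flag, then environment variable, then library default) can be sketched as a small resolver. This is an illustration of the rule, not the actual Config implementation: the resolve and cache_enabled helpers are hypothetical, the default values used below are placeholders because the real defaults are not stated here, and treating an empty environment variable as unset is an assumption.

```python
import os


def resolve(cli_value, env_name, default, cast=str, environ=os.environ):
    """Return the first available value: CLI flag, env var, then default."""
    if cli_value is not None:
        return cli_value  # 1. CLI flag wins
    raw = environ.get(env_name)
    if raw is not None and raw != "":
        return cast(raw)  # 2. environment variable
    return default  # 3. library default


def cache_enabled(environ=os.environ):
    """RECOUNT3_CACHE_DISABLE: "1" disables the cache, anything else enables."""
    return environ.get("RECOUNT3_CACHE_DISABLE") != "1"


env = {"RECOUNT3_HTTP_TIMEOUT": "60", "RECOUNT3_CACHE_DISABLE": "0"}
# Defaults of 30 and 3 are placeholders, not the library's real defaults.
print(resolve(None, "RECOUNT3_HTTP_TIMEOUT", 30, cast=int, environ=env))  # 60
print(resolve(10, "RECOUNT3_HTTP_TIMEOUT", 30, cast=int, environ=env))    # 10
print(resolve(None, "RECOUNT3_MAX_RETRIES", 3, cast=int, environ=env))    # 3
print(cache_enabled(env))  # True
```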

Logging

Logging defaults to INFO. Use --quiet for WARNING or --verbose for DEBUG. Log messages use %-style pattern strings with separate arguments (not f-strings), per the Google Python Style Guide, and include greppable context (e.g., url=..., dest=...).
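As an illustration of the pattern-string style, the sketch below logs a message with greppable url= and dest= context using the standard logging module. The logger name, URL, and path are placeholders, and the message text is not taken from recount3's actual log output.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("recount3.example")  # illustrative logger name
logger.setLevel(logging.INFO)  # INFO is the documented default level
logger.addHandler(handler)
logger.propagate = False

url = "https://example.org/file.gz"  # placeholder
dest = "/tmp/file.gz"  # placeholder

# Pattern string plus arguments: formatting happens only if the record is emitted.
logger.info("downloaded url=%s dest=%s", url, dest)
logger.debug("suppressed at INFO level url=%s", url)

print(stream.getvalue(), end="")
```

Passing arguments separately, instead of formatting with an f-string, means the DEBUG call above costs almost nothing when the level is INFO.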

Exit codes

0    Success
1    Malformed --inline JSON in download
2    Fatal error (missing filters, I/O failures, bad configuration;
     also argparse validation errors such as unrecognized flags)
3    Partial failure in download (some items failed)
130  Interrupted (Ctrl-C)
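A wrapper script can branch on these codes. The sketch below maps each documented code to a short description and flags the one case (partial download failure) where simply re-running the same command may help. The helper names and the retry policy are illustrative choices for the caller, not recount3 behavior.

```python
# Exit codes as documented above.
EXIT_CODES = {
    0: "success",
    1: "malformed --inline JSON in download",
    2: "fatal error (missing filters, I/O failure, bad configuration, bad flags)",
    3: "partial failure in download (some items failed)",
    130: "interrupted (Ctrl-C)",
}


def describe_exit(code):
    """Return a human-readable description for a recount3 exit code."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def worth_retrying(code):
    """Caller-side policy (an assumption): retry only partial failures."""
    return code == 3


# The code would normally come from subprocess.run([...]).returncode.
for code in (0, 3, 2):
    print(code, describe_exit(code), worth_retrying(code))
```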

Security and safety

  • TLS verification is on by default. --insecure-ssl disables it and should only be used to debug certificate issues.

  • The cache reduces repeated downloads. Choose --cache=disable to bypass it when correctness requires a direct fetch.

Performance tips

  • Increase --jobs to improve throughput when network-bound.

  • Keep the cache enabled for repeated workflows.

  • Use streaming pipelines with JSONL and standard tools (jq, grep, head/tail) to avoid loading everything into memory.

Example recipes

List human SRA data sources, then download their metadata:

$ recount3 search sources organism=human --format=jsonl > sources.jsonl
$ recount3 search source-meta organism=human data_source=sra --format=jsonl \
    > meta.jsonl

$ recount3 download --from=meta.jsonl --dest=./meta

Stack gene-level matrices across samples and write Parquet:

$ recount3 search gene-exon \
    organism=human data_source=sra genomic_unit=gene project=SRP012345 \
    --format=jsonl > counts.jsonl

$ recount3 bundle stack-counts --from=counts.jsonl --compat=family \
    --join=inner --axis=1 --out=counts.parquet

Troubleshooting

  • “Missing required filters”: Check the mode-specific filter list above.

  • “json.JSONDecodeError”: Ensure your manifest is valid JSONL. Each line must be one JSON object.

  • Permission/Path errors: Verify that --dest exists (or, for a .zip destination, that its parent directory exists) and is writable. On shared filesystems, reduce --jobs to lower concurrent I/O load.

  • TLS/SSL errors: Try updating CA certs, or as a last resort temporarily use --insecure-ssl to isolate the issue.
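When a manifest triggers json.JSONDecodeError, it helps to find the offending line before re-running download. The following sketch checks that every non-blank line is a single JSON object and reports the first line that is not. The function name and sample manifest are hypothetical, and whether recount3 itself tolerates blank lines is not stated here.

```python
import json


def first_invalid_line(lines):
    """Return (line_number, message) for the first bad line, or None."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            return number, exc.msg
        if not isinstance(record, dict):
            return number, "line is valid JSON but not an object"
    return None


# Placeholder manifest: line 2 is truncated, line 3 is an array, not an object.
manifest = [
    '{"organism": "human", "url": "https://example.org/a.gz"}',
    '{"organism": "human", "url": ',
    "[1, 2, 3]",
]
print(first_invalid_line(manifest))
```

To check a file, pass an open file handle (for example, open("manifest.jsonl", encoding="utf-8")) as the lines argument; the function reads it one line at a time.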

Import safety

The CLI module only defines functions and constants. It performs no I/O at import time, so it is safe to import under pydoc and in unit tests.

Full usage

Run any subcommand with --help for the full option list:

recount3 --help
recount3 search --help
recount3 download --help
recount3 bundle stack-counts --help
recount3 bundle se --help
recount3 bundle rse --help
recount3 smoke-test --help