CLI Reference
The recount3 command-line tool implements a
discover -> manifest -> materialize workflow for the recount3 data mirror.
Summary
Use recount3 to:
- ids - Emit unique sample and project IDs.
- search - Discover resources and print a machine-readable manifest (JSONL or TSV).
- download - Materialize resources from a manifest (dir or .zip).
- bundle - Operate on multiple resources (e.g., stack count matrices).
- smoke-test - Small connectivity test for CI / local validation.
Quick start
Discover a handful of gene-level count files, save a manifest, and download:
$ recount3 search gene-exon \
    organism=human data_source=sra genomic_unit=gene project=SRP012345 \
    --format=jsonl > manifest.jsonl
$ recount3 download --from=manifest.jsonl --dest=./downloads --jobs=8
Or stream directly, without an intermediate file:
$ recount3 search annotations \
    organism=human genomic_unit=gene annotation_extension=G026 \
    --format=jsonl | \
    recount3 download --from=- --dest=./annots
Commands
- ids
Emit unique ID lists. By default prints to stdout.
- Flags:
--organism=human|mouse|""    Empty means all organisms.
--samples-out=<file>         Write samples to plain text file (else stdout).
--projects-out=<file>        Write projects to plain text file (else stdout).
- search
Discover resources and print a manifest (JSONL or TSV). Filters are passed as space-separated key=value tokens.
- Output:
By default results are written to stdout (pipe-friendly). Use --output <file> to write a specific file, or --outdir <dir> to create a timestamped filename in that directory.
- Modes and required filters:
annotations    organism, genomic_unit, annotation_extension
gene-exon      organism, data_source, genomic_unit, project
               (optional: annotation_extension; default G026)
junctions      organism, data_source, project
               (optional: junction_type=ALL, junction_extension=MM)
metadata       organism, data_source, table_name, project
bigwig         organism, data_source, project, sample
project        organism, data_source, project
               Optional filters:
               - genomic_unit=gene,exon
               - annotation=default|all|gencode_v26,gencode_v29
                 Human-readable name or convenience alias. 'default' → primary annotation (G026 for human, M023 for mouse). 'all' → every available annotation. A comma list of names (e.g. gencode_v26) or raw extension codes (e.g. G026) also works.
               - annotation_extension=G026,G029
                 Raw annotation file-extension codes. When set, overrides 'annotation' completely. Use this when you already know the exact code(s) you need.
               - junction_type=ALL
               - junction_extension=MM,RR,ID
               - include_metadata=true|false
               - include_bigwig=true|false
sources        organism
source-meta    organism, data_source
- Example:
$ recount3 search junctions \
    organism=human data_source=sra project=SRP000000 \
    junction_type=ALL junction_extension=MM --format=tsv
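The key=value filter tokens and the per-mode requirements listed above can be checked before the tool is invoked. The following is a minimal Python sketch of such a pre-flight check, not recount3's own parser: REQUIRED_FILTERS, parse_filters, and missing_filters are hypothetical names, and the table is simply copied from the list of modes above.

```python
# Hypothetical pre-flight check; not part of recount3 itself.
# The required filters per mode are copied from the reference above.
REQUIRED_FILTERS = {
    "annotations": ("organism", "genomic_unit", "annotation_extension"),
    "gene-exon": ("organism", "data_source", "genomic_unit", "project"),
    "junctions": ("organism", "data_source", "project"),
    "metadata": ("organism", "data_source", "table_name", "project"),
    "bigwig": ("organism", "data_source", "project", "sample"),
    "project": ("organism", "data_source", "project"),
    "sources": ("organism",),
    "source-meta": ("organism", "data_source"),
}


def parse_filters(tokens):
    """Turn a list of key=value tokens into a dict."""
    filters = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        filters[key] = value
    return filters


def missing_filters(mode, filters):
    """Return the required filter names that are absent for a mode."""
    return [name for name in REQUIRED_FILTERS[mode] if name not in filters]


tokens = "organism=human data_source=sra junction_type=ALL".split()
print(missing_filters("junctions", parse_filters(tokens)))  # ['project']
```

An empty list means every required filter for that mode is present; optional filters such as junction_type are simply passed through.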
- download
Materialize resources from a manifest file or one inline JSON object. Writes one JSONL progress event per resource to stdout.
- Source:
--from=<path>|-     Read JSONL manifest from file or stdin ('-').
--inline='<json>'   One JSON object for a single resource.
- Destination:
- --dest=<dir-or-zip>
Directory or .zip file path.
- --overwrite
Overwrite existing files (dir mode only).
- Behavior:
- --jobs=<n>
Max parallel downloads (default 4).
- --cache=MODE
Cache behavior (default: enable). MODE is one of: enable - use cache; disable - bypass cache; update - force re-download then cache.
- bundle stack-counts
Concatenate compatible count matrices (gene/exon or junctions).
- Required:
- --from=<manifest>
JSONL manifest (or ‘-’ for stdin).
- --out=<path>
Output file (.csv, .tsv, .tsv.gz, or .parquet).
- Options:
--compat=family|feature   Compatibility mode (default: family).
--join=inner|outer        Pandas join type (default: inner).
--axis=0|1                Concatenate rows (0) or columns (1).
--verify-integrity        Fail on duplicate index after concat.
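To make the difference between --join=inner and --join=outer concrete for column-wise stacking (--axis=1), here is a small plain-Python sketch that mimics pandas-style concatenation. It assumes that stack-counts follows pandas.concat semantics, which the reference only implies by calling this a "Pandas join type". The stack_columns helper and the toy matrices are hypothetical.

```python
def columns_of(matrix):
    """Sorted sample names found in a {feature: {sample: count}} matrix."""
    return sorted({sample for row in matrix.values() for sample in row})


def stack_columns(matrices, join="inner"):
    """Concatenate matrices column-wise, pandas-style (illustration only).

    join="inner" keeps features present in every matrix;
    join="outer" keeps the union and fills missing cells with None.
    """
    if join not in ("inner", "outer"):
        raise ValueError(f"unknown join {join!r}")
    row_sets = [set(matrix) for matrix in matrices]
    if join == "inner":
        rows = set.intersection(*row_sets)
    else:
        rows = set.union(*row_sets)
    stacked = {}
    for feature in sorted(rows):
        merged = {}
        for matrix in matrices:
            row = matrix.get(feature, {})
            for sample in columns_of(matrix):
                merged[sample] = row.get(sample)
        stacked[feature] = merged
    return stacked


a = {"g1": {"S1": 5}, "g2": {"S1": 0}}
b = {"g2": {"S2": 7}, "g3": {"S2": 1}}
print(stack_columns([a, b], join="inner"))  # {'g2': {'S1': 0, 'S2': 7}}
print(stack_columns([a, b], join="outer"))
```

With the outer join, g1 and g3 are kept and the cells with no data are None, which corresponds to the missing values pandas would insert.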
- smoke-test
Download a few tiny files to verify connectivity and configuration.
- Options:
- --dest=<dir>
Destination directory (default ./recount3-smoke).
- --limit=<n>
Number of resources to attempt (default 1).
Input and output formats
- JSONL (a.k.a. NDJSON)
One JSON object per line. Great for streaming, grepping, and piping.
Search output / Download input (manifest): Each line contains all resource description fields plus two convenience keys:
url: the fully qualified HTTP URL.
arcname: the destination path inside a .zip archive.
Example (one line, wrapped for readability):
{"resource_type":"gene_exon_counts",
 "organism":"human","data_source":"sra","genomic_unit":"gene",
 "project":"SRP012345","sample":"SRR999000","table_name":"gene",
 "url":"https://…/gene/SRR999000.gz",
 "arcname":"gene/SRR999000.gz"}
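Because a manifest is one JSON object per line, it can be filtered lazily between search and download. The sketch below shows one way to do that in Python. iter_manifest and select are hypothetical helper names, and the toy manifest uses placeholder URLs rather than real mirror paths.

```python
import io
import json


def iter_manifest(lines):
    """Yield one dict per non-empty JSONL line, one line at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select(records, **wanted):
    """Keep only records whose fields equal every keyword given."""
    for record in records:
        if all(record.get(key) == value for key, value in wanted.items()):
            yield record


# Toy manifest with placeholder URLs (assumption, for illustration only).
manifest = io.StringIO(
    '{"resource_type": "gene_exon_counts", "genomic_unit": "gene", '
    '"url": "https://example.invalid/a.gz", "arcname": "gene/a.gz"}\n'
    '{"resource_type": "gene_exon_counts", "genomic_unit": "exon", '
    '"url": "https://example.invalid/b.gz", "arcname": "exon/b.gz"}\n'
)
for record in select(iter_manifest(manifest), genomic_unit="gene"):
    print(json.dumps(record))
```

In a pipeline, the same functions could read sys.stdin instead of the in-memory sample, with the printed lines piped into recount3 download --from=-.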
Download progress events (stdout): One event per resource:
{"url":"…","status":"ok","dest":"/path/to/file"}
{"url":"…","status":"skipped","dest":"/existing/file"}
{"url":"…","status":"error","dest":null,"error":"<repr>"}
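Since download emits one JSON event per resource, a run can be summarized by tallying the status field. The following Python sketch assumes only the three event shapes shown above. The summarize helper and the sample events (with made-up url and dest values) are hypothetical.

```python
import json
from collections import Counter


def summarize(event_lines):
    """Count events by status and collect the URLs that failed."""
    counts = Counter()
    failed = []
    for line in event_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        counts[event["status"]] += 1
        if event["status"] == "error":
            failed.append(event["url"])
    return counts, failed


# Sample events shaped like the ones above; the values are invented.
events = [
    '{"url": "u1", "status": "ok", "dest": "/tmp/a"}',
    '{"url": "u2", "status": "skipped", "dest": "/tmp/b"}',
    '{"url": "u3", "status": "error", "dest": null, "error": "TimeoutError()"}',
]
counts, failed = summarize(events)
print(dict(counts), failed)
```

The list of failed URLs can be matched against the original manifest to build a smaller manifest for a second attempt.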
- TSV
Tab-separated text for quick human scanning or spreadsheet import. TSV is available for
search --format=tsv only; download expects JSONL.
Configuration
Configuration is centralized in Config. Values come from:
CLI flags (highest precedence)
Environment variables
Library defaults (lowest precedence)
- Relevant environment variables (if set):
RECOUNT3_URL             Base URL (trailing slash added automatically)
RECOUNT3_CACHE_DIR       Directory for on-disk cache
RECOUNT3_CACHE_DISABLE   "1" disables cache, anything else enables
RECOUNT3_HTTP_TIMEOUT    HTTP timeout in seconds (int)
RECOUNT3_MAX_RETRIES     Max retry attempts for transient errors (int)
RECOUNT3_INSECURE_SSL    "1" to disable TLS verification (unsafe)
RECOUNT3_USER_AGENT      Custom HTTP User-Agent string
- Global flags mirror these settings:
--base-url, --cache-dir, --timeout, --retries, --insecure-ssl, --user-agent, --chunk-size
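The precedence order described in this section (CLI flag, then environment variable, then library default) can be sketched in a few lines of Python. This is an illustration of the rule, not the actual Config implementation: resolve, cache_enabled, and normalize_base_url are hypothetical helpers, and the default values used in the example are placeholders, since the reference does not state the real defaults.

```python
import os


def resolve(flag_value, env_name, default, environ=os.environ):
    """Return the CLI flag if given, else the env variable, else the default."""
    if flag_value is not None:
        return flag_value
    if env_name in environ:
        return environ[env_name]
    return default


def cache_enabled(environ=os.environ):
    """RECOUNT3_CACHE_DISABLE set to "1" disables the cache; anything else enables it."""
    return environ.get("RECOUNT3_CACHE_DISABLE") != "1"


def normalize_base_url(url):
    """Add the trailing slash if it is missing."""
    return url if url.endswith("/") else url + "/"


# A fake environment so the example does not depend on the real one.
env = {
    "RECOUNT3_HTTP_TIMEOUT": "30",
    "RECOUNT3_URL": "https://example.invalid/mirror",
}
# 60 and the fallback URL are placeholder defaults (assumptions).
print(int(resolve(None, "RECOUNT3_HTTP_TIMEOUT", 60, env)))  # env wins: 30
print(int(resolve(10, "RECOUNT3_HTTP_TIMEOUT", 60, env)))    # flag wins: 10
print(normalize_base_url(resolve(None, "RECOUNT3_URL", "https://example.invalid/", env)))
print(cache_enabled(env))
```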
Logging
Logging defaults to INFO. Use --quiet for WARNING or --verbose for
DEBUG. Log messages follow pattern-string formatting (not f-strings), per the
Google guide, and include greppable context (e.g., url=..., dest=...).
Exit codes
0     Success
1     Malformed --inline JSON in download
2     Fatal error (missing filters, I/O failures, bad configuration; also argparse validation errors such as unrecognized flags)
3     Partial failure in download (some items failed)
130   Interrupted (Ctrl-C)
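A wrapper script can branch on these exit codes. The sketch below maps each documented code to a short description and includes a thin subprocess helper. EXIT_MEANINGS, describe, and run_recount3 are hypothetical names, and the helper is defined but not called here because it needs the recount3 executable on PATH.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "malformed --inline JSON in download",
    2: "fatal error",
    3: "partial failure in download",
    130: "interrupted",
}


def describe(code):
    """Return a short description for a recount3 exit code."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_recount3(args):
    """Run recount3 with the given argument list and return (code, description)."""
    code = subprocess.run(["recount3", *args]).returncode
    return code, describe(code)


print(describe(3))   # partial failure in download
print(describe(42))  # unknown exit code 42
```

Code 3 is the one case where part of the work succeeded, so a caller may want to inspect the per-resource progress events before deciding whether to re-run.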
Security and safety
TLS verification is on by default. --insecure-ssl disables it and should only be used to debug certificate issues.
The cache reduces repeated downloads. Choose --cache=disable to bypass it when correctness requires a direct fetch.
Performance tips
Increase --jobs to improve throughput when network-bound.
Keep the cache enabled for repeated workflows.
Use streaming pipelines with JSONL and standard tools (jq, grep, head/tail) to avoid loading everything into memory.
Example recipes
List human SRA data sources, then download their metadata:
$ recount3 search sources organism=human --format=jsonl > sources.jsonl
$ recount3 search source-meta organism=human data_source=sra --format=jsonl \
    > meta.jsonl
$ recount3 download --from=meta.jsonl --dest=./meta
Stack gene-level matrices across samples and write Parquet:
$ recount3 search gene-exon \
    organism=human data_source=sra genomic_unit=gene project=SRP012345 \
    --format=jsonl > counts.jsonl
$ recount3 bundle stack-counts --from=counts.jsonl --compat=family \
    --join=inner --axis=1 --out=counts.parquet
Troubleshooting
"Missing required filters": Check the mode-specific filter list above.
"json.JSONDecodeError": Ensure your manifest is valid JSONL. Each line must be one JSON object.
Permission/Path errors: Verify --dest exists (or its parent for .zip) and is writable; on shared filesystems, reduce --jobs to avoid pressure.
TLS/SSL errors: Try updating CA certs, or as a last resort temporarily use --insecure-ssl to isolate the issue.
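When a manifest triggers json.JSONDecodeError, it helps to locate the offending lines before retrying. The following Python sketch reports every line that is not a single JSON object. find_bad_lines is a hypothetical helper, and skipping blank lines is an assumption here, since the reference does not say whether download tolerates them.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, reason) for each line that is not one JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored in this sketch (assumption)
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Line 2 has a trailing comma; line 3 is valid JSON but not an object.
sample = ['{"url": "u1"}', '{"url": "u2",}', "[1, 2]"]
for number, reason in find_bad_lines(sample):
    print(f"line {number}: {reason}")
```

To check a real file, pass an open file handle, which is iterated line by line without loading the whole manifest.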
Import safety
The CLI module only defines functions and constants. It performs no I/O at import time, so it is safe to load under pydoc and in unit tests.
Full usage
Run any subcommand with --help for the full option list:
recount3 --help
recount3 search --help
recount3 download --help
recount3 bundle stack-counts --help
recount3 bundle se --help
recount3 bundle rse --help
recount3 smoke-test --help