recount3
The recount3 package is a Python library and command-line tool for
interacting with the recount3 data repository, a uniformly processed
collection of RNA-seq studies covering human and mouse samples from SRA,
GTEx, and TCGA.
The package provides a typed API for discovering, downloading, and assembling recount3 resources into analysis-ready objects. A companion CLI implements a discover -> manifest -> materialize workflow suitable for use in scripts and pipelines.
The recount3 package can be used in two ways:
As a Python library for searching, downloading, and assembling recount3 gene/exon/junction count matrices, sample metadata, genome annotations, and BigWig coverage files into SummarizedExperiment/RangedSummarizedExperiment objects via the BiocPy stack.
As a command-line tool (recount3) to search the mirror, produce JSONL manifests, and materialize resources to disk or into a .zip archive.
Installation
The core package requires Python 3.10 or newer and depends on NumPy, pandas, and SciPy. Optional extras unlock BiocPy integration and BigWig support.
Install on Linux or macOS
python3 -m pip install recount3
Install on Windows
py -3 -m pip install recount3
Optional extras
Install with BiocPy support (SummarizedExperiment, RangedSummarizedExperiment,
GenomicRanges):
python3 -m pip install "recount3[biocpy]"
Install with BigWig support (pybigwig):
python3 -m pip install "recount3[bigwig]"
Install all optional extras:
python3 -m pip install "recount3[biocpy,bigwig]"
Upgrade on Linux or macOS
python3 -m pip install recount3 --upgrade
Upgrade on Windows
py -3 -m pip install recount3 --upgrade
Quickstart
Python API
Discover all resources for a project, stack the gene-level count matrices
across samples, and build a RangedSummarizedExperiment:
>>> from recount3 import R3ResourceBundle
>>>
>>> # Discover every resource for a human SRA project.
>>> bundle = R3ResourceBundle.discover(
...     organism="human",
...     data_source="sra",
...     project="SRP009615",
... )
>>>
>>> # Stack raw gene-count matrices across all samples.
>>> counts = bundle.only_counts().stack_count_matrices(genomic_unit="gene")
>>>
>>> # Build a RangedSummarizedExperiment (requires recount3[biocpy]).
>>> rse = bundle.to_ranged_summarized_experiment(genomic_unit="gene")
Command-line tool
Discover resources, save a JSONL manifest, and download in parallel:
# Search for gene-level count files and write a manifest.
recount3 search gene-exon \
    organism=human data_source=sra genomic_unit=gene project=SRP009615 \
    --format=jsonl > manifest.jsonl
# Materialize all resources from the manifest (8 parallel jobs).
recount3 download --from=manifest.jsonl --dest=./downloads --jobs=8
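Because the manifest is JSONL (one JSON object per line), it can be inspected with the standard library before anything is downloaded. The following is a minimal sketch, not part of recount3: `summarize_manifest` is a hypothetical helper, and since the record fields are not described here, it assumes only that every non-blank line holds one JSON object and reports which keys occur.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(path):
    """Return (record_count, key_counts) for a JSONL manifest file.

    Hypothetical helper, not part of recount3. It assumes only that each
    non-blank line is one JSON object; no field names are assumed.
    """
    record_count = 0
    key_counts = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            record_count += 1
            key_counts.update(record.keys())
    return record_count, key_counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_manifest.py manifest.jsonl
    count, keys = summarize_manifest(sys.argv[1])
    print(f"{count} records")
    for key, seen in sorted(keys.items()):
        print(f"  {key}: present in {seen} records")
```

Running it against manifest.jsonl gives a quick sanity check (record count, available fields) before starting a large parallel download.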
Stream search output directly into download without an intermediate file:
recount3 search annotations \
    organism=human genomic_unit=gene annotation_extension=G026 \
    --format=jsonl | \
recount3 download --from=- --dest=./annotations
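The same pipe can be driven from a Python script with the standard subprocess module. The sketch below is illustrative only: the helper names (`build_search_command`, `build_download_command`, `search_and_download`) are hypothetical, and it uses only the subcommands, key=value filters, and flags that appear in the examples above. It requires the recount3 executable on PATH.

```python
import subprocess


def build_search_command(resource_type, **filters):
    """Build the argv for `recount3 search`, mirroring the CLI examples.

    Hypothetical helper: filters become key=value arguments in the order
    they were passed.
    """
    args = ["recount3", "search", resource_type]
    args += [f"{key}={value}" for key, value in filters.items()]
    args.append("--format=jsonl")
    return args


def build_download_command(dest, source="-", jobs=None):
    """Build the argv for `recount3 download`; source="-" reads stdin."""
    args = ["recount3", "download", f"--from={source}", f"--dest={dest}"]
    if jobs is not None:
        args.append(f"--jobs={jobs}")
    return args


def search_and_download(resource_type, dest, **filters):
    """Pipe `recount3 search` into `recount3 download` without a temp file."""
    search = subprocess.Popen(
        build_search_command(resource_type, **filters),
        stdout=subprocess.PIPE,
    )
    download = subprocess.Popen(
        build_download_command(dest),
        stdin=search.stdout,
    )
    # Close our copy of the pipe so that search sees a broken pipe if
    # download exits early.
    search.stdout.close()
    download_code = download.wait()
    search_code = search.wait()
    if search_code != 0 or download_code != 0:
        raise RuntimeError(
            f"search exited with {search_code}, download with {download_code}"
        )


if __name__ == "__main__":
    search_and_download(
        "annotations",
        "./annotations",
        organism="human",
        genomic_unit="gene",
        annotation_extension="G026",
    )
```

Keeping command construction separate from execution makes the argument lists easy to log or test without touching the network.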
Note
Read the full documentation on Pages for the complete API reference, CLI guide, and worked examples.
Dependencies
Core (installed automatically):
numpy>=2.0
pandas>=2.2
scipy>=1.13
Optional: BiocPy integration (recount3[biocpy]):
biocframe>=0.7
genomicranges>=0.8
summarizedexperiment>=0.6
Optional: BigWig support (recount3[bigwig]):
pybigwig>=0.3.18
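To confirm that an environment satisfies these minimums, the installed versions can be queried with importlib.metadata from the standard library. This is a rough sketch under stated assumptions: the helper names are hypothetical, the table simply restates the minimums listed above, and the comparison looks only at leading numeric components rather than implementing full PEP 440 ordering.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "core": {"numpy": "2.0", "pandas": "2.2", "scipy": "1.13"},
    "biocpy": {
        "biocframe": "0.7",
        "genomicranges": "0.8",
        "summarizedexperiment": "0.6",
    },
    "bigwig": {"pybigwig": "0.3.18"},
}


def release_tuple(version):
    """Extract leading numeric components, e.g. '2.2.1rc1' becomes (2, 2, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Simplified comparison: pad with zeros, then compare numerically."""
    have, need = release_tuple(installed), release_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_group(group, get_version=metadata.version):
    """Map each package in a group to 'ok', 'missing', or a 'too old' note."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS[group].items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for group in MINIMUM_VERSIONS:
        print(group, check_group(group))
```

A "missing" entry in the biocpy or bigwig group only means the corresponding optional extra has not been installed.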
Questions, Feature Requests, and Bug Reports
Please submit questions, feature requests, and bug reports on Issues.
License
This package is distributed under the BSD license.