recount3

The recount3 package is a Python library and command-line tool for interacting with the recount3 data repository, a uniformly processed collection of RNA-seq studies covering human and mouse samples from SRA, GTEx, and TCGA.

The package provides a typed API for discovering, downloading, and assembling recount3 resources into analysis-ready objects. A companion CLI implements a discover -> manifest -> materialize workflow suitable for use in scripts and pipelines.

The recount3 package can be used in two ways:

  • As a Python library for searching, downloading, and assembling recount3 gene/exon/junction count matrices, sample metadata, genome annotations, and BigWig coverage files into SummarizedExperiment / RangedSummarizedExperiment objects via the BiocPy stack.

  • As a command-line tool (recount3) to search the mirror, produce JSONL manifests, and materialize resources to disk or into a .zip archive.

Installation

The core package requires Python 3.10 or newer and depends on NumPy, pandas, and SciPy. Optional extras unlock BiocPy integration and BigWig support.

Install on Linux, macOS

python3 -m pip install recount3

Install on Windows

py -3 -m pip install recount3

Optional extras

Install with BiocPy support (SummarizedExperiment, RangedSummarizedExperiment, GenomicRanges):

python3 -m pip install "recount3[biocpy]"

Install with BigWig support (pybigwig):

python3 -m pip install "recount3[bigwig]"

Install all optional extras:

python3 -m pip install "recount3[biocpy,bigwig]"
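Because the extras are optional, a script may want to check which ones are present before calling features that depend on them. The following is a minimal sketch using only the standard library. The mapping from extra to importable module names is an assumption based on the dependency names listed in this document (pybigwig is imported as pyBigWig), and the helper names are hypothetical and not part of recount3.

```python
from importlib.util import find_spec

# Assumed import names for each optional extra (not taken from recount3 itself).
EXTRA_MODULES = {
    "biocpy": ("biocframe", "genomicranges", "summarizedexperiment"),
    "bigwig": ("pyBigWig",),
}


def missing_modules(extra: str) -> list[str]:
    """Return the modules backing `extra` that cannot be found on the import path."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


def extra_available(extra: str) -> bool:
    """True when every module backing `extra` can be located."""
    return not missing_modules(extra)


# Report the status of each extra without importing anything heavy.
for extra in EXTRA_MODULES:
    missing = missing_modules(extra)
    status = "available" if not missing else "missing: " + ", ".join(missing)
    print(f"recount3[{extra}]: {status}")
```

find_spec only locates a module and does not import it, so the check stays cheap even when the BiocPy stack is installed.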

Upgrade on Linux, macOS

python3 -m pip install recount3 --upgrade

Upgrade on Windows

py -3 -m pip install recount3 --upgrade

Quickstart

Python API

Discover all resources for a project, stack the gene-level count matrices across samples, and build a RangedSummarizedExperiment:

>>> from recount3 import R3ResourceBundle
>>>
>>> # Discover every resource for a human SRA project.
>>> bundle = R3ResourceBundle.discover(
...     organism="human",
...     data_source="sra",
...     project="SRP009615",
... )
>>>
>>> # Stack raw gene-count matrices across all samples.
>>> counts = bundle.only_counts().stack_count_matrices(genomic_unit="gene")
>>>
>>> # Build a RangedSummarizedExperiment (requires recount3[biocpy]).
>>> rse = bundle.to_ranged_summarized_experiment(genomic_unit="gene")

Command-line tool

Discover resources, save a JSONL manifest, and download in parallel:

# Search for gene-level count files and write a manifest.
recount3 search gene-exon \
    organism=human data_source=sra genomic_unit=gene project=SRP009615 \
    --format=jsonl > manifest.jsonl

# Materialize all resources from the manifest (8 parallel jobs).
recount3 download --from=manifest.jsonl --dest=./downloads --jobs=8
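Since the manifest is JSONL (one JSON object per line), it can be inspected with the standard library before downloading. This sketch makes no assumption about which fields recount3 writes into each record; read_manifest is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def read_manifest(path):
    """Parse a JSONL file into a list of objects, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Point at the offending line so a truncated manifest is easy to spot.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
    return records


# Only runs when a manifest from the search step above is present.
manifest = Path("manifest.jsonl")
if manifest.exists():
    records = read_manifest(manifest)
    print(f"{len(records)} records")
    if records and isinstance(records[0], dict):
        print("fields in first record:", sorted(records[0]))
```

Listing the keys of the first record is a quick way to learn the manifest layout without relying on undocumented field names.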

Stream search output directly into download without an intermediate file:

recount3 search annotations \
    organism=human genomic_unit=gene annotation_extension=G026 \
    --format=jsonl | \
recount3 download --from=- --dest=./annotations
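The same pipe can be driven from a Python script with subprocess. The sketch below is one possible wrapper, using only the subcommands and flags shown in the examples above; the function names are hypothetical, and the code assumes the recount3 executable is on PATH when stream_search_to_download is actually called.

```python
import shlex
import subprocess


def search_command(kind, filters, fmt="jsonl"):
    """Build the argument list for `recount3 search <kind> key=value ...`."""
    pairs = [f"{key}={value}" for key, value in filters.items()]
    return ["recount3", "search", kind, *pairs, f"--format={fmt}"]


def download_command(dest, source="-", jobs=None):
    """Build the argument list for `recount3 download`; `-` reads the manifest from stdin."""
    cmd = ["recount3", "download", f"--from={source}", f"--dest={dest}"]
    if jobs is not None:
        cmd.append(f"--jobs={jobs}")
    return cmd


def stream_search_to_download(kind, filters, dest):
    """Pipe search output into download, mirroring the shell pipeline."""
    search = subprocess.Popen(search_command(kind, filters), stdout=subprocess.PIPE)
    download = subprocess.Popen(download_command(dest), stdin=search.stdout)
    # Close our copy of the pipe so search sees a broken pipe if download exits early.
    search.stdout.close()
    download_rc = download.wait()
    search_rc = search.wait()
    if search_rc or download_rc:
        raise RuntimeError(f"search exited {search_rc}, download exited {download_rc}")


# Show the commands that would run, without executing them.
filters = {"organism": "human", "genomic_unit": "gene", "annotation_extension": "G026"}
print(shlex.join(search_command("annotations", filters)))
print(shlex.join(download_command("./annotations")))
```

Passing argument lists rather than a shell string avoids quoting problems when filter values come from user input.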

Note

Read the full documentation on GitHub Pages for the complete API reference, CLI guide, and worked examples.

Dependencies

Core (installed automatically):

numpy>=2.0
pandas>=2.2
scipy>=1.13

Optional: BiocPy integration (recount3[biocpy]):

biocframe>=0.7
genomicranges>=0.8
summarizedexperiment>=0.6

Optional: BigWig support (recount3[bigwig]):

pybigwig>=0.3.18
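To see whether an existing environment already satisfies these minimums, the installed versions can be read with importlib.metadata. This is a rough sketch: it compares only the leading numeric release segment, so pre-release and local version suffixes are ignored, and the helper names are hypothetical. A full comparison would need a proper version parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the lists above.
MINIMUMS = {
    "numpy": "2.0",
    "pandas": "2.2",
    "scipy": "1.13",
    "biocframe": "0.7",
    "genomicranges": "0.8",
    "summarizedexperiment": "0.6",
    "pybigwig": "0.3.18",
}


def release_tuple(text):
    """Extract the leading numeric release, e.g. '2.2.3rc1' becomes (2, 2, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare release tuples numerically, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


for name, minimum in MINIMUMS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed (needs >={minimum})")
        continue
    verdict = "ok" if meets_minimum(installed, minimum) else f"too old, needs >={minimum}"
    print(f"{name} {installed}: {verdict}")
```

Comparing integer tuples rather than strings matters for versions such as 0.3.9 and 0.3.18, which sort the wrong way as text.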

Questions, Feature Requests, and Bug Reports

Please submit questions, feature requests, and bug reports through the GitHub project's Issues page.

License

This package is distributed under the Clear BSD License with extra clauses.