Tutorial

This tutorial walks through common workflows using the recount3 Python package. All examples assume the package is installed:

pip install "recount3[biocpy]"

For BigWig support add the bigwig extra as well:

pip install "recount3[biocpy,bigwig]"

Discovering Resources for a Project

R3ResourceBundle is the main entry point. Call discover() to fetch every resource associated with a project:

from recount3 import R3ResourceBundle

bundle = R3ResourceBundle.discover(
    organism="human",
    data_source="sra",
    project="SRP009615",
)
print(f"Found {len(bundle.resources)} resources.")

Supported organisms are "human" and "mouse". Supported data sources are "sra", "gtex", and "tcga".

Stacking Count Matrices

Use stack_count_matrices() to concatenate all gene-level (or exon-level, junction-level) count files from a bundle into a single pandas.DataFrame:

counts = bundle.only_counts().stack_count_matrices(genomic_unit="gene")
print(counts.shape)   # (n_genes, n_samples)

Building a SummarizedExperiment

Requires the biocpy extra (biocframe, summarizedexperiment).

se = bundle.to_summarized_experiment(genomic_unit="gene")
print(se)

Building a RangedSummarizedExperiment

A RangedSummarizedExperiment attaches genomic coordinate ranges to each feature row. A GTF annotation file is required; include it in the bundle via search_project_all (done automatically by discover()) or add an annotation resource manually.

rse = bundle.to_ranged_summarized_experiment(genomic_unit="gene")
print(rse)
print(rse.shape)          # (n_genes, n_samples)
print(rse.colnames[:5])   # first five sample IDs

Accessing Metadata

Column metadata (sample annotations) is merged from every available metadata table and aligned to the count columns automatically:

col_data = rse.coldata.to_pandas()
print(col_data.columns.tolist())

Filtering Bundles

Bundles are immutable snapshots. Use filter() to narrow down by any description field:

gene_only = bundle.filter(genomic_unit="gene")
annots    = bundle.filter(resource_type="annotations")
metadata  = bundle.only_metadata()

Working with Individual Resources

Each R3Resource knows its URL and can be downloaded and loaded independently:

from recount3 import R3GeneOrExonCounts, R3Resource

desc = R3GeneOrExonCounts(
    organism="human",
    data_source="sra",
    genomic_unit="gene",
    project="SRP009615",
    sample="SRR387777",
)
res = R3Resource(desc)
res.download(path=None, cache_mode="enable")  # saves to local cache
df = res.load()                               # returns pd.DataFrame
print(df.head())

Searching Without a Bundle

The recount3.search helpers return lists of R3Resource objects directly:

from recount3 import search_count_files_gene_or_exon

resources = search_count_files_gene_or_exon(
    organism="human",
    data_source="sra",
    genomic_unit="gene",
    project="SRP009615",
)
print(f"{len(resources)} count files found.")

Cache Management

Downloaded files are cached under ~/.cache/recount3/files by default. Use the recount3.config helpers to inspect or clear the cache:

from recount3 import recount3_cache, recount3_cache_files, recount3_cache_rm

print(recount3_cache())          # Path to the cache directory
print(recount3_cache_files())    # List all cached files

# Remove only junction files (dry run first):
removed = recount3_cache_rm(
    predicate=lambda p: ".junctions." in p.name,
    dry_run=True,
)
print(f"Would remove {len(removed)} files.")

The cache directory and other settings can be overridden via environment variables (RECOUNT3_CACHE_DIR, RECOUNT3_URL, etc.) or by constructing a custom Config and passing it to any resource or search function.

SRA Attribute Expansion

recount3 stores SRA sample attributes as a single pipe-delimited string column. recount3.se.build_ranged_summarized_experiment() expands them automatically, but you can also call the helper directly:

from recount3.se import expand_sra_attributes

rse_expanded = expand_sra_attributes(rse)
col_data = rse_expanded.coldata.to_pandas()
# Columns like "sra_attribute.age", "sra_attribute.disease" are now present.