The mwtab Tutorial

The mwtab package provides classes and other facilities for downloading, parsing, accessing, and manipulating data stored in either the mwTab or JSON representation of mwTab files.

Also, the mwtab package provides a simple command-line interface to convert between mwTab and JSON representations, download entries from Metabolomics Workbench, access the MW REST interface, validate the consistency of the mwTab files, or extract metadata and metabolites from these files.

Brief mwTab Format Overview

Note

For the full official specification, see the mwTab file specification: http://www.metabolomicsworkbench.org/data/tutorials.php

The mwTab formatted files consist of multiple blocks. Each new block starts with #.

Some of the blocks contain only “key-value”-like pairs.

#METABOLOMICS WORKBENCH STUDY_ID:ST000001 ANALYSIS_ID:AN000001
VERSION              1
CREATED_ON           2016-09-17
#PROJECT
PR:PROJECT_TITLE                     FatB Gene Project
PR:PROJECT_TYPE                      Genotype treatment
PR:PROJECT_SUMMARY                   Experiment to test the consequence of a mutation at the FatB gene (At1g08510)
PR:PROJECT_SUMMARY                   the wound-response of Arabidopsis

Note

Some “key-value”-like pairs can span multiple lines.
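To make the key-value layout concrete, here is a minimal sketch of a parser for key-value blocks only; it is not how mwtab parses files internally, and parse_key_value_blocks is a hypothetical name. It assumes that a block header starts with #, that header tokens of the form KEY:VALUE (such as STUDY_ID:ST000001) belong to that block, that a prefix such as PR: can be dropped from keys, and that a repeated key continues the previous value. Joining continuation lines with a single space matches how the multi-line STUDY_SUMMARY appears in the JSON output later in this tutorial.

```python
def parse_key_value_blocks(text):
    """Sketch: parse key-value style mwTab blocks into {block_name: {key: value}}.

    Hypothetical helper for illustration only; it does not handle tabular
    blocks such as #SUBJECT_SAMPLE_FACTORS or #MS_METABOLITE_DATA.
    """
    blocks = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Block header: plain tokens form the name, KEY:VALUE tokens are fields.
            name_parts, header_fields = [], {}
            for token in line[1:].split():
                if ":" in token:
                    key, value = token.split(":", 1)
                    header_fields[key] = value
                else:
                    name_parts.append(token)
            current = blocks.setdefault(" ".join(name_parts), {})
            current.update(header_fields)
            continue
        parts = line.split(None, 1)
        key = parts[0].split(":", 1)[-1]  # "PR:PROJECT_TITLE" -> "PROJECT_TITLE"
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in current:
            current[key] += " " + value  # continuation of a multi-line value
        else:
            current[key] = value
    return blocks


sample = """#METABOLOMICS WORKBENCH STUDY_ID:ST000001 ANALYSIS_ID:AN000001
VERSION              1
CREATED_ON           2016-09-17
#PROJECT
PR:PROJECT_TITLE                     FatB Gene Project
PR:PROJECT_SUMMARY                   Experiment to test the consequence of a mutation at the FatB gene (At1g08510)
PR:PROJECT_SUMMARY                   the wound-response of Arabidopsis
"""

blocks = parse_key_value_blocks(sample)
print(blocks["PROJECT"]["PROJECT_SUMMARY"])
```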

The #SUBJECT_SAMPLE_FACTORS block is specially formatted: its header line specifies the columns, and each following line holds tab-separated values.

#SUBJECT_SAMPLE_FACTORS:             SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
SUBJECT_SAMPLE_FACTORS               -       LabF_115873     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115878     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115883     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115888     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115893     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115898     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
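When mwtab parses this block, each line becomes a dictionary with 'Subject ID', 'Sample ID' and 'Factors' keys (see the ST000017 output under Accessing Data From a Single MWTabFile). The sketch below shows one way a single line could be mapped onto that shape. It assumes real tab characters between columns (the listing above renders them as spaces) and splits each NAME:VALUE pair at its first colon; parse_ssf_line is a hypothetical helper, not part of mwtab, and it ignores the optional additional-data column.

```python
def parse_ssf_line(line):
    """Sketch: turn one tab-separated SUBJECT_SAMPLE_FACTORS line into a dict.

    Hypothetical helper; the optional "Additional sample data" column is ignored.
    """
    fields = line.rstrip("\n").split("\t")
    # fields[0] is the literal SUBJECT_SAMPLE_FACTORS label (possibly space-padded)
    subject_id, sample_id, factors_text = (f.strip() for f in fields[1:4])
    factors = {}
    for pair in factors_text.split("|"):
        name, _, value = pair.strip().partition(":")  # split at the first ":" only
        factors[name.strip()] = value.strip()
    return {"Subject ID": subject_id, "Sample ID": sample_id, "Factors": factors}


line = ("SUBJECT_SAMPLE_FACTORS\t-\tLabF_115873\t"
        "Arabidopsis Genotype:Wassilewskija (Ws) | "
        "Plant Wounding Treatment:Control - Non-Wounded")
entry = parse_ssf_line(line)
print(entry)
```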

The #MS_METABOLITE_DATA (results) block contains a row of sample identifiers, a row of factors, and tab-separated data between the *_START and *_END lines.

#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS     Peak height
MS_METABOLITE_DATA_START
Samples      LabF_115904     LabF_115909     LabF_115914     LabF_115919     LabF_115924     LabF_115929     LabF_115842     LabF_115847     LabF_115852     LabF_115857     LabF_115862     LabF_115867     LabF_115873     LabF_115878     LabF_115883     LabF_115888     LabF_115893     LabF_115898     LabF_115811     LabF_115816     LabF_115821     LabF_115826     LabF_115831     LabF_115836
Factors      Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded
1_2_4-benzenetriol   1874.0000       3566.0000       1945.0000       1456.0000       2004.0000       1995.0000       4040.0000       2432.0000       2189.0000       1931.0000       1307.0000       2880.0000       2218.0000       1754.0000       1369.0000       1201.0000       3324.0000       1355.0000       2257.0000       1718.0000       1740.0000       3472.0000       2054.0000       1367.0000
1-monostearin        987.0000        450.0000        1910.0000       549.0000        1032.0000       902.0000        393.0000        705.0000        100.0000        481.0000        265.0000        120.0000        1185.0000       867.0000        676.0000        569.0000        579.0000        387.0000        1035.0000       789.0000        875.0000        224.0000        641.0000        693.0000
...
MS_METABOLITE_DATA_END
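A minimal sketch of reading the rows between MS_METABOLITE_DATA_START and MS_METABOLITE_DATA_END into one dictionary per metabolite, keyed by "Metabolite" plus each sample ID; that is the same key layout mwtab reports for MS_METABOLITE_DATA["Data"] later in this tutorial. The helper parse_data_section is hypothetical, keeps values as strings, skips the Factors row, and assumes tab-separated columns. The sample input is trimmed to the first three samples of the excerpt above.

```python
def parse_data_section(lines, start="MS_METABOLITE_DATA_START",
                       end="MS_METABOLITE_DATA_END"):
    """Sketch: read the tab-separated rows between *_START and *_END.

    Returns one dict per metabolite row, keyed by "Metabolite" plus each
    sample ID. Values are kept as strings. Hypothetical helper, not mwtab's parser.
    """
    inside, samples, rows = False, [], []
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            inside = True
            continue
        if stripped == end:
            break
        if not inside or not stripped:
            continue
        fields = [field.strip() for field in line.split("\t")]
        label, values = fields[0], fields[1:]
        if label == "Samples":
            samples = values  # header row with sample IDs
        elif label == "Factors":
            continue  # factor row: not needed for the numeric table
        else:
            row = {"Metabolite": label}
            row.update(zip(samples, values))
            rows.append(row)
    return rows


lines = [
    "MS_METABOLITE_DATA_START",
    "Samples\tLabF_115904\tLabF_115909\tLabF_115914",
    "1_2_4-benzenetriol\t1874.0000\t3566.0000\t1945.0000",
    "1-monostearin\t987.0000\t450.0000\t1910.0000",
    "MS_METABOLITE_DATA_END",
]
rows = parse_data_section(lines)
print(rows[0])
```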

The #METABOLITES metadata block contains a header specifying fields and tab-separated data between *_START and *_END.

#METABOLITES
METABOLITES_START
metabolite_name      moverz_quant    ri      ri_type pubchem_id      inchi_key       kegg_id other_id        other_id_type
1,2,4-benzenetriol   239     522741  Fiehn   10787           C02814  205673  BinBase
1-monostearin        399     959625  Fiehn   107036          D01947  202835  BinBase
2-hydroxyvaleric acid        131     310750  Fiehn   98009                   218773  BinBase
3-phosphoglycerate   299     611619  Fiehn   724             C00597  217821  BinBase
...
METABOLITES_END
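A tab-delimited table with a header row maps naturally onto Python's csv.DictReader. The sketch below keeps every value as a string (empty cells become empty strings) and renames metabolite_name to Metabolite, matching the key mwtab shows for ST000017 later in this tutorial. The helper parse_metabolites and the renaming step are assumptions for illustration, not mwtab's own code.

```python
import csv


def parse_metabolites(lines):
    """Sketch: parse the METABOLITES_START ... METABOLITES_END table with csv.

    The first row is the header; "metabolite_name" is renamed to "Metabolite".
    Hypothetical helper, not mwtab's parser.
    """
    start = lines.index("METABOLITES_START") + 1
    end = lines.index("METABOLITES_END")
    reader = csv.DictReader(lines[start:end], delimiter="\t")
    rows = []
    for record in reader:
        name = record.pop("metabolite_name")
        rows.append({"Metabolite": name, **record})  # keep "Metabolite" first
    return rows


header = ["metabolite_name", "moverz_quant", "ri", "ri_type", "pubchem_id",
          "inchi_key", "kegg_id", "other_id", "other_id_type"]
lines = [
    "METABOLITES_START",
    "\t".join(header),
    "\t".join(["1,2,4-benzenetriol", "239", "522741", "Fiehn", "10787",
               "", "C02814", "205673", "BinBase"]),
    "\t".join(["2-hydroxyvaleric acid", "131", "310750", "Fiehn", "98009",
               "", "", "218773", "BinBase"]),
    "METABOLITES_END",
]
metabolites = parse_metabolites(lines)
print(metabolites[0])
```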

The #NMR_BINNED_DATA metadata block contains a header specifying fields and tab-separated data between *_START and *_END. Note that this block has been deprecated. Binned data is now uploaded as a separate text file and referenced with the “NM:NMR_RESULTS_FILE” tag in the NM metadata block.

#NMR_BINNED_DATA
NMR_BINNED_DATA_START
Bin range(ppm)       CDC029  CDC030  CDC032  CPL101  CPL102  CPL103  CPL201  CPL202  CPL203  CDS039  CDS052  CDS054
0.50...0.56  0.00058149      1.6592  0.039301        0       0       0       0.034018        0.0028746       0.0021478       0.013387        0       0
0.56...0.58  0       0.74267 0       0.007206        0       0       0       0       0       0       0       0.0069721
0.58...0.60  0.051165        0.8258  0.089149        0.060972        0.026307        0.045697        0.069541        0       0       0.14516 0.057489        0.042255
...
NMR_BINNED_DATA_END

Order of metadata and data blocks (MS)

#METABOLOMICS WORKBENCH
VERSION              1
CREATED_ON           2016-09-17
...
#PROJECT
...
#STUDY
...
#SUBJECT
...
#SUBJECT_SAMPLE_FACTORS:             SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
...
#COLLECTION
...
#TREATMENT
...
#SAMPLEPREP
...
#CHROMATOGRAPHY
...
#ANALYSIS
...
#MS
...
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS     peak area
MS_METABOLITE_DATA_START
...
MS_METABOLITE_DATA_END
#METABOLITES
METABOLITES_START
...
METABOLITES_END
#END
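The block order above can be expressed as a list and checked mechanically. This is a minimal sketch, not the mwtab validator: MS_BLOCK_ORDER and follows_ms_order are hypothetical names, missing blocks are tolerated, and unknown, duplicated, or out-of-order blocks fail the check. The example input is the list of block names that mwtab reports for ST000017 later in this tutorial.

```python
# Block order for an MS file, as listed above (hypothetical constant for illustration).
MS_BLOCK_ORDER = [
    "METABOLOMICS WORKBENCH", "PROJECT", "STUDY", "SUBJECT",
    "SUBJECT_SAMPLE_FACTORS", "COLLECTION", "TREATMENT", "SAMPLEPREP",
    "CHROMATOGRAPHY", "ANALYSIS", "MS", "MS_METABOLITE_DATA",
    "METABOLITES", "END",
]


def follows_ms_order(block_names):
    """Return True if every name is known and the names appear in MS_BLOCK_ORDER order.

    Blocks may be missing; unknown, duplicated, or out-of-order blocks return False.
    """
    if any(name not in MS_BLOCK_ORDER for name in block_names):
        return False
    positions = [MS_BLOCK_ORDER.index(name) for name in block_names]
    return all(a < b for a, b in zip(positions, positions[1:]))


st000017_blocks = [
    "METABOLOMICS WORKBENCH", "PROJECT", "STUDY", "SUBJECT",
    "SUBJECT_SAMPLE_FACTORS", "COLLECTION", "TREATMENT", "SAMPLEPREP",
    "CHROMATOGRAPHY", "ANALYSIS", "MS", "MS_METABOLITE_DATA",
]
print(follows_ms_order(st000017_blocks))
```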

Using mwtab as a Library

Importing the mwtab Package

If the mwtab package is installed on the system, it can be imported:

[1]:

import mwtab

Constructing a MWTabFile Generator

The fileio module provides the read_files() generator function that yields MWTabFile instances. Constructing a MWTabFile generator is easy: pass the path to a local mwTab or JSON file, a directory of files, an archive of files, an ANALYSIS_ID, or a REST URL:

[2]:

import mwtab

mwfile_gen = mwtab.read_files("ST000017_AN000035.txt")  # single mwTab file
mwfiles_gen = mwtab.read_files("ST000017_AN000035.txt", "ST000040_AN000060.json")  # several mwTab files
mwdir_gen = mwtab.read_files("mwfiles_dir_mwtab")  # directory of mwTab files
mwzip_gen = mwtab.read_files("mwfiles_mwtab.zip")  # archive of mwTab files
mwanalysis_gen = mwtab.read_files("35", "60")       # ANALYSIS_ID of mwTab files
# REST callable url of mwTab file
mwurl_gen = mwtab.read_files("https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt")

Processing a MWTabFile Generator

The MWTabFile generator can be processed in several ways:

Feed it to a for-loop and process one file at a time:

[3]:

for mwfile in mwtab.read_files("35", "60"):
    print("STUDY_ID:", mwfile.study_id)       # print STUDY_ID
    print("ANALYSIS_ID", mwfile.analysis_id)  # print ANALYSIS_ID
    print("SOURCE", mwfile.source)            # print source
    for block_name in mwfile:                 # print names of blocks
        print("\t", block_name)

STUDY_ID: ST000017
ANALYSIS_ID AN000035
SOURCE https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt
         METABOLOMICS WORKBENCH
         PROJECT
         STUDY
         SUBJECT
         SUBJECT_SAMPLE_FACTORS
         COLLECTION
         TREATMENT
         SAMPLEPREP
         CHROMATOGRAPHY
         ANALYSIS
         MS
         MS_METABOLITE_DATA
STUDY_ID: ST000040
ANALYSIS_ID AN000060
SOURCE https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000060/mwtab/txt
         METABOLOMICS WORKBENCH
         PROJECT
         STUDY
         SUBJECT
         SUBJECT_SAMPLE_FACTORS
         COLLECTION
         TREATMENT
         SAMPLEPREP
         CHROMATOGRAPHY
         ANALYSIS
         MS
         MS_METABOLITE_DATA

Note

Once the generator is consumed, it becomes empty and needs to be created again.

Since the MWTabFile generator behaves like an iterator, we can call the next() built-in function:

[4]:

mwfiles_generator = mwtab.read_files("35", "60")

mwfile1 = next(mwfiles_generator)
mwfile2 = next(mwfiles_generator)

Note

Once the generator is exhausted, any subsequent next() call raises a StopIteration exception.
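The behaviour described in the two notes above is ordinary Python generator behaviour. The sketch below uses a hypothetical stand-in, fake_read_files(), instead of mwtab.read_files() so that it runs offline; it shows that next() also accepts a default value, which it returns instead of raising StopIteration.

```python
def fake_read_files(*sources):
    """Hypothetical stand-in for mwtab.read_files(): yields one dict per source."""
    for source in sources:
        yield {"source": source}


files = fake_read_files("35", "60")
first = next(files)           # {'source': '35'}
second = next(files)          # {'source': '60'}
leftover = next(files, None)  # exhausted: returns the default instead of raising

try:
    next(files)
except StopIteration:
    print("generator exhausted")
```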

Convert the MWTabFile generator into a list of MWTabFile objects:

[5]:

mwfiles_generator = mwtab.read_files("35", "60")
mwfiles_list = list(mwfiles_generator)

Warning

Avoid converting the MWTabFile generator into a list if it can yield a large number of files (e.g. several thousand); holding all of them in memory at once can exhaust the available memory.
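If you only need part of a large generator, itertools.islice() takes a bounded number of items lazily, and the remainder can still be streamed one item at a time. In this sketch a generator expression of plain dicts stands in for read_files(); islice works the same way on any iterator, including the generator returned by mwtab.read_files().

```python
from itertools import islice

# Stand-in generator for read_files(): pretend there are thousands of inputs.
files = ({"source": str(n)} for n in range(1, 5001))

# Take only a bounded number of items; nothing beyond them is produced yet.
first_three = list(islice(files, 3))

# Keep streaming the rest one at a time, holding only a running count in memory.
remaining = sum(1 for _ in files)
print(len(first_three), remaining)
```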

Accessing Data From a Single MWTabFile

Since a MWTabFile is a Python dict, data can be accessed and manipulated as with any regular Python dict object using bracket accessors.

Accessing the top-level “keys” in a MWTabFile:

[7]:

mwfile = next(mwtab.read_files("ST000017_AN000035.txt"))

# list MWTabFile-level keys, i.e. block names
list(mwfile.keys())

[7]:

['METABOLOMICS WORKBENCH',
 'PROJECT',
 'STUDY',
 'SUBJECT',
 'SUBJECT_SAMPLE_FACTORS',
 'COLLECTION',
 'TREATMENT',
 'SAMPLEPREP',
 'CHROMATOGRAPHY',
 'ANALYSIS',
 'MS',
 'MS_METABOLITE_DATA']
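The block list above has no METABOLITES entry, so mwfile["METABOLITES"] would raise a KeyError for this file. Since a MWTabFile behaves like a Python dict, the usual dict.get() fallback should apply. The sketch below demonstrates it on a plain-dict stand-in (mwfile_like is a hypothetical name) rather than on a real MWTabFile.

```python
# Plain-dict stand-in with a subset of the blocks listed above;
# a real MWTabFile would hold the full parsed contents.
mwfile_like = {
    "PROJECT": {"INSTITUTE": "University of Michigan"},
    "MS_METABOLITE_DATA": {"Units": "peak area"},
}

# Bracket access raises KeyError for a block the file does not contain ...
try:
    mwfile_like["METABOLITES"]
except KeyError:
    print("no METABOLITES block")

# ... whereas dict.get() lets you supply a fallback, including for nested lookups.
metabolites_block = mwfile_like.get("METABOLITES", {})
institute = mwfile_like.get("PROJECT", {}).get("INSTITUTE")
print(institute)
```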

Accessing individual blocks in a MWTabFile:

[8]:

# access "PROJECT" block
mwfile["PROJECT"]

[8]:

{'PROJECT_TITLE': 'Rat Stamina Studies',
 'PROJECT_TYPE': 'Feeding',
 'PROJECT_SUMMARY': 'Stamina in rats',
 'INSTITUTE': 'University of Michigan',
 'DEPARTMENT': 'Internal Medicine',
 'LABORATORY': 'Burant Lab',
 'LAST_NAME': 'Beecher',
 'FIRST_NAME': 'Chris',
 'ADDRESS': '-',
 'EMAIL': 'chrisbee@med.umich.edu',
 'PHONE': '734-232-0815',
 'FUNDING_SOURCE': 'NIH: R01 DK077200'}

Accessing individual “key-value” pairs within blocks:

[9]:

# access "INSTITUTE" field within "PROJECT" block
mwfile["PROJECT"]["INSTITUTE"]

[9]:

'University of Michigan'

Accessing data in the #SUBJECT_SAMPLE_FACTORS block:

[10]:

# access "SUBJECT_SAMPLE_FACTORS" block and print first three
mwfile["SUBJECT_SAMPLE_FACTORS"][:3]

[10]:

[{'Subject ID': '-',
  'Sample ID': 'S00009477',
  'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}},
 {'Subject ID': '-',
  'Sample ID': 'S00009478',
  'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}},
 {'Subject ID': '-',
  'Sample ID': 'S00009479',
  'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}}]

[11]:

# access an individual subject sample factor entry (by index)
mwfile["SUBJECT_SAMPLE_FACTORS"][0]

[11]:

{'Subject ID': '-',
 'Sample ID': 'S00009477',
 'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}}

[12]:

# access individual fields within factors
mwfile["SUBJECT_SAMPLE_FACTORS"][0]["Sample ID"]

[12]:

'S00009477'
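Because #SUBJECT_SAMPLE_FACTORS is a list of dictionaries, ordinary Python can summarise it, for example grouping sample IDs by the value of one factor. This sketch uses a stand-in list copied from the three entries shown above (the "Feeeding" spelling comes from the source data); samples_by_factor is a hypothetical helper, not part of mwtab.

```python
from collections import defaultdict

# Stand-in for mwfile["SUBJECT_SAMPLE_FACTORS"], copied from the output above.
ssf = [
    {"Subject ID": "-", "Sample ID": "S00009477",
     "Factors": {"Feeeding": "Ad lib", "Running Capacity": "High"}},
    {"Subject ID": "-", "Sample ID": "S00009478",
     "Factors": {"Feeeding": "Ad lib", "Running Capacity": "High"}},
    {"Subject ID": "-", "Sample ID": "S00009479",
     "Factors": {"Feeeding": "Ad lib", "Running Capacity": "High"}},
]


def samples_by_factor(entries, factor_name):
    """Group sample IDs by the value of one factor (entries lacking it are skipped)."""
    groups = defaultdict(list)
    for entry in entries:
        value = entry["Factors"].get(factor_name)
        if value is not None:
            groups[value].append(entry["Sample ID"])
    return dict(groups)


print(samples_by_factor(ssf, "Running Capacity"))
```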

Accessing data in the #MS_METABOLITE_DATA block:

[13]:

# access data block keys
list(mwfile["MS_METABOLITE_DATA"].keys())

[13]:

['Units', 'Data', 'Metabolites']

[14]:

# access units field
mwfile["MS_METABOLITE_DATA"]["Units"]

[14]:

'peak area'

[15]:

# list the keys of the first data row: "Metabolite" plus one key per sample ID
list(mwfile["MS_METABOLITE_DATA"]["Data"][0].keys())

[15]:

['Metabolite', 'S00009477', 'S00009478', 'S00009479', 'S00009480', 'S00009481', 'S00009500', 'S00009501', 'S00009502', 'S00009503', 'S00009470', 'S00009471', 'S00009472', 'S00009473', 'S00009474', 'S00009475', 'S00009494', 'S00009495', 'S00009496', 'S00009497', 'S00009498', 'S00009499', 'S00009488', 'S00009489', 'S00009490', 'S00009491', 'S00009492', 'S00009493', 'S00009509', 'S00009510', 'S00009511', 'S00009512', 'S00009513', 'S00009514', 'S00009482', 'S00009483', 'S00009484', 'S00009486', 'S00009504', 'S00009505', 'S00009506', 'S00009507', 'S00009508']

[16]:

# access metabolite data and print first three
mwfile["MS_METABOLITE_DATA"]["Metabolites"][:3]

[16]:

[{'Metabolite': '11BETA,21-DIHYDROXY-5BETA-PREGNANE-3,20-DIONE',
  'moverz_quant': '',
  'ri': '',
  'ri_type': '',
  'pubchem_id': '44263339',
  'inchi_key': '',
  'kegg_id': 'C05475',
  'other_id': '775216_UNIQUE',
  'other_id_type': 'UM_Target_ID'},
 {'Metabolite': '11-BETA-HYDROXYANDROST-4-ENE-3,17-DIONE',
  'moverz_quant': '',
  'ri': '',
  'ri_type': '',
  'pubchem_id': '94141',
  'inchi_key': '',
  'kegg_id': 'C05284',
  'other_id': '771312_PRIMARY',
  'other_id_type': 'UM_Target_ID'},
 {'Metabolite': '13(S)-HPODE',
  'moverz_quant': '',
  'ri': '',
  'ri_type': '',
  'pubchem_id': '1426',
  'inchi_key': '',
  'kegg_id': 'C04717',
  'other_id': '775541_UNIQUE',
  'other_id_type': 'UM_Target_ID'}]
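Each row of MS_METABOLITE_DATA["Data"] pairs a "Metabolite" name with one value per sample, so per-metabolite summaries need only a loop. The ST000017 values are not shown in this tutorial, so the stand-in rows below reuse the first three samples of the ST000001 excerpt from the format overview. Converting with float() is an assumption that works whether the values are stored as strings or numbers; row_means is a hypothetical helper that skips empty cells.

```python
from statistics import mean

# Stand-in rows shaped like mwfile["MS_METABOLITE_DATA"]["Data"]; the values are
# the first three samples of the ST000001 excerpt in the format overview.
data = [
    {"Metabolite": "1_2_4-benzenetriol",
     "LabF_115904": "1874.0000", "LabF_115909": "3566.0000", "LabF_115914": "1945.0000"},
    {"Metabolite": "1-monostearin",
     "LabF_115904": "987.0000", "LabF_115909": "450.0000", "LabF_115914": "1910.0000"},
]


def row_means(rows):
    """Average every non-"Metabolite" field of each row, skipping empty cells."""
    means = {}
    for row in rows:
        values = [float(v) for k, v in row.items() if k != "Metabolite" and v != ""]
        means[row["Metabolite"]] = mean(values) if values else None
    return means


print(row_means(data))
```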

Manipulating Data From a Single MWTabFile

In order to change values within the MWTabFile, descend into the appropriate level using square bracket accessors and set a new value.

Change regular “key-value” pairs:

[17]:

# access phone number information
mwfile["PROJECT"]["PHONE"]

[17]:

'734-232-0815'

[18]:

# change phone number information
mwfile["PROJECT"]["PHONE"] = "1-530-754-8258"

[19]:

# check that it has been modified
mwfile["PROJECT"]["PHONE"]

[19]:

'1-530-754-8258'

Change #SUBJECT_SAMPLE_FACTORS values:

[20]:

# access the first subject sample factor by index
mwfile["SUBJECT_SAMPLE_FACTORS"][0]

[20]:

{'Subject ID': '-',
 'Sample ID': 'S00009477',
 'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}}

[21]:

# provide additional details to the first subject sample factor
mwfile["SUBJECT_SAMPLE_FACTORS"][0]["Additional sample data"] = {"Additional detail key": "Additional detail value"}

[22]:

# check that it has been modified
mwfile["SUBJECT_SAMPLE_FACTORS"][0]

[22]:

{'Subject ID': '-',
 'Sample ID': 'S00009477',
 'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'},
 'Additional sample data': {'Additional detail key': 'Additional detail value'}}

Printing an MWTabFile and its Components

MWTabFile objects provide the print_file() method, which can output the file in either mwTab or JSON format. Its file_format keyword argument specifies the output format.

The MWTabFile can be printed to output in mwTab format in its entirety using:

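mwfile.print_file(file_format="mwtab")

print_file() also accepts an f argument with a file-like object to write to. The next cell writes into an in-memory StringIO buffer and prints only the first 20 lines in mwTab format.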

[23]:

from io import StringIO
mwtab_file_str = StringIO()
mwfile.print_file(file_format="mwtab", f=mwtab_file_str)

# print out first 20 lines
print("\n".join(mwtab_file_str.getvalue().split("\n")[:20]))

#METABOLOMICS WORKBENCH STUDY_ID:ST000017 ANALYSIS_ID:AN000035 PROJECT_ID:PR000016
VERSION                 1
CREATED_ON              2016-09-17
#PROJECT
PR:PROJECT_TITLE                        Rat Stamina Studies
PR:PROJECT_TYPE                         Feeding
PR:PROJECT_SUMMARY                      Stamina in rats
PR:INSTITUTE                            University of Michigan
PR:DEPARTMENT                           Internal Medicine
PR:LABORATORY                           Burant Lab
PR:LAST_NAME                            Beecher
PR:FIRST_NAME                           Chris
PR:ADDRESS                              -
PR:EMAIL                                chrisbee@med.umich.edu
PR:PHONE                                1-530-754-8258
PR:FUNDING_SOURCE                       NIH: R01 DK077200
#STUDY
ST:STUDY_TITLE                          Rat HCR/LCR Stamina Study
ST:STUDY_TYPE                           LC-MS analysis
ST:STUDY_SUMMARY                        To determine the basis of running capacity and health differences in outbread

The MWTabFile can be printed to output in JSON format in its entirety using:

mwfile.print_file(file_format="json")

Print the first 20 lines in JSON format, again capturing the output in a StringIO buffer:

[24]:

from io import StringIO
mwtab_file_str = StringIO()
mwfile.print_file(file_format="json", f=mwtab_file_str)

# print out first 20 lines
print("\n".join(mwtab_file_str.getvalue().split("\n")[:20]))

{
    "METABOLOMICS WORKBENCH": {
        "STUDY_ID": "ST000017",
        "ANALYSIS_ID": "AN000035",
        "PROJECT_ID": "PR000016",
        "VERSION": "1",
        "CREATED_ON": "2016-09-17"
    },
    "PROJECT": {
        "PROJECT_TITLE": "Rat Stamina Studies",
        "PROJECT_TYPE": "Feeding",
        "PROJECT_SUMMARY": "Stamina in rats",
        "INSTITUTE": "University of Michigan",
        "DEPARTMENT": "Internal Medicine",
        "LABORATORY": "Burant Lab",
        "LAST_NAME": "Beecher",
        "FIRST_NAME": "Chris",
        "ADDRESS": "-",
        "EMAIL": "chrisbee@med.umich.edu",
        "PHONE": "1-530-754-8258",

Print a single block in mwTab format.

[25]:

mwfile.print_block("STUDY", file_format="mwtab")

ST:STUDY_TITLE                          Rat HCR/LCR Stamina Study
ST:STUDY_TYPE                           LC-MS analysis
ST:STUDY_SUMMARY                        To determine the basis of running capacity and health differences in outbread
ST:STUDY_SUMMARY                        N/NIH rats selected for high capacity (HCR) and low capacity (LCR) running (a for
ST:STUDY_SUMMARY                        VO2max) (see:Science. 2005 Jan 21;307(5708):418-20). Plasma collected at 12 of
ST:STUDY_SUMMARY                        age in generation 28 rats after ad lib feeding or 40% caloric restriction at week
ST:STUDY_SUMMARY                        8 of age. All animals fasted 4 hours prior to collection between 5-8
ST:INSTITUTE                            University of Michigan
ST:DEPARTMENT                           Internal Medicine
ST:LABORATORY                           Burant Lab (MMOC)
ST:LAST_NAME                            Qi
ST:FIRST_NAME                           Nathan
ST:ADDRESS                              -
ST:EMAIL                                nathanqi@med.umich.edu
ST:PHONE                                734-232-0815
ST:NUM_GROUPS                           2
ST:TOTAL_SUBJECTS                       42

Print a single block in JSON format.

[26]:

mwfile.print_block("STUDY", file_format="json")

{
    "STUDY_TITLE": "Rat HCR/LCR Stamina Study",
    "STUDY_TYPE": "LC-MS analysis",
    "STUDY_SUMMARY": "To determine the basis of running capacity and health differences in outbread N/NIH rats selected for high capacity (HCR) and low capacity (LCR) running (a for VO2max) (see:Science. 2005 Jan 21;307(5708):418-20). Plasma collected at 12 of age in generation 28 rats after ad lib feeding or 40% caloric restriction at week 8 of age. All animals fasted 4 hours prior to collection between 5-8",
    "INSTITUTE": "University of Michigan",
    "DEPARTMENT": "Internal Medicine",
    "LABORATORY": "Burant Lab (MMOC)",
    "LAST_NAME": "Qi",
    "FIRST_NAME": "Nathan",
    "ADDRESS": "-",
    "EMAIL": "nathanqi@med.umich.edu",
    "PHONE": "734-232-0815",
    "NUM_GROUPS": "2",
    "TOTAL_SUBJECTS": "42"
}
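The JSON emitted for a block is ordinary JSON, so it can be loaded with the standard json module. Note that every value, including NUM_GROUPS and TOTAL_SUBJECTS, is stored as a string, so numeric fields need an explicit conversion. The text below is an abridged copy of the STUDY output above.

```python
import json

# JSON text abridged from the print_block("STUDY", file_format="json") output above.
study_json = """{
    "STUDY_TITLE": "Rat HCR/LCR Stamina Study",
    "STUDY_TYPE": "LC-MS analysis",
    "NUM_GROUPS": "2",
    "TOTAL_SUBJECTS": "42"
}"""

study = json.loads(study_json)

# Values are strings, so convert explicitly before doing arithmetic.
num_groups = int(study["NUM_GROUPS"])
total_subjects = int(study["TOTAL_SUBJECTS"])
print(num_groups, total_subjects)
```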

Writing Data From a MWTabFile Object Into a File

Data from a MWTabFile can be written to a file in the original mwTab format or in the equivalent JSON format using the write() method:

Writing into a mwTab formatted file:

[27]:

with open("out/ST000017_AN000035_modified.txt", "w") as outfile:
    mwfile.write(outfile, file_format="mwtab")

Writing into a JSON file:

[28]:

with open("out/ST000017_AN000035_modified.json", "w") as outfile:
    mwfile.write(outfile, file_format="json")

Extracting Metadata and Metabolites from mwTab Files

The mwtab.mwextract module can be used to extract metadata and metabolites from mwTab files. It provides two main functions: 1) extract_metadata(), which parses metadata values from a mwTab file, and 2) extract_metabolites(), which gathers metabolites, together with the samples in which they were found, from multiple mwTab files that contain a given metadata key-value pair.

Extracting Metadata Values

Extracting metadata values from a given mwTab file:

[29]:

from mwtab.mwextract import extract_metadata

extract_metadata(mwfile, ["STUDY_TYPE", "SUBJECT_TYPE"])

[29]:

{'STUDY_TYPE': {'LC-MS analysis'}, 'SUBJECT_TYPE': {'Animal'}}
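The result maps each requested key to a set of values, which naturally collapses duplicates when many files are scanned. The sketch below mimics only the shape of that result on plain-dict stand-ins; collect_metadata is a hypothetical helper and is not mwtab's implementation of extract_metadata().

```python
def collect_metadata(files, keys):
    """Sketch: map each requested key to the set of values found in any block.

    Hypothetical helper mimicking the shape of extract_metadata()'s result;
    it is not mwtab's implementation.
    """
    found = {}
    for mwfile in files:
        for block in mwfile.values():
            if not isinstance(block, dict):
                continue  # skip list-valued blocks such as SUBJECT_SAMPLE_FACTORS
            for key in keys:
                if key in block:
                    found.setdefault(key, set()).add(block[key])
    return found


file_a = {
    "STUDY": {"STUDY_TYPE": "LC-MS analysis"},
    "SUBJECT": {"SUBJECT_TYPE": "Animal"},
    "SUBJECT_SAMPLE_FACTORS": [],
}
print(collect_metadata([file_a], ["STUDY_TYPE", "SUBJECT_TYPE"]))
```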

Extracting Metabolite Values

Extracting metabolite information from multiple mwTab files and outputting the first three metabolites:

[30]:

from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files

mwtab_gen = read_files(
    "ST000017_AN000035.txt",
    "ST000040_AN000060.txt"
)

matchers = generate_matchers([
    ("ST:STUDY_TYPE",
    "LC-MS analysis")
])
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]

[30]:

['11BETA_21-DIHYDROXY-5BETA-PREGNANE-3_20-DIONE',
 '11-BETA-HYDROXYANDROST-4-ENE-3_17-DIONE',
 '13(S)-HPODE']

Extracting metabolite information from multiple mwTab files using regular expressions and outputting the first three metabolites:

[31]:

from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files
import re

mwtab_gen = read_files(
    "ST000017_AN000035.txt",
    "ST000040_AN000060.txt"
)

matchers = generate_matchers([
    ("ST:STUDY_TYPE",
    re.compile("(LC-MS)"))
])
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]

[31]:

['11BETA_21-DIHYDROXY-5BETA-PREGNANE-3_20-DIONE',
 '11-BETA-HYDROXYANDROST-4-ENE-3_17-DIONE',
 '13(S)-HPODE']
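This tutorial does not state whether a compiled pattern is applied with search, match, or fullmatch semantics. The standard-library check below shows that "(LC-MS)" succeeds against the value "LC-MS analysis" under re.search and re.match but not under re.fullmatch, which is worth keeping in mind when writing patterns for your own matchers.

```python
import re

pattern = re.compile("(LC-MS)")
value = "LC-MS analysis"

found_anywhere = bool(pattern.search(value))   # True: "LC-MS" occurs in the value
found_at_start = bool(pattern.match(value))    # True: it also occurs at position 0
covers_whole = bool(pattern.fullmatch(value))  # False: " analysis" is not matched
print(found_anywhere, found_at_start, covers_whole)
```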

Converting mwTab Files

mwTab files can be converted between the mwTab file format and their JSON representation using the mwtab.converter module.

One-to-one file conversions

Converting from the mwTab file format into its equivalent JSON file format:

[32]:

from mwtab.converter import Converter

# from_path can be a valid ANALYSIS_ID ("35" here), in which case the file is fetched by URL
converter = Converter(from_path="35", to_path="out/ST000017_AN000035.json",
                      from_format="mwtab", to_format="json")
converter.convert()

Converting from JSON file format back to mwTab file format:

[33]:

from mwtab.converter import Converter

converter = Converter(from_path="out/ST000017_AN000035.json", to_path="out/ST000017_AN000035.txt",
                      from_format="json", to_format="mwtab")
converter.convert()

Many-to-many file conversions

Converting from a directory of mwTab formatted files into their equivalent JSON formatted files:

[34]:

from mwtab.converter import Converter

converter = Converter(from_path="mwfiles_dir_mwtab",
                      to_path="out/mwfiles_dir_json",
                      from_format="mwtab",
                      to_format="json")
converter.convert()

Converting from a directory of JSON formatted files into their equivalent mwTab formatted files:

[35]:

from mwtab.converter import Converter

converter = Converter(from_path="out/mwfiles_dir_json",
                      to_path="out/mwfiles_dir_mwtab",
                      from_format="json",
                      to_format="mwtab")
converter.convert()

Note

Both one-to-one and many-to-many conversions are available. See mwtab.converter for the full list of available conversions.

Command-Line Interface

The mwtab Command-Line Interface provides the following functionality:

Convert from the mwTab file format into its equivalent JSON file format and vice versa.
Download files through Metabolomics Workbench’s REST API.
Validate the mwTab formatted file.
Extract metadata and metabolite information from downloaded files.

[36]:

! mwtab --help

The mwtab command-line interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Usage:
    mwtab -h | --help
    mwtab --version
    mwtab convert (<from-path> <to-path>) [--from-format=<format>] [--to-format=<format>] [--mw-rest=<url>] [--verbose]
    mwtab validate <from-path> [--mw-rest=<url>]
    mwtab download url <url> [--to-path=<path>] [--verbose]
    mwtab download study all [--to-path=<path>] [--input-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--verbose]
    mwtab download study <input-value> [--to-path=<path>] [--input-item=<item>] [--output-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--verbose]
    mwtab download (study | compound | refmet | gene | protein) <input-item> <input-value> <output-item> [--output-format=<format>] [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download moverz <input-item> <m/z-value> <ion-type-value> <m/z-tolerance-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download exactmass <LIPID-abbreviation> <ion-type-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab extract metadata <from-path> <to-path> <key> ... [--to-format=<format>] [--no-header]
    mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]

Options:
    -h, --help                           Show this screen.
    --version                            Show version.
    --verbose                            Print what files are processing.
    --from-format=<format>               Input file format, available formats: mwtab, json [default: mwtab].
    --to-format=<format>                 Output file format [default: json].
                                         Available formats for convert:
                                             mwtab, json.
                                         Available formats for extract:
                                             json, csv.
    --mw-rest=<url>                      URL to MW REST interface
                                            [default: https://www.metabolomicsworkbench.org/rest/].
    --to-path=<path>                     Directory to save outputs into. Defaults to the current working directory.
    --prefix=<prefix>                    Prefix to add at the beginning of the output file name. Defaults to no prefix.
    --suffix=<suffix>                    Suffix to add at the end of the output file name. Defaults to no suffix.
    --context=<context>                  Type of resource to access from MW REST interface, available contexts: study,
                                         compound, refmet, gene, protein, moverz, exactmass [default: study].
    --input-item=<item>                  Item to search Metabolomics Workbench with.
    --output-item=<item>                 Item to be retrieved from Metabolomics Workbench.
    --output-format=<format>             Format for item to be retrieved in, available formats: mwtab, json.
    --no-header                          Include header at the top of csv formatted files.

    For extraction <to-path> can take a "-" which will use stdout.
    All <from-path>'s can be single files, directories, or URLs.

Documentation webpage: https://moseleybioinformaticslab.github.io/mwtab/
GitHub webpage: https://github.com/MoseleyBioinformaticsLab/mwtab

Converting `mwTab` Files in Bulk

CLI one-to-one file conversions

Convert from a local file in mwTab format to a local file in JSON format:

[37]:

! mwtab convert ST000017_AN000035.txt out/ST000017_AN000035.json \
          --from-format=mwtab --to-format=json

Convert from a local file in JSON format to a local file in mwTab format:

[38]:

! mwtab convert ST000017_AN000035.json out/ST000017_AN000035.txt \
          --from-format=json --to-format=mwtab

Convert from a compressed local file in mwTab format to a compressed local file in JSON format:

[39]:

! mwtab convert ST000017_AN000035.txt.gz out/ST000017_AN000035.json.gz \
          --from-format=mwtab --to-format=json

Convert from a compressed local file in JSON format to a compressed local file in mwTab format:

[40]:

! mwtab convert ST000017_AN000035.json.gz out/ST000017_AN000035.txt.gz \
          --from-format=json --to-format=mwtab
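A compressed output such as out/ST000017_AN000035.json.gz can be inspected from Python without unpacking it first, assuming it is a standard gzip stream. To keep the sketch self-contained it first writes a tiny stand-in file into a temporary directory, then reads it back with gzip.open in text mode, which decompresses transparently.

```python
import gzip
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a file such as out/ST000017_AN000035.json.gz produced by the CLI.
    path = Path(tmp) / "ST000017_AN000035.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump({"METABOLOMICS WORKBENCH": {"STUDY_ID": "ST000017"}}, fh)

    # Reading it back: text mode decompresses and decodes in one step.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)

print(data["METABOLOMICS WORKBENCH"]["STUDY_ID"])
```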

Convert an uncompressed mwTab file retrieved by URL (specified here by its ANALYSIS_ID, 35) to a compressed local file in JSON format:

[41]:

! mwtab convert 35 out/ST000017_AN000035.json.bz2 \
          --from-format=mwtab --to-format=json

Note

See mwtab.converter for the full list of available conversions.

CLI many-to-many file conversions

Convert from a directory of files in mwTab format to a directory of files in JSON format:

[42]:

! mwtab convert mwfiles_dir_mwtab out/mwfiles_dir_json \
          --from-format=mwtab --to-format=json

Convert from a directory of files in JSON format to a directory of files in mwTab format:

[43]:

! mwtab convert mwfiles_dir_json out/mwfiles_dir_mwtab \
          --from-format=json --to-format=mwtab

Convert from a directory of files in mwTab format to a zip archive of files in JSON format:

[44]:

! mwtab convert mwfiles_dir_mwtab out/mwfiles_json.zip \
          --from-format=mwtab --to-format=json

Convert from a compressed tar archive of files in JSON format to a directory of files in mwTab format:

[45]:

! mwtab convert mwfiles_json.tar.gz out/mwfiles_dir_mwtab \
          --from-format=json --to-format=mwtab

Convert from a zip archive of files in mwTab format to a compressed tar archive of files in JSON format:

[46]:

! mwtab convert mwfiles_mwtab.zip out/mwfiles_json.tar.bz2 \
          --from-format=mwtab --to-format=json
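The archive outputs above can be checked with the standard zipfile and tarfile modules, which list members without extracting anything. The tutorial does not describe how files are laid out inside mwtab's archives, so this sketch builds its own tiny stand-in archives with a hypothetical member name; tarfile's "r:*" mode auto-detects the compression (here bz2).

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build small stand-in archives (hypothetical member names).
    zip_path = Path(tmp) / "mwfiles_json.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("ST000017_AN000035.json", "{}")

    tar_path = Path(tmp) / "mwfiles_json.tar.bz2"
    with tarfile.open(tar_path, "w:bz2") as tf:
        payload = b"{}"
        info = tarfile.TarInfo("ST000017_AN000035.json")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

    # List members without extracting anything.
    with zipfile.ZipFile(zip_path) as zf:
        zip_members = zf.namelist()
    with tarfile.open(tar_path, "r:*") as tf:  # "r:*" auto-detects the compression
        tar_members = tf.getnames()

print(zip_members, tar_members)
```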

Note

See mwtab.converter for the full list of available conversions.

Download Files Through Metabolomics Workbench’s REST API

The mwtab package provides the mwtab.mwrest module, which contains a number of functions and classes for working with Metabolomics Workbench’s REST API.

Note

For the full official REST API specification, see MW REST API (v1.0, 5/7/2019): https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf

Note

We do not recommend downloading the JSON version of mwTab files directly from Metabolomics Workbench. There are known errors in some of the JSON files they provide. We recommend downloading in mwTab format and then using this package to convert them to JSON.

Download by URL

To download a file based on a given URL, simply call the download url command with the desired URL and provide an output path:

[47]:

! mwtab download url "https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt" --to-path=out/ST000017_AN000035.txt

To download a single analysis mwTab file, simply call download study and specify the analysis ID:

! mwtab download study AN000035 --to-path=out/ST000017_AN000035.txt

To download an entire study mwTab file, simply call download study and specify the study ID:

! mwtab download study ST000017 --to-path=out/ST000017.txt

To download a list of studies and/or analyses, simply call download study and provide the path to a JSON file containing a list of IDs:

! mwtab download study path_to_json.json --to-path=out/
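The exact layout of the ID file is not described here. The sketch below assumes it is a flat JSON array of study and/or analysis ID strings; check this against the mwtab documentation before relying on it. The helper write_id_list is hypothetical.

```python
import json
from pathlib import Path


def write_id_list(ids, path):
    """Write Metabolomics Workbench IDs to a JSON file as a flat array.

    The flat-array layout is an assumption about what
    `mwtab download study <json-file>` expects.
    """
    ids = list(ids)
    for item in ids:
        # Study IDs start with "ST", analysis IDs with "AN".
        if not item.startswith(("ST", "AN")):
            raise ValueError(f"not a study or analysis ID: {item!r}")
    path = Path(path)
    path.write_text(json.dumps(ids, indent=2))
    return path


# Hypothetical usage, producing the file passed to the command above:
# write_id_list(["ST000017", "AN000035"], "path_to_json.json")
```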

Download study, compound, refmet, gene, and protein Files#

To download study, compound, refmet, gene, and protein context files, call the download command and specify the context, input item, input value, and output item (optionally specify the output format).

Download a study:

! mwtab download study analysis_id AN000035 mwtab --output-format=txt --to-path=out/ST000017_AN000035.txt

Download a compound:

! mwtab download compound regno 11 name --to-path=out/tmp.txt

Download a RefMet entry:

! mwtab download refmet name Cholesterol all --to-path=out/tmp.txt

Download a gene:

! mwtab download gene gene_symbol acaca all --to-path=out/tmp.txt

Download a protein:

! mwtab download protein uniprot_id Q13085 all --to-path=out/tmp.txt
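These positional arguments appear to mirror the path segments of a REST URL. The URL passed to download url earlier shows the pattern: study/analysis_id/AN000035/mwtab/txt. The sketch below assumes that layout holds for each context and composes the URL for any of these requests. The helper rest_url is hypothetical, and the percent-encoding choices are assumptions.

```python
from urllib.parse import quote

BASE_URL = "https://www.metabolomicsworkbench.org/rest"


def rest_url(context, input_item, input_value, output_item, output_format=None):
    """Compose a REST URL from its path segments (hypothetical helper)."""
    segments = [context, input_item, input_value, output_item]
    if output_format is not None:
        segments.append(output_format)
    # Percent-encode each segment separately so a "/" or space inside a value
    # cannot break the path; parentheses, "+", "," and ":" are left readable.
    encoded = [quote(str(s), safe="()+,:") for s in segments]
    return "/".join([BASE_URL] + encoded)


# rest_url("study", "analysis_id", "AN000035", "mwtab", "txt") reproduces the
# URL used in the Download by URL example.
```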

Download All `mwTab` Formatted Files#

The mwtab package provides a number of command-line functions for downloading mwTab formatted files through the Metabolomics Workbench’s REST API.

To download all available analysis files, simply call the download study all command:

! mwtab download study all

It is also possible to download all study files, rather than analysis files, by calling the download study all command with an input item and, optionally, an output path (--to-path):

! mwtab download study all --input-item=study_id

Download moverz and exactmass#

To download moverz files, call the download moverz command and specify the input value (LIPIDS, MB, or REFMET), m/z value, ion type value, and m/z tolerance value.

! mwtab download moverz MB 635.52 M+H 0.5 --to-path=out/tmp.txt

To download exactmass files, call the download exactmass command and specify the lipid abbreviation and ion type value.

! mwtab download exactmass "PC(34:1)" M+H --to-path=out/tmp.txt

Note

It is not necessary to specify an output format for exactmass files.
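The sketch below shows the REST requests these two commands appear to correspond to. It assumes the URL layout from the REST API specification linked above: moverz/database/m/z/ion type/tolerance and exactmass/lipid/ion type. The helpers moverz_url and exactmass_url are hypothetical. The exactmass URL has no output-format segment, which is consistent with the note above.

```python
from urllib.parse import quote

BASE_URL = "https://www.metabolomicsworkbench.org/rest"
MOVERZ_DATABASES = {"LIPIDS", "MB", "REFMET"}


def moverz_url(database, mz, ion_type, tolerance):
    """URL for an m/z search: database / m/z value / ion type / tolerance (assumed layout)."""
    if database not in MOVERZ_DATABASES:
        raise ValueError(f"database must be one of {sorted(MOVERZ_DATABASES)}")
    return f"{BASE_URL}/moverz/{database}/{mz}/{quote(ion_type, safe='+')}/{tolerance}"


def exactmass_url(lipid, ion_type):
    """URL for an exact-mass lookup: lipid abbreviation / ion type (assumed layout)."""
    return f"{BASE_URL}/exactmass/{quote(lipid, safe='()+,:')}/{quote(ion_type, safe='+')}"
```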

Extracting Metabolite Data and Metadata From `mwTab` Files#

The mwtab package provides the extract_metabolites() and extract_metadata() functions for extracting data from mwTab formatted files. The extract_metabolites() function takes a source (a collection of mwTab files) and a list of metadata key-value pairs. It uses the pairs to select the mwTab files that contain those metadata values. The extract_metadata() function takes a source (a collection of mwTab files) and a list of metadata keys. It searches the mwTab files for the values of those keys.
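The toy sketch below makes the difference between the two kinds of search concrete. It uses plain dictionaries to stand in for parsed files. The dictionary layout (section-prefixed keys such as "SU:SUBJECT_TYPE" mapped to values) and the helpers value_matches, select_files, and collect_values are hypothetical and much simpler than mwtab's MWTabFile objects.

```python
import re


def value_matches(actual, expected):
    """Compiled regular expressions are searched for; plain strings must be equal."""
    if isinstance(expected, re.Pattern):
        return expected.search(actual) is not None
    return actual == expected


def select_files(files, pairs):
    """Keep files whose metadata satisfies every (key, value) pair (metabolite-style filtering)."""
    return [f for f in files
            if all(key in f and value_matches(f[key], value) for key, value in pairs)]


def collect_values(files, keys):
    """Map each key to the set of distinct values found across files (metadata-style search)."""
    found = {key: set() for key in keys}
    for f in files:
        for key in keys:
            if key in f:
                found[key].add(f[key])
    return found
```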

To extract metabolites from mwTab files in a directory, call the extract metabolites command and provide a list of metadata key-value pairs along with an output path and output format:

! mwtab extract metabolites mwfiles_dir_mwtab out/output_file.csv SU:SUBJECT_TYPE Plant --to-format=csv

Note

It is possible to use regular expressions to match the metadata value (e.g. … SU:SUBJECT_TYPE "r'(Plant)'").
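The double quotes around r'(Plant)' are there for the shell, which passes the single-quoted r'...' text through to mwtab unchanged. The sketch below shows how such a value can be told apart from a plain string. It reflects our reading of the notation, not mwtab's own parser, and parse_match_value is a hypothetical name.

```python
import re


def parse_match_value(text):
    """Interpret a CLI match value: r'...' becomes a compiled regex, anything else stays a string."""
    if len(text) >= 3 and text.startswith("r'") and text.endswith("'"):
        # Strip the leading r' and the trailing ' and compile what is left.
        return re.compile(text[2:-1])
    return text
```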

To extract metadata from mwTab files in a directory, call the extract metadata command and provide a list of metadata keys along with an output path and output format:

! mwtab extract metadata mwfiles_dir_json out/output_file.json SUBJECT_TYPE --to-format=json

Validating `mwTab` Files#

The mwtab package provides the validate_file() function, which validates files against a JSON schema definition. The mwtab.mwschema module contains schema definitions for every block of an mwTab formatted file. It lists the types of attributes (e.g. str) and specifies which keys are optional and which are required.
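The toy sketch below shows what such a block schema expresses: value types, required keys, and optional keys. It uses only the standard library. The schema is illustrative and is not copied from mwtab.mwschema, which defines the real rules; validate_block, PROJECT_REQUIRED, and PROJECT_OPTIONAL are hypothetical names.

```python
def validate_block(block, required, optional):
    """Return a list of problems in one block, checked against a toy schema.

    `required` and `optional` map each key to the Python type its value must have.
    """
    problems = []
    for key, expected_type in required.items():
        if key not in block:
            problems.append(f"missing required key: {key}")
        elif not isinstance(block[key], expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
    for key, expected_type in optional.items():
        # Optional keys may be absent, but if present they must have the right type.
        if key in block and not isinstance(block[key], expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
    return problems


# Illustrative PROJECT-like schema only; the real rules live in mwtab.mwschema.
PROJECT_REQUIRED = {"PROJECT_TITLE": str, "PROJECT_SUMMARY": str}
PROJECT_OPTIONAL = {"PROJECT_TYPE": str}
```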

To validate file(s), simply call the validate command and provide the source(s). A source can be a path to file(s) or, as in this example, an analysis ID (35, i.e. AN000035):

! mwtab validate 35

Using the mwtab Package to Find Analyses Involving a Specific Disease or Condition#

The Metabolomics Workbench data repository stores mass spectrometry and nuclear magnetic resonance experimental data and metadata in mwTab formatted files. Metabolomics Workbench also provides a number of tools for searching and analyzing mwTab files. The mwtab Python package can perform similar functions through both a programmatic API and a command-line interface, with greater search flexibility.

In order to search the repository of mwTab files for analyses associated with a specific disease, Metabolomics Workbench provides a web-based interface:

https://www.metabolomicsworkbench.org/data/metsearch_MS_form2.php

The mwtab Python package can be used in a number of ways to similar effect. The package provides the extract_metabolites() method to extract and organize metabolites from multiple mwTab files through both Python scripts and a command-line interface. This method has more search flexibility, since it can take either a search string or a regular expression.

Using the mwtab Package API to Extract Study IDs, Analysis IDs, and Metabolites#

The extract_metabolites() method takes two parameters: 1) an iterable of MWTabFile instances and 2) an iterable of ItemMatcher or ReGeXMatcher instances. The iterable of MWTabFile instances can be created by passing mwTab file sources (filenames, analysis IDs, etc.) to the read_files() method. The iterable of matcher instances can be created using the generate_matchers() method.

An example of using the mwtab package API to extract data from analyses associated with diabetes and output the first three metabolites:

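from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files
import re

# Generator of MWTabFile instances for every file in the diabetes/ directory.
mwtab_gen = read_files("diabetes/")

# Match analyses whose ST:STUDY_SUMMARY contains "diabetes" (case-sensitive pattern).
matchers = generate_matchers([
    ("ST:STUDY_SUMMARY",
     re.compile("(diabetes)"))
])

# The result is keyed by metabolite name; show the first three names.
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]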

['1_5-anhydroglucitol', '1-monopalmitin', '1-monostearin']

Using the mwtab CLI to Extract Study IDs, Analysis IDs, and Metabolites#

The mwtab command-line interface includes an mwtab extract metabolites command. It takes a directory of mwTab files, an output path in which to save the extracted data, and a series of mwTab section item keys and values to be matched (either string values or regular expressions). Optionally, an output format can be specified.

mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]
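Outside a notebook, the same command can be assembled as an argument list and passed to subprocess.run. The sketch below models only the options shown in the usage line, and extract_metabolites_args is a hypothetical helper. No shell is involved when a list is passed, so a regular-expression value such as r'(?i)(diabetes)' needs no surrounding double quotes.

```python
def extract_metabolites_args(from_path, to_path, pairs, to_format=None, no_header=False):
    """Build argv for `mwtab extract metabolites` (hypothetical helper)."""
    if not pairs:
        raise ValueError("at least one (key, value) pair is required")
    args = ["mwtab", "extract", "metabolites", from_path, to_path]
    for key, value in pairs:
        # Each (key, value) pair becomes two separate command-line arguments.
        args.extend([key, value])
    if to_format is not None:
        args.append(f"--to-format={to_format}")
    if no_header:
        args.append("--no-header")
    return args


# Usage (requires the mwtab CLI on PATH):
# import subprocess
# subprocess.run(extract_metabolites_args("diabetes/", "out/output_file.json",
#                [("ST:STUDY_SUMMARY", "r'(?i)(diabetes)'")], to_format="json"), check=True)
```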

An example of using the mwtab CLI to extract data from analyses associated with diabetes:

[61]:

! mwtab extract metabolites diabetes/ out/output_file.json ST:STUDY_SUMMARY "r'(?i)(diabetes)'" --to-format=json
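The API example above compiles "(diabetes)" without flags, so it is case-sensitive, while this CLI example adds the inline flag (?i). The quick standard-library check below shows the difference. The summaries are invented for illustration, and matching uses plain re.search, which is not necessarily how mwtab applies the pattern internally.

```python
import re

CASE_SENSITIVE = re.compile("(diabetes)")        # pattern from the API example
CASE_INSENSITIVE = re.compile("(?i)(diabetes)")  # pattern from the CLI example


def matching(pattern, texts):
    """Return the texts in which the pattern is found anywhere."""
    return [t for t in texts if pattern.search(t) is not None]


# Made-up study summaries, for illustration only.
SUMMARIES = ["Plasma metabolomics in type 2 diabetes", "Diabetes cohort urine samples"]
```

To make the API example case-insensitive as well, pass re.compile("(?i)(diabetes)") or re.compile("(diabetes)", re.IGNORECASE) to generate_matchers().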
