The mwtab Tutorial#
The mwtab package provides classes and other facilities for downloading,
parsing, accessing, and manipulating data stored in either the mwTab or
JSON representation of mwTab files.
Also, the mwtab package provides a simple command-line interface to convert
between mwTab and JSON representations, download entries from
Metabolomics Workbench, access the MW REST interface, validate the consistency
of the mwTab files, or extract metadata and metabolites from these files.
Brief mwTab Format Overview#
Note
For the full official specification see the following link (mwTab file specification):
http://www.metabolomicsworkbench.org/data/tutorials.php
The mwTab formatted files consist of multiple blocks. Each new block starts with #.
Some of the blocks contain only “key-value”-like pairs.
#METABOLOMICS WORKBENCH STUDY_ID:ST000001 ANALYSIS_ID:AN000001
VERSION 1
CREATED_ON 2016-09-17
#PROJECT
PR:PROJECT_TITLE FatB Gene Project
PR:PROJECT_TYPE Genotype treatment
PR:PROJECT_SUMMARY Experiment to test the consequence of a mutation at the FatB gene (At1g08510)
PR:PROJECT_SUMMARY the wound-response of Arabidopsis
Note
Some “key-value”-like pairs can span multiple lines.
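As a rough illustration of how such continuation lines can be merged, the sketch below groups repeated keys and joins their values with a space. It is a simplified stand-in, not the parser used by the mwtab package; the helper name `parse_key_value_lines` and the tab separator between key and value are assumptions.

```python
def parse_key_value_lines(lines):
    """Merge repeated keys into one value (hypothetical helper, not part of mwtab)."""
    block = {}
    for line in lines:
        # Assumption: key and value are separated by a tab character.
        key, _, value = line.partition("\t")
        # Drop the block prefix such as "PR:".
        key = key.strip().split(":", 1)[-1]
        value = value.strip()
        if key in block:
            # A repeated key continues the previous value.
            block[key] = block[key] + " " + value
        else:
            block[key] = value
    return block


project_lines = [
    "PR:PROJECT_TITLE\tFatB Gene Project",
    "PR:PROJECT_SUMMARY\tExperiment to test the consequence of a mutation at the FatB gene (At1g08510)",
    "PR:PROJECT_SUMMARY\tthe wound-response of Arabidopsis",
]
project = parse_key_value_lines(project_lines)
print(project["PROJECT_SUMMARY"])
```

Joining with a single space mirrors how the multi-line STUDY_SUMMARY appears as one string in the JSON output later in this tutorial.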
The #SUBJECT_SAMPLE_FACTORS block is specially formatted, i.e. it contains a header specification and tab-separated values.
#SUBJECT_SAMPLE_FACTORS: SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
SUBJECT_SAMPLE_FACTORS - LabF_115873 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115878 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115883 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115888 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115893 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115898 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
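To make the header specification concrete, here is a minimal sketch that splits one such line into its parts. It assumes the fields are tab-separated as the header describes; `parse_subject_sample_factors` is a hypothetical helper, not a function of the mwtab package, and the key names simply follow the dictionary layout this tutorial shows for parsed files.

```python
def parse_subject_sample_factors(line):
    """Split one SUBJECT_SAMPLE_FACTORS line into a dict (hypothetical helper)."""
    # Assumption: fields are tab-separated, as the block header describes.
    fields = line.split("\t")
    _, subject, sample, factors = fields[:4]
    factor_dict = {}
    for pair in factors.split("|"):
        # Split on the first colon only, since values may contain colons.
        name, _, value = pair.partition(":")
        factor_dict[name.strip()] = value.strip()
    return {"Subject ID": subject, "Sample ID": sample, "Factors": factor_dict}


line = (
    "SUBJECT_SAMPLE_FACTORS\t-\tLabF_115873\t"
    "Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded"
)
entry = parse_subject_sample_factors(line)
print(entry)
```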
The #MS_METABOLITE_DATA (results) block contains Samples identifiers, Factors identifiers, as well as tab-separated data between *_START and *_END.
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS Peak height
MS_METABOLITE_DATA_START
Samples LabF_115904 LabF_115909 LabF_115914 LabF_115919 LabF_115924 LabF_115929 LabF_115842 LabF_115847 LabF_115852 LabF_115857 LabF_115862 LabF_115867 LabF_115873 LabF_115878 LabF_115883 LabF_115888 LabF_115893 LabF_115898 LabF_115811 LabF_115816 LabF_115821 LabF_115826 LabF_115831 LabF_115836
Factors Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded
1_2_4-benzenetriol 1874.0000 3566.0000 1945.0000 1456.0000 2004.0000 1995.0000 4040.0000 2432.0000 2189.0000 1931.0000 1307.0000 2880.0000 2218.0000 1754.0000 1369.0000 1201.0000 3324.0000 1355.0000 2257.0000 1718.0000 1740.0000 3472.0000 2054.0000 1367.0000
1-monostearin 987.0000 450.0000 1910.0000 549.0000 1032.0000 902.0000 393.0000 705.0000 100.0000 481.0000 265.0000 120.0000 1185.0000 867.0000 676.0000 569.0000 579.0000 387.0000 1035.0000 789.0000 875.0000 224.0000 641.0000 693.0000
...
MS_METABOLITE_DATA_END
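The layout of this block can be sketched in a few lines of Python. The example below is an illustration only, not the mwtab parser: `parse_data_block` is a hypothetical helper, the tab separators are an assumption, and the sample input is the block above trimmed to its first two samples.

```python
def parse_data_block(lines, prefix="MS_METABOLITE_DATA"):
    """Collect the tab-separated rows between *_START and *_END (hypothetical helper)."""
    start = lines.index(prefix + "_START")
    end = lines.index(prefix + "_END")
    rows = [row.split("\t") for row in lines[start + 1:end]]
    samples = rows[0][1:]    # "Samples" row: sample identifiers
    factors = rows[1][1:]    # "Factors" row: one factor string per sample
    data = []
    for name, *values in rows[2:]:
        # One record per metabolite, keyed by sample identifier.
        record = {"Metabolite": name}
        record.update(zip(samples, values))
        data.append(record)
    return samples, factors, data


factor = "Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded"
block_lines = [
    "#MS_METABOLITE_DATA",
    "MS_METABOLITE_DATA:UNITS\tPeak height",
    "MS_METABOLITE_DATA_START",
    "Samples\tLabF_115904\tLabF_115909",
    "Factors\t" + factor + "\t" + factor,
    "1_2_4-benzenetriol\t1874.0000\t3566.0000",
    "1-monostearin\t987.0000\t450.0000",
    "MS_METABOLITE_DATA_END",
]
samples, factors, data = parse_data_block(block_lines)
print(data[0])
```

Values are kept as strings here; converting them to numbers is left to the caller.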
The #METABOLITES metadata block contains a header specifying fields and tab-separated data between *_START and *_END.
#METABOLITES
METABOLITES_START
metabolite_name moverz_quant ri ri_type pubchem_id inchi_key kegg_id other_id other_id_type
1,2,4-benzenetriol 239 522741 Fiehn 10787 C02814 205673 BinBase
1-monostearin 399 959625 Fiehn 107036 D01947 202835 BinBase
2-hydroxyvaleric acid 131 310750 Fiehn 98009 218773 BinBase
3-phosphoglycerate 299 611619 Fiehn 724 C00597 217821 BinBase
...
METABOLITES_END
The #NMR_BINNED_DATA metadata block contains a header specifying fields and tab-separated data between *_START and *_END. Note that this block has been deprecated. Binned data is now uploaded as a separate text file and referenced with the “NM:NMR_RESULTS_FILE” tag in the NM metadata block.
#NMR_BINNED_DATA
NMR_BINNED_DATA_START
Bin range(ppm) CDC029 CDC030 CDC032 CPL101 CPL102 CPL103 CPL201 CPL202 CPL203 CDS039 CDS052 CDS054
0.50...0.56 0.00058149 1.6592 0.039301 0 0 0 0.034018 0.0028746 0.0021478 0.013387 0 0
0.56...0.58 0 0.74267 0 0.007206 0 0 0 0 0 0 0 0.0069721
0.58...0.60 0.051165 0.8258 0.089149 0.060972 0.026307 0.045697 0.069541 0 0 0.14516 0.057489 0.042255
...
NMR_BINNED_DATA_END
Order of metadata and data blocks (MS)
#METABOLOMICS WORKBENCH
VERSION 1
CREATED_ON 2016-09-17
...
#PROJECT
...
#STUDY
...
#SUBJECT
...
#SUBJECT_SAMPLE_FACTORS: SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
...
#COLLECTION
...
#TREATMENT
...
#SAMPLEPREP
...
#CHROMATOGRAPHY
...
#ANALYSIS
...
#MS
...
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS peak area
MS_METABOLITE_DATA_START
...
MS_METABOLITE_DATA_END
#METABOLITES
METABOLITES_START
...
METABOLITES_END
#END
Using mwtab as a Library#
Importing the mwtab Package#
If the mwtab package is installed on the system, it can be imported:
[1]:
import mwtab
Constructing a MWTabFile Generator#
The fileio module provides the read_files()
generator function that yields MWTabFile instances. Constructing a
MWTabFile generator is easy - specify the path to a local mwTab file,
directory of files, or archive of files:
[2]:
import mwtab
mwfile_gen = mwtab.read_files("ST000017_AN000035.txt") # single mwTab file
mwfiles_gen = mwtab.read_files("ST000017_AN000035.txt", "ST000040_AN000060.json") # several mwTab files
mwdir_gen = mwtab.read_files("mwfiles_dir_mwtab") # directory of mwTab files
mwzip_gen = mwtab.read_files("mwfiles_mwtab.zip") # archive of mwTab files
mwanalysis_gen = mwtab.read_files("35", "60") # ANALYSIS_ID of mwTab files
# REST callable url of mwTab file
mwurl_gen = mwtab.read_files("https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt")
Processing a MWTabFile Generator#
The MWTabFile generator can be processed in several ways:
Feed it to a for-loop and process one file at a time:
[3]:
for mwfile in mwtab.read_files("35", "60"):
    print("STUDY_ID:", mwfile.study_id)  # print STUDY_ID
    print("ANALYSIS_ID", mwfile.analysis_id)  # print ANALYSIS_ID
    print("SOURCE", mwfile.source)  # print source
    for block_name in mwfile:  # print names of blocks
        print("\t", block_name)
STUDY_ID: ST000017
ANALYSIS_ID AN000035
SOURCE https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt
METABOLOMICS WORKBENCH
PROJECT
STUDY
SUBJECT
SUBJECT_SAMPLE_FACTORS
COLLECTION
TREATMENT
SAMPLEPREP
CHROMATOGRAPHY
ANALYSIS
MS
MS_METABOLITE_DATA
STUDY_ID: ST000040
ANALYSIS_ID AN000060
SOURCE https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000060/mwtab/txt
METABOLOMICS WORKBENCH
PROJECT
STUDY
SUBJECT
SUBJECT_SAMPLE_FACTORS
COLLECTION
TREATMENT
SAMPLEPREP
CHROMATOGRAPHY
ANALYSIS
MS
MS_METABOLITE_DATA
Note
Once the generator is consumed, it becomes empty and needs to be created again.
Call the built-in next() function to retrieve MWTabFile instances one at a time:
[4]:
mwfiles_generator = mwtab.read_files("35", "60")
mwfile1 = next(mwfiles_generator)
mwfile2 = next(mwfiles_generator)
Note
Once the generator is consumed, the StopIteration Exception
will be raised on the subsequent next call.
Convert the generator into a list of MWTabFile instances:
[5]:
mwfiles_generator = mwtab.read_files("35", "60")
mwfiles_list = list(mwfiles_generator)
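The consumption behavior described in the notes above is ordinary Python generator behavior, so it can be tried without any mwTab files. The sketch below uses a stand-in generator, `fake_read_files`, which is hypothetical and only imitates the way read_files() yields one object per source.

```python
def fake_read_files(*sources):
    """Stand-in generator that yields one placeholder per source (not mwtab.read_files)."""
    for source in sources:
        yield {"source": source}


gen = fake_read_files("35", "60")
first = next(gen)           # consumes the first item
remaining = list(gen)       # consumes everything that is left

try:
    next(gen)               # nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

# A fresh generator must be created to iterate again.
all_items = list(fake_read_files("35", "60"))
print(first, remaining, exhausted, len(all_items))
```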
Accessing Data From a Single MWTabFile#
Since a MWTabFile is a Python dict,
data can be accessed and manipulated as with any regular Python dict object
using bracket accessors.
Accessing the top-level “keys” in a MWTabFile:
[7]:
mwfile = next(mwtab.read_files("ST000017_AN000035.txt"))
# list MWTabFile-level keys, i.e. block names
list(mwfile.keys())
[7]:
['METABOLOMICS WORKBENCH',
'PROJECT',
'STUDY',
'SUBJECT',
'SUBJECT_SAMPLE_FACTORS',
'COLLECTION',
'TREATMENT',
'SAMPLEPREP',
'CHROMATOGRAPHY',
'ANALYSIS',
'MS',
'MS_METABOLITE_DATA']
Accessing individual blocks in a MWTabFile:
[8]:
# access "PROJECT" block
mwfile["PROJECT"]
[8]:
{'PROJECT_TITLE': 'Rat Stamina Studies',
'PROJECT_TYPE': 'Feeding',
'PROJECT_SUMMARY': 'Stamina in rats',
'INSTITUTE': 'University of Michigan',
'DEPARTMENT': 'Internal Medicine',
'LABORATORY': 'Burant Lab',
'LAST_NAME': 'Beecher',
'FIRST_NAME': 'Chris',
'ADDRESS': '-',
'EMAIL': 'chrisbee@med.umich.edu',
'PHONE': '734-232-0815',
'FUNDING_SOURCE': 'NIH: R01 DK077200'}
Accessing individual “key-value” pairs within blocks:
[9]:
# access "INSTITUTE" field within "PROJECT" block
mwfile["PROJECT"]["INSTITUTE"]
[9]:
'University of Michigan'
Accessing data in the #SUBJECT_SAMPLE_FACTORS block:
[10]:
# access "SUBJECT_SAMPLE_FACTORS" block and print first three
mwfile["SUBJECT_SAMPLE_FACTORS"][:3]
[10]:
[{'Subject ID': '-',
'Sample ID': 'S00009477',
'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}},
{'Subject ID': '-',
'Sample ID': 'S00009478',
'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}},
{'Subject ID': '-',
'Sample ID': 'S00009479',
'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}}]
[11]:
# access individual factors (by index)
mwfile["SUBJECT_SAMPLE_FACTORS"][0]
[11]:
{'Subject ID': '-',
'Sample ID': 'S00009477',
'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}}
[12]:
# access individual fields within factors
mwfile["SUBJECT_SAMPLE_FACTORS"][0]["Sample ID"]
[12]:
'S00009477'
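Because each entry is a plain dictionary, samples can be grouped by any factor with standard Python tools. The following sketch assumes the structure printed above; the list is typed in by hand rather than read from a file, and its last record is invented purely to give the grouping a second category. With a parsed file, one would iterate over mwfile["SUBJECT_SAMPLE_FACTORS"] instead.

```python
from collections import defaultdict

# Entries mirror the structure shown above; the last record is invented
# for illustration and does not come from ST000017.
subject_sample_factors = [
    {"Subject ID": "-", "Sample ID": "S00009477",
     "Factors": {"Feeeding": "Ad lib", "Running Capacity": "High"}},
    {"Subject ID": "-", "Sample ID": "S00009478",
     "Factors": {"Feeeding": "Ad lib", "Running Capacity": "High"}},
    {"Subject ID": "-", "Sample ID": "EXAMPLE_SAMPLE",
     "Factors": {"Feeeding": "Ad lib", "Running Capacity": "Low"}},
]

# Group sample IDs by the value of one factor.
samples_by_capacity = defaultdict(list)
for entry in subject_sample_factors:
    capacity = entry["Factors"]["Running Capacity"]
    samples_by_capacity[capacity].append(entry["Sample ID"])

print(dict(samples_by_capacity))
```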
Accessing data in the #MS_METABOLITE_DATA block:
[13]:
# access data block keys
list(mwfile["MS_METABOLITE_DATA"].keys())
[13]:
['Units', 'Data', 'Metabolites']
[14]:
# access units field
mwfile["MS_METABOLITE_DATA"]["Units"]
[14]:
'peak area'
[15]:
# list the keys of the first data record (metabolite name and sample IDs)
list(mwfile["MS_METABOLITE_DATA"]["Data"][0].keys())
[15]:
['Metabolite', 'S00009477', 'S00009478', 'S00009479', 'S00009480', 'S00009481', 'S00009500', 'S00009501', 'S00009502', 'S00009503', 'S00009470', 'S00009471', 'S00009472', 'S00009473', 'S00009474', 'S00009475', 'S00009494', 'S00009495', 'S00009496', 'S00009497', 'S00009498', 'S00009499', 'S00009488', 'S00009489', 'S00009490', 'S00009491', 'S00009492', 'S00009493', 'S00009509', 'S00009510', 'S00009511', 'S00009512', 'S00009513', 'S00009514', 'S00009482', 'S00009483', 'S00009484', 'S00009486', 'S00009504', 'S00009505', 'S00009506', 'S00009507', 'S00009508']
[16]:
# access metabolite data and print first three
mwfile["MS_METABOLITE_DATA"]["Metabolites"][:3]
[16]:
[{'Metabolite': '11BETA,21-DIHYDROXY-5BETA-PREGNANE-3,20-DIONE',
'moverz_quant': '',
'ri': '',
'ri_type': '',
'pubchem_id': '44263339',
'inchi_key': '',
'kegg_id': 'C05475',
'other_id': '775216_UNIQUE',
'other_id_type': 'UM_Target_ID'},
{'Metabolite': '11-BETA-HYDROXYANDROST-4-ENE-3,17-DIONE',
'moverz_quant': '',
'ri': '',
'ri_type': '',
'pubchem_id': '94141',
'inchi_key': '',
'kegg_id': 'C05284',
'other_id': '771312_PRIMARY',
'other_id_type': 'UM_Target_ID'},
{'Metabolite': '13(S)-HPODE',
'moverz_quant': '',
'ri': '',
'ri_type': '',
'pubchem_id': '1426',
'inchi_key': '',
'kegg_id': 'C04717',
'other_id': '775541_UNIQUE',
'other_id_type': 'UM_Target_ID'}]
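A common follow-up is to turn this list of records into a lookup table. The sketch below builds a name-to-KEGG-ID mapping from trimmed, hand-copied versions of the three records printed above; with a parsed file, the list would come from mwfile["MS_METABOLITE_DATA"]["Metabolites"].

```python
# Trimmed copies of the three records printed above.
metabolites = [
    {"Metabolite": "11BETA,21-DIHYDROXY-5BETA-PREGNANE-3,20-DIONE",
     "pubchem_id": "44263339", "kegg_id": "C05475"},
    {"Metabolite": "11-BETA-HYDROXYANDROST-4-ENE-3,17-DIONE",
     "pubchem_id": "94141", "kegg_id": "C05284"},
    {"Metabolite": "13(S)-HPODE",
     "pubchem_id": "1426", "kegg_id": "C04717"},
]

# Map each metabolite name to its KEGG identifier, skipping empty fields.
kegg_by_name = {
    record["Metabolite"]: record["kegg_id"]
    for record in metabolites
    if record.get("kegg_id")
}
print(kegg_by_name["13(S)-HPODE"])
```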
Manipulating Data From a Single MWTabFile#
In order to change values within the MWTabFile, descend into
the appropriate level using square bracket accessors and set a new value.
Change regular “key-value” pairs:
[17]:
# access phone number information
mwfile["PROJECT"]["PHONE"]
[17]:
'734-232-0815'
[18]:
# change phone number information
mwfile["PROJECT"]["PHONE"] = "1-530-754-8258"
[19]:
# check that it has been modified
mwfile["PROJECT"]["PHONE"]
[19]:
'1-530-754-8258'
Change #SUBJECT_SAMPLE_FACTORS values:
[20]:
# access the first subject sample factor by index
mwfile["SUBJECT_SAMPLE_FACTORS"][0]
[20]:
{'Subject ID': '-',
'Sample ID': 'S00009477',
'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}}
[21]:
# provide additional details to the first subject sample factor
mwfile["SUBJECT_SAMPLE_FACTORS"][0]["Additional sample data"] = {"Additional detail key": "Additional detail value"}
[22]:
# check that it has been modified
mwfile["SUBJECT_SAMPLE_FACTORS"][0]
[22]:
{'Subject ID': '-',
'Sample ID': 'S00009477',
'Factors': {'Feeeding': 'Ad lib', 'Running Capacity': 'High'},
'Additional sample data': {'Additional detail key': 'Additional detail value'}}
Printing an MWTabFile and its Components#
MWTabFile objects provide the print_file() method, which can be used to output the file in either mwTab or JSON format. The method takes a file_format keyword argument that specifies the output format.
The MWTabFile can be printed in mwTab format in its entirety using:
mwfile.print_file(file_format="mwtab")
Print the first 20 lines in mwTab format.
[23]:
from io import StringIO
mwtab_file_str = StringIO()
mwfile.print_file(file_format="mwtab", f=mwtab_file_str)
# print out first 20 lines
print("\n".join(mwtab_file_str.getvalue().split("\n")[:20]))
#METABOLOMICS WORKBENCH STUDY_ID:ST000017 ANALYSIS_ID:AN000035 PROJECT_ID:PR000016
VERSION 1
CREATED_ON 2016-09-17
#PROJECT
PR:PROJECT_TITLE Rat Stamina Studies
PR:PROJECT_TYPE Feeding
PR:PROJECT_SUMMARY Stamina in rats
PR:INSTITUTE University of Michigan
PR:DEPARTMENT Internal Medicine
PR:LABORATORY Burant Lab
PR:LAST_NAME Beecher
PR:FIRST_NAME Chris
PR:ADDRESS -
PR:EMAIL chrisbee@med.umich.edu
PR:PHONE 1-530-754-8258
PR:FUNDING_SOURCE NIH: R01 DK077200
#STUDY
ST:STUDY_TITLE Rat HCR/LCR Stamina Study
ST:STUDY_TYPE LC-MS analysis
ST:STUDY_SUMMARY To determine the basis of running capacity and health differences in outbread
The MWTabFile can be printed in JSON format in its entirety using:
mwfile.print_file(file_format="json")
Print the first 20 lines in JSON format.
[24]:
from io import StringIO
mwtab_file_str = StringIO()
mwfile.print_file(file_format="json", f=mwtab_file_str)
# print out first 20 lines
print("\n".join(mwtab_file_str.getvalue().split("\n")[:20]))
{
"METABOLOMICS WORKBENCH": {
"STUDY_ID": "ST000017",
"ANALYSIS_ID": "AN000035",
"PROJECT_ID": "PR000016",
"VERSION": "1",
"CREATED_ON": "2016-09-17"
},
"PROJECT": {
"PROJECT_TITLE": "Rat Stamina Studies",
"PROJECT_TYPE": "Feeding",
"PROJECT_SUMMARY": "Stamina in rats",
"INSTITUTE": "University of Michigan",
"DEPARTMENT": "Internal Medicine",
"LABORATORY": "Burant Lab",
"LAST_NAME": "Beecher",
"FIRST_NAME": "Chris",
"ADDRESS": "-",
"EMAIL": "chrisbee@med.umich.edu",
"PHONE": "1-530-754-8258",
Print a single block in mwTab format.
[25]:
mwfile.print_block("STUDY", file_format="mwtab")
ST:STUDY_TITLE Rat HCR/LCR Stamina Study
ST:STUDY_TYPE LC-MS analysis
ST:STUDY_SUMMARY To determine the basis of running capacity and health differences in outbread
ST:STUDY_SUMMARY N/NIH rats selected for high capacity (HCR) and low capacity (LCR) running (a for
ST:STUDY_SUMMARY VO2max) (see:Science. 2005 Jan 21;307(5708):418-20). Plasma collected at 12 of
ST:STUDY_SUMMARY age in generation 28 rats after ad lib feeding or 40% caloric restriction at week
ST:STUDY_SUMMARY 8 of age. All animals fasted 4 hours prior to collection between 5-8
ST:INSTITUTE University of Michigan
ST:DEPARTMENT Internal Medicine
ST:LABORATORY Burant Lab (MMOC)
ST:LAST_NAME Qi
ST:FIRST_NAME Nathan
ST:ADDRESS -
ST:EMAIL nathanqi@med.umich.edu
ST:PHONE 734-232-0815
ST:NUM_GROUPS 2
ST:TOTAL_SUBJECTS 42
Print a single block in JSON format.
[26]:
mwfile.print_block("STUDY", file_format="json")
{
"STUDY_TITLE": "Rat HCR/LCR Stamina Study",
"STUDY_TYPE": "LC-MS analysis",
"STUDY_SUMMARY": "To determine the basis of running capacity and health differences in outbread N/NIH rats selected for high capacity (HCR) and low capacity (LCR) running (a for VO2max) (see:Science. 2005 Jan 21;307(5708):418-20). Plasma collected at 12 of age in generation 28 rats after ad lib feeding or 40% caloric restriction at week 8 of age. All animals fasted 4 hours prior to collection between 5-8",
"INSTITUTE": "University of Michigan",
"DEPARTMENT": "Internal Medicine",
"LABORATORY": "Burant Lab (MMOC)",
"LAST_NAME": "Qi",
"FIRST_NAME": "Nathan",
"ADDRESS": "-",
"EMAIL": "nathanqi@med.umich.edu",
"PHONE": "734-232-0815",
"NUM_GROUPS": "2",
"TOTAL_SUBJECTS": "42"
}
Writing Data From a MWTabFile Object Into a File#
Data from a MWTabFile can be written into a file in the original mwTab format or in the equivalent JSON format using write():
Writing into a mwTab formatted file:
[27]:
with open("out/ST000017_AN000035_modified.txt", "w") as outfile:
    mwfile.write(outfile, file_format="mwtab")
Writing into a JSON file:
[28]:
with open("out/ST000017_AN000035_modified.json", "w") as outfile:
    mwfile.write(outfile, file_format="json")
Extracting Metadata and Metabolites from mwTab Files#
The mwtab.mwextract module can be used to extract metadata and metabolites from mwTab
files. The module contains two main functions: 1)
extract_metadata(), which parses metadata values from a mwTab file, and 2)
extract_metabolites(), which gathers a list of metabolites, and the samples
containing them, from multiple mwTab files that contain a given metadata
key-value pair.
Extracting Metadata Values#
Extracting metadata values from a given mwTab file:
[29]:
from mwtab.mwextract import extract_metadata
extract_metadata(mwfile, ["STUDY_TYPE", "SUBJECT_TYPE"])
[29]:
{'STUDY_TYPE': {'LC-MS analysis'}, 'SUBJECT_TYPE': {'Animal'}}
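The result maps each requested key to a set of values. One plausible way to picture this is a search over every block that collects whatever values it finds, with the set removing duplicates. The sketch below illustrates that idea on a plain dictionary; `collect_metadata` is a hypothetical helper and should not be read as the implementation of extract_metadata().

```python
def collect_metadata(mwfile_like, keys):
    """Gather values for the given keys from every block (hypothetical helper)."""
    found = {}
    for block in mwfile_like.values():
        if not isinstance(block, dict):
            continue  # skip list-valued blocks such as SUBJECT_SAMPLE_FACTORS
        for key in keys:
            if key in block:
                found.setdefault(key, set()).add(block[key])
    return found


# A plain dict standing in for a parsed MWTabFile; the values are taken
# from the outputs shown in this tutorial.
mwfile_like = {
    "PROJECT": {"PROJECT_TYPE": "Feeding"},
    "STUDY": {"STUDY_TYPE": "LC-MS analysis"},
    "SUBJECT": {"SUBJECT_TYPE": "Animal"},
    "SUBJECT_SAMPLE_FACTORS": [],
}
result = collect_metadata(mwfile_like, ["STUDY_TYPE", "SUBJECT_TYPE"])
print(result)
```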
Extracting Metabolite Values#
Extracting metabolite information from multiple mwTab files and outputting the first three metabolites:
[30]:
from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files
mwtab_gen = read_files(
    "ST000017_AN000035.txt",
    "ST000040_AN000060.txt"
)
matchers = generate_matchers([
    ("ST:STUDY_TYPE",
     "LC-MS analysis")
])
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]
[30]:
['11BETA_21-DIHYDROXY-5BETA-PREGNANE-3_20-DIONE',
'11-BETA-HYDROXYANDROST-4-ENE-3_17-DIONE',
'13(S)-HPODE']
Extracting metabolite information from multiple mwTab files using regular expressions and outputting the first three metabolites:
[31]:
from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files
import re  # imported as a module so the built-in compile() is not shadowed
mwtab_gen = read_files(
    "ST000017_AN000035.txt",
    "ST000040_AN000060.txt"
)
matchers = generate_matchers([
    ("ST:STUDY_TYPE",
     re.compile("(LC-MS)"))
])
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]
[31]:
['11BETA_21-DIHYDROXY-5BETA-PREGNANE-3_20-DIONE',
'11-BETA-HYDROXYANDROST-4-ENE-3_17-DIONE',
'13(S)-HPODE']
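The practical difference between a string matcher and a regular-expression matcher can be shown with the standard re module alone. In the sketch below, `value_matches` is a hypothetical helper; treating a plain string as an exact comparison is an assumption made for the illustration and is not taken from the mwtab source.

```python
import re


def value_matches(value, matcher):
    """Compare a metadata value to a string or compiled pattern (hypothetical helper)."""
    if isinstance(matcher, re.Pattern):
        # Patterns may match anywhere in the value.
        return matcher.search(value) is not None
    # Assumption: plain strings must equal the whole value.
    return value == matcher


study_type = "LC-MS analysis"
exact = value_matches(study_type, "LC-MS analysis")
partial = value_matches(study_type, "LC-MS")
pattern = value_matches(study_type, re.compile("(LC-MS)"))
ignore_case = value_matches(study_type, re.compile("(?i)(lc-ms)"))
print(exact, partial, pattern, ignore_case)
```

A pattern is the more forgiving choice when the metadata value is free text, and the inline `(?i)` flag makes it case-insensitive.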
Converting mwTab Files#
mwTab files can be converted between the mwTab file format and their JSON
representation using the mwtab.converter module.
One-to-one file conversions#
Converting from the mwTab file format into its equivalent JSON file format:
[32]:
from mwtab.converter import Converter
# Using valid ANALYSIS_ID to access file from URL: from_path="35"
converter = Converter(from_path="35", to_path="out/ST000017_AN000035.json",
                      from_format="mwtab", to_format="json")
converter.convert()
Converting from the JSON file format back to the mwTab file format:
[33]:
from mwtab.converter import Converter
converter = Converter(from_path="out/ST000017_AN000035.json", to_path="out/ST000017_AN000035.txt",
from_format="json", to_format="mwtab")
converter.convert()
Many-to-many file conversions#
Converting from a directory of mwTab formatted files into their equivalent JSON formatted files:
[34]:
from mwtab.converter import Converter
converter = Converter(from_path="mwfiles_dir_mwtab",
to_path="out/mwfiles_dir_json",
from_format="mwtab",
to_format="json")
converter.convert()
Converting from a directory of JSON formatted files into their equivalent mwTab formatted files:
[35]:
from mwtab.converter import Converter
converter = Converter(from_path="out/mwfiles_dir_json",
to_path="out/mwfiles_dir_mwtab",
from_format="json",
to_format="mwtab")
converter.convert()
Note
Many-to-many files and one-to-one file conversions are available.
See mwtab.converter for full list of available conversions.
Command-Line Interface#
The mwtab command-line interface provides the following functionality:
- Convert from the mwTab file format into its equivalent JSON file format and vice versa.
- Download files through Metabolomics Workbench’s REST API.
- Validate mwTab formatted files.
- Extract metadata and metabolite information from downloaded files.
[36]:
! mwtab --help
The mwtab command-line interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Usage:
mwtab -h | --help
mwtab --version
mwtab convert (<from-path> <to-path>) [--from-format=<format>] [--to-format=<format>] [--mw-rest=<url>] [--verbose]
mwtab validate <from-path> [--mw-rest=<url>]
mwtab download url <url> [--to-path=<path>] [--verbose]
mwtab download study all [--to-path=<path>] [--input-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--verbose]
mwtab download study <input-value> [--to-path=<path>] [--input-item=<item>] [--output-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--verbose]
mwtab download (study | compound | refmet | gene | protein) <input-item> <input-value> <output-item> [--output-format=<format>] [--to-path=<path>] [--mw-rest=<url>] [--verbose]
mwtab download moverz <input-item> <m/z-value> <ion-type-value> <m/z-tolerance-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
mwtab download exactmass <LIPID-abbreviation> <ion-type-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
mwtab extract metadata <from-path> <to-path> <key> ... [--to-format=<format>] [--no-header]
mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]
Options:
-h, --help Show this screen.
--version Show version.
--verbose Print what files are processing.
--from-format=<format> Input file format, available formats: mwtab, json [default: mwtab].
--to-format=<format> Output file format [default: json].
Available formats for convert:
mwtab, json.
Available formats for extract:
json, csv.
--mw-rest=<url> URL to MW REST interface
[default: https://www.metabolomicsworkbench.org/rest/].
--to-path=<path> Directory to save outputs into. Defaults to the current working directory.
--prefix=<prefix> Prefix to add at the beginning of the output file name. Defaults to no prefix.
--suffix=<suffix> Suffix to add at the end of the output file name. Defaults to no suffix.
--context=<context> Type of resource to access from MW REST interface, available contexts: study,
compound, refmet, gene, protein, moverz, exactmass [default: study].
--input-item=<item> Item to search Metabolomics Workbench with.
--output-item=<item> Item to be retrieved from Metabolomics Workbench.
--output-format=<format> Format for item to be retrieved in, available formats: mwtab, json.
--no-header Include header at the top of csv formatted files.
For extraction <to-path> can take a "-" which will use stdout.
All <from-path>'s can be single files, directories, or URLs.
Documentation webpage: https://moseleybioinformaticslab.github.io/mwtab/
GitHub webpage: https://github.com/MoseleyBioinformaticsLab/mwtab
Converting mwTab Files in Bulk#
CLI one-to-one file conversions#
Convert from a local file in mwTab format to a local file in JSON format:
[37]:
! mwtab convert ST000017_AN000035.txt out/ST000017_AN000035.json \
--from-format=mwtab --to-format=json
Convert from a local file in JSON format to a local file in mwTab format:
[38]:
! mwtab convert ST000017_AN000035.json out/ST000017_AN000035.txt \
--from-format=json --to-format=mwtab
Convert from a compressed local file in mwTab format to a compressed local file in JSON format:
[39]:
! mwtab convert ST000017_AN000035.txt.gz out/ST000017_AN000035.json.gz \
--from-format=mwtab --to-format=json
Convert from a compressed local file in JSON format to a compressed local file in mwTab format:
[40]:
! mwtab convert ST000017_AN000035.json.gz out/ST000017_AN000035.txt.gz \
--from-format=json --to-format=mwtab
Convert from an uncompressed URL file in mwTab format to a compressed local file in JSON format:
[41]:
! mwtab convert 35 out/ST000017_AN000035.json.bz2 \
--from-format=mwtab --to-format=json
Note
See mwtab.converter for full list of available conversions.
CLI many-to-many file conversions#
Convert from a directory of files in mwTab format to a directory of files in JSON format:
[42]:
! mwtab convert mwfiles_dir_mwtab out/mwfiles_dir_json \
--from-format=mwtab --to-format=json
Convert from a directory of files in JSON format to a directory of files in mwTab format:
[43]:
! mwtab convert mwfiles_dir_json out/mwfiles_dir_mwtab \
--from-format=json --to-format=mwtab
Convert from a directory of files in mwTab format to a zip archive of files in JSON format:
[44]:
! mwtab convert mwfiles_dir_mwtab out/mwfiles_json.zip \
--from-format=mwtab --to-format=json
Convert from a compressed tar archive of files in JSON format to a directory of files in mwTab format:
[45]:
! mwtab convert mwfiles_json.tar.gz out/mwfiles_dir_mwtab \
--from-format=json --to-format=mwtab
Convert from a zip archive of files in mwTab format to a compressed tar archive of files in JSON format:
[46]:
! mwtab convert mwfiles_mwtab.zip out/mwfiles_json.tar.bz2 \
--from-format=mwtab --to-format=json
Note
See mwtab.converter for full list of available conversions.
Download Files Through Metabolomics Workbench’s REST API#
The mwtab package provides the mwtab.mwrest module, which contains a number of functions and classes for working with Metabolomics Workbench’s REST API.
Note
For the full official REST API specification see the following link (MW REST API (v1.0, 5/7/2019)):
https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf
Note
We do not recommend downloading the JSON version of mwTab files directly from Metabolomics Workbench. There are known errors in some of the JSON files they provide. We recommend downloading in mwTab format and then using this package to convert them to JSON.
Download by URL#
To download a file based on a given URL, simply call the download url command with the desired URL and provide an output path:
[47]:
! mwtab download url "https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt" --to-path=out/ST000017_AN000035.txt
To download a single analysis mwTab file, simply call download study and specify the analysis ID:
[48]:
! mwtab download study AN000035 --to-path=out/ST000017_AN000035.txt
To download an entire study mwTab file, simply call download study and specify the study ID:
[49]:
! mwtab download study ST000017 --to-path=out/ST000017_AN000035.txt
To download a list of studies and/or analyses, simply call download study and provide a list of IDs in a JSON file:
[49]:
! mwtab download study path_to_json.json --to-path=out/
Download study, compound, refmet, gene, and protein Files#
To download study, compound, refmet, gene, and protein context files, call the download command and specify the context, input item, input value, and output item (optionally specify the output format).
Download a study:
[50]:
! mwtab download study analysis_id AN000035 mwtab --output-format=txt --to-path=out/ST000017_AN000035.txt
Download compound:
[51]:
! mwtab download compound regno 11 name --to-path=out/tmp.txt
Download refmet:
[52]:
! mwtab download refmet name Cholesterol all --to-path=out/tmp.txt
Download gene:
[53]:
! mwtab download gene gene_symbol acaca all --to-path=out/tmp.txt
Download protein:
[54]:
! mwtab download protein uniprot_id Q13085 all --to-path=out/tmp.txt
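The context, input item, input value, and output item given to these commands line up with the path segments of the REST URL shown earlier in this tutorial. As a hedged sketch of that correspondence, the helper below joins the segments onto the default base URL; `build_rest_url` is a hypothetical function, not part of mwtab, and the segment order is inferred from that URL and from the order of the CLI arguments.

```python
BASE_URL = "https://www.metabolomicsworkbench.org/rest/"


def build_rest_url(context, input_item, input_value, output_item,
                   output_format=None, base_url=BASE_URL):
    """Join REST path segments into a URL (hypothetical helper, not part of mwtab)."""
    parts = [context, input_item, input_value, output_item]
    if output_format:
        # The output format is optional and goes last when present.
        parts.append(output_format)
    return base_url.rstrip("/") + "/" + "/".join(parts)


study_url = build_rest_url("study", "analysis_id", "AN000035", "mwtab", "txt")
compound_url = build_rest_url("compound", "regno", "11", "name")
print(study_url)
print(compound_url)
```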
Download All mwTab Formatted Files#
The mwtab package provides a number of command-line functions for downloading mwTab formatted files through Metabolomics Workbench’s REST API.
To download all available analysis files, simply call the download study all command:
! mwtab download study all
It is also possible to download all study files by calling the download study all command and providing an input item (an output path can also be given with --to-path):
! mwtab download study all --input-item=study_id
Download moverz and exactmass#
To download moverz files, call the download moverz command and specify the input item (LIPIDS, MB, or REFMET), m/z value, ion type value, and m/z tolerance value.
[55]:
! mwtab download moverz MB 635.52 M+H 0.5 --to-path=out/tmp.txt
To download exactmass files, call the download exactmass command and specify the LIPID abbreviation and ion type value.
[56]:
! mwtab download exactmass "PC(34:1)" M+H --to-path=out/tmp.txt
Note
It is not necessary to specify an output format for exactmass files.
Extracting Metabolite Data and Metadata From mwTab Files#
The mwtab package provides the extract_metabolites() and extract_metadata() functions that can parse mwTab formatted files. extract_metabolites() takes a source (a list of mwTab files) and a list of metadata key-value pairs, and searches for mwTab files that contain the given pairs. extract_metadata() takes a source (a list of mwTab files) and a list of metadata keys, and searches the mwTab files for the values associated with those keys.
To extract metabolites from mwTab files in a directory, call the extract metabolites command and provide a list of metadata key-value pairs along with an output path and output format:
[57]:
! mwtab extract metabolites mwfiles_dir_mwtab out/output_file.csv SU:SUBJECT_TYPE Plant --to-format=csv
Note
It is possible to use regular expressions to match the metadata value (e.g. ... SU:SUBJECT_TYPE "r'(Plant)'").
To extract metadata from mwTab files in a directory, call the extract metadata command and provide a list of metadata keys along with an output path and output format:
[58]:
! mwtab extract metadata mwfiles_dir_json out/output_file.json SUBJECT_TYPE --to-format=json
Validating mwTab Files#
The mwtab package provides the validate_file() function
that can validate files based on a JSON schema definition. The mwtab.mwschema
module contains schema definitions for every block of a mwTab formatted file, i.e.
it lists the types of attributes (e.g. str) and specifies which keys are
optional and which are required.
To validate file(s), simply call the validate command and provide a path to the file(s):
[59]:
! mwtab validate 35
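The idea of checking a block against required keys, optional keys, and attribute types can be illustrated in plain Python. The sketch below is a toy check, not the schema used by mwtab.mwschema; `check_block` is a hypothetical helper, the choice of required and optional keys is an assumption, and the numeric PHONE value is deliberately wrong to trigger a type problem.

```python
def check_block(block, required, optional=()):
    """Return a list of problems found in one block (hypothetical helper)."""
    problems = []
    for key in required:
        if key not in block:
            problems.append("missing required key: " + key)
    allowed = set(required) | set(optional)
    for key, value in block.items():
        if key not in allowed:
            problems.append("unexpected key: " + key)
        elif not isinstance(value, str):
            problems.append("value of " + key + " is not a str")
    return problems


# Which keys are required here is an assumption made for the example only.
project = {"PROJECT_TITLE": "Rat Stamina Studies", "PHONE": 7342320815}
problems = check_block(
    project,
    required=["PROJECT_TITLE", "PROJECT_SUMMARY"],
    optional=["PHONE"],
)
print(problems)
```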
Using the mwtab Package to Find Analyses Involving a Specific Disease or Condition#
The Metabolomics Workbench data repository stores mass spectrometry and nuclear magnetic resonance experimental data and metadata in mwTab formatted files. Metabolomics Workbench also provides a number of tools for searching and analyzing mwTab files. The mwtab Python package can perform similar functions through both a programmatic API and a command-line interface, with more search flexibility.
In order to search the repository of mwTab files for analyses associated with a specific disease, Metabolomics Workbench provides a web-based interface.
The mwtab Python package can be used in a number of ways to similar effect. The package provides the extract_metabolites() function to extract and organize metabolites from multiple mwTab files, available from both Python scripts and the command-line interface. It offers more search flexibility, since it accepts either a search string or a regular expression.
Using the mwtab Package API to Extract Study IDs, Analysis IDs, and Metabolites#
The extract_metabolites() method takes two parameters: 1) an iterable of MWTabFile instances and 2) an iterable of ItemMatcher or ReGeXMatcher instances. The iterable of MWTabFile instances can be created by passing mwTab file sources (filenames, analysis IDs, etc.) to the read_files() method. The iterable of matcher instances can be created using the generate_matchers() method.
An example of using the mwtab package API to extract data from analyses associated with diabetes and output the first three metabolites:
[60]:
from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files
import re
mwtab_gen = read_files("diabetes/")
matchers = generate_matchers([
("ST:STUDY_SUMMARY",
re.compile("(diabetes)"))
])
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]
[60]:
['1_5-anhydroglucitol', '1-monopalmitin', '1-monostearin']
Using the mwtab CLI to Extract Study IDs, Analysis IDs, and Metabolites#
The mwtab command-line interface includes the mwtab extract metabolites command, which takes a directory of mwTab files, an output path to save the extracted data in, and a series of mwTab section item keys and values to be matched (either string values or regular expressions). Additionally, an output format can be specified.
mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]
An example of using the mwtab CLI to extract data from analyses associated with diabetes:
[61]:
! mwtab extract metabolites diabetes/ out/output_file.json ST:STUDY_SUMMARY "r'(?i)(diabetes)'" --to-format=json