CLI

Extract

The extract command of MESSES supports turning tabular data into JSON. This is done by adding a layer of tags on top of the data, which tell the extract command how to construct the JSON tables and records. There are features for automatically applying tags to untagged data and for modifying the names and values of data. The extract command can also be used to modify already JSONized data, and it supports viewing the data in different ways, such as viewing record lineages. For a more detailed explanation of the options, with examples of how they might be used, see the Tutorial page.

Usage

 Usage:
    messes extract <metadata_source>... [--delete <metadata_section>...] [options]
    messes extract --help

    <metadata_source> - tagged input metadata source as csv/json filename or 
                        xlsx_filename[:worksheet_name|regular_expression] or 
                        google_sheets_url[:worksheet_name|regular_expression]
                        The "#export" worksheet is the default.

 Options:
    -h, --help                          - show this help documentation.
    -v, --version                       - show the version.
    --silent                            - print no warning messages.
    --output <filename_json>            - output json filename.
    --compare <filename_json>           - compare extracted metadata to given JSONized metadata.
    --modify <source>                   - modification directives worksheet name, regular expression, csv/json filename, or 
                                          xlsx_filename[:worksheet_name|regular_expression] or 
                                          google_sheets_url[:worksheet_name|regular_expression] 
                                          [default: #modify].
    --end-modify <source>               - apply modification directives after all metadata merging. Requires csv/json filename or 
                                          xlsx_filename[:worksheet_name|regular_expression] or 
                                          google_sheets_url[:worksheet_name|regular_expression].
    --automate <source>                 - automation directives worksheet name, regular expression, csv/json filename, or 
                                          xlsx_filename[:worksheet_name|regular_expression] or 
                                          google_sheets_url[:worksheet_name|regular_expression] 
                                          [default: #automate].
    --save-directives <filename_json>   - output filename with modification and automation directives in JSON format.
    --save-export <filetype>            - output export worksheet with suffix "_export" and with the indicated xlsx/csv format extension.
    --show <show_option>                - show a part of the metadata. See options below.
    --delete <metadata_section>...      - delete a section of the JSONized metadata. Section format is tableKey or tableKey,IDKey or tableKey,IDKey,fieldName. These can be regular expressions.
    --keep <metadata_tables>            - only keep the selected tables.  Delete the rest.  Table format is tableKey,tableKey,... The tableKey can be a regular expression.
    --file-cleaning <remove_regex>      - a string or regular expression used to remove characters from input files; removes escaped unicode and \r characters by default; enter "None" to disable [default: _x([0-9a-fA-F]{4})_|\r].
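
The effect of the default --file-cleaning pattern can be sketched in Python; the cell value below is hypothetical and only for demonstration:

```python
import re

# Default cleaning pattern from the --file-cleaning option: strips
# Excel-style escape sequences such as "_x000D_" and carriage returns.
DEFAULT_CLEANING = r"_x([0-9a-fA-F]{4})_|\r"

cell = "glucose_x000D_\rassay"           # hypothetical raw cell value
cleaned = re.sub(DEFAULT_CLEANING, "", cell)
print(cleaned)                           # -> glucoseassay
```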

Show Options:
  tables    - show tables in the extracted metadata.
  lineage   - show parent-child lineages per table.
  all       - show all of the above.

Regular Expression Format:
  Regular expressions have the form "r'...'" on the command line.
  The re.match function is used, which matches from the beginning of a string, so every regular expression behaves as if it starts with "^".
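
The anchoring behavior of re.match can be seen directly in Python:

```python
import re

# re.match only succeeds if the pattern matches at the start of the
# string, as if the pattern began with "^".
print(bool(re.match(r"sample", "sample-001")))   # True: matches at the start
print(bool(re.match(r"001", "sample-001")))      # False: "001" is not at the start
print(bool(re.search(r"001", "sample-001")))     # True: re.search scans the whole string
```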

 Directives JSON Format:
   {
    "modification" : { table : { field :  { "(exact|regex|levenshtein)\-(first|first\-nowarn|unique|all)" :
                      { field_value : { "assign" : { field : value,... }, "append" : { field : value,... }, "prepend" : { field : value,... },
                                        "regex" : { field : regex_pair,... }, "delete" : [ field,... ], "rename" : { old_field : new_field } } } } } }
    "automation" : [ { "header_tag_descriptions" : [ { "header" : column_description, "tag" : tag_description, "required" : true|false } ],
                       "exclusion_test" : exclusion_value, "insert" : [ [ cell_content, ... ] ] } ]
   }
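
To make the grammar above concrete, here is a minimal Python sketch of what an "exact-first" modification directive with an "assign" action expresses. The table, field, and values are hypothetical, and this illustrates the semantics only, not the actual extract implementation:

```python
# Hypothetical directive: in the "protocol" table, wherever the "type"
# field exactly equals "MS", assign a new value to the "instrument" field.
directives = {
    "modification": {
        "protocol": {
            "type": {
                "exact-first": {
                    "MS": {"assign": {"instrument": "Orbitrap"}}
                }
            }
        }
    }
}

# Toy record store keyed the same way as the directive: table -> records.
records = {"protocol": [{"type": "MS", "instrument": ""}, {"type": "NMR"}]}

for table, fields in directives["modification"].items():
    for field, tests in fields.items():
        for value, actions in tests["exact-first"].items():
            for record in records[table]:
                if record.get(field) == value:
                    record.update(actions["assign"])
                    break  # "first" semantics: stop after the first match

print(records["protocol"][0])  # {'type': 'MS', 'instrument': 'Orbitrap'}
```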

Validate

The validate command of MESSES supports validating JSON data. This is done largely through JSON Schema, via the jsonschema package, but validation beyond the capabilities of JSON Schema is also performed. By default, JSON files are validated against the Experiment Description Specification, but this can be turned off with options. Users can also provide additional validation in the form of their own JSON Schema and a Protocol-Dependent Schema. More details and examples for the validate command are in the Tutorial.

Usage

Usage:
    messes validate json <input_JSON> [--pds=<pds> [--csv | --xlsx | --json | --gs] | --no_base_schema] 
                                      [--no_extra_checks]
                                      [--additional=<add_schema>...] 
                                      [--format=<format>]
                                      [--silent=<level>]
    messes validate save-schema <output_schema> [--input=<input_JSON>] 
                                                [--pds=<pds> [--csv | --xlsx | --json | --gs]] 
                                                [--format=<format>]
                                                [--silent=<level>]
    messes validate schema <input_schema>
    messes validate pds <pds> [--csv | --xlsx | --json | --gs] [--silent=<level>] [--save=<output_name>]
    messes validate pds-to-table <pds_json> <output_name> [<output_filetype>]
    messes validate pds-to-json <pds_tabular> [--csv | --xlsx | --gs] <output_name>
    messes validate cd-to-json-schema <conversion_directives> [--csv | --xlsx | --json | --gs] <output_schema>
    messes validate --help
    
    <input_JSON> - if '-' read from standard input.
    <pds> - can be a JSON, csv, xlsx, or Google Sheets file. If xlsx or Google Sheets, the default sheet name to read in is #validate, 
           to specify a different sheet name separate it from the file name with a colon ex: file_name.xlsx:sheet_name.
           If '-' read from standard input.
    <input_schema> - must be a valid JSON Schema file. If '-' read from standard input.
    <output_schema> - if '-' save to standard output.
    <output_name> - path to save tabular pds to, if '-' save to standard output as CSV.
    <output_filetype> - "xlsx" or "csv", defaults to "csv".
    <conversion_directives> - can be a JSON, csv, xlsx, or Google Sheets file. If xlsx or Google Sheets, 
                              the default sheet name to read in is #convert, to specify a different 
                              sheet name separate it from the file name with a colon ex: file_name.xlsx:sheet_name.
                              If '-' read from standard input.
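
The file_name.xlsx:sheet_name convention can be handled with a simple split; a sketch (the helper name is invented for illustration, and a real implementation would need extra care for Google Sheets URLs, which themselves contain colons):

```python
def split_source(source, default_sheet="#validate"):
    # Split "file_name.xlsx:sheet_name" into its parts; the sheet name
    # falls back to the documented default when no colon is present.
    # (Illustrative helper, not part of the messes API.)
    if ":" in source:
        file_name, sheet = source.split(":", 1)
        return file_name, sheet
    return source, default_sheet

print(split_source("pds.xlsx:my_sheet"))  # ('pds.xlsx', 'my_sheet')
print(split_source("pds.xlsx"))           # ('pds.xlsx', '#validate')
```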

Options:
    -h, --help                           - show this screen.
    -v, --version                        - show version.
    --silent <level>                     - if "full" silence all warnings, 
                                           if "nuisance" silence warnings that are more likely to be a nuisance,
                                           if "none" do not silence warnings [default: none].
    --pds <pds>                          - a protocol-dependent schema file; can be a JSON, csv, xlsx, or Google Sheets file. 
                                           If xlsx the default sheet name to read in is #validate, to specify 
                                           a different sheet name separate it from the file name with a colon 
                                           ex: file_name.xlsx:sheet_name.
    --csv                                - indicates that the protocol-dependent schema file is a csv (comma delimited) file.
    --xlsx                               - indicates that the protocol-dependent schema file is an xlsx (Excel) file.
    --json                               - indicates that the protocol-dependent schema file is a JSON file.
    --gs                                 - indicates that the protocol-dependent schema file is a Google Sheets file.
                                           If a file type is not given then it will be guessed from the file extension.
    --additional <add_schema>            - an additional JSON Schema file that will be used to validate <input_JSON>.
    --format <format>                    - additional validation done for the desired supported format. 
                                           Current supported formats: 
                                               mwtab
    --no_base_schema                     - don't validate with the base JSON schema.
    --no_extra_checks                    - only do JSON Schema validation and nothing else.
    --input <input_JSON>                 - optionally give an input JSON file to save-schema to reproduce the 
                                           schema used to validate in the json command.
    --save <output_name>                 - save the JSON Schema created from the protocol-dependent schema.
    

The "json" command will validate the <input_JSON> against the internal base_schema and any optional schemas provided 
by the --pds and --additional options. To validate only against a provided schema, use the --additional and --no_base_schema options.
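
The underlying check is the kind provided by the jsonschema package. A minimal sketch with a hypothetical schema fragment and instance (not the actual base_schema or Experiment Description Specification):

```python
import jsonschema

# Hypothetical fragment in the spirit of a base schema: a "project" table
# whose records each require a "description" field.
schema = {
    "type": "object",
    "properties": {
        "project": {
            "type": "object",
            "additionalProperties": {
                "type": "object",
                "required": ["description"],
            },
        }
    },
}

instance = {"project": {"project1": {"description": "tracer study"}}}
jsonschema.validate(instance, schema)  # raises ValidationError on failure
print("valid")
```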

The "save-schema" command will save the internal base_schema to the <output_schema> location. If --pds is given 
then it will be parsed and placed into the base_schema. If --input is given, the protocols table will be added 
in with the PDS to reproduce what happens in the json command. If --format is used, then that format schema is 
saved instead of the base_schema.

The "schema" command will validate the <input_schema> against the JSON Schema meta schema.

The "pds" command will validate that the <pds> file is a valid protocol-dependent schema file. 
If the --save option is given, then save the built JSON Schema.

The "pds-to-table" command will read in a protocol-dependent schema in JSON form and save it out in a tabular form.

The "pds-to-json" command will read in a protocol-dependent schema in tabular form and save it out in a JSON form. 

The "cd-to-json-schema" command will read in conversion directives and create a JSON Schema 
template file that can be filled in and used to validate files that will be converted using those directives.

Convert

The convert command of MESSES supports converting JSON data to another JSON format or another supported format. This is done by using conversion directives, which are detailed in the Conversion Directives section. Arbitrary JSON-to-JSON conversions are supported through the "generic" command, and each supported format is converted using the command named after it. All supported formats are detailed in the Supported Conversion Formats section. More details and examples for the convert command are in the Tutorial.

Usage

Usage:
    messes convert mwtab (ms | nmr | nmr_binned) <input_JSON> <output_name> [--update <conversion_directives> | --override <conversion_directives>] [--silent]
    messes convert save-directives mwtab (ms | nmr | nmr_binned) <output_filetype> [<output_name>]
    messes convert generic <input_JSON> <output_name> <conversion_directives> [--silent]
    messes convert --help
    
    <conversion_directives> - can be a JSON, csv, xlsx, or Google Sheets file. If xlsx or Google Sheets the default sheet name to read in is #convert, 
                              to specify a different sheet name separate it from the file name with a colon ex: file_name.xlsx:sheet_name.
                              
    <output_filetype> - "json", "xlsx", or "csv"

Options:
    -h, --help                           - show this screen.
    -v, --version                        - show version.
    --silent                             - silence all warnings.
    --update <conversion_directives>     - conversion directives that will be used to update the built-in directives for the format.
                                           This is intended to be used for simple changes such as updating the value of 
                                           the analysis ID. You only have to specify what needs to change; any values 
                                           that are left out of the update directives won't be changed. If you need to remove 
                                           directives, use the override option.
    --override <conversion_directives>   - conversion directives that will be used to override the built-in directives for the format.
                                           The built-in directives will not be used and these will be used instead.
    
    

The general command structure for convert is convert <format>, which will convert an input JSON file over to the supported format. 
These commands save both the JSON conversion and the final format file.

The generic command works the same as the supported format commands, except that the user must supply conversion directives specifying how to 
convert the input JSON to the desired output JSON. Only an output JSON is saved.

The save-directives command is used to save the default conversion directives used by convert for any of the supported formats. <output_filetype> 
can be one of "json", "xlsx", or "csv". The file is saved as "format_conversion_directives.ext" where ".ext" is replaced with ".json", ".xlsx", 
or ".csv" depending on the value of <output_filetype>, unless <output_name> is given.
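
The naming rule above can be stated as a small function (an illustrative helper only, not part of the messes API):

```python
def directives_filename(format_name, filetype, output_name=None):
    # Mirrors the rule described above: "<format>_conversion_directives.<ext>"
    # unless an explicit output name is given.
    return output_name or f"{format_name}_conversion_directives.{filetype}"

print(directives_filename("mwtab", "json"))           # mwtab_conversion_directives.json
print(directives_filename("mwtab", "csv", "my.csv"))  # my.csv
```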