Tutorial
Academic Tracker is intended to be used solely as a command line program. This tutorial describes each command and its options.
Configuration JSON File
Details about the configuration JSON file can be found in the JSON Schema section. In general, sections of the configuration JSON file that a particular command does not need are not required. For instance, the ORCID_search section is not required for reference_search since it does not search ORCID. The same is true if the --no-ORCID option is used.
Outputs
The specific files output by Academic Tracker vary by command and by some options, but every command creates a new timestamped directory in the working directory. If the --test option is not used, the directory is named tracker-YYMMDDHHMM. If the --test option is used, the directory is named tracker-test-YYMMDDHHMM.
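The naming pattern above can be illustrated with a short Python sketch. This is not Academic Tracker's own code: the function name output_dir_name is hypothetical, and the sketch assumes YYMMDDHHMM means two-digit year, month, day, hour (24-hour clock), and minute.

```python
from datetime import datetime


def output_dir_name(now: datetime, test: bool = False) -> str:
    """Build a directory name of the form tracker-YYMMDDHHMM.

    Hypothetical helper for illustration only.
    """
    prefix = "tracker-test-" if test else "tracker-"
    # %y%m%d%H%M -> two-digit year, month, day, hour, minute.
    return prefix + now.strftime("%y%m%d%H%M")


# A run started on 2022-02-02 at 01:40.
print(output_dir_name(datetime(2022, 2, 2, 1, 40)))
print(output_dir_name(datetime(2022, 2, 2, 1, 40), test=True))
```

Under that assumption, the two calls produce names of the same shape as the directories shown in the console examples in this tutorial.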
Search For Publications By Reference
Command Line Signature
academic_tracker reference_search <config_json_file> <references_file_or_URL> [--test --prev-pub=<file-path> --save-all-queries --PMID-reference --MEDLINE-reference --no-Crossref --no-PubMed --verbose --silent]
Description
Parse and tokenize the reference file or URL and then search PubMed and Crossref for the publications found. ORCID is not searched because it is a database of authors and does not support searching for publications directly. Google Scholar is not searched because it blocks automated queries, so it cannot be searched easily without a paid third-party service or proxies.
The references_file_or_URL argument can be several different things. If it is a file, it can be a JSON file of already tokenized data, a docx file, or a txt file. If it is not a JSON file, each reference is expected to be on a single line. If it is a URL, it can be either a MyNCBI URL or not. A MyNCBI URL is expected to be the first page of a bibliography and is tokenized in a specific way. All other URLs are simply read as a text file and tokenized like one.
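The decision tree in the paragraph above can be sketched in Python as follows. This is an illustration only, not the program's actual logic: the function name classify_reference_input and the returned labels are hypothetical, and the test for a MyNCBI URL (an ncbi.nlm.nih.gov host with /myncbi/ in the path) is an assumption about what such URLs look like.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def classify_reference_input(source: str) -> str:
    """Return a label describing how a reference input might be handled.

    Hypothetical helper; labels are for illustration only.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Assumption: MyNCBI bibliography URLs live under /myncbi/ on NCBI.
        if "ncbi.nlm.nih.gov" in parsed.netloc and "/myncbi/" in parsed.path:
            return "myncbi_url"
        # Any other URL is read as plain text, one reference per line.
        return "generic_url"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix == ".json":
        return "tokenized_json"  # already tokenized data
    if suffix == ".docx":
        return "docx"  # one reference per line
    return "text"  # treated like a txt file, one reference per line


print(classify_reference_input("reference_file.txt"))
print(classify_reference_input("https://www.ncbi.nlm.nih.gov/myncbi/some.user.1/bibliography/public/"))
```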
The --PMID-reference and --MEDLINE-reference options change how the reference file is interpreted. The --PMID-reference option indicates that the given reference file is a list of PMIDs (PubMed’s unique IDs). Instead of being tokenized, the file is assumed to contain one PMID per line, so PubMed is queried for each PMID and Crossref is not queried. The idea for this option was to be able to quickly grab information from PubMed.
The --MEDLINE-reference option indicates that the given reference file is a MEDLINE formatted file. This file is tokenized in a unique way since the publication information is spread out over multiple lines in this format. This format is supported because it is a download option on MyNCBI bibliography pages.
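To show why a MEDLINE file needs its own handling, here is a minimal Python sketch of a parser for the general MEDLINE layout: a tag of up to four characters, then "- ", then the value, with continuation lines indented by six spaces and records separated by blank lines. It is not Academic Tracker's tokenizer. The function name parse_medline is hypothetical, and the sample record uses made-up PMIDs, titles, and authors.

```python
def parse_medline(text: str) -> list[dict[str, list[str]]]:
    """Group MEDLINE-style lines into one dict of tag -> values per record.

    Hypothetical helper for illustration only.
    """
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag is not None:
            # Continuation line: extend the most recent value.
            current[last_tag][-1] += " " + line.strip()
        elif len(line) > 5 and line[4:6] == "- ":
            # New field: the tag sits in the first four columns.
            last_tag = line[:4].strip()
            current.setdefault(last_tag, []).append(line[6:].strip())
    if current:
        records.append(current)
    return records


# Made-up sample data.
sample = (
    "PMID- 12345678\n"
    "TI  - An example title that wraps\n"
    "      onto a second line.\n"
    "AU  - Smith J\n"
    "AU  - Doe A\n"
    "\n"
    "PMID- 87654321\n"
    "TI  - Another example title.\n"
)
for record in parse_medline(sample):
    print(record["PMID"][0], "|", record["TI"][0])
```

Each value is kept in a list because tags such as AU can repeat within one record.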
Details about tokenization are in the Tokenization section.
The --prev-pub option is different for reference_search than it is for author_search. First, reference_search does not automatically look for a publications.json file to use, since the assumptions described for author_search do not hold here. Second, publications in the prev_pub file are not used to ignore publications. Publications in the prev_pub file will still be in the newly created publications.json file. What this option does is set the <is_in_comparison_file> keyword to True for matching publications in the summary report.
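The behavior described above, flagging rather than filtering, can be sketched in Python as follows. This is only an illustration. It assumes that both the new and the previous publications files are JSON objects keyed by a publication identifier, and that a match means the same key appears in both. The function name flag_previous and the sample keys are hypothetical.

```python
def flag_previous(publications: dict, prev_pubs: dict) -> dict:
    """Map each publication key to whether it also appears in prev_pubs.

    Hypothetical helper. No publication is dropped; each one only gets a
    True/False flag, in the spirit of <is_in_comparison_file>.
    """
    return {pub_key: pub_key in prev_pubs for pub_key in publications}


# Made-up keys and contents.
new_pubs = {"pub_a": {"title": "A"}, "pub_b": {"title": "B"}}
old_pubs = {"pub_a": {"title": "A"}}
print(flag_previous(new_pubs, old_pubs))
```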
Options
--test:
The test option changes the name of the output directory from tracker-YYMMDDHHMM to tracker-test-YYMMDDHHMM and prevents any emails from being sent.
--prev-pub:
Specifies a publications.json file to use as a list of publications to compare with when generating the summary report.
--save-all-queries:
If used, all of the raw data returned from each source for each reference will be saved in a file called "all_results.json". The structure is {"source_name":[[pub_dict_1, pub_dict_2, …]], …} Ex. {"PubMed":[[{}, …]], "Crossref":[[{}, …]]} The index of each list in the source lines up with the index in the tokenized_reference.
--PMID-reference:
Specifies that the reference file is a list of PMIDs and to only return information from PubMed.
--MEDLINE-reference:
Specifies that the reference file is a MEDLINE formatted file.
--no-Crossref:
If used, reference_search will not search Crossref for publications.
--no-PubMed:
If used, reference_search will not search PubMed for publications. This option is assumed if the PubMed_search section of the configuration JSON file is missing.
--verbose:
If used, HTML errors and other warnings will be printed to the screen.
--silent:
If used, nothing will be printed to the screen.
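As an illustration of the all_results.json structure described for the --save-all-queries option, the Python sketch below collects the raw results for one reference from every source by index. It assumes the tokenized reference is a list, so that position i in each source's list corresponds to reference i. The function name results_for_reference, the "title" key, and the sample data are made up for this example.

```python
import json

# Made-up data in the documented shape: {"source_name": [[pub_dict, ...], ...]}
all_results = json.loads(
    '{"PubMed": [[{"title": "A"}], []],'
    ' "Crossref": [[], [{"title": "B"}]]}'
)
tokenized_reference = [{"title": "A"}, {"title": "B"}]


def results_for_reference(results: dict, index: int) -> dict:
    """Collect what every source returned for the reference at this index.

    Hypothetical helper for illustration only.
    """
    return {source: per_ref[index] for source, per_ref in results.items()}


for i, reference in enumerate(tokenized_reference):
    print(reference["title"], "->", results_for_reference(all_results, i))
```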
Outputs
Outputs depend on the configuration JSON file and options.
A publications.json file will always be output.
A tokenized_reference.json file will always be output.
An emails.json file is only created if the from_email attribute is given in the summary_report section of the configuration JSON file.
A summary_report.txt file is only created if the summary_report attribute is in the configuration JSON file.
If --PMID-reference is used, no reports or emails are generated.
Details about reports can be found in the Reporting section.
An all_results.json file will be output if the --save-all-queries option is given.
Examples
Typical run.
config_file.json:
{
"summary_report": {},
"PubMed_search": {
"PubMed_email": "email@email.com"
},
"Crossref_search": {
"mailto_email": "email@email.com"
}
}
Note
A minimal example is shown, but the config can have other sections and run without error.
Console:
>academic_tracker reference_search config_file.json reference_file.txt
Finding publications. This could take a while.
Searching PubMed.
Searching Crossref.
Success. Publications and reports saved in tracker-2202020140
Run in test mode so emails aren’t sent.
>academic_tracker reference_search config_file.json reference_file.txt --test
Finding publications. This could take a while.
Searching PubMed.
Searching Crossref.
Success. Publications and reports saved in tracker-test-2202020140
Designating a previous publications filepath.
>academic_tracker reference_search config_file.json reference_file.txt --prev-pub prev_pub_file_path.json
Finding publications. This could take a while.
Searching PubMed.
Searching Crossref.
Success. Publications and reports saved in tracker-2202020140
Specifying that Academic Tracker shouldn’t use Crossref.
config_file.json:
{
"summary_report": {},
"PubMed_search": {
"PubMed_email": "email@email.com"
}
}
Note
A minimal example is shown, but the config can have other sections and run without error.
Console:
>academic_tracker reference_search config_file.json reference_file.txt --no-Crossref
Finding publications. This could take a while.
Searching PubMed.
Success. Publications and reports saved in tracker-2202020140
Tokenize A Reference
Command Line Signature
academic_tracker tokenize_reference <references_file_or_URL> [--MEDLINE-reference --verbose --silent]
Description
Tokenize the input reference and output a tokenization report and JSON file.
Options
--MEDLINE-reference:
Specifies that the reference file is a MEDLINE formatted file.
--verbose:
If used, HTML errors and other warnings will be printed to the screen.
--silent:
If used, nothing will be printed to the screen.
Outputs
The information in the text report and the JSON file is essentially the same, but the text report presents it in a more human readable way. Both contain every publication that could be identified in the reference and tokenized, so if an expected publication is missing, there was a problem during tokenization. More detailed information about tokenization is in the Tokenization section.
Examples
Typical run.
>academic_tracker tokenize_reference reference_file.txt
Searching Google Scholar for author's scholar ids.
Success! Tokenization files saved in tracker-2202020140
Generate Reports And Emails Like Reference Search
Command Line Signature
academic_tracker gen_reports_and_emails_ref <config_json_file> <references_file_or_URL> <publication_json_file> [--test --prev-pub=<file-path> --MEDLINE-reference --verbose --silent]
Description
Create reports and emails, and send emails, just like reference_search would if it had found the publications in the given publications JSON file. The idea behind this command is to let the user experiment with the reporting system without having to query for publications. This command sends emails unless the --test option is used, so remember to use it, or make sure all of the emails are addressed to you, to avoid sending test emails to the wrong people.
This command differs a little from the author_search version due to the nature of reference_search. Each reference must be linked to a publication in the given publications JSON file. During reference_search this is done and the matching publication key is stored in the pub_dict_key attribute of the tokenized reference file. If the tokenized reference is generated on the fly or was not generated in tandem with the given publications JSON file then this will not be the case. To resolve this the gen_reports_and_emails_ref command does its best to match each tokenized reference with the publications in the given publications JSON file by comparing DOI, PMID, and title.
The point is that if the given reference and publications were not generated in tandem then results may be different from expectations. A new tokenized_reference.json file is output with this command so the user can see which publications were matched with each reference by looking at the pub_dict_key attribute.
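The matching step described above can be sketched in Python as follows. This is not the program's actual algorithm. The field names "doi", "pmid", and "title", the order in which they are tried, and the title normalization are all assumptions made for illustration, and the function name match_reference is hypothetical.

```python
from typing import Optional


def _norm(text: str) -> str:
    """Lowercase and keep only letters and digits for a loose title match."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def match_reference(reference: dict, publications: dict) -> Optional[str]:
    """Return the key of the first publication matching on DOI, PMID, or title.

    Hypothetical helper; field names and matching order are assumptions.
    """
    # Try the identifiers first, since they are less ambiguous than titles.
    for field in ("doi", "pmid"):
        value = reference.get(field)
        if value:
            for pub_key, pub in publications.items():
                if str(pub.get(field) or "").lower() == str(value).lower():
                    return pub_key
    title = reference.get("title")
    if title:
        for pub_key, pub in publications.items():
            if _norm(pub.get("title") or "") == _norm(title):
                return pub_key
    return None  # nothing matched, so no pub_dict_key could be assigned


# Made-up publications.
pubs = {
    "pub1": {"doi": "10.1000/ABC", "pmid": "111", "title": "First Paper"},
    "pub2": {"doi": None, "pmid": None, "title": "Second paper: a study"},
}
print(match_reference({"doi": "10.1000/abc"}, pubs))
print(match_reference({"title": "Second Paper - A Study"}, pubs))
```

A reference that matches nothing returns None here, which is the kind of case to look for when checking pub_dict_key in the output tokenized_reference.json.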
Details about reporting can be found in the Reporting section.
Options
--test:
The test option changes the name of the output directory from tracker-YYMMDDHHMM to tracker-test-YYMMDDHHMM and prevents any emails from being sent.
--prev-pub:
Specifies a publications.json file to use as a list of publications to compare with when generating the summary report.
--MEDLINE-reference:
Specifies that the reference file is a MEDLINE formatted file.
--verbose:
If used, HTML errors and other warnings will be printed to the screen.
--silent:
If used, nothing will be printed to the screen.
Outputs
Outputs depend on the configuration JSON file and options.
A tokenized_reference.json is always generated.
An emails.json file is only created if the from_email attribute is given in either the summary_report or project_report sections of the configuration JSON file.
A summary_report.txt file is only created if the summary_report attribute is in the configuration JSON file.
Details about reports can be found in the Reporting section.
Examples
Typical run.
config_file.json:
{
"summary_report": {}
}
Note
A minimal example is shown, but the config can have other sections and run without error.
Console:
>academic_tracker gen_reports_and_emails_ref config_file.json reference_file.txt publications.json
Success! Reports and emails saved in tracker-2202020140