User Guide

Description

Academic Tracker was created to automate the process of making sure that federally funded publications are listed on PubMed and that their grant funding source is cited.

Academic Tracker searches PubMed, ORCID, Crossref, and Google Scholar to look for publications. The two main use cases let users search by author name or by publication citation/reference. The output is customizable, but in general it consists of a JSON file of publication information, a JSON file of email information if emails were sent, and text files of summary information.

A secondary use case of searching by author names is to create a report of the collaborators each author has worked with. To do this, specify the creation of that report in the configuration file. Details on reports are in the documentation.

Installation

The Academic Tracker package runs under Python 3.7+. Use pip to install. Starting with Python 3.4, pip is included by default.
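Before installing, it can help to confirm that the interpreter meets the 3.7+ requirement stated above. The following is a minimal sketch; the helper name meets_minimum is hypothetical and not part of Academic Tracker.

```python
import sys

# Minimum version taken from this guide: "runs under Python 3.7+".
MIN_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; not provided by academic_tracker.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is new enough for academic_tracker.")
    else:
        print("Python 3.7 or newer is required.")
```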

Install on Linux, Mac OS X

python3 -m pip install academic_tracker

Install on Windows

py -3 -m pip install academic_tracker

Upgrade on Linux, Mac OS X

python3 -m pip install academic_tracker --upgrade

Upgrade on Windows

py -3 -m pip install academic_tracker --upgrade

Install inside virtualenv

For an isolated install, you can run the same inside a virtualenv.

$ virtualenv -p /usr/bin/python3 venv            # create virtual environment, use python3 interpreter

$ source venv/bin/activate                       # activate virtual environment

$ python3 -m pip install academic_tracker        # install academic_tracker as usual

$ deactivate                                     # if you are done working in the virtual environment

Get the source code

Code is available on GitHub: https://github.com/MoseleyBioinformaticsLab/academic_tracker

You can either clone the public repository:

$ git clone https://github.com/MoseleyBioinformaticsLab/academic_tracker.git

Or, download the tarball and/or zipball:

$ curl -OL https://github.com/MoseleyBioinformaticsLab/academic_tracker/tarball/main

$ curl -OL https://github.com/MoseleyBioinformaticsLab/academic_tracker/zipball/main

Once you have a copy of the source, you can embed it in your own Python package, or install it into your system site-packages easily:

$ python3 setup.py install

Dependencies

The Academic Tracker package depends on several Python libraries. The pip command will install all dependencies automatically, but if you wish to install them manually, run the following commands:

  • docopt for creating the command-line interface.
    • To install docopt run the following:

      python3 -m pip install docopt  # On Linux, Mac OS X
      py -3 -m pip install docopt    # On Windows
      
  • pymed for querying PubMed.
    • To install the pymed Python library run the following:

      python3 -m pip install pymed  # On Linux, Mac OS X
      py -3 -m pip install pymed    # On Windows
      
  • jsonschema for validating JSON.
    • To install the jsonschema Python library run the following:

      python3 -m pip install jsonschema  # On Linux, Mac OS X
      py -3 -m pip install jsonschema    # On Windows
      
  • habanero for querying Crossref.
    • To install the habanero Python library run the following:

      python3 -m pip install habanero  # On Linux, Mac OS X
      py -3 -m pip install habanero    # On Windows
      
  • orcid for querying ORCID.
    • To install the orcid Python library run the following:

      python3 -m pip install orcid  # On Linux, Mac OS X
      py -3 -m pip install orcid    # On Windows
      
  • scholarly for querying Google Scholar.
    • To install the scholarly Python library run the following:

      python3 -m pip install scholarly  # On Linux, Mac OS X
      py -3 -m pip install scholarly    # On Windows
      
  • beautifulsoup4 for parsing webpages.
    • To install the beautifulsoup4 Python library run the following:

      python3 -m pip install beautifulsoup4  # On Linux, Mac OS X
      py -3 -m pip install beautifulsoup4    # On Windows
      
  • fuzzywuzzy for fuzzy matching publication titles.
    • To install the fuzzywuzzy Python library run the following:

      python3 -m pip install fuzzywuzzy  # On Linux, Mac OS X
      py -3 -m pip install fuzzywuzzy    # On Windows
      
  • python-docx for reading docx files.
    • To install the python-docx Python library run the following:

      python3 -m pip install python-docx  # On Linux, Mac OS X
      py -3 -m pip install python-docx    # On Windows
      
  • pandas for easy data manipulation.
    • To install the pandas Python library run the following:

      python3 -m pip install pandas  # On Linux, Mac OS X
      py -3 -m pip install pandas    # On Windows
      
  • openpyxl for saving Excel files in pandas.
    • To install the openpyxl Python library run the following:

      python3 -m pip install openpyxl  # On Linux, Mac OS X
      py -3 -m pip install openpyxl    # On Windows
      
  • requests for making internet requests.
    • To install the requests Python library run the following:

      python3 -m pip install requests  # On Linux, Mac OS X
      py -3 -m pip install requests    # On Windows
      
  • deepdiff for comparing publication data.
    • To install the deepdiff Python library run the following:

      python3 -m pip install deepdiff  # On Linux, Mac OS X
      py -3 -m pip install deepdiff    # On Windows
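If you install dependencies manually, a quick check of which ones are still missing can save a failed run. The sketch below uses only the standard library. The mapping from pip package names to import names is an assumption based on how these packages are commonly imported (beautifulsoup4 as bs4, python-docx as docx), and the function name missing_dependencies is hypothetical.

```python
import importlib.util

# pip package name -> module name it is assumed to install.
# Most names match; beautifulsoup4 and python-docx import under other names.
DEPENDENCIES = {
    "docopt": "docopt",
    "pymed": "pymed",
    "jsonschema": "jsonschema",
    "habanero": "habanero",
    "orcid": "orcid",
    "scholarly": "scholarly",
    "beautifulsoup4": "bs4",
    "fuzzywuzzy": "fuzzywuzzy",
    "python-docx": "docx",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "requests": "requests",
    "deepdiff": "deepdiff",
}


def missing_dependencies(deps=DEPENDENCIES):
    """Return the pip names of packages whose module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in deps.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

The check only locates each module without importing it, so it is fast and has no side effects.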
      

Basic usage

Academic Tracker expects at least a configuration JSON file, and possibly more files depending on the usage. The two main use cases are author_search and reference_search; the other usages are mostly included to support those. author_search searches by the authors given in the configuration JSON file, while reference_search searches by the publication references given in the reference file or URL. Details about the JSON files are in the JSON Schema section, and more information about the use cases, with examples, is in the Tutorial section.
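For scripted runs, the author_search command line can be assembled in Python and passed to subprocess. This is a sketch only: the function names and the "config.json" path are placeholders, and it uses just the --test and --prev-pub options documented in this section.

```python
import subprocess


def build_author_search_command(config_path, test=True, prev_pub=None):
    """Build the argument list for `academic_tracker author_search`.

    Hypothetical helper; only flags listed in this guide are used.
    """
    command = ["academic_tracker", "author_search", config_path]
    if test:
        # --test generates publications and email texts without sending emails.
        command.append("--test")
    if prev_pub is not None:
        # A file path, or the literal string "ignore".
        command.append(f"--prev-pub={prev_pub}")
    return command


def run_author_search(config_path, **options):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_author_search_command(config_path, **options), check=True)


if __name__ == "__main__":
    # "config.json" is a placeholder path, not a file shipped with the package.
    print(" ".join(build_author_search_command("config.json", prev_pub="ignore")))
```

Keeping test=True as the default means a script cannot send emails unless it asks to.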

Usage:
    academic_tracker author_search <config_json_file> [--test] 
                                                      [--prev_pub=<file-path> --prev-pub=<file-path>] 
                                                      [--save-all-queries]
                                                      [--no-GoogleScholar --no_GoogleScholar] 
                                                      [--no-ORCID --no_ORCID] 
                                                      [--no-Crossref --no_Crossref] 
                                                      [--no-PubMed --no_PubMed]
                                                      [--verbose --silent]
    academic_tracker reference_search <config_json_file> <references_file_or_URL> [--test] 
                                                                                  [--prev-pub=<file-path> --prev_pub=<file-path>]
                                                                                  [--save-all-queries]
                                                                                  [--PMID-reference --PMID_reference]
                                                                                  [--MEDLINE-reference --MEDLINE_reference]
                                                                                  [--keep-duplicates]
                                                                                  [--no-Crossref --no_Crossref]
                                                                                  [--no-PubMed --no_PubMed]
                                                                                  [--verbose --silent]
    academic_tracker find_ORCID <config_json_file> [--verbose --silent]
    academic_tracker find_Google_Scholar <config_json_file> [--verbose --silent]
    academic_tracker add_authors <config_json_file> <authors_file> [--verbose --silent]
    academic_tracker tokenize_reference <references_file_or_URL> [--MEDLINE-reference --MEDLINE_reference]
                                                                 [--keep-duplicates]
                                                                 [--verbose --silent]
    academic_tracker gen_reports_and_emails_auth <config_json_file> <publication_json_file> [--test --verbose --silent]
    academic_tracker gen_reports_and_emails_ref <config_json_file> <references_file_or_URL> <publication_json_file> [--test]
                                                                                                                    [--prev-pub=<file-path> --prev_pub=<file-path>]
                                                                                                                    [--MEDLINE-reference --MEDLINE_reference]
                                                                                                                    [--keep-duplicates]
                                                                                                                    [--verbose --silent]
    
Options:
    -h --help                         Show this screen.
    -v --version                      Show version.
    --verbose                         Print hidden error messages.
    --silent                          Do not print anything to the screen.
    --test                            Generate pubs and email texts, but do not send emails.
    --prev-pub=<file-path>            File path to a JSON or CSV file with publication IDs to ignore.
                                      Enter "ignore" for <file-path> to skip looking for previous publications.json files in tracker directories.
    --prev_pub=<file-path>            Deprecated. Use --prev-pub instead.
    --save-all-queries                Save all queried results from each source in "all_results.json".
    --keep-duplicates                 Keep duplicate entries. By default, duplicates are removed after references are tokenized.
    
Reference Type Options:    
    --PMID-reference                  Indicates that the reference file is a PMID file; only PubMed info will be returned.
    --PMID_reference                  Deprecated. Use --PMID-reference instead.
    --MEDLINE-reference               Indicates that the reference file is a MEDLINE file.
    --MEDLINE_reference               Deprecated. Use --MEDLINE-reference instead.

Search Options:
    --no-GoogleScholar                Don't search Google Scholar.
    --no_GoogleScholar                Deprecated. Use --no-GoogleScholar instead.
    --no-ORCID                        Don't search ORCID.
    --no_ORCID                        Deprecated. Use --no-ORCID instead.
    --no-Crossref                     Don't search Crossref.
    --no_Crossref                     Deprecated. Use --no-Crossref instead.
    --no-PubMed                       Don't search PubMed.
    --no_PubMed                       Deprecated. Use --no-PubMed instead.
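
Existing scripts may still use the deprecated underscore spellings. As a rough sketch, the mapping below (copied from the option lists above) rewrites an argument list to the hyphenated forms; the function name modernize_args is hypothetical and not part of Academic Tracker.

```python
# Deprecated spellings and their replacements, as listed in the options above.
DEPRECATED_FLAGS = {
    "--prev_pub": "--prev-pub",
    "--PMID_reference": "--PMID-reference",
    "--MEDLINE_reference": "--MEDLINE-reference",
    "--no_GoogleScholar": "--no-GoogleScholar",
    "--no_ORCID": "--no-ORCID",
    "--no_Crossref": "--no-Crossref",
    "--no_PubMed": "--no-PubMed",
}


def modernize_args(args):
    """Return a copy of `args` with deprecated flag names replaced.

    Values attached with "=" (as in --prev_pub=<file-path>) are preserved.
    Arguments that are not deprecated flags pass through unchanged.
    """
    updated = []
    for arg in args:
        # Split off any "=value" suffix so only the flag name is looked up.
        name, separator, value = arg.partition("=")
        name = DEPRECATED_FLAGS.get(name, name)
        updated.append(name + separator + value)
    return updated


if __name__ == "__main__":
    old_args = ["author_search", "config.json", "--prev_pub=ignore", "--no_ORCID"]
    print(modernize_args(old_args))
```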