Some VIVO Data Things
Tools for loading data into VIVO and managing data in your VIVO.
VIVO Pump
The VIVO Pump is a tool for managing VIVO data using spreadsheets. The Pump uses a definition file to map the rows and columns of your spreadsheet to the entities and literal values in the VIVO ontology. The same definition file can be used to get data from VIVO into the spreadsheet, and to update VIVO from the data in the spreadsheet. Download
VIVO Tools
VIVO Tools is a general-purpose Python software library for developing software related to VIVO. At the University of Florida (UF), we use VIVO Tools to write ingest scripts, maintenance scripts, reports, and much more. Download
Add Papers from PubMed
Here's a tool that makes it easy to add papers indexed in PubMed to your VIVO. To use the tool, you make a text file listing the PubMed IDs that you would like to have added. You can add an optional list of one or more URIs of VIVO people who might be authors. This comes in handy when you are adding papers for a department or a research group: you list all the members of the group. When a paper is added and disambiguation is required, the list you provided will be used to help choose appropriate authors.
When you use the tool to add papers to VIVO, you get cross-linked concepts, cross-linked authors, a list of grants cited, PMCID and NIHMSID identifiers, the abstract, and a link to the full text of the paper if available. Download
Adding Cities to VIVO
VIVO comes with countries and states, but needs major US cities. A list of cities with populations over 100,000 was obtained from Wikipedia and formatted for use in VIVO. Then a simple Python script was written to compare cities found in VIVO with cities in the Wikipedia data. If cities are missing from VIVO, they are added to the appropriate state. If cities are found in VIVO, they are updated, as needed. Download
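The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script itself: the function name plan_city_updates, the dictionary layout, and the population figures are all assumptions made up for the example, and the real script reads from VIVO and writes RDF rather than working on in-memory dictionaries.

```python
def plan_city_updates(vivo_cities, source_cities):
    """Compare cities already in VIVO with a source list.

    Both arguments map (city name, state name) -> population.
    Returns two dictionaries: cities to add and cities to update.
    """
    to_add = {}
    to_update = {}
    for key, population in source_cities.items():
        if key not in vivo_cities:
            # City is missing from VIVO: add it to its state.
            to_add[key] = population
        elif vivo_cities[key] != population:
            # City exists but its data differs: update it.
            to_update[key] = population
    return to_add, to_update


# Made-up example data (populations are illustrative, not real figures).
vivo = {("Gainesville", "Florida"): 100, ("Tampa", "Florida"): 300}
source = {
    ("Gainesville", "Florida"): 120,
    ("Tampa", "Florida"): 300,
    ("Orlando", "Florida"): 200,
}
to_add, to_update = plan_city_updates(vivo, source)
```

Keying on the (city, state) pair rather than the city name alone keeps same-named cities in different states apart.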
Updating PubMed Attributes
Bio.Entrez is a powerful Python package, part of Biopython, for accessing Entrez, the NCBI tool for accessing the PubMed data of the NLM (a branch of the NIH). Using these tools and resources, we've written a script for adding PubMed data to publications already in VIVO. Entrez is called for each paper in VIVO that has a DOI. Seven attributes are added to each such paper:
- PubMedID. Required to appear on all publications on an NIH biosketch.
- PubMedCentralID. Required to demonstrate the paper is compliant with the NIH Public Access Policy.
- NIHMSID. Useful for tracking papers that have not yet been assigned other identifiers.
- Keywords. MeSH terms assigned by the NLM. Matched to concepts in VIVO.
- Abstract. The full abstract of the paper as it appears in PubMed.
- Grants cited by the paper. A list of grant identifiers acknowledged as supporting the work of the paper. Invaluable in tracking which paper resulted from which funding.
- URL of the full text of the paper. Read the paper in one click from VIVO.
Add Buildings to VIVO
Adding university buildings to VIVO is easy to do and provides another resource for your VIVO users. You can add photos, web sites, links to campus maps, and descriptions. Buildings can appear as locations for events. See, for example, the UF Century Tower Carillon. VIVO automatically adds buildings to the index and provides an index page of buildings. See the UF building index. Download
Remove URIs from VIVO
There will be times when you will have things in your VIVO that you will want to remove. These may be unused dates, or people who should no longer be in VIVO. You may have run queries checking for various data integrity issues and found a set of URIs to be removed, such as roles that do not have endpoints as expected. For whatever reason, there may be an occasion for you to "bulk remove" URIs from VIVO. remove_uri is a tool for removing URIs from VIVO. Given a list of URIs to be removed, remove_uri will generate the RDF needed to remove all statements in VIVO containing URIs on the list.
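As a rough sketch of the idea, the following Python selects every statement that mentions a listed URI and serializes it as N-Triples suitable for a removal file. It is an illustration under stated assumptions, not remove_uri's actual code: the function name removal_ntriples is hypothetical, the statements are an in-memory list of terms already written in N-Triples form, and the example.org URIs are placeholders. The real tool works against the data in VIVO.

```python
def removal_ntriples(triples, uris):
    """Return N-Triples text for every statement that mentions a listed URI.

    `triples` is a list of (subject, predicate, object) strings already in
    N-Triples form, e.g. '<http://example.org/n1>' or '"a literal"'.
    `uris` is a list of bare URIs to be removed.
    """
    doomed = {"<{}>".format(uri) for uri in uris}
    lines = []
    for subject, predicate, obj in triples:
        # A statement is removed if the URI appears in any position.
        if subject in doomed or predicate in doomed or obj in doomed:
            lines.append("{} {} {} .".format(subject, predicate, obj))
    return "\n".join(lines)


LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
triples = [
    ("<http://example.org/n1>", LABEL, '"Unused date"'),
    ("<http://example.org/n2>", "<http://example.org/hasDate>", "<http://example.org/n1>"),
    ("<http://example.org/n2>", LABEL, '"A person"'),
]
removal = removal_ntriples(triples, ["http://example.org/n1"])
```

Note that the second statement is selected even though its subject is kept: leaving it in place would leave a dangling reference to the removed URI.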
remove_uri should be used carefully, and as part of a comprehensive data management activity for your VIVO data. Download
Merge URIs in VIVO
Sometimes, the same entity will be entered into VIVO more than once. Some assertions will be attributed to one of the entities and other assertions will be attributed to one or more duplicates. This may happen because automated processes are being used to add data to your VIVO at the same time that people are using self-edit to add additional entities. At UF we have had duplicate people, duplicate organizations, duplicate papers, and duplicate grants.
merge_uri is a tool you can use to combine pairs of URIs, taking all the assertions from one entity (called "from") and assigning them to another entity (called "to"). The "from" entity is removed as part of the process. So, for example, if there are two entities referring to the same organization, you can choose which one will be removed and have all the assertions about that entity assigned to the duplicate. merge_uri is "symmetric" -- for two entities "a" and "b," you may assign all assertions from a to b, removing a, or assign all assertions from b to a, removing b. Typically we choose the entity with the smaller set of assertions to be the "from" entity.
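The from/to reassignment can be sketched as follows. This is a simplified illustration, not merge_uri's implementation: merge_plan is a hypothetical name, statements are plain Python tuples held in memory, and the URIs are placeholders. It returns the statements to add and the statements to remove, so that nothing is changed until both lists have been reviewed.

```python
def merge_plan(triples, from_uri, to_uri):
    """Plan a merge of `from_uri` into `to_uri`.

    Returns (additions, removals): every statement mentioning `from_uri`
    is removed, and a copy pointing at `to_uri` is added unless an
    identical statement already exists.
    """
    existing = set(triples)
    additions = []
    removals = []
    for subject, predicate, obj in triples:
        if from_uri not in (subject, obj):
            continue
        removals.append((subject, predicate, obj))
        rewritten = (
            to_uri if subject == from_uri else subject,
            predicate,
            to_uri if obj == from_uri else obj,
        )
        if rewritten not in existing and rewritten not in additions:
            additions.append(rewritten)
    return additions, removals


A = "http://example.org/org-a"   # duplicate to be removed ("from")
B = "http://example.org/org-b"   # entity to keep ("to")
triples = [
    (A, "rdfs:label", "Department of Statistics"),
    (B, "rdfs:label", "Department of Statistics"),
    ("http://example.org/person-1", "ex:memberOf", A),
]
additions, removals = merge_plan(triples, from_uri=A, to_uri=B)
```

Swapping the from_uri and to_uri arguments gives the merge in the other direction, which is the symmetry described above.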
merge_uri should be used carefully, and as part of a comprehensive data management activity for your VIVO data. Download
A Link Checker for VIVO
You may have web pages in your VIVO that contain URLs pointing to web resources outside your VIVO. This practice creates a lot of value for your users. At UF, people point to their department profiles, papers point to PubMed Central on-line versions of the full text of the paper, courses point to catalog descriptions, and so on. With all those links, some will inevitably break. The VIVO link checker is a simple Python script that checks each URL in VIVO and determines its status: available, "404 not found," or any other status. The software generates a frequency table of statuses, and for 404 and 410 errors it generates RDF that can be used to remove these pages from your VIVO. Run regularly, the link checker reduces the number of bad links your users will encounter using VIVO. Download
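A link check of this kind might look like the sketch below, which uses only the Python standard library. It is not the link checker's own code: the function names http_status and check_links are assumptions, the URLs are supplied as a plain list rather than queried from VIVO, and the sketch stops at listing the dead URLs instead of generating RDF.

```python
from collections import Counter
import http.client
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for `url`, or 'unreachable'."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions carrying the code.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return "unreachable"


def check_links(urls, get_status=http_status):
    """Return (frequency table of statuses, sorted list of dead URLs)."""
    statuses = {url: get_status(url) for url in urls}
    frequency = Counter(statuses.values())
    # 404 (not found) and 410 (gone) are treated as candidates for removal.
    dead = sorted(url for url, status in statuses.items() if status in (404, 410))
    return frequency, dead
```

Passing the status function as a parameter lets the tallying logic be exercised without network access. Some servers reject HEAD requests, so a production checker may need to fall back to GET.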
Add SJR Rankings to VIVO
SJR -- the SCImago Journal Rank -- is an openly available score for journals indicating the importance of the journal as determined by citations. The scores for more than 20,000 journals are openly available from SCImago. At UF we load these rankings into VIVO, updating the SJR score for each journal, or adding an SJR score if the journal did not have one in VIVO. Download
Repair VIVO Phone Numbers
At UF, VIVO pulls its phone numbers from a master directory of people and their contact information known as the UF Directory. The directory contains phone numbers, but they have not been validated to ensure they are syntactically valid. Area codes may be missing, as may be the first several digits -- assumed to be one of the UF phone exchanges. Formats for extensions are varied. We wrote a script to read phone numbers from VIVO and improve them to standards. The script can be run at any time to find and repair phone numbers. Download
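The kinds of repair described here can be illustrated with a short Python function. Everything specific in it is an assumption made for the example and not taken from the UF script: the default area code, the table of short-number exchanges, the accepted extension spellings, and the output format "(352) 392-1234 ext. 56".

```python
import re

DEFAULT_AREA_CODE = "352"                    # assumption: local area code
SHORT_EXCHANGES = {"2": "392", "3": "273"}   # hypothetical campus exchanges

# Extension written as "x56", "ext 56", or "ext. 56" at the end of the string.
EXTENSION = re.compile(r"\s*(?:ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def repair_phone(raw):
    """Return a normalized phone number, or None if it cannot be repaired."""
    text = raw.strip()
    extension = None
    match = EXTENSION.search(text)
    if match:
        extension = match.group(1)
        text = text[:match.start()]
    digits = re.sub(r"\D", "", text)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                   # drop the country code
    if len(digits) == 5 and digits[0] in SHORT_EXCHANGES:
        # Five-digit campus number: restore the missing exchange digits.
        digits = SHORT_EXCHANGES[digits[0]] + digits[1:]
    if len(digits) == 7:
        digits = DEFAULT_AREA_CODE + digits   # restore the missing area code
    if len(digits) != 10:
        return None
    result = "({}) {}-{}".format(digits[:3], digits[3:6], digits[6:])
    if extension:
        result += " ext. " + extension
    return result
```

Returning None for numbers that cannot be repaired leaves them available for manual review instead of writing a guess back to VIVO.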
VIVO Name Parts
At UF we had a bit of a problem in our VIVO. UF likes to store names in strings, for example, "Conlon, Michael." We had loaded data from a variety of sources using a variety of methods. For some of our data management tools, it was important that VIVO have values for name parts such as foaf:firstName. We wrote a small script using VIVO Tools that reads name strings in rdfs:label assertions in VIVO, parses the name into parts, and makes the appropriate name parts assertions. Download
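The parsing step might be sketched as below. This is a simplified stand-in and does not use VIVO Tools: parse_name is a hypothetical function, it handles only the "Last, First Middle" form shown above, and the dictionary keys are illustrative names for the name parts, not confirmed ontology properties (apart from foaf:firstName, which the text mentions).

```python
def parse_name(label):
    """Split a 'Last, First Middle' label into name parts.

    Returns a dictionary of parts, or None if the label is not in the
    expected 'Last, First' form.
    """
    if "," not in label:
        return None
    last, _, rest = label.partition(",")
    given = rest.split()
    if not last.strip() or not given:
        return None
    parts = {"lastName": last.strip(), "firstName": given[0]}
    if len(given) > 1:
        # Anything after the first given name is treated as middle name(s).
        parts["middleName"] = " ".join(given[1:])
    return parts


parts = parse_name("Conlon, Michael")
```

Real name data also includes suffixes, prefixes, and multi-word surnames, which this sketch does not attempt to handle.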