Some VIVO Things Blog

Musings on the ecosystem, community, software, data, use, and whatever else comes to mind.

Is VIVO FAIR?

NOTE: Apologies in advance. This post is a bit longer than I would like, and contains some unavoidable technical terms. I have tried to provide citations for each term, recognizing that this will further lengthen the reading for some. I felt it was better to address this topic in one post rather than break it in two. I hope that is good for all.

The FAIR Data Principles developed by FORCE11 are increasingly popular and provide a means for assessing whether data is being shared in a manner that is useful to others.

VIVO sites produce data in the form of assertions about the connected graph of research and scholarship. How does VIVO stack up against the FAIR data principles?

Findable. VIVO data is quite findable. VIVO includes schema.org tags on its pages to help search engines find them. VIVO has a registry of sites with URLs for the sites. VIVO sites can participate in Direct2Experts, another finding tool. VIVO site data is aggregated by CTSAsearch, yet another finding tool. OpenVIVO provides its data as constantly updated text files on the web. These files are very easy to find using a search engine (hint: search for "OpenVIVO data"). And with the addition of Triple Pattern Fragments (TPF) in the next release of VIVO, I expect additional tools to be developed to find VIVO data. The future is bright for further improving the findability of VIVO data.

Accessible. If people can find your VIVO data, can they access it? The answer is yes. VIVO is designed to share its data. Every page in VIVO can be accessed as HTML, which browsers use to render the page for humans to read, and as Resource Description Framework (RDF), a machine readable data format for computers to read. This is one of VIVO's strongest features, and one of its biggest secrets. Programmers can access VIVO's data starting from almost any page in VIVO, because VIVO provides a connected graph of scholarship and research. Starting at a person, one can find papers, leading to co-authors. Starting at an organization, one can find people who have positions in the organization. Starting at a grant, one can find the funding agency, investigators, and so on. VIVO makes traversing the graph straightforward.
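For programmers, asking for the RDF rather than the HTML usually comes down to one HTTP header. The sketch below is only an illustration. The site address and the identifier n1234 are made up, and it assumes the site will answer a request for Turtle at the same address as the human-readable page, so check what your own site supports.

```python
from urllib.request import Request

# Hypothetical profile address; substitute a page from a real VIVO site.
PROFILE_URL = "https://vivo.example.edu/individual/n1234"


def rdf_request(page_url, media_type="text/turtle"):
    """Build a request for the machine-readable version of a page.

    The Accept header asks the server for RDF instead of HTML.
    Assumption: the site honors this header for the given address.
    """
    return Request(page_url, headers={"Accept": media_type})


request = rdf_request(PROFILE_URL)
print(request.full_url, request.get_header("Accept"))
# To fetch the data, pass the request to urllib.request.urlopen(request).
```

The links found in the returned RDF (to papers, positions, grants, and so on) are themselves addresses that can be requested the same way, which is what makes walking the graph possible.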

Additionally, sites may export their data to files accessible on the Internet, as OpenVIVO does, or provide a SPARQL endpoint. The TPF feature in the next release of VIVO will make VIVO data even easier to access.
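For sites that do provide a SPARQL endpoint, a query can be sent as an ordinary web address. This is a rough sketch under stated assumptions: the endpoint address is hypothetical, the endpoint is assumed to accept unauthenticated GET requests as described by the SPARQL 1.1 Protocol, and the query assumes people are typed as foaf:Person with an rdfs:label. Your site's address and access rules may differ.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; each site decides whether and where to offer one.
ENDPOINT = "https://vivo.example.edu/sparql"

# Ask for ten people and their names, using FOAF and RDFS terms.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name
WHERE { ?person a foaf:Person ; rdfs:label ?name . }
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Return the URL for a SPARQL query sent as an HTTP GET.

    The SPARQL 1.1 Protocol passes the query text in a "query" parameter.
    """
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```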

Interoperable. VIVO data, modeled using the VIVO ontology, is amazingly interoperable. Two sets of VIVO data can be combined simply by putting them in the same file. No other work is needed. All VIVO sites and sites exporting VIVO data (there are many) are fully interoperable. They share the same data format (RDF) and the same representation/vocabulary (The VIVO Ontology).
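To make "putting them in the same file" concrete, here is a minimal sketch using the N-Triples form of RDF, which has one statement per line. The two sites, the people, and the combine function are all made up for illustration, and the sketch assumes neither dataset contains blank nodes, whose labels could collide when files are joined.

```python
# Two small datasets in N-Triples, one statement per line.
# All example.edu and example.org addresses are made up for illustration.
SITE_A = """\
<https://vivo.example.edu/individual/n1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://vivo.example.edu/individual/n1> <http://www.w3.org/2000/01/rdf-schema#label> "Smith, Jane" .
"""

SITE_B = """\
<https://vivo.example.org/individual/n7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://vivo.example.org/individual/n7> <http://www.w3.org/2000/01/rdf-schema#label> "Lee, Sam" .
"""


def combine(*datasets):
    """Merge N-Triples texts into one, dropping blank lines and duplicates."""
    seen = set()
    merged = []
    for text in datasets:
        for line in text.splitlines():
            line = line.strip()
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return "\n".join(merged) + "\n"


# Passing SITE_A twice shows that repeated statements are harmless.
combined = combine(SITE_A, SITE_B, SITE_A)
print(combined)
```

Plain concatenation would work just as well; removing duplicates only keeps the combined file tidy, since a repeated statement adds nothing to an RDF graph.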

Interoperability is lowered when sites do not use the same version of the VIVO ontology. While each version is a valid representation of scholarship, the ontology currently does not provide equivalence between versions. This must be done by software attempting to use multiple versions of the ontology. Future work may lower the effort currently needed to use multiple ontology versions.

Interoperability can be lowered when VIVO sites extend the ontology in custom ways to represent additional elements in VIVO, or to represent elements that should be common and in the ontology. The VIVO community needs to work with sites to identify elements that should be in the common ontology to avoid such customizations.

Similarly, interoperability can be lowered when sites use custom vocabulary to represent research concepts. The VIVO community needs more work to develop best practices for presenting the concepts underlying research areas of scholars, and subject areas of their works.

Reusable. VIVO data, modeled by the VIVO ontology, achieves the highest standards for re-usability. VIVO data is "Five Star Linked Data," a term coined by Tim Berners-Lee. VIVO data 1) is on the web; 2) is machine-readable structured data; 3) uses a non-proprietary format; 4) is published using open W3C standards; and 5) links to other open data. Anyone on the Internet can reuse VIVO data.

And yet, there are things we can do to improve re-usability. We can clarify the license under which sites provide VIVO data, and provide that information with the data. We can clarify where sites obtained their data and provide that information with the data. VIVO's current practice is to "inherit" provenance information from the source providing the information: if the data came from site x, we currently assume site x provided the data. We can go further and assert such facts explicitly in the VIVO data. We currently assume that VIVO data is provided by each site in a manner that supports reuse with attribution. We can clarify this by providing a license assertion in the VIVO data.
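What might such explicit assertions look like? The sketch below is one possibility, not an established VIVO practice. The choice of the Dublin Core license term and the PROV wasDerivedFrom term is an assumption on my part, the dataset and source addresses are hypothetical, and the Creative Commons Attribution license is used only as an example of a license that supports reuse with attribution.

```python
# Widely used terms for license and provenance statements.
# Using them for VIVO data is an assumption made for this sketch.
DCTERMS_LICENSE = "http://purl.org/dc/terms/license"
PROV_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"

# Hypothetical addresses for a site's dataset and the source it came from.
DATASET = "https://vivo.example.edu/dataset/all"
SOURCE = "https://x.example.org/data"

# Example license only; each site chooses its own.
LICENSE = "https://creativecommons.org/licenses/by/4.0/"


def triple(subject, predicate, obj):
    """Format one statement linking two addresses, in N-Triples syntax."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


assertions = [
    triple(DATASET, DCTERMS_LICENSE, LICENSE),    # how the data may be reused
    triple(DATASET, PROV_DERIVED_FROM, SOURCE),   # where the data came from
]
print("\n".join(assertions))
```

Two extra lines of data like these would travel with the rest of the site's RDF, so anyone reusing the data would no longer have to guess at its license or origin.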

Each VIVO site determines for itself how best to meet the FAIR data principles, if at all. Some sites share their data freely, while others rely on the delivered VIVO software to share their data. Still others have their data behind firewalls, preventing sharing. Unshared data cannot be FAIR.

Each of the FAIR data principles has sub-headings providing further guidance regarding what it means to be Findable, Accessible, Interoperable, and Reusable. I urge you to take a look at the principles and consider how VIVO can be improved and how your data practices can be improved to further the goal of VIVO data as FAIR data.

There is more that VIVO can do to improve VIVO's data as FAIR data. We are all learning how to be FAIR. I think VIVO is doing well and can do better.

So perhaps a short working answer to "Is VIVO FAIR?" is: 1) the VIVO project supports the FAIR data principles; 2) the VIVO ontology is a strong element of VIVO which supports the FAIR data principles; 3) the VIVO software provides features which support the FAIR data principles; and 4) VIVO sites provide VIVO data and each can share data according to the FAIR data principles.

If you are involved with a VIVO site and are non-technical, you may wish to discuss with your technical staff how your site is addressing FAIR data principles. If you are at a VIVO site and are technical, you may wish to speak with the non-technical members of the team regarding how your site should address FAIR data principles. Working together, sites should be able to align their practices with their institutional requirements and with the FAIR data principles.

What do you think? What more can the VIVO project do to promote data sharing using the FAIR data principles? What features could be added to the ontology or to the software to make sharing data even more natural?