Dataverse is an open source Java EE application for preserving, sharing, and replicating research data in accordance with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Rich metadata and persistent identifiers allow datasets to be shared and integrated with other datasets. Occasionally, the data archiving workflow falls short because the data creator omits crucial metadata, such as the Subject category of the dataset.
This session demonstrates a method for generating Subject recommendations from the Subjects of existing datasets with similar descriptions. The method uses LLM embeddings of dataset descriptions stored in a Neo4j knowledge graph.
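The recommendation idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation shown in the session: it assumes the description embeddings have already been computed, keeps them in an in-memory list rather than a Neo4j graph, and uses made-up three-dimensional vectors. The names `cosine`, `recommend_subjects`, and `CATALOG`, along with the sample Subjects, are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_subjects(query_embedding, catalog, k=3):
    """Rank Subjects by similarity-weighted votes from the k nearest datasets.

    catalog is a list of (embedding, subjects) pairs for datasets that
    already have Subject metadata.
    """
    # Score every catalogued dataset against the new description.
    scored = [(cosine(query_embedding, emb), subjects) for emb, subjects in catalog]
    # Keep only the k most similar datasets.
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    # Each neighbor votes for its Subjects, weighted by its similarity.
    votes = defaultdict(float)
    for similarity, subjects in nearest:
        for subject in subjects:
            votes[subject] += similarity

    # Highest total first; ties broken alphabetically for stable output.
    return sorted(votes, key=lambda s: (-votes[s], s))


# Toy embeddings (assumed values, for illustration only).
CATALOG = [
    ([0.9, 0.1, 0.0], ["Social Sciences"]),
    ([0.8, 0.2, 0.1], ["Social Sciences", "Law"]),
    ([0.1, 0.9, 0.2], ["Earth and Environmental Sciences"]),
    ([0.0, 0.2, 0.9], ["Physics"]),
]

if __name__ == "__main__":
    # Embedding of a new dataset description that lacks a Subject.
    print(recommend_subjects([0.85, 0.15, 0.05], CATALOG, k=2))
```

In a graph-backed setup, the nearest-neighbor step would typically be delegated to the database; the voting step would stay the same.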
Type: Learning Session (50 min)
Track: Machine Learning and Artificial Intelligence
Audience Level: Beginner
Speaker: Bob Treacy