JavaOne 2026

JavaOne 2026 Session


Improving Metadata Workflow in a Data Repository With AI-Generated Metadata Recommendations

Summary

Dataverse is an open-source Java EE application for preserving, sharing, and replicating research data in accordance with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Rich metadata and persistent identifiers allow datasets to be shared and integrated with other datasets. Occasionally, however, the data archiving workflow falls short because the data creator does not provide crucial metadata, such as the Subject category of the dataset.

This session demonstrates a method for generating Subject recommendations from the Subjects of existing datasets with similar descriptions. The method uses LLM embeddings of dataset descriptions, stored in a Neo4j knowledge graph, to find those similar datasets.
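The recommendation step can be pictured as a nearest-neighbor vote. The Java sketch below is hypothetical and is not the session's implementation. It assumes each existing dataset already has an embedding of its description and a set of Subject values. It ranks those datasets by cosine similarity to the new dataset's embedding and, over the k nearest neighbors, sums the similarity scores per Subject. The names SubjectRecommender, recommend, k, and maxResults are illustrative only. The brute-force scan in memory stands in for whatever similarity search the session performs in Neo4j; Neo4j 5.x also offers vector indexes for this kind of lookup.

```java
import java.util.*;

/**
 * Hypothetical sketch: recommend Subject values for a new dataset from the
 * Subjects of the k existing datasets whose description embeddings are most
 * similar (cosine similarity). Not Dataverse or Neo4j API code.
 */
final class SubjectRecommender {

    /** An existing dataset: an id, its description embedding, and its Subjects. */
    record Dataset(String id, double[] embedding, Set<String> subjects) {}

    /** A recommended Subject with its aggregated similarity score. */
    record Recommendation(String subject, double score) {}

    /** Cosine similarity of two equal-length vectors (0 if either is all zeros). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Recommendation> recommend(double[] query, List<Dataset> corpus, int k, int maxResults) {
        // 1. Rank existing datasets by similarity to the query embedding; keep the top k.
        List<Map.Entry<Dataset, Double>> neighbors = corpus.stream()
            .map(d -> Map.entry(d, cosine(query, d.embedding())))
            .sorted(Map.Entry.<Dataset, Double>comparingByValue().reversed())
            .limit(k)
            .toList();

        // 2. Each neighbor adds its similarity score to every Subject it carries.
        Map<String, Double> scores = new HashMap<>();
        for (var neighbor : neighbors) {
            for (String subject : neighbor.getKey().subjects()) {
                scores.merge(subject, neighbor.getValue(), Double::sum);
            }
        }

        // 3. Return the highest-scoring Subjects first.
        return scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(maxResults)
            .map(e -> new Recommendation(e.getKey(), e.getValue()))
            .toList();
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings"; real LLM embeddings have hundreds or thousands of dimensions.
        List<Dataset> corpus = List.of(
            new Dataset("doi:A", new double[] {0.9, 0.1, 0.0}, Set.of("Physics")),
            new Dataset("doi:B", new double[] {0.8, 0.3, 0.1}, Set.of("Physics", "Engineering")),
            new Dataset("doi:C", new double[] {0.0, 0.2, 0.9}, Set.of("Medicine, Health and Life Sciences")));
        double[] newDataset = {0.85, 0.2, 0.05};
        for (Recommendation r : recommend(newDataset, corpus, 2, 3)) {
            System.out.printf("%s (%.3f)%n", r.subject(), r.score());
        }
    }
}
```

Weighting each vote by similarity lets a Subject shared by several close neighbors outrank one carried by a single, weaker match; a plain majority vote or a minimum-similarity cutoff would be equally reasonable variations.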

Profile

Type: Learning Session (50 min)

Track: Machine Learning and Artificial Intelligence

Audience Level: Beginner

Speaker: Bob Treacy

Session: Wednesday, March 18th at 3:00 PM in Room 105