
GSoC Weeks 11-12: Predicate Linking Integration and Final Week Touches

Published:  at  07:00 AM

Introduction

I demoed the predicate linking module implemented last week to the mentors and got valuable feedback, which I worked on this week. I also implemented an LLM-as-a-judge script that uses the gpt-oss-120b model to score our synthetically generated data points. Finally, I cleaned up the repo code and prepared it for final submission, along with thorough documentation.


Predicate Linking

  1. The lexical score was previously computed without translation. The candidate is now first translated to English and then compared lexically, which gives a more meaningful comparison score.
  2. Implemented a Wikidata-to-DBpedia mapping so that the output is the DBpedia property link.
  3. Integrated the predicate linking module into the CLI and the Streamlit demo.
  4. Added link_predicate_batch to the module to process candidates efficiently in batches.
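
The steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the `translate` callable, the use of `difflib.SequenceMatcher` as the lexical metric, the two-entry mapping table, and the function name `link_predicate_batch_sketch` are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

# Illustrative two-entry mapping only (assumption); the real module's
# mapping source and coverage are not described in this post.
WIKIDATA_TO_DBPEDIA: Dict[str, str] = {
    "P19": "http://dbpedia.org/ontology/birthPlace",
    "P50": "http://dbpedia.org/ontology/author",
}


def lexical_score(candidate: str, label: str,
                  translate: Callable[[str], str]) -> float:
    """Translate the candidate to English, then compare it to the label."""
    english = translate(candidate).strip().lower()
    # SequenceMatcher is a stand-in metric (assumption), returning 0.0 to 1.0.
    return SequenceMatcher(None, english, label.strip().lower()).ratio()


def link_predicate_batch_sketch(
    candidates: List[str],
    properties: Dict[str, str],  # Wikidata property id -> English label
    translate: Callable[[str], str],
) -> List[Tuple[str, Optional[str], float]]:
    """Return (candidate, DBpedia property URI or None, best score) per candidate."""
    results = []
    for candidate in candidates:
        best_pid, best_score = None, -1.0
        for pid, label in properties.items():
            score = lexical_score(candidate, label, translate)
            if score > best_score:
                best_pid, best_score = pid, score
        # Map the best Wikidata property to its DBpedia counterpart, if known.
        results.append((candidate, WIKIDATA_TO_DBPEDIA.get(best_pid), best_score))
    return results
```

In this sketch the translator is injected as a plain function, so a lookup table can stand in for a real translation model when trying it out.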

LLM-as-a-judge Scoring Script

In previous weeks, we built a synthetic data generation pipeline to produce data for fine-tuning a small LM locally. The quality of this data was inconsistent, so I wrote a script that uses the same gpt-oss model used for generation as a judge to score each data point.

These scores make it possible to filter the data points by quality later on.
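
A score-then-filter flow along these lines might look like the sketch below. Everything in it is an assumption for illustration: the prompt wording, the 1 to 5 scale, the `Score: <n>` reply format, and the `judge` callable, which stands in for whatever client actually calls the gpt-oss model.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical prompt and 1-5 scale; the real script's rubric may differ.
JUDGE_PROMPT = (
    "Rate the quality of the following training example on a scale of 1 to 5.\n"
    "Reply with 'Score: <n>'.\n\nExample:\n{example}"
)


def parse_score(reply: str) -> Optional[int]:
    """Extract a single-digit 1-5 score from the judge's reply, or None."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def score_dataset(examples: List[str],
                  judge: Callable[[str], str]) -> List[Dict[str, object]]:
    """Ask the judge about each example and record the parsed score."""
    scored = []
    for example in examples:
        reply = judge(JUDGE_PROMPT.format(example=example))
        scored.append({"example": example, "score": parse_score(reply)})
    return scored


def filter_by_score(scored: List[Dict[str, object]],
                    threshold: int) -> List[Dict[str, object]]:
    """Keep rows whose score was parsed and meets the threshold."""
    return [
        row for row in scored
        if row["score"] is not None and row["score"] >= threshold
    ]
```

Rows whose reply cannot be parsed get a score of `None` and are dropped by the filter rather than silently counted as passing.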


Final Week


What’s Next

