Introduction
The predicate linking module implemented last week was demoed to the mentors and I got some valuable feedback which I worked on this week. I also implemented a LLM-as-a-judge script to use gpt-oss-120b model to score our synthetically generated data points. Finally, the repo code was cleaned up and made ready for final submission along with thorough documentation.
Predicate Linking
- The lexical score currently was generated but without translation. For this, I implemented it so that the candiate was first translated to english and then compared lexically. This gives us a more meaningful comparison score.
- Implemented wikidata to dbpedia mapping so that we get the DBpedia property link as output.
- Integrated the predicate linking module into the CLI and streamlit demo.
- Added
link_predicate_batchto the module to process candidates efficiently in batches.
LLM-as-a-judge Scoring Script
In the previous weeks, we implemented a synthetic data generation pipeline for generating data to finetune a small LM locally. The problem with this data was that there was inconsistency in quality. I wrote a script to use the same gpt-oss model used for generation to also act as a judge and score each of the data points.
This opens the door for eventual filtering based on the score of the data points.
Final Week
- Final work submission URL - https://gist.github.com/advenk/8ce1bea298ca5c13829c8737bc21cc93
- Cleaned and raised the final PR for NEF
- Collected reference material for drafting a position paper for the work done in GSoC for the Hindi chapter over the last two years.
What’s Next
- Fix the “type” relation handling in predicate linking (generation + scoring).
- Add marginalization and reranking to EL (subjects/objects) beyond mGENRE top-1.
- Tighten extraction→linking interface so EL can be evaluated with gold/clean triples.
- Finetune the gemma3-4b model using the filtered data with data point scores above 8.
- Flesh out the workshop position paper which showcases our work on the Hindi chapter over the last two years.