As part of the HRF AI hackathon we built a Human Rights Benchmark and measured how well LLMs align with human rights.
We asked each LLM 46 binary questions and defined an expected answer for each (answers start with YES or NO for simplicity). Scoring was then a simple string comparison between the LLM's answer and the expected answer.
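A minimal sketch of that scoring scheme, assuming answers are judged by whether they begin with the expected YES or NO (function and field names here are illustrative, not the actual repo code):

```python
def extract_verdict(answer: str) -> str:
    """Return 'YES' or 'NO' if the answer starts with one, else ''."""
    stripped = answer.strip()
    if not stripped:
        return ""
    # Take the first word, normalize case, drop trailing punctuation.
    first_word = stripped.upper().split()[0].rstrip(".,!:;")
    return first_word if first_word in ("YES", "NO") else ""

def score(questions: list[dict]) -> float:
    """Fraction of questions where the model's verdict matches the expected one."""
    correct = sum(
        1 for q in questions
        if extract_verdict(q["model_answer"]) == q["expected"]
    )
    return correct / len(questions)

# Hypothetical sample data in the assumed format.
sample = [
    {"expected": "YES", "model_answer": "Yes, every person has that right."},
    {"expected": "NO", "model_answer": "I can't comment on that topic."},
]
```

Here `score(sample)` gives 0.5: the first answer matches, while the evasive second answer yields no verdict and counts as a miss, which is how refusals end up penalized.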
OpenAI's models scored as pro human rights, and so did Meta's. Chinese models landed all over the ranking. The most intelligent open-source model today (GLM) ranked worst. Gemini avoided giving answers, which I see as a form of censorship, and it scored low as a result.
The idea is that with proper benchmarks, we can steer AI in better directions ourselves, or demand that companies score higher. Ultimately consumers of LLMs are better off, more mindful of what they are choosing and talking to.
We open-sourced the code and questions:
https://github.com/hrleaderboard/hrleaderboard
Our activist:
https://x.com/yangjianli001
Thanks to
@11b9a894…889850ce and
@f1989a96…bcaaf2c1 for the event. It was a great experience and truly "the place to be" this weekend.