To better understand the origin of LLM bias, this paper attempts to quantify an underexamined source of bias in model inputs: the authorship of pretraining data. We evaluate the authorship of ChatGPT-3's training corpora through a gender lens. Working from the developers' limited disclosure of the model, we estimate that just over a quarter (26.5%) of ChatGPT-3's training data was authored by women. This paper serves as a case study of what is lost when disclosure and documentation of LLM training data are lacking.
Pitt Cyber submitted recommendations to the UN Multistakeholder Advisory Body on AI, urging the UN to develop technical assistance on AI governance for low- and middle-income countries and to provide guidance and advocacy for AI technologies that help achieve select Sustainable Development Goals.
Pitt Cyber submitted comments to the National Telecommunications and Information Administration's request for comment on AI Accountability Policies, addressing the need for AI accountability measures to encompass socio-technical characteristics.
Watch Pitt Cyber's event marking the launch of the AI accountability policy effort by the Department of Commerce's National Telecommunications and Information Administration.
Pitt Cyber submitted comments in response to the Office of Management and Budget's request for comment on its draft memorandum, "Advancing Governance, Innovation, and Risk Management for Agency Use of Artificial Intelligence."