How can computer vision widen the evidence base around on-screen representation?
Quote 1: “The goal must be that the diversity of employment in film should represent the diversity in the population as a whole. [...] although the problem of under representation has been recognised and discussed over a number of years, very little has been done.”
Quote 2: “The screen industries do not currently reflect the UK population – neither in their workforce nor the content they produce. [...] Even without research, industry is clear that film is nowhere near representative of the UK population and has a lot of work to do to change this.”
Nineteen years separate the two quotes above but they convey very similar points. The first quote is from 2001, from the first report from the Committee for Ethnic Minority Employment in Film, set up by then-DCMS Secretary of State Chris Smith. The second quote is from January 2020, from the Initial Findings report on the British Film Institute (BFI) Diversity Standards. The UK screen industry has a diversity (specifically, representation) problem which it has found difficult to solve. Progress has been made but structural inequalities remain.
Representation is commonly defined as how the media portray individuals, social groups, events and issues to an audience.
Computer vision is an interdisciplinary field that develops techniques to help computers ‘see’ or make sense of the visual world. For example, models can be trained to detect objects like faces in an image or video.
Due to the persistent nature of the diversity issue, it's crucial that we continue to evidence it. In this article, we explore whether and how computer vision might provide a new method of measuring on-screen representation.
This article is an output of a pilot project conducted by Nesta in partnership with Learning on Screen. The project team consists of Raphael Leung (Principal Data Scientist, Nesta); Bartolomeo Meletti (Education and Research Executive, Learning on Screen); Dr Cath Sleeman (Head of Data Discovery, Nesta); Gabriel A. Hernández (Chief Digital Officer, Learning on Screen); and Gil Toffell (Academic Research Manager, Learning on Screen). The pilot is part of ‘BoB for AI’, a Learning on Screen initiative aimed at exploring how the BoB archive can be applied for data science research. With over 2.5 million broadcasts, it is a rich source of audiovisual cultural heritage data, which can be usefully applied in a range of disciplines like digital humanities, computational social science and computer science. The broadcast content we used was made available via the Learning on Screen’s BoB archive with permission from the Educational Recording Agency.
The article builds upon a presentation that Raphael and Bartolomeo gave at the International Federation of Television Archives (FIAT/IFTA) Conference 2020 and a short paper by Raphael for the IJCAI 2021 AI for Social Good workshop. It is also inspired by PEC (AHRC Creative Industries Policy & Evidence Centre) research on generating more subtle, nuanced measures of creative diversity using machine learning, e.g. She said more, which analysed the use of male and female pronouns in news articles about the creative industries.
Gaps in the evidence base
There are two big initiatives in the UK regularly collecting evidence on diversity across different demographics on screen. These are: Project Diamond by the Creative Diversity Network, the flagship programme capturing diversity data in the UK broadcasting supply chain, and the Office of Communications (Ofcom)’s annual diversity in television broadcasting reports.
While these existing data collection exercises are very informative, we believe they miss out on important parts of the picture. There are three areas where more evidence could be particularly impactful.
First, current evidence misses key aspects of representation. Much of it focuses on presence (whether someone, for example a character, appears on screen at all) and very little on prominence (e.g. how much screen time a character has, and how centred or foregrounded they are) or portrayal (e.g. the context, narratives and any stereotypes). If applied appropriately, computer vision has the potential to expand the evaluation of on-screen representation from presence to prominence.
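To make the presence/prominence distinction concrete, the snippet below sketches how both could be computed from per-frame detections. This is a minimal illustration, assuming face detection has already been run and each detection matched to a character label; the function name, data shapes and character labels are all hypothetical, not part of the pilot described here.

```python
from collections import Counter

def prominence_metrics(frame_detections, total_frames):
    """Summarise per-character presence and screen-time share.

    frame_detections: one set per sampled frame, holding the
    (hypothetical) character labels detected in that frame.
    """
    frames_seen = Counter()
    for labels in frame_detections:
        frames_seen.update(labels)
    return {
        label: {
            "present": True,                            # presence: appears at all
            "screen_time_share": count / total_frames,  # prominence: share of frames
        }
        for label, count in frames_seen.items()
    }

# Toy example: four sampled frames from an episode
frames = [{"A", "B"}, {"A"}, {"A"}, {"C"}]
metrics = prominence_metrics(frames, total_frames=4)
```

A presence-only count would treat characters A and C identically (both appear); the screen-time share (0.75 vs 0.25 here) is what separates a lead from a fleeting appearance.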
Second, existing data coverage is a) low, and b) uneven across demographic groups. For example, Project Diamond had an average response rate of 28% of individuals working on qualifying TV content across the five contributing broadcasters in 2018/19. There is therefore no direct knowledge of how inclusive or representative the remaining 72% of the broadcast landscape is. A 2018 review by NatCen further discussed the “inevitable possibility of reporting bias due to non-response as a consequence of the low response rate.” Ofcom’s reports have similarly highlighted “data gaps” and “insufficient collection” of disability and some other demographic data.
There is in general comparatively much less data on some underrepresented and minoritised groups. BFI’s evidence review summarising more than 60 research publications about workforce diversity found that “sexual orientation and religion and belief were seldom explored in detail” in the publications that they examined. For on-screen representation (Standard A), only 1% of productions meeting the BFI Diversity Standard do so via gender identity, compared with 63% for gender and 50% for race/ethnicity.
BFI’s Diversity Standard is a contractual requirement for all British Film Institute funding as well as an eligibility requirement for certain BAFTA awards. It recently inspired similar rules for the Oscars in the US.
Computer vision avoids these particular faults: it is faster than manual annotation and, unlike diversity forms, is not subject to self-reporting bias. But it has its own methodological challenges, which are discussed here.
Third, existing methods (e.g. manual annotation and self-reported forms) are insightful but inherently limited. The evidence gaps resulting from their limitations can only be plugged if we embrace more innovative methods, so we can see a more complete picture and advance the discussion.
The manual annotation approach, which counts characters from particular groups on-screen, has been used in ‘state-of-diversity’ reports (e.g. annotating seven weeks of broadcasts on BBC main channels) as well as specific media studies research (e.g. analysing the representation of older adults or ethnic minorities on television). But it is more laborious, can be slow to deliver insights, and only covers what the researcher can feasibly annotate.
The self-reported survey approach also reflects a partial view. For a TV show that ended many years ago, it is difficult to run a retrospective survey with the cast and crew, but it is possible to apply computer vision to study the programme. For reference, Project Diamond began in 2016 and Ofcom’s annual diversity and equal opportunities in television reports began in 2017, so both are relatively young initiatives.
Archived television such as Learning on Screen’s BoB can be used to study the broadcast landscape. There is also potential for computational researchers to take advantage of the UK’s text and data mining copyright exception, introduced in 2014 to allow processing of protected materials for non-commercial research purposes.
BoB is Learning on Screen’s on demand TV and radio service for education. Its archive currently holds over 2.5 million TV and radio broadcasts dating back as far as the 1950s. It updates daily with content from channels including BBC1, Channel 4, ITV and BBC Radio 4. Learning on Screen is exploring research use cases using AI and other methods.
Embracing and building upon new evidence methods
Fundamentally, more complete and richer data about representation on screen can only add to the evidence base. As researchers, we think that there are overlooked datasets like archives and newer methods like computer vision that can help address gaps in the evidence base for representation.
In particular, if applied and interpreted appropriately, computer vision can speed up certain parts of data compilation, provide new insights, and widen the evidence base from presence to prominence. It is not a cure-all, but newer methods can potentially be an important supplement to how we understand representation and evidence progress. In the longer term, there is much room for socially-minded AI researchers in the creative industries and elsewhere to build systems that apply computer vision to the domain responsibly.
From presence to prominence
There is good reason for diversity metrics to consider more than on-screen presence and branch out to prominence and portrayal.
First, existing measures of on-screen representation need to be more sophisticated to cater for ongoing discussions about prominence and portrayal. One insight highlighted by the BFI this year is that the “percentages of productions foregrounding lead characters from underrepresented groups have room for improvement.” They found that “the comparatively low percentage of black, Asian and other minority ethnic characters speaks to the need for more lead roles and ownership of narrative if film is to be properly representative.” At the same time, we want to avoid tokenistic representation and portrayal that perpetuates stereotypes. Existing measurements are not sufficiently well-purposed to inform these discussions.
Second, there are already feasible computational methods for measuring prominence. For example, a character’s prominence can be captured by how much screen or speaking time they have, or by how centred and foregrounded (‘front and centre’) they appear within the frame.
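As an illustration of the ‘front and centre’ idea, the sketch below scores how centred and how foregrounded a detected face is, using only its bounding box. The box format, function name and scoring choices are assumptions made for this example, not a description of the pilot’s actual method.

```python
def centredness(box, frame_w, frame_h):
    """Score how 'front and centre' a detected face is.

    box: (x, y, w, h) bounding box in pixels (illustrative format).
    Returns (centre_score, area_share): centre_score is 1.0 when the
    box centre sits exactly on the frame centre and falls towards 0
    at the frame edge; area_share proxies how foregrounded (large)
    the face is relative to the whole frame.
    """
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    # Normalised offset from the frame centre per axis (0 = centred, 1 = at edge)
    dx = abs(cx - frame_w / 2) / (frame_w / 2)
    dy = abs(cy - frame_h / 2) / (frame_h / 2)
    centre_score = 1 - max(dx, dy)
    area_share = (w * h) / (frame_w * frame_h)
    return centre_score, area_share

# A face occupying the middle of a 1280x720 frame
score, share = centredness((540, 260, 200, 200), 1280, 720)
```

Averaging such scores over the frames in which a character appears would give a single per-character prominence figure, complementing screen-time counts.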
But the wider social norms around applying facial technologies responsibly and ethically are still developing. A longer version of this article – available as a blog series on the PEC website – presents a conceptual framework for measuring on-screen representation and shows two illustrative examples of measuring character prominence in broadcast TV.
[Figure caption: Mock the Week (Season 15, episode 6), shown on BBC2 in March 2017 (originally aired in July 2016); produced by Angst Productions and Ewan Phillips.]
Based on this pilot project, we propose potential applications to widen the evidence base around on-screen representation for four groups.
First, for diversity leads and monitors, computer vision could be used, supplemented by manual review, to generate more frequent and richer data about representation. By plugging evidence gaps (beyond presence, and across under-represented groups), computer vision can generate near-real-time insights into on-screen representation in broadcasts. These measurements can prompt a rethink of which stories are told and funded.
Second, for content producers, computer vision may open up opportunities to create new features that let viewers explore major and minor characters. It is not only the responsibility of the regulator or of diversity groups to request and compile data. Content producers can use richer data on character prominence to build new production features, generating additional value for viewers and fans, e.g. allowing viewers to interact with a visual summary of more and less prominent characters across a series.
Third, for editors and commissioners, computer vision can be used to analyse character prominence before a show airs. Currently, evidence is gathered long after the broadcast date. As processing an episode for prominence metrics is quicker than real time, especially with downsampling, it is possible to run it post-production, so some concerns around representation can be addressed upstream. These methods can be helpful during commissioning, or during screen-writing or editing in between series.
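To give a rough sense of why a downsampled pass runs quicker than real time: sampling one frame per second of a 25 fps broadcast cuts the number of frames a model must process 25-fold. A back-of-the-envelope sketch (the frame rates are illustrative; actual broadcast frame rates and sampling intervals vary):

```python
def frames_to_process(duration_s, native_fps=25.0, sample_fps=1.0):
    """Compare full-rate vs downsampled frame counts for one pass.

    Sampling 1 frame per second of a 25 fps broadcast means a model
    touches 25x fewer frames, which is what makes a post-production
    prominence pass feasible before a show airs.
    """
    full = int(duration_s * native_fps)      # every broadcast frame
    sampled = int(duration_s * sample_fps)   # downsampled frames
    return full, sampled

# A 30-minute episode
full, sampled = frames_to_process(30 * 60)
# full = 45000 frames at 25 fps; sampled = 1800 frames at 1 fps
```

Since prominence metrics such as screen-time share change slowly relative to the frame rate, sampling at this coarser interval sacrifices little accuracy for a large speed-up.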
Finally, researchers can form partnerships to better understand the models under which content rights holders can open up broadcast data for research, so that collections can be more commonly treated as data, and responsibly opened up to answer important social research questions.