This issue of VIEW provides a critical survey of new digital humanities (DH) methods and tools directed toward audiovisual (AV) media. DH as a field is still dominated by a focus on textual studies (studies of word culture) that are largely “deaf and blind” in their capacity to search, discover, and study AV materials. The mandate to improve these capacities is clear and unquestioned, though the pathways toward doing so are numerous and still taking shape. New and emergent tools built on deep learning algorithms can reasonably be expected to change this methodological landscape in the digitally accelerated near future.

Such a welcome promise imposes new demands upon the fields of media studies and media history: we must recognize and develop new pedagogical strengths in areas such as quantitative analysis in relation to “digital hermeneutics”.1 This requires multimodal literacy and new skills that range across algorithmic criticism, data criticism, tool criticism, interface criticism, simulation criticism, etc.2 At the same time, artists, humanists, and social scientists will bring their own critical thinking and essential knowledge to the formation of new 21st-century research questions within the Audio Visual Digital Humanities (AVDH).

Indeed, AVDH re-articulates the essential dialectic of digital humanities between the close-reading methodologies of the arts and humanities and the distant reading of the computational sciences. Visual and audio culture are inherently complex in ways that differ from the culture of words on a page. The inter-discipline of media studies is essential to addressing these key differences and the 21st-century research opportunities they open up.

Building commitment toward a future in which computer vision and machine learning make image and speech recognition ubiquitous is an achievable goal. Media studies and media history will not only contribute a wide range of necessary methodologies; through networked scholarship and even seasoned crowd-sourcing (drawing on the wisdom of generations who are not always-already hard-wired), they can also contribute considerable quantities of curated manual annotations to help train and evaluate machine-learning algorithms in an iterative cycle. Such a procedural workflow has been demonstrated, for example, with the Semantic Annotation Tool (SAT) of The Media Ecology Project.3
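For readers curious what such an iterative cycle looks like in practice, the following minimal Python sketch simulates an annotate, train, and evaluate loop on toy data; it is an illustration of the general workflow only, and does not use the actual SAT or Media Ecology Project interfaces.

```python
# A minimal, self-contained simulation of the annotate-train-evaluate cycle
# described above. The data are random stand-ins for curated manual
# annotations; this is NOT the SAT / Media Ecology Project API.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
pool_features = rng.normal(size=(1000, 8))            # features of unannotated AV clips
pool_labels = (pool_features[:, 0] > 0).astype(int)   # stand-in for human judgements

annotated_X, annotated_y = [], []
for round_number in range(3):
    # 1. A fresh batch of clips is manually annotated in each round.
    batch = rng.choice(len(pool_features), size=100, replace=False)
    annotated_X.extend(pool_features[batch])
    annotated_y.extend(pool_labels[batch])

    # 2. The model is retrained on all curated annotations collected so far.
    X_train, X_test, y_train, y_test = train_test_split(
        np.array(annotated_X), np.array(annotated_y), test_size=0.25, random_state=0
    )
    model = LogisticRegression().fit(X_train, y_train)

    # 3. Evaluation on held-out annotations guides the next annotation round.
    print(f"round {round_number}: accuracy = {model.score(X_test, y_test):.2f}")
```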

Students, scholars, archivists, librarians, and other 21st-century researchers should be encouraged to develop new skills in both close and distant reading techniques: new artful practices of “scalable reading”, critical combinations of “explorative” distant listening and viewing conjoined with “interpretative modes” of close inspection, and so forth. This adaptive ability to zoom in and out between big data and distinctive expressive nuance will serve as an unquestionably challenging yet copiously generative mandate for many years of rigorous research to come.

The ten articles presented in this issue provide a snapshot of current research on audiovisual data within a broad (and expanding) domain of loosely defined DH scholarship. This ‘state of the art’ glimpse reveals a variety of epistemological, historiographical, and technological issues that current research is trying to come to grips with, particularly the ever-expanding smorgasbord of digital methods applicable to audiovisual data. The articles also indicate some future directions that DH research might take by steering away from the traditional textual orientation of DH towards the exploration of other media modalities. As usual, the contents of the issue are presented through a series of Discovery and Exploratory articles.

1 Discoveries

In her article Fingal’s Cave: The Integration of Real-Time Auralisation and 3D Models, Shona Noble writes about an immersive virtual reality application that recreates a visit to Fingal’s Cave in Scotland (renowned for its extraordinary acoustics). Both the application and Noble’s article explore the importance of audio in heritage visualisations and the technical work its implementation requires. Because audio has been central to the history and culture of Fingal’s Cave, the immersive application that Noble discusses combines 3D models, a narrative soundscape, and interactive auralisation. The article considers the effectiveness of auralisation, and Noble argues that audio must be included in heritage visualisations and virtualisations if they are to make a more powerful impression on audiences.
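For readers unfamiliar with the term, auralisation is commonly realised by convolving a ‘dry’ recording with an impulse response measured or modelled for the space in question. The toy Python sketch below illustrates only this general principle with synthetic signals; it is not Noble’s implementation and uses no data from Fingal’s Cave.

```python
# Conceptual illustration only: auralisation typically convolves a "dry" sound
# with an impulse response of the modelled space, so that the sound appears to
# be heard inside it. Signals here are synthetic; no Fingal's Cave data are used.
import numpy as np

sample_rate = 16_000
time = np.arange(sample_rate) / sample_rate                # one second of audio
dry_sound = np.sin(2 * np.pi * 440 * time)                 # a plain 440 Hz tone
decay = np.exp(-np.linspace(0, 8, sample_rate // 2))       # decaying reverb tail
impulse_response = decay * np.random.default_rng(0).normal(size=decay.size)

auralised = np.convolve(dry_sound, impulse_response)       # the tone "in the cave"
print(f"{auralised.size / sample_rate:.2f} seconds of auralised audio")
```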

Moving from audio to images, Christoph Musik and Matthias Zeppelzauer discuss image analysis and machine learning in their article, Computer Vision and the Digital Humanities: Adapting Image Processing Algorithms and Ground Truth through Active Learning. The article offers DH scholars knowledge about automated tools for image analysis: how they work and how they are constructed. Musik and Zeppelzauer argue that even though such tools are promising, challenges remain, for example regarding algorithmic bias and the lack of transparency about what such tools actually do. Building on these insights, the article introduces an approach called ‘active learning’ which, according to Musik and Zeppelzauer, can help configure tools to fit specific DH requirements and research questions in an adaptive and user-centered way.
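As a rough indication of the underlying idea, the Python sketch below runs a generic uncertainty-sampling active-learning loop on simulated data, in which the model repeatedly asks a human expert to label the items it is least certain about; this is a textbook variant of active learning, not the specific adaptive approach that Musik and Zeppelzauer develop.

```python
# A generic uncertainty-sampling active-learning loop on simulated data,
# sketched only to illustrate the idea; it is not Musik and Zeppelzauer's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_pool = rng.normal(size=(2000, 16))                      # unlabelled image features
y_pool = (X_pool[:, :2].sum(axis=1) > 0).astype(int)      # simulated expert labels

labelled = list(rng.choice(len(X_pool), size=20, replace=False))  # small seed set
for _ in range(5):
    model = LogisticRegression().fit(X_pool[labelled], y_pool[labelled])

    # Ask the human expert to label the items the model is least certain about.
    probabilities = model.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(probabilities - 0.5)             # 0 = maximally uncertain
    uncertainty[labelled] = np.inf                        # skip already-labelled items
    labelled.extend(np.argsort(uncertainty)[:20])

print(f"labelled {len(labelled)} of {len(X_pool)} items interactively")
```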

If new forms of image analysis are one consequence of computational media, the growth of televisual data is another. In their article, Maps, Distant Reading and the Internet Movie Database: New Approaches for the Analysis of Large-Scale Datasets in Television Studies, Giulia Taurino and Marta Boni explore what digital approaches based on Big Data can bring to the study of television series in a global mediascape. Using metadata on various TV series gleaned from the IMDb database, they examine the countries in which the series are produced, common locations, and links between actual locations and diegetic places. Via GPS coordinates linked to titles, each series is also transformed into a set of dots on a geographic information system. Taurino and Boni argue that the analysis of television series is in dire need of such new methods, and their case study provides an illustrative example of a spatially informed distant reading of televisual data.
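A deliberately simplified sketch of that mapping step is given below; the titles and coordinates are invented placeholders rather than actual IMDb records, and a simple matplotlib scatter plot stands in for a full geographic information system.

```python
# A deliberately simplified version of the mapping step: filming locations of
# series (invented placeholders, not real IMDb records) are reduced to GPS
# coordinates and plotted as dots, standing in for a full GIS layer.
import matplotlib.pyplot as plt

locations = [
    {"series": "Series A", "place": "Vancouver", "lat": 49.28, "lon": -123.12},
    {"series": "Series A", "place": "Toronto",   "lat": 43.65, "lon": -79.38},
    {"series": "Series B", "place": "Budapest",  "lat": 47.50, "lon": 19.04},
    {"series": "Series B", "place": "Prague",    "lat": 50.08, "lon": 14.44},
]

for series in sorted({entry["series"] for entry in locations}):
    lons = [e["lon"] for e in locations if e["series"] == series]
    lats = [e["lat"] for e in locations if e["series"] == series]
    plt.scatter(lons, lats, label=series)        # each series becomes a set of dots

plt.xlabel("longitude")
plt.ylabel("latitude")
plt.legend()
plt.savefig("series_locations.png")              # placeholder for a GIS export
```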

Television and new methods of analysis are also the theme of Edward Larkey’s article, Narratological Approaches to Multimodal Cross-Cultural Comparisons of Global TV Formats. The article applies a cross-cultural and multimodal methodology for comparing different versions of a TV series, Un Gars Une Fille (1997-2002). Larkey demonstrates how digital tools of analysis can be used to compile and correlate quantitative and qualitative data on the placement, length, and duration of segments in a number of different (global) versions of the ‘same’ TV series. Using computer annotation software to make quantitatively precise determinations about the durations of multimodal configurations, Larkey shows how these various audiovisual ‘texts’ contain global and local components structured and sequenced with traces of different power relationships and commercial mobilities.
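In spirit, the quantitative side of this comparison resembles the short Python sketch below, which aggregates invented segment durations for two hypothetical versions of a format; Larkey’s study of course works with exports from annotation software rather than hard-coded values.

```python
# A toy version of the comparison step: (invented) segment durations from two
# hypothetical national versions of the same format, aggregated per segment type.
# The actual study works with exports from annotation software, not literals.
from collections import defaultdict

annotations = [
    {"version": "Version A", "segment": "opening banter", "duration": 42.0},
    {"version": "Version A", "segment": "bathroom scene", "duration": 31.5},
    {"version": "Version B", "segment": "opening banter", "duration": 55.0},
    {"version": "Version B", "segment": "bathroom scene", "duration": 28.0},
]

totals = defaultdict(float)
for annotation in annotations:
    totals[(annotation["version"], annotation["segment"])] += annotation["duration"]

for (version, segment), seconds in sorted(totals.items()):
    print(f"{version} | {segment:<16} {seconds:5.1f} s")
```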

2 Explorations

Tools for video annotation also lie at the core of the article, Tales of a Tool Encounter: Exploring Video Annotation for Doing Media History, co-written by Susan Aasman, Tom Slootweg, Liliana Melgar Estrada and Rob Wegter. The article explores the affordances and functionalities, possibilities and constraints of the Dutch CLARIAH research infrastructure (and its integrated video annotation tool) for doing research with digitised audiovisual sources from television archives. At the same time, the authors reflect on their own engagements with this infrastructure, arguing that media scholars need to rethink their research practices in terms of methodological transparency, tool criticism and reflection.

A similar mode of scholarly reflexivity also characterizes Berber Hagedoorn and Sabrina Sauer’s article, The Researcher as Storyteller: Using Digital Tools for Search and Storytelling with Audio-Visual Materials. It offers an exploratory critique of the socio-technical affordances of digital tools in terms of how they support narrative creation by media researchers. In the form of a case study, Hagedoorn and Sauer present insights from a cross-disciplinary user study involving almost a hundred researchers studying audio-visual materials in a co-creative design process. Their article consequently provides insights into the search, retrieval, and narrative creation practices of these user groups, accentuating the pivotal role that interaction with digital tools plays in meaning-creation processes around audio-visual sources.

Scholarly experiences of working with digital tools also appear in the article, Speech Analytics in Research Based on Qualitative Interviews. Experiences from KA3, co-written by Almut Leh, Joachim Köhler, Michael Gref and Nikolaus P. Himmelmann. The article presents results from the project KA3, “Kölner Zentrum Analyse und Archivierung von AV-Daten” (Cologne Centre for the Analysis and Archiving of AV Data), in which advanced speech technologies have been developed for indexing and analysing speech recordings from the oral history domain. Speech recognition tools do not yet produce perfect transcripts, but by adapting new language models and algorithms, word error rates can be reduced drastically. The article discusses the current state of speech recognition software and automatically generated transcripts, and argues that some tools can already be used productively by DH scholars, even if current performance is not yet fully adequate.
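The word error rate mentioned here is a standard metric: the minimum number of word substitutions, deletions, and insertions needed to turn an automatic transcript into the reference transcript, divided by the length of the reference. A compact Python implementation, written purely for illustration and unrelated to the KA3 codebase, looks as follows.

```python
# Word error rate (WER): the minimum number of word substitutions, deletions,
# and insertions needed to turn the automatic transcript into the reference,
# divided by the length of the reference. Illustrative code, not KA3's.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, computed row by row.
    distances = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        previous, distances[0] = distances[:], i
        for j, hyp_word in enumerate(hyp, start=1):
            distances[j] = min(
                previous[j] + 1,                           # deletion
                distances[j - 1] + 1,                      # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution
            )
    return distances[-1] / len(ref)

print(word_error_rate("das ist ein test", "das ist kein test"))  # prints 0.25
```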

The abundance of digital sources, and the challenges that digital archives pose for historiography, are also focal points of Sarah-Mai Dang and Alena Strohmaier’s article, Collective Collecting: The Syrian Archive and the New Challenges of Historiography. The practice of curating abundant digital material, of selecting, structuring, and providing access, has become a key activity in digital media practices, they argue. Then again, massive digitization makes histories appear as well as disappear. Via a case study of the so-called Syrian Archive, Dang and Strohmaier discuss how concepts such as authenticity and provenance relate to current media practices. Since the ongoing Syrian war is also a propaganda conflict, authenticity has become a major representational issue. Taking into account the complexity of audiovisual journalism and digital archiving with regard to the Syrian crisis, Dang and Strohmaier examine various challenges of historiography: for what purpose are videos distributed and stored, and what kind of ‘truth’ is actually preserved in the Syrian Archive?

Media archival constraints are also apparent in Indrek Ibrus and Maarja Ojamaa’s article, Newsreels versus Newspapers versus Metadata—A Comparative Study of Metadata Modelling the 1930s in Estonia. Their article offers a historical and comparative example of the ways in which audiovisual and verbal digital archives model our understanding of the past. Ibrus and Ojamaa focus on content metadata schemas, including their role in modelling histories as well as in framing the usage of audiovisual databases. The article compares how different metadata schemas for newspaper articles and newsreels from the 1930s model their objects. By researching two Estonian digital databases, the Analytic Bibliography of Estonian Journalism and the Estonian Film Database, Ibrus and Ojamaa make the claim that these metadata schemas shape contemporary perceptions of historical realities in quite different ways.

Binary differences in gender are the provisional topic of the article, Describing Gender Equality in French Audiovisual Streams with a Deep Learning Approach, co-written by David Doukhan, Géraldine Poels, Zohra Rezgui and Jean Carrive. Based on the analysis of some 700,000 hours of French audiovisual content (television and radio), the article focuses on the amount of time that men and women speak on air, so-called speaking time. Using dedicated software, the authors measure the Women Speaking Time Percentage (WSTP), a statistical estimate produced by automatic speaker gender detection algorithms based on acoustic machine learning models. The article presents WSTP statistics across channels, years, hours, and regions. The results show that in 2018 men spoke twice as much as women on French television and radio. In order to further monitor gender equality in audiovisual media, the authors have accordingly released their tool as open source.
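For illustration, the short sketch below computes a toy WSTP from a handful of invented speech segments; in the study itself the segments are produced by automatic speaker gender detection over broadcast audio rather than written by hand.

```python
# Toy illustration of the Women Speaking Time Percentage (WSTP): the share of
# total detected speech time attributed to women speakers. The segments below
# are invented; in the study they come from automatic gender detection on audio.
segments = [
    {"gender": "f", "start": 0.0,  "end": 12.5},
    {"gender": "m", "start": 12.5, "end": 40.0},
    {"gender": "f", "start": 40.0, "end": 51.0},
    {"gender": "m", "start": 51.0, "end": 75.0},
]

female_time = sum(s["end"] - s["start"] for s in segments if s["gender"] == "f")
total_time = sum(s["end"] - s["start"] for s in segments)
wstp = 100 * female_time / total_time
print(f"WSTP = {wstp:.1f}%")    # 31.3% in this toy example
```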