The Digitisation of the Frank Scholten Collection at the Special Collections of Leiden University Libraries

Combining Innovative Digital Technologies and Physical Action in a Spatial Setting for Increased Findability of a Large Photo Collection

Maartje van den Heuvel and Saskia van Bergen

Photography is becoming increasingly important as a visual resource for research and education. Visual literacy has taught individuals to be critical of the alleged documentary value of a photograph. To interpret photographic imagery effectively, one must first clearly understand the collection and provide accurate initial information about it. Sorting and adding basic information to a photographic collection so that it becomes accessible and its rich content can come alive in the minds of researchers, students, writers, and image editors is a profession in itself. At a time when the combination of AI and photography is still viewed with suspicion, often in the context of copyright and authorship, we would like to present a case in which AI, and image recognition in particular, is put to positive use. Because of the large number of images involved, the work described in this article could not have been carried out by one person or a small team without such collaboration between people and technology.[1]

When the Frank Scholten (1881–1942) collection was transferred from The Netherlands Institute for the Near East (NINO) to Leiden University Libraries, the library was confronted with a huge photography collection of ca. 14,000 negatives; 13,000 photographic prints; 65 albums with photographs; and postcards, cut-outs, and other archival materials. The collection was unsorted, uncatalogued, and, because of the vulnerability of the photographic materials, unavailable to the public. Although the contents were of enormous academic importance, the size and disorder of the collection made it impossible for the regular staff to handle. Eventually, the collection was physically sorted and made digitally accessible through a fruitful combination of AI technology and human physical labour in the form of workshops with groups of volunteers. This innovative combination had to be developed on the spot.[2]

The Palestinian Years of Frank Scholten

Frank Scholten was the son of a wealthy Amsterdam family.[3] He studied art in Berlin from 1908 until the First World War, when he was forced to return to his hometown. Coming from a Protestant background, he showed his rebellious spirit when he converted to Catholicism. This conversion was a real statement in a society where religious segregation was complete. He was also not afraid to show his queer lifestyle in an era dominated by moral standards built on traditional family values, which led to his arrest by the police for behaviour that was considered immoral at the time. Perhaps in search of more freedom outside this restrictive environment, but also to find the landscapes and environments of the Bible, which had become a true passion for him since his conversion, Scholten sailed to the Holy Land of Palestine in 1921 (Figure 1). Frank Scholten had taken up photography by then. His artistic interests and training as a visual artist in Berlin, combined with the mindset of a true collector, led him to make, collect and categorise images almost obsessively.

Figure 1 – Frank Scholten in Palestine, photographing himself with his camera among other people in a mirroring window, Palestine (exact location unknown) 1921-1923.

Scholten’s initial aim was to create an illustrated Bible based on his journey. He annotated photographs of biblical sites and scenes that reminded him of biblical passages or characters with references to specific biblical quotations. Scholten made these annotations on the back of the photographic prints, on the envelopes in which he packed his negatives, or on the pages of the photo albums that he created. However, he was so fascinated by everything he experienced and saw that he ended up photographing and ‘visually collecting’ virtually everything, including professions, crafts, landscapes, agricultural methods, folk costumes, and festive ceremonies of Jews, Christians, and Muslims. In addition to his own photographs, his accompanying archives contain countless photographs taken by other photographers or commercial photo studios, postcards, cuttings from books and magazines, and other visual material of images and phenomena Scholten found interesting. His efforts could be described as encyclopaedic, although during his lifetime he was only able to begin to sort and categorise the material.

The History of the Collection: From Oblivion to Renewed Interest

Because of his friendship with Professor Liagre Böhl, who was one of the directors of NINO, Scholten bequeathed his photographic archive and documentation to this research institute upon his untimely death in 1942.[4] Apart from a few hundred photographs that Frank Scholten used for his book La Palestine illustrée (1929; later editions in German [1930], English [1931], and Dutch [1935]), most of the photographs were unknown to the public, and the collection hibernated for more than six decades.

The international project Crossroads (2017–2022), initiated by Leiden University, studied the interrelated history of the Arab–Christian communities in Palestine during the formative years of the Middle East (1920–1950).[5] Frank Scholten’s photographs were an important source for the project, as they visualised the region during the fascinating years from 1921 to 1923, when Ottoman rule was slowly giving way to other influences, marked by social changes and many groups from different religious and cultural backgrounds migrating in and out of the country.

Frank Scholten had a broad interest in and access to all kinds of social circles. Of all the collections made by photographers who entered Palestine at that time, Frank Scholten’s is the least likely to show a political (colonial) agenda. The area is constantly the focus of religious and cultural groups who see it as their roots and homeland, as the current war shows. The photographs give an insight into many places and the way people relate to them in many instrumental, cultural, and religious customs and ceremonies.

Unlocking the Collection With Image Recognition

Photo collections are only navigable and physically findable if what is seen is identified and keyworded correctly and the collection is numbered and physically sorted. When Frank Scholten’s collection arrived at the University Library, only the negatives were sorted and categorised (Figure 2). Scholten himself had placed the negatives in envelopes on which he had written keywords. These keywords were usually just a geographical location, sometimes with a thematic term added: for example, ‘Jerusalem Juif’ (‘Jewish Jerusalem’, meaning Jewish heritage and scenes in Jerusalem) or ‘Jaffa Musulman’ (‘Islamic Jaffa’, meaning Islamic heritage and scenes in Jaffa). However, there were also thousands of photographic prints (around 13,000, according to the final count of the digitisation machines), and these prints were completely disorganised and loosely stored in the moving boxes in which they had arrived at the library. The large number of prints, combined with the lack of organisation, made cataloguing the individual photographic prints an impossible task. The photographic prints likely matched the negatives, but the question remained as to who would be able to match them. Cataloguing the prints by human eye alone would have taken years of work.

In 2020, we contacted Giles Bergel of the Visual Geometry Group at the University of Oxford for help with this task using image recognition.[6] Bergel had already used image recognition technology extensively in his research on Scottish printed chapbooks (Dutta et al., 2021). For the Scholten photo collection, Bergel and his colleagues created an online tool that automatically matched the digitised negatives and prints (Figure 3). This tool turned out to be important for the digitisation process itself as well. There was no need to sort the photographic prints before digitisation; we could have them scanned box by box and numbered by the software in whatever order they happened to be scanned (Figure 4a). The match with the negatives would be recorded in a concordance afterwards. The file names of the negatives would later become the basis of the inventory numbers of both negatives and photographic prints.

Figure 2 – The Frank Scholten negatives in their original packaging: ca. 14,000 nitrate negatives in envelopes with a short caption by the photographer. Photograph ©UB Leiden, 2020.
Figure 3 – Screenshot of Oxford University’s software, linking the image files of the photographic prints to their matching negatives.

We found that not all 13,000 photographic prints had a matching negative. Ultimately, 31% of the negatives did not have a matching photographic print, and vice versa: for 24% of the photographic prints, no matching negative could be found.[7] We were already familiar with the phenomenon in photo collections of negatives that did not have a matching print. After all, photographers send photographic prints by post to the editors of magazines or books for which they provide illustrations, or to their friends or family, keeping the negative at home or in their studio. Some negatives are never printed. Photographic prints without a negative are less common, because photographers tend to keep their negative archives with them. We therefore did not know whether part of Scholten’s negative archive had been lost or had gone elsewhere. These questions remain unanswered to this day.
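The two percentages follow directly from the counts given in note 7. As a small worked check (the function name is ours and purely illustrative):

```python
# Counts as reported in note 7.
TOTAL_PRINTS = 12_756
PRINTS_WITH_NEGATIVE = 9_736
TOTAL_NEGATIVES = 13_714
NEGATIVES_WITH_PRINT = 9_513


def unmatched_share(total: int, matched: int) -> int:
    """Return the percentage of items without a match, rounded to a whole number."""
    return round(100 * (total - matched) / total)


prints_without_negative = unmatched_share(TOTAL_PRINTS, PRINTS_WITH_NEGATIVE)
negatives_without_print = unmatched_share(TOTAL_NEGATIVES, NEGATIVES_WITH_PRINT)

print(f"Prints without a matching negative: {prints_without_negative}%")
print(f"Negatives without a matching print: {negatives_without_print}%")
```

The difference between the first two counts, 3,020 prints, is the group of prints for which no negative could be found.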

To our surprise, the Visual Geometry Group’s software was capable of more than matching photographic prints to their exact negatives. The image recognition also grouped negatives and prints that had something in common. Thus, negatives and prints depicting, for example, the same location but a different event were grouped together. Similarly, images of the same people were grouped, even when they appeared in separate locations. This capability proved ideal for the cataloguing work. By clustering the prints and negatives, it was possible to work much faster. In a year and a half, two temporary staff members managed to create basic records for the whole collection.
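How the Visual Geometry Group’s software forms its groups internally is not described here. One common way to turn pairwise matches into sets, shown below purely as an illustration, is to treat every match as a link and to collect everything that is connected by a chain of links. The identifiers and match pairs are invented for the example.

```python
from collections import defaultdict


def group_matches(pairs):
    """Group identifiers into sets: items linked by any chain of matches share a set."""
    parent = {}

    def find(item):
        # Follow parent links to the representative of the item's set.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # shorten the chain as we go
            item = parent[item]
        return item

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    return sorted(sorted(members) for members in groups.values())


# Invented example: two prints share a negative, and one of them also
# resembles a second negative, so all four items end up in one set.
example_pairs = [
    ("print_0007", "neg_0101"),
    ("print_0042", "neg_0101"),
    ("print_0042", "neg_0102"),
    ("print_0300", "neg_0555"),
]
example_sets = group_matches(example_pairs)
```

In this toy example the result is one set of four items and one set consisting of a single print and a single negative, the kind of one-to-one set that note 7 reports as the most frequent.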

Working With Volunteers

Due to the limitations of the 2019–2021 COVID pandemic, metadata were mainly collected from the digital files of negatives and prints. Until then, no time or effort had been spent on repackaging and sorting the physical collection. Further examination of the prints revealed that in several cases, the photographer had written additional information on the back of the prints. We wanted to capture this information in the metadata as well. Moreover, the physical collection needed even more care:

  1. We needed to assign an inventory number to each photographic print for identification. These numbers had to be noted on the verso of each physical photographic print and linked with the digital version (Figure 4a);
  2. The information written by Frank Scholten on the backsides of the photographs needed to be included in the metadata records;
  3. The photographic prints also needed to be rearranged so that they could be stored in the same order as the negatives (Figure 4b), and this reorganisation would be key for visitors and collection employees to find their way around the collection;
  4. Lastly, we wanted to repackage the photographic prints in Melinex transparent pocket sleeves, as doing so would protect them from wear and allow visitors to leaf through them easily at the same time.
Figure 4a – Temporary stacking of the photographic prints after digitisation – the computer added the numbers, the digitisation employee wrote the range of numbers on the label. Photograph ©UB Leiden, 2023.
Figure 4b – Workshop participant with concordance list for the sequence in which the photographic prints had to be picked and repacked. Photograph ©UB Leiden, 2023.

Since these four tasks would also mean significant work for the library, we decided to draw on our network. Fortunately, by that point, physical meetings were allowed again. We found volunteers with diverse interests by publishing an article and announcements in local media and through our own social media channels.

Some volunteers were amateur photographers who had an interest in photography in general and loved to work with historical vintage photography. Others were interested in the specific subject matter. We also worked with volunteers with identities rooted in or related to Palestine. The workshops included Jewish people, people from Arab Palestinian families, and people who had worked in Palestine for NGOs or in diplomatic positions. All volunteers worked together on the Scholten photo collection. Finally, we could rely on the library’s regular group of alumni and Friends of Leiden University Libraries association members.

A paid coordinator was employed to organise the workshops. He made sure that everything was well arranged and that all volunteers had the facilities, materials, and the right instructions needed for their tasks. We provided training in the handling of vintage photo collections and supplied all volunteers with the necessary tools, such as gloves, transparent pocket sleeves, and printed labels.

The Workshops

During the first workshop, the volunteers worked in pairs to match the physical and digital information of all the photographic prints. This task was done in several steps.

One volunteer took a photographic print from the stack and checked the image on the site created by the Visual Geometry Group. When a match was found, the volunteer noted the inventory number on the backside of the physical print. At the same time, their companion typed the text written on the back of the print in a spreadsheet, together with the inventory number. Later, this information was added to the metadata records.

The next step was to rearrange the print order to match the negatives’ order and to repackage the prints in boxes with transparent sleeves. This task was the biggest challenge: how to handle thousands of prints in an orderly fashion.

The match between photographic print and negative, made by the Visual Geometry Group’s software, turned out to be the solution. This concordance resulted in lists that contained two columns:

  • The first column contained the list of photographic prints sorted in the sequence in which the computer had seen the images during the reproduction process.
  • The second column contained a list of matching negatives.

The first step was to re-sort the concordance so that the prints in the first column followed the order of the negatives in the second. This order was the order given to the collection by photographer Frank Scholten.

Subsequently, we divided the list into segments of 50 rows, each corresponding to 50 photographic prints. This number corresponded with the material that could fit into one storage box. Thereafter, a printout of the list was made for each box. Each volunteer received a box filled with empty pocket sleeves, as well as a printed list. The aim was to collect all the prints destined for the new box.
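The two list operations described above, sorting by negative and cutting the result into lists of 50, can be sketched in a few lines. This is an illustration rather than the script that was actually used; the identifiers are invented, and the sketch assumes that the negative identifiers sort alphabetically in Scholten’s order.

```python
BOX_CAPACITY = 50  # prints per storage box, as stated in the text


def build_pick_lists(concordance, box_capacity=BOX_CAPACITY):
    """Sort (print_number, negative_id) rows by negative and split them into per-box lists."""
    ordered = sorted(concordance, key=lambda row: row[1])
    return [ordered[i:i + box_capacity] for i in range(0, len(ordered), box_capacity)]


# Invented concordance: prints numbered in scanning order, negatives in Scholten's order.
example_concordance = [
    (1, "neg_0300"),
    (2, "neg_0001"),
    (3, "neg_0150"),
    (4, "neg_0002"),
    (5, "neg_0299"),
]

# A capacity of 2 keeps the example readable; the workshops used 50.
example_pick_lists = build_pick_lists(example_concordance, box_capacity=2)
for box_number, rows in enumerate(example_pick_lists, start=1):
    print(f"Box {box_number}: pick prints {[print_number for print_number, _ in rows]}")
```

Each inner list corresponds to one printed sheet: the prints to pick, in the order in which they go into the sleeves of one new box.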

Figure 5 – Volunteer workshop participants picking photographic prints, according to the sequence indicated on the concordance lists, from the temporary large brown boxes; subsequently storing the photographic prints in Melinex pocket sleeves for permanent storage in the right order, as devised by the photographer. Photograph ©UB Leiden, 2023.

In preparation for the workshops, the original boxes with the photographic prints were positioned in a large rectangle, with each box labelled with the range of print numbers it contained. The volunteers were handed one new storage box at a time, together with a printed list of the numbers that needed to be selected from the boxes and the sequence in which they had to be stored in the sleeves. The volunteers all walked around with their own lists and boxes, selecting the prints from the boxes in the rectangle (Figure 5). All these steps were done in silence and with the utmost concentration, because the material was fragile, the numbers and the work precise, and the process easily disrupted. A mistake in reading or interpreting a number could make a photograph virtually untraceable.
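The number ranges on the temporary boxes worked as a simple lookup table: a print number on the list pointed to exactly one box in the rectangle. In the workshops this lookup was done by people walking around the room. The sketch below only illustrates the logic, with invented box names and ranges.

```python
from bisect import bisect_right

# Hypothetical labels: the first print number in each temporary box, in ascending order.
BOX_STARTS = [1, 401, 801, 1201]
BOX_NAMES = ["box A", "box B", "box C", "box D"]


def find_box(print_number):
    """Return the temporary box whose number range contains print_number."""
    index = bisect_right(BOX_STARTS, print_number) - 1
    if index < 0:
        raise ValueError(f"print number {print_number} precedes the first box")
    return BOX_NAMES[index]


print(find_box(400))  # last print of the first range
print(find_box(401))  # first print of the second range
```

The sketch stores only the start of each range, so a number beyond the last box is assigned to that box; a fuller version would also record where each range ends.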

After 17 whole-day workshop sessions, all the work was completed. The photographic prints were repacked in pocket sleeves and albums and labelled with the identifiers of the negatives, and the annotations written on the back by Frank Scholten were transferred to the metadata records in our catalogue. Approximately 3,000 photographic prints for which AI could not find a matching negative were added to albums with the same geographical location as the subject: photographs of Jaffa were added to the ‘Jaffa’ albums, and so on. In this way, all the photographic prints were grouped in the clearest and most helpful way for future users.

Meanwhile, we also made the collection available online through our repository Digital Collections.[8] Because the collection is free of copyright, we were able to make all the scans available for download as well. This feature attracted worldwide interest. In 2023 and 2024, the collection received around 20,000 visits. The majority of users are from the Netherlands, but the second largest group of users is from the region itself (Israel, Palestine, and Jordan). We also calculated that users spend, on average, a considerable amount of time on the site (about an hour) and perform, on average, about 40 actions (page views, downloads, internal site searches). These statistics indicate an investigative use.

Visual Search

Despite all the efforts expended on cataloguing the collection, the records still contained only a basic set of metadata. In general, it is difficult to capture the richness of a photograph in a set of keywords. The result will always be incomplete and subjective. Therefore, we wanted to test whether it was possible to improve the searchability of our photo collections using visual search. One of our developers built a proof of concept based on Weaviate and OpenAI. It uses IIIF to create a new user interface for the collection. For example, if a user searches for ‘cars’, they will get all images with a car in them, even if the keyword ‘car’ is not in the metadata.
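The proof of concept itself is not reproduced here. The general principle behind this kind of search is that a model turns both images and search terms into lists of numbers (vectors), and that the images whose vectors lie closest to the query vector are returned first; a vector database such as Weaviate stores these vectors and carries out the search. The sketch below illustrates only that ranking step. The vectors are invented stand-ins for model output, and the image names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, image_vectors, limit=2):
    """Return the names of the images most similar to the query, best match first."""
    ranked = sorted(
        image_vectors,
        key=lambda name: cosine_similarity(query_vector, image_vectors[name]),
        reverse=True,
    )
    return ranked[:limit]


# Invented three-dimensional vectors; real embeddings have hundreds of dimensions.
IMAGE_VECTORS = {
    "street_with_car": [0.9, 0.1, 0.2],
    "harbour_with_boats": [0.1, 0.9, 0.3],
    "lorry_on_road": [0.8, 0.2, 0.1],
}
QUERY_CARS = [1.0, 0.0, 0.1]  # stand-in for the vector of the search term 'cars'

results = search(QUERY_CARS, IMAGE_VECTORS)
print(results)
```

Because the ranking depends on the vectors and not on keywords, an image can be found even when the word searched for does not occur in its metadata.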

We are aware that the results may be biased. Given the sensitivity of this collection, particularly considering recent developments in the Palestinian–Israeli conflict, we have decided not to make this tool publicly available yet. The first step will be to test and train the application with a selected group of users; a plan for this still has to be established. Another project we have in mind is to invite specialists to provide information on specific aspects of the photographs, such as clothing, professions, and religious ceremonies. In this case, the AI tool can help make selections based on visual information, thus supporting the enrichment of descriptive metadata.

With this showcase, we wanted to demonstrate the following:

  1. Innovative techniques are interesting not only for users but also for back-end processes in the library, such as speeding up the cataloguing process.
  2. Crowdsourcing has many faces. It does not necessarily have to involve a large group, and the contribution does not have to be digital, but it can include physical activities as well.
  3. Technology and human physical effort are interrelated. It is not possible to have one without the other. This collection, in particular, which is so large and involves so many different user groups, shows that there is a significant need for physical interaction as well.
  4. Finally, although it may be stating the obvious, digitisation is not limited to scanning and putting the collection online. It is the use of the collection that keeps us engaged, and together with researchers, we are constantly developing new applications to encourage and improve its use. The project has been ongoing for 5 years now, and we are not finished yet.

References

Dutta, A., Bergel, G., & Zisserman, A. (2021). Visual analysis of chapbooks printed in Scotland. In HIP ’21: Proceedings of the 6th international workshop on historical document imaging and processing (pp. 67-72). ACM. https://doi.org/10.1145/3476887.3476893

Kwiecien, T. (2008). Frank Scholten. In Geschiedenis van de Nederlandse fotografie in monografieën en thema-artikelen, in Depth of Field, 25(40). https://depthoffield.universiteitleiden.nl/2540f05en/

Scholten, F. (1929). La Palestine illustrée: Tableau complet de la terre sainte par la photographie, évoquant les souvenirs de la Bible, du Talmud et du Coran, et se rapportant au passé comme au présent. Budry.

Scholten, F. (1930). Palästina: Bibel, Talmud, Koran: Eine vollständige Darstellung aller Textstellen in eigenen künstlerischen Aufnahmen aus der Gegenwart und Vergangenheit des Heiligen Landes. Julius Hoffmann.

Scholten, F., Smith, G. A., & Robinson Lees, G. (1931). Palestine illustrated: Including references to passages illustrated in the Bible, the Talmud and the Koran. Longmans.

Scholten, F. (1935). Palestina: Bijbel, Talmud, Koran: Een volledige illustratie van alle teksten door middel van eigen artistieke foto’s uit het heden en verleden van het Heilig Land. Sijthoff.

Stork, D. G. (2024). Computer vision, ML, and AI in the study of fine art. Communications of the ACM, 67(5), 68-75. https://doi.org/10.1145/3633454

Wasielewski, A. (2023). Computational formalism: Art history and machine learning. MIT Press. https://doi.org/10.7551/mitpress/14268.001.0001

Zananiri, S. (2021). Documenting the social: Frank Scholten taxonomising identity in British Mandate Palestine. In K. Sanchez-Summerer & S. Zananiri (Eds.), Imaging and imagining Palestine: Photography, modernity and the biblical lens, 1918-1948 (pp. 266-306). Brill. https://doi.org/10.1163/9789004437944_009

Abstract

Frank Scholten, a Dutch photographer with a deep interest in biblical landscapes, documented life in Palestine between 1921 and 1923. His photographs captured biblical sites, local crafts, landscapes, and a range of religious and cultural ceremonies. Following his death in 1942, his extensive collection—comprising approximately 14,000 negatives and 13,000 photographic prints—remained largely unorganised and inaccessible to the public.

To facilitate public access, Leiden University Libraries undertook a project that combined artificial intelligence (AI) with manual labour. Employing image recognition technologies, the team matched negatives with corresponding prints, thereby streamlining the cataloguing process. To further enrich the metadata, volunteers from diverse backgrounds—including individuals with personal or cultural connections to Palestine—participated in workshops where they helped to catalogue, repackage, and label the prints.

The collection has since been digitised, and in 2023–2024 it attracted considerable international interest, particularly from Israel, the United States, the Netherlands, and Jordan. Although the collection has basic metadata, efforts are underway to enhance its discoverability through the application of AI tools. Given the sensitive nature of the material, these tools are currently being tested in a restricted environment.

The project demonstrates how the combination of technology and physical effort can render large and complex photographic archives accessible to both researchers and the wider public.

Keywords

Image recognition; Digitisation; Crowdsourcing; Photography; Palestine (1921-1923)


  1. For more AI practices in the field of visual arts, see Wasielewski (2023) and Stork (2024).
  2. We would like to thank everyone who has helped to unlock this collection: Sary Zananiri and Karène Sanchez with their expertise; Willemijn Havenaar and Lara van der Hammen who initially catalogued the collection; Abishek Dutta and Giles Bergel of the Visual Geometry Group, Oxford University, for the tool they built for us using Image Recognition; Picturae and especially project manager Frank Pera; Sander Müskens who coordinated and accompanied the volunteer workshops; Rama Mwinyimbegu for making the collection better searchable with AI; and all the volunteers who realized the sorting with dedication and patience: Melissa Allieri, N.A.I. Aulia Izza, S.R.L. Berntsen, R.N. de Bruijne, B.V. Burgess, Salome Erni, Peter Groenewegen, Vera van Heel, Salma Helmi, Lieks Hettinga, Annetje Huizinga, Marijke de Jong, Kenzy Kamel, Mariëtte Keuken, Timur Khan, Jochem Kleinjan, Gerrit van der Kooij, Bert van Loen, Lia Lyutakova, Sander Müskens, Anna Navumchyk, José Oudejans, Kate Pukhovaia, Nama'a Qudah, Basema Salman-Spijkerman, Mara Elif Schön, Ingrid Schroeder, Pauline Seijffert, Iris de Smalen, Rene Spitz, Christel Stapel-Saridjo, Niko Tetteroo, Tsjikke Vlasmat, Bart Wagemakers, M.C. Walraven, Erno Wientjens, Niek Winters, Alberto Zarraga, and Jowan Zonneveld.
  3. For articles on the scope and importance of Scholten's photography, see Kwiecien (2008) and Zananiri (2021).
  4. For a history of the collection, see the Collection description on collectionguides.universiteitleiden.nl, https://collectionguides.universiteitleiden.nl/resources/ubl674.
  5. https://crossroadsproject.net/. For publications resulting from this project, see e.g. Sanchez-Summerer and Zananiri (2021).
  6. Giles Bergel is Senior Researcher in Digital Humanities in the Visual Geometry Group in the Department of Engineering Science at the University of Oxford. For the Visual Geometry Group, see https://www.robots.ox.ac.uk/~vgg/. Giles Bergel's personal website, which explains the image recognition projects he has been working on, can be found at: https://www.printing-machine.org/.
  7. The total numbers were as follows: photographic prints: 12,756; match with a specific negative: 9,736 (76%); part of a larger set: 9,867 (77%); total number of negatives: 13,714; match with one or more prints: 9,513 (69%); part of a larger set: 11,105 (81%); total sets: 7,108 (of which ca. 4,680 consist of one print and one negative).
  8. https://digitalcollections.universiteitleiden.nl/

About the authors

Maartje van den Heuvel (PhD) is an art historian specialising in photography. Since 2007, she has been the curator of photography and photographic technology at the Special Collections of Leiden University Libraries, which not only holds the oldest ‘photo-historical’ collection of the Netherlands but also contains numerous scientific photography collections.

Saskia van Bergen is the head of Services and Collection Information at the Special Collections Department of Leiden University Libraries. In this role, she oversees a team of 14 employees and manages all aspects of access to the library’s Special Collections, including cataloguing, digitisation, and research services. Her team handles the Special Collections Reading Room, digitisation projects, and teaching with special collections. Previously, Saskia served as a project manager in the library, where she led several initiatives, including the implementation of ArchivesSpace and Islandora. She also holds a PhD in Art History from Amsterdam University, focusing on medieval art and book history.

Licence

Digital Object Identifier (DOI)

https://doi.org/10.25518/978-2-87019-330-3.05
