Access to academic heritage: ENC theses available online
Camille Carette
Abstract
Led by the library of the French school École nationale des chartes (ENC), ThENC@ is a multi-layered project that aims to digitize academic works and publish them on a single online platform. These works are the theses written by the School's archivists-paleographers, and their particularity is that they are not actually held by the School library. The project addresses the digitization and restoration of documents, the constitution of a corpus when the physical documents are not stored in a single place, copyright and open access, metadata work, and the creation of a digital library. It shows the important role libraries play in providing online access to both old and recent academic works.
Keywords
Theses; Academic works; Digitization; Digital library; Archives
Article
Introduction
In November 2021, the library of the École nationale des chartes (ENC) launched a website called ThENC@.[1] Acting as a digital library, this site provides access to some of the theses produced by students at the School. However, the project that led to the creation of this website began years earlier and encountered multiple obstacles—the main one being that none of these theses are held at the School itself or in its library.
Why aren’t the theses in the School library?
A school specializing in history and heritage
Founded in 1821, the École nationale des chartes was located at Sorbonne University for a long time before moving, a few years ago, near the Louvre and the National Library of France. This two-hundred-year-old school mainly teaches historical and philological sciences and trains future heritage professionals.[2] It has about 170 students: archivists-paleographers, master’s students specializing in digital technologies and digital humanities, and doctoral students.
The archivist-paleographer degree is the School’s historic curriculum. After a demanding entrance exam, these students become trainee civil servants and follow a four-year program, at the end of which they must write a thesis.[3] After their studies, they mainly become researchers, archive curators or library curators.
A complex situation: theses not so easy to read
The specificity of the theses written by archivists-paleographers is that they are not PhD theses, but “school theses”. More importantly, they are considered private documents belonging to their authors: they have the status of private archives and cannot be read without the author's permission. Consequently, they are not subject to the obligations that make doctoral theses easy for the research community to consult, nor are they held in a university library, as is the case with doctoral theses.
Theses produced before 1961 can be found in many places: in an institution where the author may have worked, such as a municipal library or departmental archives, in the family archives of their descendants, in a learned society, or even in the possession of someone to whom the author donated a physical copy. The only physical copy of a particular thesis was recently found at Harvard University, for example!
In 1961, an agreement between the School and the French National Archives made the deposit of a copy in the National Archives mandatory. Having all the copies in the same place was a good thing, but it did not make consulting these documents much easier, as the authorization of the author remained necessary. Therefore, even today, users must contact the National Archives, which in turn contacts the author or their rights holders to ask permission.
As of today, we do not know the location of 38% of the 3,000 theses that have been written since 1849, the year of the first thesis defence at the École nationale des chartes, but one thing is certain: with the curious exception of one of them, none of them are in the School library. This leads to a confusing situation for library users, who often think they can consult those documents there.
What could be done? Asking authors, managing legal difficulties
One way to solve these consultation problems was naturally to launch a digitization project, led by the School library. But before we could start to digitize anything, we had to get permission from the authors—and finding them was no mean feat. The task became even more complicated when the authors had died, and their heirs had to be found. So whilst the first authorizations were collected in 2011 and 2012, it was only years later that the first documents could be digitized. As of today, it is possible that authors who might well agree to participate in the project are still not aware of its existence.
Another key point of the project was its legal aspect: it was clear from the start that not all theses could be put online in open access. Several scenarios complicated the situation. In the first, an author could authorize the online publication of their thesis but limit access to library readers only. In the second, an author could agree to their thesis being posted online in open access, but a printed version of the thesis turned out to have been published years earlier, which made the publisher’s permission mandatory as well. In the last, it was the content of the thesis that could pose a problem: if not all the illustrations used were free of rights, then not all the pages of the thesis could be put online in open access.
These difficulties had to be considered very early on. It was up to the library to check that the illustrations used did not cause any problems, and that no published version of the thesis existed. Above all, even before knowing how those theses were going to be published and showcased online, the library had to think about a future solution that would make it possible to control access to certain documents or parts of them. Therefore, using national or international open access platforms was not a suitable option for us.
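The three scenarios described above can be pictured as a small access-control model. The following Python sketch is only an illustration: the names `Access`, `ThesisRights` and `can_view_page` are hypothetical, and the rule that logged-in library readers may see rights-restricted pages is an assumption, not a description of how ThENC@ is actually configured.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Access(Enum):
    """Level of online access granted by the author (hypothetical labels)."""
    OPEN = "open"              # anyone may read the thesis online
    LIBRARY_ONLY = "library"   # only logged-in library readers
    NOT_ONLINE = "none"        # no permission obtained


@dataclass
class ThesisRights:
    author_permission: Access
    # True when a printed edition exists and the publisher has not agreed.
    publisher_blocks_open_access: bool = False
    # Pages whose illustrations are not free of rights.
    restricted_pages: Set[int] = field(default_factory=set)


def can_view_page(rights: ThesisRights, page: int, is_library_reader: bool) -> bool:
    """Return True if this reader may see this page online."""
    if rights.author_permission is Access.NOT_ONLINE:
        return False
    if is_library_reader:
        # Assumption: restricted access covers every page for library readers.
        return True
    # From here on, the visitor is anonymous and needs full open access.
    if rights.author_permission is Access.LIBRARY_ONLY:
        return False
    if rights.publisher_blocks_open_access:
        return False
    return page not in rights.restricted_pages
```

With such a model, a thesis authorized for open access but containing one protected illustration would hide only that page from anonymous visitors, which is the kind of partial restriction a generic open access repository does not offer.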
From finding the documents to putting them online
Locating the documents: an inter-institutional effort
As mentioned, physical documents prior to 1961 are scattered across France, and sometimes beyond its borders. The help of other professionals was essential in this quest: very often, it was librarians, archivists or researchers who made the presence of a thesis in an institution known. The ThENC@ project was therefore an opportunity to forge partnerships with other institutions: libraries, of course, but also archive centres or learned societies.[4]
One of those major partnerships is, of course, with the French National Archives, where more than 57% of theses are held. A very beneficial partnership with the IRHT also developed.[5] When the theses available in other institutions were not directly digitized by these institutions, the IRHT staff members digitized them during their visits there. Collaboration between institutions was therefore, and still is, one of the pillars of this very particular project, in which bridges are made between archives and libraries in order to reconstitute a corpus.
Finally, digitizing!
The preservation and physical preparation of documents is important if we want to be able to obtain any digital data. In order to prepare the documents for digitization, on-site checks were necessary when possible, that is, when the documents were located at the National Archives. The oldest and most damaged ones have been restored in the library of the École nationale des chartes. Sometimes, it took weeks to restore a single thesis (Vielliard & Gaudemer, 2021).
Thanks to funding from PSL University, all thesis summaries (called “positions de thèse” in French) were digitized in 2018. In 2019, more than 300 theses were digitized and, when possible, processed with optical character recognition (OCR), which marked the real start of the project (Université Paris Sciences et Lettres, n.d.). In 2020 and 2021, the project also received help from the DIM-MAP,[6] under the project name “Ouvrir le patrimoine académique : les thèses ENC accessibles en ligne” (Postec & Mathis, n.d.). This made it possible to continue digitizing documents, as well as to create a website to publish these academic works and to make them available online.
Finding the technical solution
A first corpus of digitized documents had been created, and the next step was to make it accessible to the public. The idea of building a platform from scratch was quickly discarded because of the obvious difficulty of maintaining in-house tools over the long term. The library encourages the authors of the most recent theses to deposit them first on platforms such as the French open access platform HAL,[7] where about thirty theses are available to date; the library was even behind the creation of a new document type on this platform: “institution thesis”.[8] However, national platforms of this kind for depositing theses or research work could not be considered for the whole corpus: most of the documents were not native PDFs, and such platforms would not have solved the problem of the documents that had to be published with restricted access.
Creating a digital library dedicated to these theses appeared to be the best and most satisfying solution. ThENC@ is therefore powered by Omeka S,[9] an open-source content management system that manages digital collections stored in an SQL database and displays them on a website. The choice of this tool was guided by several factors: a long-standing preference of the School and its library for open-source software, the presence of a large community of users in France, and features of Omeka S that were of interest for the project, such as the ability to restrict access for non-logged-in users and to use the Apache Solr search engine.
Improve user experience by improving metadata
Creating records for all theses
The scattering of theses across the country meant one thing: no one had a complete list of all the theses written since 1849. Some theses were catalogued, others not. At the beginning of the project, the School librarians catalogued the theses held in the French National Archives since 1961, but these did not account for all the theses. Even then, different metadata standards were used in two French online catalogues for university libraries: UNIMARC records were created in the union catalogue Sudoc,[10] and XML-EAD was used in Calames,[11] the public catalogue for collections of manuscripts and archives at higher education and research institutions. The lack of a full list made searching more difficult for researchers looking for a specific thesis.
One of the roles of ThENC@ is therefore to provide the complete list of theses, including those which are not digitized. This list was compiled thanks to the thesis summaries mentioned earlier, which let us know that a thesis existed even though we may never be able to find it.
Although based on the detailed records mentioned above, ThENC@ records are simple and use Dublin Core properties—ideal for data interoperability, and for exports using the OAI-PMH protocol. This choice was influenced by the choice of software, as Omeka S and Dublin Core work together very well. By doing this, the library, in addition to providing access to documents that are not in its physical collections, also acts as a collection creator, aiming to federate the data into a single place.
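To make the interoperability argument concrete, here is a minimal Python sketch of what a simple Dublin Core record and an OAI-PMH harvesting request look like. The `oai_dc` and `dc` namespaces and the `ListRecords` verb come from the OAI-PMH and Dublin Core specifications; the function names, the field values and the `https://example.org/oai` address are placeholders invented for the illustration and do not describe the actual ThENC@ configuration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Namespaces defined by the Dublin Core and OAI-PMH specifications.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def build_oai_dc(fields: Dict[str, List[str]]) -> ET.Element:
    """Build an <oai_dc:dc> element from Dublin Core property names and values."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Return the URL of an OAI-PMH ListRecords request for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Placeholder values, not taken from a real ThENC@ record.
record = build_oai_dc({
    "title": ["Example thesis title"],
    "creator": ["Surname, Forename"],
    "date": ["1900"],
    "type": ["Thesis"],
})

print(ET.tostring(record, encoding="unicode"))
print(list_records_url("https://example.org/oai"))  # hypothetical endpoint
```

Because every property is a flat, repeatable element, records that originate from UNIMARC or XML-EAD descriptions can be reduced to the same small set of fields, which is what makes them easy to export and harvest.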
Enriching metadata
This project was, in many ways, an opportunity to enrich the existing metadata sets. One of the major operations was the creation of authority records for all archivists-paleographers who did not already have one in the IdRef platform,[12] in collaboration with the bibliographic agency for higher education (Abes).
In order to help end users, other metadata enrichments have taken place directly in the ThENC@ digital library. One of them was the addition of the location of the printed copy, when known, to help people to find it when the thesis is not available online or digitized. In order to improve the user experience, there is also an ongoing project to classify theses by thematic collections. The more thoughts there are on how to help researchers in their quest for these hard-to-find documents, the more ideas for refining the metadata are generated.
Creating bibliographies related to the theses
The truth must be accepted: some theses will never be online, either because the author has not given permission or because the whereabouts of the thesis are unknown. So, what can be done to help users access the information those documents may contain? One solution is to list all of an author's work relating to their thesis: articles, conference papers, and sometimes even another version of the thesis itself, published as a printed book.
Zotero bibliographies have been created and linked to the records. It is a long and still ongoing project, as well as a collective one, which involved students from the School. By studying in detail the background and the career of a class of archivists-paleographers from 1900, they were able to find three theses, each located in a different place (Ceccarelli, 2021). On the other hand, they found no trace of some authors from that year: the older the former students, the harder it is to find information about them and their work, particularly if they did not pursue a career in heritage, teaching or public institutions. Some theses are probably lost for good.
Conclusion: where are we now?
Today, ThENC@ has been online for a year and contains more than 500 theses, most of which are searchable PDFs. The project is still ongoing: not only can metadata and bibliographies always be improved, but new digitizations are arriving each year!
ThENC@ is a multi-layered project, which involves the entire library staff and a wide variety of skills. From searching for documents and authors across France to launching a digital library, through restoring old documents, digitizing, creating records, or dealing with open access issues, this project shows, more than ever, the role of libraries in accessing old and recent academic works online, including when these documents are outside their collections.
Bibliography
Ceccarelli, G. (2021, April 8). La promotion 1900 : un chantier du projet ThENC@. https://chartes.hypotheses.org/7415
Postec, A., & Mathis, R. (n.d.). ThENC@. Ouvrir le patrimoine académique : les thèses ENC accessibles en ligne. Retrieved February 1, 2023, from https://www.dim-map.fr/projets-soutenus/thenca/
Université Paris Sciences et Lettres. (n.d.). Thenc@. Thèses ENC accessibles en ligne. Retrieved February 1, 2023, from https://explore.psl.eu/fr/ressources-et-savoirs-psl/projets-psl-explore/thenc-theses-enc-accessibles-en-ligne
Vielliard, F., & Gaudemer, L. (2021, November 26). Du manuscrit à l’imprimé. La thèse de Léopold Delisle : aspects matériels et intellectuels [Conference session]. ThENC@. Ouvrir le patrimoine académique : les thèses d’École des chartes accessibles en ligne, Paris, France. https://hal-enc.archives-ouvertes.fr/hal-03482814v1
[1] https://bibnum.chartes.psl.eu/s/thenca/
[2] https://www.chartes.psl.eu/en/rubrique-ecole/institution-au-service-histoire-du-patrimoine-1821
[3] https://www.chartes.psl.eu/en/cursus/the-diploma-of-archiviste-paleographe
[4] https://bibnum.chartes.psl.eu/s/thenca/page/partenaires
[5] Institut de recherche et d'histoire des textes, a French research unit (CNRS) that mainly works on manuscripts.
[6] Domaine d'Intérêt Majeur "Matériaux anciens et patrimoniaux" (DIM MAP), a research network dedicated to the study of ancient and patrimonial materials.
[7] https://hal.science/
[8] https://hal-enc.archives-ouvertes.fr/search/index/?q=%2A&docType_s=ETABTHESE
[9] https://omeka.org/about/project/
[10] http://www.sudoc.abes.fr
[11] http://www.calames.abes.fr/
[12] https://www.idref.fr/