


The Speech Corpus of the Spanish/Portuguese Border (FRONTESPO-COR) is a work that is constantly being updated. Its core consists of almost 300 hours of recordings, most of which are on video, the result of interviews with 287 informants from different age groups carried out in 64 towns on both sides of the border between 2015 and 2017. Of these recordings, a small sample is currently available to the public; this amount will increase progressively over the coming months. If any researcher, teacher, community member, etc. is interested in the materials that have not been uploaded yet, please contact us at


Furthermore, we are working on digitising and editing other primary materials from the border region. We would greatly appreciate receiving any unpublished recordings (linguistic surveys, ethnographic research, family recordings, etc.) so that they can be published openly under our usage license; please write to us at Thank you for your collaboration.


How to cite

Álvarez Pérez, Xosé Afonso (dir.) (2018-): Corpus oral de la frontera hispano-portuguesa, Alcalá de Henares: FRONTESPO. <> [Consulted on: <date>]. ISSN 2605-0471.


Organisation of the Corpus

To facilitate consultation and management of the corpus, each interview has been segmented into several recordings centred on different topics (normally between six and eight sessions, each lasting between 10 and 20 minutes on average), which are presented independently in the FRONTESPO-COR.


The recording entry is the main means (but not the only means) of accessing all of the corpus information. The main screen of the FRONTESPO-COR displays an abbreviated version that links to three elements:

  • Geographic location entry.
  • Informant entry.
  • Multimedia files.

By clicking on the recording title, you can access the full entry, which offers several additional elements:

  • Description of the content (in Spanish, Portuguese and English).
  • Topical classification.
  • Multimedia field, where you can watch the recordings directly on our page, or you can open the repositories in SoundCloud, Vimeo and YouTube.
  • Transcription of the content, both aligned with the audio and video (ELAN) and in text (PDF and TXT).


The geographic location entry can be accessed from the recording entry or from the map located to the left of the main window. This entry contains:

a) a map with the town's location

b) the delimitation of the administrative territory to which it belongs (local government, province/district)

c) a brief description of the main geographic and historic characteristics of the town, with links to other sources of information, when appropriate

d) a list of the informants interviewed in each location and the recordings in which each participates, with the option of directly consulting the recordings and transcriptions.


The informant entry provides the individuals’ names (except when they request to remain anonymous), their sex and age group, as well as a brief description of their sociolinguistic profile: place and date of birth, profession, education, travel outside of the town, as well as other circumstances that may be relevant for analysing the interviews. As with the above cases, the recordings can be accessed directly from this entry.

Consulting the Materials

The main screen of the FRONTESPO-COR is divided into three clearly defined sections:

A map of survey locations, from which you can access the geographic location entry for each location in the network and, from there, find the interviews available for each town.

A list of recordings, with 10 elements per page, which can be browsed sequentially by using the arrows and numbers at the bottom of the page.

A search box, which allows you to narrow down the elements displayed on the map or list according to several search criteria: town name, local government/province to which it belongs, sociolinguistic profile of the person interviewed, title of the recording, topic of the conversation and whether there is a transcription of the interview.

To return to all available elements, click on Clear.
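The way the search box and the paginated list work together can be pictured with a minimal Python sketch. It is not the code behind the FRONTESPO-COR site: the `Recording` fields, the function names and the sample entries ("Town A", "Town B") are hypothetical and only mirror some of the search criteria listed above. The one figure taken from this page is the 10 elements per page.

```python
from dataclasses import dataclass

PAGE_SIZE = 10  # the list shows 10 elements per page (stated on this page)


@dataclass
class Recording:
    # Hypothetical fields, loosely modelled on the search criteria above.
    title: str
    town: str
    topic: str
    has_transcription: bool


def search(recordings, town=None, topic=None, transcribed=None):
    """Keep the recordings that match every criterion that is not None.

    Calling search() with no criteria returns everything, which is the
    effect of clicking on Clear.
    """
    result = []
    for rec in recordings:
        if town is not None and rec.town.lower() != town.lower():
            continue
        if topic is not None and rec.topic.lower() != topic.lower():
            continue
        if transcribed is not None and rec.has_transcription != transcribed:
            continue
        result.append(rec)
    return result


def page_count(items, size=PAGE_SIZE):
    """Number of pages needed to show all items (ceiling division)."""
    return (len(items) + size - 1) // size


def page(items, number, size=PAGE_SIZE):
    """Return the items shown on the 1-based page `number`."""
    if number < 1:
        raise ValueError("page numbers start at 1")
    start = (number - 1) * size
    return items[start:start + size]


# Invented placeholder data: 25 recordings spread over two made-up towns.
sample = [
    Recording(
        title=f"Recording {i}",
        town="Town A" if i % 2 else "Town B",
        topic="agriculture" if i % 3 == 0 else "border life",
        has_transcription=(i % 5 == 0),
    )
    for i in range(1, 26)
]

transcribed_only = search(sample, transcribed=True)
last_page = page(sample, page_count(sample))
```

Filtering first and paginating afterwards keeps the page numbers consistent with whatever subset the search criteria have selected.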



At each location in the network, several people were interviewed, always ensuring that both sexes and the different age groups were represented (a: up to 50 years old; b: between 50 and 75 years old; c: over 75 years old), with the aim of avoiding the “archaeological” focus of traditional dialectology.

A prototypical interview consists of three parts, which differ both in the topics broached and in how directed the interview is (in general, we moved from more to less directed across the sections presented below):

a) Semantic fields of traditional life (agriculture, livestock, etc.). There is a common questionnaire for the entire network, in order to obtain a homogeneous corpus that allows for geolinguistic and dialectometric research, research on lexical obsolescence, etc.

b) Sociolinguistic aspects: judgements about their own variety and that of neighbouring towns (on both sides of the Line), linguistic attitudes and uses (especially in the case of minority varieties and enclaves), code switching or mixing in contact with nationals from other countries, etc.

c) Aspects related to border life and modern life in a rural area: rural depopulation, contraband and trade, good or bad relations with neighbours on the other side, etc.




Multimedia Files

The majority of the interviews in the FRONTESPO-COR have been recorded redundantly, with a digital recorder (audio only) and/or with a video camera (sound and image). This policy minimises the risk of losing data through equipment malfunction and provides researchers and the interested public with three different resources depending on their needs. The characteristics of the recordings (with very rare exceptions) are as follows:

a) Audio from a digital recorder connected to an external microphone. Sampling frequency: 48 kHz, depth of 24 bits.

b) Audio and video. High-definition image recorded in AVCHD 2.0 (specifications: 1920x1080, 50 frames per second; uncompressed PCM audio). After editing, the files are uploaded to the repositories in MP4 format with compressed audio.

c) Audio extracted from the video camera recording: uncompressed PCM, with 48 kHz sampling frequency and depth of 16 bits.
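To give a rough idea of what these figures mean in terms of file size, the following Python sketch computes the data rate of uncompressed PCM audio from the sampling frequency and bit depth given above. The number of channels is not stated on this page, so it is left as a parameter and the mono value used in the example is an assumption; the 15-minute duration is likewise just an illustrative value within the 10 to 20 minute range of a typical session.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed PCM audio, in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


def pcm_megabytes(sample_rate_hz, bit_depth, channels, minutes):
    """Approximate size in megabytes (10**6 bytes), ignoring file headers."""
    seconds = minutes * 60
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, channels) * seconds / 1_000_000


# Assumption: one channel (mono); the page does not specify the channel count.
recorder_rate = pcm_bytes_per_second(48_000, 24, channels=1)  # source a)
camera_rate = pcm_bytes_per_second(48_000, 16, channels=1)    # source c)

# Size of an illustrative 15-minute session from each audio source.
recorder_size_mb = pcm_megabytes(48_000, 24, channels=1, minutes=15)
camera_size_mb = pcm_megabytes(48_000, 16, channels=1, minutes=15)
```

Under these assumptions the 24-bit recorder audio takes half as much space again as the 16-bit camera audio for the same duration; a stereo recording would double both figures.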

The video recordings are stored on our Vimeo and YouTube channels, while the two audio sources are available on SoundCloud. We also have versions in WebM and Ogg formats, which will be released soon on Wikimedia.

The Library of the University of Alcalá hosts the primary data generated in the framework of our project in the repository e-cienciaDatos. The direct link to the dataverse is

The files are available in the repositories indicated above under the CC-BY-SA 4.0 license, and can therefore be freely downloaded and reused.

We would appreciate any type of collaboration in the long-term preservation of these materials; do not hesitate to contact us at


The transcriptions of the interviews are orthographic, according to the conventions of the languages of communication of each interview, with minimal concessions to dialect phenomena (eliminating or adding sounds, altering the order of elements, etc.); we will provide the criteria adopted shortly.

The transcriptions are offered in two different formats:

a) Transcription aligned with the audio and video, in an ELAN file. Researchers can easily use this format to add their own marks and morphological, syntactic or discourse analysis tags, etc. In this case, we would greatly appreciate it if you could send us the files created, so that we can make them available to all on our website.

b) Text in traditional transcription format, both in an editable version (TXT) and non-editable version (PDF).
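For researchers who want to process the aligned transcriptions programmatically, the Python sketch below reads time-aligned annotations from an ELAN file using only the standard library. ELAN files (.eaf) are XML documents in which `TIME_SLOT` elements hold times in milliseconds and `ALIGNABLE_ANNOTATION` elements refer to them. The embedded sample is an invented, heavily simplified document: the tier names ("interviewer", "informant") and the annotation texts are placeholders and say nothing about how the FRONTESPO-COR files are actually organised.

```python
import xml.etree.ElementTree as ET

# Invented, simplified EAF content; real files carry more metadata.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="interviewer">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="informant">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(eaf_text):
    """Return (tier_id, start_ms, end_ms, text) for each aligned annotation."""
    root = ET.fromstring(eaf_text)

    # Map each time slot ID to its value in milliseconds.
    # Slots without a TIME_VALUE (unaligned) are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


annotations = read_annotations(SAMPLE_EAF)
```

To work with a downloaded transcription, the file contents can be read with `open(path, encoding="utf-8").read()` and passed to `read_annotations`; the resulting tuples can then be sorted by start time or exported to a spreadsheet for tagging.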


PLEASE NOTE: The headings of the .txt and .pdf documents indicate the version number of the transcription and whether it has been reviewed. Those that have not been reviewed should be considered provisional and used with special care. As it is possible to compare them directly with the audio and video files, we decided we would rather offer the working versions as soon as they were available, so as to facilitate access to the data for the communities and people involved.