{readelan} is an R package with the explicit goal of making it easy to read ELAN files into R for further analysis.
The {readelan} package is fast yet lightweight, with only one dependency outside of the core packages, namely {xml2}. The {xml2} package is central to accessing the inner structure of ELAN files, which are all fundamentally XML files.
Why was {readelan} needed?
In my experience, many researchers in linguistics and related fields (e.g., psychology, cognitive science, anthropology) use the ELAN annotation software regularly, and simultaneously use the R programming language for analysis. Despite this, there was no dedicated package on CRAN for reading ELAN files.
At least a couple of other packages on CRAN, e.g., {act} and {phonfieldwork}, have functions to read ELAN annotation files (.eaf) into R. However, those functions do not cover all the different file types associated with ELAN, and they are part of larger packages with a much broader scope. Benchmarking against these other packages indicates that {readelan} is a fast option:
Unit: milliseconds
expr                                      min         lq       mean     median         uq        max  neval  cld
readelan::read_eaf(eaf_file)         3.750718   3.886996   5.138851   4.924746   5.826621    7.55366     10  a
phonfieldwork::eaf_to_df(eaf_file)  13.535165  14.084316  88.521082  14.151778  14.800122  754.83798     10  a
act::import_eaf(eaf_file)           61.852629  67.594326  79.324778  68.738691  84.518526  153.25400     10  a
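A comparison like the one above could be reproduced along the following lines. This is only a sketch: the output format suggests the {microbenchmark} package was used, but that is an assumption, and the path in eaf_file is a hypothetical placeholder. The three calls being timed are copied from the benchmark output.

```r
# Hypothetical path: replace with a real .eaf file on your machine.
eaf_file <- "path/to/file.eaf"

# All four packages must be installed for the comparison to run.
pkgs <- c("microbenchmark", "readelan", "phonfieldwork", "act")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs && file.exists(eaf_file)) {
  # Time each reader 10 times on the same file.
  timings <- microbenchmark::microbenchmark(
    readelan::read_eaf(eaf_file),
    phonfieldwork::eaf_to_df(eaf_file),
    act::import_eaf(eaf_file),
    times = 10
  )
  print(timings)
}
```

Timings depend on the machine and on the size of the file, so the absolute numbers will differ from those shown above.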
Can I read multiple files?
Yes! All read_ functions in {readelan} accept either a single file or a vector of files as input (see also Reading multiple files).
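As a rough sketch of reading several files at once, the following collects every .eaf file in a folder and passes the resulting vector of paths to read_eaf(). The folder name is a hypothetical placeholder.

```r
# Hypothetical folder: replace with a directory that contains .eaf files.
eaf_dir <- "path/to/eaf_files"

# Collect the full paths of all .eaf files in that folder.
eaf_files <- list.files(eaf_dir, pattern = "\\.eaf$", full.names = TRUE)

if (length(eaf_files) > 0) {
  # A vector of paths is passed in one call; the result is one data frame.
  annotations <- readelan::read_eaf(eaf_files)
  str(annotations)
}
```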
Can I read files online?
Yes! The functions rely on the {xml2} package, which can read files directly from an online location (as long as the file is structured as XML). You can therefore point the read_ functions in {readelan} directly at the URL of the file, provided you have an internet connection.
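A minimal sketch of reading from a URL is shown here. The address is a hypothetical placeholder, and the tryCatch() wrapper is simply one way to fail gracefully when the file cannot be reached or parsed.

```r
# Hypothetical URL: replace with the address of a real .eaf file.
eaf_url <- "https://example.org/corpus/session01.eaf"

# Try to read the remote file; return NULL instead of stopping on failure.
annotations <- tryCatch(
  readelan::read_eaf(eaf_url),
  error = function(e) {
    message("Could not read ", eaf_url, ": ", conditionMessage(e))
    NULL
  }
)
```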
What is the function output?
The read_ functions of {readelan} output base R data frames. If you read multiple files, the individual data frames (one per input file) are combined into a single data frame in the output.
If you prefer to work with {tidyverse}-style tibbles, you will need to convert the output:
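One possible way to do the conversion, assuming the {tibble} package is installed, is to wrap the call in tibble::as_tibble(). The helper name read_eaf_tibble() is hypothetical and not part of {readelan}.

```r
# Hypothetical convenience wrapper: read an ELAN file and return a tibble.
# Extra arguments (e.g., tiers) are passed through to read_eaf().
read_eaf_tibble <- function(file, ...) {
  tibble::as_tibble(readelan::read_eaf(file, ...))
}

# Usage with a hypothetical path:
# annotations <- read_eaf_tibble("path/to/file.eaf")
```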
Can I read only specific tiers?
Yes! In read_eaf(), you can use the tiers argument (or xpath, for more advanced selections) to read only certain tiers, selected by tier name or tier type. This speeds up reading, since only the selected tiers need to be processed.
For instance, if I know that I only want to read the tiers “Lexem_Gebärde_l_A” and “Lexem_Gebärde_r_A” in a file from the DGS-Korpus, I can specify this and read the file noticeably faster (about 419 ms instead of 727 ms in the single-run benchmark below).
Unit: milliseconds
expr                                                                                                 min        lq      mean    median        uq       max  neval
readelan::read_eaf(dgs_file)                                                                    727.2490  727.2490  727.2490  727.2490  727.2490  727.2490      1
readelan::read_eaf(dgs_file, tiers = list(tier = c("Lexem_Gebärde_l_A", "Lexem_Gebärde_r_A")))  419.4928  419.4928  419.4928  419.4928  419.4928  419.4928      1
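Outside of a benchmark, the same tier selection might look like the sketch below. The path in dgs_file is a hypothetical placeholder; the tiers argument is written exactly as in the benchmark call above.

```r
# Hypothetical path: replace with a real .eaf file from the DGS-Korpus.
dgs_file <- "path/to/dgs_file.eaf"

# Names of the only tiers we want to read.
tiers_wanted <- c("Lexem_Gebärde_l_A", "Lexem_Gebärde_r_A")

if (file.exists(dgs_file)) {
  # Only the two named tiers are read; all other tiers are skipped.
  lexemes <- readelan::read_eaf(dgs_file, tiers = list(tier = tiers_wanted))
  str(lexemes)
}
```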
Can I write files back into ELAN?
No. The goal of {readelan} is to allow for the reading of ELAN data into R, not the other way around. ELAN has great options for importing data from character-separated values (e.g., .csv) files and other file formats. However, if you do require the possibility to write data back into ELAN files, you should look at the {phonfieldwork} package.
Should I cite {readelan}?
In my opinion, software should be cited for two reasons: to credit the work (especially if it is open source and/or volunteer-based) and for reproducibility. You can cite {readelan} by using the citation("readelan") function in R (see also Citation).
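As a small sketch, the citation can be printed and, if you use a BibTeX-based reference manager, converted with toBibtex() from the {utils} package that ships with R. The check with requireNamespace() is only there because citation() fails for packages that are not installed.

```r
# Look up the citation only if {readelan} is installed; otherwise NULL.
ref <- if (requireNamespace("readelan", quietly = TRUE)) {
  citation("readelan")
} else {
  NULL
}

if (!is.null(ref)) {
  print(ref)            # human-readable citation
  print(toBibtex(ref))  # BibTeX entry for reference managers
}
```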
I found a bug
Oh no! Please report it by filing an issue on GitHub. You may also try to contact the maintainer via email to report a bug or ask for assistance, but I may not be able to respond to such requests quickly.