Extracting CVs

Extracting controlled vocabularies (CV)

ELAN lets you add controlled vocabularies (CVs) to annotation files (.eaf) and template files (.etf) in order to constrain the values that can be entered in annotations.

Regular CVs are internal: they are stored inside the EAF/ETF file itself. External CVs (ECVs) are stored in a separate online .ecv file, which the EAF/ETF file links to through a URL.

The read_cv() function reads internal CVs and, if the ecv argument is set to TRUE (the default), external CVs as well. Reading external CVs only works if the URLs are valid and accessible, meaning that you need internet access and that the page must not be restricted. If that is not the case, you may need to set ecv to FALSE, which reads only the internal CVs:

library(readelan)

eaf_file <- system.file("extdata", 
                        "example.eaf", 
                        package = "readelan")

read_cv(file = eaf_file, 
        ecv = FALSE)
     filename  url cv_id                                     cve_id lang_ref
1 example.eaf <NA>   pos cveid_89b2ac38-7313-4737-aa5f-19e1231ccb18      eng
2 example.eaf <NA>   pos cveid_89b2ac38-7313-4737-aa5f-19e1231ccb18      fin
3 example.eaf <NA>   pos cveid_d5558ab7-11c3-47d5-9a0f-403724b0e0b7      eng
4 example.eaf <NA>   pos cveid_d5558ab7-11c3-47d5-9a0f-403724b0e0b7      fin
5 example.eaf <NA>   pos cveid_f2eec815-c427-4b61-84eb-cd1e1601c9b4      eng
6 example.eaf <NA>   pos cveid_f2eec815-c427-4b61-84eb-cd1e1601c9b4      fin
       language        value
1 English (eng)      pronoun
2 Finnish (fin)    pronomini
3 English (eng)         verb
4 Finnish (fin)        verbi
5 English (eng)         noun
6 Finnish (fin) substantiivi
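
If you do not know in advance whether the linked URLs are reachable, one option is to try reading with external CVs first and retry with ecv = FALSE on failure. The following is a minimal sketch, not part of the package: it assumes that read_cv() signals an R error when an external CV cannot be fetched, which is not confirmed here. If the function only warns or silently skips the external CV, the fallback branch will simply never run.

```r
library(readelan)

eaf_file <- system.file("extdata",
                        "example.eaf",
                        package = "readelan")

# Try to read internal and external CVs. If this raises an error
# (ASSUMPTION: an unreachable ECV URL is reported as an error),
# fall back to reading the internal CVs only.
cv <- tryCatch(
  read_cv(file = eaf_file, ecv = TRUE),
  error = function(e) {
    message("Reading external CVs failed, using internal CVs only: ",
            conditionMessage(e))
    read_cv(file = eaf_file, ecv = FALSE)
  }
)

head(cv)
```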

If you instead want to read an .ecv file directly, pass its path as the file argument of read_cv():

library(readelan)

ecv_file <- system.file("extdata", 
                        "syntax.ecv", 
                        package = "readelan")

ecv <- read_cv(file = ecv_file)

ecv
    filename  url  cv_id                                     cve_id lang_ref
1 syntax.ecv <NA> syntax cveid_4990ed36-c1d1-40c6-800e-2bd264d9a89b      eng
2 syntax.ecv <NA> syntax cveid_4990ed36-c1d1-40c6-800e-2bd264d9a89b      fin
3 syntax.ecv <NA> syntax cveid_de8c8f23-6b49-42da-9bd4-5b8b59d8b1da      eng
4 syntax.ecv <NA> syntax cveid_de8c8f23-6b49-42da-9bd4-5b8b59d8b1da      fin
5 syntax.ecv <NA> syntax cveid_f37323ed-9b7e-48d9-bbd6-b9105186ed02      eng
6 syntax.ecv <NA> syntax cveid_f37323ed-9b7e-48d9-bbd6-b9105186ed02      fin
       language       value
1 English (eng)     subject
2 Finnish (fin)    subjekti
3 English (eng)   predicate
4 Finnish (fin) predikaatti
5 English (eng)      object
6 Finnish (fin)     objekti
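
Because the result is an ordinary data frame with one row per entry and language, it can be filtered and reshaped with base R. The sketch below does not call read_cv(); it rebuilds the printed output by hand as a stand-in so that it runs without the package, and then shows two common steps: extracting the values for one language, and putting the languages side by side.

```r
# Stand-in for the data frame printed above, rebuilt by hand
# from the columns and values shown in the output
ecv <- data.frame(
  filename = "syntax.ecv",
  url      = NA_character_,
  cv_id    = "syntax",
  cve_id   = rep(c("cveid_4990ed36-c1d1-40c6-800e-2bd264d9a89b",
                   "cveid_de8c8f23-6b49-42da-9bd4-5b8b59d8b1da",
                   "cveid_f37323ed-9b7e-48d9-bbd6-b9105186ed02"),
                 each = 2),
  lang_ref = rep(c("eng", "fin"), times = 3),
  language = rep(c("English (eng)", "Finnish (fin)"), times = 3),
  value    = c("subject", "subjekti",
               "predicate", "predikaatti",
               "object", "objekti"),
  stringsAsFactors = FALSE
)

# Values of the CV in a single language
eng_values <- ecv$value[ecv$lang_ref == "eng"]

# One row per CV entry, one value column per language
wide <- reshape(ecv[, c("cv_id", "cve_id", "lang_ref", "value")],
                idvar     = c("cv_id", "cve_id"),
                timevar   = "lang_ref",
                direction = "wide")

wide[, c("value.eng", "value.fin")]
```

The wide format makes it easy to look up the translation of an entry, while the long format returned by read_cv() is more convenient for joining with annotation data by cve_id.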

Additional arguments

Writing full paths

The full_path argument determines what is written to the filename column of the output data frame. If TRUE, the file path is kept as it was given in the input (e.g., “/path/to/elan_file.eaf”). If FALSE (the default), it is shortened to the base name only (e.g., “elan_file.eaf”).
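
The difference can be checked by reading the same file twice and comparing the filename column. This is a small sketch using the example file bundled with the package; the exact path printed for full_path = TRUE depends on where the package is installed, so it is not shown here.

```r
library(readelan)

eaf_file <- system.file("extdata",
                        "example.eaf",
                        package = "readelan")

# Default (full_path = FALSE): only the base name is stored
short <- read_cv(file = eaf_file,
                 ecv = FALSE)
unique(short$filename)

# full_path = TRUE: the path given as input is stored
long <- read_cv(file = eaf_file,
                ecv = FALSE,
                full_path = TRUE)
unique(long$filename)
```

Keeping the full path can help when files in different directories share the same base name.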

Progress bar

If progress is set to TRUE, a progress bar will be printed to the console as files are read. This is mostly useful when reading multiple files that take some time to complete (see Multiple files).
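
As a rough illustration, the call below reads the two example files shipped with the package in one go with the progress bar switched on. It assumes that file accepts a character vector of paths, as the reference to multiple files above suggests, and that .eaf and .ecv files can be mixed in a single call. With only two small files the bar will finish almost immediately.

```r
library(readelan)

# Both example files bundled with the package
files <- c(system.file("extdata", "example.eaf", package = "readelan"),
           system.file("extdata", "syntax.ecv", package = "readelan"))

# ASSUMPTION: `file` accepts several paths at once
cvs <- read_cv(file = files,
               ecv = FALSE,
               progress = TRUE)

# Number of rows contributed by each file
table(cvs$filename)
```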