Files do not have to be stored locally, but can be read directly from an online location, assuming you have an internet connection. Thus, we can read files directly from the DGS-Korpus repository like so:
filename a time_1 time_2
1 1413451-11105600-11163240.eaf a2834768 ts1 ts12
2 1413451-11105600-11163240.eaf a2834772 ts12 ts19
3 1413451-11105600-11163240.eaf a2834780 ts19 ts40
4 1413451-11105600-11163240.eaf a2834793 ts40 ts63
5 1413451-11105600-11163240.eaf a2834798 ts63 ts68
6 1413451-11105600-11163240.eaf a2834800 ts68 ts76
annotation
1 Wie mein Leben aussieht?
2 Na ja, ich bin als Gehörloser aufgewachsen.
3 Ich habe eher das Gefühl, wenn ich mir vorstelle, dass ich allein bin, dann wäre ich einsam.
4 Da treffe ich lieber viele Gehörlose und mache mit denen was, dann ist mein Leben schön.
5 Aber das ist ja klar.
6 Dann ist Alleinsein nicht schlimm.
tier tier_type participant annotator
1 Deutsche_Übersetzung_A L_text__finer_granularity ber-36 <NA>
2 Deutsche_Übersetzung_A L_text__finer_granularity ber-36 <NA>
3 Deutsche_Übersetzung_A L_text__finer_granularity ber-36 <NA>
4 Deutsche_Übersetzung_A L_text__finer_granularity ber-36 <NA>
5 Deutsche_Übersetzung_A L_text__finer_granularity ber-36 <NA>
6 Deutsche_Übersetzung_A L_text__finer_granularity ber-36 <NA>
parent_ref a_ref language_ref cv_id cve_ref start end duration time_unit
1 <NA> <NA> <NA> <NA> <NA> 240 2160 1920 milliseconds
2 <NA> <NA> <NA> <NA> <NA> 2160 4120 1960 milliseconds
3 <NA> <NA> <NA> <NA> <NA> 4120 7860 3740 milliseconds
4 <NA> <NA> <NA> <NA> <NA> 7860 11180 3320 milliseconds
5 <NA> <NA> <NA> <NA> <NA> 11180 12340 1160 milliseconds
6 <NA> <NA> <NA> <NA> <NA> 12340 13500 1160 milliseconds
Additional arguments
Specifying tiers
There are two arguments with which you can specify which tiers or tier types to read from your EAF file: tiers (simple to use but constrained) and xpath (more complicated to use but more customizable).
With tiers, you pass a named list with either tier or tier_type as names, and single character strings or character vectors as their values, to specify the tiers to read. For instance, if I know that there are only two tiers that I want to read in a file, I can name just those two in the list.
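As a minimal sketch of the tiers argument, the call below reads two named tiers from the example file bundled with readelan. The tier names "words" and "syntax" are assumptions inferred from the output shown elsewhere in this document; substitute the tier names found in your own file.

```r
library(readelan)

# Example file shipped with the package
eaf_file <- system.file("extdata", "example.eaf", package = "readelan")

# Read only the two tiers of interest.
# Assumption: the example file contains tiers named "words" and "syntax".
annotations <-
  read_eaf(file = eaf_file,
           tiers = list(tier = c("words", "syntax")))

# Check which tiers ended up in the output
unique(annotations$tier)
```

To select by tier type instead, the same list can use tier_type as its name, as in list(tier_type = c("syntax")).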
With the xpath argument, you can directly target tiers using XPath syntax. This is more complicated for the basic user, but allows for some additional functionality and customization for the advanced user, such as targeting tiers based on substrings, for example tiers starting with “wo” (which targets “words”).
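The following is a sketch of how such a substring match might look. It assumes that xpath accepts an XPath expression selecting TIER elements, and it relies on the EAF format storing each tier's name in the TIER_ID attribute. The exact form of expression that read_eaf() expects is an assumption here, so check the function's help page before relying on it.

```r
library(readelan)

# Example file shipped with the package
eaf_file <- system.file("extdata", "example.eaf", package = "readelan")

# Assumed expression form: select every TIER element whose TIER_ID
# attribute starts with "wo" (e.g., a tier named "words").
annotations <-
  read_eaf(file = eaf_file,
           xpath = "//TIER[starts-with(@TIER_ID, 'wo')]")

# Check which tiers were matched
unique(annotations$tier)
```

Other XPath string functions, such as contains(), could be used in the same position to match a substring anywhere in the tier name.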
The argument fill_times is set to TRUE by default, which means that the read_eaf() function attempts to fill the empty time slots (i.e., start and end times of annotations) for child annotations based on their parent annotations. This is usually the output most users want, but it may result in unexpected behavior in some cases. For instance, while it works normally in cases like the ones above, where there are parent annotations to fill times from, it will not work if the file for some reason lacks time slots altogether, or if only child tiers are targeted. In such cases, setting this argument to FALSE should solve the issue of reading the file, but will leave time slots empty, as in the original EAF file:
library(readelan)

eaf_file <- system.file("extdata", "example.eaf", package = "readelan")

# This will result in an error:
# annotations <-
#   read_eaf(file = eaf_file,
#            tiers = list(tier_type = c("syntax")))

# This should work
annotations <-
  read_eaf(file = eaf_file,
           tiers = list(tier_type = c("syntax")),
           fill_times = FALSE)

head(annotations)
filename a time_1 time_2 annotation tier tier_type participant
1 example.eaf a7 <NA> <NA> subject syntax syntax s001
2 example.eaf a8 <NA> <NA> predicate syntax syntax s001
3 example.eaf a9 <NA> <NA> object syntax syntax s001
annotator parent_ref a_ref language_ref cv_id
1 GHI words a1 eng syntax
2 GHI words a2 eng syntax
3 GHI words a3 eng syntax
cve_ref start end duration time_unit
1 cveid_4990ed36-c1d1-40c6-800e-2bd264d9a89b NA NA NA milliseconds
2 cveid_de8c8f23-6b49-42da-9bd4-5b8b59d8b1da NA NA NA milliseconds
3 cveid_f37323ed-9b7e-48d9-bbd6-b9105186ed02 NA NA NA milliseconds
Writing full paths
The full_path argument simply determines whether the full file path input should be written to the output data frame as the filename (if TRUE; e.g., “/path/to/elan_file.eaf”), or whether it should be shortened to the base name only (if FALSE, the default; e.g., “elan_file.eaf”).
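To illustrate the difference, the sketch below reads the bundled example file twice and compares the filename column. The object names short_names and long_names are only illustrative, and the sketch assumes that the example file reads without error under the default settings.

```r
library(readelan)

# Example file shipped with the package
eaf_file <- system.file("extdata", "example.eaf", package = "readelan")

# Default (full_path = FALSE): only the base name is stored
short_names <- read_eaf(file = eaf_file)

# full_path = TRUE: the path given as input is stored
long_names <- read_eaf(file = eaf_file, full_path = TRUE)

unique(short_names$filename)
unique(long_names$filename)
```

Keeping the full path can help when files in different folders share the same base name.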
Progress bar
If progress is set to TRUE, a progress bar will be printed to the console as files are read. This is mostly useful when reading multiple files that take some time to complete (see Multiple files).
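A minimal sketch of switching the progress bar on is given below. It reads only the single bundled example file, so the bar will finish almost immediately; the argument is more informative with a larger set of files.

```r
library(readelan)

# Example file shipped with the package
eaf_file <- system.file("extdata", "example.eaf", package = "readelan")

# Print a progress bar to the console while reading
annotations <- read_eaf(file = eaf_file, progress = TRUE)
```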