Splitting a GEDCOM file into two, and merging two GEDCOM files into one, are among the most difficult tasks to carry out. Maybe you want to give a specific branch of your tree to someone else, or combine two files describing your maternal and paternal ancestors. The difficulty comes from the cross references that occur within a GEDCOM file, and from having to determine whether two records should be merged into one.
We illustrate with the following sample file:
library(tidyged)
library(tidyged.utils)
summary(sample555)
#> GEDCOM file summary:
#>
#> Submitter: Reldon Poulson
#> Description:
#> Language: English
#> Character set: UTF-8
#>
#> Copyright:
#>
#> Source system: GS
#> Source system version: 5.5.5
#> Product name: GEDCOM Specification
#> Product source: gedcom.org
describe_records(sample555, sample555$record, short_desc = TRUE)
#> [1] "Submitter @U1@, Reldon Poulson"
#> [2] "Individual @I1@, Robert Eugene Williams"
#> [3] "Individual @I2@, Mary Ann Wilson"
#> [4] "Individual @I3@, Joe Williams"
#> [5] "Family @F1@, headed by Robert Eugene Williams and Mary Ann Wilson"
#> [6] "Family @F2@, headed by Robert Eugene Williams"
#> [7] "Source @S1@, titled Madison County Birth, Death, and Marriage Records"
#> [8] "Repository @R1@, Family History Library"
Splitting a file is much easier than merging two files. In order to split a file we use the split_gedcom() function and provide the xrefs of the records we would like to be contained in the new file.
In this example, we’re going to take the family @F2@ and the two individuals within it:
new <- split_gedcom(sample555, c("@F2@", "@I1@", "@I3@"))
#> Some dead record references have been removed: @S1@, @F1@
summary(new)
#> GEDCOM file summary:
#>
#> Submitter: Reldon Poulson
#> Description:
#> Language: English
#> Character set: UTF-8
#>
#> Copyright:
#>
#> Source system: GS
#> Source system version: 5.5.5
#> Product name: GEDCOM Specification
#> Product source: gedcom.org
The new file has exactly the same header and submitter information as the original. Let’s take a look at the records it contains:
describe_records(new, new$record, short_desc = TRUE)
#> [1] "Submitter @U1@, Reldon Poulson"
#> [2] "Individual @I1@, Robert Eugene Williams"
#> [3] "Individual @I3@, Joe Williams"
#> [4] "Family @F2@, headed by Robert Eugene Williams"
By default, this function removes references to records that do not exist in the new file. It reports which references were removed, in case you want to go back and include those records.
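To make the idea of a dead reference concrete, here is a minimal base R sketch on a toy table with the same `level`, `record`, `tag` and `value` columns shown on this page. The helpers `find_dead_refs()` and `drop_dead_refs()` are hypothetical names invented for this illustration; they are not part of tidyged.utils and do not claim to reproduce what split_gedcom() does internally.

```r
# Toy GEDCOM-like table (not a valid GEDCOM file, just enough to illustrate).
ged <- data.frame(
  level  = c(0, 1, 1, 0, 1),
  record = c("@I1@", "@I1@", "@I1@", "@F2@", "@F2@"),
  tag    = c("INDI", "FAMS", "FAMS", "FAM", "HUSB"),
  value  = c("", "@F1@", "@F2@", "", "@I1@"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a value is a dead reference if it looks like an xref
# but no record with that xref exists in the table.
find_dead_refs <- function(ged) {
  is_xref <- grepl("^@[A-Za-z0-9]+@$", ged$value)
  unique(ged$value[is_xref & !ged$value %in% ged$record])
}

# Hypothetical helper: drop the lines that hold dead references.
# Simplification: lines subordinate to a dropped line are not handled.
drop_dead_refs <- function(ged) {
  dead <- find_dead_refs(ged)
  ged[!ged$value %in% dead, , drop = FALSE]
}

dead    <- find_dead_refs(ged)   # the family @F1@ is referenced but absent
cleaned <- drop_dead_refs(ged)
```

In this toy table, the individual still points at family @F1@, which was left behind, so that one line is reported and dropped.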
Merging two files is a much more involved affair. Cross reference identifiers must be made unique across both files, potential duplicate records must be identified, and then merged. This is all done automatically using the merge_gedcoms() function.
Unfortunately it cannot be demonstrated here since it seeks user input when potentially duplicate records are identified.
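Although the interactive merge cannot be shown, the first step mentioned above, making cross reference identifiers unique across both files, can be sketched in base R. The helper `renumber_xrefs()` is a hypothetical name used only for this illustration, and the sketch assumes xrefs of the form letters followed by digits (such as @I1@); it is not the package's implementation.

```r
# Two toy tables whose xrefs clash (@I1@ and @F1@ appear in both).
a <- data.frame(
  level  = c(0, 0, 0),
  record = c("@I1@", "@I2@", "@F1@"),
  tag    = c("INDI", "INDI", "FAM"),
  value  = c("", "", ""),
  stringsAsFactors = FALSE
)
b <- data.frame(
  level  = c(0, 1, 0, 1),
  record = c("@I1@", "@I1@", "@F1@", "@F1@"),
  tag    = c("INDI", "FAMS", "FAM", "HUSB"),
  value  = c("", "@F1@", "", "@I1@"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give every record in `b` a new xref that is unused
# in `a`, and update the references inside `b` to match.
renumber_xrefs <- function(a, b) {
  old <- unique(b$record)
  # Record type prefix, e.g. "I" from "@I12@" (assumed xref format).
  prefix <- sub("^@([A-Za-z]+)[0-9]+@$", "\\1", old)
  new <- character(length(old))
  for (i in seq_along(old)) {
    used <- c(a$record, new[seq_len(i - 1)])
    same <- used[grepl(paste0("^@", prefix[i], "[0-9]+@$"), used)]
    nums <- as.integer(gsub("[^0-9]", "", same))
    n <- if (length(nums) == 0) 1L else max(nums) + 1L
    new[i] <- paste0("@", prefix[i], n, "@")
  }
  map <- setNames(new, old)
  # Rewrite pointers first, then the record identifiers themselves.
  is_ref <- b$value %in% old
  b$value[is_ref] <- unname(map[b$value[is_ref]])
  b$record <- unname(map[b$record])
  b
}

b2 <- renumber_xrefs(a, b)
```

After renumbering, the rows of `a` and `b2` can be stacked without any identifier referring to two different records. Identifying and merging potential duplicates is a separate step.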
The process of merging files contains many steps, some of which are useful in their own right and are exposed to the user. Two of them, merging records and removing duplicate subrecords, are shown here.
Multiple records can be merged into one using the merge_records() function. To illustrate, we take the sample file and add another duplicate record for one of the individuals:
with_dupes <- sample555 |>
add_indi(sex = "M") |>
add_indi_names(name_pieces(given = "Joe", surname = "Williams"))
#> Added Male Individual: @I4@
describe_records(with_dupes, with_dupes$record, short_desc = TRUE)
#> [1] "Submitter @U1@, Reldon Poulson"
#> [2] "Individual @I1@, Robert Eugene Williams"
#> [3] "Individual @I2@, Mary Ann Wilson"
#> [4] "Individual @I3@, Joe Williams"
#> [5] "Family @F1@, headed by Robert Eugene Williams and Mary Ann Wilson"
#> [6] "Family @F2@, headed by Robert Eugene Williams"
#> [7] "Source @S1@, titled Madison County Birth, Death, and Marriage Records"
#> [8] "Repository @R1@, Family History Library"
#> [9] "Individual @I4@, Joe Williams"
We now merge the two records:
merged <- merge_records(with_dupes, c("@I3@","@I4@"))
describe_records(merged, merged$record, short_desc = TRUE)
#> [1] "Submitter @U1@, Reldon Poulson"
#> [2] "Individual @I1@, Robert Eugene Williams"
#> [3] "Individual @I2@, Mary Ann Wilson"
#> [4] "Family @F1@, headed by Robert Eugene Williams and Mary Ann Wilson"
#> [5] "Family @F2@, headed by Robert Eugene Williams"
#> [6] "Source @S1@, titled Madison County Birth, Death, and Marriage Records"
#> [7] "Repository @R1@, Family History Library"
#> [8] "Individual @I3@, Joe Williams"
We can take a closer look at this merged record to see what has happened:
level | record | tag | value |
---|---|---|---|
0 | @I3@ | INDI | |
1 | @I3@ | NAME | Joe /Williams/ |
2 | @I3@ | SURN | Williams |
2 | @I3@ | GIVN | Joe |
1 | @I3@ | SEX | M |
1 | @I3@ | BIRT | |
2 | @I3@ | DATE | 11 JUN 1861 |
2 | @I3@ | PLAC | Idaho Falls, Bonneville, Idaho, United States of America |
1 | @I3@ | FAMC | @F1@ |
1 | @I3@ | FAMC | @F2@ |
2 | @I3@ | PEDI | adopted |
1 | @I3@ | ADOP | |
2 | @I3@ | DATE | 16 MAR 1864 |
1 | @I3@ | SEX | M |
1 | @I3@ | NAME | Joe /Williams/ |
2 | @I3@ | GIVN | Joe |
2 | @I3@ | SURN | Williams |
1 | @I3@ | CHAN | |
2 | @I3@ | DATE | 22 NOV 2024 |
We can see that we now have a duplicate sex subrecord and a duplicate name subrecord. We can remove these with the remove_duplicate_subrecords() function:
level | record | tag | value |
---|---|---|---|
0 | @I3@ | INDI | |
1 | @I3@ | NAME | Joe /Williams/ |
2 | @I3@ | SURN | Williams |
2 | @I3@ | GIVN | Joe |
1 | @I3@ | SEX | M |
1 | @I3@ | BIRT | |
2 | @I3@ | DATE | 11 JUN 1861 |
2 | @I3@ | PLAC | Idaho Falls, Bonneville, Idaho, United States of America |
1 | @I3@ | FAMC | @F1@ |
1 | @I3@ | FAMC | @F2@ |
2 | @I3@ | PEDI | adopted |
1 | @I3@ | ADOP | |
2 | @I3@ | DATE | 16 MAR 1864 |
1 | @I3@ | CHAN | |
2 | @I3@ | DATE | 22 NOV 2024 |
Both duplicate subrecords have been removed.
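One way to think about this step is sketched below in base R. It treats each level 1 line, together with the lines beneath it, as a block, and drops any block that repeats an earlier one. Because the two name subrecords above list SURN and GIVN in different orders, the comparison ignores the order of lines within a block. The helper `drop_duplicate_blocks()` is a hypothetical name for this illustration and is not how remove_duplicate_subrecords() is necessarily implemented.

```r
# A cut-down version of the merged record (a single record, so the
# record column is omitted for brevity).
rec <- data.frame(
  level = c(0, 1, 2, 2, 1, 1, 1, 2, 2),
  tag   = c("INDI", "NAME", "SURN", "GIVN", "SEX", "SEX", "NAME", "GIVN", "SURN"),
  value = c("", "Joe /Williams/", "Williams", "Joe", "M", "M",
            "Joe /Williams/", "Joe", "Williams"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove level 1 blocks that duplicate an earlier block.
drop_duplicate_blocks <- function(rec) {
  # Each level 1 line starts a new block; the level 0 line is block 0.
  block <- cumsum(rec$level == 1)
  lines <- paste(rec$level, rec$tag, rec$value)
  # Signature of a block: its lines, sorted so that the order of
  # subordinate lines does not matter.
  sig <- vapply(split(lines, block),
                function(x) paste(sort(x), collapse = "\n"),
                character(1))
  keep <- names(sig)[!duplicated(sig)]
  rec[as.character(block) %in% keep, , drop = FALSE]
}

deduped <- drop_duplicate_blocks(rec)
```

The first occurrence of each block is kept, so the second SEX line and the second NAME block are the ones removed, matching the table above.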