---
title: "Splitting and Merging GEDCOM files"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Splitting and Merging GEDCOM files}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

Two of the most difficult tasks when working with GEDCOM files are splitting one file into two and merging two files into one. Maybe you want to take a specific branch of your tree to give to someone else. Maybe you would like to combine two files describing your maternal ancestors and paternal ancestors.

These tasks are difficult partly because of the cross-references that occur within a GEDCOM file, and partly because it can be hard to determine whether two records should be merged into one.

We illustrate with the following sample file:

```{r}
library(tidyged)
library(tidyged.utils)

summary(sample555)

describe_records(sample555, sample555$record, short_desc = TRUE)
```

## Splitting files

Splitting a file is much easier than merging two files. To split a file, we use the `split_gedcom()` function and provide the xrefs of the records we would like the new file to contain.

In this example, we're going to take the family `@F2@` and the two individuals within it:

```{r}
new <- split_gedcom(sample555, c("@F2@", "@I1@", "@I3@"))

summary(new)
```

The new file has exactly the same header and submitter information as the original. Let's take a look to see what records it contains:

```{r}
describe_records(new, new$record, short_desc = TRUE)
```

By default, `split_gedcom()` removes references to records that do not exist in the new file. It also reports which records these are, in case you want to go back and include them.

## Merging files

Merging two files is much more involved. Cross-reference identifiers must be made unique across both files, and potential duplicate records must be identified and then merged.
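
To make the first of these requirements concrete, here is a minimal base R sketch of renumbering xrefs so that two files no longer clash. It is not how tidyged.utils does it internally. The toy data frames, their `tag` and `value` columns, and the helper `shift_xrefs()` are all assumptions made for illustration.

```r
# Toy stand-ins for two GEDCOM files (hypothetical data, not tidyged objects).
# Both files use the same xrefs, so they would clash if simply stacked.
file_a <- data.frame(
  record = c("@I1@", "@I1@", "@F1@", "@F1@"),
  tag    = c("INDI", "FAMS", "FAM", "HUSB"),
  value  = c("", "@F1@", "", "@I1@"),
  stringsAsFactors = FALSE
)
file_b <- data.frame(
  record = c("@I1@", "@I1@", "@F1@", "@F1@"),
  tag    = c("INDI", "FAMS", "FAM", "WIFE"),
  value  = c("", "@F1@", "", "@I1@"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add `offset` to the number in every xref,
# both in the record column and in any value that is a pointer.
shift_xrefs <- function(df, offset) {
  shift_one <- function(x) {
    is_xref <- grepl("^@[A-Z]+[0-9]+@$", x)
    prefix <- sub("^@([A-Z]+)[0-9]+@$", "\\1", x[is_xref])
    number <- as.integer(sub("^@[A-Z]+([0-9]+)@$", "\\1", x[is_xref]))
    x[is_xref] <- paste0("@", prefix, number + offset, "@")
    x
  }
  df$record <- shift_one(df$record)
  df$value <- shift_one(df$value)
  df
}

# Largest xref number used in the first file, regardless of record type
max_a <- max(as.integer(gsub("[^0-9]", "", unique(file_a$record))), na.rm = TRUE)

# Shift the second file past that number, then stack the two files
file_b_shifted <- shift_xrefs(file_b, max_a)
combined <- rbind(file_a, file_b_shifted)
```

The pointers in `value` are shifted together with the record identifiers, so the family and individual in the second file still refer to each other after renumbering.
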
The `merge_gedcoms()` function carries out all of these steps automatically. Unfortunately, it cannot be demonstrated here because it asks for user input when potential duplicate records are identified.

Merging involves many steps, some of which are useful in their own right and are exposed to the user. These are:

* Identifying whether records in a file are potential duplicates (seeks user input)
* Merging selected records into a single record
* Removing duplicate subrecords

### Merging records

Multiple records can be merged into one using the `merge_records()` function. To illustrate, we take the sample file and add a duplicate record for one of the individuals:

```{r}
with_dupes <- sample555 |> 
  add_indi(sex = "M") |> 
  add_indi_names(name_pieces(given = "Joe", surname = "Williams"))

describe_records(with_dupes, with_dupes$record, short_desc = TRUE)
```

We now merge the two records:

```{r}
merged <- merge_records(with_dupes, c("@I3@", "@I4@"))

describe_records(merged, merged$record, short_desc = TRUE)
```

We can take a closer look at this merged record to see what has happened:

```{r}
dplyr::filter(merged, record == "@I3@") |> 
  knitr::kable()
```

### Removing duplicate subrecords

We can see that we now have a duplicate sex subrecord and a duplicate name subrecord. We can remove these with the `remove_duplicate_subrecords()` function:

```{r}
remove_duplicate_subrecords(merged, "@I3@") |> 
  dplyr::filter(record == "@I3@") |> 
  knitr::kable()
```

Both duplicate subrecords have been removed.
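
To show what "duplicate subrecord" means in practice, the following base R sketch removes repeated subrecords from a toy record. It is only an illustration of the idea, not the implementation behind `remove_duplicate_subrecords()`. The data frame and its `level`, `tag` and `value` columns are assumptions, and a subrecord is taken to be a level 1 line together with the lines beneath it.

```r
# Toy merged record (hypothetical data): the SEX and NAME subrecords
# each appear twice, as they would after merging two matching individuals.
rec <- data.frame(
  level = c(0, 1, 1, 2, 2, 1, 1, 2, 2),
  tag   = c("INDI", "SEX", "NAME", "GIVN", "SURN",
            "SEX", "NAME", "GIVN", "SURN"),
  value = c("", "M", "Joe /Williams/", "Joe", "Williams",
            "M", "Joe /Williams/", "Joe", "Williams"),
  stringsAsFactors = FALSE
)

# Every level 1 line starts a new subrecord; the level 0 line is block 0
block_id <- cumsum(rec$level == 1)

# Collapse each subrecord into a single string so whole blocks can be compared
block_key <- vapply(
  split(paste(rec$level, rec$tag, rec$value), block_id),
  paste, character(1), collapse = "\n"
)

# Keep only the first occurrence of each distinct subrecord
keep_blocks <- names(block_key)[!duplicated(block_key)]
deduped <- rec[as.character(block_id) %in% keep_blocks, ]
```

Comparing whole blocks rather than single lines means a subrecord is only dropped when every line in it matches an earlier one, so two names that share a surname but differ in the given name would both be kept.
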