Redaction of sensitive data

Introduction

GEDCOM files are often shared, but they can contain detailed personal information about individuals. This is a privacy concern, especially for people who are still alive.

The tidyged.utils package can detect living individuals and either remove their records entirely or strip out their details.

Basic usage

The function that removes living individuals is remove_living(). To illustrate it, we create a tidyged object containing several individuals: some living, some dead, and some ambiguous:

  • A person born in 1996
  • A person with a death event
  • A person born in 1796
  • A person who was a driver in 1930 when they were 50 years old
  • A person who got married in 1856 when they were 20 years old
library(tidyged)
library(tidyged.utils)

people <- gedcom(subm("Me")) |> 
  add_indi(qn = "Living person") |> 
  add_indi_fact("birth", date = date_calendar(1996)) |>
  add_indi(qn = "Confirmed dead person") |> 
  add_indi_fact("death") |> 
  add_indi(qn = "Reeeaally old person") |> 
  add_indi_fact("birth", date = date_calendar(1796)) |> 
  add_indi(qn = "Implicit dead person 1") |> 
  add_indi_fact("occupation", descriptor = "Driver", date = date_calendar(1930), age = "50y") |> 
  add_indi(qn = "Implicit dead person 2")
#> Added Unknown Individual: @I1@
#> Added Unknown Individual: @I2@
#> Added Unknown Individual: @I3@
#> Added Unknown Individual: @I4@
#> Added Unknown Individual: @I5@

idp2_xref <- find_indi_name(people, "Implicit dead person 2")

people <- people |>
  add_famg(husband = idp2_xref) |> 
  add_famg_event("relationship", date = date_calendar(1856), husband_age = "20y")
#> Added Family Group: @F1@

describe_records(people, people$record)
#> [1] "Submitter @U1@, Me"                                            
#> [2] "Individual @I1@, Living person, born 1996"                     
#> [3] "Individual @I2@, Confirmed dead person"                        
#> [4] "Individual @I3@, Reeeaally old person, born 1796"              
#> [5] "Individual @I4@, Implicit dead person 1"                       
#> [6] "Individual @I5@, Implicit dead person 2"                       
#> [7] "Family @F1@, headed by Implicit dead person 2, and no children"

By default, the function removes data for individuals known to be living and also for those whose status is ambiguous. Only two individuals are left untouched: Confirmed dead person, who has a death event, and Reeeaally old person, who was born in 1796. The function assumes a maximum age of 120, so anyone born more than 120 years ago is treated as dead.

remove_living(people) |> 
  describe_records(people$record)
#> Individual @I1@, Living person cleansed
#> Individual @I4@, Implicit dead person 1 cleansed
#> Individual @I5@, Implicit dead person 2 cleansed
#> [1] "Submitter @U1@, Me"                              
#> [2] "Individual @I1@, Unnamed individual"             
#> [3] "Individual @I2@, Confirmed dead person"          
#> [4] "Individual @I3@, Reeeaally old person, born 1796"
#> [5] "Individual @I4@, Unnamed individual"             
#> [6] "Individual @I5@, Unnamed individual"             
#> [7] "Family @F1@, headed by @I5@, and no children"

For illustration, raising the maximum age threshold with max_age makes the function treat the old person as possibly still living:

remove_living(people, max_age = 300) |> 
  describe_records(people$record)
#> Individual @I1@, Living person cleansed
#> Individual @I3@, Reeeaally old person cleansed
#> Individual @I4@, Implicit dead person 1 cleansed
#> Individual @I5@, Implicit dead person 2 cleansed
#> [1] "Submitter @U1@, Me"                          
#> [2] "Individual @I1@, Unnamed individual"         
#> [3] "Individual @I2@, Confirmed dead person"      
#> [4] "Individual @I3@, Unnamed individual"         
#> [5] "Individual @I4@, Unnamed individual"         
#> [6] "Individual @I5@, Unnamed individual"         
#> [7] "Family @F1@, headed by @I5@, and no children"
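The age threshold can be illustrated with a small base R sketch. The helper presumed_living() is hypothetical and not part of tidyged.utils. It assumes that an individual counts as possibly living when no birth year is known, or when fewer than max_age years have passed since their birth year. Death events are not modelled, and the real function may apply the rule differently.

```r
# Hypothetical helper, not part of tidyged.utils: a sketch of a maximum-age rule.
# An unknown birth year (NA) is treated as ambiguous, and so as possibly living.
presumed_living <- function(birth_year, max_age = 120,
                            current_year = as.numeric(format(Sys.Date(), "%Y"))) {
  is.na(birth_year) | (current_year - birth_year) < max_age
}

# Birth years for Living person, Reeeaally old person, and someone with no birth date
births <- c(1996, 1796, NA)
presumed_living(births, current_year = 2024)                 # TRUE FALSE TRUE
presumed_living(births, max_age = 300, current_year = 2024)  # TRUE TRUE TRUE
```

Treating a missing birth year as "possibly living" mirrors the cautious default described above, where ambiguous individuals are redacted.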

Guessing individuals' ages

Setting guess = TRUE makes the function try to estimate the ages of individuals who have no recorded date of birth:

remove_living(people, guess = TRUE) |> 
  describe_records(people$record)
#> Individual @I1@, Living person cleansed
#> [1] "Submitter @U1@, Me"                                            
#> [2] "Individual @I1@, Unnamed individual"                           
#> [3] "Individual @I2@, Confirmed dead person"                        
#> [4] "Individual @I3@, Reeeaally old person, born 1796"              
#> [5] "Individual @I4@, Implicit dead person 1"                       
#> [6] "Individual @I5@, Implicit dead person 2"                       
#> [7] "Family @F1@, headed by Implicit dead person 2, and no children"

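With guessing enabled, the function concludes that every individual apart from the first is dead. It takes into account both individual facts (such as the 1930 occupation of Implicit dead person 1) and family group events (such as the 1856 relationship of Implicit dead person 2).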
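One plausible way to make such a guess is to subtract a stated age from the year of the event it is attached to. The sketch below is hypothetical: guess_birth_year() is not a tidyged.utils function, and it only reads the whole-years part of a GEDCOM age string such as "50y".

```r
# Hypothetical helper, not part of tidyged.utils.
# Estimates a birth year from an event year and a GEDCOM age such as "50y" or "50y 3m".
guess_birth_year <- function(event_year, age) {
  # Keep only the leading whole number of years; ages without a "y" part become NA
  years <- suppressWarnings(as.numeric(sub("^\\s*(\\d+)y.*$", "\\1", age)))
  event_year - years
}

guess_birth_year(1930, "50y")  # Implicit dead person 1: 1880
guess_birth_year(1856, "20y")  # Implicit dead person 2: 1836
```

Both estimated birth years fall more than 120 years before 2024, which is consistent with these two individuals being kept in the output above.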

Redaction options

The remaining parameters determine what happens when living individuals are found. By default, their records are kept but stripped of all detail, leaving only a change date and an explanatory note:

remove_living(people, guess = TRUE) |>
  dplyr::filter(record == "@I1@")
#> Individual @I1@, Living person cleansed
#> # A tibble: 4 × 4
#>   level record tag   value                                             
#>   <dbl> <chr>  <chr> <chr>                                             
#> 1     0 @I1@   INDI  ""                                                
#> 2     1 @I1@   NOTE  "Information on this individual has been redacted"
#> 3     1 @I1@   CHAN  ""                                                
#> 4     2 @I1@   DATE  "22 NOV 2024"

The text of this note can be changed with the explan_note parameter, or the note can be omitted entirely by setting explan_note to an empty string:

remove_living(people, guess = TRUE, explan_note = "") |>
  dplyr::filter(record == "@I1@")
#> Individual @I1@, Living person cleansed
#> # A tibble: 3 × 4
#>   level record tag   value        
#>   <dbl> <chr>  <chr> <chr>        
#> 1     0 @I1@   INDI  ""           
#> 2     1 @I1@   CHAN  ""           
#> 3     2 @I1@   DATE  "22 NOV 2024"
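As the output shows, a tidyged object is a tibble with level, record, tag and value columns, so the default redaction amounts to replacing a record's rows with its level 0 line plus optional NOTE and CHAN lines. The base R sketch below uses a plain data frame and a hypothetical redact_record() helper. It mimics the output shown above but is not how tidyged.utils implements redaction, and it assumes that each record's rows are contiguous.

```r
# Hypothetical helper, not part of tidyged.utils.
# Replaces all rows of one record with its level 0 line, an optional note, and a change date.
redact_record <- function(ged, xref, date,
                          note = "Information on this individual has been redacted") {
  idx <- which(ged$record == xref)
  if (length(idx) == 0) return(ged)  # nothing to redact
  header <- ged[idx[ged$level[idx] == 0], ]
  new_rows <- data.frame(level = c(1, 1, 2), record = xref,
                         tag = c("NOTE", "CHAN", "DATE"),
                         value = c(note, "", date),
                         stringsAsFactors = FALSE)
  # An empty note drops the NOTE line, analogous to explan_note = ""
  if (note == "") new_rows <- new_rows[new_rows$tag != "NOTE", ]
  # Keep the rows before and after the record in their original positions
  before <- ged[seq_len(min(idx) - 1), ]
  after <- ged[-seq_len(max(idx)), ]
  out <- rbind(before, header, new_rows, after)
  rownames(out) <- NULL
  out
}

ged <- data.frame(
  level  = c(0, 1, 0, 1, 1, 2, 0),
  record = c("@U1@", "@U1@", "@I1@", "@I1@", "@I1@", "@I1@", "@I2@"),
  tag    = c("SUBM", "NAME", "INDI", "NAME", "BIRT", "DATE", "INDI"),
  value  = c("", "Me", "", "Living person", "", "1996", ""),
  stringsAsFactors = FALSE
)
redact_record(ged, "@I1@", date = "22 NOV 2024")
```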

Alternatively, setting remove_record = TRUE removes the record completely:

living_removed <- remove_living(people, guess = TRUE, remove_record = TRUE)
#> Individual @I1@, Living person removed
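Removing a record outright means other records may still point to it, for example through a family's HUSB line. The sketch below is an assumption, not a description of what remove_record does: the hypothetical drop_record() removes the record's own rows and, optionally, any lines elsewhere whose value is a pointer to it. It does not handle lines nested beneath those pointer lines.

```r
# Hypothetical helper, not part of tidyged.utils.
# Removes one record and, optionally, lines in other records that point to it.
drop_record <- function(ged, xref, drop_pointers = TRUE) {
  keep <- ged$record != xref
  if (drop_pointers) keep <- keep & ged$value != xref
  out <- ged[keep, ]
  rownames(out) <- NULL
  out
}

ged <- data.frame(
  level  = c(0, 1, 0, 0, 1),
  record = c("@I1@", "@I1@", "@I2@", "@F1@", "@F1@"),
  tag    = c("INDI", "FAMS", "INDI", "FAM", "HUSB"),
  value  = c("", "@F1@", "", "", "@I1@"),
  stringsAsFactors = FALSE
)
drop_record(ged, "@I1@")  # keeps only the @I2@ INDI and @F1@ FAM lines
```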

Supporting records can also hold sensitive information, so the remove_supp_records parameter controls whether they are removed as well (by default, they are).
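Which supporting records are affected is not spelled out here. One reasonable reading, assumed in the sketch below, is note, source, repository and multimedia records (GEDCOM NOTE, SOUR, REPO and OBJE) that are referenced by the redacted individual and by no other record. find_supp_records() is hypothetical and not part of tidyged.utils.

```r
# Hypothetical helper, not part of tidyged.utils.
# Finds supporting records referenced only by the given individual.
find_supp_records <- function(ged, xref) {
  is_pointer <- grepl("^@[^@]+@$", ged$value)
  linked <- unique(ged$value[ged$record == xref & is_pointer])
  elsewhere <- unique(ged$value[ged$record != xref & is_pointer])
  # Restrict to supporting record types so families and individuals are never flagged
  supp_types <- c("NOTE", "SOUR", "REPO", "OBJE")
  supp_xrefs <- ged$record[ged$level == 0 & ged$tag %in% supp_types]
  intersect(setdiff(linked, elsewhere), supp_xrefs)
}

ged <- data.frame(
  level  = c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1),
  record = c("@I1@", "@I1@", "@I1@", "@I1@", "@I2@", "@I2@",
             "@N1@", "@S1@", "@F1@", "@F1@"),
  tag    = c("INDI", "NOTE", "SOUR", "FAMS", "INDI", "SOUR",
             "NOTE", "SOUR", "FAM", "HUSB"),
  value  = c("", "@N1@", "@S1@", "@F1@", "", "@S1@",
             "Lives at 1 High Street", "", "", "@I1@"),
  stringsAsFactors = FALSE
)
find_supp_records(ged, "@I1@")  # "@N1@": the source @S1@ is shared with @I2@
```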