- How to Establish a Kinship Dataset
- File Formats for Kinship Data
- Kinship Relations Notation
- Property Codes
- Bibliographic References
A genealogical corpus is a set of individuals linked by relations of kinship and marriage, with basic and supplementary information coded for each individual:
- A unique identity number (ID)
- Gender: H (man), F (woman), X (gender unknown)
- Father’s ID number
- Mother’s ID number
- Spouse(s) ID number
- Biographical information (birth, marriage and death dates and places, other properties)
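As an illustration only, the fields listed above could be held in a record like the following. This is a minimal sketch, not Puck's internal data model: the class name `Individual` and its attribute names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Individual:
    """One coded individual (hypothetical structure, not Puck's own)."""
    id: int                                  # unique identity number
    gender: str                              # "H" (man), "F" (woman), "X" (unknown)
    father_id: Optional[int] = None          # None when the father is unknown
    mother_id: Optional[int] = None          # None when the mother is unknown
    spouse_ids: List[int] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)  # biographical data

    def __post_init__(self) -> None:
        # Reject gender codes other than the three listed above.
        if self.gender not in {"H", "F", "X"}:
            raise ValueError(f"unknown gender code: {self.gender!r}")


# Example values are invented for illustration.
ego = Individual(id=1, gender="F", father_id=2, mother_id=3,
                 properties={"BIRT_PLACE": "Lome"})
```

Keeping the identity number separate from gender and the other properties follows the advice given further down: the number identifies the individual and carries no other information.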
Tips for Collecting Kinship Data
Data are not only a result but also a means of data collection. They should be easily accessible in order to guide your research and to cross-check your informants’ answers. When working in archives, this is often fairly simple: you can take a computer with you. In many fieldwork situations, however, this is not possible. Still, noting kinship "by hand" can be extremely fast and efficient if some basic principles are observed:
- Always use a compact medium, such as a notebook. Do not use loose sheets or cards: you cannot handle them during interviews, and there is a high risk of losing some of them.
- Separate graphics and text. A good method is to use a notebook with the left page for drawing genealogies, the right page for listing the individuals and their properties, and numbers for identifying these individuals. If the numbers get large, add initial letters to them to prevent identification problems in case of numbering errors.
- Assign an identity number to each individual and never assign that number to another individual. If you have "doubles", make a link to the original number but do not reassign it. Gaps in the series of numbers do no harm, but ambiguous identity numbers do a great deal of harm and are extremely difficult to detect.
- Do not use identity numbers as codes. Identity numbers serve to identify individuals and nothing else (except, perhaps, to recall the order in which you entered them and to document the history of your corpus). If you want to record an individual’s gender, clan affiliation, residence, etc., do not use identity numbers for that.
- Make copies regularly and store them in different places. This holds for all data, but especially for kinship data, due to the network properties of kinship: one lost notebook may render twenty others useless.
- Do I have to number individuals continuously?
No. Discontinuous numbering is a problem neither for Puck nor for most other genealogical programs. Pajek requires continuous numbering, but Puck can convert datasets into the Pajek file format, renumbering the individuals without losing their original numbers (use the "numbered" export option). However, you should avoid very large gaps between identity numbers, because some search methods may become more time-consuming.
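The idea of renumbering without losing the original numbers can be sketched as follows. This is not Puck's export code: the function name `renumber` and the `{child: (father, mother)}` layout are assumptions made for this example.

```python
def renumber(parents):
    """Map arbitrary identity numbers to 1..n and keep a way back.

    `parents` maps each child ID to a (father_id, mother_id) pair, with
    None for an unknown parent. Returns the renumbered map together with
    a dictionary giving the original ID of every new number.
    """
    ids = set(parents)
    for pair in parents.values():
        ids.update(i for i in pair if i is not None)

    # Continuous numbering in ascending order of the original IDs.
    new_id = {old: new for new, old in enumerate(sorted(ids), start=1)}
    original_id = {new: old for old, new in new_id.items()}

    renumbered = {
        new_id[child]: tuple(None if p is None else new_id[p] for p in pair)
        for child, pair in parents.items()
    }
    return renumbered, original_id


# IDs 3, 10, 11 and 250 become 1, 2, 3 and 4.
links, original = renumber({10: (3, None), 250: (10, 11)})
```

The `original` dictionary plays the role of the stored original numbers: any result computed on the continuous numbering can be mapped back to the numbers used in the notebooks.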
- Some individuals in my dataset are doubles, do not exist, or have become obsolete. Can I delete them?
Yes, but do not reassign their identity numbers to other individuals! Just leave their positions empty. In the case of doubles, it can be useful to keep them in your dataset, so that you can easily find information on the individuals in the different places in your notebooks. You can mark them as doubles by giving them the identity number of the original as a name. If needed, you can always remove them with the "eliminate doubles" option.
- How do I code kinship relations between individuals when I do not know the exact genealogical chain?
If you know the exact genealogical relation but not the individuals who link the two persons, you may introduce virtual individuals, with "#" as a name, into your dataset as intermediary links (for instance, if you know that A is B’s paternal brother, you may introduce a virtual common father). Make sure, however, that the kinship term people give you really corresponds to the supposed genealogical relation: in many societies, kinship terms designate large classes of relations, some of which may have no genealogical foundation whatsoever. If you are not 100% sure that your "brother" really is a brother in a genealogical sense, you should instead store the information in a note or as a relational property of the individuals concerned.
- How do I code divorced spouses?
Like all other spouses, living or dead, married or divorced. You can store the information on the divorce among the individual’s properties (see also File Formats for Kinship Data).
Kinship data can be stored in files of different formats:
- Text and Excel format (file extensions .txt and .xls)
- Pajek Network format (file extension .paj)
- Gedcom format (file extension .ged)
- Kinship editor xml format (file extension .xml)
- Prolog format (file extension .pl)
A kinship relation can be represented in several different notations. Puck mainly uses two of them: the standard notation and the positional notation.
The conventional (standard) notation of kinship relations uses capital letters to indicate 8 basic kinship relations. These letters are mostly abbreviations of the corresponding English kinship terms. They carry information on the gender of alter and on the direction of the basic kinship relation (ascendance, descent, marriage or siblingship). The eight letters are:
- Ascendance: F (father), M (mother)
- Descent: S (son), D (daughter)
- Siblingship: B (brother), Z (sister)
- Marriage: H (husband), W (wife)
These basic kinship relations are combined into more complex ones by simple juxtaposition of letters according to their position in the kinship chain, starting from ego (as in English, but unlike French, for example, where kinship terms are composed starting from alter). The gender of ego must be indicated by an additional sign, ♂ (male ego) or ♀ (female ego), placed before the initial letter. The resulting combination of letters can be read as a direct abbreviation of an English kinship term: MBD (mother’s brother’s daughter, a matrilateral cross-cousin), ZH (sister’s husband, a brother-in-law) and FWS (father’s wife’s son, a step-brother) are examples of this.
Half-sibling relations are distinguished from full sibling relations by using an explicit combination of ascendance and descent letters instead of sibling letters: for instance, FS (father’s son, paternal half-brother). In addition to genealogical relations, relative age can be indicated by the lowercase letters e (elder) and y (younger) placed before the kinship letter concerned: for instance, FeB (father’s elder brother), MyZ (mother’s younger sister). Standard kinship notation is highly intuitive and easy to read (at least for English speakers). However, it expresses the ethnocentric viewpoint of English kinship terminology and, because it uses simple abbreviations, tells us little or nothing about the structure of the kinship relation. It is therefore certainly not the best tool for analytical purposes.
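To make the reading rule concrete, the following sketch expands a standard-notation string into the English phrase it abbreviates. It covers only the eight capital letters and the e/y age markers used in the examples above; the function name `expand` is an assumption made for this example, and the ♂/♀ ego signs are not handled.

```python
TERMS = {"F": "father", "M": "mother", "B": "brother", "Z": "sister",
         "S": "son", "D": "daughter", "H": "husband", "W": "wife"}
AGE = {"e": "elder", "y": "younger"}


def expand(code):
    """Read a standard-notation string from ego towards alter."""
    words = []
    age = None
    for ch in code:
        if ch in AGE:
            age = AGE[ch]                  # applies to the next kin letter
        elif ch in TERMS:
            term = TERMS[ch]
            words.append(f"{age} {term}" if age else term)
            age = None
        else:
            raise ValueError(f"unknown symbol: {ch!r}")
    if age is not None:
        raise ValueError("age marker without a following kin letter")
    return "'s ".join(words)


print(expand("MBD"))   # mother's brother's daughter
print(expand("FeB"))   # father's elder brother
```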
In the positional notation, developed by Laurent Barry (Barry, 2004), a kinship relation is represented by a sequence of letters indicating gender (abbreviations of the French terms: H, homme, for male and F, femme, for female) and two diacritical signs:
- The point or full stop “.”, which indicates marriage;
- The parentheses (), which surround an apical position, that is, the position of an individual who is not a descendant of any of its neighbors. If both neighbors are spouses, the parentheses may be dropped.
Relations of ascendance and descent are indicated by simple juxtaposition, where direction changes after every pair of parentheses and every marriage dot. By convention, the starting direction is ascendance.
By replacing gender letters with the variable X, more comprehensive classes of kinship relations can be represented in positional notation. For instance, X(H)X denotes paternal half-siblings, XX(X)F direct aunts, X(F)FH uterine nephews.
Note that the translation of kinship relations from standard notation (without using ♀ and ♂ signs for the gender of ego) into positional notation always implies the variable letter X in the first position.
Positional notation can be used not only to represent abstract kinship relations, but also concrete kinship chains. In this case, gender letters are replaced by identity numbers of the individuals in the respective positions.
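The reading rules can be illustrated with a small translator from positional to standard notation. This is a deliberately restricted sketch, not Puck's parser: it handles only a single consanguine chain with one pair of parentheses, rejects marriage dots, and accepts the variable X only in ego's position (a class with X elsewhere has no single standard equivalent). The function name and the reading of empty parentheses as a full-sibling link are assumptions drawn from the example HF( )HF.

```python
import re

PARENT = {"H": "F", "F": "M"}     # ascending step: father / mother
CHILD = {"H": "S", "F": "D"}      # descending step: son / daughter
SIBLING = {"H": "B", "F": "Z"}    # step across an empty apex: brother / sister
EGO_SIGN = {"H": "♂", "F": "♀", "X": ""}


def _letters(genders, table):
    try:
        return "".join(table[g] for g in genders)
    except KeyError:
        raise ValueError("X is only supported in ego's position") from None


def positional_to_standard(chain):
    """Translate e.g. 'HF( )HF' into '♂MBD' (consanguine chains only)."""
    chain = chain.replace(" ", "")
    m = re.fullmatch(r"([HFX]*)\(([HFX]?)\)([HFX]*)", chain)
    if m is None:
        raise ValueError(f"unsupported chain: {chain!r}")
    up, apex, down = m.groups()
    if not up and not apex:
        raise ValueError("ego's position is missing")

    ego = up[0] if up else apex           # ego may itself be the apex
    ascending = up[1:] + apex if up else ""
    out = EGO_SIGN[ego] + _letters(ascending, PARENT)
    if apex:
        out += _letters(down, CHILD)
    else:                                  # empty apex: full siblings
        if not down:
            raise ValueError("empty apex needs a descending position")
        out += _letters(down[0], SIBLING) + _letters(down[1:], CHILD)
    return out


print(positional_to_standard("HF( )HF"))  # ♂MBD
print(positional_to_standard("X(H)H"))    # FS
```

Because the starting direction is ascendance and the direction changes after the parentheses, the letters before the apex are read as parents and the letters after it as children.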
The major advantages of positional notation are:
- The clear representation of the structural properties of kinship relations, whose form remains unchanged by symmetry transformations: when genders are inverted, HF( )HF becomes FH( )FH, whereas in standard notation MBD becomes FZS;
- The integration of the sex of ego and not only of alter;
- The applicability not only as a notation but also as a classification tool (by use of gender variables);
- The homogeneity of the notations of kinship chains (with individual numbers), kinship relations (with gender letters) and kinship relation classes (with gender variables).
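The first advantage can be checked mechanically. In the sketch below (function names are assumptions made for this example), inverting genders in positional notation is a plain exchange of H and F that leaves the layout of the chain untouched, while the same transformation in standard notation needs a table of four letter pairs.

```python
POSITIONAL_SWAP = str.maketrans("HF", "FH")
STANDARD_SWAP = str.maketrans("FMBZSDHW", "MFZBDSWH")


def swap_positional(chain):
    """Invert every gender letter; X, dots and parentheses are unchanged."""
    return chain.translate(POSITIONAL_SWAP)


def swap_standard(code):
    """Invert genders in standard notation (F/M, B/Z, S/D, H/W)."""
    return code.translate(STANDARD_SWAP)


print(swap_positional("HF( )HF"))  # FH( )FH
print(swap_standard("MBD"))        # FZS
```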
The following table shows some examples of the translation of kinship relations from positional to standard notation:
Endogenous and exogenous properties are designated by standard codes. In addition to the standardized codes listed below, you are free to enter any other property label you want.
Warning: use only single-word codes. Puck does not allow spaces in property codes.
Note: property codes are fixed and language-independent. They do not change when you switch from one language to another.
Main Endogenous Properties
“Endogenous” criteria of classification are calculated by Puck from the genealogical data and are derived automatically from the kinship network itself: sibling group size, number of known ascendants, number of spouses, etc. They need not and should not be explicitly specified, and their codes should not be used to enter properties or to load them from a file.
- ALL - a pseudo-property that serves to remove a partition and to restore the unity of the underlying corpus
- ***BIRTH_ORDER - birth order
- ***GENDER - gender
- GEN - generation
- FIRSTN - first name
- LASTN - last name
- FRATP - father, agnatic sibling group
- FRATM - mother, uterine sibling group
- PATRIC - agnatic apical ancestor, “patrilineage”
- MATRIC - uterine apical ancestress, “matrilineage”
- PATRID - distance to the agnatic apical ancestor, “agnatic generation”
- MATRID - distance to the uterine apical ancestress, “uterine generation”
- DEPTH - distance to the most remote ancestor, maximal generational depth
- MDEPTH - mean distance to ancestors, mean generational depth. The formula was defined by Cazes and Cazes (Cazes & Cazes, 1996)
- PEDG x - number of ascendants (where x is a number specifying generational distance)
- PROG x - number of descendants (where x is a number specifying generational distance)
Note: the properties PEDG (pedigree) and PROG (progeny) require a number that indicates generational distance. For instance, PEDG 2 is the number of grandparents, PROG 1 the number of children.
- SPOU - number of spouses
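How PEDG x and PROG x follow from the kinship network alone can be sketched as below. This is not Puck's algorithm: the function names and the `{child: (father, mother)}` layout are assumptions, and the sketch assumes that each individual is counted once even when reachable by several paths, which may differ from Puck's convention.

```python
def pedg(parents, ego, x):
    """Number of known ascendants of `ego` at generational distance x."""
    level = {ego}
    for _ in range(x):
        level = {p for i in level for p in parents.get(i, ()) if p is not None}
    return len(level)


def prog(parents, ego, x):
    """Number of known descendants of `ego` at generational distance x."""
    children = {}
    for child, pair in parents.items():
        for p in pair:
            if p is not None:
                children.setdefault(p, set()).add(child)
    level = {ego}
    for _ in range(x):
        level = {c for i in level for c in children.get(i, ())}
    return len(level)


# Invented example: 1 and 7 are children of 2 and 3; 6 is 3's only known parent.
parents = {1: (2, 3), 7: (2, 3), 2: (4, 5), 3: (6, None)}
print(pedg(parents, 1, 2))  # 3 known grandparents
print(prog(parents, 2, 1))  # 2 children
```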
Main Exogenous Properties
The “exogenous” classification criteria do not derive from the kinship network itself: dates of birth, death or marriage, profession, residence, religion, etc. Exogenous properties have to be specified explicitly for each individual, either in the file from which the corpus is loaded or by entering them in the data window. Puck uses the standard GEDCOM codes for exogenous properties.
- ***BIRT_DATE - birth date
- ***BIRT_PLACE - birth place
- ***DEAT_DATE - death date
- ***DEAT_PLACE - death place
- ***MARR - marriage (place/date/year/alter)
Note: you can binarize this property by place, date or period and use the binarized property to redefine spouses. Running a second relational or matrimonial census on the redefined spouses then yields a restricted matrimonial census.
- DIV - divorce (place/date/year/alter)
- BAP - baptism (place/date/year)
- BURI - burial (place/date/year)
- DECO - decoration (place/date/year)
- EDUC - education
- NATI - nationality
- OCCU - occupation
- RELI - religion
- RESI - residence
- TITL - title
a) According to the arc and edge pattern of lines:
- Length: the number of arcs and edges included (Roman degree in the case of consanguine relations)
- Height: the length of the longest linear chain included (German degree in the case of consanguine relations)
- Width: the number of marriage edges included (consanguine relations have width 1, relinking marriages width 2 or more)
b) According to the gender pattern of vertices :
- Descent : agnatic, uterine or cognatic according to the gender of vertices in consanguine chains
- Crossness : cross or parallel according to the gender difference of intermediate pairs of vertices in consanguine chains
- Terminal crossness : cross or parallel according to the gender difference of terminal pairs of vertices in consanguine chains
c) According to symmetry features :
- Skewedness: horizontal, ascending or descending according to differences in the length of the linear chains composing a consanguine chain
- Automorphy: percentage of symmetry transformations that leave the kinship relation unchanged
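For a single consanguine chain, length, height and skewedness follow directly from the lengths of its two linear chains. The sketch below is a worked illustration, not Puck's code: `up` is the number of links from ego to the apical ancestor and `down` the number of links from the apical ancestor to alter, and it assumes that "ascending" means that alter stands in a generation above ego.

```python
def chain_measures(up, down):
    """Return (length, height, skewedness) of a consanguine chain."""
    if up < 0 or down < 0:
        raise ValueError("chain lengths cannot be negative")
    length = up + down          # Roman (civil) degree
    height = max(up, down)      # German (canonic) degree
    if up == down:
        skew = "horizontal"
    elif up > down:
        skew = "ascending"      # assumption: alter is above ego
    else:
        skew = "descending"
    return length, height, skew


print(chain_measures(2, 2))  # first cousins such as MBD: (4, 2, 'horizontal')
print(chain_measures(2, 1))  # an uncle such as FB: (3, 2, 'ascending')
```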
- SIMPLE - the relation or ring type as such (the "finest" classification: each relation is in a separate class)
- LENGTH - length: the number of links between ego and alter (in consanguine relations this corresponds to civil or Roman degree)
- HEIGTH - height: the maximal number of links to an apical ancestor (in consanguine relations this corresponds to canonic or Germanic degree)
- WIDTH - width: the number of consanguine components implied in the relation
- SYM - symmetry: the number of automorphic transformations as a percentage of all possible transformations which leave gender and direction invariant
- HETERO - a binary property, true if all married couples as well as the pair ego/alter are heterosexual, false otherwise
- DEGREE - civil degree (number of links between consanguines)
- ENDS - gender combination of ego/alter
- SKEW - skewedness (generational distance between ego and alter)
- SKEW+ - skewedness (in three classes: horizontal, oblique, alternate)
- LINE - unilinearity type (agnatic, uterine, cognatic, bilateral or identity)
- AGNA - agnatic coefficient (percentage of agnatic links)
- UTER - uterine coefficient (percentage of uterine links)
- DRAV - dravidian crossness
- SWITCHES - number of gender switches
- ARCH - gender combination of the apical siblings (children of the apical ancestor of the relation), not defined for linear relations
- Status (allowed / not allowed / not defined) according to particular marriage systems
- DRAV-H - dravidian crossness (horizontal system, Chimane model)
- DRAV-O - dravidian crossness (oblique system, Parakana model)
BARRY Laurent,
2004, "Historique et Spécificités techniques du programme Genos", Ecole « Collecte et traitement des données de terrains ». Available online at http://llacan.vjf.cnrs.fr/SousSites/EcoleDonnees/extras/Genos.pdf
BARRY Laurent, & GASPERONI Michaël,
2008, "L’oubli des origines. Amnésie et information généalogiques en histoire et en ethnologie", Annales de démographie historique, 116, 53-104.
CAZES Marie-Hélène, & CAZES Pierre,
1996, "Comment mesurer la profondeur généalogique d’une ascendance?", Population, 51/1, 117-140.
GRANGE Cyril, & HOUSEMAN Michael,
2010, "Objets d’analyse pour l’étude des réseaux de parenté: une application aux familles de la grande bourgeoisie juive parisienne XIXe-XXe siècles", Annales de démographie historique, 116(2), 105-144.
HAMBERGER Klaus, & DAILLANT Isabelle,
2008, "L’analyse de réseaux de parenté: concepts et outils", Annales de démographie historique, 116, 13-52.
HAMBERGER Klaus, & GARGIULO Floriana,
2014, "Virtual Fieldwork. Modeling Observer Bias in Kinship and Alliance Networks", Journal of Artificial Societies and Social Simulation, 17(3), 2. Available online at http://jasss.soc.surrey.ac.uk/17/3/2.html
HAMBERGER Klaus, HOUSEMAN Michael, DAILLANT Isabelle, WHITE Douglas R., & BARRY Laurent,
2004, "Matrimonial ring structures", Mathématiques et Sciences Humaines. Mathematics and Social Sciences, 168, 83-120.
HAMBERGER Klaus, HOUSEMAN Michael, & GRANGE Cyril,
2009, "La parenté radiographiée", L’Homme, 191(3), 107-137.
HAMBERGER Klaus, HOUSEMAN Michael, & GRANGE Cyril,
2014, "Scanning for patterns of relationship: analyzing kinship and marriage networks with Puck 2.0", The History of the Family, publication in progress, see http://www.tandfonline.com/loi/rhof20 (restricted access).
HAMBERGER Klaus, HOUSEMAN Michael, & WHITE Douglas R.,
2012, "Kinship Network Analysis", in P. Carrington & J. Scott (Eds.), The Sage Handbook of Social Network Analysis, 533-549. Sage Publications.
WHITE Douglas R., & HOUSEMAN Michael,
1996, "Structures réticulaires de la pratique matrimoniale", L’Homme, 36(139), 59-85.
WHITE Douglas R., & JORION Paul,
1992, "Representing and Analyzing Kinship: A New Approach", Current Anthropology, 33, 454-462.