Keywords
Norfolk Island, isolate, inbreeding, runs of homozygosity, population genetics

Background

The Norfolk Island (NI) community is an isolated population in which the majority of individuals descend directly from 18th century European Bounty mutineers and Polynesian women, and whose ancestors relocated to NI from Pitcairn Island in 1856 \cite{edgecombe1991norfolk}. The majority of this population can trace their genetic heritage back to a small number of families derived from the original Bounty mutineers and Polynesians. Genomic structure refers to the proportion of an individual's genome that is contributed by one or several ancestors; structure is therefore an individual's 'genomic pattern'. This study aimed to explore the phenomena that shaped the genomic structure of this population, leading to the formation of a unique genetic architecture with properties that aid in the identification and mapping of complex disease.

Inbreeding

Just as admixture is important in shaping the genomic architecture of individuals within a population, so too is inbreeding (consanguinity). Inbreeding describes matings between genetically related individuals. Over time it reduces genetic variation in the offspring of such matings. The degree of this reduction (an increase in homozygosity) is determined by: a) the closeness of the consanguineous relationship and b) the number of past consanguineous relationships within the same lineage. The inbreeding coefficient, F, represents the probability that the two alleles at a locus in an individual are identical by descent (IBD). Table 1 presents F values for one generation of inbreeding, with no prior generations of inbreeding taken into account. The value of F will rise above these expected values with increasing generations of inbreeding (as might be expected in population isolates).
The inbreeding coefficient can also be used to gauge the general genetic distance among relatives: the coefficient expected in their offspring is multiplied by two, because any progeny would have a 1 in 2 chance of actually inheriting the identical alleles from both parents. For instance, parent/child and sibling/sibling pairs share 50% of their genetic material. The sharing of identical genetic material is generally considered detrimental to the overall viability and strength of the total gene pool, as it greatly reduces the overall genetic variation, which can lead to a decrease in the fitness of a population \cite{Bittles_1994}. It also increases the chance that offspring in successive generations inherit recessive or deleterious traits. This reduction in genetic diversity can nevertheless be a useful tool in the detection and identification of disease-related genes, as the increased frequency of homozygous alleles can be viewed as an 'amplification' of potential genetic markers of disease within the inbred cohort \cite{Bulayev_2009}. Thus, the presence of inbreeding, together with ancestry informative markers (AIMs), can provide increased power for the detection of disease markers, as demonstrated by a number of studies to date \cite{Latini_2004,Thompson_2009,Kenny_2010}.
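As a hedged illustration of where such single-generation values of F come from, Wright's path-counting formula can be written as follows. The symbols $n_{1,i}$, $n_{2,i}$ and $F_{A_i}$ are notation introduced here for the example only and are not used elsewhere in this manuscript.

\[
F = \sum_{i} \left(\frac{1}{2}\right)^{n_{1,i} + n_{2,i} + 1} \left(1 + F_{A_i}\right)
\]

The sum runs over every distinct path linking the two parents through a common ancestor $A_i$; $n_{1,i}$ and $n_{2,i}$ are the numbers of generations separating each parent from $A_i$, and $F_{A_i}$ is the inbreeding coefficient of that ancestor. Assuming non-inbred common ancestors ($F_{A_i} = 0$), first cousins share two grandparents with $n_1 = n_2 = 2$, so their offspring have

\[
F = 2 \times \left(\frac{1}{2}\right)^{5} = \frac{1}{16} = 0.0625 ,
\]

and the offspring of second cousins ($n_1 = n_2 = 3$) have $F = 2 \times (1/2)^{7} = 1/64 \approx 0.0156$.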

Runs of homozygosity/homozygosity by descent

Individuals born of consanguineous (inbred) unions have genomic segments that are homozygous as a result of inheriting identical ancestral genomic segments from both parents. Homozygosity by descent (HBD), also known as autozygosity, describes the presence of the same alleles at a given position that arise from inheritance of an identical ancestral allele through consanguineous mating events. Runs of HBD can be viewed as segments of the genome that span large regions of chromosomes, sometimes several Mb in size, and that result from recombination (or lack of recombination) of chromosomal segments inherited from the same ancestral source. Numerous studies have highlighted an increased prevalence of recessive diseases within consanguineous populations \cite{el_Hazmi_1995,Zlotogora_1997,Farrer_2003}. HBD-mapping involves the detection of large 'runs' of homozygous alleles and their association with disease within small inbred families or population isolates. This method was originally developed using microsatellites (STRs), but was later adapted to denser SNP arrays \cite{Woods_2004}. SNP-based HBD-mapping has been used successfully to identify rare recessive disease-associated genes in numerous closely related populations \cite{Woods_2006}. HBD-mapping has also been utilised to detect disease 'signatures', a collective group of alleles spread across the genome, in more distantly related cohorts \cite{Keller_2012}. Recent advances in next-generation sequencing technologies are providing easy access to an even greater depth of information; for example, exome sequencing has been used to accurately identify runs of homozygosity in large cohorts of individuals \cite{Carr_2012}.

History of the Norfolk Island population

Norfolk Island is a small volcanic island located in the South Pacific about 1600 km north-east of Sydney, Australia. The island was initially populated by seafaring Polynesians during the 14th/15th century; however, their settlement was brief and unsuccessful and they moved on. It was not until 1774 that Captain James Cook rediscovered NI, and it was subsequently used as a British penal colony on and off from 1788 until 1855. The history of the current occupants of NI originates from the events and actions of the Bounty Mutineers. The mutiny on HMS Bounty was led by acting Lieutenant Fletcher Christian, who along with 18 other men overthrew the captain and remaining crew members, and sailed to Tahiti. In an attempt to hide from British retribution, Christian led 9 Bounty Mutineers (Isle of Man and British ancestry), 6 Tahitian men (Polynesian ancestry) and 12 Tahitian women plus a baby girl (Polynesian ancestry) to the Pitcairn Islands, where the Bounty was sunk to avoid detection as well as to eliminate any means of escape from the island. Conflict was rife amongst the small community during the first 3 years of the settlement of Pitcairn, with numerous murders as well as suicide, leaving all Tahitian men and 7 Mutineers dead. When Ned Young succumbed to respiratory failure in 1800, John Adams was the sole surviving male on Pitcairn. The island's population slowly continued to grow, later bolstered by three further male European settlers: John Buffett and John Evans in 1823, and G.H. Nobbs in 1828 \cite{Refshauge_1981}. These three men were the only outsiders to settle permanently on the island, and they subsequently married into the community. Due to population growth and the severe dwindling of resources on Pitcairn, the British Government allowed the Pitcairn Islanders to settle NI, which had recently been abandoned by the British as a penal colony.
On June 8 1856 the descendants of the Bounty Mutineers and Tahitian women who were previously inhabitants of Pitcairn Island were relocated, with a total population of 193 settlers making NI their new home. 
Today NI has a population of 1576 permanent residents, approximately 1200 of whom are adults. A recent census documents that approximately half (N=750) of the permanent population are descendants of the Pitcairn founders. Until recently, strict immigration and quarantine laws severely restricted the settling of new founders on the island. These laws, along with the sheer isolation of the island, have resulted in a lack of interaction with other populations.

Previously Identified Genomic Structure within the NI population

The concepts explained above outline the mechanisms that create and shape the unique genomic structure seen within admixed and genetic isolate populations. Previous work estimated global admixture within the NI population using a set of 128 AIMs derived from the HapMap database \cite{McEvoy_2009}. The ancestry proportions in the population were established as 88% European and 12% Polynesian. Further work, including initial validation of the broader pedigree structure, was documented using a small set of microsatellite markers. That study included an overview of inbreeding within the population, estimating the average inbreeding coefficient as F=0.011 for the established pedigree \cite{Macgregor2009}. To date these indices have not been estimated using the recently available SNP data, which should increase the accuracy of estimation. The SNP data can also be used to estimate locus-specific admixture and runs of homozygosity, two structural elements that have yet to be derived in the NI population. Doing so will expand knowledge of the NI population by building upon the previously identified unique structure. It is likely that this structure influences genetic associations with disease in the NI population, and it is an integral component of disease gene mapping when using population isolates. Factors such as genomic structure must be taken into account, as they have been demonstrated to have the potential to artificially inflate the false discovery rate in association studies when not accounted for \cite{Jiang_2013}. Genomic structure therefore cannot be ignored, and should be considered crucial to the experimental design of future analyses.
This study aimed to expand upon previous work that produced estimates of ancestry and pedigree-based inbreeding within the NI population. This was achieved using dense genotype data, which improved the accuracy of various structure parameter estimates (admixture and inbreeding). By incorporating a reconstructed pedigree, recently available genome-wide SNP data and new software methods, extensive runs of homozygosity as well as locus-specific admixture were observed in the genotyped core-pedigree individuals. These indices were previously unidentified in the NI population and will aid future studies such as 'admixture-mapping' and disease association studies. This work also reinforces previous findings and documents novel observations, demonstrating that the unique genomic structure of the NI pedigree makes it an ideal tool for disease gene/marker mapping, and leading to the identification of genomic regions and structure correlated with a variety of traits related to metabolic syndrome (MetS) and cardiovascular disease (CVD), which are known to show increased prevalence in the population.

Methods

Cohort Collection and Ethics

Accurate and detailed historical accounts have been used by genealogists to create and maintain a well-documented database of the entire Norfolk Island population, spanning all the way back to the original founders. This pedigree is maintained in the genealogy program Brother's Keeper. The pedigree includes approximately 5700 individuals coalescing over 11 generations (about 200 years) back to the original 9 European sailors and 12 Tahitian women. The Norfolk Island Health Study (NIHS), which is well established in previous research \cite{Bellis_2005,Bellis_2008}, sampled individuals from the lower four generations of the pedigree and included 386 individuals (64 %) possessing lineages back to the founders and 216 individuals (36 %) who were considered to be new founders and did not show direct ancestral links. An updated core pedigree was constructed using this information, together with genetic information as it became available through genetic studies. Currently the core pedigree structure contains those individuals who are most closely related and coalesce directly back to the original founders. In this study, we used a representative group of Norfolk Island individuals selected from the pedigree, meaning that they relate back to the original founders and that phenotype and genotype information is available for them. The total number of core pedigree members selected was 330 (after excluding individuals under the age of 18 years), consisting of 152 males and 178 females. All individuals gave written informed consent. Ethical approval was granted prior to the commencement of the study by the Griffith University Human Research Ethics Committee (ethical approval no: 1300000485) and the project was carried out in accordance with the relevant guidelines, which complied with the Helsinki Declaration for human research.

Genome-wide Genotyping 

EDTA anticoagulated venous blood samples were collected from all participants. Genomic DNA was extracted from blood buffy coats using standard phenol-chloroform procedures (Qiagen). Genome-wide genotyping was carried out using the Illumina Human610-Quad v1.0 beadchip. SNP genotypes were called from the raw Illumina idat files in R using the CRLMM package \cite{Scharpf_2011}. Genotype data then underwent QC routines using PLINK \cite{Purcell_2007}. Briefly, analysis was restricted to autosomal SNPs with minor allele frequency >0.01, call rate >0.95 and a Hardy-Weinberg equilibrium test p-value $>10^{-5}$. After quality control, 590,603 SNPs were used for further analyses.
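The three SNP-level filters can be sketched in a few lines of Python. This is an illustrative sketch only, not the PLINK implementation used in the study: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 copies of one allele with `None` for a missing call, and the Hardy-Weinberg test here is a simple 1-d.f. chi-square approximation rather than the test PLINK applies.

```python
import math


def call_rate(genos):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_freq(genos):
    """Minor allele frequency computed from the non-missing calls."""
    called = [g for g in genos if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def hwe_chi2_pvalue(genos):
    """Chi-square (1 d.f.) approximation to a Hardy-Weinberg test p-value."""
    called = [g for g in genos if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:  # monomorphic SNP: nothing to test
        return 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genos, min_maf=0.01, min_call=0.95, min_hwe_p=1e-5):
    """Apply the thresholds quoted in the text (all strict inequalities)."""
    if call_rate(genos) <= min_call:
        return False
    if minor_allele_freq(genos) <= min_maf:
        return False
    return hwe_chi2_pvalue(genos) > min_hwe_p
```

Checking the call rate first means the later functions are only reached when most samples have a call, so the allele-frequency division is safe for any non-empty genotype vector.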

Inbreeding and runs of homozygosity in Norfolk Island

Pedigree-based inbreeding (FPED) was estimated in R using the package 'pedigree' [21]; calculations were based upon the reconstructed core pedigree of 1388 individuals (see \cite{Benton2015} for details). Marker-derived inbreeding (FMARKER) was calculated using the software IBDLD \cite{Han_2011}. IBDLD is implemented in a way that overcomes the issues around exact multipoint estimation of IBD in large pedigrees, and also eliminates the difficulty of accommodating the background linkage disequilibrium (LD) that is present in high-density genotype data \cite{Han_2011}. While generating IBD matrices is the primary function of IBDLD, it also has several other important applications, including: calculation of inbreeding coefficients for each genotyped individual per chromosome; per chromosome estimation of allelic homozygosity (homozygous by descent probability [HBD]); and segmental sharing analysis. Unless otherwise stated, all analysis runs of IBDLD were set at 10000 simulations. Inbreeding coefficients, calculated on a per chromosome basis using the -ibc function of IBDLD, were averaged for each individual to obtain an overall genome-wide FMARKER. To summarise the number of consanguineous relationships, a cut-off at 4 decimal places was applied to determine final FMARKER values. This cut-off was introduced because a number of individuals were observed to have F values in the range of $1\times10^{-8}$ to $1\times10^{-6}$; these values are so close to 0 that it was deemed more accurate to treat these individuals as showing no inbreeding in their ancestry.
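The averaging and cut-off step can be illustrated with a minimal Python sketch. The function names are hypothetical, the per-chromosome values are assumed to have already been read from the IBDLD output, the average is assumed to be unweighted across chromosomes, and the 4-decimal cut-off is interpreted here as rounding (truncation would be an equally plausible reading of the text).

```python
def genome_wide_f(per_chrom_f, decimals=4):
    """Average per-chromosome inbreeding coefficients for one individual
    and apply the decimal-place cut-off (assumed here to be rounding)."""
    mean_f = sum(per_chrom_f) / len(per_chrom_f)
    return round(mean_f, decimals)


def is_inbred(per_chrom_f, decimals=4):
    """An individual counts as inbred if F is still above 0 after the cut-off."""
    return genome_wide_f(per_chrom_f, decimals) > 0
```

Under this reading, an individual whose per-chromosome values all sit around $1\times10^{-6}$ is reported as F=0 and is not counted among the individuals showing inbreeding.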
Runs of homozygosity were estimated using the -hbd function of IBDLD, again on a per chromosome basis. IBDLD refers to marker-derived homozygosity as homozygous by descent (HBD), which is the probability that two alleles are inherited from a single source (ancestor). IBDLD generates a file of HBD results in which each SNP is assigned a probability between 0 and 1 (0 being no probability of HBD, 1 being 100% probability of HBD) for each genotyped individual. To visualise these results, both genome-wide and chromosome-wide plots of average HBD were generated in R [24]. 'Peaks', or extended runs of HBD (homozygosity), were inferred as sets of consecutive SNPs that all exhibit an average HBD probability greater than the population average of 0.011.
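The peak definition above amounts to finding maximal stretches of consecutive SNPs whose average HBD exceeds the threshold. A minimal Python sketch follows; the function names and the input layout (parallel lists of base-pair positions and per-SNP average HBD for one chromosome, sorted by position) are assumptions made for illustration and do not reflect the IBDLD output format or the R code used in the study.

```python
def _summarise(positions, mean_hbd, start, stop):
    """Summarise SNPs start..stop-1 as (start_bp, end_bp, n_snps, mean HBD)."""
    segment = mean_hbd[start:stop]
    return (positions[start], positions[stop - 1],
            stop - start, sum(segment) / len(segment))


def hbd_runs(positions, mean_hbd, threshold=0.011):
    """Return every maximal run of consecutive SNPs whose average HBD
    probability is strictly greater than the threshold."""
    runs = []
    start = None
    for i, value in enumerate(mean_hbd):
        if value > threshold:
            if start is None:
                start = i  # a new run opens at this SNP
        elif start is not None:
            runs.append(_summarise(positions, mean_hbd, start, i))
            start = None
    if start is not None:  # a run that extends to the last SNP
        runs.append(_summarise(positions, mean_hbd, start, len(mean_hbd)))
    return runs
```

The sketch reports runs of any length, including single SNPs; a minimum length or SNP count would be an additional filter that the text does not specify.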

Correlations between Inbreeding and Endophenotypes

Isolated populations have repeatedly been shown to have increased power to detect disease-associated markers \cite{Heutink_2002,Kristiansson_2008}; this is due to unique properties, such as population inbreeding, that arise from founder effects and genetic bottlenecks. To explore the potential importance and power of the NI pedigree for detecting disease-associated markers, correlation analyses between estimates of inbreeding and endophenotypes for CVD were carried out. All correlations were generated in R [24] using Pearson's correlation (2-tailed P-values were generated). The endophenotypes examined here included ...........
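For readers unfamiliar with the statistic, the following Python sketch shows how Pearson's r and its associated t statistic are computed. It is an illustration only, not the R code used in the study, and the function names are hypothetical; the two-tailed P-value is obtained by referring the t statistic to a t distribution with n-2 degrees of freedom, which is left to a statistics library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:  # perfect correlation: the statistic is unbounded
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))
```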

Results

Homozygosity and Inbreeding in Norfolk Island Pedigree

Inbreeding reduces genetic diversity within a population. The level of inbreeding (consanguinity) within a population is measured by the inbreeding coefficient (F). Pedigree-based inbreeding (FPED) was estimated using the reconstructed core-pedigree (N=1388). The mean FPED was 0.011, with a maximum FPED=0.28. Of the 1388 individuals in this pedigree, 400 were estimated to have an inbreeding coefficient greater than zero, indicating that inbreeding has occurred at some point in their ancestry. Genetic markers can also be used to estimate inbreeding. Using data from 502 SNP-genotyped individuals from the core NI pedigree, we calculated FMARKER values. Figure 1 shows average FMARKER values per chromosome. It was observed that 87% (N=439) of all 502 genotyped individuals exhibited an F value greater than 0 (using a cut-off at 4 decimal places; see Methods for detail). Using this information the global average F was calculated (F=0.011), with a maximum F=0.215 being observed. It is interesting to note that the mean FPED and mean FMARKER are identical (0.011); this supports the accuracy of the updated core-pedigree and the approach used in the cleaning and reconstruction process.
The next step was to characterise the patterns of inbreeding across the genome of the core pedigree individuals. This was performed by calculating runs of homozygosity by descent (HBD), i.e. areas of the genome that show a reduction of genetic diversity due to inheritance of identical alleles. Figure 2 shows a genome-wide profile of HBD probability across genotyped core-pedigree members. Visualisation of the HBD data was split into 2 separate plots. Figure 2 A displays an average locus-specific homozygosity profile for those individuals exhibiting high HBD probability (defined as at least one locus with HBD > 0.75). The average HBD for this subset of individuals was 0.042. The peaks arising from individuals with higher than normal levels of HBD may indicate smaller groups of more closely related individuals within the wider NI pedigree structure. These groups could be of interest for investigating disease associations, or could identify smaller sub-pedigrees to further facilitate the tracking of complex traits such as migraine and ocular disorders (glaucoma), both of which show increased prevalence in the NI population \cite{Cox_2012,Maher_2012,Sherwin_2011}. Figure 2 B shows the average genome-wide locus-specific homozygosity profile for all 502 genotyped core-pedigree individuals. The average genome-wide HBD for all genotyped individuals was 0.011; it should be noted that this is exactly the same value as the estimated FMARKER for these individuals. This is because both methods measure inbreeding, with FMARKER being a genome average and HBD being locus-specific. This broader genome-wide profile of all genotyped individuals shows numerous areas of greatly increased HBD; several of the longer genomic regions (HBD > 0.011) are detailed in Table 2. The largest observed 'peak' of HBD probability, on chromosome 6, was 0.13; the surrounding run spans 18 Mb and shows an average HBD of 0.028 across the span.
This peak on chromosome 6 was unique when compared to other areas of increased HBD, in that it showed multiple peaks within the same run of HBD (Figure 3). Interestingly, the area of highest HBD lies on top of the well-defined human leukocyte antigen (HLA) region, a highly variable area of the genome that is well studied and known for its role in the immune response and disease. Another region of high HBD was observed as 2 peaks on chromosome 11. The second peak lies on a large family of olfactory receptor genes. These genes are important in the detection and interpretation of odours \cite{Farley_2004}, and are reported to show increased genetic variation in order to account for the potentially limitless number of detectable odours \cite{Buck_2004}. Additional File 1 shows exactly the same data as the genome-wide figures, but visualised in smaller blocks of chromosomes in order to better display the location and extent of HBD across a given chromosomal region.

Correlation between inbreeding/HBD and CVD endophenotypes

An exploratory correlation analysis was conducted to investigate relationships between genomic inbreeding and 10 CVD risk traits (endophenotypes). Table 3 shows that all 10 traits exhibited some evidence of association with inbreeding (P<0.05). The strongest correlation was between CVD risk and FMARKER (Pearson's r=0.389, $P<2.4\times10^{-11}$). These new results are consistent with previously reported associations between inbreeding and CVD-related traits in the NI population \cite{Macgregor2009}. The current study therefore supports these findings, and builds upon them with previously unidentified trait relationships, which could indicate important areas for future research.

Discussion 

This study explored the unique genomic structure that underlies the NI pedigree. This structure has resulted from the rich history of the original Bounty Mutineers and Polynesian founders, being shaped by genetic bottlenecks, founder effects and population admixture over the span of approximately 200 years. Admixture and inbreeding coefficients have previously been estimated in the NI population, and using these metrics it has been established that there are correlations between ancestry and CVD risk in NI \cite{Macgregor_2009,Benton_2015}. More specific population effects upon genomic structure, such as locus-specific admixture and runs of homozygosity, have not previously been explored in the NI population. These indices have potential implications for disease association and will provide important foundations for future studies in NI, especially in the investigation of disease phenotypes that differ in frequency between the European and Polynesian ancestral populations.
Using a dense set of SNPs, an average inbreeding coefficient of F=0.011 was calculated for the NI population. An F of this value corresponds to the offspring of parents whose relationship lies somewhere between second cousins (0.0156) and second cousins once removed (0.0078). This estimate is substantially smaller than that calculated previously by Macgregor et al., F=0.044 \cite{Macgregor2009}; there are several potential explanations for this. Firstly, this study was able to use a more accurate, updated pedigree. Secondly, a greater depth of data was available, in the form of a dense set of SNP genotypes as opposed to a small set of microsatellites (STRs). This genomic data and the updated genealogical information facilitated the reconstruction of a more accurate representation of the core NI pedigree, which should improve the accuracy of such calculations.
As HBD is related to inbreeding, genomic regions that show increased HBD probabilities could potentially identify loci that lack genetic diversity. There are several such areas with above average HBD in the genotyped core-pedigree NI population. This drop in genetic diversity has major implications, especially in areas such as the HLA region on chromosome 6, which was identified as exhibiting the largest average HBD. It is well established that the HLA region contains a large number of immune function related genes, many coding for immune cell receptors that bind and recognise antigens (foreign peptides). High genetic variation is critical to the function of the HLA region, as it allows near limitless variation to be introduced to the receptor's antigen 'recognition site' from a limited number of genes, which in turn increases the potential number of foreign antigens that can be detected. Decreased variation in this region could potentially be detrimental.
Chromosome 11 also showed a concentrated region of increased HBD; one particular peak was observed over a large region of olfactory receptor genes. Olfactory receptors determine the way in which an organism interprets odours \cite{Buck_2004,Malnic_2004}. As with the HLA region, the olfactory receptor genes are limited in number, and increased variation within the gene family is therefore required to enable detection and interpretation of a near limitless range of possible odours \cite{Malnic_1999,Niimura_2006,Khafizov_2006}. Decreased genetic variation across this region could thus lead to a reduction in odour detecting abilities. Interestingly, HLA may also be related to the detection and perception of odour, with several studies observing an association between HLA variation and odour preference. This may be involved in mate selection \cite{Ober1997,Jacob_2002}, as at least one study found a lower than expected rate of HLA similarity between spouses in an isolated community \cite{Ober_1997}. Additionally, research has shown that more married couples have distinctly different HLA (MHC) genomic backgrounds than would be expected by chance alone, suggesting that selection may favour diversity within the immune systems of offspring so that they are able to adapt to the threat of new diseases. Another reason for this could be an avoidance of inbreeding in an attempt to maintain a higher amount of genetic diversity within a population. In this context it is interesting that this study has identified decreased variation, in the form of increased HBD, at both the HLA and olfactory receptor gene loci, which warrants further exploration.
The initial aim was to identify relationships between disease-related endophenotypes and indices of the unique structural properties measured in the NI population. Using updated estimates of both marker-derived admixture and inbreeding, numerous significant relationships were observed between both measures and a range of metabolic and CVD-related traits, which included a robustly calculated risk score for CVD and clinical diagnosis of Metabolic Syndrome. This is not the first study to identify these factors showing increased prevalence within the NI cohort. An initial study by Bellis et al. (2006) established the basis of increased risk for CVD-related disorders and outlined baseline phenotype data. This was followed by several linkage analyses using STR data, which established initial genomic maps and identified loci showing significant associations with CVD traits \cite{Bellis_2007,Bellis_2008}. This study builds upon the previous work and identifies novel findings. It is the first study to examine high-density SNP data in association with CVD/metabolic related traits, and to use an integrative genomics approach. These strong relationships between structure and risk traits provide evidence that the reconstruction of the NI core-pedigree is robust and that the genomic data (SNPs) are concordant with it. This provides confidence for future disease gene mapping studies, including association, linkage and 'admixture-mapping' studies, in this population.
 

Conclusion

The study of isolated populations requires an understanding of their unique population history and admixture, which give rise to unique genomic structure. Genomic structure in populations has the potential to influence genetic associations with disease, and is therefore important to consider in future study design. This knowledge can then be used appropriately as a valuable tool in disease mapping and association studies. This work increases the accuracy of previous estimates of inbreeding and documents for the first time runs of HBD in the NI population. Additionally, this study identified significant correlations between these unique structural components and disease risk traits for Metabolic Syndrome and CVD. Importantly, both increased prevalence of Metabolic Syndrome and underlying population/genetic based associations with it have been identified in the NI pedigree. This provides strong justification for further examination of the NI population in the context of Metabolic Syndrome risk and prevalence. Future research should focus on the identified areas of locus-specific admixture and HBD in light of the correlations with MetS and related traits.

Competing interests

The authors have no competing interests to declare.

Authors' Contributions

Contributions here....

Funding

This research was supported by funding from a National Health and Medical Research Council of Australia (NHMRC) Project Grant. It was also supported by infrastructure purchased with Australian Government EIF Super Science Funds as part of the Therapeutic Innovation Australia - Queensland Node project. MCB was supported by a Corbett Postgraduate Research Scholarship. The SOLAR statistical genetics computer package is supported by a grant from the US National Institute of Mental Health (MH059490). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Acknowledgements

We extend our appreciation to the Norfolk Islanders who volunteered for this study. Additionally, we would like to acknowledge Amanda Miotto and also QUT for providing computational support for this project.