Discussion
Assessing the current landscape of the PhenomeCentral dataset shows a
steady growth in the deposition of cases for the purposes of
matchmaking. Most of these cases are deeply phenotyped, with an average
of 11 HPO terms annotated per case, and many cases containing additional
medical and family history information. A high degree of genotypic
diversity was also observed within the PhenomeCentral dataset, with over
3,200 unique candidate genes flagged in total. Finally, all
PhenomeCentral cases were subjected to internal matching, and about 70%
of cases were also consented to matching with other MME data
repositories. Together, internal and external matchmaking queries returned
over 62,000 matches across the entire dataset, ultimately
leading to the identification of multiple novel disease-gene
associations.
PhenomeCentral is based on the PhenoTips software, which has a number of
advantages. Over the past five years, PhenoTips has been actively
implemented in hospital systems around the world to enable improved
care for RD patients, lowering the barrier to entry for PhenomeCentral as
more physicians become familiar with the similar interface. Having PhenoTips
software at the core of clinical and research databases also expedites
the migration of clinical data into research, as all PhenoTips instances
support the export and import of the same standardized data files, as
well as automated deposition of de-identified cases into PhenomeCentral.
Based on user feedback, we devoted considerable resources to developing
the revised matchmaking filters and the My Matches table. As the MME
network continues to grow and more data is deposited for matchmaking, we
have begun to approach a point where nearly every candidate gene returns
matches with other cases (Osmond et al., in press). Combined with the
reality that older matchmaking submissions continue to receive new
matches years after their initial submission, it is critical that
matchmaking nodes provide users with the tools to filter and track up to
thousands of matches simultaneously. The new filters and My Matches
table represent initial steps towards providing users with such tools;
however, additional changes will be required so that matchmaking remains
efficient for users.
The presence of high-quality phenotypic data in cases submitted to
matchmaking represents another way to reduce the time required to
resolve an increasing number of matches. The matchmaking experiences of
the Care4Rare Canada research team suggest that while most MME nodes
support the storage of standardized phenotypic data, more than half of
cases in the MME are submitted with little to no information on clinical
features (Osmond et al., in press). As a result, most matches are
difficult to resolve on an initial review, and require lengthy email
exchanges with the matching user to determine whether a given match is
of interest. Conversely, matches with cases from nodes where phenotypic
data is frequently provided, such as PhenomeCentral, could be ruled out
on initial review over 50% of the time, drastically reducing the number
of follow-up emails required. As the number of cases and candidate genes
submitted to matching continues to grow, it will be critical for nodes to
emphasize the importance of submitting phenotypic data to ensure current
matchmaking solutions remain practical. The philosophy of PhenomeCentral
is that the upfront effort of contributing phenotypic data to cases
ultimately saves time in the matchmaking process.
We believe that the current design of PhenomeCentral is well positioned
for novel approaches to matchmaking, which will utilize genomic sequence
data to increase the number of matches made for a given gene. The MME
framework is currently based on two-sided matchmaking, an approach where
both cases in a match have the same identified candidate gene. An
iteration on this approach, called one-sided matchmaking, would instead
allow users to directly query the genomic sequence data of patient
records in a database for variants in a candidate gene. In the future,
zero-sided matchmaking, a process in which algorithms use genomic
sequence data and phenotypic information to highlight matches of
interest, may also become a reality. One-sided and zero-sided
matchmaking, while requiring patient consent to a greater degree of data
sharing, both have the potential to increase the number of matches made
for a candidate gene. They will also have a greater need for detailed
phenotypic data associated with cases to ensure that the larger numbers
of matches can be reviewed without resorting to lengthy email exchanges.
PhenomeCentral is well positioned for these matchmaking approaches,
as it already allows users to upload sequencing data for cases, provides
consent checkboxes to indicate which cases may use this data for
matchmaking, and can display matches between candidate genes and
variants identified in sequencing data.
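The distinction between two-sided and one-sided matchmaking described above can be illustrated with a minimal sketch. This is not the PhenomeCentral or MME implementation; the `Case` record, its field names, the two functions, and the gene labels are all hypothetical and chosen only for illustration. Sequence data is reduced here to the set of genes in which a case carries variants, and zero-sided matchmaking is not covered.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Case:
    """Hypothetical, simplified case record (not the PhenomeCentral schema)."""
    case_id: str
    # Genes the submitter has flagged as candidates.
    candidate_genes: Set[str] = field(default_factory=set)
    # Genes with variants in the uploaded sequence data (a simplification).
    variant_genes: Set[str] = field(default_factory=set)
    # Whether the sequence data may be used for matchmaking.
    sequence_consent: bool = False


def two_sided_matches(query: Case, cases: List[Case]) -> List[str]:
    """Match only when both cases list the same candidate gene."""
    return [
        c.case_id
        for c in cases
        if c.case_id != query.case_id
        and query.candidate_genes & c.candidate_genes
    ]


def one_sided_matches(query: Case, cases: List[Case]) -> List[str]:
    """Match the query's candidate genes against variants found in the
    sequence data of other cases, restricted to consented cases."""
    return [
        c.case_id
        for c in cases
        if c.case_id != query.case_id
        and c.sequence_consent
        and query.candidate_genes & c.variant_genes
    ]


# Illustrative data only; "GENE_A" is a placeholder, not a real gene symbol.
query = Case("A", candidate_genes={"GENE_A"})
cases = [
    query,
    Case("B", candidate_genes={"GENE_A"}),
    Case("C", variant_genes={"GENE_A"}, sequence_consent=True),
    Case("D", variant_genes={"GENE_A"}, sequence_consent=False),
]

print(two_sided_matches(query, cases))  # ['B']
print(one_sided_matches(query, cases))  # ['C']
```

In this toy example, case C is invisible to two-sided matchmaking because no candidate gene was flagged for it, but it is found by the one-sided query; case D is excluded because consent to use its sequence data was not given.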
In summary, PhenomeCentral has continued to grow as a data repository
dedicated to gene discovery and finding diagnoses for unsolved rare
disease patients through matchmaking. The current dataset consists of
cases from both large research groups and individual researchers, and
contains a wide variety of candidate genes and computer-readable
phenotypes. The development of new features such as robust matching
filters and cohort-wide matching tables has helped PhenomeCentral users
more efficiently manage an ever-growing number of matches. Finally, an
emphasis on contributing high-quality genotypic and phenotypic data to
matchmaking has both aided MME users in the quick resolution of many
matches (Osmond et al., in press) and positioned PhenomeCentral to
contribute to more sophisticated forms of matchmaking in the future.