Computational Disease Gene Identification Strategy for Osteoporosis Candidate Genes

European Musculoskeletal Review, 2008;3(2):12-16

Abstract

Complex diseases such as osteoporosis are influenced by multiple genes with small individual effects, the environment and the interaction of the two. The identification of the genetic components of complex diseases is one of the greatest challenges for human geneticists. For the past two decades the dominant study design has been linkage analysis in families, which identifies broad intervals of several megabases of DNA (quantitative trait loci) that correlate with disease status in pedigrees. Linked DNA intervals can encompass dozens to hundreds of candidate genes that may be involved in or causal for the disease. The transition from quantitative trait locus to gene has proved difficult because complete functional information is lacking for the majority of genes within these susceptibility loci and because knowledge of the link between gene function and disease is limited.

Osteoporosis is a complex disease characterised by low bone mineral density (BMD) and fractures, particularly of the spine, hip and wrist. Osteoporosis is highly heritable, with an increased rate of concordance in monozygotic versus dizygotic twins and a substantially increased incidence in individuals with a positive family history. To date, more than 10 genome-wide linkage scans across multiple populations have been launched to hunt for osteoporosis susceptibility genes.1–2 Some significant or suggestive chromosomal regions of linkage to BMD have been identified and replicated in genome-wide linkage screens.

The next daunting task is to identify key candidate genes within these confirmed regions. Exhaustive surveys of all variations in the intervals are needed to determine which genes within these chromosomal regions account for the linkage signals found. Despite the recent drop in genotyping cost, this kind of study is still expensive and in many cases is not feasible. Several promising bioinformatics tools are now available for disease gene identification.3–7 An attractive alternative strategy is to identify candidate genes using bioinformatics tools and then to follow up with conventional experimentation (e.g. gene-wide and tag single nucleotide polymorphism [SNP]-based association analyses in large populations and/or family samples). This two-step approach will greatly expedite gene discovery in complex diseases such as osteoporosis.

Computational Disease Gene Identification Tools
Disease Gene Prediction (www.cgg.ebi.ac.uk/services/dgp/)3
Disease gene prediction (DGP) assigns each gene a probability of involvement in hereditary disease, using parameters based on conservation, phylogenetic extent, protein length and paralogy pattern. It does not assume any particular phenotype and does not account for specific phenotype features.

GeneSeeker (www.cmbi.ru.nl/geneseeker/)4
GeneSeeker is a web tool that filters positional candidate disease genes based on expression and phenotypic data from both human and mouse models. It directly queries several online databases for localisation information, phenotypic and expression data through the web. GeneSeeker points to genes that have differential expression levels in disease-related tissues compared with unaffected tissues.
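The filtering idea behind this kind of tool can be illustrated with a minimal sketch. It is not GeneSeeker's actual implementation: the `Gene` record, the `positional_candidates` function, the gene symbols, the coordinates and the tissue labels are all hypothetical, and only the expression criterion is modelled (phenotype data are omitted).

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical gene record: position plus tissues with reported expression."""
    symbol: str
    chrom: str
    start: int
    end: int
    tissues: set = field(default_factory=set)


def positional_candidates(genes, chrom, lo, hi, tissues_of_interest):
    """Return symbols of genes overlapping chrom:lo-hi that are expressed
    in at least one of the tissues of interest."""
    wanted = set(tissues_of_interest)
    return [
        g.symbol
        for g in genes
        if g.chrom == chrom          # same chromosome as the linked interval
        and g.start <= hi            # gene overlaps the interval
        and g.end >= lo
        and g.tissues & wanted       # expressed in a disease-related tissue
    ]


# Toy data (invented for illustration only).
genes = [
    Gene("GENE_A", "1", 1_200_000, 1_250_000, {"bone", "liver"}),
    Gene("GENE_B", "1", 3_000_000, 3_100_000, {"brain"}),
    Gene("GENE_C", "1", 7_000_000, 7_050_000, {"bone"}),
    Gene("GENE_D", "2", 2_000_000, 2_010_000, {"bone"}),
]

candidates = positional_candidates(genes, "1", 1_000_000, 5_000_000, {"bone"})
print(candidates)
```

In this toy example only GENE_A lies within the interval and is expressed in bone; the other genes fail either the positional or the expression filter.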

PROSPECTR and SUSPECTS (PandS) (www.genetics.med.ed.ac.uk/suspects/)5
PROSPECTR differentiates between genes that are likely to be involved in disease and those that are not. It uses sequence-based features such as gene length, protein length and percentage identity of homologues in other species. Genes with scores over a threshold of 0.5 are classified as likely to be involved in some form of human hereditary disease, while genes with scores below that threshold are classified as unlikely to be involved in disease. SUSPECTS scores candidate genes using the PROSPECTR score and also assesses the similarity of their gene expression profiles, shared InterPro domains and Gene Ontology annotations to those of already known disease genes (training genes) for a particular disorder.
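The threshold rule described above can be written out as a short sketch. The scores and gene names below are invented, the `classify` function is hypothetical, and the treatment of a score of exactly 0.5 (classified here as unlikely) is an assumption, since only scores over or below the threshold are described.

```python
THRESHOLD = 0.5  # threshold stated in the text


def classify(scores, threshold=THRESHOLD):
    """Label each gene 'likely' if its score is over the threshold,
    otherwise 'unlikely' (a score equal to the threshold is an assumption)."""
    return {
        gene: "likely" if score > threshold else "unlikely"
        for gene, score in scores.items()
    }


# Invented scores for three hypothetical positional candidates.
scores = {"GENE_A": 0.82, "GENE_B": 0.31, "GENE_C": 0.50}
labels = classify(scores)
print(labels)
```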

Endeavour (www.esat.kuleuven.be/endeavour)6
Endeavour is a software application for the prioritisation of test genes based on a user-defined training set of genes already known to be involved in the disease of interest. The ranking of a test gene is based on its similarity to the training genes with respect to literature, functional annotation, microarray expression, expressed sequence tag (EST) expression, protein domains, protein–protein interactions, pathway membership, cis-regulatory modules, transcriptional motifs and sequence similarity.
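The general idea of ranking test genes separately for each data source and then combining the rankings can be sketched as follows. This is an illustration only: the fusion step here is a simple mean of ranks, swapped in for brevity and not claimed to be the method Endeavour uses, and the function names, data source labels and similarity scores are all hypothetical.

```python
def rank_by_source(similarity):
    """similarity: {gene: score}, higher = more similar to the training genes.
    Returns {gene: rank}, with rank 1 for the most similar gene."""
    ordered = sorted(similarity, key=lambda g: (-similarity[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def fuse_rankings(per_source):
    """per_source: {source: {gene: score}}; every source is assumed to score
    every test gene. Returns genes ordered by mean rank across sources."""
    ranks = {src: rank_by_source(sim) for src, sim in per_source.items()}
    genes = next(iter(per_source.values())).keys()
    mean_rank = {
        g: sum(r[g] for r in ranks.values()) / len(ranks) for g in genes
    }
    # Sort by mean rank, breaking ties alphabetically for determinism.
    return sorted(genes, key=lambda g: (mean_rank[g], g))


# Invented similarity scores of three test genes to a training set.
per_source = {
    "expression":   {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "annotation":   {"GENE_A": 0.2, "GENE_B": 0.8, "GENE_C": 0.5},
    "interactions": {"GENE_A": 0.7, "GENE_B": 0.9, "GENE_C": 0.1},
}

prioritised = fuse_rankings(per_source)
print(prioritised)
```

GENE_A ranks first on expression but last on annotation, so GENE_B, which ranks consistently well across all three sources, is placed first overall.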

Prioritizer (www.prioritizer.nl)7
Prioritizer ranks genes based on their functional interactions with genes located in other susceptibility loci, on the assumption that the disease genes for a specific disorder are usually functionally related. Its underlying Bayesian gene network was constructed from a large number of predicted interactions, relying on Gene Ontology annotations. The network is then used to rank the positional candidates on the basis of their interactions.
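The underlying assumption, that candidates in one locus are more plausible when they are functionally linked to genes in other loci, can be illustrated with a deliberately simplified sketch. It counts only direct links between genes in different loci and does not reproduce Prioritizer's network or scoring; the `rank_candidates` function, locus names, gene names and links are all hypothetical.

```python
def rank_candidates(loci, edges):
    """loci: {locus_name: [genes]}; edges: iterable of (gene_a, gene_b)
    functional links. Each gene scores one point per direct link to a gene
    in a different locus. Returns {locus_name: genes sorted best-first}."""
    locus_of = {g: name for name, genes in loci.items() for g in genes}
    score = {g: 0 for g in locus_of}
    for a, b in edges:
        # Ignore links to genes outside the loci and links within one locus.
        if a in locus_of and b in locus_of and locus_of[a] != locus_of[b]:
            score[a] += 1
            score[b] += 1
    return {
        name: sorted(genes, key=lambda g: (-score[g], g))
        for name, genes in loci.items()
    }


# Invented loci and functional links.
loci = {
    "locus1": ["A1", "A2"],
    "locus2": ["B1", "B2"],
    "locus3": ["C1"],
}
edges = [("A2", "B1"), ("A2", "C1"), ("B1", "C1"), ("A1", "A2")]

ranking = rank_candidates(loci, edges)
print(ranking)
```

Here A2 and B1 rise to the top of their loci because each is linked to genes in two other loci, whereas the A1–A2 link is ignored because both genes lie in the same locus.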