Abstract:
Bioinformatics and computational systems biology fuse several branches of
applied science and engineering. Their interplay exploits basic sciences such as
mathematics, physics, chemistry and computer science, which each present a partial picture, and
biological sciences such as molecular biology, structural biology and systems biology, which
present a whole picture. To unravel complicated biological issues, this research work deals
with interdisciplinary aspects of contemporary bioinformatics and computational systems
biology as defined by the NIH's working definition of bioinformatics and computational
biology (http://www.bisti.nih.gov/docs/compubiodef.pdf) in which many science fields
come under the umbrella of bioinformatics with numerous applications in the field of life
sciences. Contemporary bioinformatics deals with computational tools providing a user-friendly
environment for the dissemination of life sciences knowledge through existing
biological databases in a particular domain. Along similar lines, computational systems
biology, which has its roots in the life sciences, primarily deals with analytical, data-oriented
techniques and mathematical modelling to study and analyze complex biological systems.
To this end, this research work addresses four research concerns for progressive biological
discoveries using bioinformatics and systems biology approaches.
At present, research consistently points to a key challenge for
integrative biology: providing physiological models that could facilitate the development of
novel drugs against diseases such as cancer and Alzheimer’s disease, for which
effective therapeutics do not currently exist. Even though such full physiological models
are not always attainable due to inadequate biological data and/or their appropriate
integration, functional genomics can be currently considered as a reliable functional basis
upon which such models are expected to rely. The research work provides novel insights
into how biological databases, which are essentially descriptive physiological models,
can be functionally improved in terms of contemporary bioinformatics depending on the
accessibility and integration of data. Most researchers agree that the challenge lies in
managing, analysing, interpreting, modelling and understanding all the
biological data that are being produced. However, a major issue prevails: all the above-mentioned
tasks are handled differently at different laboratories throughout the world,
producing a plethora of biological data. To fill this research gap, an omics or integrative
genomics revolution is needed that uses the power of gene ontology (GO). The first concern
of this work is to provide theoretical models to achieve this herculean task of integrating
biological data by moving from knowledge gained from functional genomics to
physiological models. Physiology can be understood as the science of the functioning of
living systems, and a better understanding of many pathological conditions is the
ultimate goal of full physiological models. To approach a full physiological model, a tremendous
amount of biological knowledge contained in various databases needs to be sorted out by
discriminating different types of data subjected to double integration: (i) vertically, from
the molecular level, over the cell and organ levels, all the way to the level of a whole organism,
and (ii) horizontally, comprising gene, anatomy and phenotype data. As such, a
hypothetical full physiological model is supposed to cover the full biological process (BP),
full cellular component (CC) and full molecular function (MF) aspects, each with its specific
full ontology. Connecting individual ontologies from various data resources is a
key step leading to a universal full physiological model. In short, the proposed model is
supposed to have its full BP, CC and MF (BPCCMF) with their specific full ontologies.
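The idea of a model covering all three GO aspects can be illustrated with a minimal sketch. It is not the data model used in this work: the gene names, the tuple layout and the helper names `group_by_aspect` and `has_full_coverage` are assumptions made for illustration, and the only GO identifiers used are the three root terms of the ontology.

```python
from collections import defaultdict

# The three GO aspects named in the text.
ASPECTS = ("BP", "CC", "MF")

# Hypothetical annotations as (gene, aspect, GO term) tuples. The gene names
# are invented; the GO IDs are the three root terms of the Gene Ontology.
EXAMPLE_ANNOTATIONS = [
    ("geneA", "BP", "GO:0008150"),  # biological_process (root)
    ("geneA", "CC", "GO:0005575"),  # cellular_component (root)
    ("geneA", "MF", "GO:0003674"),  # molecular_function (root)
    ("geneB", "BP", "GO:0008150"),  # geneB lacks CC and MF annotations
]


def group_by_aspect(annotations):
    """Collect the GO terms of each gene, separated by aspect."""
    grouped = defaultdict(lambda: {aspect: set() for aspect in ASPECTS})
    for gene, aspect, term in annotations:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {aspect!r}")
        grouped[gene][aspect].add(term)
    return dict(grouped)


def has_full_coverage(record):
    """True if a gene record has at least one term in each of BP, CC and MF."""
    return all(record[aspect] for aspect in ASPECTS)


if __name__ == "__main__":
    for gene, record in group_by_aspect(EXAMPLE_ANNOTATIONS).items():
        print(gene, "full BP/CC/MF coverage:", has_full_coverage(record))
```

In this sketch a gene counts as fully covered only when every aspect holds at least one term, which mirrors the requirement that a full model carries BP, CC and MF together.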
Building on the concept of full physiology, this research work implements an illustration
using a plant physiological model; the same approach can be
extended to other organisms, pathological conditions, etc. The second primary concern in
this work is the development of a gene ontology data mining tool using contemporary
bioinformatics, focusing on the design of a plant physiology database that represents all
biological knowledge unambiguously and in a computationally tractable way. The idea of serving
the plant science community by using the power of contemporary bioinformatics came from
the fact that plants have been the most studied organisms since the advent of classical genetics. Recent
studies show that plants are biologically more complex and that there are enormous
applications to be gained from researching plant genes: improving the uptake of
nutrients from the soil, enhancing plant yields and tackling plant diseases that directly affect the
health of humans. This research work focuses on providing a centralized plants physiology
database as a new searching and investigating tool built by mining plant gene ontology
data from the GO database. The application of contemporary database management led to the
development of the Plants Physiology Database (PPDB), a searching and browsing tool based
on the mining of large amounts of gene ontology data currently available. The PPDB is
publicly available and freely accessible online (http://www.iitr.ernet.in/ajayshiv/) through
a user-friendly environment built with Drupal 6.24.
Another focus of this work is the systems biology of cancer. The last decade has
witnessed the emergence of a new field of research, systems biology, which captures
biological phenomena with data analysis, modelling and computational tools.
Generations of scientists and physicians have dedicated their life to improving patient care
and fighting against cancer. Systems biology offers promising insights to defeat cancer.
Cancer is a major health issue: it was responsible for 8.2 million deaths in 2012, and 14.1 million
new cancer cases were reported in 2013 worldwide (http://globocan.iarc.fr). It is
anticipated that the global yearly number of deaths will reach 17 million in 2030. As
such, research progress in cancer treatment is real but insufficient. Cancer is a genetic
disease that causes a deregulation of gene networks that control cell growth and
dissemination. As a result, methods for modelling gene networks are central to any modern
approach to the molecular biology of cancer. Moreover, the sequencing of the human
genome and the subsequent genomic revolution have impacted cancer research at the molecular
level through high-throughput technologies such as microarrays and their microarray databases (MDB).
As such, this research work focuses on both aspects of the systems biology of cancer,
separating it into two computational approaches: data-driven systems
biology and model-driven systems biology. Data-driven models are based on
computational statistical tools that can handle high-throughput MDB and are termed top-down
models. They involve two types of statistical analysis: low-level
analysis, which covers background correction and normalization using a model-based
expression index (MBEI) method, and high-level analysis, which covers filtering of
genes to find interesting genes, hierarchical clustering of the filtered genes, genetic association
studies and gene ontology data mining/enrichment analysis. This core workflow of
microarray data analysis is the third research concern in this work. The invaluable
information produced after analyses can pave the way for innovative opportunities for
early diagnosis of malignancies. This research work can enhance further research in
diagnostics, prognostics, disease markers, target validation and targeted therapies using
contemporary bioinformatics at a later stage. The list of significant or differentially
expressed genes helps to find the functional relationships between genes in MDB
warehouses by linking them to GO annotations. For instance, a precautionary double
mastectomy after finding a BRCA1 mutation carrying an estimated 87% chance of developing the
disease shows the promising nature of this field.
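The enrichment step of the high-level analysis can be sketched as follows. This is a minimal illustration of one common formulation, a one-sided hypergeometric test, and is not necessarily the procedure used in this work; the function name `hypergeom_enrichment_p` and all gene counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, hits):
    """One-sided hypergeometric p-value for GO term enrichment.

    population: number of genes on the array (background set)
    annotated:  number of background genes annotated with the GO term
    selected:   number of differentially expressed genes
    hits:       number of selected genes annotated with the GO term

    Returns P(X >= hits) when `selected` genes are drawn without
    replacement from the background.
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie in [0, population]")
    if not (0 <= hits <= min(annotated, selected)):
        raise ValueError("hits must lie in [0, min(annotated, selected)]")
    total = comb(population, selected)
    # Sum the probability mass of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, min(annotated, selected) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    p = hypergeom_enrichment_p(population=20000, annotated=200, selected=100, hits=8)
    print(f"enrichment p-value: {p:.3g}")
```

A small p-value indicates that the GO term appears among the filtered genes more often than expected by chance. When many GO terms are tested, a multiple-testing correction would normally be applied on top of this.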
On the other hand, dynamical mathematical models can
provide novel insights that cannot be obtained from experiments alone. The model-driven, or
bottom-up, approach is the opposite of the top-down approach. A
bottom-up model begins with a hypothesis about a biological mechanism. Given this
hypothesis, equations are written down to describe how the components of the biological
system interact with one another. Simulations are then run to generate predictions of what
would happen under different conditions. Keywords associated with bottom-up
models include ordinary differential equations, partial differential equations, stochastic
models, methods for parameter estimation and computational tools of dynamical systems used to
interpret the output. The final research concern is the development of
models consisting of systems of differential equations and the use of computational tools of
dynamical systems to interpret the results of these simulations. Therefore, a multiscale
computational approach to tumour growth modelling is presented. A mathematical
model is developed for tumour growth and angiogenesis to simulate solid tumour
growth and progression, together with chemotherapy and anti-angiogenesis drug estimation, using
partial differential equation (PDE) modelling. The PDE compartmental model
incorporated spatiotemporal processes including cellular and tissue-mediated diffusion,
cellular transport and migration, cell proliferation, angiogenesis, apoptosis, vessel
maturation and formation to model tumour progression and transition from avascular to
vascular growth. The angiogenesis process, coupled with the solid tumour growth model in
a reaction–diffusion kinetics framework, portrayed the spatiotemporal development of the
generalised functions of a tumour’s micro-environment, viz. nutrients and growth factors
that regulate the tumour’s growth during angiogenesis. Most cancers involve an
epidermal growth factor receptor/extracellular signal-regulated kinase (EGFR/ERK)
signalling pathway that is linked to the cell-division cycle and promotes tumour cell growth.
Treatment is studied using tyrosine kinase inhibitors (TKI) of EGFR signalling, which are
distributed through the blood vessels of a tumour’s microvasculature. This showed considerable
potential for in-vitro experiments owing to the availability of clinical and expression
data, which help in learning about the responses to treatment. Using ordinary
differential equations to model the downstream pathway of EGFR
signalling (SOS, RAS, RAF, MEK, ERK, PI3K, AKT), we performed
computational simulations to determine the effect of glucose, oxygen, tumour
angiogenesis factor (TAF), drug (TKI), transforming growth factor alpha (a ligand of EGFR)
and angiogenesis inhibitor. The simulation results showed that TKI-EGFR
and IGF1R signalling regulation of active, migrating, proliferative,
apoptotic and quiescent cells could act as a unified behaviour across the entire profile of tumour
growth. The results established the dual role played by angiogenesis, as TKI-EGFR
and VEGF inhibitors are supplied to diminish tumour invasion. In addition, the
neovasculature can transport nutrients to neoplastic cells to sustain cell metabolism, thus
enhancing the rate of cell survival. Hence, simulation results suggest that the co-expression
of EGFR and IGF1R activates a higher number of ERK molecules than
down- or over-expression. There is good agreement between the simulations, an
experimental wild-type mouse model and clinical data.
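The bottom-up workflow described above (state a mechanism, write the equations, simulate) can be illustrated with a deliberately simplified sketch. It does not reproduce the multiscale tumour model of this work: it swaps in the one-dimensional Fisher-KPP equation, du/dt = D d2u/dx2 + r u (1 - u), as the simplest reaction-diffusion instance, with u read as a normalised tumour cell density. The function names, the explicit finite-difference scheme and all parameter values are assumptions made for illustration.

```python
def step(u, diffusion, growth, dx, dt):
    """Advance the density profile by one explicit Euler time step."""
    n = len(u)
    new = u[:]
    for i in range(n):
        # Zero-flux boundaries: mirror the neighbour at each end of the domain.
        left = u[i - 1] if i > 0 else u[1]
        right = u[i + 1] if i < n - 1 else u[n - 2]
        laplacian = (left - 2.0 * u[i] + right) / dx ** 2
        reaction = growth * u[i] * (1.0 - u[i])  # logistic proliferation
        new[i] = u[i] + dt * (diffusion * laplacian + reaction)
    return new


def simulate(n=101, diffusion=1.0, growth=1.0, dx=1.0, dt=0.1, steps=200):
    """Grow a small seed placed at the centre of a 1D domain."""
    # Stability condition of the explicit scheme for the diffusion term.
    if diffusion * dt / dx ** 2 > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    u = [0.0] * n
    u[n // 2] = 0.5  # hypothetical initial seed of tumour cells
    for _ in range(steps):
        u = step(u, diffusion, growth, dx, dt)
    return u


if __name__ == "__main__":
    profile = simulate()
    centre = len(profile) // 2
    print("density at centre:", round(profile[centre], 3))
    print("density at edge:  ", round(profile[0], 6))
```

Proliferation saturates the density near the seed while diffusion carries a front outwards. A fuller model along the lines of the text would couple further fields (nutrients, growth factors, vessels, drugs) through additional equations of the same form.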
In conclusion, this work may not solve the numerous
convoluted issues in the field of biotechnology, but it addresses issues in gene ontology
data mining using contemporary bioinformatics, taking the example of a plants physiology
database, and presents state-of-the-art work related to cancer systems biology.