dc.description.abstract |
In recent years, there has been increasing interest in using networkbased
resources for large scale data-intensive computation problems. The Grid
computing systems are rapidly gaining importance due to their capability to deal
with the continuously rising demands of computational and storage resources.
A grid may consist of resources owned by a number of different organizations,
within which sharing arrangements have been established. Grid computing can
be defined as coordinated resource sharing and problem solving in dynamic,
multi-institutional collaborations.
Scheduling is an important research topic in Grid computing. The
purpose of grid is to offer high quality services to members of Virtual
Organizations (VOs) based on the efficient use of the available resources. This
goal cannot be achieved without a good scheduling strategy applied to the local
level (clusters) and global level (entire system) of Grids. Since the grid
scheduling problem is NP-hard, we can afford only sub-optimal solutions to
this problem. Grid scheduling scenario has additional complexities due to
scattered data storages and consequent delays in data transfers, coordination
required in the execution of linked tasks and arranging the required data
interchange. Further challenges are related to the dynamic nature of Grid
systems where the changing availability and quality of resources makes the
scheduling decisions still more difficult.
Optimally scheduling a single set of dependent tasks to multiple
heterogeneous resources is an algorithmically hard problem. This problem
becomes even harder when optimization of multiple criteria (at times conflicting)
is required. Quality of Service (QoS) requirements of applications and
resources add an extra layer of complexity to scheduling while matching the
tasks to resources. Due to the dynamic nature of grid resources, the scheduling
of scientific application tasks is still an issue and is an open problem. This
thesis attempts to improve upon these frontages by contributing new single
criteria and multi-criteria scheduling heuristics to satisfy performance objectives
like makespan, economic cost, resource consumption, etc while meeting the
QoS requirements like trustworthiness, availability, network bandwidth, etc.
We have classified the research work in three parts: (1) Single Criteria
Scheduling - Scheduling of application tasks meeting a single objective function
like makespan or economic cost, (2) Bi-Criteria Scheduling - Scheduling of
tasks meeting two objective functions simultaneously, and (3) QoS Oriented
Multi Criteria Scheduling - Scheduling of tasks meeting more than one
objective functions while respecting QoS requirements like bandwidth,
availability and trustworthiness of resources.
The research work in single criteria scheduling has been classified in two
categories: (1) Time based scheduling, and (2) Cost based scheduling. In time
based scheduling, we have developed a new scheduling heuristic called
Scheduling with Heterogeneity using Critical Path (SHCP) algorithm for grid
computing systems. The performance of workflow scheduling greatly depends
on task sequence obtained using b-level (longest directed path considering
average computation and communication cost from concerned task to exit
task). But, in grids, the above approach does not fit due to high heterogeneity in
computation costs of tasks on different resources and communication costs of
data transfers among different tasks (scheduled on different resources) due to
variable network bandwidth. In this research, computation and communication
heterogeneity factors have been formulated based on 'Standard Deviation'
which reflects the spread of execution time/ communication time of a task/ data
transfer in workflow. The expected computation cost of each task is determined
using computation heterogeneity factor and the expected communication cost is
computed using communication heterogeneity factor rather than taking average
computation and communication costs. The priority based task sequence is
generated using these costs. The tasks on critical path in workflow are given
priority in the task sequence. The SHCP algorithm is applied to this task
sequence to generate the schedules. The simulated result analysis shows that
SHCP generates comparatively shorter schedules for large workflow
applications in grids.
In cost based scheduling, task duplication and compaction strategy has
been adopted in scheduling. This strategy adopts a mechanism to minimize the
number of duplications (i.e., duplication cost overhead) to optimize the
schedules. The scheduling algorithms (HED and RD) have been developed and
validated for heterogeneous and homogeneous computing systems
respectively. In this approach, the schedule is obtained using duplication based
scheduling and then useless duplications and unproductive sub-schedules are
removed from the above schedule to minimize the effective consumption of
resources. The simulated result analysis shows that HED (Heterogeneous
Economical Duplication) and RD (Reduced Duplication) algorithms generate
comparable schedules with remarkably less duplication overheads and less
processor consumption in heterogeneous and homogeneous computing
environments respectively. The EDS-G (Economical Duplication Based
Scheduling in Grids) inspired from HED strategy has been developed and
validated in a simulated grid environment.
In bi-criteria scheduling, two heuristics have been suggested i.e.,
compaction based bi-criteria scheduling (SODA) and two phase bi-criteria
scheduling (DBSA). A compaction based scheduling (SODA) comprises two
stages: (1) duplication based scheduling-optimizes the primary criterion, i.e.,
execution time, (2) compaction of schedules - minimizes the processor
requirements and optimizes secondary criterion, i.e., economic cost without
increasing the makespan obtained in primary scheduling. It exploits duplication
strategy to obtain a shorter schedule in primary stage (primary scheduling) in
order to minimize makespan (primary criteria). In secondary scheduling, the
primary schedule is scanned to identify and remove the useless duplications
and unproductive schedules, if any, without letting it to increase the makespan
obtained in primary scheduling. In addition, the above modified schedule is
further improved by swapping tasks among resources (from costlier to cheaper
resource) in order to minimize economic cost (secondary criteria) without
affecting the makespan. The experimental results reveal that the proposed
approach generates schedules with low processor requirements which are fairly
optimized for both economic cost and makespan for executing DAG
applications in the grid environments.
The Two phase bi-criteria scheduling (DBSA) is an extension of SODA
approach which adopts the concept of sliding constraint in secondary
scheduling phase. The primary phase of this approach is similar to SODA
algorithm. In next phase, the primary schedule is first modified in order to
minimize useless duplications and unproductive schedules, then it is improved
by swapping tasks among resources (from costlier to cheaper resource) in
order to minimize economic cost (secondary criteria) while letting the makespan
to deteriorate within sliding constraint limits. This algorithm (DBSA) is
developed and validated in the simulated grid environments. The simulated
results show that DBSA surpasses SODA heuristic in terms of economic cost
due to sliding makespan. It also generates better schedules which are fairly
optimized in respect of both makespan and economic cost.
Further, QoS oriented multi-criteria scheduling is discussed for (a) Trust
based and (b) Availability aware scheduling. The research presented here
considers the QoS constraints (trust, availability) of grid resources for
scheduling application tasks using multiple criteria scheduling approach. This
research work has been classified as trust based multi-criteria scheduling and
availability aware QoS scheduling. In trust based multi-criteria scheduling, trust
has been used as QoS requirement of workflow tasks. Trust is a major concern
of resource users and owners in the grid environment with uncountable and at
times unreliable nodes. An intelligent scheduling mechanism is essential which
may require several different criteria to be considered simultaneously when
evaluating the quality of solution or a schedule. A trust-oriented multi-objective
scheduling (Trust-MOS) heuristic is designed to schedule the tasks to highly
trustworthy resources in the grid environment. This approach has been divided
into two phases. In primary phase, each task is scheduled on to a resource
which fulfills the trust requirements and minimizes the makespan. The schedule
generated in primary phase is then modified using a greedy approach to
minimize the economic cost as well keeping the makespan within specified limit
based on a sliding constraint. The Trust-MOS scheduling algorithm has been
implemented in simulated grid environments. The simulated results reveal that
Trust-MOS generates good quality schedules in terms of trust overheads, and
reduces probabilities of task failure, hence is well suited for executing tasks in a
secure manner.
Availability aware QoS scheduling strategy considers availability as QoS
of compute resource and bandwidth as QoS of underlying network in grid. A
novel availability aware QoS oriented algorithm (AQuA) is developed for
executing applications in grid environment. The application tasks specify the
availability and bandwidth as QoS requirements to select the highly available
compute resource over dedicated network. Such tasks get priority in scheduling
over the qualified resources. After mapping of such tasks, the other tasks not
having QoS requirements are mapped. The AQuA algorithm has been validated
in simulated grid environment and results have been compared with the exiting
QoS Guided Min-Min (QGMM) algorithm. The comparative result analysis
shows that AQuA is able to prioritize the highly available grid resources and
therefore increases the reliability to successfully execute applications without
adversely affecting the makespan. In future, the research has been proposed to
be extended towards the development of multi-criteria scheduling techniques
for simultaneous execution of multiple workflows in grids. |
en_US |