Abstract
In computer science, a database is a large repository of data. To make
databases usable across a large geographic area, many organizations either
centralize or distribute their data. In a distributed database, data is stored
at multiple sites. A distributed database system consists of loosely coupled
sites that share no physical components. Fragmentation and replication are two
techniques for distributing data across sites. In replication, electronic data
is frequently copied to different sites, and the copies of the database stored
at multiple sites are transparent to end users. Replication has three types:
snapshot replication, transactional replication and merge replication. In this
paper, I discuss which type of replication is better under which conditions.
Keywords: Database, Replication and Fragmentation.
In computer science, a database is a large repository of data, or a collection
of information that is organized in some manner. Databases can easily be
accessed, updated and managed. The software used to manage a database is called
a database management system (DBMS), and a database together with its DBMS is
called a database system. Because data must be communicated over large
geographic areas, data should be distributed. Several approaches are used for
data communication over large geographic areas, such as centralized databases,
distributed databases and cloud computing. A centralized database has many
drawbacks; for example, if the network fails, the data cannot be accessed at
all. In a distributed database, data is spread over multiple sites and copies of
the same data are stored at different sites, so if one site goes down, its data
can be retrieved from another site. A centralized database also places too much
transaction load on the central server. In a distributed database environment,
about 90% of transactions are local, so they can easily be handled by the local
sites. Cloud computing also relies on distributed database systems, and large
internet companies such as Google, Facebook and YouTube use cloud computing.
In a distributed database, there are two techniques for distributing the data:
fragmentation and replication. In fragmentation, the data is split into
fragments and different fragments are stored at different sites. In
replication, an electronic copy of the data is stored at different sites, such
as another computer or server. Because the data is replicated, users can access
the data relevant to their tasks without interfering with other users.
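The difference between the two techniques can be illustrated with a small
Python sketch. It is a toy model, not a DBMS feature: the "account" rows, the
"region" column used as the fragmentation key, and the site names are all
hypothetical. Horizontal fragmentation gives each site a disjoint subset of
the rows, while full replication gives every site its own copy of the whole
relation.

```python
from copy import deepcopy

# Hypothetical "account" relation; the rows and the "region" column are made up.
ACCOUNTS = [
    {"id": 1, "region": "north", "balance": 500},
    {"id": 2, "region": "south", "balance": 300},
    {"id": 3, "region": "north", "balance": 750},
]


def fragment(relation, column):
    """Horizontal fragmentation: each site stores only the rows whose
    `column` value maps to it (here the column value is used as the site name)."""
    sites = {}
    for row in relation:
        sites.setdefault(row[column], []).append(row)
    return sites


def replicate(relation, site_names):
    """Full replication: every site stores its own copy of the whole relation."""
    return {site: deepcopy(relation) for site in site_names}


fragments = fragment(ACCOUNTS, "region")
replicas = replicate(ACCOUNTS, ["site_1", "site_2"])

print({site: [r["id"] for r in rows] for site, rows in fragments.items()})
# {'north': [1, 3], 'south': [2]}
print({site: [r["id"] for r in rows] for site, rows in replicas.items()})
# {'site_1': [1, 2, 3], 'site_2': [1, 2, 3]}
```

The union of the fragments is the original relation, so no row is lost; the
replicas, by contrast, duplicate every row at every site.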
The architecture of a distributed database is shown in the following figure.
Data replication improves the availability of data by copying data to multiple
sites. Data can be fully replicated, which means that a complete copy of the
database is stored at every site. Data can also be partially replicated, which
means that only some fragments of the database are replicated. Following are
some advantages of data replication.
Availability: If one of the sites containing fragment F fails, fragment F can be
obtained from another site. Thus, queries involving fragment F can continue to
be processed in spite of the failure of one site.
Parallelism: For read queries on a relation R, execution becomes faster because
the sites containing R can process queries in parallel.
Reduced data transfer: Replication reduces the movement of data over the
network, which increases processing speed. The more replicas of a relation there
are, the greater the chance that the required data is found at the site where
the transaction is executing.
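The read-side advantages above can be illustrated with a short Python sketch
of replica selection. Everything in it is hypothetical: the fragment names F1
and F2, the sites S1 to S3, the placement table and the helper functions. F1 is
fully replicated and F2 is partially replicated, matching the two options
described earlier. A real system would also weigh network cost and site load
when choosing a replica.

```python
# Hypothetical placement of fragments on sites S1..S3.
# F1 is fully replicated (every site); F2 is partially replicated (two sites).
PLACEMENT = {
    "F1": {"S1", "S2", "S3"},
    "F2": {"S1", "S3"},
}


def live_replicas(fragment, failed_sites):
    """Sites that hold `fragment` and are currently up, in a stable order."""
    live = sorted(PLACEMENT[fragment] - set(failed_sites))
    if not live:
        raise RuntimeError(f"no live replica of {fragment}")
    return live


def read_fragment(fragment, local_site, failed_sites=()):
    """Choose the site that serves a read of `fragment`."""
    live = live_replicas(fragment, failed_sites)
    # Reduced data transfer: use the local copy when there is one.
    if local_site in live:
        return local_site
    # Availability: otherwise any surviving replica can answer the query.
    return live[0]


def spread_reads(fragment, queries, failed_sites=()):
    """Parallelism: assign read queries round-robin across live replicas."""
    live = live_replicas(fragment, failed_sites)
    return {q: live[i % len(live)] for i, q in enumerate(queries)}


print(read_fragment("F2", "S1"))                       # S1 (local copy)
print(read_fragment("F2", "S1", failed_sites={"S1"}))  # S3 (S1 is down)
print(spread_reads("F1", ["q1", "q2", "q3", "q4"]))
# {'q1': 'S1', 'q2': 'S2', 'q3': 'S3', 'q4': 'S1'}
```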
There are also some disadvantages of data replication.
Increased overhead on update: Updating data becomes complex because every
replica must be updated.
More storage space: More storage space is required to store the same data at
different sites.
Complex concurrency control: Concurrency-control techniques become more advanced
and more expensive.
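The update overhead can be made concrete with a minimal Python sketch of a
read-one/write-all rule. The class and its methods are hypothetical, and the
sketch deliberately omits site failures, locking and any commit protocol (such
as two-phase commit) that a real distributed DBMS would need. It only shows that
a read touches one copy while a single logical write touches every copy.

```python
class ReplicatedItem:
    """Hypothetical data item with one copy per site (read-one/write-all).
    Failures, locking and commit protocols are deliberately left out."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}

    def read(self, site):
        # A read touches a single copy.
        return self.copies[site]

    def write(self, value):
        # An update must reach every copy, so its cost grows with the replica count.
        for site in self.copies:
            self.copies[site] = value
        return len(self.copies)  # number of copies that had to be written


item = ReplicatedItem(["S1", "S2", "S3", "S4"], value=100)
writes = item.write(150)
print(writes)            # 4 copies updated for one logical write
print(item.read("S3"))   # 150
```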
In general, data replication enhances the performance of read operations and
increases the availability of data. However, updates incur greater overhead.
When several transactions update data concurrently, data replication becomes
much more complex than a centralized approach. We can simplify the management of
the replicas of a relation r by choosing one of them as the primary copy of r.
For example, in a banking system, an account can be associated with the site at
which the account was opened. Similarly, in an airline-reservation system, a
flight can be associated with the site at which the flight originates.
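The primary-copy idea can be sketched in Python as follows. This is a toy
model under stated assumptions: the class, the branch names and the
"balance" key are hypothetical, and propagation simply copies the primary's
state to the secondaries. Every update, wherever it originates, is applied at
the primary site first, so conflicting updates are ordered in one place.

```python
class PrimaryCopyReplica:
    """Hypothetical primary-copy scheme: one primary site orders all updates."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.sites = {primary: {}, **{s: {} for s in secondaries}}
        self.log = []  # updates in the order the primary accepted them

    def update(self, key, value, origin):
        # Updates from any site are forwarded to the primary and applied there.
        self.log.append((origin, key, value))
        self.sites[self.primary][key] = value

    def propagate(self):
        # The primary then pushes its current state to every secondary copy.
        for site, data in self.sites.items():
            if site != self.primary:
                data.update(self.sites[self.primary])


# An account associated with the (hypothetical) branch where it was opened.
acct = PrimaryCopyReplica(primary="branch_A", secondaries=["branch_B", "branch_C"])
acct.update("balance", 500, origin="branch_B")
acct.update("balance", 450, origin="branch_C")
before = acct.sites["branch_B"].get("balance")  # None: not yet propagated
acct.propagate()
after = acct.sites["branch_B"]["balance"]       # 450
print(before, after)
```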
SQL Server is software used for database management. SQL Server 2000 provides
three types of data replication: snapshot, transactional and merge, each with
its own benefits. Before discussing the types of replication, one should know a
few terms: publisher, distributor and subscriber.
A publisher is a database instance that makes data available to other locations
through data replication. The publisher may have one or more publications, each
containing a logically related set of data to replicate. A distributor is a
database instance that acts as a store for replication-specific data associated
with one or more publishers. Each publisher is associated with a single database
(the distribution database) at the distributor. The distributor stores
replication status data and metadata about the publication, and in some cases it
acts as a queue for data moving from the publisher to the subscribers. In many
cases a single database server instance acts as both publisher and distributor;
this is called a local distributor. When the publisher and the distributor are
configured on different database server instances, the distributor is called a
remote distributor. A subscriber is a database instance that receives replicated
data. A subscriber can receive data from several publishers. Depending on the
type of replication, the subscriber can also pass data changes back to the
publisher or republish the data to other subscribers.
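The three roles can be pictured with a toy Python model. It is not SQL
Server's API: the classes, the publication name "sales_pub", the table
"orders" and the delivery counter are hypothetical. The publisher only
publishes tables that belong to a publication (SQL Server calls these
articles), the distributor queues the changes and tracks simple status, and
each subscriber receives the data of the publications it subscribes to.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    name: str
    tables: dict = field(default_factory=dict)

    def apply(self, table, rows):
        # Receive replicated data for one table.
        self.tables[table] = list(rows)


@dataclass
class Distributor:
    # Queue of pending changes plus a simple status counter (toy metadata).
    queue: deque = field(default_factory=deque)
    delivered: int = 0

    def enqueue(self, publication, table, rows):
        self.queue.append((publication, table, rows))

    def deliver(self, subscriptions):
        # subscriptions maps a publication name to its list of subscribers.
        while self.queue:
            publication, table, rows = self.queue.popleft()
            for sub in subscriptions.get(publication, []):
                sub.apply(table, rows)
                self.delivered += 1


@dataclass
class Publisher:
    name: str
    distributor: Distributor
    publications: dict = field(default_factory=dict)  # publication -> table names

    def publish(self, publication, table, rows):
        if table not in self.publications.get(publication, set()):
            raise KeyError(f"{table} is not part of {publication}")
        self.distributor.enqueue(publication, table, rows)


# Publisher and distributor live in one process here, like a local distributor.
dist = Distributor()
pub = Publisher("HQ", dist, {"sales_pub": {"orders"}})
s1, s2 = Subscriber("branch1"), Subscriber("branch2")
pub.publish("sales_pub", "orders", [(1, "pen")])
dist.deliver({"sales_pub": [s1, s2]})
print(s1.tables, dist.delivered)  # {'orders': [(1, 'pen')]} 2
```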
In snapshot replication, a “snapshot” of the data on one server is taken and
moved to another server, or to another database on the same server. After the
initial synchronization snapshot, replication can refresh the data in the tables
on a schedule you specify. Snapshot replication is the easiest type of
replication to set up and maintain because all of the data is copied each time a
table is refreshed. However, between scheduled refreshes, the data on the
publisher might be very different from the data on the subscriber.
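The behavior of snapshot replication, and its staleness between refreshes, can
be shown with a small Python sketch. The class and the "prices" table are
hypothetical, and the schedule is represented by explicit calls to refresh()
rather than by a real job scheduler.

```python
import copy


class SnapshotReplication:
    """Hypothetical snapshot replication: the whole table is re-copied at each refresh."""

    def __init__(self, publisher_table):
        self.publisher = publisher_table
        self.subscriber = copy.deepcopy(publisher_table)  # initial synchronization snapshot

    def refresh(self):
        # Every refresh copies all rows, whether or not they changed.
        self.subscriber = copy.deepcopy(self.publisher)
        return len(self.subscriber)  # number of rows copied

    def is_stale(self):
        return self.subscriber != self.publisher


prices = {"pen": 10, "book": 250}
snap = SnapshotReplication(prices)
prices["pen"] = 12        # change on the publisher between refreshes
print(snap.is_stale())    # True: the subscriber still sees the old price
copied = snap.refresh()
print(copied)             # 2: both rows re-copied, even the unchanged one
print(snap.is_stale())    # False
```

Copying everything keeps the mechanism simple, which is why it is easy to
maintain, but the cost of each refresh grows with the table size, not with the
number of changes.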
In transactional replication, data is copied from the publisher to the
subscriber once, and then transactions are delivered to the subscriber as they
occur on the publisher. The initial copy of the data is transported using the
same mechanism as snapshot replication: SQL Server takes a snapshot of the data
on the publisher and sends it to the subscriber. As database users insert,
update or delete records on the publisher, the transactions are forwarded to the
subscriber. Transactional replication is useful in environments that have a
dependable, dedicated network between the database servers participating in the
replication. Database servers subscribing to transactional publications normally
do not update the data; they use it for read-only purposes. However, SQL Server
does support transactional replication that allows data to be changed at
subscribers.
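The difference from snapshot replication is that, after the initial snapshot,
only the changes travel. The following Python sketch models this with a
hypothetical class and a made-up "stock" table; it does not reproduce how SQL
Server captures changes. Committed transactions are queued and then replayed on
the read-only subscriber in the order they committed on the publisher.

```python
import copy


class TransactionalReplication:
    """Hypothetical transactional replication: one initial snapshot, then each
    committed transaction is forwarded to the subscriber in commit order."""

    def __init__(self, publisher_table):
        self.publisher = publisher_table
        self.subscriber = copy.deepcopy(publisher_table)  # initial snapshot
        self.pending = []  # committed transactions not yet delivered

    def commit(self, ops):
        # ops: list of ("insert" | "update", key, value) or ("delete", key, None)
        for op, key, value in ops:
            self._apply(self.publisher, op, key, value)
        self.pending.append(ops)

    def forward(self):
        # Replay transactions on the subscriber in the order they committed.
        for ops in self.pending:
            for op, key, value in ops:
                self._apply(self.subscriber, op, key, value)
        count = len(self.pending)
        self.pending.clear()
        return count

    @staticmethod
    def _apply(table, op, key, value):
        if op in ("insert", "update"):
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation {op!r}")


stock = {"pen": 100}
txrep = TransactionalReplication(stock)
txrep.commit([("insert", "book", 20), ("update", "pen", 95)])
txrep.commit([("delete", "book", None)])
delivered = txrep.forward()
print(delivered)          # 2 transactions delivered
print(txrep.subscriber)   # {'pen': 95}
```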
In merge replication, data from different sources is combined into a single
central database. As in transactional replication, merge replication begins with
an initial synchronization: a snapshot of the data on the publisher is taken and
sent to the subscribers. Unlike transactional replication, merge replication
allows the same data to be updated on both subscribers and publishers, even when
subscribers are not connected to the network. When subscribers reconnect,
replication detects the changes from all subscribers, combines them and updates
the data on the publisher. Merge replication is useful when you need to modify
data at remote sites and when subscribers are not guaranteed a continuous
connection to the network.
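Because both sides may change the same row while disconnected, merging needs a
conflict rule. The Python sketch below is a simplified, hypothetical merge: the
function, the table contents and the "publisher wins" rule are assumptions
chosen for illustration, deletes are not modelled, and SQL Server's own
conflict resolvers are more elaborate. Changes made on only one side are copied
to the other; a key changed on both sides to different values is reported as a
conflict.

```python
def merge(publisher, subscriber, pub_changed, sub_changed):
    """Combine changes made independently on both sides since the last sync.

    publisher, subscriber: dicts mapping a row key to its value (updated in place).
    pub_changed, sub_changed: keys each side modified while disconnected.
    Deletes are not modelled. A key changed on both sides to different values
    is a conflict, resolved here by a hypothetical 'publisher wins' rule.
    """
    conflicts = []
    for key in sorted(pub_changed | sub_changed):
        in_pub, in_sub = key in pub_changed, key in sub_changed
        if in_pub and in_sub and publisher.get(key) != subscriber.get(key):
            conflicts.append(key)
            winner = publisher[key]
        elif in_sub:
            winner = subscriber[key]
        else:
            winner = publisher[key]
        publisher[key] = winner
        subscriber[key] = winner
    return conflicts


pub_table = {"A": 1, "B": 2, "C": 3}
sub_table = dict(pub_table)      # initial snapshot sent to the subscriber
pub_table.update(A=10, B=20)     # changes on the publisher while offline
sub_table.update(A=11, C=30)     # changes on the subscriber while offline
conflicts = merge(pub_table, sub_table, {"A", "B"}, {"A", "C"})
print(conflicts)   # ['A']
print(pub_table)   # {'A': 10, 'B': 20, 'C': 30}
```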
Significant progress has been made in the development of heterogeneous DDBMSs.
It is, however, not yet possible to buy an off-the-shelf system that can connect
all popular data models and DBMSs and provide full support for distributed query
management, schema integration and transaction management. Different systems
exist that span a wide range of computer systems, operating systems and
networks, and gateways are being developed from these systems to other database
management systems. These systems offer only limited schema integration
capabilities, without system support for horizontal or vertical fragmentation or
replicated data, although such support is expected in the near future.
Major approaches to data access and sharing have been discussed in this
research. These approaches include initial file and database unload/load and PC
download, common interfaces on top of existing DBMSs, and R&D and prototype
efforts towards long-range goals. Commercial availability of the more
encompassing approaches may become a reality with mounting problems, opportunity
costs and demand for data sharing in the heterogeneous world. Different
proficiencies of database systems have been
I would like to thank my colleague Mr. Habibullah, Lecturer at Govt Commerce
College Gujranwala, for guiding me on how to write a term paper. I would also
like to thank my brothers: Muhammad Ali (PhD in computer science, Malaysia),
Assistant Professor at COMSATS Vehari Campus, and Arshad Shahzad (PhD in
petroleum, Italy), Head of the Petro-Gas Department at NFCIET, Multan, who
encouraged me to pursue an MS.