Streaming clustering of large data sets stored in secondary memory

Streaming Clustering: The term data stream refers to a rapidly generated, continuous sequence of information.

The data stream model is often more suitable than a static data set. It also serves as a model for accessing large data sets stored in secondary memory, where linear performance is required. A data stream has the following characteristics:
(i) it consists of a continuous flow of large volumes of data;

(ii) the data evolves quickly and arrives in real time, so it demands quick responses;
(iii) multiple passes over the data stream are not possible, so each data item can be accessed only once;
(iv) storage for the data stream is restricted, so only a summary of the data can be saved, which makes finding the crucial data a challenging task; and
(v) the data is multidimensional, so dedicated algorithms are required to mine streaming data.

The methods of data stream clustering are hierarchical methods, partitioning methods, grid-based methods, density-based methods, and model-based methods, which are described below.

· Hierarchical methods: Hierarchical clustering techniques can be divided into two kinds, namely agglomerative and divisive. Agglomerative methods successively merge a set of ‘n’ objects into more general categories, whereas divisive methods successively divide the ‘n’ objects into smaller clusters.

Hierarchical agglomerative clustering (HAC) is the more frequently used of the two, with the option of manually determining the number of clusters. Online divisive agglomerative clustering (ODAC) is a time series data stream clustering technique that combines the agglomerative and divisive hierarchical approaches.

· Partitioning methods: Partitioning techniques such as k-median and k-means are also used for data stream clustering. The k-median-based STREAM LSEARCH algorithm has been proposed for high-quality clustering of data streams. It works in two steps, starting with the determination of the sample size by the STREAM algorithm. Then, when the size of the sample is larger than the value determined from a predefined equation, the LSEARCH algorithm is applied. The k-means algorithm has also been used to cluster binary data streams; in several experiments, the modified algorithm performed far better than the scalable k-means approach.

· Grid-based methods: Grid-based clustering algorithms such as WaveCluster have the distinctive characteristic that their processing time does not depend on the number of data points, which makes them fast.

These algorithms use a multi-resolution grid structure, which separates the object space into a predefined number of cells.

· Density-based methods: These methods are able to detect arbitrarily shaped clusters and to handle noise, and they require time to scan the raw data. Such algorithms do not require prior knowledge of the number of clusters (k), unlike k-means algorithms, which must be given the number of clusters in advance.
Advantage: scalability, robustness, speed, and storage capacity.
Disadvantage:
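
Returning to the partitioning methods and to the single-pass, limited-storage constraints listed earlier, the idea can be illustrated with a minimal Python sketch of sequential (online) k-means. This is not the STREAM LSEARCH algorithm or the binary-stream k-means variant mentioned above; it is a simpler, well-known technique swapped in for illustration. The names OnlineKMeans and cluster_stream are hypothetical, and seeding the centroids with the first k points of the stream is an assumption made to keep the example short.

```python
from typing import Iterable, List, Sequence


class OnlineKMeans:
    """Sequential k-means sketch: each point is read once, then discarded.

    Hypothetical illustration only; not the STREAM LSEARCH algorithm.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[List[float]] = []  # the only stored summary
        self.counts: List[int] = []              # points absorbed per cluster

    def update(self, point: Sequence[float]) -> int:
        """Absorb one point and return the index of the cluster it joined."""
        # Assumption: the first k points of the stream seed the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append([float(v) for v in point])
            self.counts.append(1)
            return len(self.centroids) - 1

        # Find the nearest centroid by squared Euclidean distance.
        def sq_dist(i: int) -> float:
            return sum((c - v) ** 2 for c, v in zip(self.centroids[i], point))

        nearest = min(range(self.k), key=sq_dist)

        # Running-mean update: move the centroid toward the point by 1/n.
        self.counts[nearest] += 1
        step = 1.0 / self.counts[nearest]
        centroid = self.centroids[nearest]
        for d, value in enumerate(point):
            centroid[d] += step * (value - centroid[d])
        return nearest


def cluster_stream(stream: Iterable[Sequence[float]], k: int) -> OnlineKMeans:
    """Consume the stream in a single pass and return the fitted summary."""
    model = OnlineKMeans(k)
    for point in stream:  # each item is accessed exactly once
        model.update(point)
    return model
```

Only the k centroids and their counts are kept in memory, so storage does not grow with the length of the stream. As with any k-means variant, the number of clusters k still has to be chosen in advance.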