Streaming Clustering: The term data stream refers to a fast, continuous sequence of information.
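As a minimal sketch of this model, assume a numeric stream delivered by a Python generator. The code below reads each item exactly once and keeps only a constant-size summary (count, mean and variance, updated with Welford's method) instead of the raw items. The names `sensor_stream`, `RunningStats` and `summarize`, and the Gaussian readings, are hypothetical and only illustrate the idea.

```python
import random
from typing import Iterable, Iterator


def sensor_stream(n: int, seed: int = 0) -> Iterator[float]:
    """Hypothetical data source: yields n readings one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(10.0, 2.0)


class RunningStats:
    """Constant-memory summary of a numeric stream (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Incorporate one new item; the item itself is not stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the items seen so far."""
        return self._m2 / self.n if self.n else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    """Read each item exactly once; only the summary is kept."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


if __name__ == "__main__":
    s = summarize(sensor_stream(100_000))
    print(s.n, round(s.mean, 2), round(s.variance, 2))
```

Because a generator cannot be rewound, passing the same `sensor_stream` object to `summarize` a second time returns an empty summary, which mirrors the one-pass access of a stream.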
For such data, the data stream model is more suitable than the conventional static dataset model. It is also a suitable model for accessing large datasets stored in secondary memory, where processing must run in linear time. A data stream has the following characteristics:
(i) it consists of a continuous flow of large volumes of data;
(ii) it arrives rapidly and in real time, with requirements for a quick response;
(iii) the stream cannot be accessed multiple times, so each data item can be read only once;
(iv) storage is limited, so only a summary of the data can be saved, which makes finding the crucial data a challenging task; and
(v) it is multidimensional, so dedicated algorithms are required to mine streaming data.
The methods of data stream clustering are hierarchical methods, partitioning methods, grid-based methods, density-based methods and model-based methods, which are described below.
· Hierarchical methods: Hierarchical clustering techniques can be divided into two types, agglomerative and divisive. Agglomerative clustering successively merges a set of n objects into larger, more general clusters, whereas divisive clustering successively splits the n objects into smaller clusters.
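The bottom-up merging of the agglomerative approach can be sketched as follows. This is a naive batch illustration, not a streaming algorithm such as ODAC: it assumes 2-D points, Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), keeps all points in memory and runs in roughly cubic time. The function name `agglomerative` and its stopping parameter `k` are hypothetical; stopping at a user-chosen `k` is one way of fixing the number of clusters manually.

```python
from math import dist
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def agglomerative(points: List[Point], k: int) -> List[List[Point]]:
    """Illustrative single-linkage agglomerative clustering (hypothetical name).

    Starts with n singleton clusters and repeatedly merges the closest
    pair of clusters until only k clusters remain.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        assert best is not None
        _, i, j = best
        # j > i, so popping j leaves index i unchanged.
        clusters[i].extend(clusters.pop(j))
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
    print(agglomerative(pts, k=3))
```

For the points (0, 0), (0, 1), (10, 10), (10, 11) and (20, 0) with k = 3, the two close pairs are merged and (20, 0) remains a singleton cluster.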
Hierarchical agglomerative clustering (HAC) is the more frequently used of the two, and it offers the option of determining the number of clusters manually. Online divisive-agglomerative clustering (ODAC) is a clustering technique for time-series data streams that combines the agglomerative and divisive hierarchical approaches.
· Partitioning methods: Partitioning techniques such as k-median and k-means are also used for data stream clustering. A k-median-based example is the STREAM LSEARCH algorithm, which was proposed for high-quality clustering of data streams. It works in two stages: first, the STREAM algorithm determines the sample size; then, when the sample size is larger than the value given by a predefined equation, the LSEARCH algorithm is applied. A modified k-means algorithm has been used to cluster binary data streams, and in several experiments it performed far better than the scalable k-means approach.
· Grid-based methods: Grid-based clustering algorithms such as WaveCluster have the distinctive property that their processing time does not depend on the number of data points, which makes them fast.
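The idea that grid-based methods work on cells rather than on individual points can be illustrated with a simple single-resolution grid: one pass quantizes each point into a cell and counts it, and the clustering step then joins adjacent dense cells. Only the counting pass touches every point; the clustering step depends on the number of occupied cells. This is a generic sketch, not WaveCluster, which additionally applies a wavelet transform over a multi-resolution grid. The names `grid_counts` and `dense_cell_clusters`, the parameters `cell_size` and `min_count`, and the choice of edge-adjacent (4-neighbour) cells are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative sketch: names, cell_size, min_count and 4-neighbour
# adjacency are assumptions, not part of any specific published method.
Cell = Tuple[int, int]


def grid_counts(points: Iterable[Tuple[float, float]], cell_size: float) -> Dict[Cell, int]:
    """One pass over the points: map each point to its grid cell and count."""
    counts: Dict[Cell, int] = defaultdict(int)
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return dict(counts)


def dense_cell_clusters(counts: Dict[Cell, int], min_count: int) -> List[Set[Cell]]:
    """Join edge-adjacent dense cells into clusters with a flood fill.

    This step only looks at cells, so its cost depends on the number of
    occupied cells rather than on the number of original points.
    """
    dense = {cell for cell, n in counts.items() if n >= min_count}
    clusters: List[Set[Cell]] = []
    seen: Set[Cell] = set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        component: Set[Cell] = set()
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            component.add((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    pts = [(0.2, 0.3), (0.5, 0.5), (0.8, 0.1), (1.2, 0.4), (1.5, 0.5),
           (1.9, 0.9), (5.1, 5.2), (5.5, 5.5), (5.9, 5.1), (9.5, 9.5)]
    counts = grid_counts(pts, cell_size=1.0)
    print(dense_cell_clusters(counts, min_count=2))
```

With `cell_size=1.0` and `min_count=2`, these ten points yield two clusters, {(0, 0), (1, 0)} and {(5, 5)}; the lone point at (9.5, 9.5) falls in a sparse cell and is treated as noise.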
Such grid-based algorithms use a multi-resolution grid structure that divides the object space into a predefined number of cells.
· Density-based methods: Density-based algorithms can detect arbitrarily shaped clusters and can handle noise, but they require time to scan the raw data. Such algorithms do not require prior knowledge of the number of clusters (k), unlike k-means algorithms, which must be given the number of clusters in advance.
Advantage: It is scalable and robust, and it offers good speed and storage capacity.
Disadvantage: