A Survey on Data Clustering


Garima Singhal
Department of ECE, NIT Arunachal Pradesh, India
Sahadev Roy
Department of ECE, NIT Arunachal Pradesh, India, sahadevroy@gmail.com


Data clustering is a method for grouping unlabelled data by similarity in type, specification, and other properties. The present collection focuses on approaches that help to retrieve and categorize data, assessed by processing speed, the size of data they can support, complexity, and memory requirement. Navigating a huge unlabelled collection of data makes it challenging for researchers to select an optimal clustering technique. This paper presents a survey based on an analysis of existing data clustering algorithms, in order to ease that search and help readers select an appropriate clustering algorithm for their requirements. The algorithms covered in this paper have applications in pattern recognition, image processing, data mining, machine learning, and artificial intelligence. The survey may also serve readers as an accessible introduction to the mature literature on advances in computing and their development.


Artificial Neural Networks (ANN);
Fuzzy c-means algorithm (FCM);
Genetic algorithm (GA);
Self-Organising map (SOM).

Cited as

Garima Singhal and Sahadev Roy, “A Survey on Data Clustering,” International Journal of Advanced Engineering and Management, Vol. 2, No. 8, pp. 183-188, 2017.

DOI: https://doi.org/10.24999/IJOAEM/02080042.



