Business Listings Data Sets

Below are links to the business-listings data sets used in the Local Algorithms for Interactive Clustering study.

Each data set corresponds to an over-clustering instance (a cluster intersecting several ground-truth clusters) studied in the "Clustering Business Listings" section. Each over-clustering instance was split using several different algorithms, and the correctness of each resulting split was then evaluated.

These data sets have been anonymized: each one is represented by an n x n similarity matrix with entries in [0,1] and a 1 x n ground-truth vector with entries in {1,2,...,k}. Here n is the number of elements in the data set and k is the number of ground-truth clusters that the over-clustered cluster intersects.
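The representation described above can be sanity-checked with a few lines of Python. This is only a minimal sketch: the function name `check_instance` and the use of nested lists are assumptions made for illustration, not part of the data sets. It relies on the fact that the over-clustered cluster intersects all k ground-truth clusters, so every label from 1 to k should appear at least once.

```python
def check_instance(similarity, labels):
    """Validate one anonymized instance and return k.

    similarity: list of n rows, each a list of n floats (hypothetical layout)
    labels:     list of n integer ground-truth labels
    """
    n = len(labels)
    if n == 0:
        raise ValueError("ground-truth vector is empty")
    # The similarity matrix must be n x n, matching the 1 x n label vector.
    if len(similarity) != n or any(len(row) != n for row in similarity):
        raise ValueError("similarity matrix must be n x n with n = len(labels)")
    # Every similarity value must lie in [0, 1].
    if any(not 0.0 <= s <= 1.0 for row in similarity for s in row):
        raise ValueError("similarity entries must lie in [0, 1]")
    # Labels must be exactly the set {1, 2, ..., k}.
    k = len(set(labels))
    if set(labels) != set(range(1, k + 1)):
        raise ValueError("labels must cover exactly {1, ..., k}")
    return k
```

For a toy instance such as `check_instance([[1.0, 0.3], [0.3, 1.0]], [1, 2])`, the function returns the number of ground-truth clusters; it raises `ValueError` when the shape, value range, or label set does not match the description.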

There are 20 files specifying the similarity matrices and a single file specifying the ground-truth labels, which has 20 rows. The i-th row of the ground-truth labels file gives the ground-truth vector for the i-th similarity matrix.
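The pairing of matrices and label rows can be sketched as follows. The on-disk format is not stated here, so this sketch assumes plain text with one row per line and values separated by whitespace or commas; the helper names `read_rows` and `load_instance` are hypothetical. Because n differs between data sets, the rows of the labels file have different lengths, so the file is parsed line by line rather than as a rectangular array.

```python
import io


def read_rows(handle, convert):
    """Parse a text stream into a list of rows, skipping blank lines.

    Assumes whitespace- or comma-separated values (an assumption about the
    file format, not something the data-set description states).
    """
    rows = []
    for line in handle:
        fields = line.replace(",", " ").split()
        if fields:
            rows.append([convert(field) for field in fields])
    return rows


def load_instance(matrix_handle, labels_handle, index):
    """Return (similarity, labels) for the index-th data set (1-based).

    matrix_handle: open text stream for the index-th similarity matrix file
    labels_handle: open text stream for the single ground-truth labels file
    """
    similarity = read_rows(matrix_handle, float)
    # Labels may be stored as "2" or "2.0"; accept both.
    all_labels = read_rows(labels_handle, lambda field: int(float(field)))
    labels = all_labels[index - 1]  # i-th row pairs with i-th matrix
    if len(labels) != len(similarity):
        raise ValueError("label row length does not match matrix size")
    return similarity, labels


# Toy demonstration with in-memory text instead of the real files.
toy_matrix = io.StringIO("1.0 0.2\n0.2 1.0\n")
toy_labels = io.StringIO("1 1 2\n1 2\n")
toy_similarity, toy_truth = load_instance(toy_matrix, toy_labels, 2)
```

With the real files, the two `io.StringIO` objects would be replaced by `open(...)` calls on the downloaded similarity matrix and ground-truth labels files, under whatever names they were saved.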

Similarity Matrix 1

Similarity Matrix 2

Similarity Matrix 3

Similarity Matrix 4

Similarity Matrix 5

Similarity Matrix 6

Similarity Matrix 7

Similarity Matrix 8

Similarity Matrix 9

Similarity Matrix 10

Similarity Matrix 11

Similarity Matrix 12

Similarity Matrix 13

Similarity Matrix 14

Similarity Matrix 15

Similarity Matrix 16

Similarity Matrix 17

Similarity Matrix 18

Similarity Matrix 19

Similarity Matrix 20

Ground-Truth Labels