Cluster analysis in SPSS - nominal variables

Rich Ulrich

2017-09-01 18:08:51 UTC

Post by c***@gmail.com
Hi everybody,
For my thesis I have to do a cluster analysis with nominal variables. Unfortunately I have no clue how to do this. If you could help me out, you would be my hero :)
The participants in my study did a categorization task (open card-sorting). Is it possible to do a cluster analysis with nominal data? If yes, which cluster analysis in SPSS can I perform (because I can't find it)? If no, can you please tell me how I have to transform my data so I can perform a cluster analysis after all? I read something about transforming the data into binary data, but I'm not sure if I understood that correctly.

Google tells me that "open card-sorting" means that the number
and labels of categories are not defined beforehand.

It looks like this is being used for describing the content of
web-pages. The first page of hits that Google shows me does
not look very rigorous. The reference I browsed suggested
(a) that the users should provide their own labels for
categories; and (b) that all data be "cleaned" before analysis,
dropping subjects who did not take the task seriously.
http://uxpajournal.org/card-sort-analysis-best-practices-2/

Questions:
What level of "thesis" is this?
How much training and experience do you have in statistics?
- in using SPSS?
How many Subjects, how many cards, and how many categories?
(Maximum categories allowed, specified, or observed?)
What is the range of the number of members in a category?

Comments:
In its initial state, the data do not fit the classical
problem that SPSS calls "clustering".
The basic "data" is a coincidence matrix for each subject,
showing (Yes/No) whether each pair of cards occurs together.

If I were going to fit that into a classical cluster analysis, for
SPSS, I suppose that I would flatten the Yes/No matrix using
the upper triangle of coincidences (AB, AC, AD, ... BC, BD, ... ),
so that each subject becomes one row of 0/1 variables.
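
To make the flattening concrete, here is a minimal Python sketch
(not SPSS syntax). The function name flatten_sort, the card labels,
and the example piles are all made up for illustration; it assumes
every card was placed in exactly one pile.

```python
from itertools import combinations

def flatten_sort(cards, piles):
    """Flatten one subject's card sort into 0/1 pair variables.

    Hypothetical helper: returns a dict keyed by card pair
    (upper triangle only), 1 if both cards share a pile, else 0.
    """
    # Map each card to the index of the pile the subject put it in.
    pile_of = {card: idx for idx, pile in enumerate(piles) for card in pile}
    return {
        a + b: int(pile_of[a] == pile_of[b])
        for a, b in combinations(cards, 2)
    }

# Made-up example: four cards, one subject who formed two piles.
cards = ["A", "B", "C", "D"]
subject_piles = [["A", "B"], ["C", "D"]]
row = flatten_sort(cards, subject_piles)
print(row)
```

Each subject would give one such row; stacking the rows gives a
subjects-by-pairs table of binary variables that could then be
read into SPSS.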

Look at the frequencies of coincidences across subjects, as
percents; high frequencies show high similarities. I suppose
you can move to clustering from that. I might form initial
clusters by using only the Coincidence items whose frequencies
are nearest to 0% and 100%. I expect you might find similar,
published advice if you read online sources.
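
One possible reading of that idea, as a rough Python sketch rather
than a tested procedure: compute the percent of subjects who put
each pair together, then join cards whose pair frequency is near
100%. The 80% cutoff, the function names, and the example sorts
are my own assumptions for illustration; pairs near 0% are simply
left apart here.

```python
from itertools import combinations

def pair_percentages(cards, sorts):
    """Percent of subjects who placed each card pair in the same pile."""
    pairs = list(combinations(cards, 2))
    counts = {p: 0 for p in pairs}
    for piles in sorts:
        pile_of = {c: i for i, pile in enumerate(piles) for c in pile}
        for a, b in pairs:
            if pile_of[a] == pile_of[b]:
                counts[(a, b)] += 1
    return {p: 100.0 * n / len(sorts) for p, n in counts.items()}

def seed_clusters(cards, pct, high=80.0):
    """Join cards linked by pairs at or above the (assumed) cutoff."""
    parent = {c: c for c in cards}

    def find(c):
        # Follow parent links to the representative of c's group.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), p in pct.items():
        if p >= high:
            parent[find(a)] = find(b)
    groups = {}
    for c in cards:
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())

# Made-up data: four subjects sorting four cards.
cards = ["A", "B", "C", "D"]
sorts = [
    [["A", "B"], ["C", "D"]],
    [["A", "B"], ["C"], ["D"]],
    [["A", "B", "C"], ["D"]],
    [["A", "B"], ["C", "D"]],
]
pct = pair_percentages(cards, sorts)
clusters = seed_clusters(cards, pct)
print(pct)
print(clusters)
```

Joining on any single strong pair behaves like single linkage, so
it can chain cards together; the percent table could also be
treated as a similarity matrix for a hierarchical method instead.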

If you do your own Google search, you may find something
that gives you advice more fitting to your own problem.

--
Rich Ulrich