Concomitant-Variable LCA
In the procedure summarized below, it is assumed that there are V
dichotomous response variables and a single concomitant variable, Z, that
has been summarized in G ordered categories indexed as g = 1, 2, ..., G
(e.g., the concomitant variable, GPA, for the cheating data in C. Dayton,
Latent Class Scaling Analysis (Sage Publications), has been summarized in
G = 5 ordered categories). Generalization of these procedures to cases with
more than one concomitant variable is straightforward in theory but can
be tedious in practice. In outline, the nonlinear programming procedure
to use in Excel is the following:
1. Place cell labels for response vectors in Column A (e.g., 0000, 1000,
etc.); repeat the labels in blocks down the column a total of G times.
Place concomitant variable levels, $Z_g$, duplicated in blocks of size
$2^V$, in Column B. These concomitant variable levels would typically be
midpoints of intervals for ordered categories (e.g., 3.125 for the GPA
interval 3.00 - 3.25). Place the corresponding observed frequencies,
$n_{gs}$, in Column C. Note that these frequencies are for the given
response vector (i.e., s) at the given level of the concomitant variable
(i.e., g).
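The block layout described in step 1 can be sketched outside Excel as follows. This is only an illustration: the function name `layout_rows` and the five midpoints are hypothetical, and the pattern order (first item varying fastest, giving 0000, 1000, ...) is an assumption chosen to match the example labels.

```python
from itertools import product

def layout_rows(z_levels, n_items):
    """Return (response-vector label, Z level) pairs in the block order
    used for Columns A and B: one block of 2^V rows per level of Z."""
    # All 2^V response vectors; reversing makes the first item vary fastest,
    # so the labels run 0000, 1000, 0100, 1100, ... for V = 4.
    patterns = ["".join(bits)[::-1] for bits in product("01", repeat=n_items)]
    # Repeat the full set of labels once for each of the G levels.
    return [(p, z) for z in z_levels for p in patterns]

# Hypothetical interval midpoints for G = 5 ordered GPA categories.
z_levels = [2.125, 2.375, 2.625, 2.875, 3.125]
rows = layout_rows(z_levels, n_items=4)
print(len(rows))          # G * 2^V rows in total
print(rows[0], rows[16])  # first row of block 1 and first row of block 2
```

The observed frequencies would then be entered alongside these rows, one per (response vector, level) pair.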
2. Place start values for the conditional probabilities in, say, Column K
for latent class 1 (LC1) and in Column L for latent class 2 (LC2). Note
that, since $p_{01}^{AX} = 1 - p_{11}^{AX}$, etc., these restrictions
should be included as formulae in the spreadsheet (e.g., if
$p_{11}^{AX} = .2$ in cell K7, then cell K8 should contain the formula
"=1-K7" for the value of $p_{01}^{AX}$). Place start values (e.g., 1.0 and
1.0) for the logistic parameters, $b_0$ and $b_1$, below the conditional
probabilities, say, in Column K. Some trial and error with the latter
start values may be necessary to find a satisfactory solution.
3. An expected frequency of the form $N_g \times P(y_s \mid Z_g)$ must be
computed corresponding to each observed frequency. Note that $N_g$ is the
total frequency of cases at level g of the concomitant variable, where
$\sum_g N_g = N$. This computation can be conveniently broken up among
five columns. Column D (labeled "LC1"), say, contains the computed values
of $p_{i1}^{AX} \times p_{j1}^{BX} \times p_{k1}^{CX} \times p_{l1}^{DX}$.
Column E (labeled "E(LC1)") contains the product of Column D with $N_g$
and $p_{1|Z_g}^{X}$, i.e.,
$N_g \times \frac{e^{b_0 + b_1 Z_g}}{1 + e^{b_0 + b_1 Z_g}} \times p_{i1}^{AX} \times p_{j1}^{BX} \times p_{k1}^{CX} \times p_{l1}^{DX}$,
which is the component of the expected frequency arising from the first
latent class. Column F (labeled "LC2") contains the computed values of
$p_{i2}^{AX} \times p_{j2}^{BX} \times p_{k2}^{CX} \times p_{l2}^{DX}$.
Column G (labeled "E(LC2)") contains the product of Column F with $N_g$
and $1 - p_{1|Z_g}^{X}$, i.e.,
$N_g \times \frac{1}{1 + e^{b_0 + b_1 Z_g}} \times p_{i2}^{AX} \times p_{j2}^{BX} \times p_{k2}^{CX} \times p_{l2}^{DX}$,
which is the component of the expected frequency arising from the second
latent class. Column H (labeled "Expected Freq") contains the sum of
Columns E and G, which yields the expected frequencies, $n'_{gs}$, for the
sth response vector within the gth level of the concomitant variable.
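As a cross-check on the spreadsheet formulas in step 3, the five columns for a single row can be sketched in Python. The function names and all numeric values below are hypothetical, and the sketch assumes each latent class is described by the probability of a "1" response on each item, with the "0" probability obtained as its complement.

```python
import math

def class1_prob(b0, b1, z):
    """Logistic probability of latent class 1 at concomitant level z,
    i.e. e^(b0 + b1*z) / (1 + e^(b0 + b1*z))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))

def pattern_prob(pattern, p1):
    """Product of conditional probabilities for one response vector in one
    class; p1[v] is the probability of a '1' on item v (assumed layout)."""
    prob = 1.0
    for resp, p in zip(pattern, p1):
        prob *= p if resp == "1" else 1.0 - p
    return prob

def expected_row(pattern, z, n_g, p1_lc1, p1_lc2, b0, b1):
    """Return the values of Columns D-H for one row:
    (LC1, E(LC1), LC2, E(LC2), Expected Freq)."""
    w1 = class1_prob(b0, b1, z)
    lc1 = pattern_prob(pattern, p1_lc1)
    lc2 = pattern_prob(pattern, p1_lc2)
    e_lc1 = n_g * w1 * lc1
    e_lc2 = n_g * (1.0 - w1) * lc2
    return lc1, e_lc1, lc2, e_lc2, e_lc1 + e_lc2

# Hypothetical values for illustration only.
row = expected_row("1010", z=3.125, n_g=100.0,
                   p1_lc1=[0.8, 0.7, 0.9, 0.6],
                   p1_lc2=[0.2, 0.1, 0.3, 0.2],
                   b0=1.0, b1=1.0)
print(row)
```

A useful sanity check on a worksheet built this way is that the expected frequencies within each level g sum to $N_g$, since the response-vector probabilities sum to 1 within each class.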
4. In Column I, compute components of $G^2$ of the form
$n_{gs} \log_e(n_{gs} / n'_{gs})$. If $n_{gs} = 0$, set the component
equal to 0. Place a formula for twice the sum of Column I at the end of
the column; this is the $G^2$ value given the current values for the
conditional probabilities and logistic function parameters.
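The Column I computation in step 4 amounts to the following small function, shown here as a sketch with made-up frequencies (the name `g_squared` and the numbers are not from the source).

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio chi-square: twice the sum of n * ln(n / n'),
    with the component set to 0 whenever the observed frequency is 0."""
    total = 0.0
    for n, e in zip(observed, expected):
        if n > 0:  # zero cells contribute nothing, as in the spreadsheet
            total += n * math.log(n / e)
    return 2.0 * total

# Made-up observed and expected frequencies for three cells.
print(g_squared([10, 0, 5], [8.0, 2.0, 5.0]))
```

Skipping zero cells explicitly mirrors the spreadsheet rule and avoids evaluating the logarithm of 0.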
5. Set up the procedure, SOLVER, in the TOOLS menu as follows: the "Target
Cell" field to be minimized is the column sum, $G^2$; the "By Changing
Cells" field contains the locations of the start values for the
conditional probabilities and logistic function parameters; in the
"Subject to the Constraints" field add constraints corresponding to
$p_{01}^{AX} \le 1$, $p_{01}^{AX} \ge 0$, $p_{01}^{BX} \le 1$,
$p_{01}^{BX} \ge 0$, etc. Standard options for SOLVER are adequate for
many problems but may be worth modifying at times (e.g., using the
"quadratic" option under "estimates") and, in general, some trial and
error may be necessary to arrive at the optimal solution. The solution
may be sensitive to start values for the logistic parameters but does not
tend to be sensitive with respect to start values for the conditional
probabilities.
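For readers who want to see the whole minimization in one place, the sketch below mimics step 5 outside Excel. It is not the algorithm SOLVER uses: a simple derivative-free coordinate search is swapped in so that the example needs only the Python standard library. The data are synthetic (generated from hypothetical "true" parameters, without sampling noise), four items are assumed, and all names are illustrative.

```python
import math
from itertools import product

PATTERNS = ["".join(bits) for bits in product("01", repeat=4)]

def expected(params, z_levels, n_totals):
    """Expected frequency for every (level, response vector) row.
    params = 4 LC1 probabilities, 4 LC2 probabilities, then b0 and b1."""
    lc1, lc2, b0, b1 = params[:4], params[4:8], params[8], params[9]
    out = []
    for z, n_g in zip(z_levels, n_totals):
        w1 = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))  # P(class 1 | z)
        for pat in PATTERNS:
            q1 = q2 = 1.0
            for resp, p1, p2 in zip(pat, lc1, lc2):
                q1 *= p1 if resp == "1" else 1.0 - p1
                q2 *= p2 if resp == "1" else 1.0 - p2
            out.append(n_g * (w1 * q1 + (1.0 - w1) * q2))
    return out

def g_squared(params, observed, z_levels, n_totals):
    fitted = expected(params, z_levels, n_totals)
    return 2.0 * sum(n * math.log(n / e)
                     for n, e in zip(observed, fitted) if n > 0)

def coordinate_search(start, objective, step=0.1, tol=1e-4, max_sweeps=200):
    """Local search: nudge one parameter at a time, keep any improvement,
    and halve the step when a full sweep finds none."""
    best, best_val = list(start), objective(start)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                if i < 8:  # first eight entries are probabilities: bound them
                    trial[i] = min(max(trial[i], 0.001), 0.999)
                val = objective(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return best, best_val

# Synthetic illustration: hypothetical levels, totals and "true" parameters.
z_levels = [2.125, 2.375, 2.625, 2.875, 3.125]
n_totals = [60.0, 80.0, 100.0, 80.0, 60.0]
true_params = [0.8, 0.7, 0.9, 0.6, 0.2, 0.1, 0.3, 0.2, 4.0, -1.5]
observed = expected(true_params, z_levels, n_totals)

start = [0.6] * 4 + [0.3] * 4 + [1.0, 1.0]  # logistic starts of 1.0 and 1.0
objective = lambda p: g_squared(p, observed, z_levels, n_totals)
start_g2 = objective(start)
fit, fit_g2 = coordinate_search(start, objective)
print(round(start_g2, 3), round(fit_g2, 3))
```

Like SOLVER, this kind of search finds only a local minimum, so rerunning it from several start values for the logistic parameters is a reasonable precaution.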