
Simulation Appendix

Validity Concerns with Multiplying Items Defined by Binned Counts: An Application to a Quantity-Frequency Measure of Alcohol Use

By James S. McGinley and Patrick J. Curran

This appendix summarizes key findings from the simulation noted in Footnote 3 of our manuscript entitled "Validity Concerns with Multiplying Items Defined by Binned Counts: An Application to a Quantity-Frequency Measure of Alcohol Use." We evaluated the four validity concerns (i.e., overestimation of consumption, reversals in relative ranks, nonmonotonic QF estimates, and lack of invariance) by generating open-ended quantity and frequency counts consistent with past-30-day alcohol use data obtained in practice (r = 1,000 replications, each with n = 1,000 individuals). Table 1 shows the ordinal quantity and frequency measures used in the simulation. By crossing the two quantity items with the two frequency items, we created and evaluated four unique QF measures: Q1 x F1 = QF11; Q2 x F1 = QF21; Q1 x F2 = QF12; and Q2 x F2 = QF22.

Table 1. Quantity and frequency measures used in our simulation study.

Frequency 1 (F1)
  0. 0 occasions (0)
  1. 1-2 occasions (1.5)
  2. 3-5 occasions (4)
  3. 6-9 occasions (7.5)
  4. 10-19 occasions (14.5)
  5. 20+ occasions (25)

Frequency 2 (F2)
  0. 0 occasions (0)
  1. 1-3 occasions (2)
  2. 4-7 occasions (5.5)
  3. 8-12 occasions (10)
  4. 13-18 occasions (15.5)
  5. 19-25 occasions (22)
  6. 26+ occasions (28)

Quantity 1 (Q1)
  0. 0 drinks (0)
  1. 1-2 drinks (1.5)
  2. 3-5 drinks (4)
  3. 6-9 drinks (7.5)
  4. 10-14 drinks (12)
  5. 15+ drinks (16)

Quantity 2 (Q2)
  0. 0 drinks (0)
  1. 1 drink (1)
  2. 2 drinks (2)
  3. 3-4 drinks (3.5)
  4. 5-6 drinks (5.5)
  5. 7-8 drinks (7.5)
  6. 9-11 drinks (10)
  7. 12+ drinks (13)

Note: The numbers in parentheses are the mid-values used to calculate the ordinal QF estimates. Q2 and F1 are the quantity and frequency items used in the manuscript. All of these measures are comparable to those used in applied research.
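The binning-and-multiplying procedure behind the Table 1 mid-values can be sketched in Python. This is a minimal illustration, not the authors' simulation code: the encoding of each item as a list of (lower bound, mid-value) pairs and the function names bin_mid_value, ordinal_qf, and actual_consumption are hypothetical, and counts are assumed to be non-negative integers.

```python
from bisect import bisect_right

# Hypothetical encoding of Table 1: one (lower bound, mid-value) pair per
# response category; the last category of each item is open-ended ("N+").
F1 = [(0, 0.0), (1, 1.5), (3, 4.0), (6, 7.5), (10, 14.5), (20, 25.0)]
F2 = [(0, 0.0), (1, 2.0), (4, 5.5), (8, 10.0), (13, 15.5), (19, 22.0), (26, 28.0)]
Q1 = [(0, 0.0), (1, 1.5), (3, 4.0), (6, 7.5), (10, 12.0), (15, 16.0)]
Q2 = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.5), (5, 5.5), (7, 7.5), (9, 10.0), (12, 13.0)]


def bin_mid_value(count, item):
    """Return the mid-value of the response category that contains `count`."""
    lower_bounds = [lower for lower, _ in item]
    # bisect_right locates the last category whose lower bound is <= count.
    index = bisect_right(lower_bounds, count) - 1
    return item[index][1]


def ordinal_qf(quantity, frequency, q_item, f_item):
    """Ordinal QF estimate: product of the binned quantity and frequency mid-values."""
    return bin_mid_value(quantity, q_item) * bin_mid_value(frequency, f_item)


def actual_consumption(quantity, frequency):
    """Actual consumption: product of the open-ended counts."""
    return quantity * frequency


# A hypothetical respondent reporting 3 drinks per occasion on 7 occasions.
estimates = {
    "QF11": ordinal_qf(3, 7, Q1, F1),
    "QF12": ordinal_qf(3, 7, Q1, F2),
    "QF21": ordinal_qf(3, 7, Q2, F1),
    "QF22": ordinal_qf(3, 7, Q2, F2),
}
actual = actual_consumption(3, 7)
```

For this hypothetical respondent, actual consumption is 21 drinks, while the four ordinal estimates are 30 (QF11), 22 (QF12), 26.25 (QF21), and 19.25 (QF22), which shows how the same open-ended counts map to different QF values depending on the binning scheme.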
Simulation Descriptives

Across all replications, the marginal mean (SD) of the quantity and frequency counts was 2.14 (2.78) and 2.71 (4.55), respectively, and quantity and frequency were correlated at r = .38. We also generated a single binary indicator (across all replications, 50% in Group 1 and 50% in Group 2) to assess the concern of unequal measurement of alcohol use across covariates. Across all replications, the mean (SD) of quantity was 2.64 (3.21) for Group 1 and 1.63 (2.15) for Group 2, and the mean (SD) of frequency was 3.31 (5.33) for Group 1 and 2.11 (3.50) for Group 2. Again, these simulated data are consistent with past-30-day alcohol use data for males (Group 1) and females (Group 2).

Simulation Strategy

Similar to the empirical demonstration presented in our manuscript, we created ordinal quantity and frequency data by binning the open-ended counts into the categories defined in Table 1 (Q1, Q2, F1, F2). We computed ordinal QF estimates by multiplying the mid-values of the selected quantity and frequency response categories. We then assessed the four validity concerns described in the manuscript by comparing the ordinal QF estimates to actual consumption, which is the product of the open-ended quantity and frequency counts. The specific details of how these comparisons were made are the same as those documented in the manuscript.

1. QF estimates can overestimate actual alcohol consumption

Findings: Across all four ordinal QF measures, a substantial proportion of ordinal QF estimates overestimated actual alcohol consumption. For example, 62% of the estimates from QF11, 79% from QF12, 73% from QF21, and 76% from QF22 overestimated the average actual alcohol consumption by more than 1 drink (see Footnote 1). By comparison, not a single ordinal QF estimate from any of the four measures underestimated actual consumption by more than 1 drink. Tables 2 through 5 display these results in a manner similar to Table 2 of the manuscript.

Table 2. QF11 overestimation.

                  Actual Consumption
QF11     # Reps   Mean     SD      QF11 minus Mean
0.00     1000     0.00     0.00    0.00
2.25     1000     2.07     0.10    0.18
6.00     1000     5.40     0.17    0.60
11.25    1000     10.30    0.48    0.95
16.00    1000     14.33    0.57    1.67
18.00    177      15.69    1.69    2.31
21.75    993      19.64    1.68    2.11
30.00    1000     27.15    0.96    2.85
48.00    14       45.28    3.34    2.72
56.25    806      51.04    2.97    5.21
58.00    998      49.84    3.21    8.16
100.00   15       99.98    12.15   0.02
108.75   410      92.90    6.63    15.85

Table 3. QF12 overestimation.

                  Actual Consumption
QF12     # Reps   Mean     SD      QF12 minus Mean
0.00     1000     0.00     0.00    0.00
3.00     1000     2.56     0.12    0.44
8.00     1000     6.49     0.31    1.51
8.25     1000     7.77     0.40    0.48
15.00    1000     12.98    0.71    2.02
22.00    1000     19.57    0.82    2.43
23.25    313      22.68    2.15    0.57
24.00    430      19.75    2.66    4.25
40.00    999      36.39    1.94    3.61
41.25    997      36.64    2.10    4.61
62.00    445      57.22    3.80    4.78
66.00    13       59.37    4.51    6.63
75.00    454      68.11    4.18    6.89
88.00    12       81.09    6.64    6.91

Footnote 1. Tables show QF estimates with adequate data to calculate the mean actual consumption. To be listed, a replication required n > 10 observations within the given QF estimate, and there had to be more than 10 such valid replications.
Thus, the # Reps column gives the number of valid replications used to compute the mean actual consumption. For example, in Table 2, all 1,000 replications were used to compute the mean actual consumption for the QF estimate of 2.25, whereas the mean for the QF estimate of 18 is based on 177 valid replications (i.e., each of the other 823 replications had 10 or fewer observations with that estimate). This rule ensured that no mean actual consumption was based on a small number of observations, for which a mean is not particularly informative.
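The inclusion rule in Footnote 1 and the "QF minus Mean" column can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: we assume the tabled Mean and SD are taken over per-replication means of actual consumption, and the function names and the min_obs, min_reps, and threshold parameters are hypothetical.

```python
from statistics import mean, stdev


def summarize_estimate(rep_values, min_obs=10, min_reps=10):
    """Sketch of the Footnote 1 rule for one QF estimate.

    rep_values holds, per replication, the actual-consumption values of the
    individuals who received this QF estimate (assumed layout). Replications
    with more than min_obs observations are valid; the estimate is tabled only
    if more than min_reps valid replications remain. Returns
    (# reps, mean, SD) of the per-replication means, or None if not tabled."""
    rep_means = [mean(values) for values in rep_values if len(values) > min_obs]
    if len(rep_means) <= min_reps:
        return None
    return len(rep_means), mean(rep_means), stdev(rep_means)


def overestimation(table_rows, threshold=1.0):
    """Return the (QF minus Mean) values, rounded to 2 decimals, and the share
    of rows where the QF estimate exceeds mean actual consumption by more
    than `threshold` drinks."""
    diffs = [round(qf - m, 2) for qf, m in table_rows]
    share = sum(d > threshold for d in diffs) / len(diffs)
    return diffs, share


# (QF11 estimate, mean actual consumption) pairs copied from Table 2.
TABLE_2 = [(0.00, 0.00), (2.25, 2.07), (6.00, 5.40), (11.25, 10.30),
           (16.00, 14.33), (18.00, 15.69), (21.75, 19.64), (30.00, 27.15),
           (48.00, 45.28), (56.25, 51.04), (58.00, 49.84), (100.00, 99.98),
           (108.75, 92.90)]

diffs, share = overestimation(TABLE_2)
```

Applied to the 13 tabled QF11 rows of Table 2, 8 estimates exceed mean actual consumption by more than 1 drink. The appendix does not spell out the exact set of estimates behind its reported percentages, so this is an illustration of the computation rather than a reproduction of those figures.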

Table 4. QF21 overestimation.

                  Actual Consumption
QF21     # Reps   Mean     SD      QF21 minus Mean
0.00     1000     0.00     0.00    0.00
1.50     1000     1.37     0.06    0.13
3.00     1000     2.76     0.13    0.24
4.00     1000     3.79     0.13    0.21
5.25     1000     4.71     0.21    0.54
7.50     951      7.19     0.27    0.31
8.00     1000     7.59     0.27    0.41
8.25     1000     7.51     0.45    0.74
11.25    945      10.25    0.92    1.00
14.00    1000     12.98    0.47    1.02
14.50    435      12.88    0.74    1.62
15.00    999      14.11    0.67    0.89
22.00    997      20.60    1.01    1.40
26.25    1000     24.71    1.07    1.54
29.00    540      25.78    1.52    3.22
30.00    532      28.30    1.74    1.70
40.00    60       37.64    2.26    2.36
41.25    739      39.10    1.81    2.15
50.75    948      44.87    2.92    5.88
56.25    67       53.67    3.10    2.58
79.75    327      71.24    4.18    8.51
108.75   11       96.82    5.53    11.93

Table 5. QF22 overestimation.

                  Actual Consumption
QF22     # Reps   Mean     SD      QF22 minus Mean
0.00     1000     0.00     0.00    0.00
2.00     1000     1.70     0.09    0.30
4.00     1000     3.41     0.18    0.59
5.50     1000     5.14     0.20    0.36
7.00     1000     5.89     0.30    1.11
10.00    632      9.56     0.38    0.44
11.00    1000     9.75     0.43    1.25
15.00    995      12.82    1.33    2.18
19.25    1000     17.68    0.71    1.57
20.00    1000     18.10    1.26    1.90
26.00    38       25.32    3.51    0.68
30.25    992      28.16    1.47    2.09
35.00    975      32.84    1.55    2.16
41.25    432      38.53    2.31    2.72
54.25    155      51.47    2.70    2.78
55.00    923      51.65    2.70    3.35
75.00    14       70.73    4.18    4.27
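Whether the tabled means are monotonically ordered (the concern taken up under "3. QF estimates can be non-monotonically ordered") can be checked mechanically. A minimal sketch, assuming each table is represented as a list of (QF estimate, mean actual consumption) pairs copied from Tables 4 and 5; the function name monotonicity_violations is hypothetical. It sorts by QF estimate and flags adjacent pairs where the larger estimate has the smaller mean; any non-monotonic ordering necessarily produces at least one such adjacent pair.

```python
def monotonicity_violations(rows):
    """Return (lower QF, higher QF) pairs of adjacent tabled estimates where
    the higher QF estimate has a lower mean actual consumption."""
    rows = sorted(rows)  # order by QF estimate
    return [(lo[0], hi[0]) for lo, hi in zip(rows, rows[1:]) if hi[1] < lo[1]]


# (QF estimate, mean actual consumption) pairs copied from Table 4 (QF21).
TABLE_4 = [
    (0.00, 0.00), (1.50, 1.37), (3.00, 2.76), (4.00, 3.79), (5.25, 4.71),
    (7.50, 7.19), (8.00, 7.59), (8.25, 7.51), (11.25, 10.25), (14.00, 12.98),
    (14.50, 12.88), (15.00, 14.11), (22.00, 20.60), (26.25, 24.71),
    (29.00, 25.78), (30.00, 28.30), (40.00, 37.64), (41.25, 39.10),
    (50.75, 44.87), (56.25, 53.67), (79.75, 71.24), (108.75, 96.82),
]

# (QF estimate, mean actual consumption) pairs copied from Table 5 (QF22).
TABLE_5 = [
    (0.00, 0.00), (2.00, 1.70), (4.00, 3.41), (5.50, 5.14), (7.00, 5.89),
    (10.00, 9.56), (11.00, 9.75), (15.00, 12.82), (19.25, 17.68),
    (20.00, 18.10), (26.00, 25.32), (30.25, 28.16), (35.00, 32.84),
    (41.25, 38.53), (54.25, 51.47), (55.00, 51.65), (75.00, 70.73),
]

qf21_flags = monotonicity_violations(TABLE_4)
qf22_flags = monotonicity_violations(TABLE_5)
```

Applied to Table 4, the check flags the 14.00/14.50 pair as well as the 8.00/8.25 pair (mean consumption 7.59 versus 7.51); applied to Table 5, it flags nothing.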

2. QF estimates can lead to reversals in relative rank for alcohol consumption

Findings: As in the empirical demonstration in our manuscript, reversals in the relative ranks of consumption occurred across all simulation conditions. To demonstrate this, Figures 1 through 3 plot ordinal QF estimates against actual consumption for three randomly selected replications; these plots are similar to Figure 5 of the manuscript. The logic about rank reversals described in the manuscript applies equally to these simulated data, so we do not detail it further here. The plots also make the inaccuracy of ordinal QF estimates visible. If ordinal QF estimates measured actual consumption perfectly, all data points would fall on the diagonal line. Instead, the plots show that the ordinal estimates both overestimated (i.e., dots above the diagonal line) and underestimated (i.e., dots below the diagonal line) actual alcohol consumption.

Figure 1. Scatterplots for randomly selected replication #228

Figure 2. Scatterplots for randomly selected replication #567

Figure 3. Scatterplots for randomly selected replication #872
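Rank reversals of the kind visible in these scatterplots can also be counted directly. A minimal sketch, assuming a reversal is any pair of respondents whose ordering by ordinal QF estimate is the opposite of their ordering by actual consumption (pairwise discordance); the manuscript's exact operationalization may differ, and the function name rank_reversals and the respondents below are hypothetical, with QF21 values computed by hand from the Table 1 mid-values.

```python
from itertools import combinations


def rank_reversals(pairs):
    """Count respondent pairs whose order by ordinal QF estimate is the
    opposite of their order by actual consumption. Ties on either variable
    are not counted as reversals."""
    count = 0
    for (qf_a, actual_a), (qf_b, actual_b) in combinations(pairs, 2):
        # A negative product means the two orderings disagree.
        if (qf_a - qf_b) * (actual_a - actual_b) < 0:
            count += 1
    return count


# Hypothetical respondents as (QF21 estimate, actual consumption).
RESPONDENTS = [
    (29.0, 20),   # 2 drinks x 10 occasions; QF21 = 2 x 14.5
    (26.25, 36),  # 4 drinks x 9 occasions;  QF21 = 3.5 x 7.5
    (1.5, 1),     # 1 drink x 1 occasion;    QF21 = 1 x 1.5
]

reversals = rank_reversals(RESPONDENTS)
```

Here the first respondent receives the higher QF21 estimate (29 versus 26.25) despite drinking less (20 versus 36 drinks), giving one reversal. The pairwise loop is quadratic, which is manageable for n = 1,000 (499,500 pairs per replication).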

3. QF estimates can be non-monotonically ordered

Findings: Tables 2 through 5 show that three of the four unique QF measures (i.e., QF11, QF12, and QF21) had estimates that were non-monotonically ordered. For example, for the QF11 measure (Table 2), the average actual consumption for a QF estimate of 58 (49.84) was smaller than that for the lower estimate of 56.25 (51.04). For the QF12 measure (Table 3), the average consumption for a QF estimate of 24 (19.75) was smaller than that for the estimate of 23.25 (22.68). For the QF21 measure (Table 4), the average consumption for a QF estimate of 14.50 (12.88) was slightly less than that for the estimate of 14 (12.98). In contrast, all of the QF estimates for the QF22 measure (Table 5) were monotonically ordered. In sum, these findings show that, even in a controlled simulation, ordinal QF estimates can fail to be monotonically ordered.

4. QF estimates may lack invariance as a function of covariates

Findings: Tables 6 through 9 display the mean actual consumption for given QF estimates stratified by our binary group variable, x1, for the four unique QF measures (see Footnote 1 for the inclusion criteria). Across all conditions, the results supported the concern that ordinal QF estimates may lack invariance as a function of covariates. Specifically, for all but one QF estimate (i.e., the QF22 estimate of 20), the group simulated to drink more (i.e., Group 1, males) had higher average actual consumption than the lighter-drinking group (i.e., Group 2, females) despite having exactly the same ordinal QF estimate. This is reflected in the consistently positive values in the Mean Difference column across the unique QF measures.
These mean differences were larger for estimates from the QF11 and QF12 measures than for the QF21 and QF22 measures. This is likely due to the larger number of response categories in the Q2 quantity item, whose first three response categories (0, 1, and 2 drinks) are also un-binned counts. In sum, the findings suggest that ordinal QF estimates may not always represent the same amount of consumption across groups (e.g., the same QF estimates represented more actual consumption for Group 1 than for Group 2).
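The group comparison behind Tables 6 through 9 can be sketched as a grouped mean. This is a minimal illustration, not the authors' code: the record layout (group, QF estimate, quantity, frequency), the group coding 1/2, the function name group_differences, and the respondents below are all hypothetical, with QF21 values taken from the Table 1 mid-values. Replication-level aggregation and the Footnote 1 inclusion rule are omitted for brevity.

```python
from collections import defaultdict
from statistics import mean


def group_differences(records):
    """Mean actual consumption in Group 1 minus Group 2 for each ordinal QF
    estimate observed in both groups.

    `records` is an iterable of (group, qf_estimate, quantity, frequency),
    with group coded 1 or 2 (a hypothetical layout)."""
    actual = defaultdict(list)
    for group, qf, quantity, frequency in records:
        # Actual consumption is the product of the open-ended counts.
        actual[(qf, group)].append(quantity * frequency)
    estimates = sorted({qf for qf, _ in actual})
    return {
        qf: mean(actual[(qf, 1)]) - mean(actual[(qf, 2)])
        for qf in estimates
        if (qf, 1) in actual and (qf, 2) in actual
    }


# Hypothetical respondents scored on QF21 (Q2 x F1 mid-values from Table 1).
RECORDS = [
    (1, 1.5, 1, 2),    # 1 drink x 2 occasions  -> actual 2;  QF21 = 1 x 1.5
    (2, 1.5, 1, 1),    # 1 drink x 1 occasion   -> actual 1;  QF21 = 1 x 1.5
    (1, 26.25, 4, 9),  # 4 drinks x 9 occasions -> actual 36; QF21 = 3.5 x 7.5
    (1, 26.25, 4, 8),  # 4 drinks x 8 occasions -> actual 32
    (2, 26.25, 3, 6),  # 3 drinks x 6 occasions -> actual 18
    (2, 26.25, 3, 7),  # 3 drinks x 7 occasions -> actual 21
]

differences = group_differences(RECORDS)
```

In this toy example, both groups share the QF21 estimate of 26.25, yet Group 1's mean actual consumption (34) exceeds Group 2's (19.5) by 14.5 drinks, which is the pattern of non-invariance that the Mean Difference column summarizes.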

Table 6. Group comparison for QF11 measure.

         Group 1                   Group 2
         Actual Consumption        Actual Consumption
QF11     # Reps   Mean     SD      # Reps   Mean     SD      Mean Difference
0.00     1000     0.00     0.00    1000     0.00     0.00    0.00
2.25     1000     2.13     0.14    1000     2.03     0.12    0.11
6.00     1000     5.51     0.25    1000     5.31     0.24    0.21
11.25    1000     10.41    0.62    1000     10.16    0.70    0.25
16.00    1000     14.63    0.78    1000     13.98    0.80    0.65
21.75    659      20.08    2.09    282      18.93    2.17    1.15
30.00    1000     27.53    1.19    996      26.49    1.57    1.04
58.00    871      50.86    3.82    122      47.57    3.98    3.29

Table 7. Group comparison for QF12 measure.

         Group 1                   Group 2
         Actual Consumption        Actual Consumption
QF12     # Reps   Mean     SD      # Reps   Mean     SD      Mean Difference
0.00     1000     0.00     0.00    1000     0.00     0.00    0.00
3.00     1000     2.67     0.18    1000     2.50     0.15    0.17
8.00     1000     6.66     0.42    1000     6.30     0.44    0.36
8.25     1000     8.00     0.61    1000     7.57     0.53    0.42
15.00    1000     13.10    0.95    1000     12.78    1.09    0.32
22.00    1000     19.97    1.15    1000     19.05    1.16    0.93
40.00    885      36.99    2.41    311      35.35    2.57    1.64
41.25    878      37.04    2.46    60       35.84    2.34    1.20

Table 8. Group comparison for QF21 measure.

         Group 1                   Group 2
         Actual Consumption        Actual Consumption
QF21     # Reps   Mean     SD      # Reps   Mean     SD      Mean Difference
0.00     1000     0.00     0.00    1000     0.00     0.00    0.00
1.50     998      1.38     0.10    1000     1.37     0.08    0.01
3.00     1000     2.78     0.19    1000     2.74     0.17    0.04
4.00     829      3.82     0.21    994      3.77     0.17    0.05
5.25     1000     4.78     0.30    1000     4.65     0.29    0.13
7.50     168      7.21     0.31    334      7.16     0.32    0.06
8.00     932      7.62     0.40    983      7.55     0.37    0.07
8.25     1000     7.56     0.60    915      7.43     0.68    0.13
11.25    573      10.31    1.02    24       10.15    1.04    0.16
14.00    999      13.15    0.69    999      12.79    0.67    0.35
15.00    932      14.13    0.93    552      14.09    0.82    0.03
22.00    838      20.72    1.23    296      20.52    1.24    0.20
26.25    876      24.93    1.39    506      24.46    1.47    0.47
50.75    537      45.52    3.30    37       43.98    2.60    1.54

Table 9. Group comparison for QF22 measure.

         Group 1                   Group 2
         Actual Consumption        Actual Consumption
QF22     # Reps   Mean     SD      # Reps   Mean     SD      Mean Difference
0.00     1000     0.00     0.00    1000     0.00     0.00    0.00
2.00     1000     1.73     0.15    1000     1.69     0.11    0.04
4.00     1000     3.47     0.28    1000     3.37     0.23    0.10
5.50     735      5.18     0.28    960      5.10     0.27    0.08
7.00     1000     5.99     0.41    1000     5.78     0.42    0.21
10.00    33       9.68     0.44    45       9.48     0.45    0.19
11.00    1000     9.80     0.59    1000     9.67     0.59    0.13
15.00    857      12.92    1.61    80       12.59    1.62    0.33
19.25    1000     17.92    1.00    991      17.42    1.04    0.50
20.00    923      18.08    1.67    247      18.13    1.46    -0.05
30.25    770      28.36    1.80    151      27.50    1.89    0.86
35.00    547      33.19    1.82    118      32.64    2.18    0.56

Simulation Summary

In sum, the findings from our simulation study supported the four validity concerns highlighted in our manuscript. More specifically, all four of the assessed QF measures had ordinal QF estimates that overestimated average actual consumption and were subject to reversals in the relative ranks of consumption. Further, three of the four QF measures showed non-monotonically ordered QF estimates, and, to varying degrees, all of the measures produced estimates that lacked invariance as a function of a covariate. In some regards, the two measures that used the second quantity item (Q2), QF21 and QF22, seemed less subject to non-monotonically ordered estimates and to non-invariance as a function of covariates. This result is not particularly surprising, because this quantity item had the largest number of categories and its first three categories are unbinned counts. Thus, in these cases, multiplying mid-values can yield slightly more precise estimates. However, a practical concern with measures such as Q2, which have more response categories and non-collapsed counts, is that they lack the simplicity that appeals to applied researchers (e.g., a small number of categories can lessen participant burden, errors in recall, and survey administration time).
More importantly, we must recognize that the ordinal QF measures that used the Q2 quantity item still had serious validity concerns, such as overestimation of alcohol use and reversals in relative ranks. For this reason, we generally do not recommend using this measurement approach in applied research.