On Mon, 19 Oct 2009, Allin Cottrell wrote:
> On Mon, 19 Oct 2009, Dorian Litvine wrote:
>> My name is Dorian, I'm trying to estimate the data of a double
>> bounded dichotomous choice (contingent valuation) thanks to a
>> maximum likelihood estimation with a linear specification...
When I first responded to this I was pretty much totally
unfamiliar with double-bounded contingent valuation. I've now
read a few paragraphs on the topic, so here are some further
thoughts -- now based on not-quite-total ignorance ;-). I'm
writing this up for the record in case it's of use to Dorian and
others who might want to use gretl for such a purpose.
[For anyone who's following this I should note that Dorian sent me
a dataset off-list, which helped to clarify what's going on, and
I'll refer below to the variables as named in that dataset.]
So: in the simplest case of double-bounded contingent valuation
the data consist of two "bids", b1 and b2, and two binary
responses, r1 and r2 (with 1 = accept, 0 = not accept). In the
background is (presumably) a survey in which people were asked
something like "Would you be willing to pay b1 to preserve
resource X?" If the respondent said "Yes" (r1 = 1) she is then
asked, "Would you be willing to pay b2 (b2 > b1)?", giving us an
r2 value. And if she said "No" to b1 she's asked, "Would you pay
b2 (b2 < b1)?", again giving us an r2 value.
I'm not sure exactly how the b2 values were formulated, but it
appears that the b1 values (the "first bids") were drawn from the
set {5, 15, 35, 50, 95} with about equal probability. So WTP would
appear to be (roughly) on a scale of 0 to 100.
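(If it helps to fix ideas, here is a little simulation of this
sort of design. The follow-up rule -- double the first bid on a
"Yes", halve it on a "No" -- and the parameter values are pure
guesses on my part, since as I say I don't know how the b2
values were actually set.)

<script>
# hypothetical simulation of a double-bounded CV survey
nulldata 500
set seed 20091019
scalar mu_true = 35      # assumed mean WTP
scalar s_true = 30       # assumed scale of WTP
matrix bids = {5, 15, 35, 50, 95}
series pick = ceil(5 * uniform())   # index of first bid, 1 to 5
series b1 = 0
loop i=1..$nobs -q
b1[i] = bids[pick[i]]
endloop
series u = uniform()
series WTPtrue = mu_true + s_true * log(u/(1-u)) # logistic WTP
series r1 = (WTPtrue >= b1)           # response to first bid
series b2 = r1*2*b1 + (1-r1)*0.5*b1   # guessed follow-up rule
series r2 = (WTPtrue >= b2)           # response to second bid
</script>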
I take it that the problem is to estimate the parameters of the
distribution of Willingness To Pay (WTP). In a more complex case
one might have data on covariates such as income or demographic
characteristics of the respondents, but the dataset I have to hand
contains only b1, r1, b2 and r2. And in that case I take it that
the idea is that WTP for agent i is just
WTP_i = \mu + u_i
where u ~ N(0, \sigma^2).
So the problem is to get ML estimates of \mu and \sigma, given the
data on {b1, r1, b2, r2} (?)
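(To spell out the likelihood: write F for the CDF of WTP, and let
b2 stand for whichever follow-up bid was actually offered. Then
the probabilities of the four response patterns are

  P(No, No)   = F(b2)           [b2 < b1: WTP below the lower bid]
  P(No, Yes)  = F(b1) - F(b2)   [b2 < b1: WTP between b2 and b1]
  P(Yes, No)  = F(b2) - F(b1)   [b2 > b1: WTP between b1 and b2]
  P(Yes, Yes) = 1 - F(b2)       [b2 > b1: WTP above the higher bid]

and the log-likelihood is the sum over respondents of the log of
whichever probability matches the observed pattern. In the script
below F is the logistic CDF, 1/(1 + exp(-(x - \mu)/\sigma)).)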
If so, here is a suggested gretl script, with comments appended
below:
<script>
open <datafile> # contains b1, r1, b2, r2
scalar n = $nobs
series WTP = NA
# crude base estimate of WTP: midpoint of the two bids if one Yes
# and one No; half the lower bid if two Nos; midway between the
# higher bid and 100 if two Yeses
loop i=1..n -q
if (r1[i] + r2[i] == 1)
WTP[i] = (b1[i] + b2[i]) / 2
elif (r1[i] + r2[i] == 0)
WTP[i] = xmin(b1[i], b2[i]) / 2
else
WTP[i] = (xmax(b1[i], b2[i]) + 100) / 2
endif
endloop
scalar mu = mean(WTP)
scalar sigma = sd(WTP)
# responses No, No
series nn = (1-r1) * (1-r2)
# responses No, Yes
series ny = (1-r1) * r2
# responses Yes, No
series yn = r1 * (1-r2)
# responses Yes, Yes
series yy = r1 * r2
printf "cases: nn = %d, ny + yn = %d, yy = %d\n", sum(nn), \
sum(ny+yn), sum(yy)
mle logl = nn*lPnn + ny*lPny + yn*lPyn + yy*lPyy
series E1 = 1 + exp(-(b2-mu)/sigma)
series E2 = 1 + exp(-(b1-mu)/sigma)
series lPnn = log(1/E1)
series lPny = log(1/E2 - 1/E1 + (b2 > b1)*100)
series lPyn = log(1/E1 - 1/E2 + (b2 < b1)*100)
series lPyy = log(1 - 1/E1)
params mu sigma
end mle --hessian
</script>
Comments:
(1) My "crude base estimate of WTP" above is not defensible as a
proper estimate, but is just designed to get "ballpark" starting
values for plausible \mu and \sigma. The values I got from
Dorian's data (namely, \mu = 35.6, \sigma = 31.3) are sufficient
to show that the initialization Dorian gave (\mu = 3, \sigma = 1)
is not at all useful. (On that initialization many of the
observed responses will have likelihood machine-zero and the
loglikelihood will not be finite.)
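One quick way to guard against that sort of thing is to evaluate
the log-likelihood "by hand" at the chosen starting values, before
invoking mle, and count the observations for which it comes out
missing -- something along these lines, run after the nn, ny, yn
and yy series have been created:

<script>
# optional check on the starting values for mu and sigma
series E1 = 1 + exp(-(b2-mu)/sigma)
series E2 = 1 + exp(-(b1-mu)/sigma)
series ll0 = nn*log(1/E1) + ny*log(1/E2 - 1/E1 + (b2 > b1)*100) \
  + yn*log(1/E1 - 1/E2 + (b2 < b1)*100) + yy*log(1 - 1/E1)
printf "observations with missing loglik: %d\n", sum(missing(ll0))
</script>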
(2) In my specification of the mle block I have assumed that the
likelihood calculations given in Dorian's code are correct, but I
have rewritten them so that (a) the reader can (hopefully) see
what's going on more clearly, and (b) we're not calculating over
and over again quantities that can be computed either just once
(the "response pattern" series nn, ny, etc.) or once per MLE
iteration (the parameter-based magnitudes I have called E1 and
E2). Note that 1/E2 and 1/E1 are the logistic CDF evaluated at b1
and b2 respectively, so (as noted above) the probabilities as
written treat WTP as logistic rather than normal. The terms
(b2 > b1)*100 and (b2 < b1)*100 are there just to keep the
arguments of the logs positive for observations to which the
given response pattern cannot apply: such observations get zero
weight via the nn, ny, yn and yy series, but the logs still have
to evaluate to something finite, otherwise the loglikelihood
would come out as NA.
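Should anyone want to take the normality assumption stated above
literally, the same structure should work with the normal CDF in
place of the logistic: something along these lines (which I have
not tried out on Dorian's data) ought to do it.

<script>
# variant of the mle block using the normal CDF
mle logl = nn*lPnn + ny*lPny + yn*lPyn + yy*lPyy
series F2 = cnorm((b2-mu)/sigma)
series F1 = cnorm((b1-mu)/sigma)
series lPnn = log(F2)
series lPny = log(F1 - F2 + (b2 > b1)*100)
series lPyn = log(F2 - F1 + (b2 < b1)*100)
series lPyy = log(1 - F2)
params mu sigma
end mle --hessian
</script>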
Anyway, using the data Dorian sent me I got the following result:
Model 1: ML, using observations 1-2046
logl = nn*lPnn + ny*lPny + yn*lPyn + yy*lPyy
Standard errors based on Hessian
             estimate   std. error   t-ratio    p-value
  ------------------------------------------------------
  mu          21.8207      1.34016     16.28    1.32e-59  ***
  sigma       31.3319      1.05415     29.72    3.95e-194 ***

Log-likelihood      -2650.268   Akaike criterion   5304.537
Schwarz criterion    5315.784   Hannan-Quinn       5308.662
This looks fairly sensible. The ML \sigma estimate is close to
the naive figure computed above (31.3). The ML \mu estimate is
quite a bit less than the crude figure of 35.6, but that seems
reasonable in
light of the fact that the modal response in this dataset was (No,
No) -- indicating unwillingness to pay either of the quoted
amounts -- along with the fact that the handling of that case in
my crude calculation was somewhat arbitrary.
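As a footnote: once the mle block has run it's easy to turn the
estimates into more readily interpretable quantities via the
$coeff accessor. For instance, under the logistic specification
the estimated probability that WTP exceeds some given amount (50
below is just an arbitrary illustration) could be got like this:

<script>
# illustrative post-estimation calculation (after the mle block)
scalar mu_hat = $coeff[1]
scalar s_hat = $coeff[2]
scalar bid = 50    # arbitrary amount
scalar pr = 1 - 1/(1 + exp(-(bid - mu_hat)/s_hat))
printf "estimated P(WTP > %g) = %.3f\n", bid, pr
</script>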
Allin Cottrell