A few days ago Tom Steinberg mentioned to me an idea which apparently cropped up in mySociety discussions: to create some way to make informal online polling demographically representative. (He also suggested applying the same sorts of ideas to my Political Survey, of which more later.)
Usually, online polls -- by which I mean the kind that appear on the front of Slashdot or web logs or whatever -- are completely unrepresentative, because their audience is self-selected and bears no relation to the population at large. (A second problem is that very few people typically complete them, but that's independent of the sampling problem.) The way that `proper' polls fix this is to decompose the whole population into some set of equivalence classes (described by social class, region, age and so forth), get the same type of information about the respondents (either by asking them or inferring it some other way), and then weight the results of the poll so that the distribution of the respondents over those categories matches that of the whole population.
Then you assert that the results of the poll of your small sample reflect the results you would get if you asked your questions of the whole population. If you're competent, you also state some confidence bounds, indicating the likelihood that the results are wrong. There's an example of the sort of thing in this page of results from an ICM poll about higher education funding, where the various categories are shown; and this briefing note from the Parliamentary Office of Science and Technology has another non-technical description, as does this article from a Dr. Roger Mortimer of MORI.
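The weighting and the confidence bounds described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any polling company actually does it: the two age bands, their population shares and the ten responses are invented for the example, and the names `weighted_shares` and `margin_of_error` are hypothetical. The margin of error uses the textbook formula for a simple random sample, which understates the uncertainty of a heavily weighted poll.

```python
import math
from collections import Counter


def weighted_shares(responses, population_shares):
    """Weight poll answers so the sample's mix of cells matches the population's.

    responses: list of (cell, answer) pairs, one per respondent.
    population_shares: dict mapping each cell to its share of the population.
    Cells with no respondents cannot be represented; the final normalisation
    simply spreads their share over the cells that were observed.
    """
    n = len(responses)
    sample_counts = Counter(cell for cell, _ in responses)
    # Weight for a cell = population share / sample share of that cell.
    weights = {
        cell: population_shares[cell] / (count / n)
        for cell, count in sample_counts.items()
    }
    totals = Counter()
    for cell, answer in responses:
        totals[answer] += weights[cell]
    total_weight = sum(totals.values())
    return {answer: w / total_weight for answer, w in totals.items()}


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple
    random sample of size n (z=1.96 is the usual normal quantile)."""
    return z * math.sqrt(p * (1 - p) / n)


# Invented example: the population is split evenly between two age bands,
# but the self-selected sample is 80% under-40s.
population_shares = {"under_40": 0.5, "40_plus": 0.5}
responses = (
    [("under_40", "yes")] * 6
    + [("under_40", "no")] * 2
    + [("40_plus", "no")] * 2
)
shares = weighted_shares(responses, population_shares)
```

In this made-up sample the raw `yes' share is 60%, but after weighting it falls to 37.5%, because the over-represented under-40s are weighted down. For a 1,000-person sample and a result near 50%, `margin_of_error(0.5, 1000)` comes out at roughly three percentage points either way.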
Tom's idea, I think, was to have some site which invited people to answer a set of general demographic questions (you can see the sorts of things used at the registration page for people wanting to join YouGov, an internet polling outfit); this site would then give you a cookie which would carry the demographic information (anonymously, presumably), so that later on if you filled in an online poll which supported the protocol, it could weight its results demographically, and produce more representative results than do the typical `Do you like (a) ski-ing; (b) meetings?' polls that adorn web logs all over the place.
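To make the cookie idea concrete, here is one way it might look. Everything in it is an assumption of mine rather than part of Tom's suggestion: the field names, the idea of signing the cookie with an HMAC so that a poll can tell it hasn't been edited, and the functions `make_cookie` and `read_cookie`. Note that with a shared-secret signature like this, a poll site would need either the key or a verification service run by the issuing site.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key held by the site that issues the cookies.
SECRET_KEY = b"demo-key-not-for-production"


def make_cookie(demographics, key=SECRET_KEY):
    """Pack demographic answers (no name, no email) into a signed cookie value."""
    payload = json.dumps(demographics, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode("ascii") + "." + signature


def read_cookie(cookie, key=SECRET_KEY):
    """Return the demographic dict, or None if the cookie is malformed or altered."""
    encoded, _, signature = cookie.rpartition(".")
    try:
        payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
    except ValueError:
        return None
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(payload)


# Invented example values; a real scheme would use whatever categories
# the weighting needs.
cookie = make_cookie({"age_band": "25-34", "region": "East", "social_class": "C1"})
profile = read_cookie(cookie)
```

A poll which supported the protocol would call something like `read_cookie` on each response and use the resulting categories to weight its results.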
This is a neat idea, technologically, though it's not clear that enough people would be interested to make it worthwhile. But his idea leads to the following, slightly inchoate, speculation:
There are lots of organisations -- I'm thinking particularly of small volunteer organisations, of which CDR is an example -- which could make good use of polling data in their campaigns. But polling is expensive. The polling companies don't seem to quote numbers on their websites, but a cost of £10,000 for a 1,000-person sample with a handful of questions is apparently typical. ICM quote £400 per question on a telephone poll, but I don't think this includes the cost of analysing the data.
This puts polling well out of reach of small organisations, so that much of their campaigning is not really informed by public opinion.
Hence the idea: with internet polling, the marginal cost of actually performing the poll and collating the results must be very much smaller than that; in fact, YouGov need only stick a questionnaire up on their site and send an email invitation to enough of their volunteers to get a representative sample. YouGov offer `prizes' to people who complete the survey:
We are currently running a survey on the site and are very interested in your views.
Each member of the Polling Club who takes part in the survey will be entered into a draw to win one prize of 500 pounds. There are also 20 prizes of 50 pounds to be won.
-- the prizes, presumably, being necessary to convince respondents to take the time to do the survey.
Once YouGov have collected the data, the process of analysing the sample to obtain the properly weighted results is (or should be...) automatic; and it is well-understood. The only cost of conducting the survey -- once the questionnaire is written and the software debugged -- is £1,500, the sum of the prizes (one of £500 and twenty of £50).
The statistical techniques, and the software to apply them, could be reproduced rather easily by a small team of competent people. Running the website is technically trivial.
So my proposal is this: somebody should build the infrastructure for online polling -- that is, a database of volunteers with demographic information and the means to survey them -- and open it up to anybody to use. The users could offer prizes, as YouGov do, if doing so is necessary to get responses; but the rest of the apparatus can run at almost no cost and with very little maintenance. It could be made available to any organisation which wanted to conduct a poll and had enough expertise to write the survey questions. The software could turn polling into a staple, not a luxury. The result? We might learn a lot.
As a caveat, I must refer again to this report by ICM on the accuracy of internet polling. Their conclusion--
We have found that at present internet polls based on a recruited polling panel may not necessarily produce results that are representative of the population as a whole, even after very considerable weighting of the results has been undertaken or care exercised to ensure that those who are asked to complete an internet poll are demographically and politically representative of the whole population. Being on the internet reflects a difference of attitude towards life that is to a significant degree independent of socio-economic background.
-- which is certainly a problem with any such proposal; but it is one which (a) can be measured, albeit at some expense; and (b) should become less significant as more people get access to the internet, either through computers or `interactive TV'. (In marketing-speak that would be `as internet penetration increases', which sounds quite wrong to me....)