I’ve been asked by several bloggers to comment on the 08 Socha-Gelbmann EDD Survey. I’ll make a point on one specific issue below, but generally my reaction this year is the same as it was for the past two years, when I wrote a sidebar for Monica Bay in Law Technology News. (I figure she didn’t ask me to write it again this year because she assumed she could just change the dates and print the same column … and she’d probably be correct.) So instead I’ll quote a comment from across the pond that pretty much summarizes my feelings.

British lit support technology consultant Andrew Haslam of Allvision Computing posted his thoughts on The Orange Rag blog and said, in part:

*“The survey is an annual look at the vendors in the EDD marketplace, written by two independent consultants, George Socha and Thomas Gelbmann. The 2008 version has just been released after its nine month creation process, and is the sixth in the series. Vendors are ranked on how they operate in each of the stages defined by the Electronic Discovery Reference Model (itself an industry agreed model of the various stages of EDD). The executive summary is free, with the full report costing $5000. The word survey is possibly a misnomer, as the information gathered is self-selected by vendors submitting their own figures on company growth and revenue. The methodology used to arrive at the rankings is not explained, and in previous reports, firms have complained of seemingly arbitrary analysis and resultant ranking.”*

My specific comment has to do with the methodology. I’m not sure that self-reported numbers from 107 vendors in a market of more than 600 tell us much, and “validation” by 29 law firms isn’t really meaningful without knowing something about those firms. I sent George Socha a private email asking about the demographics of those 29 firms, specifically how many were AmLaw 100 firms. I didn’t get a response, but he did post the following comment to the Executive Summary of the 08 Survey on the EDD Update page later the very same day I sent him my email:

*“… we sometimes are asked whether our rankings are “statistically significant.” We do not attempt to address confidence levels, confidence intervals, significance levels and the like. We report the number of organizations from whom we obtained amounts of information that we deemed to be useful for our analysis.”*

Hmmmm, best to call it an analysis then and not a survey. And I’m not just picking on George and Tom here. Fios also published a report recently of a “survey” of 28 legal professionals from Fortune 500 corporations. How much illumination do we get from a .05% sample of a given population? Because with a survey, rather than compiling data about the entire population, you usually study a chosen subset of the population, called a sample. The data are then subjected to statistical analysis and IF the sample is representative of the population, then inferences and conclusions drawn from the sample can be extended to the population as a whole. A major problem lies in determining the extent to which the chosen sample is representative.

When you have a small sample, a difference of just one or two responses produces a huge percentage swing, so you want a large sample to be sure a difference is real (i.e., it didn’t happen by fluke). If you have 29 people responding, the difference between 1 person saying CaseLogistix and 2 people saying CaseLogistix is so small as to be meaningless when the total number of possible respondents is 1 million (the number of attorneys in the US). But put in percentage terms for purposes of that “survey,” the difference is enormous …. the CaseLogistix percentage would increase by 100%, from 3.4% to 6.9%. If 5 people said CaseLogistix, their percentage becomes 17%, but the increase in real numbers is still insignificant in terms of the total number of the target population.
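The back-of-the-envelope arithmetic above can be checked in a few lines (the figure of 1 million US attorneys is this post’s rough estimate, not a verified count):

```python
# Small samples make tiny absolute differences look enormous in
# percentage terms, while the change relative to the full population
# of ~1 million US attorneys stays negligible.

RESPONDENTS = 29
POPULATION = 1_000_000  # rough estimate of US attorneys

for votes in (1, 2, 5):
    share_of_sample = votes / RESPONDENTS * 100
    share_of_population = votes / POPULATION * 100
    print(f"{votes} vote(s): {share_of_sample:.1f}% of the sample, "
          f"{share_of_population:.4f}% of the population")
```

Running it shows 1 vote is 3.4% of the sample, 2 votes is 6.9%, and 5 votes is 17.2% — yet even 5 votes is only 0.0005% of the attorney population.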

Now if the population is changed by concentrating on, say, the AmLaw 100, and we know that each answer is from a different representative of that group, then we can say we have a 29% response rate from our target market, and that number becomes more compelling. And if that number (5 out of 29) still seems too low to be representative of our target population, we need to get more respondents. Because that is really the question we are asking here … what sample size is significant? Using a calculator you can find at Creative Research Systems, it seems that 26 is the number you need for a population of 100, so the 5 out of 29 becomes even more persuasive.
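The post doesn’t say which settings were fed into that calculator, but tools like the one at Creative Research Systems use the standard finite-population sample-size formula. A sketch, assuming 95% confidence (z = 1.96) and the most conservative response split (p = 0.5) — note that a required sample of 26 for a population of 100 corresponds to roughly a 17-point confidence interval:

```python
import math

def sample_size(population: int, margin: float,
                z: float = 1.96, p: float = 0.5) -> int:
    """Minimum sample size for a finite population.

    Standard formula behind most online sample-size calculators:
    an infinite-population estimate followed by the finite-population
    correction. z = 1.96 gives 95% confidence; p = 0.5 is the most
    conservative assumption about the response split.
    """
    ss = (z ** 2) * p * (1 - p) / (margin ** 2)   # infinite-population size
    adjusted = ss / (1 + (ss - 1) / population)   # finite-population correction
    return math.ceil(adjusted)

print(sample_size(100, 0.05))  # ±5 points: 80 of 100 needed
print(sample_size(100, 0.17))  # ±17 points: 26 of 100 needed
```

In other words, 26 is only “enough” for a population of 100 if you are content with a very wide confidence interval; at the usual ±5 points you would need 80 of the 100 firms to answer.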

OK, but I can see all the technical sophisticates in the audience waving their hands, jumping up and yelling “oo oo” like Horshack on Welcome Back Kotter. (Yes Craig, I’m looking at you.) OK, technically in statistics “significant” means probably true, i.e., not due to chance. A research finding may be statistically significant (or likely to be true) without being important. When statisticians say a result is “highly significant” they mean it is very probably true. They do not (necessarily) mean it is highly important. Or as former Harvard President Lawrence Lowell once wrote, statistics,
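To make the “significant is not the same as important” point concrete, here is a quick two-proportion z-test sketch. The numbers are made-up illustrations, not data from either survey: the 1-vote-vs-2-votes case echoes the CaseLogistix example above, and the ten-million-response case is a deliberately absurd hypothetical.

```python
import math

def two_prop_z(successes1: int, n1: int, successes2: int, n2: int) -> float:
    """z statistic for comparing two sample proportions (pooled standard error)."""
    p1, p2 = successes1 / n1, successes2 / n2
    p = (successes1 + successes2) / (n1 + n2)            # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))      # pooled standard error
    return (p1 - p2) / se

# 2 of 29 vs 1 of 29: a 100% relative jump, but |z| < 1.96,
# so it is not statistically significant at the 95% level.
print(two_prop_z(2, 29, 1, 29))

# 50.05% vs 50.00% of ten million responses each: a trivial
# difference, yet z > 1.96 — "significant" but unimportant.
print(two_prop_z(5_005_000, 10_000_000, 5_000_000, 10_000_000))
```

The first difference looks huge in percentage terms but is statistical noise; the second is statistically “real” and practically meaningless — exactly the trap both surveys invite.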

*“like veal pies, are good if you know the person that made them, and are sure of the ingredients.”* Which brings us back to George and Tom, whom we all know and trust.

As George also said in his post, *“… not everyone may realize the degree to which we must depend on self-reported information. … Much of the data provided to us, however, is not information that we can verify or refute. We have to depend on the integrity of the people and the organizations providing us with the information. At times, we feel it is necessary to give providers haircuts; we never, however, give them toupees.”* As I said above, hmmmmmmmmm. I trust George and Tom, but I’m not sure about the guys with the toupees. And the guys who put on their Mardi Gras costumes before they answered the questions? You with the sneakers, out of the pool.

(For more on my own surveys of end users, see the next post. One of them is statistically significant and one not so much, but hey, you can trust me. No really, the ponytail isn’t a weave.)