Wikipedia talk:WikiProject Vandalism studies/Study1

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by JoeSmack (talk | contribs) at 17:53, 24 March 2007 (ALL DATA POINTS HAVE BEEN DOUBLE CHECKED!!!!!!: DONE!!!!!!!!!! yowzah!). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Finishing the project

I am thrilled that we have finished the last data point! I say we all go back and double-check the entries to make sure they are correct and then publish the results out to the whole community. YEAH!!!! Remember 06:04, 25 February 2007 (UTC)

I'm stoked too, good job to all. I'll have a little extra time after work tomorrow and i'll try to pretty the project pages up a little bit in case people feel warm enough to stay if they like the study. it is important that this all looks professional and orderly. once we feel ready to post the results (we'll have to field some questions i gather), i suggest we put them in these places:
  • counter-vandalism projects talk pages
  • statistic projects talk pages
  • community notice board
  • village pump (with a diff showing Remember's first post suggesting this project's creation)
  • Admin notice board (i think)
  • signpost (done last, after others have had some time to gauge reaction)
  • antispam project? (WT:WPSPAM)
  • people in the project's participant list for vandal studies
  • wikiEN-l listserv and wiki-research-l listserv.
i'd also like to get a proposed next study worked out in the wording of our results post, like - 'our initial study showed that X and Y from our sample. further work is needed to add more power to these results, and study 2 where we will Z is looking for more participants should you be interested.' in this way we can inform and turn interest into more user power. what do you say for study 2, more data points with similar criteria for vandalism definitions? JoeSmack Talk 05:15, 27 February 2007 (UTC)[reply]
I totally agree with Joe. We should check our results, make them look nice and presentable and draw up conclusions, and then present them to the community along with a note that we will be gathering volunteers for our next study. As for study number 2, I would be fine doing the same study in a different month with more data points or we could try the study that Jack suggested. We should probably make an area on the talk pages of the vandalism studies project to discuss this more formally. Remember 11:39, 27 February 2007 (UTC)[reply]
I added some general clean up to the study page. One thing i'm concerned about is using Linkspam as a vandalism category - it was added later, after it was encountered partway through the study, so all edits should be double checked to be sure no more of them belong in this category. Also, there's stuff like this: [1]. We need to check our math, keep the number of significant figures uniform (2, 3, how many?) and remove all ???? from the maths part of this study. JoeSmack Talk 13:57, 28 February 2007 (UTC)
Hey guys, sorry for the late call.
Anyway, when did the linkspam category get added? Before which datapoint?
I'll be going through the math tonight.
One thing I think could come in handy is to list the types of articles and the sort of vandalism each gets. For instance, I came upon a video game article that was vandalized around the time of the release. JackSparrow Ninja 16:32, 28 February 2007 (UTC)
It was added by Remember about halfway through here at data point 21. Up to that point it was only me, and I didn't encounter anything like that. I think we'd need more data before inferring anything about the types of articles that get vandalized. What I'd also like to do is make sure those definitions for the categories of vandalism were what you (both you and Remember) used/fits how you were marking vandalized edits. That part is very important too. JoeSmack Talk 17:02, 28 February 2007 (UTC)[reply]
I'm sorry I haven't done much about it lately. I haven't really had the time to spend longer times at one thing here on Wiki. I should be able to again later this week. Could someone perhaps update me? What's next to do? JackSparrow Ninja 19:14, 6 March 2007 (UTC)[reply]
Just start double-checking all of our data. I have double checked the first 20. Please feel free to check all of them again because I want to make sure we are right before we finalize our conclusions and send them out to everyone. Remember 19:48, 6 March 2007 (UTC)
Okido. You'll hear from me when I'm done. I'm just gonna do it one big batch. JackSparrow Ninja 18:26, 7 March 2007 (UTC)[reply]
Just a little update. It's taking me a little longer than expected, but I've worked through to Data Point 70 now. Seems good so far, with the updates from you guys. JackSparrow Ninja 03:46, 14 March 2007 (UTC)

significant figures

i'd propose keeping the math to 2 significant figures when showing percentages - we're yet to deal with a million billion data points, so anything more seems like overkill. hows that sound to you guys? JoeSmack Talk 17:20, 28 February 2007 (UTC)[reply]

Sounds good to me. Remember 17:37, 28 February 2007 (UTC)[reply]
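For anyone redoing the percentages, here is a minimal Python sketch of rounding to a fixed number of significant figures. The helper name `round_sig` is made up for illustration, and 15/273 is the November 2005 total quoted further down this page; nothing here is the study's own tooling.

```python
import math

def round_sig(value: float, sig: int = 2) -> float:
    """Round value to `sig` significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit: 0 for 5.49, 1 for 54.9, -1 for 0.549.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig - 1 - exponent)

percent = 100 * 15 / 273          # 5.4945...
print(round_sig(percent, 2))      # 5.5
print(round_sig(percent, 3))      # 5.49
```

With two significant figures, the corrected November 2005 figure would be shown as 5.5% rather than 5.49%.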

we may need a numbers guy

i think we should request a consult/assistance from Wikipedia_talk:WikiProject_Mathematics before we start presenting this just to be safe. they have a pretty active group over there. JoeSmack Talk 17:38, 28 February 2007 (UTC)[reply]

I was thinking the same thing. Please go ahead and ask. Remember 17:39, 28 February 2007 (UTC)[reply]
Done. JoeSmack Talk 17:57, 28 February 2007 (UTC)[reply]
Hi! I read JoeSmack's message on the math talk page, so I came over here to take a look at what you're doing. You're certainly a dedicated bunch. Gathering the statistics by reviewing the edit histories one at a time must have taken a lot of effort. Anyway, here are my comments.
  • I didn't check all your arithmetic, but I did notice one arithmetical error, in the grand totals for all 100 articles. For November, 2005, 15/273 = 5.49%, not 4.49%.
Fixed, good eye! JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
  • To do simple statistical analyses like the one you're after, spreadsheet software is appropriate. Has anyone entered the data into a spreadsheet program, like Excel, or Open Office? If not, I'll try to get it done in the next couple of days.
A fantastic idea, it would really cut down some of this error. I'm not especially adept at using Excel, if you have the time it'd be great if you could. It'd also be a whole lot easier to see the data in a page or two as opposed to the many pages of Wikiformat. Good thinking. JoeSmack Talk 18:33, 1 March 2007 (UTC)
  • Using the "random article" button to select articles for analysis makes a lot of sense. Selecting only the month of November for analysis makes less sense. Ideally, you might want to generate a random integer 1 through 12 (using a pseudorandom number generator) for each article selected for analysis, then analyze the edits for that month for that particular article. The problem with the procedure you used is that it may have introduced an unintentional systematic bias. Human behavioral patterns vary with the seasons, so it may be that you got an exceptionally high reading (or an unusually low reading) because people are grumpy (or benevolent?) in November, on average. Not that it's a big deal. Call it an opportunity for improvement.
A confound I hadn't thought of; we originally made it one month i think so that we weren't being inaccurate in saying around 5% of edits in a month are vandalism - what if the month has 28 days and not 31? ack! It also adds another layer of complexity to the study, which isn't bad if the benefit outweighs the cost. I've moved a lot of these comments to study 2's talk page for background for the next round, so maybe we can incorporate that there. JoeSmack Talk 18:33, 1 March 2007 (UTC)
  • There are quite a few 'bot authors on Wikipedia. The process of extracting the raw data, at least, might be automated somehow. For example, a 'bot might select the random articles, select a month at random, then extract only the edit history records you're interested in and dump the whole thing into one page somewhere, where you guys could study the data without doing so much data collection. Just a thought.
Very true, very true. I see this as a more further down the road solution, but it can be done. I know a few botworkers too, so I might be able to rustle up some help from Eagle 101, Betacommand, Heligoland or anyone else who hangs out in #wikipedia-spam-t where i spend a lot of time too. JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
  • When you write your report, you might want to present the numbers two ways – both with and without the randomly selected articles that you discarded because no edits occurred in November. I'm only suggesting this because if you include that number of articles you can compute the likelihood that a randomly selected article is going to be edited in November, at least once in three years. OK, you'd probably want to dress it up a little and present it as the probability that a randomly selected article gets edited within one month. Anyway, it would just be good statistical practice to report how many articles you bypassed because there were no edits in November. It's part of full disclosure.
We included them at the top, so I think this can be managed without too much trouble. Good point. JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
  • Your overall result (roughly 5% of all edits during the period studied attributed to vandalism) is fairly interesting. The distribution of times it took to revert the damage (30 instances, roughly 15 hours, on average) might be of more interest than the average itself. I'll collate those 30 data points, at least, and write something more about it in a little while.
Thanks for all your hard work! DavidCBryant 23:17, 28 February 2007 (UTC)[reply]
And thank you for yours! :D JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
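The random-month suggestion above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name `assign_random_months` and the seed value are invented, and the article titles are borrowed from this page purely as sample input.

```python
import random

def assign_random_months(titles, seed=None):
    """Pair each article title with a month number 1-12 chosen uniformly."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    return {title: rng.randint(1, 12) for title in titles}

# Titles taken from this page purely as examples.
sample = ["Crescent City Connection", "Kyle Vanden Bosch", "Shadow of the Colossus"]
months = assign_random_months(sample, seed=2007)
for title, month in months.items():
    print(f"{title}: month {month}")
```

Recording the seed alongside the results would let someone else regenerate the same article-to-month assignment when double-checking.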
Hi! It's me, again. I've collated the data for the 30 (or is it 31?) instances of vandalism.
  • 12 articles were vandalized, out of 100. 6 were vandalized once, 3 were vandalized twice, 2 were vandalized three times, and 1 (#78) was vandalized twelve times.
  • Data point #78 is confusing. You say it was vandalized 6 times in November, 2006, but the events are labeled "#1", "#2 & #3", "#3 & #4 & #5", and "#6". So I'm not sure if it was six instances, or seven. Somebody might want to double-check this data point and fix the labels. Or report 31 instances of vandalism, whichever is right.
I don't think I did data point 70-79, Remember or Jack did either of you do this and remember what was up? JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
  • As I suspected, the distribution of time elapsed before vandalism gets fixed is very interesting. I see 3 cases at 0 minutes (should be < 1 minute); 2 cases for each of 1, 4, 8, and 11 minutes; and 3 cases of 13 minutes. The rest of the times (sorted) look like this: 14, 18, 23, 29, 51, 104, 222, 452, 490, 895, 898, 1903, 2561, 4188, 6186, and 7991. The mean (average) is 491 minutes. But the median is 14 minutes -- half of the observed instances of vandalism were repaired within 14 minutes, and half took longer than 14 minutes. 80% of all the cases are "average" or better, and only 20% are "worse than average".
Hmm, interesting. We should include something about this in our summary report. JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
  • Even more interesting is the distribution of repair times for articles that were vandalized more than once within a month. Those sequences look like this: 6816, 7991; 11, 11; 23, 4188; 1, 1, 0; 104, 222, 898, 895, 1903; 51, 13, 13, 8, 8, 0; and 452, 29, 4. I'm probably reading too much into this, but the sequences {1, 1, 0}, {51, 13, 13, 8, 8, 0}, and {452, 29, 4} are very suggestive. It's almost as if the defenders of Wikipedia are (sometimes) gathering around recent victims of vandalism and repairing the damage more quickly when an article is the subject of repeated attacks, sort of like the human body's immune response.
I like the picture you drew here with an immune system; and I agree, articles that get slammed with vandalism get babysat a bit after so reverting is often quicker. JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
  • I have a technical question. You've divided the edits into two classes, "vandalism" and "not vandalism". I think three classes might be more appropriate: "vandalism", "revert vandalism", and "not related to vandalism". I think the distinction is meaningful, and probably not too hard to make. Anyway, I'm not sure how you counted reverts in your raw data, but maybe I didn't read the report closely enough.
What do you mean by 'revert vandalism'? Vandalism that is reverted later, or people who've vandalized once then revert the reverter? JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
That's it for now. I've got to go shovel snow off the sidewalk! DavidCBryant 00:35, 1 March 2007 (UTC)[reply]
Good luck! I've got to catch a bus, CMummert, I'll comment on yours a bit later today. Thanks David for all the hard work, and I invite you to look on over study 2 as well! :) JoeSmack Talk 18:33, 1 March 2007 (UTC)[reply]
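As a cross-check of the mean-versus-median point, the sketch below recomputes both from the repair times exactly as they are listed in the sorted list above. One caveat: computed from that list as quoted, the mean comes out near 870 minutes (close to the "roughly 15 hours" mentioned earlier) rather than 491, and 6186 in that list appears as 6816 in the later sequences, so a transcription slip somewhere seems possible. The code makes no attempt to decide which figure is right.

```python
from statistics import mean, median, median_low

# Repair times in minutes, copied from the sorted list quoted above.
repair_minutes = (
    [0] * 3 + [1] * 2 + [4] * 2 + [8] * 2 + [11] * 2 + [13] * 3
    + [14, 18, 23, 29, 51, 104, 222, 452, 490, 895, 898,
       1903, 2561, 4188, 6186, 7991]
)

avg = mean(repair_minutes)
# Fraction of instances repaired at least as fast as the mean.
share_at_or_below_mean = sum(t <= avg for t in repair_minutes) / len(repair_minutes)

print(len(repair_minutes))               # 30 instances
print(round(avg, 1))                     # 870.4
print(median(repair_minutes))            # 16.0 (averages the two middle values)
print(median_low(repair_minutes))        # 14 (lower of the two middle values)
print(round(share_at_or_below_mean, 2))  # 0.77
```

The quoted median of 14 matches `median_low`; the usual even-count convention gives 16. Either way the conclusion stands that the mean is dominated by a handful of very slow reverts.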

comment by CMummert

The study is very interesting and informative, but the small sample size (100) makes the final numbers subject to a large margin of error. With the sample of 100, the estimate of 5% of edits being vandalism has a margin of error of about 4% at 95% confidence; so I conclude from your numbers that there is a very high chance the real vandalism rate is less than 9%. In order to have a 2% margin of error with 95% confidence, if the real percentage of vandalism edits is 5%, you need to sample about 475 articles. Fortunately, the total number of WP articles doesn't matter, only the number that you sample.

A second, more interesting, problem is that you are measuring the average percentage of edits per article that are vandalism. But there is another statistic that is equally valid: the average percentage of total WP edits that are vandalism. To see the difference, think about the extreme case where only one article on WP is ever vandalized, but it received 100% vandalism edits. Then your survey would show 0% average vandalism unless you were lucky enough to find that one article with the random article button. To measure overall averages, you would need to take a random sample of 1000 edits (I have no good idea how to do that without using a database dump) and determine how many of them are vandalism.

Nevertheless, your survey is very interesting. No study is ever perfect, and you seem to be planning more. This sort of work is the only way to dispel the random speculation about vandalism. CMummert · talk 03:00, 1 March 2007 (UTC)[reply]
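The margin-of-error figures in CMummert's comment can be reproduced approximately with the standard normal-approximation formula for a proportion. This is a sketch under assumptions: it treats the 5% estimate as a simple proportion from n independent draws, which the comment itself suggests is a simplification, and the function names are invented. The figure of 475 appears to correspond to rounding the 95% multiplier up to z = 2.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(p: float, margin: float, z: float = 1.96) -> float:
    """Sample size at which the margin of error shrinks to `margin`."""
    return z ** 2 * p * (1 - p) / margin ** 2

print(round(margin_of_error(0.05, 100), 3))          # 0.043, i.e. about 4%
print(round(required_sample_size(0.05, 0.02, z=2)))  # 475
print(math.ceil(required_sample_size(0.05, 0.02)))   # 457 with z = 1.96
```

Note that the population size does not appear anywhere in either function, which matches the remark that the total number of WP articles does not matter.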

Some good points. Margin of error and significance is definitely something I want to add to this study or the next; in reality this is a statistics domain and should be treated like it. I know when we started it seemed like it was a drive to get the data, I was thinking people could play around with it later. I'm glad you're thinking of more ways to interpret this stuff, and the limitations it has as well.
You're right about sample size; it needs to get bigger. One thing to note is that if this study has a sturdy enough procedure, we can use our old data in the new study, starting off with 100 data points. We'll see how things unfold over at study 2, this may very well be the case. More is better, true true true.
I see the distinction you make between an article's edits being vandalism and wikipedia's edits being vandalism. One thing is, vandalism doesn't just happen on mainspace: it happens on talk pages, user pages, wikipedia pages - you name it. People seem to care the most about mainspace because it is where people go to get the straight dope, and finding crap there instead is worrisome when it comes to credibility, reliability, etc. The distinction is also prevalent in the sense that some articles are vandalized more than others. AIDS for instance, which I helped foster over the years and which is probably one of two wikipedia articles that got me started on wikipedia at all, receives a LOT of vandalism, probably more than the average article. I know JackSparrow got a taste of this when he hit data point 99, a Zelda videogame (The_Legend_of_Zelda:_The_Minish_Cap) that had a nearby release date. Do these data points reflect an artificially inflated amount of vandalism? Do we throw up our hands and hope enough randomization accounts for the occasional frequent target article?
How do you guys feel about this, is there a better way to approach this than how it is currently being done? Is there a way that study 2 can be conducted that combats this in any way? JoeSmack Talk 03:31, 2 March 2007 (UTC)[reply]
Oh, and another thing; what if we hit an article with semi-protection/protection? Do we ignore it? JoeSmack Talk 06:12, 2 March 2007 (UTC)[reply]
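To make the distinction between the two statistics concrete, here is a small Python sketch based on the extreme case CMummert describes (one heavily vandalized article among many clean ones). All counts are invented for illustration, the function names are hypothetical, and the code assumes every article has at least one edit.

```python
def per_article_rate(articles):
    """Mean of each article's own vandalism share (articles weighted equally)."""
    return sum(v / e for e, v in articles) / len(articles)

def pooled_rate(articles):
    """Vandalism edits divided by all edits (edits weighted equally)."""
    total_edits = sum(e for e, _ in articles)
    total_vandalism = sum(v for _, v in articles)
    return total_vandalism / total_edits

# Invented (edits, vandalism_edits) pairs: 99 quiet articles, one heavily hit.
articles = [(10, 0)] * 99 + [(1000, 1000)]

print(per_article_rate(articles))        # 0.01
print(round(pooled_rate(articles), 4))   # 0.5025
```

The same raw data gives 1% by one measure and about 50% by the other, so a write-up probably needs to say which of the two it is reporting.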

Begun double checking

I have begun to double check all of our entries for the study. I will write here my progress and any issues I find. Remember 18:05, 28 February 2007 (UTC)[reply]

Found mistake for data point 5 Crescent City Connection, it says that it had 3 edits in 2005 but the history shows that it had none. Unfortunately this was an early mistake that will affect all our math. I will correct it later unless someone wants to do it sooner. Remember 18:21, 28 February 2007 (UTC)[reply]
Phew, corrected all the affected data. It HAD to be an early data point! ;) I also found in this [2] that for the final total somehow the number got inflated, for both the 2005 total and the final total. As painstaking as it is, the numbers need to be double checked in both cumulative results sums and the percentages (which should be shaved down to 2 significant figures if they aren't already, or bumped up too I suppose). JoeSmack Talk 19:16, 28 February 2007 (UTC)
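One way to avoid re-patching every cumulative sum after an early correction is to keep only the raw per-data-point counts and rebuild the running totals from them. The Python sketch below shows the idea; the counts are invented and `running_totals` is a hypothetical helper, not how the study page is actually maintained.

```python
from itertools import accumulate

# Invented per-data-point counts: (edits, vandalism_edits) for one month.
data_points = [(4, 0), (7, 1), (2, 0), (9, 1), (3, 0), (6, 0), (5, 1), (1, 0)]

def running_totals(points):
    """Cumulative (edits, vandalism, percent) after each data point."""
    edits = accumulate(e for e, _ in points)
    vandalism = accumulate(v for _, v in points)
    return [(e, v, 100 * v / e if e else 0.0) for e, v in zip(edits, vandalism)]

before = running_totals(data_points)

# Correct an early entry (fifth point: 3 edits recorded, actually 0);
# every later total is rebuilt from the raw rows rather than patched by hand.
data_points[4] = (0, 0)
after = running_totals(data_points)

print(before[-1][:2])  # (37, 3)
print(after[-1][:2])   # (34, 3)
```

A spreadsheet with a running-sum column does the same job; the point is only that totals are derived rather than typed in.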

I have now double checked all of our data and calculations for Data points 1-25. Slowly but surely we will finish this thing. Remember 03:29, 2 March 2007 (UTC)
Mistakes number two and three: Anthology (Bruce Dickinson DVD) and Gillian Spencer have edits in November. I guess we should add these as points 101 and 102. Remember 03:45, 2 March 2007 (UTC)

I presume this was from the list of articles with no november edits? if they actually do have edits, it'd be remiss not to add them, otherwise our study wouldn't be random (we'd be discriminating against some articles). We need to find a way to increase reliability. I'm starting to have second thoughts about spreading results from study 1 when it has these kinds of speed bumps that can be flattened in study 2. JoeSmack Talk 04:09, 2 March 2007 (UTC)
I'm still for publishing our data once we have double-checked everything just to raise awareness so we can conduct study 2 quicker and with more participants. Remember 12:53, 2 March 2007 (UTC)[reply]
Ok, then let's do this; if you're in i'm in. Still, you might want to attach this little disclaimer for now: "Measuring is easy. What's hard is knowing what it is you're measuring. This is a preliminary study, if you have any suggestions for improvement please join us for study 2." JoeSmack Talk 19:27, 3 March 2007 (UTC)
Whoops, I spoke too soon. Those articles (Anthology (Bruce Dickinson DVD) and Gillian Spencer) were incorrectly listed as not having data points, but they were also listed as the 23rd and 27th data points, so I have just taken them out of the early section and it shouldn't affect our math. Remember 18:35, 7 March 2007 (UTC)

Administrator help

We may want to enlist the aid of an administrator who can help us lock the article before this study goes public because I have a feeling that some people would enjoy the irony of vandalizing a project that studies vandalism. Remember 18:06, 28 February 2007 (UTC)[reply]

If this becomes a serious issue (I'll keep an eye out when the time comes), I'll take care of getting it semi-protected if need be. JoeSmack Talk 03:33, 2 March 2007 (UTC)[reply]

Current status

I've now double checked everything for the first 50 edits. I found two new errors. The first is with Kyle Vanden Bosch. It should be four edits in November 2005 instead of 3. The second is Jean Théodore Delacour: there are 2 edits in 2005 and not 1. I need to change these entries and double check all the math up to this point. Any help double checking is welcome. Remember 16:46, 10 March 2007 (UTC)

I'll give a hand this weekend, sure thing. JoeSmack Talk 19:20, 10 March 2007 (UTC)[reply]
I have added the new data to the Kyle and Jean data points, but I have now found a larger error on the Shadow of the Colossus data point. For November 2006 it says there are only 4 edits but there are actually 20! I'll keep double checking this stuff but we need to get this data accurate before our study can mean anything. Remember 22:28, 10 March 2007 (UTC)

I have now gone through all data points for 1-60, but I haven't rechecked the math. Remember 22:52, 10 March 2007 (UTC)
I now have gone through 1-70, I found some more errors and corrected them. Now all the data points from 1-70 should be correct but all the math is wrong. If someone wants to help they can verify the last 30 data points or redo the math for all the sections. Remember 23:17, 10 March 2007 (UTC)
I have now gone through 75 points. Remember 20:39, 13 March 2007 (UTC)

Alright, i've gotten all the math from 1-70. Yowzah, study 2 needs to use a spreadsheet instead of this manual method. One question I have is Data Point 68 here has this marked as vandalism (no category on the data point given, in the tallies noted as 'obvious vandalism'). To be fair, I have no idea if this was vandalism or not (I don't speak/read arabic). But, also to be fair, the IP's contrib in question was reverted by a user, and the IP has a history of vandalism. How should we go about this? JoeSmack Talk 00:35, 12 March 2007 (UTC)[reply]
Combining the revert and the ip's history, it seems vandalism. But perhaps we should ask who reverted it. He probably knows why, and thus if it was vandalism. JackSparrow Ninja 00:53, 12 March 2007 (UTC)[reply]
I was the one who originally categorized this as vandalism. I left a message asking the person why he reverted this language but I have gotten no response. I also tried doing a Google search for the Arabic that was put in the place of the original text and it did not seem to come up with the right person once I translated the page, but I will admit that this is a very crude method. We should find someone that reads Arabic. Remember 11:44, 12 March 2007 (UTC)

I have now double-checked the first 90 data points. Only 10 more to go!!!! Remember 20:50, 19 March 2007 (UTC)[reply]

ALL DATA POINTS HAVE BEEN DOUBLE CHECKED!!!!!!

Now all we need to do is check all of the calculations for the page and write the conclusions! We are almost finished. Please help out if you can. Remember 22:56, 19 March 2007 (UTC)[reply]

I have recalculated up to data point 80. I've gotta end it for today, so should anyone want to continue before I get back, here are the changes that still have to be made from 81 onwards:

Edits in 2004 -> +2
Edits in 2006 -> -2
Vandalism edits in 2006 -> +1
Total vandalism edits -> +1
Time before reverting, total -> 4615 4599
Reverts by anonymous editors -> +2

JackSparrow Ninja 07:18, 21 March 2007 (UTC)[reply]

I'm seeing systemic math whoopsies in the reverting numbers/averages. We're going to have to go through this once more after this, they keep on popping up and we want to do this right. I'm still going, but after I'm done with 80-100 we'll have checked the math twice. JoeSmack Talk 16:46, 24 March 2007 (UTC)[reply]
I've finished checking the math. I found it kind of funny there were exactly 666 edits, good to end on a laugh. It just needs to be triple checked (including making sure numbers are rounded correctly, like anything that is 6.365 on a calculator looks like 6.37 on the wiki and turning 3.4's into 3.40's. Also make sure all the numbers add up to their totals) for surface and aesthetic math. The light at the end of the tunnel is near! :D JoeSmack Talk 17:53, 24 March 2007 (UTC)[reply]
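For the final rounding pass, a small Python sketch along these lines might help. It uses the standard `decimal` module so that a value like 6.365 rounds half up to 6.37 (binary floats can land on either side of such ties) and so that trailing zeros are kept. The helper name `fmt2` and the category counts in the totals check are invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def fmt2(value: str) -> str:
    """Round half up to two decimal places, keeping trailing zeros."""
    return str(Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

print(fmt2("6.365"))  # 6.37
print(fmt2("3.4"))    # 3.40

# Totals check with invented category counts (not the study's numbers).
categories = {"obvious vandalism": 20, "linkspam": 6, "other": 5}
reported_total = 31
assert sum(categories.values()) == reported_total
```

Passing the numbers in as strings, as typed from the calculator, keeps the decimal digits exact.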