Mike Smit .com
This is a continuation of the March 19th post of the same name. While discussing TurnItIn.com with a trusted Dalhousie source, I learned that any student can email TurnItIn.com and ask to have anything they have submitted removed from the database. I haven't been able to find this documented anywhere on their website or on Dal's website, although the TurnItIn.com privacy pledge promises they will remove personal information if you ask them to.

Let me share with you my personal experience trying to get information removed from the TurnItIn.com database.

On December 3rd, 2005, I asked them to stop crawling my website and to remove my copyrighted content from their databases:
Good day,

I have two quick requests:

1. At http://www.turnitin.com/robot/crawlerinfo.html#excluding, you mention that I can request that you stop indexing domains that I control. Therefore:
Could you please stop (or refrain from) indexing the following sites, and all of their subdomains?
mikesmit.com
ivoteonline.com
votesonline.com

2. Could you please remove any cached or stored versions of pages or content drawn from those sites?

Thank you very much,

-Mike Smit
I wait an astonishing two months before I decide to follow up and remind them to stop indexing my site and to remove my content from their servers:
February 9, 2006

Good day,

I am still waiting for a reply to this message.

I have confirmed that my copyrighted information has not been removed from the TurnItIn.com database as I requested.

Could you please reply with an explanation or an estimated timeline?

Thanks,

-Mike
They respond the next day:
Hello Mike, I'm sorry for not getting back to you sooner. We had a service upgrade with some of our databases in December so I was not able to remove you from our crawl list at that time, but all the issues have been resolved now and I have removed all your links from our crawl list. I am very sorry for any inconvenience we may have caused you and assure you that you will not be be bothered by our crawler any more...

Tim
Within 5 minutes, I follow up:
Thanks Tim,

I appreciate you removing me from the crawl list.

The second part was: "2. Could you please remove any cached or stored versions of pages or content drawn from those sites?"

All of my material is copyrighted, as per the notice on the bottom of each page.

Could you confirm that you pulled content from MikeSmit.com out of the database?

Thanks,

-Mike
A full month goes by, and I hear nothing. So I follow up, still as polite as I can be.
Hi Tim,

I waited 31 days, which seemed like a more than reasonable length of
time for you to have complied with my request.

Could you please confirm that you have removed all content from
MikeSmit.com from your database?

Thanks,

-Mike
Later that day, I get an email from their "Sr. Director, WW Sales & Business Development".
Hello Mike,

Tim has indicated that you have some concerns about your copyrighted website material on our Turnitin.com database. I would be more than happy to discuss this will you, please let me know when a convenient time would be for you. In the meantime, you may wish to have a look at the follow article which is telling with regards to copyrighted content and web search: http://www.eff.org/deeplinks/archives/004344.php

I will be happy to discuss this issue with you and hope that we can resolve any issues without delay.

Best Regards,

Malik
The article he cites refers to a recent decision stating that Google does not have to pay a webmaster $2 million for the privilege of caching his website. Now, I benefit from being listed in Google - people come to my site and learn about the things that I care about. I get no such benefit from TurnItIn.com maintaining their own private cache of my site. And TurnItIn.com doesn't offer a no-cache option like the Googlebot does. There are a lot of other differences between that decision and my situation, some of which I go to the trouble of outlining for him:
Hi Malik,

Thanks for sending me that link.

First, Google allows webmasters to remove their URL from the search engine; the website will not be crawled, and no cached information will be used, if users follow the instructions at http://www.google.com/webmasters/remove.html. You can use their form, or a robots.txt file. I am not sure whether they are legally required to not cache my pages if I ask them to. But clearly it is either a legal requirement, or they, as good corporate citizens and courteous users of the Internet, allow me to remove cached content from their servers. Why don't you?

Second, the judgment you sent me refers multiple times to the fact that the plaintiff knew about Google, knew how to have his pages uncached, and specifically chose not to do so. In my case, the circumstances are very different - as soon as I learned about the TurnItIn Bot, I immediately asked that my pages no longer be crawled or cached. A few months and several follow-up emails later, you guys got around to it.

Third, web crawlers acquiescing to the wishes of webmasters, as expressed in emails or robots.txt files, is the industry standard. I've never encountered a legitimate business that refused to honour a politely worded request like mine. The only people who deviate from this norm are the spammers and link spammers who crawl the internet looking for email addresses and blogs to spam. iParadigms claims to be a legitimate business, so I expected that your policies and procedures on this issue would mirror those of other legitimate web crawlers.

Fourth, I notice that you haven't said "no" yet. Of course, you also didn't say yes. You mentioned that you want to 'discuss' the issue with me, though so far I seem to be doing most of the talking. :)

Fifth, I read on your website that if universities ask that material be removed from your servers, you will honour that request. Clearly the technical capability exists; I am not sure why this is an issue worthy of discussion.

> I will be happy to discuss this issue with you and hope that we can resolve any issues without delay.

The ship has sailed on that one - I sent in my original request on December 3, 2005. We're at about the 100 day mark.

Thanks very much for your time,

-Mike
To his credit, he responds quickly, though he ignores the issue completely:
Hello Mike,
Thank you for your email. As per your request earlier this year, we have stopped crawling your website, and will continue to respect the request.
Regards,

Malik
My request had actually been submitted last year, but I decided to let that one slide. But he didn't actually address the issue he originally emailed me about, so:
Thanks Malik,

That is great news. Well, not news really, since Tim already let me know about this.

I couldn't help but notice that your email specifically avoided the question that you originally emailed me about. What about the information from my website which is currently stored on your servers? Will you follow the accepted standard practice and remove it at my request?

Thanks,

-Mike
On March 14th, they quietly conceded, saying:
Hello Mike,

We will work to have your data removed from our database by the end of next week. I hope that this is acceptable to you.

Regards,
Malik
I'll be following up in a few weeks to make sure they got it done, but what an adventure it was just to get to this point! 100 days, multiple emails, a lot of persistence - just to get them to do something that Google lets me do with the click of a button.
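
For anyone who wants to try the robots.txt route mentioned in the emails above, here is a minimal sketch of how such a rule is interpreted, using Python's standard urllib.robotparser module. The rules themselves are hypothetical, and the user-agent token "TurnitinBot" is an assumption (check your own server logs for the exact name). Keep in mind that a rule like this only asks a crawler to stay away in the future; it does nothing about copies already stored, which is why the second request in my email was needed at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. "TurnitinBot" is an assumed
# user-agent token; verify it against your own access logs.
ROBOTS_TXT = """\
User-agent: TurnitinBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("TurnitinBot", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, "http://mikesmit.com/") else "blocked"
        print(f"{agent}: {verdict}")
```

This only shows what a well-behaved crawler should conclude from the file. Whether a given crawler actually honours it is up to the crawler's operator.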

A guest says:
[Jun 13th @ 08:12pm]

Just another point you may like to consider. My site was well ranked by Google last year until it had a visit from turnitinbot. Pages disappeared from Google almost immediately afterwards, almost putting me out of business. Coincidence? The bot hasn't been back since - until today that is - just after I'd made Google aware of a new batch of pages! Maybe it's not just the educational sector looking for plagiarism. Draw your own conclusions.


ht says:
[Apr 17th @ 03:47pm]

Hi, i tried the same, but nobody is answering me. I even send them the checksums of my files.


© 2005, Mike Smit. Contents may be hazardous to your health and sanity; use at your own risk.