More IRC spies
Mary
SearchIRC Admin

Joined: 03 May 2003
Posts: 686

Posted: Nov 13, 2003 12:06am    Post subject: More IRC spies

I don't know if this has anything to do with Google, but it's certainly ominous for IRC.

http://cosco.hiit.fi/irchiver/FAQ/

It is a FAQ for a project called Irchiver.

The company, CoSCo of Finland, researches and develops information retrieval systems. They feel discussions in IRC channels have scientific value.

Irchiver does not ask permission to join networks or log channel discussions, and they do not give channel owners notice of their presence. They feel that since IRC is public and not encrypted, and logging is a standard feature of many IRC clients, it is "fair" for them to join channels and log discussions without the ops' knowledge or permission.

Logs will be published, conversations will be studied, and data made available on the web.

The FAQ says the success of this project is ultimately up to IRC administrators, and that "Users should be informed e.g. in MOTD about general logging policy."

I found this particularly interesting:

Quote:
1: Your bots invaded Freenode Oct 30th 2003. Why didn't you ask us before doing it?

You wouldn't have listened (: To be honest, we sent an email to Freenode administration a day before and informed them about the project. We may have acted a bit hastily and we could have waited longer, but still we're happy that our project received much attention after this happening.


I am sure it did. Let me help them along by letting the greater IRC community know users may find their nicknames and comments logged in order to be studied for trends and then published on the web. IRC networks may want to set up a welcoming committee for the good folks at Irchiver.

Irchiver claims that the bots will come from one IP, which will be easy for channels to ban. However, they do not tell us the IP, so networks cannot ban them PRIOR to a visit to their channels.

The Coordinator for CoSCo is:
Petri Myllymäki
email: Petri.Myllymaki@hiit.fi
tel: +358 50 3841523

Mary
SearchIRC Admin

Joined: 03 May 2003
Posts: 686

Posted: Nov 13, 2003 9:47am    Post subject: More info....

A little bit more research this morning shows that CoSCo's Irchiver joined bots to ALL of Freenode's channels at ONE time on October 30th. 10% of the network's channels noticed the bots and banned them, and apparently this was reported to the network, which glined the bots about 10 minutes later. The leader of the Irchiver project then joined a channel with lilo to discuss their project with all interested parties. The website states that they have since joined IRCnet, but did not log channels there.

I visited Freenode last night to see if the network was cooperating with the Irchiver company or not. Although some staff members were aware Irchiver had visited the network, they did not know details.

CoSCo claims to be both "academic" and "scientific"; however, they do not show any affiliation with an institution of higher learning. Rather, the speaker on Freenode referred listeners to a website that shows other CoSCo projects which have commercial advertising applications.

They claim their search engine will be open source, which raises concerns that other interested parties will now be launching bots into IRC channels to log user information and channel text as well. One can only speculate that Google's recent visits to networks are an experiment to see if there is value in a similar project for their own search engine.

The bots on Freenode came from irchive.it.hiit.fi
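
For channel ops who want to act on that hostname, here is a minimal sketch in Python of what a ban on it could look like. It assumes the common nick!user@host ban-mask form and the standard MODE <channel> +b <mask> command. The function names and the #example channel are made up for illustration, and some servers match bans against an IP or a cloaked host instead, so treat this as a sketch only.

```python
import re


def mask_to_regex(mask):
    """Translate an IRC-style wildcard mask (* and ?) into a compiled regex."""
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")   # * matches any run of characters
        elif ch == "?":
            parts.append(".")    # ? matches exactly one character
        else:
            parts.append(re.escape(ch))
    # Hostmask comparison is treated as case-insensitive here (an assumption).
    return re.compile("".join(parts), re.IGNORECASE)


def is_banned(hostmask, ban_mask):
    """Return True if a full nick!user@host string matches the ban mask."""
    return mask_to_regex(ban_mask).fullmatch(hostmask) is not None


def ban_command(channel, host):
    """Build the raw IRC line that bans every nick and user on the given host."""
    return f"MODE {channel} +b *!*@{host}"


if __name__ == "__main__":
    # "#example" is a hypothetical channel; the host is the one named above.
    print(ban_command("#example", "irchive.it.hiit.fi"))
```

A channel op would normally just type the equivalent /mode command in their client; the helper only shows which mask covers every bot coming from that one host.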

Asmo
none

Joined: 06 May 2003
Posts: 28

Posted: Nov 13, 2003 1:40pm

Hmm, interesting info :P

moonman
Lurker

Joined: 15 Jul 2003
Posts: 212

Posted: Nov 13, 2003 2:36pm

honestly, i don't see a problem. irc is an entirely open medium by nature. if an individual channel wants to keep information out of the public eye, they would do best to set the appropriate modes to keep prying eyes away. that's why networks give us +s/p/i/etc.
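
As a rough illustration of those modes, here is a small Python sketch that builds the raw MODE line a channel op would send. The mode descriptions follow RFC 1459 as best I can summarize them, individual networks may behave differently, and the names PRIVACY_MODES, privacy_mode_command and #example are made up for this example.

```python
# Channel modes as described in RFC 1459; individual networks may differ.
PRIVACY_MODES = {
    "s": "secret: channel is hidden from listings for non-members",
    "p": "private: channel name is masked in listings for non-members",
    "i": "invite-only: users must be invited before they can join",
}


def privacy_mode_command(channel, modes):
    """Build a raw IRC MODE line that sets the given privacy modes on a channel."""
    unknown = [m for m in modes if m not in PRIVACY_MODES]
    if unknown:
        raise ValueError(f"not a privacy mode covered here: {unknown}")
    return f"MODE {channel} +{''.join(modes)}"


if __name__ == "__main__":
    # "#example" is a hypothetical channel name.
    print(privacy_mode_command("#example", "si"))
```

Of the three, +i is the one that actually keeps an uninvited bot from joining; +s and +p only hide the channel from listings.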

Mary
SearchIRC Admin

Joined: 03 May 2003
Posts: 686

Posted: Nov 13, 2003 3:42pm

Well... SearchIRC already has bots on all of the networks, we catalogue all the channels, we have a search engine that filters and can mine IRC data (which is way different from html) from info gathered from each and every one of those channels, we have contacts with IRC administrators, and I like to feel we have a good reputation for doing what is right for the IRC community. We could have something like Irchiver is TRYING to do, all set up by midnight.

:P

Trust me, the people on big networks don't need me to tell them why this is a Bad Idea. There have been numerous well-funded companies who have tried the same thing as Irchiver and Google. They have all failed - for one big reason. The people who create these projects know about IRC, but they don't know IRC at all.

Mary
SearchIRC Admin

Joined: 03 May 2003
Posts: 686

Posted: Nov 14, 2003 8:45am    Post subject: Asmo's article

Asmo did an excellent job of reporting on Irchiver in today's IRC-Junkie. He wrote directly to CoSCo, the company that is developing Irchiver, and found out that this project is not purely academic nor scientific as claimed (as we suspected), but totally commercial in nature - that is, bots are sent into IRC channels to see what everyone is talking about, in order to track trends that inform targeted advertising. (Translation for the marketing impaired: they want to watch us so the spambots will know exactly which nicks to message about that porn url.)

http://www.irc-junkie.org/index.php#newsitem1068809748,58828,

Ville H. Tuulos
Guest

Posted: Nov 18, 2003 5:12pm    Post subject: Irchiver project

As the main person responsible for the Irchiver project, I feel obliged to correct a few misunderstandings here:

1) CoSCo, Complex systems computation group, is a research group in the Department of Computer Science at University of Helsinki. Currently we work as a part of HIIT, Helsinki Institute for Information Technology, which is a joint institute of Univ. of Helsinki and Helsinki University of Technology. All this information can be found on our web page.

All our work has a scientific motivation, yet it might (and hopefully will) also have use outside the academic community. We aren't by any means commercial.

With respect to Irchiver, we are especially interested in studying IRC as an example of a highly dynamic environment, like a time series, in which we can utilize our statistical models.
That will be part of my Master's thesis.

2) We have no motivation nor intention to publish or collect logs in the long run. We are not going to compete with SearchIRC.com, which IMO is a great service, thank you for that. We want to collect a test data set to study the feasibility of the idea w.r.t. our models.

We will publish our code in open source to benefit the whole IRC community. As a proof-of-concept we hope that we can provide a public web demo which probably will work with a static data set. However that would require us to cooperate with some IRC network so that we can get appropriate permissions to log and show the data in public. Of course we won't do it without permission.

I'm happy to answer any further questions,

Ville H. Tuulos / Complex systems computation group
tuulos@cs.helsinki.fi

Guest

Posted: Nov 18, 2003 6:36pm

Let me get this straight. You want to load massive amounts of bots onto IRC networks, and log channel discussions, for your Master's thesis?

dusk
none

Joined: 18 Nov 2003
Posts: 3

Posted: Nov 19, 2003 10:42pm

Points for originality. :)

Asmo
none

Joined: 06 May 2003
Posts: 28

Posted: Nov 20, 2003 1:15pm    Post subject: Re: Irchiver project

Ville H. Tuulos wrote:
I'm happy to answer any further questions,


I'm still waiting for some questions I sent you...

I know the tone in my post on irc-junkie.org was pretty negative, but that is only because in your FAQ you showed great ignorance towards the users of the channels you are logging, and you are obviously holding information back when offering the information on which users need to base their decision to allow the bot to be there or not.

Ville H. Tuulos wrote:
We have no motivation nor intention to publish or collect logs in the long run.


See what I mean. At one point you say you're going to make a search engine, where search results will be accompanied by a few example lines of the text from the channel (for more such claims look at the log you provide), and the next moment you say you never intended to make logs available.

And this is not the first time I've noticed your story has holes and half-truths.

Maybe it's time you got your story together.

Guest

Posted: Nov 20, 2003 1:25pm

I welcome our new targeted ad IRC spamlords.

I like the idea of robots monitoring my IRC conversations to help a company develop targeted ads (for educational use and hopefully commercial use!)

Guest

Posted: Nov 20, 2003 1:30pm

Don't be so harsh.

The notion that the software will "eventually" become open source means the project can do no wrong. After all, with the source code anyone who wants to can launch robots like these without having any clue what they're doing. That is a good thing (tm)

Guest

Posted: Nov 20, 2003 1:40pm

We all know anyone can listen in to a cell phone conversation.

Since cellphone users shouldn't expect privacy, perhaps our new overlords can be convinced to switch to publishing phone conversations on the web, complete with the names and phone numbers of the participants, so telemarketers can better target their advertisements.

Ville H. Tuulos
Guest

Posted: Nov 20, 2003 8:18pm

Ok, my bad. Maybe I've not been clear enough. From my point of view there are two different things:

First: Developing a search *engine*. That means the code, the kernel, for a search engine. I understand that for most people "a search engine" means a web page. But if you try to understand search engines from a technical or scientific point of view, you see that *developing* and *researching* theory and code is the thing we're interested in. Not doing some web pages. We happily leave that business to those who know it better than us. You may want to read our FAQ again with this in mind.

However, as any other open source / research project we might want to show and demonstrate our results in public. But that's not our main intent.

Asmo:
Quote:
See what I mean. At one point you say you're going to make a search engine, where search results will be accompanied by a few example lines of the text from the channel (for more such claims look at the log you provide), and the next moment you say you never intended to make logs available.


See what I mean? We ARE doing a search engine which CAN show example lines from the channel - we provide the CODE which can do that. But we have no intention to publish the logs we're collecting. We only need them to DEVELOP the engine. Publishing them would definitely infringe on people's privacy. I know, I should have made this far more clear from the beginning.

We hope that our project will eventually produce a useful tool which can then be used by interested communities etc. It's up to them how they collect data, respect privacy etc. We have no ultimate solution for that. We would have to answer the same questions if we were to make a public demo. We would have to make proper arrangements with an IRC network, inform the users beforehand etc.

The legitimacy of collecting a test data set, as we're doing now, is addressed in our FAQ section "Privacy policy". We feel that as we're handling the logs confidentially to develop the search engine, we're doing it openly, and we let everyone ban and kick the bot easily by using a single IP, we're not ruining the whole of IRC. Especially as we're not going to do this forever.

Quote:
After all, with the source code anyone who wants to can launch robots like these without having any clue what they're doing. That is a good thing (tm)


You're saying that this isn't possible already? Our bots are like 200 lines of Python. I guess that I'd be laughed at if I said that I'm afraid to publish my code since someone could do the same thing as I did. After all, I'm not that special a coder. There wouldn't be much science if we wanted to ban anything which has even the slightest possibility of being misused.

Asmo:
Quote:
you are obviously holding information back when offering the information on which users need to base their decision to allow the bot to be there or not.


This always seems to be a problem when doing research. You begin by waving your hands. It wouldn't be called research if we knew what we were doing. We had this simple idea: We have some statistical models and other infrastructure for handling large amounts of natural language. Static web pages are our main target, but we saw great potential in dynamic environments, like chats, for extending our models. But for that we needed some real-life data.

There's not much else I can say now. I simply don't know yet how gigabytes of IRC discussions behave under our statistical treatments. All I can say is that I truly hope we can come up with something useful for the community and I truly hope that the community could trust us and allow us to collect the initial test data set we need.

I hope that we could continue this discussion on #searchengine@freenode.net,

Ville H. Tuulos / Complex systems computation group

PS. Yes, I'm going to do my Master's thesis about this.

Asmo
none

Joined: 06 May 2003
Posts: 28

Posted: Nov 20, 2003 11:11pm

I was unable to find the log of the chat you had on Freenode again, Mr. Tuulos. Care to provide a link to that again?

It is also interesting to see some parts of your FAQ have changed. Yes in some parts you added a date to it, but in others you did not...


Last edited by Asmo on Nov 20, 2003 11:14pm; edited 1 time in total
All times are GMT - 6 Hours