#darkArkEmbark - Data-Mining the online darkcoin community

georgem

Active Member
Jul 10, 2014
82
110
93
Today I had the idea of crawling the BCT darkcoin thread and extracting the post count of all users that ever posted on all 3133 pages.

This list is the final result of this work: a top 50 list, and further down you can find a text file listing the username and post count of all 2623 users that ever posted on the BCT darkcoin thread.

(Since all the data I use is freely available anyway, I will refine and post any RAW data I acquire through crawling and robots, so that other people can use it for their own research or entertainment.)



Full textfile with all 2623 users, here: http://textuploader.com/orbb

If this has any value for you, and if you would like to see more, consider a donation: XtAdMy5nSxArut6rKKLQXxm84rpJNipXib
 
Reactions: fernando and tungfa

tungfa

Grizzled Member
Foundation Member
Masternode Owner/Operator
Apr 9, 2014
8,900
6,740
1,283
Great research!
Thanks for that
(and sure honored to be on that list)
 
Reactions: georgem

vertoe

Three of Nine
Mar 28, 2014
2,573
1,652
1,283
Unimatrix Zero One
Not understanding the value, my postcount is not 171...? Neither here nor on BCT.

Name: vertoe
Posts: 611
Activity: 252
Position: Sr. Member
 
Reactions: Roslyn

fernando

Powered by Dash
Dash Core Team
Moderator
Foundation Member
May 9, 2014
1,527
2,058
283
vertoe said:
Not understanding the value, my postcount is not 171...? Neither here nor on BCT.

Name: vertoe
Posts: 611
Activity: 252
Position: Sr. Member
It only counts your posts in the Darkcoin ANN thread on bitcointalk.org that have not been deleted.
 
Reactions: georgem

fernando

Powered by Dash
Dash Core Team
Moderator
Foundation Member
May 9, 2014
1,527
2,058
283
Purely out of curiosity, how are you crawling the thread? Never done it.
Me neither, and I would love to know. I want a way to read the bitcointalk thread outside of the forum (sometimes I don't visit for a couple of days, and catching up there is slow, so I end up missing a lot of conversations). I experimented with RSS and Yahoo Pipes but didn't get anything useful.
 
Reactions: georgem

georgem

Active Member
Jul 10, 2014
82
110
93
For me, that was just a fun Sunday evening experiment.
A first test to see how I could reasonably extract such data if I wanted to.
The ultimate goal is not to create such top 50 poster lists... the number of posts says nothing about quality or value, so please don't read it that way. lol, I already get mails from people complaining that they aren't on the list etc... please stop.

You know what the ultimate goal of this data mining will be?
Visualisations, time lapse visualisations, of any community platform: forums, GitHub repositories, collaborations of any kind.

A guy on YouTube created a visualisation some time ago that completely blew my mind. It's what inspired me, and what I ultimately want to create:



Another similar visualisation:
 

georgem

Active Member
Jul 10, 2014
82
110
93
The possibilities are basically endless.

Showing time lapse animations of how people interact with each other, how active they are, who quotes who, how many times, and all in a funny animated way with cool music.

This could also become useful to study troll behaviour.

It's going to be fun, I promise.

I will keep posting iterations of the progress of my work here.
The next step is to show the ANN Darkcoin post count per user as a time lapse animation. :)
 

georgem

Active Member
Jul 10, 2014
82
110
93
Purely out of curiosity, how are you crawling the thread? Never done it.
1) Download and save the HTML source code of every single page. (This can be done with PHP or pretty much any programming language that can access the internet.)
It is important to follow forum rules. For example, some forums will temporarily ban your IP if you refresh/load pages more than once a second.
Therefore I let my script sleep for 1 second after every page download.

2) Analyze the HTML code and look for patterns and similarities you can extract. (That's the hard part.)

3) Use the extracted data to create a visualization.
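
The first two steps could look roughly like this in Python. This is only a minimal sketch, not the script actually used for the list above: the regular expression in POSTER_RE is an assumption about how the forum marks up the author of a post, and the function names are made up for illustration. Check the real page source and adjust the pattern before trusting the counts.

```python
import re
import time
import urllib.request
from collections import Counter

# Assumption: every post header links to the author's profile with a title
# attribute like this. Inspect the real HTML and adjust the pattern.
POSTER_RE = re.compile(r'title="View the profile of ([^"]+)"')


def fetch(url):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def download_pages(urls, fetch_page=fetch, delay=1.0):
    """Yield the HTML of each URL, sleeping `delay` seconds after every download."""
    for url in urls:
        html = fetch_page(url)
        time.sleep(delay)  # stay below one request per second
        yield html


def count_posts(pages):
    """Count how often each username appears as a post author."""
    counts = Counter()
    for html in pages:
        counts.update(POSTER_RE.findall(html))
    return counts


def top_posters(counts, n=50):
    """Return the n most active users as (username, post_count) pairs."""
    return counts.most_common(n)
```

To use it, build the list of page URLs for the thread yourself (the URL layout depends on the forum), then call top_posters(count_posts(download_pages(urls))). Passing a different fetch_page makes it easy to run the counting on pages already saved to disk instead of downloading them again.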
 
Reactions: fernando

georgem

Active Member
Jul 10, 2014
82
110
93
Continuing my darkcoin community research...
...using the forum extraction tool I am developing...
...here is how the title of this [ANN] thread changed over time, starting from day 1: January 18, 2014 (When darkcoin wasn't even called darkcoin yet... ;D) until today.



If this has any value for you, and if you would like to see more, consider a donation: XtAdMy5nSxArut6rKKLQXxm84rpJNipXib

What is the next thing you want to see?

1) If you want to know which of the darkcoin users swears the most (using words like f**k etc.), vote by donating any amount to this address:
Xwz1utupG5LqWMsCeZqF3vo9xtiA7FXcjH

2) If you would like to see a word cloud of all the comments Evan Duffield ever made, vote by donating to this address:
Xn9b76yUZ5zGAcMvFv41ZSYn9XHMJpX833

(I will eventually fulfill both tasks if they reach 5 DRK each, otherwise just the one that has more DRK)

Have fun.
 

georgem

Active Member
Jul 10, 2014
82
110
93
Maybe I have to explain how I extract those past ANN thread titles, because there isn't really a database storing them, so I pretty much have to make "educated guesses".
What I do is read every post of every user. Since the ANN thread title of the day also becomes the title of each post, I can derive what the ANN title used to be from how many users posted under a specific title.
The problem is that users can edit the post title, and every user who quotes that user will also pick up the edited title.
So I tried to filter out all the obvious crap, and all the ANN titles in my list have been posted/quoted by at least 50-100 people each, but still, especially the ANN title that you mentioned could be a fake one.
(Meaning it was never officially put on the ANN thread title by the admin.)
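
The guessing described above can be sketched like this in Python. It is only an illustration under some assumptions, not the exact tool: the function name summarize_titles and the (username, timestamp, title) input format are hypothetical, the "Re: " prefix stripping assumes replies are titled that way, and the threshold is applied to distinct users rather than to raw post counts.

```python
def summarize_titles(posts, min_users=50):
    """Guess past thread titles from post titles.

    posts: iterable of (username, timestamp, title) tuples, where the
    timestamps sort chronologically (e.g. ISO date strings).
    Returns (post_count, first_seen, title) rows for every title used by
    at least `min_users` distinct users, ordered by first appearance.
    """
    stats = {}
    for user, timestamp, title in posts:
        # Assumption: replies carry a "Re: " prefix that should be ignored.
        if title.startswith("Re: "):
            title = title[4:]
        entry = stats.setdefault(
            title, {"count": 0, "first": timestamp, "users": set()}
        )
        entry["count"] += 1
        entry["first"] = min(entry["first"], timestamp)
        entry["users"].add(user)

    # Titles used by only a few people are most likely individual edits.
    rows = [
        (entry["count"], entry["first"], title)
        for title, entry in stats.items()
        if len(entry["users"]) >= min_users
    ]
    return sorted(rows, key=lambda row: row[1])
```

A low min_users lets edited joke titles through, a high one can drop real titles that were only live for a short while, so it is worth trying a few values and comparing the results.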

EDIT: The best thing to do is probably to provide the RAW data I acquired. Then people can make their own assessment.
The list works like this: the first number is the number of posts that have this ANN title, then the first time this title appeared in a post, then the title itself.
There are some hilarious one-liners in there by obvious trolls, lol:

http://textuploader.com/okaa