Ohio Bronies - Forums

Peanut Bucker is best pony.


#1 2013-10-06 15:17:45

Star ★
Pony
Starshine Trotter

Skype stats

I spent the morning hacking this together and overanalyzing the output. This takes a Skype log file and dumps out a bunch of bar graphs and stuff.

import os
import re

from datetime import datetime
from collections import defaultdict
from operator import itemgetter


class cfg():

    # if True, non-ASCII characters in names are recoded with XML entities
    # (makes interactive testing a bit easier)
    recode = True

    # The data file, obviously.
    log_filename = 'skype-chat.log'

    # This file lists usernames that should be considered the same.
    # List one name per line, with a blank line between different users.
    # Each name after the first will be folded in with the first name.
    # If None, don't bother renaming anyone. (I don't suggest this, since
    # Skype is extensively derpy about names. You'll probably need to run
    # this script a number of times while messing with this file, too...)
    user_renames_filename = 'skype-rename.dat'

    # Output files - if any of these are None, the file is skipped.

    # Message count per person, sorted in decreasing order.
    per_user_totals_filename = 'stats-totals.txt'

    # Stats for users who write more than average.
    active_user_stats_filename = 'stats-top.txt'

    # Stats for all users, individually.
    all_user_stats_filename = 'stats-all.txt'

    # Stats for every user combined.
    global_stats_filename = 'stats-global.txt'

    # How wide to make the output, of course.
    output_width = 128

    # Regex for scraping timestamp / username data.
    # Fields:
    #   yyyy - Year (must be four digits)
    #   mm - Month
    #   dd - Day
    #   hh - Hour
    #   nn - Minute (Note: not 'mm' - that's month!)
    #   ss - Second
    #   ap - AM/PM indicator (exclude if timestamps are in 24 hour format)
    #   username - The username, obviously.
    # Omitted numeric fields are interpreted as zero (e.g. seconds)
    # Lines not matching this pattern will be silently ignored.
    message_pattern = re.compile(r'''
        ^\[ (?P<mm> \d\d?) / (?P<dd> \d\d?) / (?P<yyyy> \d\d\d\d)
            \s (?P<hh> \d\d?) : (?P<nn> \d\d) : (?P<ss> \d\d) \s (?P<ap> [AP]M)
            (?: \s \| \s Edited \s \d\d?:\d\d:\d\d \s [AP]M ) ?
        \] \s (?P<username> [^:]*):
    ''', re.VERBOSE)


# -----------------------------------------------------------------------------
# No further user-serviceable parts!
# -----------------------------------------------------------------------------


rename = {}

if cfg.recode:
    recode = lambda s: s.encode('ascii', 'xmlcharrefreplace').decode('ascii')
else:
    recode = lambda s: s

if cfg.user_renames_filename:
    with open(cfg.user_renames_filename, encoding='utf8') as rename_data:
        rename_from = rename_to = None
        for name in rename_data:
            name = recode(name)
            name = name.rstrip('\r\n')
            if not name:
                rename_to = None
            elif not rename_to:
                rename_to = name
            else:
                rename_from = name
                rename[rename_from] = rename_to


users = defaultdict(list)

with open(cfg.log_filename, encoding='utf8') as logfile:
    for line in logfile:
        line = recode(line)
        m = cfg.message_pattern.match(line)
        if not m:
            continue
        md = m.groupdict()
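        # Groups missing from the pattern are absent from md; optional groups
        # that did not match are None. Treat both as zero / empty.
        yyyy, mm, dd, hh, nn, ss = [int(md.get(f) or 0)
                                    for f in 'yyyy mm dd hh nn ss'.split()]
        ap = (md.get('ap') or '').lower()[:1]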
        username = md['username']
        if ap == 'a' and hh == 12:
            hh = 0
        elif ap == 'p' and hh != 12:
            hh += 12
        timestamp = datetime(yyyy, mm, dd, hh, nn, ss)
        username = rename.get(username, username)
        users[username].append(timestamp)


per_hour = defaultdict(lambda: [0] * 24)
per_day = defaultdict(lambda: [0] * 7)

for username, timestamps in users.items():
    for timestamp in timestamps:
        per_hour[username][timestamp.hour] += 1
        per_day[username][timestamp.weekday()] += 1
        # username None collects stats for all users
        per_hour[None][timestamp.hour] += 1
        per_day[None][timestamp.weekday()] += 1


def user_stats(username):
    hours = per_hour[username]
    days = per_day[username]

    out = []

    # subtract for other stuff on the line, plus one more char for eol
    width = cfg.output_width - 18 - 1

    bars = lambda n, maxn: ('|' * int(n * width / maxn) if maxn else '')

    out.append('=== %s ===' % ('All Users' if username is None else username))
    out.append('Total messages: %d' % sum(days))
    out.append('Per hour:')
    for hour, count in enumerate(hours):
        out.append('\t%02d  %5d %s' % (hour, count, bars(count, max(hours))))
    out.append('Per day:')
    for day, count in enumerate(days):
        out.append('\t%3s %5d %s' % (
            ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'][day],
            count, bars(count, max(days))))
    return '\n'.join(out)


def write_stats_files():
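    totals = [sum(counts) for name, counts in per_day.items()
              if name is not None]
    # Guard against a log with no matching lines (avoids dividing by zero).
    average_per_user = sum(totals) / len(totals) if totals else 0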

    if cfg.per_user_totals_filename:
        with open(cfg.per_user_totals_filename, 'w', encoding='utf8') as out:
            for username, data in sorted(users.items(),
                                         key=lambda x: -len(x[1])):
                print('%d %s' % (len(data), username), file=out)

    if cfg.active_user_stats_filename or cfg.all_user_stats_filename:
        with \
                open(cfg.active_user_stats_filename or os.devnull,
                     'w', encoding='utf8') as out_top, \
                open(cfg.all_user_stats_filename or os.devnull,
                     'w', encoding='utf8') as out_all:
            for username, data in sorted(users.items()):
                if len(data) > average_per_user:
                    print(user_stats(username), '\n', file=out_top)
                print(user_stats(username), '\n', file=out_all)

    if cfg.global_stats_filename:
        with open(cfg.global_stats_filename, 'w', encoding='utf8') as out:
            print('Average messages per user: %d\n' % average_per_user,
                  file=out)
            print(user_stats(None), file=out)

write_stats_files()
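
For anyone puzzling over the rename file described in the config comments, here is a minimal sketch of how a file in that format folds aliases into the first name of each group. The names in the sample data and the `load_renames` helper are made up for illustration; the loop mirrors the one in the script above.

```python
import io

# Hypothetical rename data: two groups separated by a blank line.
sample = io.StringIO(
    "Star\n"
    "Starshine\n"
    "star_2013\n"
    "\n"
    "Midimistro\n"
    "midi\n"
)

def load_renames(lines):
    """Map every alias to the first name of its blank-line-separated group."""
    rename = {}
    rename_to = None
    for name in lines:
        name = name.rstrip('\r\n')
        if not name:
            rename_to = None          # blank line: the next name starts a new group
        elif rename_to is None:
            rename_to = name          # first name in a group is the one kept
        else:
            rename[name] = rename_to  # later names fold into the first

    return rename

renames = load_renames(sample)
print(renames)
```

Any username not listed in the file is left alone, since the script looks names up with `rename.get(username, username)`.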

In case you missed the link in the chat, here's the stats for everyone combined for the past three months.
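
As a rough illustration of what the script expects from the log, here is a small sketch that runs the same regex over a few sample lines and applies the same 12-hour to 24-hour conversion. The sample lines are assumptions inferred from the pattern itself, not copied from a real Skype export, and the `parse` helper is hypothetical.

```python
import re
from datetime import datetime

# Same pattern as in the script's cfg class.
pattern = re.compile(r'''
    ^\[ (?P<mm> \d\d?) / (?P<dd> \d\d?) / (?P<yyyy> \d\d\d\d)
        \s (?P<hh> \d\d?) : (?P<nn> \d\d) : (?P<ss> \d\d) \s (?P<ap> [AP]M)
        (?: \s \| \s Edited \s \d\d?:\d\d:\d\d \s [AP]M )?
    \] \s (?P<username> [^:]*):
''', re.VERBOSE)

def parse(line):
    """Return (username, datetime) for a message line, or None otherwise."""
    m = pattern.match(line)
    if not m:
        return None                   # non-message lines are skipped
    hh = int(m.group('hh'))
    ap = m.group('ap')
    if ap == 'AM' and hh == 12:
        hh = 0                        # 12 AM is hour 0
    elif ap == 'PM' and hh != 12:
        hh += 12                      # 1 PM..11 PM become 13..23
    ts = datetime(int(m.group('yyyy')), int(m.group('mm')),
                  int(m.group('dd')), hh,
                  int(m.group('nn')), int(m.group('ss')))
    return m.group('username'), ts

# Hypothetical log lines shaped to fit the pattern.
plain = parse('[10/6/2013 3:17:45 PM] Star: hello')
edited = parse('[10/6/2013 12:05:00 AM | Edited 12:06:10 AM] Star: oops')
other = parse('*** Star added Mkanke ***')

print(plain)
print(edited)
print(other)
```

If your client writes timestamps differently, the regex in `cfg.message_pattern` is the part to adjust.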


#2 2013-10-06 21:36:51

Mkanke
Peanut Buckering ALL the things!
Mkanke Trotter

Re: Skype stats

We post a lot :I



#3 2013-12-23 03:02:07

Shams
Celestial Master (Admin)

Re: Skype stats

Nerd.


#4 2013-12-23 03:35:18

Star ★
Pony
Starshine Trotter

Re: Skype stats

yes I am


#5 2014-08-28 14:33:03

Midimistro
One Person... One logo... Infinite possibilities

Re: Skype stats

Hey Starshine, is this code in C, C++ or another language?


My YouTube | My Soundcloud | My deviantArt | Skype Username: Midimistro
Request, comment, rate and subscribe!
If I don't get to you within 24 hours....I will get to you.... Eventually....


#6 2014-08-30 20:18:38

Star ★
Pony
Starshine Trotter

Re: Skype stats

'tis python


#7 2014-09-02 13:47:02

Midimistro
One Person... One logo... Infinite possibilities

Re: Skype stats

Could you post the C#, C or Java Equivalent? Python is the only language I haven't studied yet...
If you can't, I'll understand and try to convert it myself....



#8 2014-09-02 17:01:47

Star ★
Pony
Starshine Trotter

Re: Skype stats

Well, I hardly remember Java, never touched C#, and C is a nasty language for processing heavily text-based stuff... which means to me, that sounds like fun! Might even toss in a couple extras.

I have no idea when I'll get to it though, since work and life have been draining me down pretty hard lately.


#9 2014-09-04 15:20:11

Midimistro
One Person... One logo... Infinite possibilities

Re: Skype stats

Starshine ★ wrote:

Well, I hardly remember Java, never touched C#, and C is a nasty language for processing heavily text-based stuff... which means to me, that sounds like fun! Might even toss in a couple extras.

I have no idea when I'll get to it though, since work and life have been draining me down pretty hard lately.

As I said before, the only language I don't know is Python. I can easily translate it to other languages, that is, once it is in Java, Python's Older Brother.

Based on the code I am reading, I need to know whether the following is a string or a reference to a certain file-type that is being read:

  log_filename = 'skype-chat.log'

If you want to talk about working on this together, I believe I added you as a contact on Skype (my username is Midimistro). Let me know if you have any questions...



#10 2014-09-04 20:51:59

Star ★
Pony
Starshine Trotter

Re: Skype stats

The Java-ish equivalent for that would be something like

String log_filename = "skype-chat.log";
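
To spell that out on the Python side, here is a minimal sketch: `log_filename` is an ordinary string, and nothing touches the disk until it is passed to `open()`. The temporary directory and the sample line written into it are assumptions for the demo, not part of the original script.

```python
import os
import tempfile

log_filename = 'skype-chat.log'
print(type(log_filename).__name__)    # str

# Demo only: create a throwaway file with that name in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, log_filename)
    with open(path, 'w', encoding='utf8') as f:
        f.write('[10/6/2013 3:17:45 PM] Star: hello\n')

    # open() is what turns the name into a file object; iterating over
    # that object yields one line of text at a time.
    with open(path, encoding='utf8') as logfile:
        lines = [line for line in logfile]

print(lines)
```

The `with` block closes the file when it ends, roughly what try-with-resources does in Java.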


#11 2014-09-05 13:53:37

Midimistro
One Person... One logo... Infinite possibilities

Re: Skype stats

Starshine ★ wrote:

The Java-ish equivalent for that would be something like

String log_filename = "skype-chat.log";

That's kind of what I thought. Thanks... I will try to translate as much as I can by myself. If I have another question, I will ask you...



#12 2015-12-06 20:29:38

Midimistro
One Person... One logo... Infinite possibilities

Re: Skype stats

Still haven't been able to convert it completely :P Maybe it would help if you put in {} where each belongs so I know when you are starting a new class or not.
Two questions:
1. What is lambda?
2. Do you want to work together to make it in other languages or not really?



#13 2015-12-15 10:09:26

Star ★
Pony
Starshine Trotter

Re: Skype stats

1. lambda is basically just a function. I tend to write a sort of lispy functional style where it works.
2. I ... don't care really. Basically this was a one-shot, "hey this would be fun" kind of thing. If you're thinking of turning it into a bot or summat, I'd help but that'd be like, yours.
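
To make point 1 concrete, here is a small sketch that puts the `bars` lambda from the script next to an equivalent `def`. The name `bars_def` and the sample data are made up for illustration; `width` uses the value the script computes from its default `output_width` of 128.

```python
width = 128 - 18 - 1   # same arithmetic as in user_stats(), gives 109

# The lambda form used in the script: an unnamed function bound to a name.
bars = lambda n, maxn: ('|' * int(n * width / maxn) if maxn else '')

# The same thing written as a regular function.
def bars_def(n, maxn):
    if maxn:
        return '|' * int(n * width / maxn)
    return ''

print(len(bars(5, 10)), len(bars_def(5, 10)))

# Lambdas are handy as throwaway arguments, e.g. a sort key.
# Hypothetical data: username -> list of messages.
sample = {'a': [1, 2], 'b': [1, 2, 3]}
by_count = sorted(sample.items(), key=lambda x: -len(x[1]))
print([name for name, _ in by_count])
```

Both forms produce the same function object behavior; the block structure comes from the indentation after each colon rather than from braces.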

(also, sorry for the delay replying to this, I am kind of terrible at the internets lately)

