I’m working on a community dashboard for work that’ll help me monitor different ways in which people are interacting with the community. One metric I thought it’d be neat to have was how many people have filed more than X bug reports during the last period Y (month, say, or week) and what the average for that period over time has been. This involves a pretty complicated query wherein you have to group by both user and period, which is kind of tricky. It’s also very expensive: With only about 1200 bugs in our database, this query was taking six seconds when I was testing it out. That’s clearly unacceptable. So I poked around a bit and discovered the GROUP_CONCAT function, which lets you group by one column and output the relevant grouped data in a delimited column. So say your query is something like this:
select DATE_FORMAT(creation_ts,’%Y-%m’) date, group_concat(reporter) cnt from bugs group by DATE_FORMAT(creation_ts,’%Y-%m’)
Here I’m getting all months in the database and a listing of all reporters of bugs for that month. The “cnt” column is a comma-delimited lists of user ids for those who reported bugs in the given month, duplicates included (though you can specify DISTINCT to eliminate duplicates). The query returns almost instantly and gives me something I can parse pretty easily in the programming language of my choice.
In my case, I split the “cnt” column on commas for each result and tally the ids for users who appear in the list X or more times. I then pass this data back to a function that does math to get averages and to find a count for the current month, and voila, community metrics.
I blog this here because it’s the sort of thing I might want to remember later, and I always put that stuff here rather than at my work blog, where my regular readers probably figure this should actually go.