It’s strange to be simultaneously working on the biggest project of my life to date and applying for jobs, where I am asked to reframe that project, over and over, in other people’s terms. It’s a genre project. It’s a mixed-methods digital humanities project. It’s an academic Englishes corpus analysis project.
Sometimes all those frames are stymying. I’m reminded of the pointed man in the Pointless Forest, who points out, “A point in every direction is the same as no point at all.”
Briefly, I’m analyzing hyperlink and parenthetical citation practices in popular press, print academic, and online academic writing. I’ve collected about 10,000 texts and am investigating how often they use citations, what kinds of citations they use, and how different people use citations differently.
In these last few paragraphs, I’ll describe several attempts at graphing a trend I recognized in my data: Across CCC and College English from 1996 to 2015, the authors who are most cited in rhetoric and composition (a list I borrowed from Derek Mueller) tend to cite at the extremes. That is, although on average most-cited authors have fewer in-text citations than newcomers, they’re overrepresented at the lowest and highest citation counts.
Attempt 1, in which I’m What’s Wrong with America
I often warn students of misleading graphing practices. Unexplained terms. Strange data groupings. Axes that aren’t to scale or don’t start at zero. But I’ve learned just how tempting these practices can be. Take a look at the X-axis above: The left bar is just articles with 5 or fewer citations. The second bar is all articles with 6 to 99 citations. The third bar is all articles with 100+ citations, meaning it actually includes the data represented in the final two bars. Terrible, terrible, terrible. Other problems, too: My writing group asked, “What does ‘expected’ mean, exactly?” Sterling, my fiancé, asked, “Why doesn’t the ‘expected’ line go all the way to the Y-axis?” Mary P., my director, asked, “Does ‘percentage of articles’ mean all the articles?”
Mary P. suggested I try doing something with different colors representing proportions instead (“Oh, like a heatmap?” “Sure!”).
Attempt 2, in which I Learn about Heatmaps
Several hours of wrestling with my data and plot.ly references later, I had this pretty line. When I showed it proudly to Sterling, he didn’t mince words: “This graph is really hard to read.”
I pointed defensively at the bottom: “That’s the expected proportion. I’m showing where it diverges. See how it’s yellower on the left and right?”
“Yeah, but is 30 or 60 closer to that color?”
“Oh.”
“And a heatmap’s better for when you’ve got multiple things on the Y-axis to compare.”
I’d guessed as much when I couldn’t find any examples of another one-line heatmap, but I hoped I was the exception. Maybe I’ll come back to the heatmap when I’ve got similar data for other groups of authors. But once I’d nursed my wounded pride, I started over.
Attempt 3: Return of the Histogram
Finally, a graph that makes the visual point I wanted (high at both ends, low in the middle), without the X-axis convolution. It shows divergence from the average proportion of most-cited authors. So, at 0 in-text citations, most-cited authors make up about 25.6% of the articles, which is 9.2 percentage points above the 16.4% overall average.
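For anyone who wants to try the same kind of calculation, here is a minimal Python sketch of the idea, not the code I actually used. It assumes each article is a pair of (in-text citation count, whether the author is on the most-cited list), that zero citations gets its own group, and that every other group is 15 counts wide. The function names and the toy data are hypothetical, made up purely for illustration. The arithmetic is the same as in the example above: 25.6% minus the 16.4% overall average gives 9.2 percentage points.

```python
from collections import defaultdict

BIN_WIDTH = 15  # assumption: groups of 15 counts (1-15, 16-30, ...)


def bin_label(count, width=BIN_WIDTH):
    """Return a non-overlapping group label for an in-text citation count."""
    if count == 0:
        return "0"  # assumption: zero citations is its own group
    low = ((count - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def divergence_by_bin(articles, width=BIN_WIDTH):
    """Percentage-point gap between each group's most-cited share and the overall share.

    articles: iterable of (citation_count, is_most_cited_author) pairs.
    """
    articles = list(articles)
    overall = 100 * sum(1 for _, most_cited in articles if most_cited) / len(articles)

    totals = defaultdict(int)
    most_cited_totals = defaultdict(int)
    for count, most_cited in articles:
        label = bin_label(count, width)
        totals[label] += 1
        if most_cited:
            most_cited_totals[label] += 1

    return {
        label: round(100 * most_cited_totals[label] / totals[label] - overall, 1)
        for label in totals
    }


# Hypothetical toy data (not from the real corpus): 10 articles, 2 by most-cited authors.
toy_articles = [
    (0, True), (0, False),
    (3, False), (7, False), (10, False), (15, False),
    (20, False), (28, False),
    (100, True), (101, False),
]

print(divergence_by_bin(toy_articles))
```

Because every count lands in exactly one group, this grouping avoids the overlapping-bar problem from Attempt 1, and the labels it produces could double as X-axis tick labels.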
I’m still not 100% thrilled: I need to revise the X-axis to clarify that counts are groups (1-15, 16-30, etc.). I hate how long the Y-axis label is. I don’t love the graph title.
On those last two points, I welcome your suggestions: Any ideas for how I might make the Y-axis label or the title less cumbersome?
Elizabeth Chamberlain is a 4th-Year PhD Candidate.