Radiolab talks and analysing my stories
Been a while since I blogged here. I should do so more often.
It has been my habit for years to eat my meals while listening to a talk, or watching a documentary, or watching a piece of fiction. Lately I've been listening to one of my favorite shows, Radiolab, having downloaded a heap more of their episodes. It really is an amazing show.
Yesterday I listened to an episode called "Oops". It is an hour-long episode that originally aired on 28th June 2010. If you want to download it, the direct link is: http://www.podtrac.com/pts/redirect.mp3/audio4.wnyc.org/radiolab/radiolab090310.mp3
A lot of that episode was very funny, covering the kind of silly errors that result from injudicious use of spell-checker programs, but one of the longer stories was extremely serious: it was about how torture created an awful terrorist. I wish they'd followed the implications through more completely; I was surprised that they just left it hanging there before moving on with the rest of the episode.
This morning I ate breakfast while listening to an episode from 26th July 2010 titled "Secrets of Success", in which Robert chatted with Malcolm Gladwell (one of my favorite thinkers) about what makes for success. It was funny and very informative. I love the conclusion that, more than anything, doing something obsessively, basically for the love of it, is what makes someone so good at it that it often gets referred to somewhat mystically as "genius". It gives me hope that my writing might have some value, despite my vanishingly small audience.
Further to that last point, a few days ago I was listening to another Radiolab episode, "Vanishing Words", from 5th of May, 2010: http://www.podtrac.com/pts/redirect.mp3/audio4.wnyc.org/radiolab_podcast/radiolab_podcast10success.mp3
The episode was about dementia, something that concerns me greatly, as it appears to run in my family. It is one of my greatest fears. The talk was largely about work that has been done using words as a window into the effect dementia has on the brain.
I couldn't stop thinking about it afterward, and ended up creating a fairly simple program that analysed each of my 6 novels, working out how many unique words each one contained, then estimating what kind of vocabulary that represented by dividing the number of unique words by the total number of words. I'm not entirely sure this is the most reliable way to do it, but it gives a rough guide. I was surprised, and somewhat relieved, to find that my books have been trending towards greater vocabularies. My story "flying" is a bit of an exception, having a very low vocabulary, but I think that may be because it consists almost entirely of dialogue and the main character is a fairly naïve young girl.
I love the fact that it's so damn easy to do that kind of thing in Linux. Unlike Microsoft Windows and Apple Mac computers, which actively discourage people from writing programs, Linux makes available dozens of easy tools for programming.
For my simple concordance program I used mostly sed, a very simple and fast stream editor that lets me feed text through a bunch of commands so that what comes out the other end is modified according to those commands. I also used Linux's tr and bc commands. These are part of every Linux distribution.

I used sed mainly to get rid of any HTML tags I'd embedded in the text, and also to remove blank lines. The tr command let me translate certain characters to other characters (uppercase to lowercase, so words that started sentences were not considered different, and spaces to end-of-line characters to put each word on its own line) and explicitly delete certain characters (mostly punctuation and numbers). The wc command counts characters, words, and lines in a text file. I sorted the file two ways using the sort command. Firstly, after each word had been put on a separate line, I sorted them alphabetically so I could then run uniq on the list, which collapsed the list down, getting rid of duplicates and prefixing each word with its number of occurrences. Then I sorted again, but this time numerically from least frequent (most unusual) to most frequent (most common). I used bc, the command-line calculator, to find the ratio of unique words to total words as a single floating-point number. Really pretty simple.
Another way of measuring the text is to analyse sentence complexity. There is already a Linux command, style, that can do that, though I'm not sure its output is very useful for what I want. Its manual does give various formulas for calculating sentence complexity, so that's useful. I may look at doing that another day.
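As a crude first stab at sentence complexity, average words per sentence can be computed with the same sort of tools. This is only a sketch (the formulas in the style manual are more sophisticated than this); the sample sentence is a placeholder:

```shell
# Average words per sentence: split on sentence-ending punctuation,
# then count words per line with awk.
avg=$(printf 'First sentence. Second one here! A third?\n' |
  tr '.!?' '\n\n\n' |   # one sentence per line
  awk 'NF { words += NF; sents++ } END { printf "%.2f\n", words/sents }')
echo "$avg"
```

For that sample (7 words over 3 sentences) it prints 2.33.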
For anybody who is interested, here is my quick and simple concordance program. The parts in red are comments. They're just there to help me understand what the heck I was doing when I read it again six months later.
(I've put the code behind a cut tag because LJ messes up the entire journal if I have long lines.)
(Crossposted from http://miriam-e.dreamwidth.org/330209.html at my Dreamwidth account. Number of comments there so far: