classic case of chicken and the egg?

So tell me, which came first: gender or purpose? What, you ask? Well, let me explain. Last night I spent a long, lovely time in the bathtub reading Jennifer Coates' introductory book, Women, Men and Language. In the chapter discussing quantitative studies, Coates illustrated the classic sociolinguistic gender pattern, i.e., that men tend to speak in regional dialects, while women tend to strive for the standard dialect (for reasons such as prestige and covert prestige). All the studies shown in this book were separated by *both* gender and socioeconomic group, which makes me wonder if the emphasis on the socioeconomic does not skew the findings, or at least make it difficult to say that it is only gender at work. What makes me question this mix are the findings I have read in the last two days by Herring, as well as by Huffaker and Calvert. Both studies (and I will look more at the Huffaker/Calvert article later) showed a non-significant relationship between gender and variables traditionally associated with gender. There was a much stronger correlation with genre. And this makes me start to think: if gender can be more easily disguised online, and patterns hitherto associated with gender are not present in any significant way, then maybe gender was not the defining characteristic after all. Maybe features traditionally associated with gender (hedging, pronoun usage, etc.) have much more to do with the communicative purpose and the audience than with the speaker/writer’s gender.

Comparing these offline and online patterns is helping me focus my next hypotheses (article forthcoming). One aspect I want to examine is each post's communicative purpose and intended audience, to determine if patterns traditionally associated with the blogger's gender could actually originate from the purpose.
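To make that comparison concrete, here is a minimal sketch of how one feature (hedging) could be averaged by gender and then by purpose. Everything in it is an assumption for illustration: the tiny `HEDGES` list, the `gender` and `purpose` labels (which would have to be hand-coded), and the four invented posts, which are contrived and say nothing about real blogs.

```python
import re
from collections import defaultdict

# Hypothetical hedge list, for illustration only; a real study would use
# a list motivated by the literature.
HEDGES = {"maybe", "perhaps", "possibly", "somewhat"}

def hedge_rate(text):
    """Return the number of hedges per 100 word tokens in text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in HEDGES)
    return 100.0 * hits / len(words)

def mean_rate_by(posts, key):
    """Average the per-post hedge rate within each value of post[key]."""
    groups = defaultdict(list)
    for post in posts:
        groups[post[key]].append(hedge_rate(post["text"]))
    return {label: sum(rates) / len(rates) for label, rates in groups.items()}

# Invented toy posts; "gender" and "purpose" stand in for hand-coded labels.
posts = [
    {"gender": "f", "purpose": "diary", "text": "maybe i will go perhaps not"},
    {"gender": "m", "purpose": "diary", "text": "perhaps it rains maybe it snows"},
    {"gender": "f", "purpose": "filter", "text": "new release out today see link"},
    {"gender": "m", "purpose": "filter", "text": "the report is here read it now"},
]

by_gender = mean_rate_by(posts, "gender")
by_purpose = mean_rate_by(posts, "purpose")
```

If the rates differ more across `by_purpose` than across `by_gender`, that would be a first hint (not a test of significance) that purpose is doing the work.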

How do we answer the ‘how’?

‘We will leave here today and our language will have changed by the interaction that has taken place.’ –Nev Shrimpton

This was the closing thought one of our corpus linguists left us with after his very interesting seminar on Friday. And, while it is an exaggeration for emphasis, it is also true. Communication is a constant state of negotiation, and language is in continuous flux. Those with whom we come into contact modify our language. We speak a certain way with a certain group, and even with ourselves. And while that thought in itself is interesting, even more so is the ‘how’. How does our language change (variation)? How do we use language to communicate in different situations and with different people? How do we do this when taking into consideration the ‘invisible readers’ of blogs and people outside our real-world sociolects (often limited geographically to a select group of speakers)? I believe that blogs are an exceptional object of research for answering this ‘how’. Blogs are social. We have established that they form social networks. The clustering/small-world effects allow us to look for variation with regard to perceived general audience as well as to perceived social network. So again, how do we answer the ‘how’? Several ways, I would say.

Social network analysis:

    Where are people positioned in their network? How fluid are those positions? How often (if at all) do they interact with members of other networks?
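A rough way to put numbers on these questions, sketched with plain Python rather than a network library. The link pairs and the `community` labels are invented placeholders; in practice the communities would come from a clustering step that is not shown here.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected adjacency map from (blog_a, blog_b) link pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def position_summary(graph, community):
    """For each blog, report its degree and the share of its neighbours
    that belong to a different community."""
    summary = {}
    for blog, neighbours in graph.items():
        outside = [n for n in neighbours if community[n] != community[blog]]
        summary[blog] = {
            "degree": len(neighbours),
            "cross_share": len(outside) / len(neighbours),
        }
    return summary

# Invented toy data: four blogs, two assumed communities "x" and "y".
links = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
community = {"a": "x", "b": "x", "c": "y", "d": "y"}

summary = position_summary(build_graph(links), community)
```

Running the same summary on link data from different time periods and comparing the results would be one simple way to approach the fluidity question.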

Corpus Linguistics:

    1. Are different networks using their blogs in different ways? To begin to find this out, I want to identify the registers of different networks. I believe this is key. Are some more speech-like than others? Are some more matter-of-fact, some more questioning? Where do they fall on the continuum of speech and writing? Does this differ between the different types of weblog networks? To find this out, I must tag for parts of speech. I will use grammatical patterns, rather than semantic ones, to determine register.
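Once the text is tagged, a grammatical profile can be as simple as counting tag groups per 100 tokens. The sketch below assumes the input is already a list of (word, tag) pairs using Penn Treebank tags; the grouping in `TAG_GROUPS` is my own illustrative choice, not an established register model, and the sample sentence is hand-tagged.

```python
from collections import Counter

# Assumed grouping of Penn Treebank tags into coarse classes.
TAG_GROUPS = {
    "pronoun": {"PRP", "PRP$"},
    "noun": {"NN", "NNS", "NNP", "NNPS"},
    "modal": {"MD"},
    "wh_word": {"WDT", "WP", "WP$", "WRB"},
}

def register_profile(tagged_tokens):
    """Return each tag group's frequency per 100 tokens."""
    total = len(tagged_tokens)
    if total == 0:
        return {group: 0.0 for group in TAG_GROUPS}
    counts = Counter()
    for _word, tag in tagged_tokens:
        for group, tags in TAG_GROUPS.items():
            if tag in tags:
                counts[group] += 1
    return {group: 100.0 * counts[group] / total for group in TAG_GROUPS}

# Hand-tagged toy sentence, for illustration only.
sample = [
    ("I", "PRP"), ("think", "VBP"), ("you", "PRP"), ("might", "MD"),
    ("like", "VB"), ("this", "DT"), ("post", "NN"), (".", "."),
]
profile = register_profile(sample)
```

Profiles computed per network could then be compared side by side; a pronoun-heavy, noun-light profile is often read as leaning toward the spoken end of the continuum, though that reading would need checking against the data.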

There are other important and interesting things to look at when using corpus methods. For example, you can look at pronouns and nouns to measure referring expressions. I think this can be quite interesting, especially considering that following discourse across different weblogs is not an easy task. This, of course, cannot be done purely from the corpus. You need to take into account whether the noun is new or given information. I think whether it is also a link will be significant as well.

Semantic patterns are also very interesting and will play an important role in determining the register of a group. While this can be done with keyword lists, I think a much better and more useful way *is* with tOKo. You not only get the unique patterns, but their social relations as well. This makes intuitive guesses much less about intuition and more about measurement.

Sociolinguistic:

    How do their positions relate to language maintenance and variation (is there a relationship between the fluidity of placement and variation)? What about other social variables? Does ‘real-world position’ (e.g. a professor rather than a grad student in an academic network) make a difference? Gender? Geography?

About using XML files: The XML files I have at the moment are already tagged for author and URL, which will make exploring social and linguistic relationships easier. I want to add tags which will allow me to explore on different levels; not least, grammatical and syntactic.
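As a sketch of what working with such files might look like, here is a small example using Python's standard `xml.etree.ElementTree`. The element names (`posts`, `post`, `author`, `url`, `body`) and the sample content are hypothetical, since the real file layout is not described here, and the `words` attribute is only a stand-in for the richer grammatical tags to be added later.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical layout; the real files' element names may differ.
SAMPLE = """<posts>
  <post><author>anna</author><url>http://example.org/1</url><body>first post</body></post>
  <post><author>ben</author><url>http://example.org/2</url><body>a reply</body></post>
  <post><author>anna</author><url>http://example.org/3</url><body>second post</body></post>
</posts>"""

def posts_by_author(root):
    """Group post URLs under the author who wrote them."""
    grouped = defaultdict(list)
    for post in root.iter("post"):
        grouped[post.findtext("author")].append(post.findtext("url"))
    return dict(grouped)

def add_word_counts(root):
    """Annotate each <post> with a 'words' attribute (whitespace token count)."""
    for post in root.iter("post"):
        body = post.findtext("body", default="")
        post.set("words", str(len(body.split())))

root = ET.fromstring(SAMPLE)
urls = posts_by_author(root)
add_word_counts(root)
```

Keeping the new annotations as attributes on the existing elements leaves the author and URL tags untouched, so the social and linguistic layers can be queried from the same file.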

Tagging is not trivial!

Well, duh, you say.

Tagging is not only not trivial, but the most important part of the text. The way that you tag it greatly determines what you get out of your corpus. I put away the corpus work for a bit while trying to get the data to visualize, but now need to get things into the indexer. The first step is to tag for parts of speech. I would like to use the original XML files because they are already tagged for some of the metadata I need (author, comments, etc.). Getting parts of speech, however, is more complicated. I have looked at lots of different taggers (Stanford POS tagger, CLAWS, XGTagger, GATE, etc.) and they all have their pros and cons. The ones that are not so complicated are too difficult to manipulate. The ones that I can really manipulate take a lot of pre-tagging to tag. What I am going to work on today are configuration files for the XGTagger. This one seems optimal for my needs, but it will take a bit of work to get it going. Considering my deadline, I better get working!! More later 🙂

speaking of tags 😛

by george, i think she's got it!

i have been floundering for a while now, trying to fit what i am seeing in weblog communities into the linguistic model i had previously chosen. i tried, and i read, and i was just *not* making the connections! when i started working with lilia and anjo, i began to dabble in sociolinguistic theories, but never really let my feet get too wet because it was not the theory i was supposed to be using. well, i finally gave up! i have stopped trying to force my data into a prescribed model of linguistics when there is another that actually fits! it is the Cinderella slipper of my data and i am going to embrace it! ok, all exuberance aside, i am so excited about modifying my thesis to use sociolinguistic models rather than cognitive blending! i checked out every book the library has and have read most by now. i am drinking up the literature like my morning coffee and everything just makes sense! in a little under two weeks, i get to present all this new stuff to the linguists in my department as well as to a few over skype. (an aside: i like this experimental way of conducting a seminar, skyping in potential opponents.) so, on that note, i will leave you with the best sentence i have read so far this morning:

“(Linguistic) Changes do not simply spread through the population person by person, but get taken up and manipulated by communities of practice in the construction of social meaning.” –P. Eckert