Hatrack River Writers Workshop: What gender is your writing?

This is topic What gender is your writing? in forum Open Discussions About Writing at Hatrack River Writers Workshop.

To visit this topic, use this URL:
http://www.hatrack.com/ubb/writers/ultimatebb.php?ubb=get_topic;f=1;t=003517

Posted by apeiron (Member # 2565) on :

I saw this link on digg.com today and tried it out with my own writing. Strangely enough, my older stuff is more "feminine" and my newer stuff is more "masculine." I'm not sure what makes one gender use one type of phrasing more than another, but I'm assuming this is based on legitimate research. It made it into Nature anyway.

How about you? How does your writing compare with your gender?

http://www.bookblog.net/gender/genie.html

Posted by wetwilly (Member # 1818) on :

I am a male. Apparently my writing is about 50/50 male/female.

Wow, people research useless things.

Posted by xardoz (Member # 4528) on :

Not entirely useless, I'd argue. If you're a man trying to write in a female style or voice, or vice-versa, this could help.

Posted by Leigh (Member # 2901) on :

Naturally, I'm male.

I found it actually quite good to find out what gender my writing style is. I'll use it as a way to refine my writing now, well at least a little more.

Those of us who write as one gender could also use it to help get ourselves into the mind of the opposite gender.

Posted by Spaceman (Member # 9240) on :

I think the thing is completely meaningless.

Posted by ChrisOwens (Member # 1955) on :

I spot-checked it on 5 stories: two of my own, plus three random stories from male and female writers on critters. I took its advice and copied more than 500 words into the box. It correctly identified all 5.

Posted by Scribbler (Member # 3743) on :

I tested a couple of things I wrote. One was male and one was female, but each was within 50 points of being the other. What does this mean? Am I genderless? Or just boring?

I was going to add something thoughtful here about my philosophies on algorithms vs. human feedback, but really, my brain's kind of kaput for the evening, so we'll just end it here.

Posted by luapc (Member # 2878) on :

I agree with Spaceman. I think this thing is meaningless. The thing that makes me question it is the words they identify as male and female. "The" and "a", two of the most common words in the English Language, are male? Come on!

I tested it on parts of two stories. One was from a male POV and the other from a female POV. Each one identified the piece as being clearly written by an author of the same sex as the POV character. The author's sex has nothing to do with it. In that light, I guess it might be useful to some authors to see how well they have stayed with the POV character, but even that I doubt.

Posted by apeiron (Member # 2565) on :

quote:
I agree with Spaceman. I think this thing is meaningless. The thing that makes me question it is the words they identify as male and female. "The" and "a", two of the most common words in the English Language, are male? Come on!

When I saw that, I assumed the researchers were saying that men tend to write more about things, and women more about people, which aren't prefixed with articles. I'd call that a gross simplification of a very complicated and individualized art form, but eh. It's still fun.

quote:
I tested it on parts of two stories. One was from a male POV and the other from a female POV. Each one identified the piece as being clearly written by an author of the same sex as the POV character.

Weird, I had the exact opposite results. I had expected my writing to follow whatever the POV's gender--shouldn't it to some extent?--but found it was more a function of when I wrote the piece.

I'd say the consistency of what it says about my writing means that something real is being measured by the algorithm. Whether or not I'd call it "gender" I don't know...

Posted by trousercuit (Member # 3235) on :

If, for some crazy reason, you get a hankering for reading the research paper this is based on, you can find it here:

http://www.cs.toronto.edu/~gh/Courses/2528/Readings/Koppel-etal.pdf

Some interesting facts, distilled (and keep reading even if you don't like technical stuff, because this gets very interesting near the end):

- It's pretty standard machine learning / statistical NLP. That is, you present data (training instances) about a bunch of documents to a "learning algorithm," which spits out a hypothesis, which you can then use to classify documents the learner didn't see.

- They claim 80% male/female classification accuracy overall, which indicates that there are intrinsic differences between male and female writing.

- The algorithm trains on the British National Corpus (BNC), which contains tagged fiction and non-fiction in several genres and sub-genres.

- The list of texts and "lexical features" (words - heh) is here:

http://shekel.jct.ac.il/~argamon/gender-style/

- Each document was represented by 1081 numbers: the count of each feature in the document divided by the document length.

This last part means that the classifier they trained knows nothing more about a document than each feature's average number of occurrences per word. For example, let's say the features are "swish," "blick," and "horntail." A full training instance might look like this:

"Misty Riders of the Sunken Bog" MALE 2/2054 6/2054 0/2054

meaning that of the 2054 words in Misty Riders of the Sunken Bog, 2 were "swish," 6 were "blick," and 0 were "horntail."

So with just 566 of those, they trained a computer to classify documents distilled to the same kinds of numbers. (Except there were 1081 numbers per document.)
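To make that representation concrete, here is a minimal Python sketch of turning a document into per-word relative frequencies. It reuses the made-up feature words from the example above; the tokenizer and the name `feature_vector` are my own assumptions for illustration, not anything taken from the paper.

```python
import re

# Made-up feature words from the example above; the real list has 1081 entries.
FEATURES = ["swish", "blick", "horntail"]

def feature_vector(text, features=FEATURES):
    """Return each feature's count divided by the document length in words."""
    # Assumption: a crude lowercase tokenizer; the paper's tokenization may differ.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0] * len(features)
    return [words.count(f) / len(words) for f in features]

sample = "The swish of the blick met the blick and another swish"
print(feature_vector(sample))
```

The classifier never sees the text itself, only a list of numbers like this one.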

- They use some smart feature reduction to cut those 1081 features way down, and then they separate them with a linear classifier. This is, like, the weakest kind of classifier imaginable, but it does let you do some nice analysis.

- For fiction, they found that relatively high proportions of "a," "the," and "as" tend to indicate maleness, whereas "she," "for," "with," and "not" indicate femaleness.

- In non-fiction, "that" and "one" indicate maleness, and "for," "with," "not," "and," and "in" indicate femaleness.

- They've done similar analysis on parts of speech. Men tend to prefer noun specifiers (determiners, numbers, modifiers), and women tend to prefer negation, pronouns, and certain prepositions.
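A linear classifier of the kind described above can be sketched in a few lines of Python. This is only an illustration: the signs of the weights follow the fiction findings listed above, but the magnitudes, the zero threshold, and the name `classify` are invented here and are not the paper's trained values.

```python
# Hypothetical weights: positive leans male, negative leans female.
# Only the signs follow the fiction findings above; the magnitudes are invented.
WEIGHTS = {"a": 1.0, "the": 1.0, "as": 1.0,
           "she": -1.0, "for": -1.0, "with": -1.0, "not": -1.0}

def classify(text, weights=WEIGHTS, bias=0.0):
    """Weighted sum of relative word frequencies, thresholded at zero."""
    words = text.lower().split()
    if not words:
        return "undetermined", 0.0
    score = bias + sum(w * words.count(word) / len(words)
                       for word, w in weights.items())
    if score > 0:
        return "male", score
    if score < 0:
        return "female", score
    return "undetermined", score

print(classify("the cat sat on a mat"))
print(classify("she went with him for tea"))
```

Because each word is divided by the document length, it is the proportion of a word that moves the score, not whether the word appears at all.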

My thoughts:

- I don't know if the web page's version is trained on British English or American English. A nationality mismatch would seriously skew the results, especially given the kinds of words that were discovered to be the best indicators.

- The web page says it uses a simplified version of the original algorithm. I don't know what that means for classification accuracy, but I'd assume the web page won't get 80% like the original.

- Personally, I interpret their analysis as saying that men tend to write more concretely. This supports all of my preconceived notions and biases, which is nice for me.

- Given the above, you'd have to be a total idiot to take any test results as indicating your "true" gender or other such nonsense. This is classic machine learning: completely devoid of context (even between words!). It hasn't got a clue. It just knows if you avoid numbers and use "not" a bunch, you're probably a woman. If you happen to be a man who avoids numbers and uses "not" a bunch, I'd say you're an artist.

[This message has been edited by trousercuit (edited December 29, 2006).]

Posted by trousercuit (Member # 3235) on :

For the above post:

Female Score: 864
Male Score: 717

Of course, talking about the "lexical features" doesn't help with accuracy much...

Posted by trousercuit (Member # 3235) on :

quote:
I agree with Spaceman. I think this thing is meaningless. The thing that makes me question it is the words they identify as male and female. "The" and "a", two of the most common words in the English Language, are male? Come on!

I need to point out that the authors discovered from existing texts that high proportions of these words indicate a male author. The original researchers (the ones that achieved 80% classification accuracy) didn't pull it out of their bums.

I'm curious why these words contribute nothing to the "Female" score, though. It's certainly non-standard to do that when you construct a linear classifier. Either the web page is doing things wrong or the original research uses a technique I'm not familiar with. I only scanned it, after all.
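One possible reading, sketched in Python under assumptions of my own (the weights and both helper functions are invented for illustration, not taken from the page or the paper): keeping separate Male and Female tallies and comparing them gives the same decision as a single linear score in which the female-leaning words carry negative weights.

```python
# Invented weights for illustration only; not the page's actual numbers.
MALE_WEIGHTS = {"the": 2, "a": 1, "as": 3}
FEMALE_WEIGHTS = {"she": 2, "with": 4, "not": 1}

def two_scores(words):
    """Separate tallies: each word adds to one side only."""
    male = sum(w * words.count(t) for t, w in MALE_WEIGHTS.items())
    female = sum(w * words.count(t) for t, w in FEMALE_WEIGHTS.items())
    return male, female

def signed_score(words):
    """Single linear score: female-leaning words get negative weights."""
    signed = dict(MALE_WEIGHTS)
    signed.update({t: -w for t, w in FEMALE_WEIGHTS.items()})
    return sum(w * words.count(t) for t, w in signed.items())

words = "she did not walk with the dog as a rule".split()
male, female = two_scores(words)
print(male, female, signed_score(words))
```

Under this reading, comparing the two tallies is just checking the sign of their difference, so a word that adds nothing to the Female tally still pulls the overall decision away from female.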

Posted by Spaceman (Member # 9240) on :

I can only say that I think the algorithm is meaningless for fiction. I use the words that I need to use when I write a story. For a female POV character, I tend to use a lot of instances of the word 'she,' thus a skewing toward the female side.

I did not try the option of non-fiction or blog entry. The algorithm might be more successful there, but I'm not interested enough to try it.

Posted by Christine (Member # 1646) on :

I've done this before...I did it today with my WIP and it came out female...but not extremely so. I've had it guess wrong in the past.

I do think that there is a quality that makes writing seem more feminine or masculine but I think it is a subjective quality that a computer can't track. Why is "the" masculine? Why is "was" feminine?

I find it marginally useful to be able to write in a masculine style when I am writing from a man's POV in a story or novel. But I would rather have peer reviewers tell me if my character and voice are realistic rather than a computer style analyzer. It's fun but useless.

Posted by wbriggs (Member # 2267) on :

I thought maybe it would show what POV I was writing in, but it doesn't do that well, either.

I think this sort of thing is fascinating when it actually works.

Posted by trousercuit (Member # 3235) on :

Well, you would, Mister AI Researcher Person.

I do for similar reasons, of course...

quote:
Why is "the" masculine? Why is "was" feminine?

If you remove the negative connotations, that's the most interesting thing about their findings. Why indeed? Why would men usually use "the" more than women? Remember, they didn't pull it out of their bums, they found this from existing documents.

The only thing we can definitively say is that there's a statistical correlation. And you're missing a subtle point that everyone else seems to be missing, too: it's the relatively high frequencies of these words that indicate gender. Notice the weighted sum it computes. (You may not be missing this point, but assuming you're not, if you're going to criticize something you ought to represent it honestly.)

Like I said, machine learning algorithms are pretty dumb, and seriously lack context.

However, think about how you determine the gender of a writer. There are certain trigger words, phrases, and ideas, general tone, subject, and whatnot. That's your conscious mind. Who's to say your unconscious one isn't doing something similar to the machine learning algorithm to bias or bootstrap your conscious processes?

Posted by trousercuit (Member # 3235) on :

quote:
I use the words that I need to use when I write a story.

Right. And next you'll claim that only very specific words are suitable to communicate your ideas, and that anyone trying to communicate these same ideas, male or female, would use exactly these same words to do it, and in the same proportions?

Preposterous, sir! Meet me at the flagpole with your best sword! I'll bring a revolver...

Posted by luapc (Member # 2878) on :

The only thing this does is try to pigeonhole authors by their style of writing, and associate it with gender, and I just don't think it works. Any good author can make convincing characters of either gender, and completely fool their readers. Throughout literary history, there have always been authors who have taken pen names of the opposite gender to get their work more accepted. If female authors really wrote feminine, and males masculine, there's no way any author could pull this off, but it's done all the time.

It's interesting, and very scientific, and I'm all for using statistics, but I just don't think it can be done with good fiction authors in this manner. There are just some human things that don't fit very well into statistics, and I think this is one of them.

Another thing to consider is that there are many ways to measure and use statistics. How often have you heard people argue that data was incorrectly analyzed from some report? There are always ways to twist data and make it come out the way you want.

Posted by trousercuit (Member # 3235) on :

By the way, I'd like to apologize in advance on behalf of the universe itself to everyone who gets rubbed the wrong way by research that indicates innate differences between male and female mental wiring.

Posted by trousercuit (Member # 3235) on :

I love how we all argue against 80% classification accuracy on unseen documents. Okay, we don't all do that. Some of us do. Or we forget.

"I don't think it works!" "I think it's meaningless!" The first criticism is demonstrably false, and the second lacks a good, solid definition of "meaningless."

At any rate, I'm done. This conversation is determined to stay at a very low "I have a gut feeling and I don't like statistics when they tell me something I don't agree with" level, which I find entirely unpalatable. Sure, statistics can be faked, but these people did not choose those words, they had an algorithm, which has no gender biases, choose them from 1081 candidates. Also, they get 80% accuracy on unseen documents, which isn't amazing, but it's much better than random guessing.

Also keep in mind that the web page "Gender Genie" may not accurately represent the original research.

Posted by kings_falcon (Member # 3261) on :

I think it is sort of fascinating.

When playing with smaller sections of text that were strictly from one character's POV, the "writer" tended to match the gender of the POV. Overall on large 1,000+ word sections, the "author" was pretty strongly female, which is good all things considered.

Face it, there are differences in the way men and women think. Why wouldn't it come out in our writing too?

Even my legal briefs turn up as a female writer's. Hmm, hold that thought. Okay, when I take documents the firm's senior partner (male) wrote, they generally come up as a "male" writer's work.

Posted by Survivor (Member # 213) on :

I think it's no more meaningless than a genie that asked for your weight, age, and benchpress and then said whether you were a man or woman. Unfortunately, it's also no less offensive to our modern sensibility.

I sometimes score as male, sometimes as female. So it's a little less accurate, for me personally, than a benchpress algorithm would be...but if I were a woman, then it'd probably be a lot more accurate than the benchpress algorithm.

Posted by Spaceman (Member # 9240) on :

quote:
Right. And next you'll claim that only very specific words are suitable to communicate your ideas, and that anyone trying to communicate these same ideas, male or female, would use exactly these same words to do it, and in the same proportions?

Absolutely not. I can come up with probably ten different ways to say everything. How I say it depends on the context and the mood of the story. If everyone would write it the same way, why should I bother to write at all?

Posted by Spaceman (Member # 9240) on :

I didn't realize I had to define words that were in the dictionary.

Webster's:

quote:
meaningless - having no meaning; without significance; senseless.

I'm using the word to mean without significance.

Posted by Christine (Member # 1646) on :

trousercuit, I think you're missing the point. Perhaps "meaningless" isn't the right word (and actually, that's not the one I used), but useless does ring true. What, exactly, does it matter that a computer can predict the gender of an author 80% of the time? Why is that important? Does it help us write?

I have no problems with studies that show innate differences between men and women. I believe that there are innate differences between men and women (at least, taken across whole populations...but then again, statistically speaking you will almost always find differences between two populations if you take a big enough sample).

I only question the usefulness of much of this research, most specifically and especially this one. Writing is so subjective anyway.

Posted by wbriggs (Member # 2267) on :

Christine, you'll be happy -- or not -- to know that your last post was written by a man!

This post was written by a woman. ?!?

spark.com has that gender test full of weird questions such as

So, does Canada suck or what?
a) Yes
b) Yeah

...and from it, apparently, it gets you right with reasonable accuracy.

This one doesn't, based on my trials and the trials of others here and in Hatrack Forums. Too bad.

[This message has been edited by wbriggs (edited December 29, 2006).]

Posted by Spaceman (Member # 9240) on :

To further elaborate on Christine's points... The original had an 80% success rate. With what? Were these random manuscripts written by professionals or rank amateurs? Who selected what manuscripts would go into the pile from which they were selected? How were the manuscripts randomized? What is the confidence interval for that 80% figure? What about the confidence about classification of the words? How do we know they classified them right?

Everything is quite vague. Lies, damn lies, and statistics. Moreover, the entire study serves no real purpose considering most of us here experienced a success rate of about 50%. That only proves a good writer can fool the algorithm.
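On the confidence-interval question, a rough back-of-the-envelope figure can be sketched from numbers quoted earlier in this thread. The Python below assumes the 80% was measured over roughly the 566 documents mentioned above and treats each document as an independent trial; both are assumptions made here for illustration, and the paper's actual evaluation procedure may differ. The function name `accuracy_interval` is likewise invented.

```python
import math

def accuracy_interval(accuracy, n, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 is roughly 95%)."""
    # Standard error of a proportion measured over n independent trials.
    se = math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - z * se, accuracy + z * se

# Assumed inputs: 80% accuracy over about 566 documents.
low, high = accuracy_interval(0.80, 566)
print(f"{low:.3f} to {high:.3f}")
```

Under those assumptions the interval comes out to roughly 77% to 83%, which stays well clear of the 50% you would expect from coin-flipping. A handful of trials per person, as in this thread, would give a far wider interval.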

Posted by discipuli (Member # 3395) on :

Mine's 49% female, 51% male. It was probably based on a much more complex analysis, but I don't think they can program a machine to analyse metaphor, alliteration, etc. without years of work. It just looks at the basics.

Posted by Robert Nowall (Member # 2764) on :

The last story I sent out was male...an essay I wrote for Kathleen was also male...but a nasty lurid little story I wrote a few years back was female. I'll have to sample my novel and paste a chunk of that in there. This is an interesting topic...