How about you? How does your writing compare with your gender?
http://www.bookblog.net/gender/genie.html
Wow, people research useless things.
I found it actually quite good to find out what gender my writing style is. I'll use it as a way to refine my writing now, well at least a little more.
Those of us who write as one gender could also use it to help get ourselves into the mind of the opposite gender.
I was going to add something thoughtful here about my philosophies on algorithms vs. human feedback, but really, my brain's kind of kaput for the evening, so we'll just end it here.
I tested it on parts of two stories. One was from a male POV and the other from a female POV. Each one identified the piece as being clearly written by an author of the same sex as the POV character. The author's sex has nothing to do with it. In that light, I guess it might be useful to some authors to see how well they have stayed with the POV character, but even that I doubt.
quote:
I agree with Spaceman. I think this thing is meaningless. The thing that makes me question it is the words they identify as male and female. "The" and "a", two of the most common words in the English Language, are male? Come on!
When I saw that, I assumed the researchers were saying that men tend to write more about things, and women more about people, which aren't prefixed with articles. I'd call that a gross simplification of a very complicated and individualized art form, but eh. It's still fun.
quote:
I tested it on parts of two stories. One was from a male POV and the other from a female POV. Each one identified the piece as being clearly written by an author of the same sex as the POV character.
Weird, I had the exact opposite results. I had expected my writing to follow the POV character's gender--shouldn't it, to some extent?--but found it was more a function of when I wrote the piece.
I'd say the consistency of what it says about my writing means that something real is being measured by the algorithm. Whether or not I'd call it "gender," I don't know...
http://www.cs.toronto.edu/~gh/Courses/2528/Readings/Koppel-etal.pdf
Some interesting facts, distilled (and keep reading even if you don't like technical stuff, because this gets very interesting near the end):
- It's pretty standard machine learning / statistical NLP. That is, you present data (training instances) about a bunch of documents to a "learning algorithm," which spits out a hypothesis, which you can then use to classify documents the learner didn't see.
- They claim 80% male/female classification accuracy overall, which suggests that there are real, measurable differences between male and female writing, at least in the documents they tested.
- The algorithm trains on the British National Corpus (BNC), which contains tagged fiction and non-fiction in several genres and sub-genres.
- The list of texts and "lexical features" (words - heh) is here:
http://shekel.jct.ac.il/~argamon/gender-style/
- Each document was represented by 1081 numbers: the count of each feature in the document divided by the document length.
This last part means that the classifier they trained knows nothing more about a document than each feature's average number of occurrences per word. For example, let's say the features are "swish," "blick," and "horntail." A full training instance might look like this:
"Misty Riders of the Sunken Bog" MALE 2/2054 6/2054 0/2054
meaning that of the 2054 words in Misty Riders of the Sunken Bog, 2 were "swish," 6 were "blick," and 0 were "horntail."
So with just 566 of those, they trained a computer to classify documents distilled to the same kinds of numbers. (Except there were 1081 numbers per document.)
- They use some smart feature reduction to cut those 1081 features way down, and then they separate them with a linear classifier. This is, like, the weakest kind of classifier imaginable, but it does let you do some nice analysis.
- For fiction, they found that relatively high proportions of "a," "the," and "as" tend to indicate maleness, whereas "she," "for," "with," and "not" indicate femaleness.
- In non-fiction, "that" and "one" indicate maleness, and "for," "with," "not," "and," and "in" indicate femaleness.
- They've done similar analysis on parts of speech. Men tend to prefer noun specifiers (determiners, numbers, modifiers), and women tend to prefer negation, pronouns, and certain prepositions.
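To make the document representation described above concrete, here's a minimal sketch of turning a text into per-feature relative frequencies. The three feature words are the made-up ones from my example, and the tokenization (lowercase, letters and apostrophes only) is my own assumption; I don't know how the original study split words.

```python
from collections import Counter
import re

# Hypothetical feature list taken from the made-up example in this post;
# the real study used 1081 features.
FEATURES = ["swish", "blick", "horntail"]


def relative_frequencies(text, features=FEATURES):
    """Return each feature's count divided by the document's word count."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0] * len(features)
    counts = Counter(words)
    return [counts[f] / len(words) for f in features]


sample = "Swish went the blick, and the blick went swish, blick, blick."
vector = relative_frequencies(sample)
print(vector)
```

The classifier would see only `vector` plus a MALE/FEMALE label for each training document. Word order and everything else about the text is thrown away.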
My thoughts:
- I don't know if the web page's version is trained on British English or American English. A nationality mismatch would seriously skew the results, especially given the kinds of words that were discovered to be the best indicators.
- The web page says it uses a simplified version of the original algorithm. I don't know what that means for classification accuracy, but I'd assume the web page won't get 80% like the original.
- Personally, I interpret their analysis as saying that men tend to write more concretely. This supports all of my preconceived notions and biases, which is nice for me.
- Given the above, you'd have to be a total idiot to take any test results as indicating your "true" gender or other such nonsense. This is classic machine learning: completely devoid of context (even between words!). It hasn't got a clue. It just knows if you avoid numbers and use "not" a bunch, you're probably a woman. If you happen to be a man who avoids numbers and uses "not" a bunch, I'd say you're an artist.
[This message has been edited by trousercuit (edited December 29, 2006).]
Female Score: 864
Male Score: 717
Of course, talking about the "lexical features" doesn't help with accuracy much...
quote:
I agree with Spaceman. I think this thing is meaningless. The thing that makes me question it is the words they identify as male and female. "The" and "a", two of the most common words in the English Language, are male? Come on!
I need to point out that the authors discovered from existing texts that high proportions of these words indicate a male author. The original researchers (the ones that achieved 80% classification accuracy) didn't pull it out of their bums.
I'm curious why these words contribute nothing to the "Female" score, though. It's certainly non-standard to do that when you construct a linear classifier. Either the web page is doing things wrong or the original research uses a technique I'm not familiar with. I only scanned it, after all.
I did not try the option of non-fiction or blog entry. The algorithm might be more successful there, but I'm not interested enough to try it.
I do think that there is a quality that makes writing seem more feminine or masculine but I think it is a subjective quality that a computer can't track. Why is "the" masculine? Why is "was" feminine?
I find it marginally useful to be able to write in a masculine style when I am writing from a man's POV in a story or novel. But I would rather have peer reviewers tell me if my character and voice are realistic than rely on a computer style analyzer. It's fun but useless.
I think this sort of thing is fascinating when it actually works.
quote:
Why is "the" masculine? Why is "was" feminine?
If you remove the negative connotations, that's the most interesting thing about their findings. Why indeed? Why would men usually use "the" more than women? Remember, they didn't pull it out of their bums; they found this from existing documents.
The only thing we can definitively say is that there's a statistical correlation. And you're missing a subtle point that everyone else seems to be missing, too: it's the relatively high frequencies of these words that indicate gender. Notice the weighted sum it computes. (You may not be missing this point, but assuming you're not, if you're going to criticize something you ought to represent it honestly.)
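For anyone who wants to see what a weighted sum like that looks like, here's a rough sketch. I'm assuming the web page adds up a per-word weight for every keyword occurrence, which is what its "Female Score" and "Male Score" output suggests. The words come from the fiction findings listed earlier in this thread, but the weights are numbers I made up, not the page's or the paper's.

```python
import re

# Made-up weights for illustration only. The words are from the fiction
# findings quoted in this thread; the numbers come from nowhere.
MALE_WEIGHTS = {"a": 6, "the": 7, "as": 5}
FEMALE_WEIGHTS = {"she": 6, "for": 4, "with": 5, "not": 4}


def weighted_score(words, weights):
    """Add up the weight of every word occurrence that has a weight."""
    return sum(weights.get(w, 0) for w in words)


def classify(text):
    """Return (female_score, male_score, label) for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    female = weighted_score(words, FEMALE_WEIGHTS)
    male = weighted_score(words, MALE_WEIGHTS)
    # Comparing the two sums is the same as checking the sign of one
    # signed weighted sum (female weights positive, male weights negative).
    margin = female - male
    label = "female" if margin > 0 else "male"  # ties arbitrarily go to "male"
    return female, male, label


female, male, label = classify("She did not go with the dog.")
print(f"Female Score: {female}")
print(f"Male Score: {male}")
print(f"Verdict: {label}")
```

Both raw scores grow with the length of the text, so a single occurrence of "the" doesn't make you male. What decides the label is which weighted total is bigger, which comes down to how often you use each group of words relative to the other.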
Like I said, machine learning algorithms are pretty dumb, and seriously lack context.
However, think about how you determine the gender of a writer. There are certain trigger words, phrases, and ideas, general tone, subject, and whatnot. That's your conscious mind. Who's to say your unconscious one isn't doing something similar to the machine learning algorithm to bias or bootstrap your conscious processes?
[This message has been edited by trousercuit (edited December 29, 2006).]
quote:
I use the words that I need to use when I write story.
Right. And next you'll claim that only very specific words are suitable to communicate your ideas, and that anyone trying to communicate these same ideas, male or female, would use exactly these same words to do it, and in the same proportions?
Preposterous, sir! Meet me at the flagpole with your best sword! I'll bring a revolver...
[This message has been edited by trousercuit (edited December 29, 2006).]
It's interesting, and very scientific, and I'm all for using statistics; I just don't think it can be done with good fiction authors in this manner. There are just some human things that don't fit very well into statistics, and I think this is one of them.
Another thing to consider is that there are many ways to measure and use statistics. How often have you heard people argue that data was incorrectly analyzed from some report? There are always ways to twist data and make it come out the way you want.
"I don't think it works!" "I think it's meaningless!" The first criticism is demonstrably false, and the second lacks a good, solid definition of "meaningless."
At any rate, I'm done. This conversation is determined to stay at a very low "I have a gut feeling and I don't like statistics when they tell me something I don't agree with" level, which I find entirely unpalatable. Sure, statistics can be faked, but these people did not choose those words; they had an algorithm, which has no gender biases, choose them from 1081 candidates. Also, they get 80% accuracy on unseen documents, which isn't amazing, but it's much better than random guessing.
Also keep in mind that the web page "Gender Genie" may not accurately represent the original research.
When playing with smaller sections of text that were strictly from one character's POV, the "writer" tended to match the gender of the POV. Overall on large 1,000+ word sections, the "author" was pretty strongly female, which is good all things considered.
Face it, there are differences in the way men and women think. Why wouldn't it come out in our writing too?
Even my legal briefs turn up as a female writer's. Hmm, hold that thought. Okay, when I take documents the firm's senior partner (male) wrote, it generally comes up as a "male" writer's work.
I sometimes score as male, sometimes as female. So it's a little less accurate, for me personally, than a benchpress algorithm would be...but if I were a woman, then it'd probably be a lot more accurate than the benchpress algorithm.
quote:
Right. And next you'll claim that only very specific words are suitable to communicate your ideas, and that anyone trying to communicate these same ideas, male or female, would use exactly these same words to do it, and in the same proportions?
Absolutely not. I can come up with probably ten different ways to say everything. How I say it depends on the context and the mood of the story. If everyone would write it the same way, why should I bother to write at all?
Webster's:
quote:
meaningless - having no meaning; without significance; senseless.
I'm using the word to mean without significance.
I have no problems with studies that show innate differences between men and women. I believe that there are innate differences between men and women (at least, taken across whole populations... but then again, statistically speaking you will almost always find differences between two populations if you take a big enough sample).
I only question the usefulness of much of this research, most specifically and especially this one. Writing is so subjective anyway.
This post was written by a woman. ?!?
--
spark.com has that gender test full of weird questions such as
So, does Canada suck or what?
a) Yes
b) Yeah
...and from it, apparently, it gets you right with reasonable accuracy.
This one doesn't, based on my trials and the trials of others here and in Hatrack Forums. Too bad.
[This message has been edited by wbriggs (edited December 29, 2006).]
Everything is quite vague. Lies, damn lies, and statistics. Moreover, the entire study serves no real purpose considering most of us here experienced a success rate of about 50%. That only proves a good writer can fool the algorithm.