Here’s my first session idea: I’m compiling a corpus of vernacular texts from both old and new media as part of a semantic/anthropological examination of American beliefs about health. (It’s called CADOH, the Corpus of American Discourses on Health.) I’ve been using the pilot stages of it to look at the distribution of terms such as fat, stress, cold, and oil. I’m envisioning its final form as a mix of vernacular discussions. While good corpora already exist for contemporary magazines, newspapers, and fiction (e.g. COCA), I’m aiming to capture more transient conversations about health, including blog posts and their comments, listservs, online forums and wikis, letters to the editor, and radio transcripts. To make it useful for others, I will need copyright approval for sharing the texts. So I’m proposing a helpathon in order to hear from others who have dealt with compiling current materials. What ways of requesting permission to use copyrighted material have been helpful? For those items not under a Creative Commons license, are the costs of re-using current material prohibitive? And, once the copyright issues are dealt with, what’s the best way to make the corpus accessible? Would this be a good Omeka-type project?
#1 by Richard Leslie on March 5, 2012 - 8:54 am
Really interested in how to deal with the copyright issues.
#2 by Ben Brumfield on March 5, 2012 - 4:59 pm
I’m moving to a project in which I’ll be dealing with UK copyright law, which is much more horrible when it comes to cultural materials than US law. I’ll need to track the provenance of each stage of manuscripts, from microform to scanning to transcription to correction. I’d be interested to hear what other people are doing when the data isn’t “free”.
#3 by rafia on March 10, 2012 - 12:42 pm
COCA seems to have based their corpus on fair use: “The following is an extended discussion of why we believe that our use of the texts in the Corpus of Contemporary American English (COCA) is within the bounds of US Fair Use Law. Similar arguments would be used for other corpora that we have created.” corpus.byu.edu/copyright.asp
#4 by Laurel Stvan on March 10, 2012 - 1:13 pm
COCA struck a compromise: they only get to use a small snippet of each text, and then the big publishers allowed them to use their materials. But it’s a “blessing and curse” scenario. I’d like to make the full texts of many articles available, as a more cross-disciplinary resource for people who want to look beyond a couple of sentences for context…
#5 by tabata on April 15, 2012 - 3:55 am
Do you have a Facebook fan page for your site?
#6 by rafia on April 18, 2012 - 6:17 pm
@tabata We do not currently have a Facebook site, but we may have one in the future…