Subject: Proposal for a morphological database of Classical Greek

I am writing to seek input on a proposal that we tentatively plan to submit to the NEH at the end of the summer. The idea is fairly simple: we want to use the morphological parser that we have been developing for the past eight years or so to generate morphological analyses of every unique string in the Thesaurus Linguae Graecae (TLG), the database of Greek texts available on CD-ROM from UC Irvine.

The TLG is large: 42 million words at present, and a new version with 57 million words is due out later this year. Greek is a highly inflected language: not as bad as Georgian and some others, but a single verb can, with prefixes, have millions of different forms. The TLG corpus extends over a thousand years and includes virtually all literary Greek, and thus would support diachronic as well as synchronic linguistic analysis.

Is there anything we could do that would make this work on Greek useful for the linguistics community in general? Classicists need this database, but it would be very exciting if it could stimulate additional work.

The working summary of the project follows. The proposal outline is fairly succinct (c. 7 pages), but it is full of Greek and does not lend itself readily to transliteration. If you would like to see a copy, please send me your US mail address and we will send one to you. Casual reactions to just this summary are, however, more than welcome.

Note: reactions need not be positive. If this does not seem a worthwhile thing to pursue, I would love to know why.

Thanks!

Gregory Crane
Department of Classics
Boylston 319
Harvard University
Cambridge MA 02138
crane@ikaros.harvard.edu

A Linguistic Database of Classical Greek

This project will extend an existing parser for Classical Greek, expanding its database of stems to cover the majority of words attested in the literary record, and will use this database to create a morphologically parsed database of the more than 1,000,000 unique strings available in the TLG. In the end, we will publish the database of analyzed strings, the databases of stems and endings that drive the parser, and the parser itself.

The resulting databases are an essential piece of scholarly infrastructure that will (1) revolutionize current searching techniques for the TLG and other Greek databases, (2) make it possible to apply more sophisticated retrieval and text analysis to Greek texts, and (3) provide a basic but crucial lookup tool that will aid non-specialists in other fields (e.g., philosophy, political science, religion) who seek to work directly with the Greek database.

Note: this document is a sketch for a possible proposal to be submitted to the NEH at the end of August 1992. It is, in effect, a proposal for a proposal and is thus open to revision on any and all points.
