Subject: phonemicity of writing

I would like to get some estimate of what percentage of the world's written languages are represented orthographically in a phonemic manner. More specifically, how many written languages are such that one can predict the phonological properties of a word (including stress, accent or tone) merely by consulting the string of symbols used to write that word, and without further information, such as the morphological structure of the word?

For a language whose writing system is largely phonemic, one could write down a set of rules for word pronunciation, and in the ideal case the number of rules would be within an order of magnitude of the number of graphemes. (A few lexical exceptions don't matter, as long as there aren't hundreds of them.) I am leaving the sense of 'phoneme' intentionally vague: normally a phonemic written representation implies that one can predict the surface phonemic representation from the written form of the word, but I would be perfectly happy to consider a system phonemic if it represented some more abstract level of phonological representation, from which the surface phonemic representation could be predicted by regular phonological rules or principles. (I should also note, to clarify the question further, that I am interested primarily in the correspondence between the written form and the spoken form for the standard variety or varieties of the language, which the written form presumably reflects to some degree: I am not interested, at the moment, in dialects of the language which deviate to varying degrees from the standard.)

So, under this definition, Spanish would presumably count as very phonemic, since one can nearly always predict the pronunciation of a word, including its stress, from the orthography.
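To make the Spanish case concrete, here is a minimal sketch in Python of how stress might be read off the spelling alone. It covers only stress placement, not the segmental pronunciation, and it rests on assumptions of mine: the input is a single NFC-normalized word, 'y' is treated as a consonant, and the diphthong test is a rough approximation. The function names are hypothetical and do not come from any library.

```python
import re

# Accented i/u behave like strong vowels: they form a hiatus, not a diphthong.
STRONG = set("aeoáéóíú")
WEAK = set("iuü")
ACCENTED = set("áéíóú")
VOWELS = STRONG | WEAK


def nuclei(word):
    """Return the syllable nuclei of a word as lists of vowel letters."""
    w = word.lower()
    # The u of 'qu', and of 'gu' before e/i, is silent, so drop it.
    w = re.sub(r"qu", "q", w)
    w = re.sub(r"gu(?=[eiéí])", "g", w)
    groups = []
    prev = None  # previous letter, if it was a vowel
    for ch in w:
        if ch not in VOWELS:
            prev = None
            continue
        # Adjacent vowels share a nucleus when at least one is weak.
        if prev is not None and (prev in WEAK or ch in WEAK):
            groups[-1].append(ch)
        else:
            groups.append([ch])
        prev = ch
    return groups


def stressed_syllable(word):
    """Return the stressed syllable counted from the end (1 = final)."""
    groups = nuclei(word)
    n = len(groups)
    if n == 0:
        raise ValueError("no vowel found in %r" % word)
    # Rule 1: a written accent marks the stressed syllable.
    for i, group in enumerate(groups):
        if any(ch in ACCENTED for ch in group):
            return n - i
    if n == 1:
        return 1
    # Rule 2: words ending in a vowel, n or s stress the penultimate.
    last = word.lower()[-1]
    if last in VOWELS or last in "ns":
        return 2
    # Rule 3: all other words stress the final syllable.
    return 1


for w in ["casa", "hablar", "canción", "teléfono", "guitarra"]:
    print(w, stressed_syllable(w))
```

Three stress rules plus a handful of spelling adjustments is well within an order of magnitude of the number of graphemes, which is the sense of 'phonemic' intended here. Lexical exceptions and loanwords are not handled.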
Romanian is less phonemic: while the actual set of phonemes in a word is mostly determinable from the set of graphemes used (with the representation of glides being a slight source of complication), the placement of stress requires some knowledge of the morphological class of the word (following work by Ioana Chitoran). English is presumably among the least phonemic, since the 'regular rules' of pronunciation are themselves quite complex, and there are many lexical exceptions. The particular classification of the writing system as logographic, moraic or segmental is unimportant: in principle Chinese writing could be classed as phonemic (albeit with a rather large set of graphemes), but for the fact that, especially among the more common characters, there are quite a few with pronunciation ambiguities which can only be resolved using lexical information.

I am familiar with several of the recent books on writing systems, but while these typically contain in-depth analyses of particular systems, as far as I can tell nobody has done a survey of this kind. (If, on the contrary, someone can point me to a survey that answers this question, I would be most grateful.)

So, I would be very interested in getting as much information related to this question, on as many languages, as people are sufficiently familiar with. I think I already know the answer to these questions for the more familiar Western European languages (including some less familiar ones like Irish and Welsh), as well as Romanian, Russian, Hebrew, Arabic, Chinese, Japanese and Malagasy. I would be particularly interested in knowing about languages for which writing systems have only recently been developed, or for which the spelling system has recently undergone a massive restructuring: conventional wisdom has it that in such cases the writing system should be very phonemic, but perhaps that is not always true.
Please send any replies to me, and if there are a sufficient number I will post the results of this survey to the list.

- Richard Sproat
Linguistics Research Department
AT&T Bell Laboratories           | tel (908) 582-5296
600 Mountain Avenue, Room 2D-451 | fax (908) 582-7308
Murray Hill, NJ 07974, USA       | rws@research.att.com
