Date: Tue, 29 Jun 1999 13:10:51 +0200
From: Vaclav Hanzl
To: j.wells@ucl.ac.uk, zdena.palkova@ff.cuni.cz, xbatusek@informatics.muni.cz
Cc: jh@ff.cuni.cz
Subject: Czech SAMPA - merged proposal

Dear colleagues,

Thank you for all your effort devoted to Czech SAMPA. To make our information exchange more transparent, I have created a private web location:

   http://noel.feld.cvut.cz/sampa/

It contains our relevant email exchanges, proposals, articles, etc. If anything is missing there, please let me know.

Here is my comparison of everything we have so far. I am using the preliminary SAMPA proposed in [hanzl1], which is close to the one proposed in [batusek1].

There seem to be no conflicting opinions regarding the vowels:

   a e i o u a: e: i: o: u:

(only [hanzl1] shows a very slight hesitation between e / E and u / U), and no conflicts regarding these consonants:

   b d f g h\ x j k l m n p r s S t v z Z t_s d_z t_S d_Z

So far so good. Now the hard part:

Palatals t' d' n'
---------------------

[palkova1] says: "They are palatals, not palatalised alveolars. ... Unfortunately, the IPA notation by separate letters [c], turned f, etc. is not possible in ASCII. ..." and she proposes t\ d\ n\.

There is a small mistake here: there IS a way to write [c], turned f, etc. in ASCII; see [wells0]. So we have three possibilities:

   [batusek1]    [palkova1]    [wells0]
   [hanzl1]
   t' d' n'      t\ d\ n\      c J\ J

t' d' n' ... very intuitive for Czech people (similar to orthography) but slightly phonetically incorrect
t\ d\ n\ ... also rather intuitive, but quite new in SAMPA (though in [wells0] these symbols are still free)
c J\ J   ... confusing for Czech people, but phonetically exact and already defined in SAMPA

(I personally do not insist on [hanzl1] here. SAMPA will confuse Czech people anyway, so I would be happy with the [wells0] solution.)
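The three candidate notations differ only in how the three palatal symbols are spelled, so switching from one to another should be a purely mechanical relabelling. The Python sketch below illustrates this. The notation labels, the NOTATIONS table and the function convert_palatals are hypothetical and introduced only for illustration. Only the palatals are touched, so the [hanzl1] symbol r' for 'r^' passes through unchanged.

```python
import re

# Hypothetical table: the palatal stops and nasal in each candidate
# notation, keyed by a plain label for the underlying sound.
NOTATIONS = {
    "hanzl1":   {"t": "t'",  "d": "d'",  "n": "n'"},   # same as [batusek1]
    "palkova1": {"t": "t\\", "d": "d\\", "n": "n\\"},
    "wells0":   {"t": "c",   "d": "J\\", "n": "J"},
}


def convert_palatals(text: str, src: str, dst: str) -> str:
    """Rewrite the palatal symbols in `text` from notation `src` to `dst`.

    All other symbols are copied unchanged.
    """
    src_map, dst_map = NOTATIONS[src], NOTATIONS[dst]
    # Reverse lookup: source symbol -> sound label.
    by_symbol = {sym: sound for sound, sym in src_map.items()}
    # Try longer symbols first, so that "J\" is matched before "J".
    alternatives = sorted(by_symbol, key=len, reverse=True)
    pattern = "|".join(re.escape(sym) for sym in alternatives)
    # The return value of a replacement function is inserted literally,
    # so the backslashes in the target symbols survive.
    return re.sub(pattern, lambda m: dst_map[by_symbol[m.group(0)]], text)


print(convert_palatals("d'et", "hanzl1", "wells0"))     # J\et
print(convert_palatals("Jit_s", "wells0", "palkova1"))  # n\it_s
```

Matching the longest symbol first matters only for the [wells0] direction, where J (palatal nasal) is a prefix of J\ (palatal plosive).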
Fricative trill /r'/ (orthography 'r^')
---------------------------------------

[palkova1]: "In contrast to sonorant [r], the famous Czech fricative trill r^ is an alveolar obstruent and is used in two complementary distributed variants, voiced and voiceless. It is very different from palatalised [r] as e.g. in Russian. Our suggestion for transcription: r_r (voiced) and r_r0 (voiceless), or r\ (voiced) and r\0 (voiceless), (How to indicate the voiced/voiceless distinction in SAMPA, by the way?)"

The last question is answered in [wells0]:

   voiceless    _0   (0 = the digit zero)
   voiced       _v

[wells0] also defines

   raised       _r

and

   IPA inv. r   r\   (sounds quite different from Czech r^)

so we are out of luck here: every symbol proposed in [palkova1] is already used for something else in [wells0].

In [batusek2], /r_0/ and /r_r/ are mentioned as possible transcriptions of the voiced and voiceless variants. Again, this does not match [wells0] at all.

[batusek1] proposed /r'/, which [palkova1] criticises as phonetically incorrect. [hanzl1] extended this (phonetically incorrect) notation with the voiced/voiceless symbols from [wells0]: r'_v and r'_0.

(Personally, I am at my wits' end here. We might allocate a new symbol, for example P\, and refine it to P\_v and P\_0 when needed. Alternatively, we might allocate two new symbols, say P\ and Q\, and use them for the voiced and voiceless variants. Anyway, if we Czechs are to propose any additions to [wells0], there is nothing more appropriate than our famous fricative trills, right? Foreigners are often confused by the existence of TWO different phones for 'r^', while most Czechs do not notice there are two, so it might be good to use the two allophones in all transcriptions.
The two allophones are also needed in speech recognition systems. Without them it would be hard to cope with some gradual backward assimilations in connected speech, for example:

   vepr^ byl
   v e b P\ b i l
   a pig was

Here the voiced /b/ forces the voiced variant of 'r^' (transcribed here as /P\/), and this voiced /P\/ in turn forces the voiceless /p/ to change to voiced /b/. (Most Czechs would be very surprised to hear that they pronounce this /b/, but they really do.) This is the normal pattern of Czech assimilation, and omitting the two allophones (working with just one phoneme) would make it very hard to describe. So my opinion is: let's allocate P\ for voiced 'r^' and Q\ for voiceless 'r^'.)

Syllabic consonants
-------------------

[batusek1] proposes obligatory marking of syllabic consonants: r= l= m=

[hanzl1] proposes to make this marking optional only.

[batusek2] still insists on /r=/ /l=/ /m=/: "it might be hard for a foreigner to recognize ... two syllables in words like hlta/ or krku"

(Well, there are many hard things in our language. A foreigner might just as well think that there is a big difference between the pronunciation of /r/ and /r=/, while there is nearly none. Speech recognition systems do better (I guess) when they use a common model for /r/ and /r=/. So I might still insist on an optional /=/. Fortunately, there are simple, never-failing (I hope) rules for converting a transcription without /=/ into one with /=/ (and vice versa, of course), so I do not insist on [hanzl1]. I am much more pressed by time than by this subtle difference, so I will accept either of the two solutions.)

Sequences of two consonants similar to affricates
-------------------------------------------------

[hanzl1] proposes a rather nonstandard solution that also involves indeterminacy (possible assimilation of a sequence to an affricate), while [batusek1] ignores the sequence/affricate difference altogether.
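This email does not spell out the rules for adding /=/, so the Python sketch below assumes one simple version of them as a hypothetical illustration, not the agreed rule. The assumption is that /r/ or /l/ is syllabic when it follows a consonant and is followed by a consonant or by the end of the word, and that /m/ is syllabic only word-finally after a consonant (as in osm). Each word is given as a list of level-1 symbols. Word-initial clusters are not handled. The names VOWELS, add_syllabic_marks and remove_syllabic_marks are hypothetical.

```python
# Level-1 vowels and diphthongs; in this sketch every other symbol
# counts as a consonant.
VOWELS = {"a", "e", "i", "o", "u", "a:", "e:", "i:", "o:", "u:", "o_u", "a_u"}


def is_consonant(sym: str) -> bool:
    return sym not in VOWELS


def add_syllabic_marks(word: list[str]) -> list[str]:
    """Append '=' to r, l or m according to the assumed rule (one word)."""
    out = list(word)
    last = len(word) - 1
    for i, sym in enumerate(word):
        after_consonant = i > 0 and is_consonant(word[i - 1])
        before_consonant_or_end = i == last or is_consonant(word[i + 1])
        if sym in ("r", "l") and after_consonant and before_consonant_or_end:
            out[i] = sym + "="
        elif sym == "m" and after_consonant and i == last:
            out[i] = "m="
    return out


def remove_syllabic_marks(word: list[str]) -> list[str]:
    """The opposite direction: drop every '=' mark."""
    return [sym.rstrip("=") for sym in word]


print(add_syllabic_marks(["k", "r", "k"]))  # ['k', 'r=', 'k']
print(add_syllabic_marks(["o", "s", "m"]))  # ['o', 's', 'm=']
```

Because the mark is computed only from the neighbouring symbols, any word that obeys the assumed rule survives a round trip through remove_syllabic_marks and add_syllabic_marks unchanged. That property is what would allow /=/ to remain optional.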
[batusek2] accepts using an underscore to mark affricates but questions the indeterminacy tricks of [hanzl1]. [palkova1] also proposes the underscore to mark affricates.

(My opinion: to converge quickly on a solution, I propose to leave indeterminacy out of the basic 'level 1' Czech SAMPA. The disjunctor /-/ is then never used, /ts/ means a sequence of /t/ and /s/, and we do not care about possible assimilation (coalescence into an affricate). Those worried about indeterminacy may use 'level 2' Czech SAMPA, with an indeterminacy solution based on [hanzl1].)

Diphthongs and sequences of two vowels
--------------------------------------

The situation is similar to that of affricates. [batusek1] ignores the difference, [hanzl1] proposes a rather complex solution involving all three cases (diphthong, sequence, indeterminacy), and [palkova1] touches on the problem but does not propose a solution.

[batusek1] proposes to also consider the diphthongs /aj/, /ej/, etc. [hanzl1] proposes to (nearly) ignore them.

(A very rough and simplistic solution would be to nearly ignore the problem: use no conjunctors (/_/), no disjunctors (/-/), and make no distinction between diphthongs and vowel sequences. However, for /o_u/ and /a_u/ this would really be too simplistic, as demonstrated in [hanzl1]. So my modest proposal is an obligatory conjunctor in the diphthongs /o_u/ and /a_u/ in 'level 1' Czech SAMPA, and not to care about anything else.)

Definition of additional allophones
-----------------------------------

[hanzl1] proposes defining the additional allophones /F/, /N/ and /G/ (in their [wells0] meaning), though it proposes making their use optional only. [batusek1] does not include them, and [batusek2] proposes to leave out /G/ and to discuss /F/ and /N/ further. [palkova1] suggests using /N/ and does not mention the others.

(Proposal for a 'level 1' consensus: obligatory /N/; leave out /F/ and /G/. They should, however, still be defined in 'level 2', though not required.)
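The 'vepr^ byl' example from the section on the fricative trill can be reproduced with a small right-to-left pass over the phones. The Python sketch below is a simplified illustration under several assumptions, not a complete model of Czech voicing assimilation. It writes the not-yet-resolved 'r^' with the placeholder r' (borrowed from the [batusek1]/[hanzl1] notation). It uses the voiced/voiceless pairs among the consonants discussed in this email. It treats /v/ as undergoing assimilation without triggering it. It ignores pauses and final devoicing. The names PAIRS, DEVOICE, voicing and assimilate are hypothetical.

```python
# Voiceless -> voiced partners among the level-1 obstruents.
PAIRS = {
    "p": "b", "t": "d", "c": "J\\", "k": "g", "f": "v", "s": "z",
    "S": "Z", "x": "h\\", "t_s": "d_z", "t_S": "d_Z", "Q\\": "P\\",
}
DEVOICE = {voiced: voiceless for voiceless, voiced in PAIRS.items()}
RZ = "r'"  # hypothetical placeholder: 'r^' before its allophone is chosen


def voicing(sym):
    """True for a voiced obstruent, False for a voiceless one, else None."""
    if sym in PAIRS:
        return False
    if sym in DEVOICE:
        return True
    return None


def assimilate(phones: list[str]) -> list[str]:
    out = list(phones)
    # Right to left, so that voicing can spread backwards through a cluster.
    for i in range(len(out) - 1, -1, -1):
        nxt = out[i + 1] if i + 1 < len(out) else None
        # /v/ adapts to what follows it but does not change what precedes it.
        trigger = voicing(nxt) if nxt != "v" else None
        if out[i] == RZ:
            if trigger is not None:          # a following obstruent decides
                out[i] = "P\\" if trigger else "Q\\"
            elif i > 0 and voicing(phones[i - 1]) is False:
                out[i] = "Q\\"               # devoiced after a voiceless one
            else:
                out[i] = "P\\"
        elif trigger is True and out[i] in PAIRS:
            out[i] = PAIRS[out[i]]           # voiceless -> voiced
        elif trigger is False and out[i] in DEVOICE:
            out[i] = DEVOICE[out[i]]         # voiced -> voiceless
    return out


print(" ".join(assimilate(["v", "e", "p", "r'", "b", "i", "l"])))  # v e b P\ b i l
print(" ".join(assimilate(["t", "r'", "i"])))                        # t Q\ i
```

The right-to-left order is what lets the voicing of /b/ pass through /P\/ to /p/ in a single sweep. The same pass turns leckdy into l e d_z g d i and kde into g d e. For abych byl it gives h\ where the optional level-2 transcription would use G.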
====================================================================

To summarise, after merging all the opinions available so far, the basic Czech SAMPA might look like this:

   SYMBOL  ORTHOGRAPHY  TRANSCRIPTION  MEANING

Vowels:
-------
   i       mys^         miS            mouse
   e       les          les            forest
   a       pas          pas            passport
   o       rok          rok            year
   u       kus          kus            piece
   i:      pi/t         pi:t           to drink
   e:      le/k         le:k           drug
   a:      ra/d         ra:t           glad
   o:      mo/da        mo:da          fashion
   u:      pu^l         pu:l           half

 diphthongs:
   o_u     mouka        mo_uka         flour
   a_u     auto         a_uto          car

Consonants:
-----------
 plosives:
   p       pes          pes            dog
   b       bota         bota           shoe
   t       tam          tam            there
   d       du/m         du:m           house
   c       tito         cito           these
   J\      de^d         J\et           grandfather
   k       krk          kr=k           neck
   g       kde          gde            where

 affricates:
   t_s     ci/l         t_si:l         aim
   d_z     leckdy       led_zgdi       at times
   t_S     c^as         t_Sas          time
   d_Z     dz^ba/n      d_Zba:n        jug

 fricatives:
   f       forma        forma          form
   v       vak          vak            bag
   s       sen          sen            dream
   z       zub          zup            tooth
   P\      r^a/d        P\a:t          order
   Q\      tr^i         tQ\i           three
   S       s^aty        Sati           clothes
   Z       z^al         Zal            regret
   j       jas          jas            brightness
   x       chata        xata           cottage
   h\      had          h\at           snake

 liquids:
   r       ret          ret            lip
   l       led          let            ice

 nasals:
   m       ma/k         ma:k           poppy
   n       noc          not_s          night
   N       banka        baNka          bank
   J       nic          Jit_s          nothing

 syllabic versions of consonants:
   l=      vlk          vl=k           wolf
   m=      osm          osm=           eight
   r=      krk          kr=k           neck

--------------------------

This proposal requires the following addition to [wells0]:

 - allocation of P\ and Q\ for the Czech voiced and voiceless fricative alveolar obstruent trills

The extensions below would also require:

 - addition of the suffix /_j/ to the list of suffixes reserved for diphthongs and affricates (currently the list contains: _s _S _z _Z _p _b _i _u _y)

===========================================================

Those interested in a more detailed transcription might use the following optional improvements:

1) additional allophones:

   PHONE  ALLOPHONE OF  ORTHOGRAPHY  TRANSCRIPTION  MEANING
   F      m             tramvaj      traFvaj        tram
   G      x             abych byl    abiGbil        so that I would be

2) additional diphthongs:

   e_u    euforie    e_uforie    euphoria

   and maybe also diphthongs like /a_j/, /e_j/, etc.
3) better treatment of diphthongs and affricates (including indeterminacy), as proposed in [hanzl1]

============================================================

As always, I look forward to your comments. All of you have promised more detailed comments on this subject, so there may be other clashes of opinion not covered in today's comparison.

With best wishes (especially the wish to have Czech SAMPA Really Soon Now),

Yours sincerely,
Vaclav Hanzl