Subject: Sampa
From: Zdena Palková
To: "Vaclav Hanzl"
Date: Mon, 30 Oct 2000 15:31:13 -0000
X-Mailer: Microsoft Outlook Express 5.00.2314.1300

Dear Vaclav,

after a long while the SAMPA problem has reached me again. Colleague
Horak from URE, our graduates Hanika and Hesounova, and Mr. Batusek from
Brno have apparently produced yet another SAMPA for Czech. They sent it
to Prof. Wells. Do you know anything about it? Is any version in force?
Ing. Horak e-mailed me a table. I will try to pass it on to you. I hope
I will manage. For now, warm regards.

Zdena Palkova

---------------------------------------------------------------------
Subject: Czech SAMPA revision
From: Robert Batusek
To: "prof. John Wells"
cc: Sarka Simackova, Pavel Nygryn, Mikulas Pinos,
    Phoneticians from Prague -- Jiri Hanika, Betty Hesounova, Petr Horak
Date: Tue, 19 Sep 2000 15:09:05 +0200

Dear Sir,

I am sending you a revised version of SAMPA for Czech. I would like to
ask you to review it as well. If you have no serious remarks, it would
be nice if you would publish the document on the SAMPA web (including a
link from the main SAMPA page). In case of any problems with my
proposal, I am ready to collaborate with you.

Best Regards
Robert

**********************************************************************
Robert Batusek                                  Ph. D. student
xbatusek@fi.muni.cz                     Faculty of Informatics
http://www.fi.muni.cz/~xbatusek        Masaryk University Brno
**********************************************************************

---------------------------------------------------------------------
Subject: Czech SAMPA
From: hanzl
To: j.wells@ucl.ac.uk, zdena.palkova@ff.cuni.cz, horak@ure.cas.cz,
    xbatusek@informatics.muni.cz, simackov@ffnw.upol.cz,
    nygryn@fi.muni.cz, pinos@fi.muni.cz, geo@cuni.cz, betty@ure.cas.cz,
    horak@ure.cas.cz, cernocky@urel.fee.vutbr.cz, pollak@fel.cvut.cz
Date: Tue, 31 Oct 2000 12:30:37 +0100
X-Mailer: Mew version 1.94b25 on Emacs 20.4 / Mule 4.0 (HANANOEN)

Dear professor Wells, dear colleagues,

Let's try to agree on a universally acceptable solution for Czech SAMPA.

After the discussion in summer 1999, which seemed to resolve most
problems (I think), I have just received a copy of Mr. Batusek's current
proposal, which again contains many problems already criticised and
resolved in 1999.

The 1999 agreement (as I believed) was already used in the Czech
SpeechDat database (CDs in print, to be distributed soon through ELRA,
accepted by the SpeechDat consortium and SPEX) and is therefore going
to be widely used. I sent the proposal to prof. Wells, but he was just
leaving for Japan and had no time to review it.

It would be very unfortunate to end up with two or more Czech SAMPA
variants. Therefore I would like to ask Mr. Batusek to reconsider his
proposal (and recall our discussion) and, if possible, help us to
universally accept a Czech SAMPA compatible with the SpeechDat Czech
SAMPA.

The 1999 discussion can be found at http://noel.feld.cvut.cz/sampa

Here I repeat the most serious problems I see in Mr. Batusek's proposal.

In general, Mr. Batusek's proposal is designed to make it easy to use
for Czech amateurs. This is not the aim of SAMPA; it should be good for
international professionals. It should adhere to the already existing
design of prof.
Wells and not violate it in the name of questionable ease of use (based
on resemblance to Czech orthography).

Some details:

ts, dz, tS, dZ ... SAMPA using these would be ambiguous without spaces
t', d', n'     ... phonetically wrong; we should use the standard
                   solution from the core design of prof. Wells
r', R'         ... phonetic nonsense
aU, eU, oU     ... contradicts the definition of U in core SAMPA

The whole of Mr. Batusek's proposal can be seen at:
http://noel.feld.cvut.cz/sampa/batusek4-english.html

In general, I thank Mr. Batusek for his initiative and the HTML
formatting, but regarding the details of the SAMPA definition I would
urge him to read carefully the paper of prof. Wells:
http://noel.feld.cvut.cz/sampa/wells0-english.ps
and consider the SAMPA goals and their design implications.

The SpeechDat Czech SAMPA solution can be found at
http://noel.feld.cvut.cz/sampa/speechdat-sampa.ps

A copy of my email to prof. Wells including this definition:
http://noel.feld.cvut.cz/sampa/hanzl5-english.txt
I include it in this message as well. I agree that for Czech people it
is not as nice (at first thought) as Mr. Batusek's proposal, but I
repeat that similarity to national orthography is not the aim of SAMPA.
The aim of SAMPA (as I understand it) is to make international
collaboration easier. Please help us go this way.

Here I repeat the core of my email containing the SpeechDat Czech SAMPA
design, sent to prof. Wells this summer:

-------------------------------------
From: hanzl@noel.feld.cvut.cz
To: John Wells
Date: Mon, 12 Jun 2000 19:39:41 +0200

...

The final proposal was designed with great care for the overall
all-language SAMPA design, so its inclusion should be smooth and free
of conflicts.
Main differences from the old proposal (and probably the most
interesting points for you) are as follows:

- the final proposal prefers international symbols where available
  (and gives up questionable similarity with Czech orthography), e.g.:

      c  J\  J     instead of old     t'  d'  n'

- it makes Czech SAMPA uniquely parsable even without spaces:

      t_s  d_z  t_S  d_Z     instead of old     ts  dz  tS  dZ

- it proposes new symbols for the voiced fricative trill (r with
  caron), which is unique to Czech: instead of the many variants
  proposed, use

      P\

  and possibly, for the voiceless allophone,

      Q\

This is the point which I think really needs your approval. However,
the symbols "P\" and "Q\" are free now, so there is no clash. On the
other hand, every other solution proposed for SAMPA coding of the Czech
fricative trill was blatantly phonetically misleading and/or in
conflict with global symbol meanings. Allocation of new symbols really
seems to be the only solution.

(The other global change would be the addition of the suffix /_j/ to
the list of those reserved for diphthongs and affricates (currently the
list contains: _s _S _z _Z _p _b _i _u _y) - I think this is minor and
there is also no clash.)

Dear professor Wells, I hope you will find time to review the final
proposal and, if possible, make it the official Czech SAMPA definition.
(I can help with creating the replacement web pages if you like.)
With Best Regards

Vaclav Hanzl

-------------------------------------------------------------------
                  FINAL CZECH SAMPA PROPOSAL
-------------------------------------------------------------------

The Czech orthography (second column) is represented like this:

^ after u indicates ring accent (krouzek)
^ after other characters indicates wedge accent (caron, hacek)
/ after character indicates acute accent (carka)

Vowels:
-------

i     mys^      miS       mouse
e     les       les       forest
a     pas       pas       passport
o     rok       rok       year
u     kus       kus       piece

i:    pi/t      pi:t      to drink
e:    le/k      le:k      drug
a:    ra/d      ra:t      glad
o:    mo/da     mo:da     fashion
u:    pu^l      pu:l      half

diphthongs:

o_u   mouka     mo_uka    flour
a_u   auto      a_uto     car

Consonants:
-----------

plosives:

p     pes       pes       dog
b     bota      bota      shoe
t     tam       tam       there
d     du/m      du:m      house
c     tito      cito      these
J\    de^d      J\et      grandfather
k     krk       kr=k      neck
g     kde       gde       where

affricates:

t_s   ci/l      t_si:l    aim
d_z   leckdy    led_zgdi  at times
t_S   c^as      t_Sas     time
d_Z   dz^ba/n   d_Zba:n   jug

fricatives:

f     forma     forma     form
v     vak       vak       bag
s     sen       sen       dream
z     zub       zup       tooth
P\    r^a/d     P\a:t     order
S     s^aty     Sati      clothes
Z     z^al      Zal       regret
j     jas       jas       brightness
x     chata     xata      cottage
h\    had       h\at      snake

liquids:

r     ret       ret       lip
l     led       let       ice

nasals:

m     ma/k      ma:k      poppy
n     noc       not_s     night
N     banka     baNka     bank
J     nic       Jit_s     nothing

==============================================================

The symbols described below are not required in the Czech SAMPA
transcription. It is however likely that those interested in more
detailed transcription would define and use their own symbols for the
phenomena mentioned below, so we would rather define common symbols
here:

1) syllabic versions of consonants (though the sound is the same):

l=    vlk       vl=k      wolf
m=    osm       osm=      eight
r=    krk       kr=k      neck

2) additional allophones:

PHONE  ALLOPHONE OF  ORTHOGRAPHY  TRANSCRIPTION  MEANING
F      m             tramvaj      traFvaj        tram
G      x             abych byl    abiGbil        so that I would be
Q\     P\            tr^i         tQ\i           three

3) additional diphthongs:

e_u   euforie   e_uforie  euphoria

and maybe also diphthongs like /a_j/, /e_j/ etc.
==============================================================

Reasoning and explanation can be found in
http://noel.feld.cvut.cz/sampa/hanzl4-english.txt
and all the discussions are in
http://noel.feld.cvut.cz/sampa

...

-------- end of repeated email from this summer ---------

I am looking forward to your opinions.

Best Regards

Vaclav Hanzl

+-----------------------------------------------------------------------+
| Czech Technical University in Prague    fax: (+420 2) 243 10 784      |
| Faculty of Electrical Engineering, K331 or   (+420 2) 311 1786        |
| Technicka 2                                                           |
| 166 27 Prague 6, Czech Republic         email: hanzl@noel.feld.cvut.cz|
| http://amber.feld.cvut.cz/user/Hanzl                                  |
+-----------------------------------------------------------------------+

---------------------------------------------------------------------
Subject: Re: Sampa
From: hanzl
To: zdena.palkova@ff.cuni.cz
Date: Tue, 31 Oct 2000 12:51:18 +0100
X-Mailer: Mew version 1.94b25 on Emacs 20.4 / Mule 4.0 (HANANOEN)

Dear Zdena,

thank you for the heads-up. I do not know why Mr. Batusek keeps
insisting on some nonsensical proposals - I hope nobody has started
using it in his form (I will ask Mr. Horak about the state of things).

Warm regards

Vaclav Hanzl

> Subject: Sampa
> From: Zdena Palkova
> To: "Vaclav Hanzl"
> Date: Mon, 30 Oct 2000 15:31:13 -0000
>
> Dear Vaclav,
> after a long while the SAMPA problem has reached me again. Colleague
> Horak from URE, our graduates Hanika and Hesounova, and Mr. Batusek
> from Brno have apparently produced yet another SAMPA for Czech. They
> sent it to Prof. Wells. Do you know anything about it? Is any version
> in force? Ing. Horak e-mailed me a table. I will try to pass it on to
> you. I hope I will manage. For now, warm regards.
> Zdena Palkova
>

---------------------------------------------------------------------
Subject: Czech SAMPA - question
From: hanzl
To: horak@ure.cas.cz
Cc: zdena.palkova@ff.cuni.cz, geo@cuni.cz, betty@ure.cas.cz
Date: Tue, 31 Oct 2000 12:56:28 +0100
X-Mailer: Mew version 1.94b25 on Emacs 20.4 / Mule 4.0 (HANANOEN)

Dear colleagues,

could you please tell me whether anyone is using Mr. Batusek's SAMPA
proposal for anything?

I have very fundamental objections to it (see my long mail), so I hope
it can still be salvaged.

Regards

Vaclav Hanzl

+-----------------------------------------------------------------------+
| Czech Technical University in Prague    fax: (+420 2) 243 10 784      |
| Faculty of Electrical Engineering, K331 or   (+420 2) 311 1786        |
| Technicka 2                                                           |
| 166 27 Prague 6, Czech Republic         email: hanzl@noel.feld.cvut.cz|
| http://amber.feld.cvut.cz/user/Hanzl                                  |
+-----------------------------------------------------------------------+

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: Robert Batusek
To: hanzl@noel.feld.cvut.cz
Date: Thu, 2 Nov 2000 13:25:24 +0100

Dear Sir,

I am sorry if I am causing you trouble. As there is still no link from
the main SAMPA page to Czech SAMPA, and my proposal is still on the
SAMPA WWW pages, I supposed that prof. Wells did not agree with your
proposal. The group of people who were present in September at the TSD
conference thus created a new SAMPA proposal based on my original
paper.

Again, I want to emphasize that I accept _any_ solution. If you have
already used SAMPA in the SpeechDat project, I think that the best
solution now is to use your version.

Best Regards
Robert Batusek

P.S.: The only, though fundamental, problem that remains is how to get
prof. Wells to publish the SAMPA. It seems that even concentrated
pressure from several people does not help. If you can think of a way
to arrange it, feel free to apply it. I admit that I have run out of
ideas in this respect.
:-((( Another option is to create a Czech SAMPA page somewhere here at
home and let everyone interested know. This solution, however, has
obvious drawbacks. :-(

**********************************************************************
Robert Batusek                                  Ph. D. student
xbatusek@fi.muni.cz                     Faculty of Informatics
http://www.fi.muni.cz/~xbatusek        Masaryk University Brno
**********************************************************************

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: Jirka Hanika
To: hanzl@noel.feld.cvut.cz, j.wells@ucl.ac.uk,
    zdena.palkova@ff.cuni.cz, horak@ure.cas.cz,
    xbatusek@informatics.muni.cz, simackov@ffnw.upol.cz,
    nygryn@fi.muni.cz, pinos@fi.muni.cz, geo@cuni.cz, betty@ure.cas.cz,
    cernocky@urel.fee.vutbr.cz, pollak@fel.cvut.cz
Date: Thu, 2 Nov 2000 15:02:07 +0100
Reply-To: geo@cuni.cz
User-Agent: Mutt/1.2i

Dear colleagues,

> Here I repeat the most serious problems I see in Mr. Batusek's
> proposal.

Yes, let us discuss them. As far as I am concerned, I'm going to accept
any consensus. I am also no expert in SAMPA.

> In general, Mr. Batusek's proposal is designed to make it easy to use
> for Czech amateurs. This is not the aim of SAMPA; it should be good
> for international professionals.

I don't think SAMPA makes very good sense unless it is good for

1) international professionals who know the specific language
2) if possible, international amateurs who know the specific language

If you are interested in people who don't know the specific language
(here, Czech) but not in Czech amateurs, I'd personally recommend
Unicode-encoded IPA for applications instead.

> It should adhere to the already existing design of prof. Wells and
> not violate it in the name of questionable ease of use (based on
> resemblance to Czech orthography).

Agreed. But I don't think it does.

> Some details:
> ts, dz, tS, dZ ... SAMPA using these would be ambiguous without spaces

No, you have to use t-S etc.
for t + S like in Russian, if I understand the proposal correctly.
But suggest something else.

> t', d', n' ... phonetically wrong; we should use the standard
>                solution from the core design of prof. Wells

If SAMPA is completely unreadable ASCII-wise, what is its advantage
over IPA?

Czech "l" is very different from English "l" (pronounced further
forward) and you don't seem to object to using the same character.
But your point is valid, if applied consistently (to most Czech
consonants and vowels).

Do you really want SAMPA characters to have completely
language-independent pronunciation? Then you'll have
to revisit existing German and French SAMPA severely.

> r', R' ... phonetic nonsense

I agree here completely. But I don't think SAMPA gives extra choices.
What do you suggest?

> aU, eU, oU ... contradicts the definition of U in core SAMPA

What's the difference between Czech aUto and German aUto?

Jirka Hanika

---------------------------------------------------------------------
Subject: Re: Czech SAMPA - question
From: Jirka Hanika
To: hanzl@noel.feld.cvut.cz
Cc: zdena.palkova@ff.cuni.cz, horak@ure.cas.cz, betty@ure.cas.cz,
    geo@ff.cuni.cz
Date: Thu, 2 Nov 2000 15:09:44 +0100
Reply-To: geo@cuni.cz
User-Agent: Mutt/1.2i

> Dear colleagues,
>
> could you please tell me whether anyone is using Mr. Batusek's SAMPA
> proposal for anything?

No, but unless someone talks the MBROLA project out of using SAMPA, we
badly need SOME SAMPA for Czech to come into existence. A kind of SAMPA
is used by the Czech diphone inventories which were created in Brno
about a year ago and are available for download on the web of the
Universite de Mons. The second of them can even be listened to. It is
probably a somewhat older version of Batusek's. I know of no other
proposal.

> I have very fundamental objections to it (see my long mail), so I
> hope it can still be salvaged.

To me, on the contrary, it seems reasonable, but I am not claiming it
has to look exactly like this. If you have written up your own
proposal, I will be glad to look at it.
In my view SAMPA is a compromise "by design", just as any phonetic
transcription must be.

Take care,

Jirka Hanika

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: Robert Batusek
To: hanzl@noel.feld.cvut.cz
Date: Thu, 2 Nov 2000 17:29:43 +0100

On Thu, 2 Nov 2000 hanzl@noel.feld.cvut.cz wrote:

> Dear Mr. Batusek,
>
> I apologise for the rather sharp tone of my email; I somehow could
> not bear the thought of further confusion.

Never mind, I understand that it upset you, but we really only wanted
to push the SAMPA towards publication.

> It really is difficult with professor Wells; if we cannot move him to
> act even now (once we have, as I hope, reached agreement in the Czech
> Republic), we will probably have to create a page in the Czech
> Republic.

Well, that may not be such a problem. If everyone on this mailing list
accepts it and promotes it further, I think it will catch on. But let
us wait; perhaps it will still work out the 'official' way.

Regards
Robert Batusek

**********************************************************************
Robert Batusek                                  Ph. D. student
xbatusek@fi.muni.cz                     Faculty of Informatics
http://www.fi.muni.cz/~xbatusek        Masaryk University Brno
**********************************************************************

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: hanzl
To: geo@cuni.cz, geo@math.cas.cz, j.wells@ucl.ac.uk,
    zdena.palkova@ff.cuni.cz, horak@ure.cas.cz,
    xbatusek@informatics.muni.cz, simackov@ffnw.upol.cz,
    nygryn@fi.muni.cz, pinos@fi.muni.cz, betty@ure.cas.cz,
    cernocky@urel.fee.vutbr.cz, pollak@fel.cvut.cz
Date: Thu, 02 Nov 2000 18:07:36 +0100
X-Mailer: Mew version 1.94b25 on Emacs 20.4 / Mule 4.0 (HANANOEN)

Dear colleagues,

First of all, please accept my apologies for the rather aggressive tone
of my previous emails. I had no intention of attacking Mr. Batusek and
his co-workers from the TSD conference. I was too scared by the
possible chaos resulting from the divergence of our ideas.
Now I have cooled down. Please excuse me; let's work now.

It would be great to have Czech SAMPA linked from the main page of
professor Wells. Otherwise we might encounter even more different
versions of Czech SAMPA being developed by other groups. Let's try to
unify our design; then it will be easier for prof. Wells to fully
accept it.

I would also appreciate any feedback from prof. Wells himself - many
times I have proclaimed conformance of our design with the overall
SAMPA design; prof. Wells is the one to either support or reject my
claims of this type.

Now, however selfish I might sound, I urge you: Please, please, if by
any chance you still do not have large programs and dictionaries using
other versions of Czech SAMPA, please consider using Czech SpeechDat
SAMPA as the core of any design you use. We spent many days on the
design, trying to have good reasons for every subtle detail. We tried
to discuss it widely and we might have omitted some of you; sorry if we
did. But now the SpeechDat database is a matter of fact - please do
not diverge from it if you do not have to.

The problems in Czech SAMPA design are rooted in the rather poor match
between IPA capabilities and Czech needs. It is bound to be awkward at
first. But the more we converge with the rest of the world, the easier
it is for foreigners to use Czech SAMPA - and, once we have mastered a
Czech SAMPA designed this way, the easier it is for us to use foreign
SAMPAs.

A few answers:

-- Mr. Hanika wrote:

>> ts, dz, tS, dZ ... SAMPA using these would be ambiguous without spaces
>
>No, you have to use t-S etc. for t + S like in Russian, if I understand
>the proposal correctly. But suggest something else.

I suggested t_s, d_z etc. - i.e. use of conjunctor "_" rather than
disjunctor "-". It is longer, but you do not have to add a
context-dependent disjunctor whenever the two phones happen to follow
each other.
Every phone has a string corresponding to it, and words are formed by
just concatenating these strings; this important property makes machine
processing much easier, I believe.

> Do you really want SAMPA characters to have completely
> language-independent pronunciation? Then you'll have
> to revisit existing German and French SAMPA severely.

Of course not. But we can converge less or more. "n'" converges
less. "J" converges more.

>> r', R' ... phonetic nonsense
>
>I agree here completely. But I don't think SAMPA gives extra choices.
>What do you suggest?

For this case only I suggest allocation of new symbols "P\" and "Q\"
(as you might have seen later on in my previous email).

> If SAMPA is completely unreadable ASCII-wise, what is its advantage
> over IPA?

Standard ASCII encoding, I would say, but it is not unreadable either,
I believe. Here is a random excerpt from the SpeechDat pronunciation
lexicon; try it. (I removed accents in the first column to avoid any
email problems; the lexicon is in ISO8859-2. You will guess the hacky
and carky, I hope. The second column is frequency.
There are spaces between phones in the standard lexicon format, but
they are not necessary to parse it.):

dukazum        3      d u: k a z u: m
dukazy         5      d u: k a z i
dukla          2      d u k l a
dukladne       1      d u: k l a d n e:
dukladne       1      d u: k l a d J e
dukladnym      2      d u: k l a d n i: m
dulezite       22     d u: l e Z i t e:
dulezitejsi    3      d u: l e Z i c e j S i:
dulezitou      3      d u: l e Z i t o_u
dulezity       13     d u: l e Z i t i:
dulezitych     3      d u: l e Z i t i: x
dulezitym      2      d u: l e Z i t i: m
dulni          6      d u: l J i:
dum            12     d u: m
dunajska       1      d u n a j s k a:
dunivy         1      d u J i v i:
duraz          2      d u: r a s
durazne        1      d u: r a z J e
durdzinovy     1      d u r d_z i n o v i
dusledek       6      d u: s l e d e k
dusledky       10     d u: s l e t k i
dusledneji     1      d u: s l e d J e j i
dusseldorf     4      d i s l d o r f     d i: z l d o r f
dustojne       1      d u: s t o j J e
dustojnik      1      d u: s t o j J i: k
dustojniky     4      d u: s t o j J i: k i
dusan          10     d u S a n
dusanu         3      d u S a n u
duse           2      d u S e
dusek          10     d u S e k
dusi           2      d u S i
duskova        5      d u S k o v a:
dusoval        1      d u S o v a l
dutiny         5      d u c i n i
duverou        2      d u: v j e r o_u
duveru         1      d u: v j e r u
duvod          17     d u: v o t
duvodem        19     d u: v o d e m
duvodu         6      d u: v o d u
duvodu         17     d u: v o d u:
duvodum        1      d u: v o d u: m
duvody         4      d u: v o d i
dva            4778   d v a
dvaadvacateho  19     d v a a d v a t_s a: t e: h\ o
dvaadvacaty    17     d v a a d v a t_s a: t i:
dvacateho      383    d v a t_s a: t e: h\ o
dvacatej       1      d v a t_s a: t e j
dvacatou       2      d v a t_s a: t o_u

I am looking forward to your opinions.

Best Regards

Vaclav Hanzl

---------------------------------------------------------------------
Subject: Re: Czech SAMPA - question
From: hanzl
To: geo@cuni.cz, geo@math.cas.cz
Date: Thu, 02 Nov 2000 18:20:24 +0100
X-Mailer: Mew version 1.94b25 on Emacs 20.4 / Mule 4.0 (HANANOEN)

Dear Mr. Hanika,

Thank you for your reply. I am sorry that I did not know about the
proposal that arose at TSD (and that its authors apparently did not
know about ours).

> If you have written up your own proposal, I will be glad to look at
> it.

It was included in my email; please have a look at it and send me your
opinion.
It is also here:
http://noel.feld.cvut.cz/sampa/speechdat-sampa.ps
http://noel.feld.cvut.cz/sampa/hanzl5-english.txt
and I am attaching it to this email as well.

Regards

Vaclav Hanzl

-------------------------------------------------------------------
                     CZECH SAMPA PROPOSAL
-------------------------------------------------------------------

The Czech orthography (second column) is represented like this:

^ after u indicates ring accent (krouzek)
^ after other characters indicates wedge accent (caron, hacek)
/ after character indicates acute accent (carka)

Vowels:
-------

i     mys^      miS       mouse
e     les       les       forest
a     pas       pas       passport
o     rok       rok       year
u     kus       kus       piece

i:    pi/t      pi:t      to drink
e:    le/k      le:k      drug
a:    ra/d      ra:t      glad
o:    mo/da     mo:da     fashion
u:    pu^l      pu:l      half

diphthongs:

o_u   mouka     mo_uka    flour
a_u   auto      a_uto     car

Consonants:
-----------

plosives:

p     pes       pes       dog
b     bota      bota      shoe
t     tam       tam       there
d     du/m      du:m      house
c     tito      cito      these
J\    de^d      J\et      grandfather
k     krk       kr=k      neck
g     kde       gde       where

affricates:

t_s   ci/l      t_si:l    aim
d_z   leckdy    led_zgdi  at times
t_S   c^as      t_Sas     time
d_Z   dz^ba/n   d_Zba:n   jug

fricatives:

f     forma     forma     form
v     vak       vak       bag
s     sen       sen       dream
z     zub       zup       tooth
P\    r^a/d     P\a:t     order
S     s^aty     Sati      clothes
Z     z^al      Zal       regret
j     jas       jas       brightness
x     chata     xata      cottage
h\    had       h\at      snake

liquids:

r     ret       ret       lip
l     led       let       ice

nasals:

m     ma/k      ma:k      poppy
n     noc       not_s     night
N     banka     baNka     bank
J     nic       Jit_s     nothing

==============================================================

The symbols described below are not required in the Czech SAMPA
transcription.
It is however likely that those interested in more detailed
transcription would define and use their own symbols for the phenomena
mentioned below, so we would rather define common symbols here:

1) syllabic versions of consonants (though the sound is the same):

l=    vlk       vl=k      wolf
m=    osm       osm=      eight
r=    krk       kr=k      neck

2) additional allophones:

PHONE  ALLOPHONE OF  ORTHOGRAPHY  TRANSCRIPTION  MEANING
F      m             tramvaj      traFvaj        tram
G      x             abych byl    abiGbil        so that I would be
Q\     P\            tr^i         tQ\i           three

3) additional diphthongs:

e_u   euforie   e_uforie  euphoria

and maybe also diphthongs like /a_j/, /e_j/ etc.

==============================================================

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: Jirka Hanika
To: hanzl@noel.feld.cvut.cz
Cc: geo@cuni.cz, j.wells@ucl.ac.uk, zdena.palkova@ff.cuni.cz,
    horak@ure.cas.cz, xbatusek@informatics.muni.cz,
    simackov@ffnw.upol.cz, nygryn@fi.muni.cz, pinos@fi.muni.cz,
    betty@ure.cas.cz, cernocky@urel.fee.vutbr.cz, pollak@fel.cvut.cz
Date: Fri, 3 Nov 2000 00:04:48 +0100
Reply-To: geo@cuni.cz
User-Agent: Mutt/1.2i

Dear colleagues,

I'm very sorry that I somehow missed the complete proposal in the
previous mail by Vaclav Hanzl, through an inexplicable slip of the eye.
Otherwise I would have commented on it at that time and would have
understood his comments better in a few places.

> It would be great to have Czech SAMPA linked from the main page of
> professor Wells. Otherwise we might encounter even more different
> versions of Czech SAMPA being developed by other groups. Let's try to
> unify our design; then it will be easier for prof. Wells to fully
> accept it.

Yes, we all agree this is necessary.

> Now, however selfish I might sound, I urge you: Please, please, if by
> any chance you still do not have large programs and dictionaries using
> other versions of Czech SAMPA, please consider using Czech SpeechDat
> SAMPA as the core of any design you use.
> We spent many days on the design, trying to have good reasons for
> every subtle detail. We tried to discuss it widely and we might have
> omitted some of you; sorry if we did. But now the SpeechDat database
> is a matter of fact - please do not diverge from it if you do not
> have to.

So are the Brno TTS inventories (the only existing MBROLA voices for
Czech). Converting them to a different encoding may be even less
trivial than converting the huge SpeechDat database, as their
MBROLA-encoded versions may be somewhat out of the authors' reach.

But please let's be open to convergence, though it is non-trivial.

> I suggested t_s, d_z etc. - i.e. use of conjunctor "_" rather than
> disjunctor "-". It is longer, but you do not have to add a
> context-dependent disjunctor whenever the two phones happen to follow
> each other. Every phone has a string corresponding to it, and words
> are formed by just concatenating these strings; this important
> property makes machine processing much easier, I believe.

Yes, I agree. I prefer your solution (but Robert Batusek's one
is consistent, too).

> > Do you really want SAMPA characters to have completely
> > language-independent pronunciation? Then you'll have
> > to revisit existing German and French SAMPA severely.
>
> Of course not. But we can converge less or more. "n'" converges
> less. "J" converges more.

OK. I thought you meant some three-character ugliness. I agree J is
a closer match and potentially as readable as n'.

> >> r', R' ... phonetic nonsense
> >
> >I agree here completely. But I don't think SAMPA gives extra choices.
> >What do you suggest?
>
> For this case only I suggest allocation of new symbols "P\" and "Q\"
> (as you might have seen later on in my previous email).

I'm not sure I like new symbols, but it is a possible solution.
I'm aware IPA fails here, too.

> > If SAMPA is completely unreadable ASCII-wise, what is its advantage
> > over IPA?
Here I again thought you wanted to use three-character stuff
extensively. I don't have this problem with your suggestion (nor,
again, with the other one).

Maybe I prefer yours a little over the TSD one, even though I
participated in its finalization.

Jirka Hanika

---------------------------------------------------------------------
Subject: Re: Czech SAMPA - question
From: Jirka Hanika
To: hanzl@noel.feld.cvut.cz
Date: Fri, 3 Nov 2000 00:08:01 +0100
Reply-To: geo@cuni.cz
User-Agent: Mutt/1.2i

Hello,

I am very sorry, but I am just leaving the Czech Republic, so I will
look at your proposal in detail only after the weekend. For now just
this: following German, I would prefer aU to a_u. At first glance I did
not notice anything that struck me as wrong. And re-encoding a database
because of that is perhaps a bit too much work. But it will also depend
on the opinion of the Brno people.

Take care,

Jirka Hanika

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: hanzl@noel.feld.cvut.cz
To: geo@cuni.cz, geo@math.cas.cz
Cc: hanzl@noel.feld.cvut.cz, j.wells@ucl.ac.uk,
    zdena.palkova@ff.cuni.cz, horak@ure.cas.cz,
    xbatusek@informatics.muni.cz, simackov@ffnw.upol.cz,
    nygryn@fi.muni.cz, pinos@fi.muni.cz, betty@ure.cas.cz,
    cernocky@urel.fee.vutbr.cz, pollak@fel.cvut.cz
Date: Fri, 03 Nov 2000 18:43:55 +0100
X-Mailer: Mew version 1.94b25 on Emacs 20.4 / Mule 4.0 (HANANOEN)

Thanks to Mr. Hanika for his comments. I just have one explanation:

>> I suggested t_s, d_z etc. - i.e. use of conjunctor "_" rather than
>> disjunctor "-". It is longer, but you do not have to add a
>> context-dependent disjunctor whenever the two phones happen to follow
>> each other. Every phone has a string corresponding to it, and words
>> are formed by just concatenating these strings; this important
>> property makes machine processing much easier, I believe.
>
>Yes, I agree. I prefer your solution (but Robert Batusek's one
>is consistent, too).

I am afraid Robert Batusek's one is not consistent.
Different things end up with identical transcriptions unless you
separate phones by spaces (ra:tse=ra:tse, vjetsem=vjetsem etc.):

ORTHOGRAPHY   SAMPA            MEANING

ra/ce         r a: ts e        a breed
ra/d se       r a: t s e       he likes

klacku        k l a ts k u     a bludgeon (genitive, singular)
Kladsku       k l a t s k u    region Kladsko (locative case, singular)

ve^cem        v j e ts e m     a thing (dative, plural)
vjet sem      v j e t s e m    to drive here inside

r^ic^i/       r' i tS i:       he yells
r^its^i/      r' i t S i:      more sparse

poc^i/t       p o tS i: t      to begin
pods^i/t      p o t S i: t     to line (to sew)

This was the problem which triggered our effort to design it better in
1999. (See http://noel.feld.cvut.cz/sampa/hanzl1-english.txt for a
detailed description; our SAMPA proposal contained there is far from
the final one, but the reasoning holds.)

Regards

Vaclav Hanzl

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: Jirka Hanika
To: hanzl@noel.feld.cvut.cz
Date: Sat, 4 Nov 2000 23:58:59 +0100
Reply-To: geo@cuni.cz
User-Agent: Mutt/1.2i

> >> I suggested t_s, d_z etc. - i.e. use of conjunctor "_" rather than
> >> disjunctor "-". It is longer, but you do not have to add a
> >> context-dependent disjunctor whenever the two phones happen to
> >> follow each other. Every phone has a string corresponding to it,
> >> and words are formed by just concatenating these strings; this
> >> important property makes machine processing much easier, I believe.
> >
> >Yes, I agree. I prefer your solution (but Robert Batusek's one
> >is consistent, too).
>
> I am afraid Robert Batusek's one is not consistent. Different things
> end up with identical transcriptions unless you separate phones by
> spaces (ra:tse=ra:tse, vjetsem=vjetsem etc.):

I thought we understood each other, but unfortunately that was not
true. A mandatory disjunctor is of course just as unambiguous as a
conjunctor. All your examples end up with different transcriptions.

>
> ORTHOGRAPHY   SAMPA            MEANING
>
> ra/ce         r a: ts e        a breed
> ra/d se       r a: t s e       he likes

Without spaces: ra:tse, ra:t-se.

Now you can follow up with your (rational) reaction quoted at the
beginning of this mail, and we could repeat ourselves like this ad
libitum, so I will not bother the other addressees with this any more
and will wait for further developments.

Jirka Hanika

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: hanzl@noel.feld.cvut.cz
To: geo@cuni.cz, geo@math.cas.cz
Cc: hanzl@noel.feld.cvut.cz
Date: Sun, 05 Nov 2000 12:09:31 +0100
X-Mailer: Mew version 1.94b25 on Emacs 20.4 / Mule 4.0 (HANANOEN)

> I thought we understood each other, but unfortunately that was not
> true. A mandatory disjunctor is of course just as unambiguous as a
> conjunctor. All your examples end up with different transcriptions.

OK, but there is not even a mention of disjunctors in Mr. Batusek's
proposal.

> >> I suggested t_s, d_z etc. - i.e. use of conjunctor "_" rather than
> >> disjunctor "-". It is longer, but you do not have to add

Here I was quoting my thoughts on possible solutions: explicitly add
disjunctors or conjunctors to Mr. Batusek's proposal, so that things
are written the same way whenever possible, rather than everyone
starting to do it differently, and only at the moment they happen to
discover the problem.

Vaclav Hanzl

---------------------------------------------------------------------
Subject: Re: Czech SAMPA
From: Jirka Hanika
To: hanzl@noel.feld.cvut.cz
Date: Mon, 6 Nov 2000 20:48:20 +0100
Reply-To: geo@cuni.cz
User-Agent: Mutt/1.2i

> > I thought we understood each other, but unfortunately that was not
> > true. A mandatory disjunctor is of course just as unambiguous as a
> > conjunctor. All your examples end up with different transcriptions.
>
> OK, but there is not even a mention of disjunctors in Mr. Batusek's
> proposal.

That is true. I understood it as a consequence of the definition of the
disjunctor and the existence of ts (I was probably somewhat under the
influence of considerations of the kind found in Harris: Structural
Linguistics, chapter Junctures). At TSD, t_s vs. t-s was discussed, and
someone cited Russian there as a good analogy, so it suggested itself
to me a little too readily. If I had come to the proposal completely
from outside, I would certainly have made the mistake too ([pjetset]).

> > >> I suggested t_s, d_z etc. - i.e. use of conjunctor "_" rather than
> > >> disjunctor "-". It is longer, but you do not have to add
>
> Here I was quoting my thoughts on possible solutions: explicitly add
> disjunctors or conjunctors to Mr. Batusek's proposal, so that things
> are written the same way whenever possible, rather than everyone
> starting to do it differently, and only at the moment they happen to
> discover the problem.

I see. I thought you were responding to Mr. Batusek's proposal (as
understood in the way I described above), which is why I reacted so
uncomprehendingly.

Well, I expect Mr. Batusek's reaction to be that he agrees with
everything, since, as far as I can judge, he too cares about nothing
other than being able to get work done.

Jirka Hanika
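
---------------------------------------------------------------------

The parsability argument made in this thread (every phone has its own
string, and words are formed by plain concatenation) can be illustrated
with a short Python sketch. It is only an illustration under stated
assumptions: the list SPEECHDAT is typed in from the proposal table
quoted in the thread, the names SPEECHDAT and tokenize are hypothetical,
and greedy longest-match is just one simple way to split an unspaced
string. Nothing here is taken from the SpeechDat tools themselves.

```python
# Symbols typed in from the Czech SAMPA proposal quoted in the thread
# (core symbols plus the optional syllabic consonants, allophones, e_u).
# The list name is an assumption made for this illustration.
SPEECHDAT = [
    "i", "e", "a", "o", "u", "i:", "e:", "a:", "o:", "u:",
    "o_u", "a_u", "e_u",
    "p", "b", "t", "d", "c", "J\\", "k", "g",
    "t_s", "d_z", "t_S", "d_Z",
    "f", "v", "s", "z", "P\\", "Q\\", "S", "Z", "j", "x", "h\\",
    "r", "l", "m", "n", "N", "J",
    "l=", "m=", "r=", "F", "G",
]


def tokenize(text, inventory):
    """Split an unspaced transcription by greedy longest-match."""
    symbols = sorted(inventory, key=len, reverse=True)  # longest first
    phones, pos = [], 0
    while pos < len(text):
        for sym in symbols:
            if text.startswith(sym, pos):
                phones.append(sym)
                pos += len(sym)
                break
        else:
            # No symbol of the inventory starts at this position.
            raise ValueError(f"no symbol matches at {pos} in {text!r}")
    return phones


# The two words from the thread stay distinct with the conjunctor "_".
race = ["r", "a:", "t_s", "e"]        # ra/ce, "a breed"
rad_se = ["r", "a:", "t", "s", "e"]   # ra/d se, "he likes"
assert tokenize("".join(race), SPEECHDAT) == race
assert tokenize("".join(rad_se), SPEECHDAT) == rad_se

# With a plain "ts" affricate and no separator the two strings collide.
print("".join(["r", "a:", "ts", "e"]) == "".join(["r", "a:", "t", "s", "e"]))  # True
```

In this inventory the characters ":", "_", "\" and "=" only ever occur
as continuations of a symbol, never at its start, which is why the
longest match is the only possible reading. A mandatory disjunctor
(ra:t-se) would also keep the two words apart, as noted in the thread,
but it has to be inserted depending on context rather than being part
of the symbol itself.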