This slideshow was presented at the 7th Czech-German Speech Processing Workshop in Prague,
Czech Republic, September 8-10, 1997.
Supercomputing on PC Clusters in Continuous Speech Recognition
Vaclav Hanzl, FEE CTU Prague
This contribution is about:

Brief top hardware history:
- Hardware performance
- Money spent on it
- The revolution caused by the commodity market and the free Internet
- Supercomputing centers used to be the top technology
- But workstations had a bigger market and quickly developed to be better
- But PCs have an even bigger market and now they jeopardize workstations - of course,
no WS manufacturer is going to tell you...

Brief history of free software:
- For a long time it was just TeX and emacs - but it existed
- The X Window System was free as a company consensus
- The GNU free UNIX tools were finally completed by a UNIX kernel - LINUX
- The free Internet (ftp, www) let it spread all over
- Even software companies now produce lots of free software
- A whole operating system with many applications is readily available
to everybody, with program sources
Who Is Using Linux PC Clusters?
and what are they doing with them?
- Goddard Space Flight Center (Beowulf)
- Los Alamos National Laboratory (Loki)
- Drexel University
- CACR at Caltech (Hyglac and Naegling)
- High Energy Physics lab in Germany (Hermes)
- Laboratory, SCL (Alice)
- Sandia National Laboratory, Livermore, California (DAISy and Megalon)
- University - physics (Brahma)
- Clemson University (Grendel)
- of course we at FEE CTU Prague (Magi)
- and many others...
And they are also used quite well in:
- Physics (high energy simulations)
- Cosmology (N-body system simulations)
- Seismic Data Processing
- Photorealistic Rendering (raytracing)
- Computer Science (pure research interest)
- Continuous Speech Recognition Research (and some applications)
- Generally, any research environment where huge CPU power and
data storage matter more than standard support from SW & HW vendors
Hardware Resources Needed in Continuous Speech Recognition Research
What we need, in other words what we usually lack:
-> Disk Space for Corpora
Examples of LDC corpora sizes (1 CD ~ 600-650 MB):
- Wall Street Journal 0: 15 CD
- Wall Street Journal 1: 34 CD
- SWITCHBOARD-1: 23 CD
- Cambridge Read News: 6 CD
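Together that is (15 + 34 + 23 + 6) = 78 CDs, i.e. roughly 78 x 0.6-0.65 GB ~ 47-51 GB -
far beyond the disk capacity of a typical single workstation of the day.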
-> CPU Power for Long-Running Experiments
In a working CSR laboratory, we can usually find background processes running for
days and weeks...
- Hidden Markov Models training (HTK toolkit)
- Matlab process digging on some unknown task...
- Neural Net training - maybe using our own C program
Or, generally speaking:
- Iterative training, each step on many database items
- Parameter tuning - the same experiment with different constants
All this can be split into parallel processes at the level of shell programming!!!
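For example, one training iteration split over the nodes might look like this minimal
shell sketch (the node names, the pre-split file lists and the train_step script are
illustrative assumptions, not our actual setup):

  #!/bin/sh
  # One iteration: run the same training step on 8 parts of the
  # database in parallel, one part per node. Assumes the file list
  # was pre-split into list.node1 ... list.node8 and the data are
  # visible on every node via NFS.
  for NODE in node1 node2 node3 node4 node5 node6 node7 node8
  do
    rsh $NODE "cd /work/train && ./train_step list.$NODE" &
  done
  wait   # merge the partial results only after all nodes have finished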
Performance/Price - Workstations vs Linux PCs
- Very vague quantity - what is price and what is performance???
- All manufacturers claim they have the best performance/price
CPU Power - SPECint95 & SPECfp95 per US $10^6
[Bar chart from the original slide, comparing: HP VISUALIZE B160L (32MB, 2GB, 17", Sep 30 '96);
HP VISUALIZE B132L (32MB, 2GB, 17", Sep 30 '96) *; HP VISUALIZE J280 (64MB, 2GB, Sep 30 '96);
HP J210XC, 1 processor (Jan 20 '97); HP J210XC, 2 processors (Jan 20 '97);
Sun Ultra 1/140 (Jan 20 '97); SG O2 R5000 SC (64MB, 4GB, 20" color);
SG O2 R5000 SC (32MB, 1GB, 17") *; our Linux PC cluster (May '97)]
Additional RAM was 5.6 times cheaper for PCs than for HP (fall of '96)
Same-Price Workstation Compared:
How many times less does an HP J210XC (64MB RAM, 2GB HD) give you?
A Linux PC cluster offers at least three times more for the same money.
Software for Linux PC Cluster
Free Software (all with source code):
- Linux distribution - full UNIX for individual nodes (RedHat)
- system kernel, GNU UNIX utilities, emacs editor, ...
- C++ compiler, debugger, make, ...
- X-windows system, tcl/tk
- IP networking, NFS (Network FileSystem),
- RAID - harddisk mirroring, multiple disks FS
- PVM (Parallel Virtual Machine)
- MPI (Message Passing Interface)
- Beowulf project - common PID space, signals, ...
- Condor libraries - process migration and checkpointing
Our Own Software:
- HTK (Hidden Markov Model Toolkit)
- System scripts - setting node hostname and IP, creating map of distributed filesystem when harddisks are moved, etc.
- Job spooling system
- Scripts replacing several programs by cluster versions
- HTK HERest - split the training database among many HERest processes
- Matlab - reserve a free node for each process
- sort - split data, sort the parts, merge (see the sketch after this list)
- ps, kill, w, ...
- Special cluster management tools
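The cluster version of sort might be sketched roughly like this (paths are illustrative
and pick_free_node is a hypothetical helper, not a real tool of ours):

  #!/bin/sh
  # Cluster sort: split the data, sort the parts on separate nodes
  # in parallel, then merge the already-sorted parts.
  split -l 100000 bigfile /nfs/tmp/part.
  for PART in /nfs/tmp/part.*
  do
    NODE=`pick_free_node`    # hypothetical helper choosing an idle node
    rsh $NODE "sort $PART -o $PART.sorted" &
  done
  wait
  sort -m /nfs/tmp/part.*.sorted -o bigfile.sorted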
Hardware of Linux PC Cluster
Each of the 8 nodes consists of:
- Intel Pentium Pro 200 MHz CPU with 256 kB integrated L2 cache
- Intel VS440FX (Venus) motherboard with the 82440FX PCIset
- 64 MB RAM (two 32 MB EDO SIMMs, 60 ns, 72-pin, 32-bit; two more slots left empty)
- two 4.3 GB IDE disks (IBM DCAA-34330, alias Capricorn) in removable shelves
- 100 Mb/s Fast Ethernet adapter 3c905 (alias EtherLink XL 10/100 TX PCI, alias Boomerang)
- SVGA card (cheap PCI 1MB S3 Trio64V+)
- ATX case (4x Minitower Intel CC5 200W, 4x Bigtower 235W)
- two optocouplers we added to monitor the power LED and control the power button (through a male 15-pin Cannon connector)
and all the nodes share:
Beside this, there are many cables and cables and cables...
No floppy disk, no CD-ROM, no big monitor and no mouse - the only connections of the
system to the rest of the world are the power cord and a 100 Mb/s network cable.
Huge Disk Storage Solution
One fileserver with HW RAID and SCSI disks?
- network & memory bus bottleneck!
Cheap IDE disks spread all over the cluster!
- but how to connect them?
- Each node has several IDE disks connected via software RAID:
  - mirror - safe storage for program sources, documents etc.
  - data striping - big filesystem over multiple disks, quick reads
  - XOR parity - economical safety
- Filesystems on the nodes are connected via NFS and a symlink map (see the sketch below)
- Disks (in shelves) can be transparently moved to any node
- 8 parts = 8 size limits - disk usage needs balancing:
  - a balancing daemon, a subverted mkdir and other tricks, or just balancing by hand (splitting big corpora)
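A minimal sketch of the NFS + symlink mapping idea (node names, mount points and
corpus paths are illustrative only, not our actual layout):

  # every node exports its local disks; every node mounts the others:
  mount -t nfs node3:/disk0 /net/node3/disk0
  # one symlink tree then gives a single cluster-wide view of the data:
  ln -s /net/node3/disk0/corpora/wsj0 /data/corpora/wsj0
  # after a disk shelf moves to another node, only the map changes,
  # while the paths users see stay the same:
  ln -sf /net/node5/disk1/corpora/wsj0 /data/corpora/wsj0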
How Can Normal Users Use Linux Cluster?
- Use HTK, exploiting the processing power of all nodes
- Use MATLAB - each process on dedicated node
- Use parallel make, sort, ...
- Use a simple JOB SPOOLING SYSTEM to make things parallel (a minimal sketch follows this list):
job program param1
job program param2
job -after program program2
- Use the cluster as a normal network of UNIX machines (rsh), or as a NOW
- Link your own C/C++ programs with the CONDOR libraries - huge tasks
can be checkpointed & migrated to another node (old NN programs)
- Create standard parallel programs using PVM or MPI
- Use/Create RPC based programs (X-waves)
- Use/Create pipes/TCP based programs (Hypotheses Manager)
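The job command above could be implemented roughly like this trivial sketch (a real
spooler must also handle the -after dependencies, node selection and failures; the
spool path and file names are illustrative assumptions):

  #!/bin/sh
  # job - append one command line to the shared spool directory
  SPOOL=/nfs/spool
  echo "$@" > $SPOOL/job.$$

  # per-node daemon - grab and run queued jobs; mv acts as the lock:
  # if two idle nodes race for the same job, only one mv succeeds
  while true
  do
    JOB=`ls $SPOOL 2>/dev/null | head -1`
    if [ -n "$JOB" ] && mv $SPOOL/$JOB /tmp/run.$$ 2>/dev/null
    then
      sh /tmp/run.$$; rm -f /tmp/run.$$
    else
      sleep 10
    fi
  done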
What Can We Expect In The Future?
- More programs designed to use full power of clusters
- More free programs for speech processing (with source code)
- More commercial programs for LINUX (standard UNIX platform)
- Standard distribution of cluster operating system (free, GPL)
- Convergence of supercomputers (ASCI RED) and LINUX clusters
- LINUX clusters used in many more research areas
- MICROSOFT effort to provide similar possibilities with WinNT