Progress and Challenges in Automatic Speech Recognition

Main headings in this PowerPoint

● Progress and Challenges in Automatic Speech Recognition
● Overview
● Pattern Recognition
● Simplistic approach
● Why not?
● N-dimensional pattern space
● Signal processing
● Possible ASR approaches
● System Overview
● System limitations
● Speech production
● Free variation
● Cues from perception
● Speech Signal Analysis
● Short-time spectral analysis
● Pitch (F0) estimation
● Parameterization
● Cepstral processing
● Cepstral coefficients
● Feature Transforms
● Major issues in ASR
● Comparing ASR tasks
● Comparing ASR – 2
● Dynamic Time Warping
● Improving HMMs
● Neural Networks
● Long Short-Term Memory (LSTM) Memory Block Structure
● Back-propagation in Time
● Connectionist Temporal Classification
● Fast Graphics Processing Units (GPUs)
● Deep Neural Networks
● DNN training issues
● Recent new ideas in ASR

Language: English    Size: 2.42 MB
File type: PowerPoint slides    Number of slides: 56
Content level: unspecified    File extension: ppt
Subject group:    Extraction time: 2019/06/15 11:30:24

Key terms used in this content

speech, ASR, feature, spectral, energy, analysis, model, signal, space, frequency, measure, coefficient

Text content of this PPT slide

Progress and Challenges in Automatic Speech Recognition
Douglas O'Shaughnessy

Overview
Automatic speech recognition (ASR): a pattern recognition task. Review relevant aspects of human speech production and perception; acoustic-phonetic principles; digital analysis methods; parameterization and feature extraction; training and adaptation of models. Overview of ASR approaches. Practical techniques: hidden Markov models, deep neural networks. Acoustic and language models. Cognitive and statistical ASR.

Pattern Recognition
Need to map a data point in N-dimensional space to a label. Input: data samples in time. Output: text for a word or sentence. Assumption: signals for similar words cluster in the space. Problem: how to process the speech signal into suitable features.

Simplistic approach
Store all possible speech signals with their corresponding texts; then only a table look-up is needed. Will Moore's law solve the ASR problem? Storage doubling every year; computation power doubling every 1.5 years.

Why not?
Short utterance of 1 s and coding rate of 1 kbps (kilobit/second): 25 frames/s, 10 coefficients a frame, 4 bits/coefficient, giving 2^1000 signals. Suppose each person spoke 1 word every second for 1000 hours: about 10^17 short utterances. Well beyond near-term capability.

N-dimensional pattern space
ASR assigns a label (text) to each input utterance. Similar speech is assumed to cluster in the feature space. Often a simple distance is used to measure similarity between the input and the centroids of trained models. This assumes that perceptual and/or production similarity correlates with distance in the feature space, which is often not the case unless the features are well chosen.

Similarity measures
Representation of a frame of speech: an N-dimensional vector (a point in N-dimensional space), where N is the number of parameters or features. If the features are well chosen (similar values for different versions of the same phonetic segment, and distinct values for segments that differ phonetically), then separate regions can be readily established in the feature space for each segment.

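The idea of measuring a simple distance between an input frame and the centroids of trained models can be sketched as follows. This is a minimal illustration, not the method of any particular recognizer: the function names (`euclidean`, `centroid`, `train`, `classify`), the labels `"a"` and `"b"`, and the two-dimensional toy vectors are all made up for the example, and real systems use many more features per frame.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def train(labelled_frames):
    """Build one centroid per label from labelled training vectors."""
    return {label: centroid(vecs) for label, vecs in labelled_frames.items()}

def classify(models, frame):
    """Return the label whose centroid is closest to the input frame."""
    return min(models, key=lambda label: euclidean(models[label], frame))

# Hypothetical toy data: two classes that cluster in a 2-D feature space.
training = {
    "a": [[1.0, 1.0], [1.2, 0.8]],
    "b": [[4.0, 4.2], [3.8, 4.0]],
}
models = train(training)
print(classify(models, [1.0, 1.2]))
print(classify(models, [3.5, 3.9]))
```

The sketch only works as well as the clustering assumption holds: if the chosen features do not place similar sounds close together, the nearest centroid is not a meaningful label.
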
Distance measures
Must focus any comparison measure on relevant spectral aspects of the speech units. Euclidean distance measure: simple. Itakura-Saito measure (used with LPC). Maximizing probability in stochastic approaches.

Distance measures
With N dimensions and a weighting matrix W.

Signal processing
Not only to reduce costs; mostly to focus analysis on important aspects of the signal (thus raising accuracy). Use the same analysis to create the model and to test it. Not done in some recent end-to-end ASR; however, most ASR uses either MFCC or log spectral filter-bank energies. Otherwise, the feature space is far too complex.

Possible ASR approaches
Emulate how humans interpret speech. Treat simply as a pattern recognition problem. Exploit the power of computers. Expert-system methods. Stochastic methods.

System Overview
(courtesy of Bin Ma, 2015)

System limitations
Inadequate training data (in speaker-dependent systems: user fatigue). Memory limitations. Computation (searching among many possible texts). Inadequate models (poor assumptions made to reduce computation and memory, at the cost of reduced accuracy). Hard to train model parameters.

Speech production
Speech is not an arbitrary signal. Source of input to ASR: the human vocal tract. Data compression should take account of the human source. Precision of representation need not exceed the ability to control speech.

Free variation
Aspects of speech that speakers do not directly control are free variation. It can be treated as distortion (noise, other sounds, reverberation). It puts limits on the accuracy needed and creates mismatch between trained models and any new input. Intra-speaker: people never say the exact same utterance twice. Inter-speaker: everyone is different (size, gender, dialect, …). Environment: SNR, microphone placement, … Compare to vision PR: changes in lighting, shadows, obscuring objects, viewing angle, focus. Vocal-tract length normalization (VTLN); noise suppression.

Speaker controls: amplitude, pitch, formants, voicing, speaking rate. The mapping from word (and phoneme) concepts in the brain to the acoustic output is complex. Trying to decipher speech is more complex than identifying objects in a visual scene. Vision: edges, texture, coloring, orientation, motion. Speech: indirect; not observing the vocal tract.

Cues from perception
Auditory system sensitive to: dynamic positions of spectral peaks; durations (relative to speaking rate); fundamental frequency (F0) patterns. Important: where and when energy occurs. Less relevant: overall spectral slope, bandwidths, absence of energy. Formant-tracking algorithms err in transitions; not directly used in ASR for many years.

Speech Signal Analysis
Distribution of speech energy in frequency (spectral amplitude). Pitch period estimation. Sampling rate: typically 8000/sec for telephone speech; 10,000-16,000/sec otherwise. Usually 16 bits/sample; 8-bit mu-law log PCM in the telephone network.

Short-time spectral analysis
Feature determination (e.g., formant frequencies, F0) requires error-prone methods, so automatic methods (parameters) are preferred: FFT (fast Fourier transform); LPC (linear predictive coding); MFCC (mel-frequency cepstral coefficients); RASTA-PLP; log spectral (filter-bank) energies.

Pitch (F0) estimation
Often errors in weak speech and in transitions between voiced and unvoiced speech (e.g., doubling or halving F0). Peak-pick the time signal (look for the energy increase at each closure of the vocal cords). Usually first filter out energy above 1000 Hz (retain strong harmonics in the F1 region). Often use autocorrelation to eliminate phase effects. Often not done in ASR, due to the difficulty of exploiting F0 in its complex role of signaling different aspects of speech communication.

Parameterization
Objective: model the speech spectral envelope with few (8-16) coefficients.
1) Linear predictive coding (LPC) analysis: standard spectral method for low-rate speech coding.
2) Cepstral processing: common in ASR; can also exploit some auditory effects.
3) Vector quantization (VQ): reduces transmission rate, but also ASR accuracy.

Cepstral processing
Cepstrum: inverse FFT of the log-amplitude FFT of the speech. Small set of parameters (often 10-13), as with LPC, but allows warping of frequency to match hearing. Inverse DFT orthogonalizes. Gross spectral detail in low-order values, finer detail in higher coefficients. c0: total speech energy (often discarded).

Cepstral coefficients
c1: balance of energy (low vs. high frequency). c2, …, c13 encode increasingly fine details about the spectrum (e.g., resolution to 100 Hz). Mel-cepstral coefficients (MFCCs): model low frequencies linearly; above 1000 Hz, logarithmically.

Feature Transforms
Linear discriminant analysis (LDA): as in analysis of variance (ANOVA), regression analysis and principal component analysis (PCA), LDA finds a linear combination of features to separate pattern classes. Maximum-likelihood linear transforms. Speaker-adaptive transforms. Map sets of features (e.g., MFCC, spectral energies) to a smaller, more efficient set.

Major issues in ASR
Segmenting speech spoken without pauses (continuous speech) …
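
The definition of the cepstrum given above (inverse FFT of the log-amplitude FFT of the speech) can be sketched in a few lines. This is a plain real cepstrum only, under stated assumptions: it uses a naive O(N^2) DFT written out by hand instead of a fast FFT, it applies no mel frequency warping (so it is not an MFCC computation), and the names `dft` and `real_cepstrum`, the `floor` parameter, and the eight-sample toy frame are all hypothetical choices made for illustration.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        # The inverse transform carries the 1/n normalization.
        out.append(acc / n if inverse else acc)
    return out

def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude of the DFT of one frame."""
    spectrum = dft(frame)
    # Floor the magnitude so that log() never sees zero.
    log_magnitude = [math.log(max(abs(v), floor)) for v in spectrum]
    return [v.real for v in dft(log_magnitude, inverse=True)]

# Hypothetical toy frame (a short decaying pulse), not real speech.
frame = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
quiet = real_cepstrum(frame)
loud = real_cepstrum([2.0 * s for s in frame])

# Doubling the amplitude shifts only the zeroth coefficient (by log 2).
print(loud[0] - quiet[0])
print(max(abs(a - b) for a, b in zip(loud[1:], quiet[1:])))
```

Scaling the frame changes only the zeroth coefficient, which is consistent with the remark that c0 carries the overall energy and is often discarded, while the remaining coefficients describe the spectral shape.
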
