# Mixture Models and the EM Algorithm

### Main headings in this presentation

● CLUSTERING
● What is a Clustering?
● Clustering Algorithms
● Mixture Models and the EM Algorithm
● Model-based clustering
● Gaussian Distribution
● Gaussian Model
● Fitting the model
● Maximum Likelihood Estimation (MLE)
● MLE
● Mixture of Gaussians
● Mixture model
● Mixture Model
● Mixture Models
● EM (Expectation Maximization) Algorithm
● Relationship to K-means
● CLUSTERING Evaluation
● Clustering Evaluation
● Clusters found in Random Data
● Different Aspects of Cluster Validation
● Measures of Cluster Validity
● Measuring Cluster Validity Via Correlation
● Using Similarity Matrix for Cluster Validation
● Internal Measures: SSE
● Estimating the “right” number of clusters
● Internal Measures: SSE
● Internal Measures: Cohesion and Separation
● Internal measures – caveats
● Framework for Cluster Validity
● Statistical Framework for SSE
● Statistical Framework for Correlation
● Empirical p-value
● External Measures for Clustering Validity
● Confusion matrix
● Measures
● Another clustering
● External Measures of Cluster Validity: Entropy and Purity
● Final Comment on Cluster Validity
● Sequence Segmentation
● Sequential data
● Time-series data
● Time series analysis
● Sequence Segmentation
● Example
● Basic definitions
● The K-segmentation problem
● Basic Definitions
● Optimal solution for the k-segmentation problem
● Rule of thumb
● Dynamic Programming Recursion
● Dynamic programming table
● Example
● Dynamic-programming algorithm
● Algorithm Complexity
● Heuristics
● Other time series analysis

Language: English; Size: 3.05 MB; File type: PowerPoint slides; Number of slides: 78; Content level: unspecified; File extension: pptx; Extracted: 2019/06/15 12:37:21


### Key terms used in this content

datum, model, parameter, cluster, distribution, clustering, mixture, probability, value, gaussian, different


### Text content of the slides

Data Mining, Lecture 8: The EM Algorithm, Clustering Validation, Sequence Segmentation

#### Clustering

What is a clustering? In general, a grouping of objects such that the objects in a group (cluster) are similar (or related) to one another and different from (or unrelated to) the objects in other groups. Inter-cluster distances are maximized; intra-cluster distances are minimized.

Clustering algorithms:

● K-means and its variants
● Hierarchical clustering
● DBSCAN

#### Mixture models and the EM algorithm

Model-based clustering: in order to understand our data, we will assume that there is a generative process (a model) that creates/describes the data, and we will try to find the model that best fits the data. Models of different complexity can be defined, but we will assume that our model is a distribution from which data points are sampled. Example: the data is the height of all people in Greece.

In most cases, a single distribution is not good enough to describe all data points: different parts of the data follow a different distribution. Example: the data is the height of all people in Greece and China. We need a mixture model. Different distributions correspond to different clusters in the data.
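#### Gaussian distribution

Example: the data is the height of all people in Greece. Experience has shown that this data follows a Gaussian (normal) distribution. Reminder: the normal distribution with mean $\mu$ and standard deviation $\sigma$ has density

$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Gaussian model: what is a model? A Gaussian distribution is fully defined by the mean $\mu$ and the standard deviation $\sigma$. We define our model as the pair of parameters $\theta = (\mu, \sigma)$. This is a general principle: a model is defined as a vector of parameters $\theta$.

Fitting the model: we want to find the normal distribution that best fits our data, that is, find the best values for $\mu$ and $\sigma$. But what does "best fit" mean?

#### Maximum Likelihood Estimation (MLE)

Suppose that we have a vector of values $X = (x_1, \ldots, x_n)$ and we want to fit a Gaussian model $\theta = (\mu, \sigma)$ to the data.

Probability of observing point $x_i$:

$$P(x_i \mid \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

Probability of observing all points (assume independence):

$$P(X \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)$$

We want to find the parameters $\theta = (\mu, \sigma)$ that maximize the probability $P(X \mid \theta)$.

The probability $P(X \mid \theta)$ as a function of $\theta$ is called the likelihood function $L(\theta)$. It is usually easier to work with the log-likelihood function:

$$LL(\theta) = -\sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2} - \frac{n}{2}\log 2\pi - n\log\sigma$$

Maximum likelihood estimation: find the parameters $\mu, \sigma$ that maximize $LL(\theta)$. The solution is the sample mean and the sample variance:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)^2$$

Note: these are also the most likely parameters given the data, since

$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$

If we have no prior information about $\theta$ or $X$, then maximizing $P(X \mid \theta)$ is the same as maximizing $P(\theta \mid X)$.

#### Mixture of Gaussians

Suppose that you have the heights of people from Greece and China, and the distribution looks like the figure below (dramatization).

[Figure: distribution of heights of people from Greece and China (dramatization)]

In this case the data is the result of the mixture of two Gaussians: one for Greek people and one for Chinese people. Identifying, for each value, which Gaussian is most likely to have generated it will give us a clustering.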
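The maximum-likelihood estimates for a single Gaussian described above can be sketched in a few lines of Python. This is only an illustration: the function names `fit_gaussian_mle` and `log_likelihood` and the sample `heights` are hypothetical and do not come from the slides.

```python
import math


def fit_gaussian_mle(values):
    """Return the maximum-likelihood (mu, sigma) of a 1-D Gaussian."""
    n = len(values)
    mu = sum(values) / n                            # sample mean
    var = sum((x - mu) ** 2 for x in values) / n    # divides by n, not n - 1
    return mu, math.sqrt(var)


def log_likelihood(values, mu, sigma):
    """Log-likelihood of independent observations under N(mu, sigma)."""
    n = len(values)
    squared_error = sum((x - mu) ** 2 for x in values)
    return (-squared_error / (2 * sigma ** 2)
            - 0.5 * n * math.log(2 * math.pi)
            - n * math.log(sigma))


# Made-up heights in centimetres, for illustration only.
heights = [170.0, 175.0, 168.0, 181.0, 176.0]
mu_hat, sigma_hat = fit_gaussian_mle(heights)
print(mu_hat, sigma_hat, log_likelihood(heights, mu_hat, sigma_hat))
```

Evaluating `log_likelihood` at any other `mu` or `sigma` gives a value no larger than at `(mu_hat, sigma_hat)`, which is what "best fit" means in the MLE sense.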
#### Mixture model

A value $x_i$ is generated according to the following process:

● First select the nationality: with probability $\pi_G$ select Greece, with probability $\pi_C$ select China, where $\pi_G + \pi_C = 1$.
● Given the nationality, generate the point from the corresponding Gaussian: $P(x_i \mid \theta_G) \sim N(\mu_G, \sigma_G)$ if Greece, $P(x_i \mid \theta_C) \sim N(\mu_C, \sigma_C)$ if China.

We can also think of this as a hidden variable $Z$ that takes two values: Greece and China. Here $\theta_G = (\mu_G, \sigma_G)$ are the parameters of the Greek distribution and $\theta_C = (\mu_C, \sigma_C)$ are the parameters of the China distribution.

Our model has the following parameters:

$$\Theta = (\pi_G, \pi_C, \mu_G, \sigma_G, \mu_C, \sigma_C)$$

where $\pi_G, \pi_C$ are the mixture probabilities and $\mu_G, \sigma_G, \mu_C, \sigma_C$ are the distribution parameters.

For value $x_i$ we have:

$$P(x_i \mid \Theta) = \pi_G\,P(x_i \mid \theta_G) + \pi_C\,P(x_i \mid \theta_C)$$

For all values $X = (x_1, \ldots, x_n)$:

$$P(X \mid \Theta) = \prod_{i=1}^{n} P(x_i \mid \Theta)$$

We want to estimate the parameters that maximize the likelihood of the data.

Once we have the parameters $\Theta$, we can estimate the membership probabilities $P(G \mid x_i)$ and $P(C \mid x_i)$ for each point $x_i$. This is the probability that point $x_i$ belongs to the Greek or the Chinese population (cluster):

$$P(G \mid x_i) = \frac{P(x_i \mid G)\,P(G)}{P(x_i \mid G)\,P(G) + P(x_i \mid C)\,P(C)} = \frac{\pi_G\,P(x_i \mid \theta_G)}{\pi_G\,P(x_i \mid \theta_G) + \pi_C\,P(x_i \mid \theta_C)}$$

where $P(x_i \mid G) = P(x_i \mid \theta_G)$ is given from the Gaussian distribution $N(\mu_G, \sigma_G)$ for Greek.

#### EM (Expectation Maximization) algorithm

Initialize the values of the parameters in $\Theta$ to some random values. Repeat until convergence:

● E-step: given the parameters $\Theta$, estimate the membership probabilities $P(G \mid x_i)$ and $P(C \mid x_i)$.
● M-step: compute the parameter values that (in expectation) maximize the data likelihood.

The M-step updates are as follows. The mixture probabilities are the fraction of the population in G and C:

$$\pi_G = \frac{1}{n}\sum_{i=1}^{n} P(G \mid x_i), \qquad \pi_C = \frac{1}{n}\sum_{i=1}^{n} P(C \mid x_i)$$

The distribution parameters are the MLE estimates if the $\pi$'s were fixed:

$$\mu_G = \sum_{i=1}^{n} \frac{P(G \mid x_i)}{n\,\pi_G}\,x_i, \qquad \mu_C = \sum_{i=1}^{n} \frac{P(C \mid x_i)}{n\,\pi_C}\,x_i$$

$$\sigma_G^2 = \sum_{i=1}^{n} \frac{P(G \mid x_i)}{n\,\pi_G}\,(x_i-\mu_G)^2, \qquad \sigma_C^2 = \sum_{i=1}^{n} \frac{P(C \mid x_i)}{n\,\pi_C}\,(x_i-\mu_C)^2$$

Relationship to K-means:

● E-step: assignment of points to clusters. K-means uses a hard assignment; EM uses a soft assignment.
● M-step: computation of centroids. K-means assumes a common fixed variance (spherical clusters); EM can change the variance for different clusters or different dimensions (ellipsoid clusters).

If the variance is fixed, then both minimize the same error function.

#### Clustering evaluation

How do we evaluate the "goodness" of the resulting clusters? But "clustering lies in the eye of the beholder"! Then why do we want to evaluate them?

● To avoid finding patterns in noise
● To compare clusterings, or clustering algorithms
● To compare against a "ground truth"

[Figure: clusters found in random data; panels: random points, K-means, DBSCAN, complete link]

Different aspects of cluster validation:

1. Determining the clustering tendency of a set of data, i.e., distinguishing whether non-random structure actually exists in the data.
2. Comparing the results of a cluster analysis to externally known results, e.g., to externally given class labels.
3. Evaluating how well the results of a cluster analysis fit the data without reference to external information (use only the data).
4. Comparing the results of two different sets of cluster analyses to determine which is better.
5. Determining the "correct" number of clusters.

For 2, 3, and 4, we can further distinguish whether we want to evaluate the entire clustering or just individual clusters.

Numerical measures that are applied to judge various aspects of cluster validity are classified into the following three types:

● External index: used to measure the extent to which cluster labels match externally supplied class labels, e.g., entropy, precision, recall.
● Internal index: used to measure the goodness of a clustering structure without reference to external information, e.g., sum of squared error (SSE).
● Relative index: used to compare two different …
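Returning to the EM algorithm for the two-Gaussian mixture described earlier in this text, the E-step and M-step can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's own implementation: the function names are hypothetical, the data are synthetic draws rather than real heights, and the initialization is deterministic (means at the data extremes) instead of the random initialization the slides mention, so that the run is repeatable. The labels `G` and `C` simply follow the Greece/China naming of the example.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def em_two_gaussians(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(xs)
    # Assumption: deterministic start, with means at the data extremes
    # and both sigmas set to the overall standard deviation.
    mu_g, mu_c = max(xs), min(xs)
    mean = sum(xs) / n
    sigma_g = sigma_c = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    pi_g = 0.5
    w_g = []
    for _ in range(iters):
        # E-step: membership probabilities P(G | x_i) under current parameters.
        w_g = []
        for x in xs:
            p_g = pi_g * gaussian_pdf(x, mu_g, sigma_g)
            p_c = (1 - pi_g) * gaussian_pdf(x, mu_c, sigma_c)
            w_g.append(p_g / (p_g + p_c))
        # M-step: weighted MLE updates; n_g equals n * pi_G after the update.
        n_g = sum(w_g)
        n_c = n - n_g
        pi_g = n_g / n
        mu_g = sum(w * x for w, x in zip(w_g, xs)) / n_g
        mu_c = sum((1 - w) * x for w, x in zip(w_g, xs)) / n_c
        var_g = sum(w * (x - mu_g) ** 2 for w, x in zip(w_g, xs)) / n_g
        var_c = sum((1 - w) * (x - mu_c) ** 2 for w, x in zip(w_g, xs)) / n_c
        # Small floor keeps a collapsing component from dividing by zero.
        sigma_g = max(math.sqrt(var_g), 1e-6)
        sigma_c = max(math.sqrt(var_c), 1e-6)
    # w_g holds the memberships computed in the last E-step.
    return pi_g, (mu_g, sigma_g), (mu_c, sigma_c), w_g


# Synthetic data: two well-separated groups (values chosen for illustration).
random.seed(0)
xs = ([random.gauss(160, 3) for _ in range(300)]
      + [random.gauss(180, 3) for _ in range(300)])
pi_g, (mu_g, sigma_g), (mu_c, sigma_c), w_g = em_two_gaussians(xs)

# A hard clustering follows from the soft memberships.
labels = ["G" if w > 0.5 else "C" for w in w_g]
print(pi_g, mu_g, sigma_g, mu_c, sigma_c)
```

Thresholding the memberships at 0.5 turns EM's soft assignment into the kind of hard assignment K-means produces, which is one way to see the relationship between the two methods.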


This content was taken from the following site, which is responsible for its publication:

http://www.cs.uoi.gr/~tsap/teaching/2013-cs059/slides/datamining-lect8.pptx
