Concepts in Clustering

Main headings in this PowerPoint

● Clustering
● Idea and Applications
● When & From What
● Concepts in Clustering
● Inter/Intra Cluster Distances
● How hard is clustering?
● Classical clustering methods
● K-means
● K-means Example
● K Means Example
● Time Complexity
● Problems with K-means
● Variations on K-means
● Bisecting K-means
● Class of 16th October
● Hierarchical Clustering Techniques
● Hierarchical Agglomerative Clustering Example
● Single Link Example
● Properties of HAC
● Impact of cluster distance measures
● Complete Link Example
● Bisecting K-means
● Buckshot Algorithm
● Text Clustering
● Which of these are the best for text?
● Challenges/Other Ideas
● Other (general clustering) challenges

Language: English | Size: 1.49 MB
File type: PowerPoint slides | Number of slides: 31
Content level: unspecified | File extension: ppt
Subject group: | Extraction time: 2019/06/14 11:50:38

Key terms used in this content

cluster, k, distance, point, clustering, mean, centroid, intra, object, two, start

Text content of this PPT slide deck

Clustering 1 9 2 2

Idea and Applications
Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. It is also called unsupervised learning. It is a common and important task with many applications.
Applications in search engines:
● Structuring search results
● Suggesting related pages
● Automatic directory construction/update
● Finding near-identical/duplicate pages

When & From What
Clustering can be done:
● At indexing time
● At query time, applied to documents or applied to snippets
Clustering can be based on:
● URL source: put pages from the same server together
● Text content: polysemy ("bat", "banks"); multiple aspects of a single topic
● Links: look at the connected components in the link graph (A/H analysis can do it)

Concepts in Clustering
Defining distance between points:
● Cosine distance (which you already know)
● Overlap distance
A good clustering is one where:
● the intra-cluster distance (the sum of distances between objects in the same cluster) is minimized, while
● the inter-cluster distance (the distances between different clusters) is maximized.
Objective: minimize F(intra, inter).
Clusters can be evaluated with internal as well as external measures:
● Internal measures are related to the inter/intra-cluster distance.
● External measures are related to how representative the current clusters are of the "true" classes (see entropy and F-measure in Steinbach et al.).
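The distance notions above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: cosine distance is taken here as 1 minus cosine similarity, the intra-cluster distance as the sum over all pairs in a cluster, and the inter-cluster distance as the distance between centroids. The function names are hypothetical and do not come from the slides.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """Assumed definition: 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(p[i] for p in cluster) / n for i in range(len(cluster[0]))]

def intra_cluster_distance(cluster, dist):
    """Sum of distances between all pairs of objects in one cluster."""
    return sum(dist(p, q) for p, q in combinations(cluster, 2))

def inter_cluster_distance(c1, c2, dist):
    """Distance between two clusters, taken here as centroid-to-centroid."""
    return dist(centroid(c1), centroid(c2))

# Two toy clusters: vectors along the x axis and vectors along the y axis.
cluster_x = [[1.0, 0.0], [2.0, 0.0]]
cluster_y = [[0.0, 1.0], [0.0, 3.0]]
intra = intra_cluster_distance(cluster_x, cosine_distance)
inter = inter_cluster_distance(cluster_x, cluster_y, cosine_distance)
print(intra, inter)
```

In this toy data the two vectors of `cluster_x` point in the same direction, so the intra-cluster cosine distance is zero, while the two centroids are orthogonal, so the inter-cluster distance is one.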
Inter/Intra Cluster Distances
Intra-cluster distance: (sum/min/max/avg) the (absolute/squared) distance between
● all pairs of points in the cluster, or
● the centroid and all points in the cluster, or
● the "medoid" and all points in the cluster.
Inter-cluster distance: sum the (squared) distance between all pairs of clusters, where the distance between two clusters is defined as
● the distance between their centroids/medoids (spherical clusters), or
● the distance between the closest pair of points belonging to the clusters (chain-shaped clusters).

Lecture of 1 14

How hard is clustering?
One idea is to consider all possible clusterings and pick the one that has the best inter- and intra-cluster distance properties. Suppose we are given n points and would like to cluster them into k clusters. How many possible clusterings are there?
That is too hard to do by brute force or optimally. Solution: iterative optimization algorithms. Start with a clustering and iteratively improve it (e.g., K-means).

Classical clustering methods
● Partitioning methods: K-means (and EM), K-medoids
● Hierarchical methods: agglomerative, divisive, BIRCH
● Model-based clustering methods

K-means
Works when we know k, the number of clusters we want to find.
Idea:
● Randomly pick k points as the "centroids" of the k clusters.
● Loop: for each point, put the point in the cluster to whose centroid it is closest; then recompute the cluster centroids.
● Repeat the loop until there is no change in clusters between two consecutive iterations.
This is iterative improvement of the objective function: the sum of the squared distance from each point to the centroid of its cluster.

K-means Example
For simplicity, 1-dimensional objects and k = 2. Numerical difference is used as the distance.
Objects: 1, 2, 5, 6, 7
K-means:
● Randomly select 5 and 6 as centroids.
● Two clusters: {1, 2, 5} and {6, 7}; meanC1 = 8/3, meanC2 = 6.5.
● Reassign: {1, 2} and {5, 6, 7}; meanC1 = 1.5, meanC2 = 6.
● No change.
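The K-means loop and the 1-dimensional example above can be replayed with a short Python sketch. This is a minimal illustration, not the slides' own code: the function name `kmeans_1d` and the `max_iter` safety cap are assumptions, and the seeds are passed in explicitly (5 and 6, as in the example) instead of being drawn at random so that the run is reproducible.

```python
def kmeans_1d(points, seeds, max_iter=100):
    """K-means on 1-D points; distance is the absolute numerical difference."""
    centroids = list(seeds)
    assignment = None
    for _ in range(max_iter):
        # Put each point in the cluster whose centroid is closest.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Stop when two consecutive iterations give the same clusters.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    clusters = [
        [p for p, a in zip(points, assignment) if a == c]
        for c in range(len(centroids))
    ]
    # Objective: sum of squared distances from each point to its centroid.
    sse = sum((p - centroids[a]) ** 2 for p, a in zip(points, assignment))
    return clusters, centroids, sse

clusters, centroids, sse = kmeans_1d([1, 2, 5, 6, 7], seeds=[5, 6])
print(clusters, centroids, sse)  # [[1, 2], [5, 6, 7]] [1.5, 6.0] 2.5
```

The run passes through the same intermediate means as the example (8/3 and 6.5) before settling on 1.5 and 6.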
Aggregate dissimilarity (the sum of squares of the distance of each point of each cluster from its cluster center, i.e. the intra-cluster distance) = 0.5² + 0.5² + 1² + 0² + 1² = 2.5, where the first term is |1 - 1.5|².

K-means Example (k = 2)
[Figure: reassign clusters; converged. From Mooney.]
[Figure: example of K-means in operation. From Hand et al.]

Time Complexity
● Assume computing the distance between two instances is O(m), where m is the dimensionality of the vectors.
● Reassigning clusters: O(kn) distance computations, or O(knm).
● Computing centroids: each instance vector gets added once to some centroid, so O(nm).
● Assume these two steps are each done once for I iterations: O(Iknm).
● Linear in all relevant factors, assuming a fixed number of iterations; more efficient than O(n²) HAC (to come next).

Problems with K-means
● Need to know k in advance. Could try out several k? Unfortunately, cluster tightness increases with increasing k. The best intra-cluster tightness occurs when k = n (every point in its own cluster).
● Tends to go to local minima that are sensitive to the starting centroids. Try out multiple starting points.
● Disjoint and exhaustive: doesn't have a notion of "outliers". The outlier problem can be handled by K-medoid or neighborhood-based algorithms.
● Assumes clusters are spherical in vector space: sensitive to coordinate changes, weighting, etc.
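The last point, sensitivity to coordinate changes and weighting, can be seen in a tiny Python sketch. It assumes Euclidean distance and uses made-up 2-D numbers chosen only for illustration; `nearest_centroid` and `rescale` are hypothetical helper names. Multiplying one coordinate by a weight is enough to change which centroid a point is assigned to.

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def rescale(vec, weights):
    """Multiply each coordinate by its weight (a change of units, for example)."""
    return [x * w for x, w in zip(vec, weights)]

point = [2.0, 0.0]
centroids = [[0.0, 0.0], [2.0, 3.0]]

# Original coordinates: distances are 2 and 3, so centroid 0 is closer.
before = nearest_centroid(point, centroids)

# Weight the first coordinate by 10: distances become 20 and 3.
weights = [10.0, 1.0]
after = nearest_centroid(
    rescale(point, weights), [rescale(c, weights) for c in centroids]
)
print(before, after)  # 0 1
```

Since K-means assigns every point to its nearest centroid, a reweighting like this can change the clusters it converges to.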
Example showing sensitivity to seeds [figure: points A to F]: if you start with B and E as centroids, you converge to {A, B, C} and {D, E, F}. If you start with D and F, you converge to {A, B, D, E} and {C, F}.

Variations on K-means
● Recompute the centroid after every change (or every few changes), rather than after all the points are reassigned. This improves convergence speed.
● Starting centroids (seeds) change which local minimum we converge to, as well as the rate of convergence.
● Use heuristics to pick good seeds; another cheap clustering over a random sample can be used.
● Run K-means M times and pick the best clustering that results, i.e. the one with the lowest aggregate dissimilarity (intra-cluster distance).
● Bisecting K-means takes this idea further…

Bisecting K-means
For i = 1 to k - 1 do
    Pick a leaf cluster C to split
    For j = 1 to ITER do
        Use K-means to split C into two sub-clusters, C1 and C2
    Choose the best of the above splits and make it permanent
The leaf to split can be the largest cluster or the cluster with the lowest average similarity.
Hybrid method 1: a divisive hierarchical clustering method that uses K-means.

Class of 16th October
Midterm on October 23rd, in class.

Hierarchical Clustering Techniques
Generate a nested (multi-resolution) sequence of clusters. Two types of algorithms:
● Divisive: start with one cluster and recursively subdivide. Bisecting K-means is an example.
● Agglomerative (HAC): start with data points as single-point clusters and recursively merge the closest clusters. The result is drawn as a "dendrogram".

Hierarchical Agglomerative Clustering Example
Put every point in a cluster by itself.
For i = 1 to n - 1 do
    Let C1 and C2 be the most mergeable pair of clusters
    Create C1,2 as the parent of C1 and C2
Example: for simplicity, we still use 1-dimensional objects. Numerical difference is used as the distance.
Objects: 1, 2, 5, 6, 7
Agglomerative clustering: find the two closest objects and merge them: {1, 2}. So we have now …
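The agglomerative loop described above can be sketched in Python on the same 1-dimensional objects. The extracted text breaks off before it states how the distance between merged clusters is measured in this example, so the sketch assumes single link (the distance between the closest pair of points), which is one of the measures named in the outline; it should not be read as the continuation of the slide. The function names are hypothetical, and ties are broken by taking the first closest pair found.

```python
def single_link(c1, c2):
    """Assumed cluster distance: closest pair of points, one from each cluster."""
    return min(abs(a - b) for a in c1 for b in c2)

def hac(points, cluster_distance=single_link):
    """Merge the closest pair of clusters n - 1 times; return the merge history."""
    clusters = [(p,) for p in points]  # every point starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the most mergeable (closest) pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append((clusters[i], clusters[j], d))
        # Replace the two children with their parent cluster.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

merges = hac([1, 2, 5, 6, 7])
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

The merge history is exactly the information a dendrogram displays: which clusters were joined and at what distance. Passing a different `cluster_distance` function (complete link, for example) changes the merge order without changing the loop.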
