| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "<h2><center>Tutorial on Latent Dirichlet Allocation</center></h2>" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Introduction\n", | |
| "Latent Dirichlet Allocation (LDA) is a powerful topic model for finding the topics in a collection of texts. On the one hand, LDA helps engineers and scientists find the main topic components in a large collection of documents. On the other hand, since documents can be distinguished by their topics, other natural language processing tasks such as text classification can achieve better results by building on the output of LDA.\n", | |
| "\n", | |
| "This tutorial will help you understand the basic mechanism of sampling and inference in the LDA topic model. We will then introduce an implementation of the model and apply it to some data.\n", | |
| "\n", | |
| "## A quick example\n", | |
| "Instead of assigning one specific topic to each document, LDA treats every document as a mixture of different topics. The number of topics is given in advance, and the model then does two things: it finds a distribution over these topics for each of the given documents, and it finds a distribution over words for each of the topics. Note that the names of the topics don't matter; what is important is their distributions. To achieve this, LDA also assigns one topic to each word slot in each document (in this article we call such a slot a \"position\") and then draws the word for this position from the word distribution of the assigned topic. The following is a quick example of the LDA model.\n", | |
| "<img src=\"https://www.dropbox.com/s/azgtj4rbsvedpu1/Figure1.pdf?raw=1\">\n", | |
| "\n", | |
| "In the above figure, there are 5 very short documents (DOC1, DOC2, ...) and 2 given topics, A and B. What LDA does is:\n", | |
| "\n", | |
| "1. Find the topic distribution for each of these 5 documents (shown only for the first two documents here).\n", | |
| "2. Find the word distribution for each of these 2 topics (on the left of the figure).\n", | |
| "3. Assign one of the 2 topics to each position in each document and assign a word to each position. The colors in the figure represent the different topic assignments. For example, \"ate\" in document 1 is assigned topic A and \"bananas\" in document 2 is assigned topic B.\n", | |
| "\n", | |
| "It is worth noting that the same word may be assigned to different topics, such as \"banana\" in document 1 and \"bananas\" in document 2 (after lemmatization they are the same word). The main task of LDA is to obtain the latent variables: the topic-word distributions, the document-topic distributions, and the topic assignment for each position in each document." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Model interpretation\n", | |
| "Before going into this part, it is strongly recommended to watch this [video](http://videolectures.net/mlss09uk_blei_tm/). The speaker, Professor David M. Blei, is one of the authors who proposed this model, and his lecture is very helpful for understanding it.\n", | |
| "\n", | |
| "### Dirichlet distribution\n", | |
| "The [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution) is a distribution over distributions. Its density is:\n", | |
| "$$f(x_1,x_2,\\cdots,x_K;\\alpha_1,\\alpha_2,\\cdots,\\alpha_K)=\\dfrac{1}{B(\\alpha)}\\prod_{i=1}^Kx_i^{\\alpha_i-1}\\sim Dir(\\alpha_1,\\alpha_2,\\cdots,\\alpha_K)$$\n", | |
| "where $x_i, i\\in 1,\\cdots,K$ are random variables satisfying $\\sum_{i=1}^Kx_i=1$ and $x_i\\geqslant0$. Hence the vector of all $x_i$ can itself be read as a probability distribution over $K$ outcomes, which is why the Dirichlet distribution can be seen as a distribution over distributions. $\\alpha_i$, $i\\in1,\\cdots,K$ are the parameters of the Dirichlet distribution, and they control its shape. If this seems abstract at first, there is a good visualization of 3D Dirichlet distributions [here](http://blog.bogatron.net/blog/2014/02/02/visualizing-dirichlet-distributions/).\n", | |
| "\n", | |
| "The Dirichlet distribution is chosen as the prior for both the topic-word distributions and the document-topic distributions. The reason is that it is a [conjugate prior](https://en.wikipedia.org/wiki/Conjugate_prior) of the multinomial distribution (introduced in the next part), which is chosen as the statistical model for drawing both words from topics and topics from documents. Hence, combining the Dirichlet and multinomial distributions in LDA yields relatively concise mathematical results and makes inference easier.\n", | |
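| "\n", | |
| "As a small worked illustration of this conjugacy (the numbers are made up for the example, not taken from any data set): suppose $K=3$, the prior is $Dir(0.1,0.1,0.1)$, and we observe the counts $n=(5,0,2)$ drawn from the multinomial. The posterior is again a Dirichlet whose parameters are the prior parameters plus the counts:\n", | |
| "$$p\\mid n\\sim Dir(0.1+5,\\;0.1+0,\\;0.1+2)=Dir(5.1,\\,0.1,\\,2.1),\\qquad E[p\\mid n]=\\left(\\tfrac{5.1}{7.3},\\tfrac{0.1}{7.3},\\tfrac{2.1}{7.3}\\right)\\approx(0.699,\\,0.014,\\,0.288)$$\n", | |
| "The same add-the-counts update reappears in the Gibbs sampling formulas later in this tutorial.\n", | |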
| "\n", | |
| "One thing to note is that when all $\\alpha_i$ are small (below 1), samples of $x_i, i\\in 1,\\cdots,K$ tend to be sparse: most $x_i$ are close to zero while a few take relatively large values. For LDA, a common choice is to set all $\\alpha_i$ to the same small value. Using the same value for every $\\alpha_i$ means there is no prior preference for any one topic, and using a small value means the model assumes that a document tends to consist of a few topics rather than a mixture of many.\n", | |
| "\n", | |
| "### Multinomial distribution\n", | |
| "The [multinomial distribution](https://en.wikipedia.org/wiki/Multinomial_distribution) is the generalization of the binomial distribution to more than two outcomes. Its probability mass function is:\n", | |
| "$$f(x_1,x_2,\\cdots,x_K;n,p_1,p_2,\\cdots,p_K)=\\dfrac{\\Gamma(\\sum_ix_i+1)}{\\prod_i\\Gamma(x_i+1)}\\prod_{i=1}^Kp_i^{x_i}\\sim Mult(p_1,p_2,\\cdots,p_K)$$\n", | |
| "where $n=\\sum_ix_i$ is the number of trials, $p_i$ is the probability of outcome $i$ in each trial, and $x_i, i\\in 1,\\cdots,K$ is the number of times outcome $i$ appears.\n", | |
| "\n", | |
| "The multinomial distribution is chosen both for the topic assignment of each position and for the word drawn at each position. Therefore, for each position in each document, we sample once from a multinomial (with parameters drawn from the Dirichlet prior on the document-topic distribution) to get its topic, and then sample once from another multinomial (with parameters drawn from the Dirichlet prior on the topic-word distribution) to get its word.\n", | |
| "\n", | |
| "### LDA model\n", | |
| "Now let's talk about the LDA model itself. The first paper describing the model$^{[1]}$ gives a very good graphical representation of it:\n", | |
| "<img src=\"https://www.dropbox.com/s/kob0vdrrybq3cjw/Figure2.pdf?raw=1\">\n", | |
| "The statistical model can be expressed as follows:\n", | |
| "\n", | |
| "Given document index $d=\\{1,\\cdots,D\\}$, word index $w=\\{1,\\cdots,W\\}$, $K$ topics and topic index $k=\\{1,\\cdots,K\\}$, LDA assumes:\n", | |
| "$$\\beta_k\\sim Dir(\\beta_k|\\eta_1,\\eta_2,\\cdots,\\eta_W)$$\n", | |
| "\n", | |
| "$$\\theta_d\\sim Dir(\\theta_d|\\alpha_1,\\alpha_2,\\cdots,\\alpha_K)$$\n", | |
| "\n", | |
| "$$z_{dw}\\sim Mult(z_{dw}|\\theta_d)$$\n", | |
| "\n", | |
| "$$w_{dk}\\sim Mult(w_{dk}|\\beta_k,z_{dw}=k)$$\n", | |
| "\n", | |
| "Read the figure and the expressions together with the following interpretation of the model:\n", | |
| "\n", | |
| "1. The topic-word distributions are drawn from a Dirichlet distribution with hyperparameters $\\eta$. This generates $K$ topics, each with its own word distribution. Each topic is a distribution over all words that appear in the input documents.\n", | |
| "2. The document-topic distributions are drawn from a Dirichlet distribution with hyperparameters $\\alpha$. This generates $D$ topic distributions, one per document. Each document has a distribution over all $K$ topics.\n", | |
| "3. Assign each position in each document a topic $z_{dw}$ in $1,\\cdots,K$, drawn from the document's topic distribution $\\theta_d$.\n", | |
| "4. Assign each position in each document a word $w_{dk}$ drawn from the word distribution of the assigned topic $z_{dw}=k$.\n", | |
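| "\n", | |
| "Putting the four steps together, the joint distribution factorizes as follows. This is a sketch of the standard LDA factorization written in this tutorial's notation; $x_{dw}$ is a symbol introduced here only to denote the word observed at position $w$ of document $d$:\n", | |
| "$$p(\\beta,\\theta,z,x\\mid\\alpha,\\eta)=\\prod_{k=1}^{K}Dir(\\beta_k\\mid\\eta)\\prod_{d=1}^{D}\\left[Dir(\\theta_d\\mid\\alpha)\\prod_{w}Mult(z_{dw}\\mid\\theta_d)\\,Mult(x_{dw}\\mid\\beta_{z_{dw}})\\right]$$\n", | |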
| "\n", | |
| "### Inference from model\n", | |
| "\n", | |
| "One task of LDA is to infer the latent variables (e.g. $\\theta$, $\\beta$ and $z$) given the input texts. This is also where the name \"latent\" comes from. Although using the Dirichlet and multinomial distributions simplifies the model, it is still quite complicated to infer these variables directly. However, there are approximate inference methods that are useful in practice. Here are three main methods$^{[2]}$:\n", | |
| "1. Gibbs sampling\n", | |
| "2. Variational methods$^{[1]}$\n", | |
| "3. Particle filtering$^{[3]}$\n", | |
| "\n", | |
| "In this tutorial we give details about Gibbs sampling. For the other two methods, you can find very good references in the respective citations.\n", | |
| "\n", | |
| "For [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling), we will introduce both standard Gibbs sampling and collapsed Gibbs sampling. Standard Gibbs sampling is easier to understand, while collapsed Gibbs sampling is more efficient.\n", | |
| "\n", | |
| "- Standard Gibbs sampling\n", | |
| "\n", | |
| "There is a good explanation of applying standard Gibbs sampling to LDA [here](https://wiseodd.github.io/techblog/2017/09/07/lda-gibbs/). Gibbs sampling is a [Markov chain Monte Carlo](https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) method. Rather than deriving the posterior analytically, we sample the latent variables in turn, updating one random variable while holding all the others fixed. Iterating these steps many times, until some stopping criterion is met, gives approximate samples of the latent variables.\n", | |
| "\n", | |
| "Here we skip the derivation and directly give the sampling formulas:\n", | |
| "$$p(\\beta_k|z,w)=Dir(\\eta+n_{1:W}(w_{(1:D),k}))$$\n", | |
| "\n", | |
| "$$p(\\theta_d|z_{d,1:W},\\beta)=Dir(\\alpha+n_{1:K}(z_{d,1:W}))$$\n", | |
| "\n", | |
| "$$p(z_{dw}|z_{-dw},w,\\theta_d,\\beta)\\propto p(z_{dw}|\\theta_d)p(w|\\beta_{k=z_{dw}},z_{dw})$$\n", | |
| "\n", | |
| "Let's take a look at these three formulas:\n", | |
| "\n", | |
| "1. The first formula samples the topic-word distribution $\\beta_k$ given topic $k$ and all words assigned to this topic in all documents ($1:D$), $w_{(1:D),k}$. Since the Dirichlet distribution is conjugate to the multinomial distribution, the posterior distribution of $\\beta_k$ is still Dirichlet, with updated parameters for each word. The update simply adds the term $n_{1:W}(w_{(1:D),k})$, a count vector holding the number of times each word ($1:W$) is assigned to topic $k$ across all documents ($1:D$). We repeat this for all topics ($1:K$).\n", | |
| "\n", | |
| "2. The second formula samples the document-topic distribution $\\theta_d$ given document $d$, the topic assignments of all positions in this document $z_{d,1:W}$, and all topic-word distributions $\\beta$. Similarly, since the Dirichlet distribution is conjugate to the multinomial distribution, the posterior distribution of $\\theta_d$ is still Dirichlet, with updated parameters for each topic. The update simply adds the term $n_{1:K}(z_{d,1:W})$, a count vector holding, for each topic $1:K$, the number of positions ($1:W$) in document $d$ with $z=k$. We repeat this for all documents ($1:D$).\n", | |
| "\n", | |
| "3. The third formula samples the topic assignment $z_{dw}$ for each position in each document, given the topic assignments of all other positions $z_{-dw}$, the topic distribution of this document $\\theta_d$, and all topic-word distributions $\\beta$. This is the posterior probability of $z_{dw}$, so it is proportional to prior times likelihood. The prior term is $p(z_{dw}|\\theta_d)$: the probability of seeing $z_{dw}$ at position $w$ given document $d$'s topic distribution $\\theta_d$. The likelihood term is $p(w|\\beta_{k=z_{dw}},z_{dw})$: the probability of seeing word $w$ given that position $w$ in document $d$ is assigned topic $k=z_{dw}$ with word distribution $\\beta_{k=z_{dw}}$.\n", | |
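| "\n", | |
| "A small numeric illustration of the third formula (the values are invented for the example): with $K=2$, suppose $\\theta_d=(0.8,0.2)$ and the current word has probabilities $(0.1,0.5)$ under $\\beta_1$ and $\\beta_2$. Then\n", | |
| "$$p(z_{dw}=1\\mid\\cdot)\\propto0.8\\times0.1=0.08,\\qquad p(z_{dw}=2\\mid\\cdot)\\propto0.2\\times0.5=0.10,$$\n", | |
| "and after normalizing by $0.08+0.10=0.18$ the new topic is drawn with probabilities of approximately $(0.444,0.556)$. Even though topic 1 dominates the document, the word itself is more typical of topic 2, so topic 2 comes out slightly more likely.\n", | |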
| "\n", | |
| "So far we have a basic method for inferring the latent variables of the LDA model. However, it is not computationally efficient, since in each iteration we need to sample three kinds of latent variables (on the order of $D+K+DW$ variables in total). Hence, we introduce collapsed Gibbs sampling in the following part.\n", | |
| "\n", | |
| "- Collapsed Gibbs sampling\n", | |
| "\n", | |
| "Compared to standard Gibbs sampling, collapsed Gibbs sampling is more efficient. The reason is that it integrates out the latent variables $\\theta$ and $\\beta$, so that only $z_{dw}$ remains to be sampled for each position in each document. The Wikipedia article on LDA has a detailed derivation of [collapsed Gibbs sampling](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation#Inference). Here we skip the derivation and give the resulting formula directly:\n", | |
| "$$p(z_{dw}|z_{-dw},w)\\propto (n_{z}(z_{-dw})+\\alpha_{z_{dw}})\\dfrac{n_{w}(w_{1:D,z_{-dw}})+\\eta_{w}}{\\sum_{w'} n_{w'}(w_{1:D,z_{-dw}})+\\sum_{w'}\\eta_{w'}}$$\n", | |
| "\n", | |
| "One easy way to understand this formula is that instead of sampling the latent variables $\\theta$ and $\\beta$ from their posterior distributions, we estimate them directly from the current counts, i.e. we use their empirical versions. Here $\\eta_w$ is the Dirichlet prior parameter for word $w$ from the model definition above. This can be seen from the following formulas, which give the entry of $\\theta_d$ for topic $z$ and the entry of $\\beta_k$ for word $w$:\n", | |
| "$$\\theta_d \\approx \\dfrac{n_{z}(z_{-dw})+\\alpha_{z_{dw}}}{\\sum_{z'}n_{z'}(z_{-dw})+\\sum_{z'}\\alpha_{z'}}\\quad\n", | |
| "\\beta_k \\approx \\dfrac{n_{w}(w_{1:D,z_{-dw}})+\\eta_{w}}{\\sum_{w'} n_{w'}(w_{1:D,z_{-dw}})+\\sum_{w'}\\eta_{w'}}$$\n", | |
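| "\n", | |
| "As a numeric sketch of the sampling formula (all counts are invented for the example, and symmetric priors $\\alpha=\\eta=1$ with a vocabulary of $W=5$ words are assumed): let $K=2$ and, after removing the current position from the counts, let document $d$ contain $(3,1)$ positions assigned to topics 1 and 2, let the current word be assigned $(2,0)$ times to each topic over all documents, and let the topics hold $(10,6)$ words in total. Then\n", | |
| "$$p(z_{dw}=1\\mid\\cdot)\\propto(3+1)\\dfrac{2+1}{10+5}=0.8,\\qquad p(z_{dw}=2\\mid\\cdot)\\propto(1+1)\\dfrac{0+1}{6+5}\\approx0.182,$$\n", | |
| "which normalizes to approximately $(0.815,0.185)$.\n", | |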
| "\n", | |
| "Now let's implement them!" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Model implementation\n", | |
| "For standard Gibbs sampling:\n", | |
| "there is good material explaining collapsed Gibbs sampling and its algorithm [here](http://u.cs.biu.ac.il/~89-680/darling-lda.pdf).\n", | |
| "\n", | |
| "For collapsed Gibbs sampling:\n", | |
| "please see reference $[4]$ for pseudocode of collapsed Gibbs sampling.\n", | |
| "\n", | |
| "The complete code is given below. It is heavily commented, so you can read through it carefully.\n", | |
| "\n", | |
| "Before coding, one thing to note is the format of the input documents. They need to be converted to bag-of-words vectors so that every document has the same length. Although counting for each latent variable may seem tedious, we can make our life easier by constructing three index vectors with one entry per word occurrence: the word index of each occurrence idx_W, the document index of each occurrence idx_D, and the topic assigned to each occurrence Z." | |
| ] | |
| }, | |
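| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As a small illustration of these index vectors (the matrix is a made-up example): take two documents over a vocabulary of three words with the bag-of-words matrix\n", | |
| "$$\\begin{pmatrix}2&0&1\\\\0&1&1\\end{pmatrix}$$\n", | |
| "Document 0 contains word 0 twice and word 2 once; document 1 contains words 1 and 2 once each. Assuming the nonzero entries are visited in row-major order, listing every word occurrence gives idx_W = [0, 0, 2, 1, 2] and idx_D = [0, 0, 0, 1, 1], so both vectors have one entry per word occurrence (five here). Z has the same length and stores the topic currently assigned to each occurrence." | |
| ] | |
| }, | |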
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import scipy as sp\n", | |
| "import matplotlib.pyplot as plt\n", | |
| "%matplotlib inline\n", | |
| "from sklearn.decomposition import LatentDirichletAllocation\n", | |
| "from sklearn.feature_extraction.text import CountVectorizer\n", | |
| "from collections import Counter" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "class myLDA:\n", | |
| "    def __init__(self, N_K=10, alpha=1, eta=1):\n", | |
| "        \"\"\"\n", | |
| "        Model initialization\n", | |
| "\n", | |
| "        Args:\n", | |
| "            N_K: number of topics specified to put in the model\n", | |
| "            alpha: symmetric Dirichlet prior parameter for the document topic proportions\n", | |
| "            eta: symmetric Dirichlet prior parameter for the topic word proportions\n", | |
| "        \"\"\"\n", | |
| "        np.random.seed(1)\n", | |
| "        self.N_K = N_K\n", | |
| "        self.alpha = alpha\n", | |
| "        self.eta = eta\n", | |
| "\n", | |
| "    # implement inference of LDA via standard Gibbs sampling\n", | |
| "    def lda_sgs(self, D, max_iter=100):\n", | |
| "        \"\"\"\n", | |
| "        Model for latent Dirichlet allocation, inferred via standard Gibbs sampling.\n", | |
| "\n", | |
| "        Args:\n", | |
| "            D: a sparse matrix representation of documents via bag of words. shape = [N_D,N_W]\n", | |
| "               N_D: number of documents\n", | |
| "               N_W: number of unique words (vocabulary size)\n", | |
| "            max_iter: number of iterations used for Gibbs sampling\n", | |
| "        Returns:\n", | |
| "            (Theta, Beta): a tuple consisting of the document topic proportion ndarray Theta and the topic word proportion ndarray Beta\n", | |
| "        \"\"\"\n", | |
| "\n", | |
| "        idx_W, idx_D = self.matrix_to_list(D)\n", | |
| "        N_D,N_W = D.shape # get the number of documents and the number of unique words\n", | |
| "\n", | |
| "        alpha_vec = self.alpha*np.ones(self.N_K)\n", | |
| "        eta_vec = self.eta*np.ones(N_W)\n", | |
| "\n", | |
| "        # Theta: document topic proportion, initialized from the prior\n", | |
| "        Theta = np.zeros([N_D,self.N_K])\n", | |
| "        for d in range(N_D):\n", | |
| "            Theta[d] = np.random.dirichlet(alpha_vec)\n", | |
| "\n", | |
| "        # Beta: topic word proportion, initialized from the prior\n", | |
| "        Beta = np.zeros([self.N_K,N_W])\n", | |
| "        for k in range(self.N_K):\n", | |
| "            Beta[k] = np.random.dirichlet(eta_vec)\n", | |
| "\n", | |
| "        # Z: topic assignment\n", | |
| "        Z = np.zeros(len(idx_W)) # the ith element is the topic of the ith word occurrence\n", | |
| "        n_d_k = np.zeros([N_D,self.N_K]) # number of positions in document d assigned to topic k -> document topic proportion\n", | |
| "        n_w_k = np.zeros([N_W,self.N_K]) # number of times word w is assigned to topic k -> topic word distribution\n", | |
| "        for idx_w in range(len(idx_W)):\n", | |
| "            d = idx_D[idx_w] # get the document of this word occurrence\n", | |
| "            w = idx_W[idx_w] # get the word index of this word occurrence\n", | |
| "            z_new = np.random.multinomial(1, Theta[d]).argmax() # randomly draw the initial topic from Theta[d]\n", | |
| "            n_d_k[d][z_new] += 1\n", | |
| "            n_w_k[w][z_new] += 1\n", | |
| "            Z[idx_w] = z_new\n", | |
| "\n", | |
| "        for it in range(max_iter): # loop for each iteration\n", | |
| "\n", | |
| "            # step 1: update topic word proportion Beta\n", | |
| "            for k in range(self.N_K): # loop for each topic\n", | |
| "                eta_vec = self.eta*np.ones(N_W)\n", | |
| "                Beta[k] = np.random.dirichlet(eta_vec + n_w_k[:,k]) # re-sample from the Dirichlet posterior (prior + counts)\n", | |
| "\n", | |
| "            # step 2: update document topic proportion Theta\n", | |
| "            for d in range(N_D): # loop for each document\n", | |
| "                alpha_vec = self.alpha*np.ones(self.N_K)\n", | |
| "                Theta[d] = np.random.dirichlet(alpha_vec + n_d_k[d]) # re-sample from the Dirichlet posterior (prior + counts)\n", | |
| "\n", | |
| "            # step 3: update word topic assignment Z\n", | |
| "            for idx_w in range(len(idx_W)): # loop for each word occurrence\n", | |
| "                d = idx_D[idx_w] # get the document of this word occurrence\n", | |
| "                w = idx_W[idx_w] # get the word index of this word occurrence\n", | |
| "                z = int(Z[idx_w]) # get the current topic assignment of this word occurrence\n", | |
| "                n_d_k[d][z] -= 1\n", | |
| "                n_w_k[w][z] -= 1\n", | |
| "                prob_vec = np.exp(np.log(Theta[d]) + np.log(Beta[:,w])) # prior x likelihood for each topic (un-normalized)\n", | |
| "                posterior_vec = prob_vec / np.sum(prob_vec) # get posterior after normalization\n", | |
| "                z_new = np.random.multinomial(1,posterior_vec).argmax()\n", | |
| "                Z[idx_w] = z_new\n", | |
| "                n_d_k[d][z_new] += 1\n", | |
| "                n_w_k[w][z_new] += 1\n", | |
| "\n", | |
| "        return (Theta, Beta)\n", | |
| "\n", | |
| "    # implement inference of LDA via collapsed Gibbs sampling\n", | |
| "    def lda_cgs(self, D, max_iter=100):\n", | |
| "        \"\"\"\n", | |
| "        Model for latent Dirichlet allocation, inferred via collapsed Gibbs sampling.\n", | |
| "\n", | |
| "        Args:\n", | |
| "            D: a sparse matrix representation of documents via bag of words. shape = [N_D,N_W]\n", | |
| "               N_D: number of documents\n", | |
| "               N_W: number of unique words (vocabulary size)\n", | |
| "            max_iter: number of iterations used for Gibbs sampling\n", | |
| "        Returns:\n", | |
| "            (Theta, Beta): a tuple consisting of the document topic proportion ndarray Theta and the topic word proportion ndarray Beta\n", | |
| "        \"\"\"\n", | |
| "\n", | |
| "        idx_W, idx_D = self.matrix_to_list(D)\n", | |
| "        N_D,N_W = D.shape # get the number of documents and the number of unique words\n", | |
| "\n", | |
| "        n_d_k = np.zeros([N_D,self.N_K]) # number of positions in document d assigned to topic k -> document topic proportion\n", | |
| "        n_w_k = np.zeros([N_W,self.N_K]) # number of times word w is assigned to topic k -> topic word distribution\n", | |
| "        n_allw_k = np.zeros(self.N_K) # total number of word occurrences assigned to topic k\n", | |
| "\n", | |
| "        # initialize Z: topic assignment\n", | |
| "        Z = np.zeros(len(idx_W)) # the ith element is the topic of the ith word occurrence\n", | |
| "\n", | |
| "        # loop over every word occurrence in all documents to build the three count arrays: n_d_k, n_w_k, n_allw_k\n", | |
| "        for idx_w in range(len(idx_W)):\n", | |
| "            d = idx_D[idx_w] # get the document of this word occurrence\n", | |
| "            w = idx_W[idx_w] # get the word index of this word occurrence\n", | |
| "            z_new = np.random.randint(self.N_K) # randomly generate the initial topic\n", | |
| "            n_d_k[d][z_new] += 1\n", | |
| "            n_w_k[w][z_new] += 1\n", | |
| "            n_allw_k[z_new] += 1\n", | |
| "            Z[idx_w] = z_new\n", | |
| "\n", | |
| "        for it in range(max_iter): # loop for each iteration\n", | |
| "            for idx_w in range(len(idx_W)):\n", | |
| "                d = idx_D[idx_w] # get the document of this word occurrence\n", | |
| "                w = idx_W[idx_w] # get the word index of this word occurrence\n", | |
| "                z = int(Z[idx_w]) # get the current topic assignment of this word occurrence\n", | |
| "\n", | |
| "                # remove the current position from the counts\n", | |
| "                n_d_k[d][z] -= 1\n", | |
| "                n_w_k[w][z] -= 1\n", | |
| "                n_allw_k[z] -= 1\n", | |
| "\n", | |
| "                p = np.zeros(self.N_K)\n", | |
| "                for k in range(self.N_K): # loop for each topic\n", | |
| "                    p[k] = (n_d_k[d][k] + self.alpha) * (n_w_k[w][k] + self.eta) / (n_allw_k[k] + self.eta * N_W) # probability of assigning topic k to the current word (un-normalized)\n", | |
| "                p = p/p.sum()\n", | |
| "\n", | |
| "                z_new = np.random.choice(self.N_K,p=p)\n", | |
| "                Z[idx_w] = z_new\n", | |
| "\n", | |
| "                # add the position back with its new topic\n", | |
| "                n_d_k[d][z_new] += 1\n", | |
| "                n_w_k[w][z_new] += 1\n", | |
| "                n_allw_k[z_new] += 1\n", | |
| "\n", | |
| "        return (n_d_k/np.sum(n_d_k,axis=1)[:,np.newaxis], np.transpose(n_w_k/n_allw_k))\n", | |
| "\n", | |
| "    # convert a sparse matrix of documents (represented as bag of words) to lists of word index and of document index\n", | |
| "    def matrix_to_list(self, X):\n", | |
| "        \"\"\"\n", | |
| "        Convert a matrix of documents to a list of word indices and a list of document indices.\n", | |
| "        The input matrix should be a sparse matrix of integer counts.\n", | |
| "        The ith element of the document index list is the document of the ith word occurrence.\n", | |
| "        Args:\n", | |
| "            X: a sparse matrix of documents.\n", | |
| "        Returns:\n", | |
| "            (idx_W, idx_D): a tuple consisting of a list of word indices and a list of document indices\n", | |
| "        \"\"\"\n", | |
| "        X = X.tocoo()\n", | |
| "        data = X.data\n", | |
| "        row = X.row\n", | |
| "        col = X.col\n", | |
| "        idx_W = np.repeat(col,data) # repeat each word index by its count\n", | |
| "        idx_D = np.repeat(row,data) # repeat each document index by the same count\n", | |
| "\n", | |
| "        return (idx_W, idx_D)\n", | |
| "\n", | |
| "    def visualization(self, Theta, Beta, n='all', m='all'):\n", | |
| "        \"\"\"\n", | |
| "        Visualize the document topic proportion and topic word distribution\n", | |
| "        Args:\n", | |
| "            Theta: document topic proportion.\n", | |
| "            Beta: topic word distribution for Theta.\n", | |
| "            n: number of topics to visualize. Default is all. Ordered by value\n", | |
| "            m: number of words to visualize for Beta. Default is all. Ordered by value\n", | |
| "        \"\"\"\n", | |
| "        if(n=='all' or n > Theta.shape[1]): n = Theta.shape[1]\n", | |
| "        if(m=='all' or m > Beta.shape[1]): m = Beta.shape[1]\n", | |
| "\n", | |
| "        plot_Theta_index = Theta.argsort(axis=1)[:,::-1][:,:n]\n", | |
| "        plot_Beta_index = Beta.argsort(axis=1)[:,::-1][:,:m]\n", | |
| "        # get the row/column index of the items to be plotted\n", | |
| "        col_Theta = list(plot_Theta_index.flat)\n", | |
| "        col_Beta = list(plot_Beta_index.flat)\n", | |
| "        row_Theta = np.repeat(np.arange(Theta.shape[0]),n)\n", | |
| "        row_Beta = np.repeat(np.arange(Beta.shape[0]),m)\n", | |
| "        plot_Theta = Theta[row_Theta,col_Theta].reshape([Theta.shape[0],n])\n", | |
| "        plot_Beta = Beta[row_Beta,col_Beta].reshape([Beta.shape[0],m])\n", | |
| "\n", | |
| "        # one bar chart per document\n", | |
| "        fig, ax= plt.subplots(ncols=Theta.shape[0])\n", | |
| "        i=0\n", | |
| "        for col in ax:\n", | |
| "            col.bar(plot_Theta_index[i], plot_Theta[i])\n", | |
| "            i += 1\n", | |
| "        plt.tight_layout()\n", | |
| "\n", | |
| "        # one bar chart per topic\n", | |
| "        fig, ax= plt.subplots(ncols=Beta.shape[0])\n", | |
| "        i=0\n", | |
| "        for col in ax:\n", | |
| "            col.bar(plot_Beta_index[i], plot_Beta[i])\n", | |
| "            i += 1\n", | |
| "        plt.tight_layout()\n", | |
| "\n", | |
| "        plt.show()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Play with our model\n", | |
| "Here we show two examples. The first is a toy example to check that our model works. In the second, we take some short paragraphs from real news as our documents, feed them to both our LDA model and the LDA model from the Python package sklearn, and compare the results. The results may differ a little since we use Gibbs sampling for inference while sklearn uses a variational method.\n", | |
| "\n", | |
| "__note__: In real applications, LDA does not work well if the documents are too short. It works well with a large number of documents, after some pre-processing such as lemmatization and stop-word removal." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "- Toy example" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 92, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "******Standard Gibbs Sampling******\n", | |
| "Document topic proportion:\n", | |
| "[[ 0.05501766 0.94498234]\n", | |
| " [ 0.15035029 0.84964971]\n", | |
| " [ 0.80038678 0.19961322]\n", | |
| " [ 0.76326565 0.23673435]\n", | |
| " [ 0.04671193 0.95328807]]\n", | |
| "Topic word proportion\n", | |
| "[[ 0.27760591 0.43785051 0.0043018 0.06959751 0.21064427]\n", | |
| " [ 0.00142387 0.04409961 0.54931774 0.34720817 0.05795061]]\n", | |
| "Visualization\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAFT5JREFUeJzt3X2sZVdZx/Hvw2AlqUCJUxPS9jJVy8uAhMqFYkgEpZhpG6cmInYaIjWVGw2lWqqxBAK1/lMhQtCUl6GZFElKKY0vo1RrlBqUtNiLtJW2KY61wgBJBygYQ6AMPP5xzh3PnHtmzp4z6+6z9t7fTzLJPefs2Xd19cz+7b322s+KzESSpNo8adkNkCRpFgNKklQlA0qSVCUDSpJUJQNKklQlA0qSVCUDStImEbErIh6OiAMRcc2Mz1ci4s6I+FxE3B8RFy6jnV0QEfsi4rGI+PwxPo+I+JNxX98fET/ddhtrZUBJOkpEbANuAC4AdgJ7ImLn1GZvA27NzHOBS4D3tdvKTrkJ2HWczy8Azhn/WQPe30KbOsGAkjTtpcCBzHwkM58AbgEuntomgaeNf3468JUW29cpmfkp4BvH2eRi4M9y5G7gtIh4Zjutq9uTl/WLt2/fnjt27FjWrz8pn/3sZ7+Wmacvux2T7M/yBtynZwBfmnh9EDhvaptrgb+PiDcBpwLnz9vpgPtznln9fQbw1ekNI2KN0VUWp5566ouf+9znblGTtlbT/lxaQO3YsYP19fVl/fqTEhH/vew2TLM/yxtwn8aM96Zrou0BbsrMP46InwE+EhEvyMwfTLXjyAF1ZWVlqP05d/cz3ptZgy4z9wJ7AVZXV7Pv/ekQn6RpB4GzJl6fyeYhvMuBWwEy8y7gKcD26R1l5t7MXM3M1dNPr+4iuRZN+nuQDChJ0+4BzomIsyPiFEaTIPZPbfNF4FUAEfE8RgF1qNVW9sd+4NfGs/leBnwrMzcN7w3R0ob4JNUpMw9HxBXAHcA2YF9mPhAR1wHrmbkfuBr4UERcxWg46rJ0aYSZIuKjwCuB7RFxEHgH8EMAmfkB4HbgQuAA8G3g15fT0voYUJI2yczbGR04J997+8TPDwIvb7tdXZSZe+Z8nsAbW2pOpzjEJ0mqkgElSaqSASVJqpIBpc6zbpzUTwaUOs26cVJ/GVDqOuvGST3V+2nmO675RLF9PXr9RcX21VUV9mfRunHTpXmgyv/mTrM/u2WZ/7+8glLXnUjduDMZPRD5kYiY+d23NI9UDwNKXVesbpykuhhQ6jrrxkk9ZUCp0zLzMLBRN+4hRrP1HoiI6yJi93izq4E3RMR9wEexbpzUCb2fJKH+s26c1E9eQUmSqmRASZKqZEBJkqpkQEmSqmRALYHFTSVpPgOqZRY3laRmDKj2WdxUkhrwOaj2FS1uKkl95RVU+4oVN42ItYhYj4j1Q4es3COpXwyo9hUrbmrlbUl9ZkC1z+Kmql6DmabviYh7x3++EBHfXEY71W/eg2pZZh6OiI3iptuAfRvFTYH1zNzPqLjphyLiKkbDfxY3VWsmZpq+mtEV/z0RsX9c0xCAzLxqYvs3Aee23lD1ngG1BBY3VeWOzDQFiIiNmaYPHmP7PcA7WmqbBsQhPknTZs00PWPWhhHxLOBs4JPH+NyJPFqYASVpWpOZphsuAW7LzO/P+tCJPDoZBpSkaU1mmm64hNEikFJxBpSkaU1mmhIRzwGeAdzVcvs0EAaUpKNk5mFgY6bpQ4zqQj4QEddFxO6JTfcAtzjDVFvFWXySNpk303T8+to226Th8QpKklQlA0qdZ9UDqZ8c4lOnWfVA6i+voNR1TdbXmrQHp0VLnWBAqeuKVT0Yb2PlA6kSBpS6rljVA7DygVQTA0pdZ9UDqacMKHWdVQ+knmoUUA2m8a5ExJ0R8bmIuD8iLizfVGkzqx5I/TV3mnmTabzA2xgdGN4fETsZPYG+Ywva
K21i1QPVLiJ2Ae9ltEjpjZl5/dTnK8CHgdPG21wz/l4PWpMrqCbTeBN42vjnp3PsewCSNCgTJ/kXADuBPeMT+UkbJ/nnMhqmfl+7raxTk4BqMo33WuB1EXGQ0Znsm2btyCm8kgbIk/wFNQmoJtN49wA3ZeaZwIXARyJi076dwitpgIqd5A9Nk4BqMo33cuBWgMy8C3gKsL1EA/vI2nHSoBQ7yR/aKFSTgGoyjfeLwKsAIuJ5jAKq/723gCbj0Zl5VWa+KDNfBPwp8Oftt1RSIcVO8oc2CjU3oBpO470aeENE3MfoQcjLnM57TNaOk4bFk/wFNapmPm8a73jK+cvLNq23Zo1Hnzdrw3m14yJiDVgDWFlZKdtKSUVk5uGI2DjJ3wbs2zjJB9Yzcz+jk/wPRcRVjIb/PMnH5TaWoVjtuMzcC+wFWF1dHfyXWaqVJ/mLsdRR+6wdJ0kNGFDts3acJDVgQLXM2nGS1Iz3oJbA2nGSNJ9XUJI2mfcw+Xib10bEgxHxQETc3HYb1X9eQUk6SpMVDCLiHOAtwMsz8/GI+LHltFZ95hWUpGlNHiZ/A3BDZj4OkJmPtdxGDYABJWlak+KmzwaeHRGfjoi7x+sdbTK02nEqy4CSNK3Jw+RPBs4BXsloxumNEXHapr80sNpxKsuAkjStycPkB4G/yszvZeZ/AQ8zCiypGANK0rQmD5P/JfBzABGxndGQ3yOttlK9Z0BJOkrDh8nvAL4eEQ8CdwK/l5lfX06L1VdOM1fnjW/Qv5dRpegbM/P6Gdu8ltGqpQncl5mXttrIjmlQ3DSBN4//SFvCgFKn+cyO1F8O8anrfGZH6ikDSl1X7Jkd8LkdqSYGlLqu2DM74HM7Uk0MKHWdz+xIPWVAqet8ZkfqKQNKneYzO1J/Oc1cneczO1I/eQUlSaqSAbUErlYqSfM5xNcyKx9IUjNeQbXPygeS1IAB1b6ilQ8kqa8c4mvfiVY+OBP454h4QWZ+86gdRawBawArKyvlWypJS+QVVPuKVT6wLI+kPjOg2mflA0lqwIBqmZUPJKkZ70EtgZUPJGk+r6AkSVUyoCRJVTKgJElVMqAkbTKvXmREXBYRhyLi3vGf31hGO9VvTpKQdJQm9SLHPpaZV7TeQA2GV1CSpjWpFyltOQNK0rQm9SIBfjki7o+I2yLirBmfSyfFgJI0rUm9yL8GdmTmC4F/AD48c0cRaxGxHhHrhw4dKtxM9V2jgHKBPWlQ5taLzMyvZ+Z3xy8/BLx41o6sF6mTMTegJm6YXgDsBPZExM6pbSYX2Hs+8Dtb0FZJ7ZhbLzIinjnxcjejsl06Bk/yF9PkCsoF9lQ1p0SX1bBe5JXjA+l9wJXAZctpbf08yV9ck2nms26Ynje1zbMBIuLTwDbg2sz8u+kduX6RSnNK9NZoUC/yLYwOqJrvyEk+QERsnORPfkc9yZ+hyRXUiS6wtwe4MSJO2/SXHI9WeU6JVu1cRXtBTQKq2AJ70hZwSrRqV+wkf2izIpsElAvsqWbFpkTD8A4AaoWraC9obkC5wJ4qV2xK9HjbQR0A1ApP8hfUqBafC+ypYkf+8QNfZvSP/9LJDSLimZn51fFLp0SrVZl5OCI2TvK3Afs2TvKB9czcP/7sF8Yn+d/Hk3zAYrHquIb/+K8cX+0fBr6BU6LVMk/yF2NAqfOcEi31k7X4lsAHSyVpPq+gWuaDpZLUjFdQ7fPBUklqwIBqX7EHS31mR1KfGVDtK/Zgqc/sSOozA6p9RR8slaS+MqDa51o7ktSAs/ha5oOlktSMAbUEPlgqSfM5xCdJqpIBJUmqkgElSaqSASVJqpIBJUmqkgElSaqSASVpk3lLwkxs95qIyIhYbbN9GgYDStJRJpaEuQDYCeyJiJ0ztnsqcCXwmXZbqKEwoCRNa7okzB8C7wS+02bjNBwGlKRpc5eEiYhzgbMy82+OtyOXhNHJMKAkTTvukjAR8STgPcDV83bkkjA6GQaUpGnzloR5KvAC
4J8i4lHgZcB+J0qoNANKneeMs+KOuyRMZn4rM7dn5o7M3AHcDezOzPXlNFd9ZUCp05xxVl5mHgY2loR5CLh1Y0mY8TIwUitcbkNdd2TGGUBEbMw4e3Bqu40ZZ7/bbvO6ad6SMFPvv7KNNml4vIJS1xWbcTbe1llnUiUMKHVdsRln4KwzqSYGlLrOGWdSTxlQS+Css6KccSb1lAHVMmedleWMM6m/nMXXPmedFeaMM6mfvIJqX9FZZ5LUVwZU+4rNOnNKtKQ+M6DaV2zWmVOiJfWZAdU+Z51JUgMGVMucdSZJzTiLbwmcdSZJ83kFJUmqkgElSaqSASVJqlKjgLJ2nCQtzmPoYuYGlLXjJGlxHkMX1+QK6kjtuMx8AtioHTdto3bcdwq2T5K6zmPogpoEVLHacZbmkTRA1t9cUJOAKlY7ztI8kgbI+psLahJQrlgqSYuz/uaCmgSUteMkaXEeQxc0N6CsHScNz7xp0RHxmxHx7xFxb0T8y6xZaRrxGLq4RrX4rB0nDcfEtOhXMxqeuici9mfm5KrPN2fmB8bb7wbeDexqvbEd4TF0MVaSkDRt7rTozPyfiZenMnHTXyrFauaSps2aFn3e9EYR8UbgzcApwM/P2lFErAFrACsrK8Ubqn7zCkqd5/2S4o47LfrIG5k3ZOZPAL8PvG3WjoY260xlGVDqtIZlZG7OzJ/KzBcxelL/3S03s2vmTYuedgvwS1vaIg2SAaWu835JecedFg0QEedMvLwI+I8W26eB8B6Uuq7Y/ZLxdoO/Z5KZhyNiY1r0NmDfxrRoYD0z9wNXRMT5wPeAx4HXL6/F6isDSl3X+H4JcENEXMrofsnMA2pm7gX2Aqyurg72SmvetOjM/O3WG6XBcYhvCbypX5T3S6SeMqBa5k394rxfIvWUQ3ztO3JTHyAiNm7qH3lK35v6zXm/ROovA6p9PgRZmPdLpH5yiK99PgQpSQ0YUO3zpr4kNWBAtc+b+pLUgPegWuZNfUlqxoBaAm/qS9J8DvFJkqpkQEmSqmRASZKqZEBJkqrkJAlJvbLjmk8U29ej119UbF86cV5BSZKqZEBJkqpkQEmSqmRASZKqZEBJkqpkQEmSqmRASZKqZEBJ2iQidkXEwxFxICKumfH5myPiwYi4PyL+MSKetYx2qt8MKElHiYhtwA3ABcBOYE9E7Jza7HPAama+ELgNeGe7rdQQGFCSpr0UOJCZj2TmE4xWdb54coPMvDMzvz1+eTejlaGlogwodZ7DUcWdAXxp4vXB8XvHcjnwt1vaIg2SAaVOczhqS8SM93LmhhGvA1aBdx3j87WIWI+I9UOHDhVsoobAgFLXORxV3kHgrInXZwJfmd4oIs4H3grszszvztpRZu7NzNXMXD399NO3pLHqL6uZq+tmDUedd5ztjzscFRFrwBrAyspKifYdV6WVt+8BzomIs4EvA5cAl05uEBHnAh8EdmXmY6V+sTTJK6gl8J5JUcWGo8AzfoDMPAxcAdwBPATcmpkPRMR1EbF7vNm7gB8BPh4R90bE/iU1Vz3mFVTLJu6ZvJrR2f49EbE/Mx+c2Gzjnsm3I+K3GN0z+dX2W9sJJzoc9YpjDUfp/2Xm7cDtU++9feLn81tvlAbHK6j2ec+krCPDURFxCqPhqKPO5ieGo3Y7HCV1hwHVvmJTeJ0h5XCU1GcO8bVvkXsmr5j1eWbuBfYCrK6uztzHEDgcJfVToysob+oXVWwKr6Ru8Bi6mLkB5YOQxXnPRBoQj6GLa3IF5U39grxnIg2Ox9AFNbkHVexByLYfgqyV90ykQSn6MPmQNAkob+pL0uKKHUOHdpLfZIjPm/qStDhrGy6oSUB5U1+SFucxdEFzA8qb+pK0OI+hi2v0oK439SVpcR5DF2OpI0lSlQwoSVKVrMXXM5UugCdJJ8wrKElSlQwoSVKVDChJUpUMKElSlQwoSVKVDChJUpUMKElSlQwoSZs0WKL8ZyPi3yLicES8ZhltVP8ZUOo8
D6ZlNVyi/IvAZcDN7bZOQ2JAqdM8mG6JJkuUP5qZ9wM/WEYDNQwG1BJ4xl+UB9PyZi1RfsYiO4qItYhYj4j1Q4cOFWmchsOAapln/MUVO5jqiMZLlM8ztBVgVZYB1T7P+MsqdjAFz/jHGi1RLm01A6p9Dp+UVfRg6hk/0GCJcqkNBlT7HD4py4NpYU2WKI+Il0TEQeBXgA9GxAPLa7H6yvWg2ufwSUGZeTgiNg6m24B9GwdTYD0z90fES4C/AJ4B/GJE/EFmPn+Jza5egyXK72H03ZW2jAHVviNn/MCXGZ3xX7rcJnWbB1Opnxzia5nDJ5LUjFdQS+AZvyTN5xWUJKlKBpQkqUoGlCSpSgaUJKlKBpQkqUoGlCSpSgaUJKlKBpQkqUpLf1B3xzWfKLavR6+/qNi+JEnL5RWUJKlKS7+CkiQtrs+jUF5BSZKqZEBJkqpkQEmSqmRASZKqZEBJkqpkQEmSqmRASZKqZEBJkqrUKKAiYldEPBwRByLimhmf/3BEfGz8+WciYkfphvaJ/VmW/VmefVqW/bmYuQEVEduAG4ALgJ3AnojYObXZ5cDjmfmTwHuAPyrd0L6wP8uyP8uzT8uyPxfX5ArqpcCBzHwkM58AbgEuntrmYuDD459vA14VEVGumb1if5Zlf5Znn5Zlfy6oSS2+M4AvTbw+CJx3rG0y83BEfAv4UeBrkxtFxBqwNn75vxHx8Am0dfv0/qbF4uccc/c9tf9nLfybhtGfJ7r/KvoTTqpPT/Q7dKJO9P9XFX3a5f6c2n/X+xM6eAxtElCzUjwX2IbM3AvsbfA7NzciYj0zVxf5u8vc96xfN+O9XvVnG/uf/FUz3luoP2HxPu1Rf0IF31H7c+Y21f6b36p9NxniOwicNfH6TOArx9omIp4MPB34RokG9pD9WZb9WZ59Wpb9uaAmAXUPcE5EnB0RpwCXAPunttkPvH7882uAT2bmzDNU2Z+F2Z/l2adl2Z+Lysy5f4ALgS8A/wm8dfzedcDu8c9PAT4OHAD+FfjxJvs9kT/AWul9trHvIfZn231qf/avT+3Pbn1Ht2rfMd65JElVsZKEJKlKBpQkqUrVB9S8EiEnue99EfFYRHy+5H5rtpX9Od6/fVp23/Zn2X3bn2X3vaX9WXVANSwRcjJuAnYV3F/VWuhPsE/9jp4E+7Osrvdn1QFFsxIhC8vMTzGsZw22tD/BPsXv6MmyP8vqdH/WHlCzSoScsaS29IH9WZ59Wpb9WVan+7P2gGpcokaN2J/l2adl2Z9ldbo/aw+oJiVC1Jz9WZ59Wpb9WVan+7P2gGpSIkTN2Z/l2adl2Z9ldbo/qw6ozDwMXAHcATwE3JqZD5Taf0R8FLgLeE5EHIyIy0vtu0Zb3Z9gn+J39KTYn2V1vT8tdSRJqlLVV1CSpOEyoCRJVTKgJElVMqAkSVUyoCRJVTKgJElVMqAkSVX6P8hlXAIqvYSoAAAAAElFTkSuQmCC\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a170f0d68>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAD+hJREFUeJzt3V+IpXd9x/H3x023Fxp6k7ko+8cJdildVAxOV6FQxaawaWRXaEo3RVFQFsHFSCx1RQllvUkjaAvdC7c2YP/INqa9mJqVQK1eeBHZiQbtZtk6DdvuNAXXP2hBNC5+ezGz6XGc3TmzmXnOd+a8X7Awv+f55Xm+eZgvn/OcOc/vpKqQJKmbl026AEmS1mJASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktXTbpE58xx131Ozs7KROL92Sp59++jtVNTPpOkbZS9puxu2jiQXU7OwsCwsLkzq9dEuS/Oeka1jNXtJ2M24f+RafJKklA0qS1JIBJUlqyYCSJLVkQEmSWjKgJEktGVCSpJYMKElSSwaUJKklA0qS1NLEljraDmZPPjHIeS4/fO8g55Gm1VC9DPbzZvIOSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqaWxAirJ4SSXkiwmOXmTefclqSRzm1eitDOs10dJ3pXkapJnVv69ZxJ1Sl2s+426SXYBp4HfBZaA80nmq+rZVfNuB94PfHUrCpW2s3H7CPiHqjoxeIFSQ+PcQR0CFqvquap6ATgLHF1j3seAR4Afb2J90k4xbh9JWjFOQO0BroyMl1a2vSjJXcC+qvr8zQ6U5HiShSQLV69e3XCx0ja2bh+t+P0k30jyeJJ9w5Qm9TROQGWNbfXizuRlwCeBD653oKo6U1VzVTU3MzMzfpXS9nfTPlrxz8BsVb0W+BfgMzc8mC/2NAXGCaglYPSV3F7g+ZHx7cCrgS8nuQy8EZj3gxLSz1mvj6iq71bVT1aGfwW8/kYH88WepsE4AXUeOJDkziS7gWPA/PWdVfWDqrqjqmarahZ4CjhSVQtbUrG0Pd20jwCS/OrI8AhwccD6pHbW/RRfVV1LcgJ4EtgFPFpVF5KcAhaqav7mR5A0Zh+9P8kR4BrwPeBdEytYamDdgAKoqnPAuVXbHrrB3De/9LKknWe9PqqqDwMfHrouqStXkpAktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqCkASU5nORSksUkJ28y774klWRuyPqkTgwoaSBJdgGngXuAg8D9SQ6uMe924P3AV4etUOrFgJKGcwhYrKrnquoF4CxwdI15HwMeAX48ZHFSNwaUNJw9wJWR8dLKthcluQvYV1Wfv9mBkhxPspBk4erVq5tfqdSAASUNJ2tsqxd3Ji8DPgl8cL0DVdWZqpqrqrmZmZlNLFHqw4CShrME7BsZ7wWeHxnfDrwa+HKSy8AbgXk/KKFpZUBJwzkPHEhyZ5LdwDFg/vrOqvpBVd1RVbNVNQs8BRypqoXJlCtNlgElDaSqrgEngCeBi8BjVXUhyakkRyZbndTPbZMuQJomVXUOOLdq20M3mPvmIWqSuvIOSpLU0lgBtd7T70nem+SbSZ5J8pW1Hj6UJGkj1g2oMZ9+/2xVvaaqXsfyA4af2PRKJUlTZZw7qHWffq+qH44MX87Isx2SJN2KcT4ksdbT729YPSnJ+4AHgd3AW9Y6
UJLjwHGA/fv3b7RWSdIUGecO6qZPv7+4oep0Vb0K+BDw0bUO5NPvkqRxjRNQ6z39vtpZ4G0vpShJksYJqJs+/Q6Q5MDI8F7gW5tXoiRpGq37N6iqupbk+tPvu4BHrz/9DixU1TxwIsndwE+B7wPv3MqiJUk731grSaz39HtVPbDJdUmSppwrSUiSWjKgJEktGVCSpJYMKElSSwaUJKklA0qS1FLbLyycPfnEIOe5/PC9g5xHkrQxbQNKknYSX3RvnG/xSZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASQNKcjjJpSSLSU6usf+9Sb6Z5JkkX0lycBJ1Sh0YUNJAkuwCTgP3AAeB+9cIoM9W1Wuq6nXAI8AnBi5TasOAkoZzCFisqueq6gXgLHB0dEJV/XBk+HKgBqxPauW2SRcgTZE9wJWR8RLwhtWTkrwPeBDYDbxlrQMlOQ4cB9i/f/+mFyp14B2UNJysse0X7pCq6nRVvQr4EPDRtQ5UVWeqaq6q5mZmZja5TKkHA0oazhKwb2S8F3j+JvPPAm/b0oqkxgwoaTjngQNJ7kyyGzgGzI9OSHJgZHgv8K0B65Na8W9Q0kCq6lqSE8CTwC7g0aq6kOQUsFBV88CJJHcDPwW+D7xzchVLk2VASQOqqnPAuVXbHhr5+YHBi5Ka8i0+SVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLriSh9mZPPjHYuS4/fO9g55J0c95BSZJaMqAkSS2NFVBJDie5lGQxyck19j+Y5Nkk30jyxSSv3PxSJUnTZN2ASrILOA3cAxwE7k9ycNW0rwNzVfVa4HHgkc0uVJI0Xca5gzoELFbVc1X1Asvf8nl0dEJVfamqfrQyfIrlbwqVJOmWjRNQe4ArI+OllW038m7gC2vtSHI8yUKShatXr45fpSRp6owTUFljW605MXk7MAd8fK39VXWmquaqam5mZmb8KiVJU2ec56CWgH0j473A86snrXxN9UeAN1XVTzanPEnStBrnDuo8cCDJnUl2A8eA+dEJSe4CPgUcqapvb36ZkqRps25AVdU14ATwJHAReKyqLiQ5leTIyrSPA68APpfkmSTzNzicJEljGWupo6o6B5xbte2hkZ/v3uS6JElTzpUkJEktGVCSpJYMKElSSwaUJKklA0qS1JIBJUlqyYCSJLVkQEmSWjKgJEktGVCSpJYMKElSSwaUJKklA0oaSJLDSS4lWUxyco39DyZ5Nsk3knwxySsnUafUhQElDSDJLuA0cA9wELg/ycFV074OzFXVa4HHgUeGrVLqxYCShnEIWKyq56rqBeAscHR0QlV9qap+tDJ8iuVvr5amlgElDWMPcGVkvLSy7UbeDXxhSyuSmhvrCwslvWRZY1utOTF5OzAHvOmGB0uOA8cB9u/fvxn1Se14ByUNYwnYNzLeCzy/elKSu4GPAEeq6ic3OlhVnamquaqam5mZ2fRipQ4MKGkY54EDSe5Mshs4BsyPTkhyF/AplsPp2xOoUWrFgJIGUFXXgBPAk8BF4LGqupDkVJIjK9M+DrwC+FySZ5LM3+Bw0lTwb1DSQKrqHHBu1baHRn6+e/CipMa8g5IktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLUkgElSWrJgJIktWRASZJaMqAkSS0ZUJKklgwoSVJLBpQkqSUDSpLU0lgBleRwkktJFpOcXGP/byf5WpJrSe7b/DIlSdNm3YBKsgs4DdwDHATuT3Jw1bT/At4FfHazC5QkTafbxphzCFisqucAkpwFjgLPXp9QVZdX9v1sC2qUJE2hcd7i2wNcGRkvrWzbsCTHkywkWbh69eqtHEKSNCXGCaissa1u
5WRVdaaq5qpqbmZm5lYOIUmaEuME1BKwb2S8F3h+a8qRJGnZOAF1HjiQ5M4ku4FjwPzWliVJmnbrBlRVXQNOAE8CF4HHqupCklNJjgAk+c0kS8AfAJ9KcmEri5Yk7XzjfIqPqjoHnFu17aGRn8+z/NafJEmbwpUkJEktGVCSpJYMKGkgLhkmbYwBJQ3AJcOkjRvrQxKSXjKXDJM2yDsoaRibtmQYuGyYpoMBJQ1j05YMA5cN03QwoKRhuGSYtEEGlDQMlwyTNsiAkgbgkmHSxvkpPmkgLhkmbYwBpZuaPfnEYOe6/PC9g51LUn++xSdJasmAkiS1ZEBJkloyoCRJLRlQkqSWDChJUksGlCSpJQNKktSSASVJasmAkiS1ZEBJklpyLT5JmhJDra25WetqegclSWrJgJIktWRASZJaMqAkSS0ZUJKklvwUn6Qt5bcy61Z5ByVJasmAkiS1ZEBJkloyoCRJLRlQkqSWDChJUksGlCSpJQNKktSSASVJasmAkiS1ZEBJkloyoCRJLRlQkqSWDChJUksGlCSppbG+DyrJYeAvgF3Ap6vq4VX7fxn4G+D1wHeBP6yqy5tb6nQa6rt0/B6dYdhL0vjWvYNKsgs4DdwDHATuT3Jw1bR3A9+vql8DPgn82WYXKm139pK0MePcQR0CFqvqOYAkZ4GjwLMjc44Cf7ry8+PAXyZJVdUm1iptdxPpJe/CtV1lvd/7JPcBh6vqPSvjdwBvqKoTI3P+bWXO0sr4P1bmfGfVsY4Dx1eGvw5c2qz/kRV3AN9Zd9bO53XYumvwyqqauZX/0F7adrwGy7biOozVR+PcQWWNbatTbZw5VNUZ4MwY57wlSRaqam6rjr9deB3aXgN7aRvxGiyb5HUY51N8S8C+kfFe4PkbzUlyG/ArwPc2o0BpB7GXpA0YJ6DOAweS3JlkN3AMmF81Zx5458rP9wH/6t+fpF9gL0kbsO5bfFV1LckJ4EmWPxr7aFVdSHIKWKiqeeCvgb9Nssjyq71jW1n0TWzZWx7bjNeh4TWwl7Ydr8GyiV2HdT8kIUnSJLiShCSpJQNKktTSjgmoJIeTXEqymOTkpOsZWpJ9Sb6U5GKSC0kemHRNk5RkV5KvJ/n8pGvZTqa9j8BeGjXpPtoRATXmEjI73TXgg1X1G8AbgfdN4TUY9QBwcdJFbCf20Yvspf830T7aEQHFyBIyVfUCcH0JmalRVf9TVV9b+fl/Wf6l2jPZqiYjyV7gXuDTk65lm5n6PgJ76boOfbRTAmoPcGVkvMQU/kJdl2QWuAv46mQrmZg/B/4E+NmkC9lm7KNVpryXJt5HOyWgxloeZhokeQXwj8AHquqHk65naEneCny7qp6edC3bkH00Ypp7qUsf7ZSAGmcJmR0vyS+x3FB/X1X/NOl6JuS3gCNJLrP8FtVbkvzdZEvaNuyjFfZSjz7aEQ/qrqxZ9u/A7wD/zfKSMn9UVRcmWtiAkgT4DPC9qvrApOvpIMmbgT+uqrdOupbtwD5aZi/9vEn20Y64g6qqa8D1JWQuAo9NW1Ox/IrnHSy/0nlm5d/vTboobR/20YvspSZ2xB2UJGnn2RF3UJKknceAkiS1ZEBJkloyoCRJLRlQkqSWDChJUksGlCSppf8DQLbf2t5Ife4AAAAASUVORK5CYII=\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a0e2e83c8>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Bag of words mapping function:\n", | |
| "{'like': 3, 'dog': 2, 'cat': 1, 'amy': 0, 'love': 4}\n", | |
| "******Collapsed Gibbs Sampling******\n", | |
| "Document topic proportion:\n", | |
| "[[ 1. 0. ]\n", | |
| " [ 0.8 0.2]\n", | |
| " [ 0. 1. ]\n", | |
| " [ 0. 1. ]\n", | |
| " [ 1. 0. ]]\n", | |
| "Topic word proportion\n", | |
| "[[ 0. 0. 0.66666667 0.33333333 0. ]\n", | |
| " [ 0.25 0.41666667 0.08333333 0. 0.25 ]]\n", | |
| "Visualization\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAE7hJREFUeJzt3X+IXel93/H3x1K2htaxQzWBoB/WtpVDhBuyqbp2MbSb2AbtGrSFOGEFJtmytQhEdohNqEzM1mz+SR1aQ0H5sQnLpgFbUfxHMk0UFpJsCAleV+PadSItSlRl6x22sJP11qWEZK3m2z/mStzO3tGcvXrm3Gdm3i8YmHPuo2e+fPe5+uy5c/ScVBWSJPXmTYsuQJKkWQwoSVKXDChJUpcMKElSlwwoSVKXDChJUpcMKElSlwwoSVKXDChJUpf2L+oHHzhwoI4ePbqoH39XvvSlL/1lVS0tuo5p9rM9e9qW/WxrL/RzYQF19OhRVlZWFvXj70qS/7HoGjayn+3Z07bsZ1t7oZ9+xCdJ6pIBJUnqkgElSeqSASVJ6pIBJUnqkgElSerSlgGV5KkkLyf5001eT5L/mOR6kq8m+d72Ze4uSU4muTbp2bkZrx9J8mySL096+tAi6twpXKNt2c+27Of8hlxBPQ2cvMPrDwLHJl9ngJ+/+7J2ryT7gPOs9+04cDrJ8Q3DPglcrKr7gEeAnxu3yh3naVyjLT2N/WzpaeznXLYMqKr6Q+DrdxjyMPCfat1zwNuSfEerAneh+4HrVXWjql4DLrDew2kFfOvk+7cCL41Y347jGm3LfrZlP+fX4ndQB4EXp45XJ+c025B+fQr4UJJV4BLwkXFK27Vco23Zz7bs5yZabHWUGedq5sDkDOuXsBw5cgSAo+d+u0EJ6174mQ80m2sbDenXaeDpqvr3Sf4Z8KtJ3llVf/v/TWQ/h7qrNbqdduh/r277CTuyp/4duokWV1CrwOGp40Ns8pFUVT1ZVSeq6sTSUlf7Lo5pSL8eAy4CVNUXgDcDBzZOZD8Hc422ZT/bsp+baBFQy8APT+5EeTfwjar6nw3m3a0uA8eS3JvkHtZvgljeMOZrwHsBknwX6wG1NmqVu4trtC372Zb93MSWH/El+RzwAHBg8juRfwt8C0BV/QLrvyN5CLgO/BXwr7ar2N2gqm4mOQs8A+wDnqqqK0meAFaqahn4OPBLSX6C9Uv9R6tq5iW/XKOt2c+27Of8tgyoqjq9xesF/FizivaAqrrE+qKcPvf41PdXgfeMXddO5Rpty362ZT/n504SkqQuGVCSpC4ZUJKkLhlQkqQuGVCSpC4ZUJKkLhlQkqQuGVCSpC4ZUJKkLhlQkqQuGVCSpC4ZUJKkLhlQkqQuGVCSpC4ZUJKkLhlQkqQuGVCSpC4ZUJKkLhlQkqQuGVCSpC4ZUJKkLhlQC5DkZJJrSa4nOTfj9c8k+crk68+S/K9F1ClJi7R/0QXsNUn2AeeB9wOrwOUky1V19daYqvqJqfEfAe4bvVBJWjCvoMZ3P3C9qm5U1WvABeDhO4w/DXxulMokqSMG1PgOAi9OHa9Ozr1OkrcD9wK/v8nrZ5KsJFlZW1trXqgkLZIBNb7MOFebjH0E+HxV/d9ZL1bVk1V1oqpOLC0tNStQknpgQI1vFTg8dXwIeGmTsY/gx3uS9igDanyXgWNJ7k1yD+shtLxxUJLvBL4N+MLI9UlSFwyokVXVTeAs8AzwPHCxqq4keSLJqamhp4ELVbXZx3+StKt5m/kCVNUl4NKGc49vOP7UmDVJUm+8gpIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1aVBADXjA3pEkzyb5cpKvJnmofanSbK7P9uxpW/ZzPlsG1NQD9h4EjgOnkxzfMOyTrG/Zcx/re8v9XOtCpVlcn+3Z07bs5/yGXEENecBeAd86+f6tbL47t9Sa67M9
e9qW/ZzTkIAa8oC9TwEfSrLK+h5zH5k1kQ/Y0zZotj7BNTrhe74t+zmnIQE15AF7p4Gnq+oQ8BDwq0leN7cP2NM2aLY+wTU64Xu+Lfs5pyEBNeQBe48BFwGq6gvAm4EDLQqUtuD6bM+etmU/5zQkoIY8YO9rwHsBknwX683d/def6oHrsz172pb9nNOWATXwAXsfBz6c5L+x/ojyR33Qnsbg+mzPnrZlP+c36IGFWz1gr6quAu9pW5o0jOuzPXvalv2cjztJSJK6ZEBJkrpkQEmSumRALcBW+3JNxvxQkqtJriT57Ng1StKiDbpJQu1M7cv1ftb/fcTlJMuTX5LeGnMM+ATwnqp6Ncm3L6ZaSVocr6DGN2Rfrg8D56vqVYCqennkGiVp4Qyo8Q3Zl+sdwDuS/HGS55KcnDXRXtuXS9LeYkCNb8i+XPuBY8ADrO/R9ctJ3va6P7TH9uWStLcYUOMbsi/XKvCbVfXNqvoL4BrrgSVJe4YBNb4h+3L9BvB9AEkOsP6R341Rq5SkBTOgRjZwX65ngFeSXAWeBX6yql5ZTMWStBjeZr4AA/blKuBjky9J2pO8gpIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYBagCQnk1xLcj3JuRmvP5pkLclXJl//ehF1StIi7V90AXtNkn3AeeD9wCpwOclyVV3dMPTXqurs6AVKUie8ghrf/cD1qrpRVa8BF4CHF1yTJHXHgBrfQeDFqePVybmNfiDJV5N8PsnhWRMlOZNkJcnK2tradtQqSQtjQI0vM87VhuP/DBytqu8Gfhf4lVkTVdWTVXWiqk4sLS01LlOSFsuAGt8qMH1FdAh4aXpAVb1SVX8zOfwl4J+MVJskdWNQQG1119lkzA8luZrkSpLPti1zV7kMHEtyb5J7gEeA5ekBSb5j6vAU8PyI9e04rs/27Glb9nM+W97FN+SusyTHgE8A76mqV5N8+3YVvNNV1c0kZ4FngH3AU1V1JckTwEpVLQMfTXIKuAl8HXh0YQV3zvXZnj1ty37Ob8ht5rfvOgNIcuuus+nboj8MnK+qVwGq6uXWhe4mVXUJuLTh3ONT33+C9cWqrbk+27OnbdnPOQ35iG/IXWfvAN6R5I+TPJfk5KyJvOtM26DZ+gTX6ITv+bbs55yGBNSQu872A8eAB4DTwC8nedvr/pB3nam9ZusTXKMTvufbsp9zGhJQW951Nhnzm1X1zar6C+Aa682Wtpvrsz172pb9nNOQgNryrjPgN4DvA0hygPXL1RstC5U24fpsz562ZT/ntGVAVdVN4NZdZ88DF2/ddTa504zJa68kuQo8C/xkVb2yXUVLt7g+27OnbdnP+Q3aLHbAXWcFfGzyJY3K9dmePW3Lfs7HnSQkSV0yoCRJXTKgJEldMqAkSV0yoCRJXTKgJEldMqAkSV0yoCRJXTKgJEldMqAkSV0yoCRJXTKgJEldMqAkSV0yoCRJXTKgJEldMqAWIMnJJNeSXE9y7g7jPpikkpwYsz5J6oEBNbIk+4DzwIPAceB0kuMzxr0F+CjwxXErlKQ+GFDjux+4XlU3quo14ALw8IxxPw18GvjrMYuTpF4YUOM7CLw4dbw6OXdbkvuAw1X1W3eaKMmZJCtJVtbW1tpXKkkLZECNLzPO1e0XkzcBnwE+vtVEVfVkVZ2oqhNLS0sNS5SkxTOgxrcKHJ46PgS8NHX8FuCdwB8keQF4N7DsjRKS9hoDanyXgWNJ7k1yD/AIsHzrxar6RlUdqKqjVXUUeA44VVUriylXkhbDgBpZVd0EzgLPAM8DF6vqSpInkpxabHWS1I/9iy5gL6qqS8ClDece32TsA2PUJEm98QpKktQlA0qS1CUDSpLUJQNKktQlA0qS1CUDSpLUJQNKktQlA0qS1CUD
SpLUJQNKktQlA0qS1CUDSpLUJQNKktQlA0qS1KVBAZXkZJJrSa4nOXeHcR9MUj79VWNyfbZnT9uyn/PZMqCS7APOAw8Cx4HTSY7PGPcW4KPAF1sXKW3G9dmePW3Lfs5vyBXU/cD1qrpRVa8BF4CHZ4z7aeDTwF83rE/aiuuzPXvalv2c05CAOgi8OHW8Ojl3W5L7gMNV9Vt3mijJmSQrSVbW1tbecLHSDM3W52Ssa9T3fGv2c05DAiozztXtF5M3AZ8BPr7VRFX1ZFWdqKoTS0tLw6uUNtdsfYJrdML3fFv2c05DAmoVODx1fAh4aer4LcA7gT9I8gLwbmDZX/JpJK7P9uxpW/ZzTkMC6jJwLMm9Se4BHgGWb71YVd+oqgNVdbSqjgLPAaeqamVbKt4FtrqjJ8mPJvmTJF9J8kezfqGq21yf7dnTtuznnLYMqKq6CZwFngGeBy5W1ZUkTyQ5td0F7jYD7+j5bFX946r6HtZ/afofRi5zx3B9tmdP27Kf89s/ZFBVXQIubTj3+CZjH7j7sna123f0ACS5dUfP1VsDqup/T43/u0x9Xq3Xc322Z0/bsp/zGRRQamrWHT3v2jgoyY8BHwPuAb5/1kRJzgBnAI4cOdK8UElaJLc6Gt8d7+i5faLqfFX9Q+DfAJ+cNdFeu6NH0t5iQI1vqzt6NroA/MttrUiSOmRAje+Od/QAJDk2dfgB4M9HrE+SuuDvoEZWVTeT3LqjZx/w1K07eoCVqloGziZ5H/BN4FXgRxZXsSQthgG1AFvd0VNVPz56UZLUGT/ikyR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgFqAJCeTXEtyPcm5Ga9/LMnVJF9N8ntJ3r6IOiVpkQyokSXZB5wHHgSOA6eTHN8w7MvAiar6buDzwKfHrVKSFs+AGt/9wPWqulFVrwEXgIenB1TVs1X1V5PD54BDI9coSQu3f9EF7EEHgRenjleBd91h/GPA78x6IckZ4AzAkSNHWtV3R0fP/XazuV74mQ80m0vS7uMV1Pgy41zNHJh8CDgB/Oys16vqyao6UVUnlpaWGpYoSYvnFdT4VoHDU8eHgJc2DkryPuCngH9RVX8zUm2S1A2voMZ3GTiW5N4k9wCPAMvTA5LcB/wicKqqXl5AjZK0cAbUyKrqJnAWeAZ4HrhYVVeSPJHk1GTYzwJ/D/j1JF9JsrzJdJK0a/kR3wJU1SXg0oZzj099/77Ri5KkzngFJUnqkgElSeqSASVJ6tKggHLvOPXM9dmePW3Lfs5ny4By7zj1zPXZnj1ty37Ob8gVlHvHqWeuz/bsaVv2c05DAmrW3nEH7zD+jnvHJVlJsrK2tja8SmlzzdYnuEYnfM+3ZT/nNCSg3DtOPWu2PsE1OuF7vi37Oach/1DXvePUM9dne/a0Lfs5pyFXUO4dp565Ptuzp23ZzzltGVDuHaeeuT7bs6dt2c/5DdqLz73j1DPXZ3v2tC37OR93kpAkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYCSJHXJgJIkdcmAkiR1yYBagAEPL/vnSf5rkptJPriIGiVp0QyokQ18eNnXgEeBz45bnST1Y9BWR2rq9sPLAJLcenjZ1VsDquqFyWt/u4gCJakHXkGN740+vEyS9iQDanyDH1625UR77OmakvYWA2p8gx5eNsRee7qmpL3FgBrflg8vkyQZUKMb8vCyJP80ySrwg8AvJrmyuIolaTG8i28BBjy87DLrH/1J0p7lFZQkqUsGlCSpSwaUJKlLBpQkqUsGlCSpSwaUJKlLBpQkqUsGlCSpSwaUJKlLBpQkqUsGlCSpSwaUJKlLBpQkqUsGlCSpSwaUJKlLBpQkqUsGlCSpSwaUJKlLBpQkqUsGlCSpSwaUJKlLgwIqyckk15Jc
T3Juxut/J8mvTV7/YpKjrQvdTexnW/azPXvalv2cz5YBlWQfcB54EDgOnE5yfMOwx4BXq+ofAZ8B/l3rQncL+9mW/WzPnrZlP+c35ArqfuB6Vd2oqteAC8DDG8Y8DPzK5PvPA+9NknZl7ir2sy372Z49bct+zmn/gDEHgRenjleBd202pqpuJvkG8PeBv5welOQMcGZy+H+SXHsDtR7YON9Gmf//Obace4O3z/2T9kY/3+j8XfQT7qqnb3QNvVFv9L9XFz3dyf2E/tZox+/5bfk7dEhAzUrxmmMMVfUk8OSAn/n6IpKVqjoxz59d5NyzftyMc7uqn2PMP/2jZpybq58wf093UT+hgzVqP2eO6fY9v11zD/mIbxU4PHV8CHhpszFJ9gNvBb7eosBdyH62ZT/bs6dt2c85DQmoy8CxJPcmuQd4BFjeMGYZ+JHJ9x8Efr+qZv4fquxnY/azPXvalv2cV1Vt+QU8BPwZ8N+Bn5qcewI4Nfn+zcCvA9eB/wL8gyHzvpEv4EzrOceYey/2c+ye2s/d11P7ubPW6HbNncnkkiR1xZ0kJEldMqAkSV3qPqC22iLkLud+KsnLSf605bw9285+Tua3p23ntp9t57afbefe1n52HVADtwi5G08DJxvO17UR+gn21DV6F+xnWzu9n10HFMO2CJlbVf0he+vfGmxrP8Ge4hq9W/azrR3dz94DatYWIQcXVMtuYD/bs6dt2c+2dnQ/ew+owVvUaBD72Z49bct+trWj+9l7QA3ZIkTD2c/27Glb9rOtHd3P3gNqyBYhGs5+tmdP27Kfbe3ofnYdUFV1EzgLPAM8D1ysqiut5k/yOeALwHcmWU3yWKu5e7Td/QR7imv0rtjPtnZ6P93qSJLUpa6voCRJe5cBJUnqkgElSeqSASVJ6pIBJUnqkgElSeqSASVJ6tL/A8RhU+u/7/cWAAAAAElFTkSuQmCC\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a170b7438>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAFvxJREFUeJzt3X+MXWed3/H3Z50mKwV2m924aje2sWFNwewioh2SldAChQRMs7KRGrpORZWotBZVXFhlt8URKFRGSFmQoP+4IumuJbTb1IRQtVMwivgRVqLbgCckC7Ijg2NS4hoJL0mhFZAw4ds/5ti9DNeeY3vO3Gfmvl/SyPec85x7vnc0jz4+557zPKkqJElqzS9NugBJksYxoCRJTTKgJElNMqAkSU0yoCRJTTKgJElN6hVQSbYnOZbkeJK9Y7Z/NMlj3c83k/zv5S9VkjRNstRzUEnWAd8EbgROAoeBW6rq6Dna/yvg2qr6Z8tcqyRpivQ5g7oOOF5VJ6rqOeAgsPM87W8B/tNyFCdJml6X9WhzDfDUyPJJ4PpxDZO8CNgCfPEc23cDuwGuvPLK33nZy152QcVKK+2RRx75m6paP+k6+rr66qtr8+bNky5DOq++/apPQGXMunNdF9wFPFBVz4/bWFX3AvcCzMzM1NzcXI/DS5OT5H9OuoYLsXnzZuxXal3fftXnEt9JYOPI8gbg1Dna7sLLe5KkZdAnoA4DW5NsSXI5CyE0u7hRkr8PXAX8j+UtUZI0jZYMqKqaB/YADwKPA/dX1ZEk+5LsGGl6C3CwHB5dkrQM+nwHRVUdAg4tWnfXouV/u3xlSZKmnSNJSJKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmtTrNnOtvM17P7Mix3ny7ptW5DgS+HetC+MZlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQ0oCTbkxxLcjzJ3vO0uzlJJZkZWXdnt9+xJG9emYqldnibuTSQJOuA/cCNLEz8eTjJbFUdXdTuhcC7gK+MrNvGwtxrrwB+A/h8kpeea7ZqaS3yDEoaznXA8ao6UVXPAQeBnWPafQD4EPCTkXU7WZhf7dmq+jZwvHs/aWoYUNJwrgGeGlk+2a07K8m1wMaq+vSF7jvyHruTzCWZO3369KVXLTXCgJKGkzHrzs44neSXgI8Cf3Sh+/7cyqp7q2qmqmbWr19/UYVKLfI7KGk4J4GNI8sbgFMjyy8Efgv4UhKAvwvMJtnRY19pzfMMShrOYWBrki1JLmfhpofZMxur6gdVdXVVba6qzcDDwI6qmuva7UpyRZItwFbgqyv/EaTJ8QxKGkhVzSfZAzwIrAMOVNWRJPuAuaqaPc++R5LcDxwF5oHbvYNP08aAkgZUVYeAQ4vW3XWOtq9ftPxB4IODFSc1zkt8kqQmGVCSpCYZUJKkJhlQkqQmGVCSpCb1Cqg+IzIn+cdJjiY5kuS+5S1TkjRtlrzNvM+IzEm2AncCr6mqZ5L8naEKliRNhz5nUH1GZP4XwP6qegagqr63vGVKkqZNn4DqM6ryS4GXJvnvSR5Osn3cGznqsiSprz4B1WdU5ctYGCvs9cAtwJ8m+du/sJOjLkuSeuoTUH1GVT4J/Neq+mk3udoxFgJLkqSL0iegzjsic+e/AP8AIMnVLFzyO7GchUqSpsuSAVVV88CZEZkfB+4/MyJzN28N3bbvJzkKPAT866r6/lBFS5LWvl6jmS81InNVFXBH9yNJ0iVzJAlJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoaUBLzQSQ5J1JvpHksSRfTrKtW785yY+79Y8l+djKVy9NVq/bzCVduD4zAQD3VdXHuvY7gI8AZ8ayfKKqXrWSNUst8QxKGs6SMwFU1Q9HFq/kF8e5lKaWASUNp89MACS5PckTwIeAd41s2pLk0SR/meT3znUQZwnQWmVAScPpMxMAVbW/ql4CvAd4
X7f6u8CmqrqWhRFa7kvyK+MO4iwBWqsMKGk4fWYCGHUQeCtAVT17ZjzLqnoEeIKFQZilqWFAScNZciaAJKPT0twEfKtbv767yYIkL2Zh+hpnCNBU8S4+aSBVNZ/kzEwA64ADZ2YCAOaqahbYk+QG4KfAM8Ct3e6vBfYlmQeeB95ZVU+v/KeQJseAkgbUYyaAd59jv08Bnxq2OqltXuKTJDXJgJIkNcmAkiQ1yYCSJDXJgJIkNcmAkiQ1yYCSJDXJgJIkNcmAkiQ1yYCSJDXJgJIkNcmAkiQ1qVdAJdme5FiS40n2jtl+W5LTSR7rfv758pcqSZomS45m3s1Jsx+4kYUJ2A4nma2qo4uafqKq9gxQoyRpCvU5g7oOOF5VJ6rqORZm/dw5bFmSpGnXJ6CuAZ4aWT7ZrVvsHyX5epIHkmwcs12SpN76BFTGrKtFy/8N2FxVrwQ+D3x87Bslu5PMJZk7ffr0hVUqSZoqfQLqJDB6RrQBODXaoKq+X1XPdov/AfidcW9UVfdW1UxVzaxfv/5i6pUkTYk+AXUY2JpkS5LLgV3A7GiDJH9vZHEH8PjylShJmkZLBlRVzQN7gAdZCJ77q+pIkn1JdnTN3pXkSJK/Bt4F3DZUwdJq0uMRjXcm+Ub3eMaXk2wb2XZnt9+xJG9e2cqlyVvyNnOAqjoEHFq07q6R13cCdy5vadLq1vMRjfuq6mNd+x3AR4DtXVDtAl4B/Abw+SQvrarnV/RDSBPkSBLScJZ8RKOqfjiyeCX//wakncDBqnq2qr4NHO/eT5oavc6gJF2UcY9oXL+4UZLbgTuAy4E3jOz78KJ9xz3eIa1ZnkFJw+nziAZVtb+qXgK8B3jfhewLPr6htcuAkoaz5CMaixwE3nqh+/r4htYqA0oaTp9HNLaOLN4EfKt7PQvsSnJFki3AVuCrK1Cz1Ay/g5IGUlXzSc48orEOOHDmEQ1grqpmgT1JbgB+CjwD3NrteyTJ/cBRYB643Tv4NG0MKGlAPR7RePd59v0g8MHhqpPa5iU+SVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTegVUku1JjiU5nmTvedrdnKSSzCxfidLqtVTfSXJHkqNJvp7kC0leNLLt+SSPdT+zK1u5NHmXLdUgyTpgP3AjcBI4nGS2qo4uavdC4F3AV4YoVFptevadR4GZqvpRkn8JfAj4g27bj6vqVStatNSQPmdQ1wHHq+pEVT0HHAR2jmn3ARY610+WsT5pNVuy71TVQ1X1o27xYWDDCtcoNatPQF0DPDWyfLJbd1aSa4GNVfXpZaxNWu2W7DuLvAP47MjyLyeZS/JwkrcOUaDUsiUv8QEZs67Obkx+CfgocNuSb5TsBnYDbNq0qV+F0up13r7zcw2TtwMzwOtGVm+qqlNJXgx8Mck3quqJMfv27leb936mZ+kX78m7bxr8GLp4q+lvoM8Z1Elg48jyBuDUyPILgd8CvpTkSeB3gdlxN0pU1b1VNVNVM+vXr7/4qqXVYam+A0CSG4D3Ajuq6tkz66vqVPfvCeBLwLXjDmK/0lrVJ6AOA1uTbElyObALOHtHUVX9oKqurqrNVbWZhevoO6pqbpCKpdXjvH0Hzl4ev4eFPvO9kfVXJbmie3018Brg525Mkta6JS/xVdV8kj3Ag8A64EBVHUmyD5irKm9/lcbo2Xc+DLwA+GQSgO9U1Q7g5cA9SX7Gwn8k715856y01vX5DoqqOgQcWrTurnO0ff2llyWtDUv1naq64Rz7/RXw28NWJ7XNkSQkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElN6jXlu6bT5r2fGfwY
T9590+DHkLQ6eQYlDSjJ9iTHkhxPsnfM9juSHE3y9SRfSPKikW23JvlW93PrylYuTZ4BJQ0kyTpgP/AWYBtwS5Jti5o9CsxU1SuBB4APdfv+GvB+4HrgOuD9Sa5aqdqlFhhQ0nCuA45X1Ymqeg44COwcbVBVD1XVj7rFh4EN3es3A5+rqqer6hngc8D2FapbaoIBJQ3nGuCpkeWT3bpzeQfw2YvcV1pzvElCGk7GrKuxDZO3AzPA6y5i393AboBNmzZdeJVSozyDkoZzEtg4srwBOLW4UZIbgPcCO6rq2QvZF6Cq7q2qmaqaWb9+/bIULrXAgJKGcxjYmmRLksuBXcDsaIMk1wL3sBBO3xvZ9CDwpiRXdTdHvKlbJ02NXgHV41bZdyb5RpLHknx5zJ1K0tSpqnlgDwvB8jhwf1UdSbIvyY6u2YeBFwCf7PrPbLfv08AHWAi5w8C+bp00NZb8DmrkVtkbWbjscDjJbFUdHWl2X1V9rGu/A/gI3nEkUVWHgEOL1t018vqG8+x7ADgwXHVS2/qcQfW5VfaHI4tXco4vcyVJ6qvPXXzjbne9fnGjJLcDdwCXA29YluokSVOrzxlUr9tdq2p/Vb0EeA/wvrFvlOxOMpdk7vTp0xdWqSRpqvQJqN63u3YOAm8dt8HbYSVJffUJqD63ym4dWbwJ+NbylShJmkZLfgdVVfNJztwquw44cOZWWWCuqmaBPd3Dhj8FngEceVmSdEl6DXXU41bZdy9zXZKkKedIEpKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUNKAkmxPcizJ8SR7x2x/bZKvJZlPcvOibc8neaz7mV28r7TW9RrNXNKFS7IO2A/cyMLEn4eTzFbV0ZFm3wFuA/54zFv8uKpeNXihUqMMKGk41wHHq+oEQJKDwE7gbEBV1ZPdtp9NokCpZV7ik4ZzDfDUyPLJbl1fv5xkLsnDSd66vKVJ7fMMShpOxqyrC9h/U1WdSvJi4ItJvlFVT/zCQZLdwG6ATZs2XVylUoM8g5KGcxLYOLK8ATjVd+eqOtX9ewL4EnDtOdrdW1UzVTWzfv36i69WaowBJQ3nMLA1yZYklwO7gF534yW5KskV3eurgdcw8t2VNA0MKGkgVTUP7AEeBB4H7q+qI0n2JdkBkOTVSU4CbwPuSXKk2/3lwFySvwYeAu5edPeftOb5HZQ0oKo6BBxatO6ukdeHWbj0t3i/vwJ+e/ACpYZ5BiVJapIBJUlqkgElSWqSASVJapIBJUlqUq+A6jEi8x1Jjib5epIvJHnR8pcqSZomSwbUyIjMbwG2Abck2bao2aPATFW9EngA+NByFypJmi59zqDOjshcVc8BZ0ZkPquqHqqqH3WLDzPmuQ5Jki5En4C60BGZ3wF89lKKkiSpz0gSvUdkTvJ2YAZ43Tm2O+qyJKmXPmdQvUZkTnID8F5gR1U9O+6NHHVZktRXn4BackTmJNcC97AQTt9b/jIlSdNmyYDqMyIz8GHgBcAnkzyWpNeUApIknUuv0cx7jMh8wzLXJUmaco4kIUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJQ2ox1Q1r03ytSTzSW5etO3WJN/qfm5duaqlNhhQ0kB6TlXzHeA24L5F+/4a8H7gehZmFHh/kquGrllqiQElDafPVDVPVtXXgZ8t2vfNwOeq6umqegb4HLB9JYqWWmFAScO50KlqLmrfJLuTzCWZO3369EUVKrXIgJKG03uqmkvZ11kCtFYZUNJwek1VM8C+0ppgQEnDWXKqmvN4EHhTkqu6myPe1K2TpoYBJQ2kz1Q1SV6d5CTwNuCeJEe6fZ8GPsBCyB0G9nXrpKnRa7oNSRenx1Q1h1m4fDdu3wPAgUELlBrmGZQkqUkGlCSpSQaUJKlJfgclaWps3vuZFTnOk3fftCLHWes8g5IkNcmAkiQ1yYCSJDXJ
gJIkNcmAkiQ1yYCSJDXJgJIkNcmAkiQ1qVdAJdme5FiS40n2jtn+2iRfSzKf5OblL1OSNG2WDKgk64D9wFuAbcAtSbYtavYd4DbgvuUuUJI0nfoMdXQdcLyqTgAkOQjsBI6eaVBVT3bbfjZAjZKkKdTnEt81wFMjyye7dRcsye4kc0nmTp8+fTFvIUmaEn0CKmPW1cUcrKruraqZqppZv379xbyFJGlK9Amok8DGkeUNwKlhypEkaUGfgDoMbE2yJcnlwC5gdtiyJEnTbsmAqqp5YA/wIPA4cH9VHUmyL8kOgCSvTnISeBtwT5IjQxYtrRY9HtG4Isknuu1fSbK5W785yY+TPNb9fGyla5cmrdeEhVV1CDi0aN1dI68Ps3DpT1Jn5BGNG1m4VH44yWxVHR1p9g7gmar6zSS7gD8B/qDb9kRVvWpFi5Ya4kgS0nDOPqJRVc8BZx7RGLUT+Hj3+gHgjUnG3ZgkTR0DShpOn0c0zrbpLqf/APj1btuWJI8m+cskv3eug/j4htYqA0oaTp9HNM7V5rvApqq6FrgDuC/Jr4w7iI9vaK0yoKTh9HlE42ybJJcBvwo8XVXPVtX3AarqEeAJ4KWDVyw1xICShtPnEY1Z4Nbu9c3AF6uqkqzvbrIgyYuBrcCJFapbakKvu/gkXbiqmk9y5hGNdcCBM49oAHNVNQv8GfDnSY4DT7MQYgCvBfYlmQeeB95ZVU+v/KeQJseAkgbU4xGNn7Dw/ODi/T4FfGrwAqWGeYlPktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktSkXgGVZHuSY0mOJ9k7ZvsVST7Rbf9Kks3LXai0Gl1K30lyZ7f+WJI3r2TdUguWDKgk64D9wFuAbcAtSbYtavYO4Jmq+k3go8CfLHeh0mpzKX2na7cLeAWwHfj33ftJU6PPGdR1wPGqOlFVzwEHgZ2L2uwEPt69fgB4Y5IsX5nSqnQpfWcncLCqnq2qbwPHu/eTpsZlPdpcAzw1snwSuP5cbapqPskPgF8H/ma0UZLdwO5u8f8mOXYxRZ/H1YuPOWUu+PNnwue6Axx/uf8GXnQJ+15K37kGeHjRvteMO8jA/WrV/U0NUMOq/B0soyE+f69+1Segxp0J1UW0oaruBe7tccyLkmSuqmaGev/WTfvnh+Z+B5fSd3r1KRi2XzX2+5yIaf8dTPLz97nEdxLYOLK8ATh1rjZJLgN+FXh6OQqUVrFL6Tt99pXWtD4BdRjYmmRLkstZ+OJ2dlGbWeDW7vXNwBerauz/9qQpcil9ZxbY1d3ltwXYCnx1heqWmrDkJb7uuvge4EFgHXCgqo4k2QfMVdUs8GfAnyc5zsL//nYNWfR5DHb5cJWY9s8PDf0OLqXvdO3uB44C88DtVfX8BD5GM7/PCZr238HEPn880ZEktciRJCRJTTKgJElNWhMBtdRwMmtdko1JHkryeJIjSd496ZomIcm6JI8m+fSka1kL7Ff2K5hsv1r1AdVzOJm1bh74o6p6OfC7wO1T+DsAeDfw+KSLWAvsV4D96oyJ9atVH1D0G05mTauq71bV17rX/4eFP6axow6sVUk2ADcBfzrpWtYI+5X9auL9ai0E1LjhZKbqj2hUNxr2tcBXJlvJivt3wL8BfjbpQtYI+9UI+9Vk+tVaCKjeQ8KsdUleAHwK+MOq+uGk61kpSX4f+F5VPTLpWtYQ+1XHfjW5frUWAsohYYAkf4uFTvQfq+o/T7qeFfYaYEeSJ1m4FPWGJH8x2ZJWPfsV9ism3K9W/YO63fhl3wTeCPwvFoaX+SdVdWSiha2gbnqGjwNPV9UfTrqeSUryeuCPq+r3J13Lama/sl+NmlS/WvVnUFU1D5wZTuZx4P5p6kSd1wD/lIX/4TzW/fzDSRel1ct+BdivJm7Vn0FJktam
VX8GJUlamwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSk/4fB8s4wJJ+FFUAAAAASUVORK5CYII=\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a18733d30>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Bag of words mapping function:\n", | |
| "{'like': 3, 'dog': 2, 'cat': 1, 'amy': 0, 'love': 4}\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Fit a two-topic model (N_K=2) on a toy corpus with a dog theme and a cat theme.\n", | |
| "mylda = myLDA(N_K=2)\n", | |
| "test_doc = ['I like dog',\n", | |
| " 'dog dog dog I like dog',\n", | |
| " 'cat cat cat Amy love cat',\n", | |
| " 'Amy love cat Amy love',\n", | |
| " 'dog dog like I']\n", | |
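| "# Build the document-term count matrix. With default settings, CountVectorizer\n", | |
| "# lowercases the text and drops single-character tokens such as 'I'.\n", | |
| "tf_vectorizer_test = CountVectorizer()\n", | |
| "tf_test = tf_vectorizer_test.fit_transform(test_doc)\n", | |
| "# Standard Gibbs sampling: Theta is document-by-topic, Beta is topic-by-word.\n", | |
| "(Theta_sgs, Beta_sgs) = mylda.lda_sgs(tf_test)\n", | |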
| "print(\"******Standard Gibbs Sampling******\")\n", | |
| "print(\"Document topic proportion:\")\n", | |
| "print(Theta_sgs)\n", | |
| "print(\"Topic word proportion\")\n", | |
| "print(Beta_sgs)\n", | |
| "print(\"Visualization\")\n", | |
| "mylda.visualization(Theta_sgs, Beta_sgs)\n", | |
| "print(\"Bag of words mapping function:\")\n", | |
| "print(tf_vectorizer_test.vocabulary_)\n", | |
| "\n", | |
| "# Collapsed Gibbs sampling on the same count matrix, for comparison.\n", | |
| "(Theta_cgs, Beta_cgs) = mylda.lda_cgs(tf_test)\n", | |
| "print(\"******Collapsed Gibbs Sampling******\")\n", | |
| "print(\"Document topic proportion:\")\n", | |
| "print(Theta_cgs)\n", | |
| "print(\"Topic word proportion\")\n", | |
| "print(Beta_cgs)\n", | |
| "print(\"Visualization\")\n", | |
| "mylda.visualization(Theta_cgs, Beta_cgs)\n", | |
| "print(\"Bag of words mapping function:\")\n", | |
| "print(tf_vectorizer_test.vocabulary_)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "- Comparison example" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 93, | |
| "metadata": { | |
| "scrolled": true | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Visualization for myLDA via collapsed Gibbs sampling\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAFfhJREFUeJzt3X+opFd9x/H31w2poNZKk4Ikua61Ubr+IMGrEQS1/igbQzeC1maDYDC6WBJjjRY3KJLqP9GAEtq0dROCUogxSn+svyq0Km3F2Kw12mYlGjXqxkJWTRUREzf99o+Zu52dO3vn7HPPPHNm5v2ChZ07T557OJnZzznnOc/3icxEkqTWPGreDZAkaRIDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktSk0+b1i88444zcuXPnvH79tnzlK1/5UWaeOe92jLI/67NP67I/61qF/pxbQO3cuZNDhw7N69dvS0R8b95tGGd/1mef1mV/1rUK/ekSnySpSQaUJKlJBpSkTSJid0TcExH3RsT+Ce+vRcTnI+KrEfH1iHj5PNqp5WZASTpBROwAbgQuBHYBeyNi19hh7wRuz8zzgUuAv+y3lVoFBpSkcc8F7s3M72Tmw8BtwMVjxyTw68O/Px74YY/t04owoCSNOwv4wcjrI8OfjboWeE1EHAE+Dbxp0okiYl9EHIqIQ0ePHp1FW7XEDChJ42LCz8Yfvb0X+FBmng28HPibiNj070lmHsjM9cxcP/PMpm4j0gIwoCSNOwKcM/L6bDYv4V0O3A6QmV8CHg2c0UvrtDIMKEnj7gTOjYgnR8TpDDZBHBw75vvASwAi4ncZBJRreKrKgJJ0gsw8BlwJfBb4BoPdendHxLsjYs/wsLcCb4iIrwEfAS7LzPFlQGlb5lbqaMPO/Z+qdq77rruo2rkWlf1Z3yr2aWZ+msHmh9GfvWvk74eB5/fdrhKr+P9rlubZn86gJElNMqAkSU0qCqiCsicfiIi7hn++GRH/U7+pkqRVMjWgSsqeZOZbMvO8zDwP+HPgb2fRWElaRNY27KZkBlVS9mTUXga7eqReOMNXy6xt2F3JLr5JZU8umHRgRDwJeDLwuZO8vw/YB7C2tnZKDZUmGfnyv4zBZ/POiDg43GUGDGb4I8e/CTi/94ZqlR0f5ANExMYg//DIMdY2nKBkBlVS9mTDJcDHM/ORSW9a9kQz4AxfrbO2YUclAVVS9mTDJfjlV79KvvzA9Bn+8JiV+gdAvbC2YUclAVVS9oSIeBrwBOBLdZu4fAqumVwWEUdHrpu8fh7tXBDVZviwev8AqBfWNuxoakAVlj2BwQjgNsudbK3wginARzd2Rmbmzb02crE4w1frrG3YUVGpo2llT4avr63XrKVWcsFU5Y5/+YH7GXz5Lx0/yBm+5iUzj0XExiB/B3DLxiAfOJSZBxnUNrwpIt7CYAXA2oY0UItvBZXuinxlRLwA+Cbwlsz8wfgB7oos/vKDM3zN0SLXNpwnA6p/JddMPgF8JDMfiog3Ah8GXrzpP8o8ABwAWF9fX9l/eJ3hS8vJWnz9m3rNJDN/nJkPDV/eBDy7p7ZJUjMMqP5NvWAaEU8cebmHweYUSVopLvH1rPCayVXDHZLHgJ8Al82twZI0JwbUHBRcML0GuKbvdklSS1zikyQ1yYCSJDXJgJIkNcmAkrSJz9hSC9wkIekEPmNLrXAGJWmcz9hSE5xB6ZTs3P+paue677qLqp1LVVV7ira0Hc6gJI2r9owtHwCp7TCgJI2r9owtHwCp7TCgJI3zKdpqggEl6QQ+RVutcJOEpE18xpZa4AxKktQkA0qS1CQDSpJmzNJR3XgNSpJmyNJR3TmDkqTZsnRURwaUJM3WpNJRZ006cFrpqFWrzFEUUNPWT4fHvDoiDkfE3RFxa91mStLCqlY6atUqc0wNqJH10wuBXcDeiNg1dsy5
wDXA8zPz6cCfzKCt0kQOoNS4aqWjVk3JJonj66cAEbGxfnp45Jg3ADdm5oMAmflA7YZKk5RcgB4bQD0YEb81n9ZqRR0vHQXczyCELh0/yNJRm5Us8ZWsnz4VeGpEfDEi7oiI3ZNOtGrrp+pFyQVoB1CaG0tHdVcygypZPz0NOBd4EYPp679GxDMy84S9/Jl5ADgAsL6+7v8E1VDy7KKnAkTEF4EdwLWZ+Y/9NE+ydFRXJQFVsn56BLgjM38FfDci7mEQWHdWaaV0ctUGUDCY5QP7ANbW1uq2VNIpKVniKym9//fA7wFExBkMRqzfqdlQ6SRKB1D/kJm/yszvAhsDqE1WbZeU1LKpAVW4fvpZ4McRcRj4PPCnmfnjWTVaGuEASlpSRaWOpq2fDi/qXT38I/UmM49FxMYAagdwy8YACjiUmQeH7/3+cAD1CA6gpIVgLT4tPAdQ0nKy1JEkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUpE2sEK8WuM1c0gmsEK9WOIOag5LR6fC4V0VERsR6n+3TyrNCvJpgQPWs5AGQw+MeB1wFfLnfFkr1HrEjbYcB1b+S0SnAe4D3Ab/ss3ESp14hfi9wc0T8xqYT+Qw4bYMB1b+po9OIOB84JzM/udWJ/PJrRqpViLc6vLbDgOrflqPTiHgU8AHgrdNO5JdfM2KFeDXBgOrftNHp44BnAF+IiPuA5wEH3SihvviIHbXCbeb9Oz46Be5nMDq9dOPNzPwpcMbG64j4AvC2zDzUczu1wqwQrxY4g+pZ4ehUklaeM6g5mDY6Hfv5i/pok6TZGW7Dv4HBQzVvzszrJhzzauBaBtekv5aZl44fs2oMKEmaIStzdOcSnyTNlpU5OjKgJGm2qlXmWLV7Hw0oSZqtapU5Vu3eRwNKkmarWmWOVWNASdJsWZmjo6KAmvZ4iIi4LCKORsRdwz+vr99USVo8Vubobuo285ItkkMfzcwrZ9BGSVpoVubopmQGVfp4CGkunOFLy6kkoEq2SAK8MiK+HhEfj4hzJry/clskNXulD4BkMMM/b/jn5l4bKamTkoAq2SL5CWBnZj4L+Cfgw5NOtGpbJNULZ/jSkioJqKlbJDPzx5n50PDlTcCz6zRPmqraDB+c5UstKQmoqVskI+KJIy/3MNipIvWh2gwfnOVLLZm6iy8zj0XExhbJHcAtG1skgUOZeRC4arhd8hjwE+CyGbZZGlU0wx95eRPw3h7aJWmbiqqZF2yRvIZBJV6pb1s+ABIGM/zM/O/hS2f40oLwcRtaaM7wpeVlQGnhOcOXlpO1+CRJTTKgJElNMqAkbWL5KLXAa1CSTmCBaLXCGZSkcZaPUhMMKEnjLBCtJhhQksZZIFpNMKAkjbNAtJpgQEkaZ4FoNcFdfJJOYPkotcKAkrSJ5aPqiojdwA0MAv/mzLxu7P3LgOsZFDwG+Auf/GxASdJMeV9Zd16DkqTZ8r6yjgwoSZot7yvryICSpNnyvrKODChJmi3vK+vIgJKk2fK+so7cxSdJM+R9Zd0ZUHNQcE/EG4ErgEeAnwP7JmxJlbQgvK+sG5f4ejZyT8SFwC5gb0TsGjvs1sx8ZmaeB7wPeH/PzZSkuTOg+jf1nojM/NnIy8ewecePJC09l/j6N+meiAvGD4qIK4CrgdOBF/fTNElqR9EMKiJ2R8Q9EXFvROzf4rhXRURGxHq9Ji6dknsiyMwbM/MpwNuBd0480YrdtCdptUwNqMJrJkTE44CrgC/XbuSSmXpPxJjbgFdMemPVbto7GQdQ0nIqmUGV1pF6D4ML+r+s2L5lVHJPxLkjLy8CvtVj+xaKAyhpeZUE1NQ6UhFxPnBOZn5yqxO5JDW4JwLYuCfiG8DtG/dEDO+DALgyIu6OiLsYXId67ZyauwgcQElLqmSTxJbXTCLiUcAHKLixLDMPAAcA1tfXV3ZnWsE9EW/uvVGLa+qmk9EBVES8bauTRcQ+YB/A2tpa5aZKOhUlM6hp10weBzwD
+EJE3Ac8DzjoOr96UjqAemvJybyuJ7WjJKC2vGaSmT/NzDMyc2dm7gTuAPZk5qGZtFg6kQMoaUlNDajCaybSvDiAkpZU0Y26066ZjP38RdtvllSmsBCnpAVkJQktPAdQ0nKyFp+kTbz5WS0woCSdwJuf1QoDStI4b35WEwwoSeOsHqMmGFCSxlW7+dkbn7UdBpSkcd78rCYYUJLGefNzZe6K7MaAknQCq8fU5a7I7rxRV9Im3vxc1fFdkQARsbEr8vDYcRu7IresuL9KnEFJ0my5K7IjA0qSZstdkR0ZUJI0W+6K7MiAkqTZcldkRwaUJM2QuyK7cxefJM2YuyK7cQYlSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWpSUUBNe5ZJRLwxIv4zIu6KiH+bVEpekqRTMTWgCp9lcmtmPjMzz2NQLv791VsqnYQDKGk5lcygjj/LJDMfBjaeZXJcZv5s5OVjGKnUK82SAyhpeZUE1NRnmQBExBUR8W0G/wBcNelEq/YsE/XCAZS0pEoCastnmRz/QeaNmfkU4O3AOyedaNWeZaJeVBtADY9zECU1oiSgpj3LZNxtwCu20yjpFFQbQA2PcxAlNaIkoLZ8lglARJw78vIi4Fv1mihtyQGUtKSmPm4jM49FxMazTHYAt2w8ywQ4lJkHgSsj4qXAr4AHgdfOstHSiOMDKOB+BgOoS0cPiIhzM3Nj0OQASloQRc+DmvYsk8x8c+V2SUUcQM1GROwGbmDQpzdn5nVj778RuAJ4BPg5sC8zD/feUC01H1g4BwVf/quB1wPHgKPA6zLze703dEE4gKprZOv+yxgsod4ZEQfHAujWzPzr4fF7GGzd3917Y7XUlj6gdu7/VLVz3XfdRds+R+GX/6vAemb+IiL+mMHOsz/a9i+Xyhzfug8QERtb949/Rt26rz5Yi69/JfftfD4zfzF8eQeDC/9SX7z3UU0woPpX9OUfcTnwmUlv+OXXjHjvo5qw9Et8DSr68gNExGuAdeCFk97PzAPAAYD19XWXWBZQa0vQQ1227v9VrV++jNx00o0zqP4VffmHu87eAezJzId6apsE3vtYlfUiu3MG1b+S+3bOBz4I7M7MB/pvolaZW/erc9NJRwZUzwq//NcDjwU+FhEA38/MPXNrtFaOW/ermnTd+YLxgyLiCuBq4HTgxZNOFBH7gH0Aa2tr1RvaGgNqDgq+/C/tvVGSZqV40wlwY0RcymDTyaZZ6apdd/YalCTNlvUiOzKgJGm23HTSkUt8kjRDbjrpzoCSpBlz00k3LvFJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmlQUUBGxOyLuiYh7I2L/hPevjojDEfH1iPjniHhS/aZKk/n5lJbT1ICKiB3AjcCFwC5gb0TsGjvsq8B6Zj4L+DjwvtoNlSbx8yktr5IZ1HOBezPzO5n5MIOHaV08ekBmfj4zfzF8eQeDB3JJffDzKS2pksdtnAX8YOT1EeCCLY6/HPjMpDciYh+wD2Btba2widKWqn0+wc9oH3bu/1S1c9133UXVzqX2lMygYsLPcuKBEa8B1oHrJ72fmQcycz0z188888zyVkonV+3zCX5GpZaUBNQR4JyR12cDPxw/aPg0yHcAezLzoTrNk6by8zkDbjxRC0oC6k7g3Ih4ckScDlwCHBw9ICLOBz7I4Mv/QP1mSifl57MyN56oFVMDKjOPAVcCnwW+AdyemXdHxLsjYs/wsOuBxwIfi4i7IuLgSU4nVeXncybceKImlGySIDM/DXx67GfvGvn7Syu3Syrm57M6N0ZVFhG7gRuAHcDNmXnd2PtXA68HjgFHgddl5vd6b2hjrCQhaZwboypyybQ7A0rSODee1OWSaUcGlKRxbjypa9KS6VlbHL/lkmlEHIqIQ0ePHq3YxDYZUJJO4MaT6lwy7ahok4Sk1eLGk6pOdcn0hS6ZDjiD
kqTZcsm0IwNKkmbIJdPuXOKTpBlzybQbZ1CSpCY5g5KkBbbMjy9xBiVJapIBNQcFjzJ4QUT8R0Qci4hXzaONkjRvBlTPCutyfR+4DLi139ZJUju8BtW/43W5ACJioy7X4Y0DMvO+4Xv/O48GSlILnEH171Trcp3UqtXlkrRaDKj+FdflmmbV6nJJWi0GVP+K6nJJ0qozoPo3tS6XJMmA6l1JXa6IeE5EHAH+EPhgRNw9vxZL0ny4i28OCupy3YlP1JS04pxBSZKaZEBJkppkQEmSmlQUUNaOU8v8fErLaWpAWTtOLfPzKS2vkl181o5Ty/x8SkuqZInP2nFqWbXPp/6fy6ZqQUlAWTtOLav2+QQHUeCy6SwY+N2UBJS149Syqp9PB1HAyLJpZj4MbCybHpeZ92Xm1wGXTacw8LsrCShrx6llfj7rc1m/LgO/o6kBZe04tczP50y4rF+Xgd9RUS0+a8epZX4+q3NZv66qgQ8cAFhfX+98rXVRWElC0jiXTesy8DsyoCSdwGXT6gz8jnzchqRNXDatJzOPRcRG4O8AbtkIfOBQZh6MiOcAfwc8AfiDiPizzHz6HJvdBANKkmbMwO/GJT5JUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSk4oCKiJ2R8Q9EXFvROyf8P6vRcRHh+9/OSJ21m7oMrE/67I/67NP67I/u5kaUBGxA7gRuBDYBeyNiF1jh10OPJiZvwN8AHhv7YYuC/uzLvuzPvu0Lvuzu5IZ1HOBezPzO5n5MHAbcPHYMRcDHx7+/ePASyIi6jVzqdifddmf9dmnddmfHZ1WcMxZwA9GXh8BLjjZMZl5LCJ+Cvwm8KPRgyJiH7Bv+PLnEXHPKbT1jPHzjYvuY46p5x47/5M6/6bV6M9TPX8T/Qnb6tNT/QydqlP9/9VEny5yf46df9H7Exbw39CSgJqU4tnhGDLzAHCg4HdubkTEocxc7/LfzvPck37dhJ8tVX/2cf7RXzXhZ536E7r36RL1JzTwGbU/Jx7T7Hd+VucuWeI7Apwz8vps4IcnOyYiTgMeD/ykRgOXkP1Zl/1Zn31al/3ZUUlA3QmcGxFPjojTgUuAg2PHHAReO/z7q4DPZebEEarsz8rsz/rs07rsz64yc+of4OXAN4FvA+8Y/uzdwJ7h3x8NfAy4F/h34LdLznsqf4B9tc/Zx7lXsT/77lP7c/n61P5crM/orM4dw5NLktQUK0lIkppkQEmSmtR8QE0rEbLNc98SEQ9ExH/VPG/LZtmfw/Pbp3XPbX/WPbf9WffcM+3PpgOqsETIdnwI2F3xfE3roT/BPvUzug32Z12L3p9NBxRlJUI6y8x/YbXuNZhpf4J9ip/R7bI/61ro/mw9oCaVCDlrTm1ZBvZnffZpXfZnXQvdn60HVHGJGhWxP+uzT+uyP+ta6P5sPaBKSoSonP1Zn31al/1Z10L3Z+sBVVIiROXsz/rs07rsz7oWuj+bDqjMPAZcCXwW+AZwe2beXev8EfER4EvA0yLiSERcXuvcLZp1f4J9ip/RbbE/61r0/rTUkSSpSU3PoCRJq8uAkiQ1yYCSJDXJgJIkNcmAkiQ1yYCSJDXJgJIkNen/ACQEI5U71AOTAAAAAElFTkSuQmCC\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a168f3ba8>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAGI9JREFUeJzt3X2QXXd93/H3BymWg1NsYpRO4oesqAWtTKiArU0bIBQVIocEwcQepHiIZ+qOwhRPk9IMkScTD3jyR9ymqO3ghiixE9cQbCpCuhOLOBQT2mbA9RqMbdm4WSsOFnZh/RBTcIwt+9s/zlnn+npXe313796jq/drZmfPw+/s/d6jPfrsPQ+/X6oKSZK65kXjLkCSpMUYUJKkTjKgJEmdZEBJkjrJgJIkdZIBJUnqJANKGrEk25Pck2QuyZ5F1m9Icn27/uYkU+3y70tyTZI7ktyd5NK1rl0aJwNKGqEk64ArgfOALcCuJFv6ml0MPFpVZwF7gSva5RcAG6rqx4DXAb+wEF7S8cCAkkbrHGCuqg5V1ZPAdcCOvjY7gGva6f3AtiQBCjgpyXrg+4EngW+vTdnS+BlQ0midBtzfM3+4XbZom6o6AjwGnEoTVt8FHgS+DvxmVT3S/wJJdieZbb92r/5bkMZj/bgL6Peyl72spqamxl2G9Kxbb731oaraOOTmWWRZf/9iS7U5B3ga+BHgpcD/TPLfq+rQcxpW7QP2QXP8TE9P//aQtUqrbiXHT+cCampqitnZ2XGXIT0ryV+tYPPDwBk986cDDyzR5nB7Ou9k4BHg54A/qaqngG8l+XNgGjjEEjx+1DUrOX4GOsW3gruQLkxyW8/XM0m2DlusdAy6BdicZFOSE4CdwExfmxngonb6fOCmanpx/jrwljROAl4PfG2N6pbGbtmAWsldSFX18araWlVbgfcA91XVbav5BqQua68pXQLcCNwNfLKqDia5PMk72mZXAacmmQPeDyz8EXgl8APAnTRB93tVdfuavgFpjAY5xffsXUgASRbuQrqrp80O4IPt9H7gI0lSzx3LYxfwiRVXLB1jquoAcKBv2WU900/Q3FLev913FlsuHS8GOcW3kruQer2bJQKq9y6k+fn5QeqWJE24QQJqJXchNSuTc4HHq+rOxV6gqvZV1XRVTW/cOOzNUpKkSTJIQL2Qu5DouwtpwU48vSdJegEGCaiV3IVEkhfRnEe/bnVKliQdD5a9SaKqjiRZuAtpHXD1wl1IwGxVzdDchXRtexfSIzQhtuBNwOH+hwslSTqagR7UHfYupHbdn9E8vyFJ0sDsi0/Hhak9NzC154ZxlyGN1bF2DBhQkqROMqAkSZ1kQEmSOsmAkiR1kgElSeokA0qS1EkGlCSpkwwoSVInGVCSpE4yoCRJnWRASZI6yYCSJHWSASVJ6iQDSpLUSQaUJKmTDChpxJJsT3JPkrkkexZZvyHJ9e36m5NMtcsvTHJbz9czSbaudf3SuBhQ0gglWQdcCZwHbAF2JdnS1+xi4NGqOgvYC1wBUFUfr6qtVbUVeA9wX1XdtnbVS+NlQEmjdQ4wV1WHqupJ4DpgR1+bHcA17fR+YFuS9LXZBXxipJVKHWNASaN1GnB/z/zhdtmibarqCPAYcGpfm3ezREAl2Z1kNsns/Pz8qhQtdYEBJY1W/ychgHohbZKcCzxeVXcu9gJVta+qpqtqeuPGjcNXKnWMASWN1mHgjJ7504EHlmqTZD1wMvBIz/qdeHpPxyEDShqtW4DNSTYlOYEmbGb62swAF7XT5wM3VVUBJHkRcAHNtSvpuLJ+3AVIk6yqjiS5BLgRWAdcXVUHk1wOzFbVDHAVcG2SOZpPTjt7fsSbgMNVdWita5fGzYCSRqyqDgAH+pZd1jP9BM2npMW2/TPg9aOsT+qqgU7xDfugYbvu1Um+mORgkjuSnLh65UuSJtWyAbWSBw3bC74fA95bVWcD
bwaeWrXqJUkTa5BPUCt50PBtwO1V9VWAqnq4qp5endIlSZNskIBayYOGrwAqyY1JvpzkA4u9gA8aSpL6DRJQK3nQcD3wBuDC9vu7kmx7XkMfNJQk9RkkoFbyoOFh4AtV9VBVPU5zJ9NrV1q0JGnyDRJQK3nQ8Ebg1Ule3AbXTwB3rU7pkqRJtuxzUCt50LCqHk3yYZqQK+BAVd0wovciSZogAz2ou8IHDT9Gc6u5JEkDsy8+SVInGVCSpE4yoCRJnWRASZI6yYCSJHWSASVJ6iQDSpLUSQaUJKmTDChJUicZUJKkTjKgJEmdZEBJkjrJgJIkdZIBJY1Yku1J7kkyl2TPIus3JLm+XX9zkqmeda9O8sUkB5PckeTEtaxdGicDShqhJOuAK4HzgC3AriRb+ppdDDxaVWcBe4Er2m3X0wxV896qOht4M/DUGpUujZ0BJY3WOcBcVR2qqieB64AdfW12ANe00/uBbUkCvA24vaq+ClBVD1fV02tUtzR2BpQ0WqcB9/fMH26XLdqmqo4AjwGnAq8AKsmNSb6c5AOLvUCS3Ulmk8zOz8+v+huQxsWAkkYriyyrAdusB94AXNh+f1eSbc9rWLWvqqaranrjxo0rrVfqDANKGq3DwBk986cDDyzVpr3udDLwSLv8C1X1UFU9DhwAXjvyiqWOMKCk0boF2JxkU5ITgJ3ATF+bGeCidvp84KaqKuBG4NVJXtwG108Ad61R3dLYrR93AdIkq6ojSS6hCZt1wNVVdTDJ5cBsVc0AVwHXJpmj+eS0s9320SQfpgm5Ag5U1Q1jeSPSGBhQ0ohV1QGa03O9yy7rmX4CuGCJbT9Gc6u5dNzxFJ8kqZMMKElSJxlQkqROGiighu1LLMlUkr9Jclv79dHVLV+SNKmWvUmipy+xt9I8l3FLkpmq6r3d9dm+xJLspOlL7N3tunurausq1y1JmnCDfIJaSV9ikiQNZZCAWklfYgCbknwlyReSvHGF9UqSjhODPAe1kr7EHgTOrKqHk7wO+KMkZ1fVt5+zcbIb2A1w5plnDlCSJGnSDfIJaui+xKrqe1X1MEBV3QrcS9ND83PY2aUkqd8gATV0X2JJNrY3WZDk5cBm4NDqlC5JmmTLnuJbSV9iwJuAy5McAZ6mGRn0kVG8EUnSZBmoL75h+xKrqk8Bn1phjZKk45A9SUiSOsmAkiR1kgElSeokA0qS1EkGlCSpkwwoSVInGVCSpE4yoCRJnWRASZI6yYCSRswRqaXhDNTVkXSsmtpzw0Dr7/uNt4/k9R2RWhqen6Ck0XJEamlIBpQ0WiMfkTrJ7iSzSWbn5+dXt3ppjAwoabRWY0Tq1wDvB/4gyUue19ABPzWhDChptEY+IrU0qQwoabQckVoaknfxSSPkiNTS8AwoacQckVoajqf4JEmdZEBJkjrJgJIkdZIBJUnqJANKktRJBpQkqZMMKElSJxlQkqROGiighh1wrWf9mUm+k+SXV6dsSdKkWzagegZcOw/YAuxKsqWv2bMDrgF7aQZc67UX+MzKy5UkHS8G+QS1ogHXkryTpoPLg6tTsiTpeDBIQA094FqSk4BfAT50tBdwwDVJUr9BAmolA659CNhbVd852gs44Jokqd8gvZm/kAHXDvcOuAacC5yf5N8CpwDPJHmiqj6y4solSRNtkIB6dsA14Bs0Y9X8XF+bhQHXvkjPgGvAGxcaJPkg8B3DSZI0iGUDaoUDrkmSNJSBBiwcdsC1vvYfHKI+SdJxyp4kJEmdZEBJkjrJgJIkdZIBJUnqJANKktRJBpQ0Yo4GIA3HgJJGyNEApOEZUNJoORqANCQDShotRwOQhmRASaPlaADSkAbq6kjS0BwNQBqSASWNlqMBSEMyoKQRcjQAaXgGlDRijgYgDeeYu0lias8N4y7hWVN7buhEPV2p42hWq8Zj4b1KWh3HXEBJko4PBpQkqZMMKElSJxlQkqROMqAkSZ1kQEmSOsmAkiR1kgElSeokA0qS1EkG
lCSpkwYKqCTbk9yTZC7JnkXWb0hyfbv+5iRT7fJzktzWfn01ybtWt3xJ0qRaNqCSrAOuBM4DtgC7kmzpa3Yx8GhVnQXsBa5ol98JTFfVVmA78NvteDeSJB3VIJ+gzgHmqupQVT0JXAfs6GuzA7imnd4PbEuSqnq8HcIa4ESeP5KoJEmLGiSgTgPu75k/3C5btE0bSI8BpwIkOTfJQeAO4L09gfWsJLuTzCaZnZ+ff+HvQpI0cQYJqCyyrP+T0JJtqurmqjob+EfApUlOfF7Dqn1VNV1V0xs3bhygJEnSpBskoA4DZ/TMnw48sFSb9hrTyTQjgz6rqu4Gvgu8athiJUnHj0EC6hZgc5JNSU6gGY56pq/NDHBRO30+cFNVVbvNeoAkPwq8ErhvVSqX9IJNymCPk/I+dHTL3lFXVUeSXALcCKwDrq6qg0kuB2araga4Crg2yRzNJ6ed7eZvAPYkeQp4BviXVfXQKN6IJGmyDHTLd1UdAA70LbusZ/oJ4IJFtrsWuHaFNUqSjkP2JCFJ6iQDSpLUSQaUNGJ2FSYNx4CSRsiuwqThGVDSaNlVmDQkA0oaLbsKk4ZkQEmjZVdh0pAMKGm07CpMGpIBJY2WXYVJQ/KOIGmE7CpMGp4BJY2YXYVJw/EUnySpkwwoSVInGVCSpE4yoCRJnWRASZI6yYCSJHWSASVJ6iQDSpLUST6ouwam9twAwH2/8faR/FxJmkR+gpIkdZIBJUnqJANKktRJBpQkqZMMKElSJw0UUEm2J7knyVySPYus35Dk+nb9zUmm2uVvTXJrkjva729Z3fIlSZNq2YBKsg64EjgP2ALsSrKlr9nFwKNVdRawF7iiXf4Q8DNV9WM0I4Y6to0kaSCDfII6B5irqkNV9SRwHbCjr80O4Jp2ej+wLUmq6itV9UC7/CBwYpINq1G4JGmyDRJQpwH398wfbpct2qaqjgCPAaf2tflZ4CtV9b3+F0iyO8lsktn5+flBa5ckTbBBAiqLLKsX0ibJ2TSn/X5hsReoqn1VNV1V0xs3bhygJEnSpBskoA4DZ/TMnw48sFSbJOuBk4FH2vnTgU8DP19V9660YEmrY6GrrNXoMmsU3W6NsiuvF/Kz7VLs6Ea5fwYJqFuAzUk2JTkB2AnM9LWZobkJAuB84KaqqiSnADcAl1bVn69W0ZKkybdsQLXXlC4BbgTuBj5ZVQeTXJ7kHW2zq4BTk8wB7wcWbkW/BDgL+LUkt7VfP7Tq70LqMB/TkIYzUG/mVXUAONC37LKe6SeACxbZ7teBX19hjdIxq+cxjbfSnAq/JclMVd3V0+zZxzSS7KS5Xvtu/vYxjQeSvIrmj8T+G5SkiWVPEtJo+ZiGNCQDShotH9OQhmRASaPlYxrSkAwoabR8TEMakgEljZaPaUhDMqCkEfIxDWl4A91mLml4PqYhDcdPUJKkTjKgJEmdZEBJkjrJgJIkdZIBJUnqJANKktRJBpQkqZMMKElSJxlQkqROMqAkSZ1kQEmSOsmAkiR1kgElSeokA0qS1EkGlCSpkwwoSVInGVCSpE4yoCRJnTRQQCXZnuSeJHNJ9iyyfkOS69v1NyeZapefmuTzSb6T5COrW7okaZItG1BJ1gFXAucBW4BdSbb0NbsYeLSqzgL2Ale0y58Afg345VWrWJJ0XBjkE9Q5wFxVHaqqJ4HrgB19bXYA17TT+4FtSVJV362q/0UTVJIkDWyQgDoNuL9n/nC7bNE2VXUEeAw4dTUKlCQdnwYJqCyyrIZos/QLJLuTzCaZnZ+fH3Qz6ZjgNVxpOIME1GHgjJ7504EHlmqTZD1wMvDIoEVU1b6qmq6q6Y0bNw66mdR5XsOVhjdIQN0CbE6yKckJwE5gpq/NDHBRO30+cFNVDfwJSppgXsOVhrR+uQZVdSTJJcCNwDrg6qo6mORyYLaqZoCrgGuTzNF8ctq5sH2S+4CXACckeSfwtqq6a/XfitRJi13DPXepNu3xtnAN96FBXiDJbmA3
wJlnnrnSeqXOWDagAKrqAHCgb9llPdNPABcsse3UCuqTjnUjv4ZbVfuAfQDT09OeudDEsCcJabRGfg1XmlQGlDRaXsOVhjTQKT5Jw/EarjQ8A0oaMa/hSsPxFJ8kqZMMKElSJxlQkqROMqAkSZ1kQEmSOsmAkiR1kgElSeokA0qS1EkGlCSpkwwoSVInGVCSpE4yoCRJnWRASZI6yYCSJHWSASVJ6iQDSpLUSQaUJKmTDChJUicZUJKkTjKgJEmdZEBJkjrJgJIkddJAAZVke5J7kswl2bPI+g1Jrm/X35xkqmfdpe3ye5L85OqVLh0bPH6k4SwbUEnWAVcC5wFbgF1JtvQ1uxh4tKrOAvYCV7TbbgF2AmcD24H/3P486bjg8SMNb5BPUOcAc1V1qKqeBK4DdvS12QFc007vB7YlSbv8uqr6XlX9JTDX/jzpeOHxIw0pVXX0Bsn5wPaq+hft/HuAc6vqkp42d7ZtDrfz9wLnAh8EvlRVH2uXXwV8pqr2973GbmB3O/tK4J6Vv7XneRnw0Ah+7mrpcn1drg1GX9+PVtXGYTacoOPnaLry+9GFOrpQA3SrjpOGPX7WD9AmiyzrT7Wl2gyyLVW1D9g3QC1DSzJbVdOjfI2V6HJ9Xa4NOl/fRBw/R9OV/d+FOrpQQwfrmBp2+0FO8R0GzuiZPx14YKk2SdYDJwOPDLitNMk8fqQhDRJQtwCbk2xKcgLNRduZvjYzwEXt9PnATdWcO5wBdrZ3KW0CNgP/e3VKl44JHj/SkJY9xVdVR5JcAtwIrAOurqqDSS4HZqtqBrgKuDbJHM1ffjvbbQ8m+SRwF3AEeF9VPT2i97KcsZ0CGVCX6+tybdDh+ibo+Dmaruz/LtTRhRpgQupY9iYJSZLGwZ4kJEmdZEBJkjppogMqybokX0nyx+38prYrmb9ou5Y5YUx1nZJkf5KvJbk7yT9O8oNJPtvW9tkkLx1HbW19/zrJwSR3JvlEkhPHue+SXJ3kW+3zQgvLFt1fafyntnug25O8dq3qnHRJzkjy+fZ39mCSX2yXfzDJN5Lc1n791BrUcl+SO9rXm22XrekxlOSVPe/5tiTfTvJLa7E/unJMLFHHv2v/b7s9yaeTnNIun0ryNz375aPL/fyJDijgF4G7e+avAPZW1WbgUZouZsbhPwJ/UlV/H/iHNDXuAT7X1va5dn7NJTkN+FfAdFW9iubC/k7Gu+9+n6arn15L7a/zaO5220zz8OpvrVGNx4MjwL+pqn8AvB54X/6226a9VbW1/TqwRvX80/b1Fp73WdNjqKruWXjPwOuAx4FPt6tHvT9+n24cE4vV8VngVVX1auD/AJf2rLu3Z7+8d7kfPrEBleR04O3A77bzAd5C05UMNF3LvHMMdb0EeBPNnVtU1ZNV9dc8t7ubsdTWYz3w/WmeyXkx8CBj3HdV9T9o7m7rtdT+2gH8l2p8CTglyQ+vTaWTraoerKovt9P/j+YPq9PGW9VzjPMY2kbzn+9frcWLdeWYWKyOqvrTqjrSzn6J5vm9oUxsQAH/AfgA8Ew7fyrw1z077jDjObheDswDv9eefvzdJCcBf7eqHoTmPwLgh8ZQG1X1DeA3ga/TBNNjwK10Y9/1Wmp/nQbc39OuC7VOnDQ9rr8GuLlddEl7SufqNTo9XcCfJrk1TVdPMN5jaCfwiZ75td4f0M1j4p8Dn+mZ39T+v/eFJG9cbuOJDKgkPw18q6pu7V28SNNx3GO/Hngt8FtV9Rrgu4zpdN5i2oNpB7AJ+BHgJJpTBP26+nxCV/6dJ1aSHwA+BfxSVX2b5pTR3wO20vxR8+/XoIwfr6rX0vxuvi/Jm9bgNRfVXo99B/Bf20Xj2B9HM5ZjIsmv0pwW/ni76EHgzPb/vfcDf9CeUVrSRAYU8OPAO5LcR9N79FtoPlGd0p62gvF1G3MYOFxVC3957qcJrG8ufOxuv39rDLUB/DPgL6tqvqqeAv4Q+Cd0Y9/1Wmp/2T3QCCX5Pppw
+nhV/SFAVX2zqp6uqmeA32ENelyvqgfa79+iue5zDuM7hs4DvlxV32xrWvP90erMMZHkIuCngQvbXlFoe+V/uJ2+FbgXeMXRfs5EBlRVXVpVp7edFO6k6TrmQuDzNF3JQNO1zH8bQ23/F7g/ySvbRdtoegro7e5mLLW1vg68PsmL2+t2C/WNfd/1WWp/zQA/39659HrgsYXTHlqZ9vfhKuDuqvpwz/Le6xnvAu7s33aV6zgpyd9ZmAbe1r7muI6hXfSc3lvr/dGjE8dEku3ArwDvqKrHe5ZvTDueWZKX09y0ceioP6yqJvoLeDPwx+30y2n6Mpuj+Ti+YUw1bQVmgduBPwJeSnON7HPAX7Tff3CM++xDwNdoDqxrgQ3j3Hc0B/+DwFM0fw1evNT+ojmdcSXNX2d30NyNOPbfw0n4At5Ac2roduC29uun2t+RO9rlM8APj7iOlwNfbb8OAr/aLl/zY4jmJqKHgZN7lo18f3TlmFiijjmaa14LvyMfbdv+bPvv9VXgy8DPLPfz7epIktRJE3mKT5J07DOgJEmdZEBJkjrJgJIkdZIBJUnqJANKktRJBpQkqZP+P70GC5nyfUbBAAAAAElFTkSuQmCC\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a1893fa90>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Visualization for sklearn LDA\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAEChJREFUeJzt3V+Infldx/H3t4lrL6wtmLmQJDarZovjH1gdtoVeuNIiyRaSm6KbUrQSGopG0S1CpLLKirC2F0Uh/knLEiu4Ie1FHWwkF3alILt2Z6kuzS6RMa5mGmGn27IgRWPk68WcrIezJznPnPzmzPc55/2CwDzP+e0zPz77e57P75x0TyMzkSSpmrfs9gQkSRrHgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSpp72794n379uWhQ4d269ffkxdeeOGbmbm02/MYZp7tmWlb5tnWIuS5awV16NAh1tbWduvX35OI+LfdnsMo82zPTNsyz7YWIU8/4lPvRcRTEfFqRHz9Dq9HRPxRRKxHxIsR8ZOznqOk7bOgNA/OA0fu8vpR4PDgzyngT2YwJwlwA3UvLCj1XmZ+BfjWXYYcBz6XW54D3hER3z+b2UluoKY1saBsf82B/cD1oeONwTmN4T3flhuo6XV5B3Ue278pHwAzF2POjf0/QouIUxGxFhFrm5ubOzytss7jPT9LnTdQi7Y+JxaU7b8jzuMDYJY2gINDxweAG+MGZua5zFzJzJWlpVL/q+KZ8Z6fuc4bqEVbny3+DsqPT7bJB8DMrQK/MHhn+h7g9cz8j92eVI+542+r8wZq0bQoKD8+aa/TA8A8t0TE08CzwLsiYiMiTkbExyLiY4Mhl4BrwDrwGeCXd2mq88Idf1tuoO6gxX+ou62PT4BzACsrK2MXtICODwDz3JKZJya8nsCvzGg6i8Ad/zYMNlAPA/siYgP4HeC7ADLzT9naQD3C1gbqO8Av7c5M62lRUKvA6Yi4ALwb278FHwCqzHt+G9xATW9iQe10+x8686XtzfguXnnyA82utcumfgCYZ3uLlmnfd/yL9u9rp+1mnhMLyvZvr+8PAM0373lVsWtfFrvIfABI0mR+1ZEkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJglLvRcSRiLgaEesRcWbM6z8QEc9ExNci4sWIeGQ35ilpeywo9VpE7AHOAkeBZeBERCyPDPtt4GJmPgg8CvzxbGepRecmajoWlPruIWA9M69l5k3gAnB8ZEwC3zv4+e3AjRnOr5d8oLbjJmp6nQrKxdqWeTa1H7g+dLwxODfsd4EPR8QGcAn41dlMrZ98oDbnJmpKEwvKxdqWeTYXY87lyPEJ4HxmHgAeAf4iIsau/Yg4FRFrEbG2ubnZeKq94QO1rWabqEVbn13eQblY2zLPtjaAg0PHB3hzXieBiwCZ+SzwVmDfuItl5rnMXMnMlaWlpR2Ybi/4QG2r2SZq0dZnl4JysbblR1JtPQ8cjoj7I+I+tt5xro6M+XfgfQAR8SNsFdTCLsAOfKC21XQTtUi6FJSLta1meVr4kJm3gNPAZeBltj4avRIRT0TEscGwjwMfjYh/Ap4GPpKZo5nr//lAbctN1JT2dhjTdbEega3FGhG3F+urLSY5Z5rlmZnngHMAKysrC/vAzcxLbL3THD73+NDPLwHvnfW8euyNByrwDbYeqB8aGXP7gXreB+rdZeatiLi9idoDPHV7EwWsZeYqW5uoz0TEb7C1YXUTRbeCcrG2
ZZ4qzQdqe26ipjOxoFysbZmn+sAHqiro8g7KxdqYeUrSZH6ThCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEGp9yLiSERcjYj1iDhzhzE/FxEvRcSViPjLWc9R0vZZUOq1iNgDnAWOAsvAiYhYHhlzGPgt4L2Z+aPAr898olpobqKm06mgDFeFPQSsZ+a1zLwJXACOj4z5KHA2M78NkJmvzniOveM9346bqOlNLCjDbc+bv6n9wPWh443BuWEPAA9ExN9HxHMRcWRms+sh7/nm3ERNqcs7KMNtyJu/uRhzLkeO9wKHgYeBE8BnI+IdYy8WcSoi1iJibXNzs+lEe8R7vq1mm6hFW59dCspw2/Lmb2sDODh0fAC4MWbMX2Xm/2TmvwJX2SqsN8nMc5m5kpkrS0tLOzLhHvCeb6vZJmrR1meXgjLctrz523oeOBwR90fEfcCjwOrImC8CPwMQEfvYyvfaTGfZL97zbTXdRC2SLgVluG158zeUmbeA08Bl4GXgYmZeiYgnIuLYYNhl4LWIeAl4BvjNzHxtd2bcC97zbbmJmlKXgjLctrz5G8vMS5n5QGb+UGb+/uDc45m5Ovg5M/OxzFzOzB/PzAu7O+PyvOcbchM1vb2TBmTmrYi4He4e4Knb4QJrg4fAZeBnB+H+L4Z7N2/c/MA32Lr5PzQy5otsvXM6782vWfOeby8zLwGXRs49PvRzAo8N/mhgYkGB4bbkza8+8J5XBZ0KSm1580vSZH7VkSSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoNR7EXEkIq5GxHpEnLnLuA9GREbEyizn10dm2pZ5TqdTQRluW+bZTkTsAc4CR4Fl4ERELI8Z9zbg14B/mO0M+8dM2zLP6U0sKMNtyzybewhYz8xrmXkTuAAcHzPu94BPAv81y8n1lJm2ZZ5T6vIOynDbMs+29gPXh443BufeEBEPAgcz868nXSwiTkXEWkSsbW5utp1pfzTNVO3yXLT12aWgDLct82wrxpzLN16MeAvwaeDjXS6WmecycyUzV5aWlhpNsXeaZeoaBRrmuWjrs0tBGW5b5tnWBnBw6PgAcGPo+G3AjwF/FxGvAO8BVv17vbtqlqlrFHCNTq1LQRluW+bZ1vPA4Yi4PyLuAx4FVm+/mJmvZ+a+zDyUmYeA54Bjmbm2O9PtBTNtyzyn1KWgDLct82woM28Bp4HLwMvAxcy8EhFPRMSx3Z1dP5lpW+Y5vb2TBmTmrYi4He4e4Knb4QJrmbl69ytomHm2l5mXgEsj5x6/w9iHZzGnvjPTtsxzOhMLCgy3NfOUpMn8JglJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkqyoCRJJVlQkqSSLChJUkkWlCSpJAtKklSSBSVJKsmCkiSVZEFJkkrqVFARcSQirkbEekScGfP6YxHxUkS8GBF/
GxHvbD/V+WGebZlne2balnlOZ2JBRcQe4CxwFFgGTkTE8siwrwErmfkTwBeAT7ae6Lwwz7bMsz0zbcs8p9flHdRDwHpmXsvMm8AF4PjwgMx8JjO/Mzh8DjjQdppzxTzbMs/2zLQt85xSl4LaD1wfOt4YnLuTk8DfjHshIk5FxFpErG1ubnaf5Xwxz7aa5QlmOuAabcs8p9SloGLMuRw7MOLDwArwqXGvZ+a5zFzJzJWlpaXus5wv5tlWszzBTAdco22Z55T2dhizARwcOj4A3BgdFBHvBz4B/HRm/neb6c0l82zLPNsz07bMc0pd3kE9DxyOiPsj4j7gUWB1eEBEPAj8GXAsM19tP825Yp5tmWd7ZtqWeU5pYkFl5i3gNHAZeBm4mJlXIuKJiDg2GPYp4HuAz0fEP0bE6h0ut/DMsy3zbM9M2zLP6XX5iI/MvARcGjn3+NDP7288r7lmnm2ZZ3tm2pZ5TsdvkpAklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqSQLSpJUkgUlSSrJgpIklWRBSZJKsqAkSSVZUJKkkvbu9gSkRXbozJeaXeuVJz/Q7Frqj3leQ76DkiSV5DsoSTM1zzt+tdWpoCLiCPCHwB7gs5n55Mjr3w18Dvgp4DXg5zPzlbZTnU7Fm8E8t5hnXWbalnlOZ+JHfBGxBzgLHAWWgRMRsTwy7CTw7cz8YeDTwB+0nui8MM+2zLM9M23LPKfX5e+gHgLWM/NaZt4ELgDHR8YcB/588PMXgPdFRLSb5lwxz7bMsz0zbcs8p9TlI779wPWh4w3g3Xcak5m3IuJ14PuAbw4PiohTwKnB4X9GxNVtzHXf6PVGxfR7jonXHrn+O6f+TYuR53avXyJPuKdMt7uGtmu7/75KZNrnPEeu3/c8oYfP0C4FNa7Fc4oxZOY54FyH3/nmSUSsZebKNP/sbl573K8bc26u8pzF9Yd/1ZhzU+UJ02c6R3lCgTVqnmPHlL3nd+raXT7i2wAODh0fAG7caUxE7AXeDnyrxQTnkHm2ZZ7tmWlb5jmlLgX1PHA4Iu6PiPuAR4HVkTGrwC8Ofv4g8OXMHLtDlXk2Zp7tmWlb5jmtzJz4B3gE+GfgX4BPDM49ARwb/PxW4PPAOvBV4Ae7XHc7f4BTra85i2svYp6zztQ85y9T8+zXGt2pa8fg4pIkleJXHUmSSrKgJEkllS+oiDgSEVcjYj0izjS+9lMR8WpEfL3ldSvbyTwH1zfTttc2z7bXNs+2197RPEsXVMevCLkX54EjDa9X2gzyBDN1jd4D82yr73mWLii6fUXI1DLzKyzWf2uwo3mCmeIavVfm2Vav86xeUOO+ImT/Ls1lHphne2balnm21es8qxdU56+oUSfm2Z6ZtmWebfU6z+oF1eUrQtSdebZnpm2ZZ1u9zrN6QXX5ihB1Z57tmWlb5tlWr/MsXVCZeQs4DVwGXgYuZuaVVtePiKeBZ4F3RcRGRJxsde2KdjpPMFNco/fEPNvqe55+1ZEkqaTS76AkSYvLgpIklWRBSZJKsqAkSSVZUJKkkiwoSVJJFpQkqaT/A93ym5ILvNrsAAAAAElFTkSuQmCC\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a1806a358>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAEh1JREFUeJzt3X+MZWV9x/H3p7tlqbSCLmtj+eFg2dCutlW7Bdtak0q1i2lZTSFdbJSkNLSpm9ZaE5cYCRL/ob82baQ/aKHSrRFwK+0krEUjJk0bXRkUhRW3DviDAapLQSwaxNVv/7hn8OZyh73Lzpz7LPN+JTdzznOeO/c7l/Ps557nnHtIVSFJUmt+YNoFSJI0jgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJatLaaRcw6sQTT6yZmZlplyGNddtttz1YVRumXcckHEtq1aTjqLmAmpmZYW5ubtplSGMl+fK0a5iUY0mtmnQcOcUnSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSAaVVaWbHTdMuQVo2Mztuekbu0waUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpTUoyRbkuxPMp9kx5jt65Jc323fm2RmZPupSR5N8ra+apamxYCSepJkDXAlcA6wCbggyaaRbhcBD1fV6cBO4IqR7TuBD610rVILDCipP2cC81V1T1U9DlwHbB3psxW4tlveDZydJABJXgfcA+zrqV5pqgwoqT8nAfcOrS90bWP7VNVB4BFgfZLjgLcD73qqF0hycZK5JHMHDhxYtsKlaTCgpP5kTFtN2OddwM6qevSpXqCqrqqqzVW1ecOGDU+zTKkNa6ddgLSKLACnDK2fDNy/RJ+FJGuB44GHgLOA85L8CXAC8L0kj1XVe1a+bGk6DCipP7cCG5OcBtwHbAPeMNJnFrgQ+DhwHnBLVRXwS4sdklwGPGo46Zluoik+L42Vjlx3Tmk7cDNwF3BDVe1LcnmSc7tuVzM45zQPvBV40niTVotDHkENXRr7agbTD7cmma2qzw11e+LS2CTbGFwa+5tD2700VgKqag+wZ6Tt0qHlx4DzD/E7LluR4qTGTHIE5aWxkqTeTRJQXhorSerdJAHlpbGSpN5NchWfl8ZKkno3SUB5aawkqXeHDKiqOphk8dLYNcA1i5fGAnNVNcvg0thd3aWxDzEIMUmSnraJvqjrpbGSpL55Lz5JUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDCipR0m2JNmfZD7JjjHb1yW5vtu+N8lM135mktu7x2eSvL7v2qW+GVBST5KsAa4EzgE2ARck2TTS7SLg4ao6HdgJXNG13wlsrqqXAFuAv0uytp/KpekwoKT+nAnMV9U9VfU4cB2wdaTPVuDabnk3cHaSVNW3qupg134sUL1ULE2RASX15yTg3qH1ha5tbJ8ukB4B1gMkOSvJPuAO4PeGAusJSS5OMpdk7sCBAyvwJ0j9MaCk/mRM2+iR0JJ9qmpvVb0I+DngkiTHPqlj1VVVtbmqNm/YsOGIC5amaaKA8sSutCwWgFOG1k8G7l+qT3eO6XjgoeEOVXUX8E3gxStWqdSAQwaUJ3alZXMr
sDHJaUmOAbYBsyN9ZoELu+XzgFuqqrrnrAVI8gLgDOBL/ZQtTcckR1Ce2JWWQTcWtgM3A3cBN1TVviSXJzm363Y1sD7JPPBWYHHG4hXAZ5LcDtwI/H5VPdjvXyD1a5KjmXEnds9aqk9VHUyyeGL3wSRnAdcALwDeuNSJXeBigFNPPfVw/wbpqFFVe4A9I22XDi0/Bpw/5nm7gF0rXqDUkEmOoDyxK0nq3SQB5YldSVLvJgkoT+xKknp3yHNQ3TmlxRO7a4BrFk/sAnNVNcvgxO6u7sTuQwxCDAYndnck+Q7wPTyxK0ma0ESXfHtiV5LUN+8kIUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJapIBJUlqkgElSWqSAaVVZ2bHTYfVLmk6DChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoqSdJtiTZn2Q+yY4x29club7bvjfJTNf+6iS3Jbmj+/mqvmuXpsGAknqQZA1wJXAOsAm4IMmmkW4XAQ9X1enATuCKrv1B4Ner6qeAC4Fdy13fzI6b/F/eqzkGlNSPM4H5qrqnqh4HrgO2jvTZClzbLe8Gzk6Sqvp0Vd3fte8Djk2yrpeqpSmaKKCcmpCO2EnAvUPrC13b2D5VdRB4BFg/0uc3gE9X1bfHvUiSi5PMJZk7cODAshQuTcshA6r1qQnpKJExbXU4fZK8iMHY+t2lXqSqrqqqzVW1ecOGDU+rUKkVkxxBOTUhHbkF4JSh9ZOB+5fqk2QtcDzwULd+MnAj8KaqunvFq5UaMElArfjUhNMSWgVuBTYmOS3JMcA2YHakzyyDmQaA84BbqqqSnADcBFxSVf/VW8XSlE0SUCs+NeG0hJ7pug9u24GbgbuAG6pqX5LLk5zbdbsaWJ9kHngrsHi+dztwOvDOJLd3j+f1/CdIvVs7QZ/DmZpYcGpCGq+q9gB7RtouHVp+DDh/zPPeDbx7xQuUGjPJEZRTE5Kk3h0yoJyaWNpKfblxqd/Zxxcpj+Yva/plU+mZZZIpPqcmJEm9804SkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYddQE1s+OmscuSpGeWoy6gJEmrgwElSWqSASVJapIBJUlqkgElrVJH20VGh1PvzI6bpv73TaOGPl9z9LUW15ezBgNKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSupJki1J9ieZT7JjzPZ1Sa7vtu9NMtO1r0/ysSSPJnlP33VL0zJRQDmwpCOTZA1wJXAOsAm4IMmmkW4XAQ9X1enATuCKrv0x4J3A23oqV2rCIQPKgSUtizOB+aq6p6oeB64Dto702Qpc2y3vBs5Okqr6ZlX9J4PxJK0akxxBObCkI3cScO/Q+kLXNrZPVR0EHgHWH86LJLk4yVySuQMHDhxBudL0TRJQvQws6RkuY9rqafR5SlV1VVVtrqrNGzZsOJynSs2ZJKBWfGD5qU+rwAJwytD6ycD9S/VJshY4Hniol+qkBk0SUCs+sPzUp1XgVmBjktOSHANsA2ZH+swCF3bL5wG3VNVhHUFJzyRrJ+jzxMAC7mMwsN4w0mdxYH0cB5b0JFV1MMl24GZgDXBNVe1LcjkwV1WzwNXAriTzDD7gbVt8fpIvAc8GjknyOuA1VfW5vv8OqU+HDCgHlrQ8qmoPsGek7dKh5ceA85d47syKFic1
aJIjKAeWJKl33klCktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUJANKktQkA0qS1CQDSpLUpIkCKsmWJPuTzCfZMWb7uiTXd9v3JpkZ2nZJ174/ya8uX+nS0cexJE3ukAGVZA1wJXAOsAm4IMmmkW4XAQ9X1enATuCK7rmbgG3Ai4AtwF93v09adRxL0uGZ5AjqTGC+qu6pqseB64CtI322Atd2y7uBs5Oka7+uqr5dVV8E5rvfJ61GjiXpMKSqnrpDch6wpap+p1t/I3BWVW0f6nNn12ehW78bOAu4DPhEVf1z13418KGq2j3yGhcDF3erZwD7j/xPe9pOBB6c4usvpdW6oN3aVqKuF1TVhqfzxFU4lsZpdV+BdmtrtS54+rVNNI7WTvCLMqZtNNWW6jPJc6mqq4CrJqhlxSWZq6rN065jVKt1Qbu1NVjXqhpL4zT43+QJrdbWal2w8rVNMsW3AJwytH4ycP9SfZKsBY4HHprwudJq4ViSDsMkAXUrsDHJaUmOYXCidnakzyxwYbd8HnBLDeYOZ4Ft3ZVJpwEbgU8uT+nSUcexJB2GQ07xVdXBJNuBm4E1wDVVtS/J5cBcVc0CVwO7kswz+LS3rXvuviQ3AJ8DDgJvrqrvrtDfslxanR5ptS5ot7am6lqFY2mcpv6bjGi1tlbrghWu7ZAXSUiSNA3eSUKS1CQDSpLUpFUbUElOSfKxJHcl2ZfkD7v2y5Lcl+T27vHaKdX3pSR3dDXMdW3PTfKRJF/ofj6n55rOGHpfbk/yjSRvmdZ7luSaJF/rvju02Db2PcrAX3W3Cvpskpf1UeNql+SEJLuTfL4baz8/7f24q+uPunF/Z5L3Jzm2u3hlb1fX9d2FLH3U0ux+vERtf9r99/xskhuTnDC0bVlvx7VqA4rBieY/rqqfBF4OvDnfv+3Mzqp6SffYM70S+eWuhsXvGewAPlpVG4GPduu9qar9i+8L8LPAt4Abu83TeM/ey+C2P8OWeo/OYXDl20YGX2T9m55qXO3+Evj3qvoJ4GeAu5jyfpzkJOAPgM1V9WIGF6xsY3BbqZ1dXQ8zuO1UH95Lu/vxuNo+Ary4qn4a+G/gEliZ23Gt2oCqqgeq6lPd8v8xGDgnTbeqQxq+Dc61wOumWMvZwN1V9eVpFVBV/8HgSrdhS71HW4F/qoFPACckeX4/la5OSZ4NvJLBlYlU1eNV9XXa2I/XAj/UfdfsWcADwKsY3F6q17pa3o/H1VZVH66qg93qJxh8J2+xtmW9HdeqDahhGdwx+qXA3q5pe3f4es00ph86BXw4yW0Z3L4G4Eer6gEYBCzwvCnVBoNPSu8fWm/hPYOl36OTgHuH+i3Q/geSo90LgQPAPyb5dJJ/SHIcU96Pq+o+4M+ArzAIpkeA24CvD/3DO+3942jZj38b+FC3vOy1rfqASvLDwL8Ab6mqbzA4ZP5x4CUMdt4/n1Jpv1hVL2NwSP/mJK+cUh1P0s3Nnwt8oGtq5T17KhPdKkjLai3wMuBvquqlwDfpeTpvnO4D1FbgNODHgOMYjLNRLe4fzezHSd7B4FTJ+xabxnQ7otpWdUAl+UEG4fS+qvogQFV9taq+W1XfA/6eKd0xuqru735+jcF5njOBry4eznc/vzaN2hgM5k9V1Ve7Gpt4zzpLvUfeKqh/C8BCVS3OTOxmEFjT3o9/BfhiVR2oqu8AHwR+gcF02eLNC6a9fzS9Hye5EPg14Lfq+1+mXfbaVm1AJQmDufG7quovhtqH53NfD9w5+tweajsuyY8s
LgOv6eoYvg3OhcC/9V1b5wKGpvdaeM+GLPUezQJv6q6CejnwyOIUilZGVf0PcG+SM7qmsxncCWPa+/FXgJcneVb378BiXR9jcHupadU1rNn9OMkW4O3AuVX1raFNy387rqpalQ/gFQwOPz8L3N49XgvsAu7o2meB50+hthcCn+ke+4B3dO3rGVzR84Xu53OnUNuzgP8Fjh9qm8p7xiAkHwC+w+DT20VLvUcMph+uBO7uat087X1wNTwYTPvOdfvGvwLPaWQ/fhfweQYfpnYB67px90kGJ/c/AKzrqZZm9+MlaptncK5p8d/Nvx3q/46utv3AOUf6+t7qSJLUpFU7xSdJapsBJUlqkgElSWqSASVJapIBJUlqkgElSWqSASVJatL/A2b5KEliY9AzAAAAAElFTkSuQmCC\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x1a17086278>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| } | |
| ], | |
| "source": [ | |
| "mylda2 = myLDA(N_K=2)\n", | |
| "\n", | |
| "test_doc = ['Dropbox, the online file storage company, said on Monday that it hoped to raise as much as $648 million in its forthcoming stock market debut, setting up the final stage along its path to becoming the next big publicly traded Silicon Valley darling.',\n", | |
| " 'In an updated prospectus, Dropbox said that it planned to sell 36 million shares between $16 and $18 a share. At the midpoint of that range, the company would be valued at roughly $7.5 billion, factoring in restricted stock units and options. That is down from its most recent valuation of $10 billion from previous private investors.',\n", | |
| " 'Dropbox is likely to begin trading on the Nasdaq stock market — under the ticker symbol “DBX” — by the end of next week.',\n", | |
| " 'United had hoped the sweepstakes approach would “build excitement and a sense of accomplishment.” But after workers deluged the company with hostile comments, the airline said last week that it was “pressing the pause button on these changes.”',\n", | |
| " '“Lotteries in general may be more effective than fixed payments, as people tend to overweigh small probabilities in making decisions,” the paper noted.']\n", | |
| "\n", | |
| "tf_vectorizer2 = CountVectorizer()\n", | |
| "tf2 = tf_vectorizer2.fit_transform(test_doc)\n", | |
| "Theta, Beta = mylda2.lda_cgs(tf2,max_iter=100)\n", | |
| "print(\"Visualization for myLDA via collapsed Gibbs sampling\")\n", | |
| "mylda2.visualization(Theta,Beta,m=10)\n", | |
| "\n", | |
| "tf_vectorizer_sl = CountVectorizer()\n", | |
| "tf_sl = tf_vectorizer_sl.fit_transform(test_doc)\n", | |
| "lda_sl = LatentDirichletAllocation(n_components=2, learning_method='batch')\n", | |
| "lda_sl.fit(tf_sl)\n", | |
| "doc_topic_sl = lda_sl.transform(tf_sl)  # run inference once and reuse the result\n", | |
| "Theta_sl = doc_topic_sl / doc_topic_sl.sum(axis=1)[:, np.newaxis]\n", | |
| "Beta_sl = lda_sl.components_ / lda_sl.components_.sum(axis=1)[:, np.newaxis]\n", | |
| "print(\"Visualization for sklearn LDA\")\n", | |
| "mylda2.visualization(Theta_sl,Beta_sl,m=10)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## References\n", | |
| "\n", | |
| "1. David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet Allocation. http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf\n", | |
| "2. David M. Blei. Lecture on topic models. http://videolectures.net/mlss09uk_blei_tm/.\n", | |
| "3. Chandler May, Alex Clemmer and Benjamin Van Durme. Particle Filter Rejuvenation and Latent Dirichlet Allocation. http://www.aclweb.org/anthology/P14-2073.\n", | |
| "4. William M. Darling. A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling. http://u.cs.biu.ac.il/~89-680/darling-lda.pdf." | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.6.3" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |