Second-order co-occurrence pointwise mutual information

In computational linguistics, second-order co-occurrence pointwise mutual information (SOC-PMI) is a semantic similarity measure. To assess the degree of association between two given words, it uses pointwise mutual information (PMI) to sort lists of important neighbor words of the two target words from a large corpus; the similarity is then computed from the words the two sorted lists share.

History

The PMI-IR method used AltaVista's Advanced Search query syntax to calculate probabilities, and its "NEAR" search operator was essential to the method. Because AltaVista no longer offers that operator, the PMI-IR method cannot be implemented in its original form on newer systems. From the algorithmic point of view, the advantage of SOC-PMI is that it can calculate the similarity between two words that do not co-occur frequently, because they co-occur with the same neighboring words. The British National Corpus (BNC), for example, has been used as a source of frequencies and contexts.

Methodology

The method considers the words that are common to both lists and aggregates their PMI values (from the opposite list) to calculate the relative semantic similarity. The pointwise mutual information function is defined only for those words having $f^{b}(t_{i},w)>0$:

$$f^{\text{pmi}}(t_{i},w)=\log_{2}\frac{f^{b}(t_{i},w)\times m}{f^{t}(t_{i})\,f^{t}(w)},$$

where $f^{t}(t_{i})$ is the number of times the type $t_{i}$ appears in the entire corpus, $f^{b}(t_{i},w)$ is the number of times word $t_{i}$ appears with word $w$ in a context window, and $m$ is the total number of tokens in the corpus. Now, for word $w$, we define a set of words, $X^{w}$, sorted in descending order by their PMI values with $w$, and take the top $\beta$ words having $f^{\text{pmi}}(t_{i},w)>0$.

The set $X^{w}$ contains words $X_{i}^{w}$:

$$X^{w}=\{X_{i}^{w}\},\quad i=1,2,\ldots,\beta,$$
where
$$f^{\text{pmi}}(X_{1}^{w},w)\geq f^{\text{pmi}}(X_{2}^{w},w)\geq\cdots\geq f^{\text{pmi}}(X_{\beta-1}^{w},w)\geq f^{\text{pmi}}(X_{\beta}^{w},w).$$
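
As a minimal sketch of these definitions, the following Python code computes $f^{\text{pmi}}$ and the top-$\beta$ neighbor set $X^{w}$ from precomputed corpus counts. The data structures unigram_freq (type frequencies $f^{t}$), cooc_freq (windowed co-occurrence counts $f^{b}$, keyed by word pairs), and total_tokens ($m$) are illustrative assumptions, not part of the original description.

```python
import math

def f_pmi(t, w, cooc_freq, unigram_freq, total_tokens):
    """PMI of type t with word w; defined only when the pair co-occurs (f^b > 0)."""
    f_b = cooc_freq.get((t, w), 0)  # co-occurrences of t and w within a context window
    if f_b == 0:
        return None
    return math.log2(f_b * total_tokens / (unigram_freq[t] * unigram_freq[w]))

def top_beta_neighbors(w, beta, cooc_freq, unigram_freq, total_tokens):
    """X^w: the beta neighbor words of w with the highest positive PMI values,
    sorted in descending order of PMI."""
    scored = []
    for (t, w2) in cooc_freq:
        if w2 != w:
            continue
        pmi = f_pmi(t, w, cooc_freq, unigram_freq, total_tokens)
        if pmi is not None and pmi > 0:
            scored.append((t, pmi))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending by PMI
    return scored[:beta]
```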

A rule of thumb is used to choose the value of $\beta$. The $\beta$-PMI summation function of a word is defined with respect to another word. For word $w_{1}$ with respect to word $w_{2}$ it is:

$$f(w_{1},w_{2},\beta)=\sum_{i=1}^{\beta}\bigl(f^{\text{pmi}}(X_{i}^{w_{1}},w_{2})\bigr)^{\gamma},$$

where only terms with $f^{\text{pmi}}(X_{i}^{w_{1}},w_{2})>0$ are included, so the sum covers all the positive PMI values of words in the set $X^{w_{2}}$ that are also common to the words in the set $X^{w_{1}}$. In other words, this function aggregates the positive PMI values of all the semantically close words of $w_{2}$ which are also common in $w_{1}$'s list. The exponent $\gamma$ should have a value greater than 1. So, the $\beta$-PMI summation function for word $w_{1}$ with respect to word $w_{2}$ with $\beta=\beta_{1}$, and the $\beta$-PMI summation function for word $w_{2}$ with respect to word $w_{1}$ with $\beta=\beta_{2}$, are

$$f(w_{1},w_{2},\beta_{1})=\sum_{i=1}^{\beta_{1}}\bigl(f^{\text{pmi}}(X_{i}^{w_{1}},w_{2})\bigr)^{\gamma}$$

and

$$f(w_{2},w_{1},\beta_{2})=\sum_{i=1}^{\beta_{2}}\bigl(f^{\text{pmi}}(X_{i}^{w_{2}},w_{1})\bigr)^{\gamma},$$

respectively.
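
Continuing the sketch above (with the same hypothetical count structures), the $\beta$-PMI summation for $w_{1}$ with respect to $w_{2}$ adds up, raised to the power $\gamma$, the positive PMI values with $w_{2}$ of the top-$\beta$ neighbors of $w_{1}$:

```python
def beta_pmi_summation(w1, w2, beta, gamma, cooc_freq, unigram_freq, total_tokens):
    """f(w1, w2, beta): sum of (PMI(x, w2))^gamma over the top-beta neighbors x of w1,
    keeping only neighbors whose PMI with w2 is positive."""
    total = 0.0
    for x, _ in top_beta_neighbors(w1, beta, cooc_freq, unigram_freq, total_tokens):
        pmi_with_w2 = f_pmi(x, w2, cooc_freq, unigram_freq, total_tokens)
        if pmi_with_w2 is not None and pmi_with_w2 > 0:
            total += pmi_with_w2 ** gamma
    return total
```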

Finally, the semantic PMI similarity function between the two words, $w_{1}$ and $w_{2}$, is defined as

$$\mathrm{Sim}(w_{1},w_{2})=\frac{f(w_{1},w_{2},\beta_{1})}{\beta_{1}}+\frac{f(w_{2},w_{1},\beta_{2})}{\beta_{2}}.$$
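
A sketch of the final similarity score then simply combines the two summations; the parameters $\beta_{1}$, $\beta_{2}$, and $\gamma$ are tuning choices that the formula itself does not fix:

```python
def soc_pmi_similarity(w1, w2, beta1, beta2, gamma,
                       cooc_freq, unigram_freq, total_tokens):
    """Sim(w1, w2): each word's top-neighbor list contributes its positive PMI mass
    toward the other word, averaged per neighbor."""
    s1 = beta_pmi_summation(w1, w2, beta1, gamma, cooc_freq, unigram_freq, total_tokens)
    s2 = beta_pmi_summation(w2, w1, beta2, gamma, cooc_freq, unigram_freq, total_tokens)
    return s1 / beta1 + s2 / beta2
```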

The semantic word similarity is normalized so that it yields a similarity score between $0$ and $1$ inclusive. The normalization algorithm returns a normalized similarity score between two words. It takes as arguments the two words, $r_{i}$ and $s_{j}$, and a maximum value, $\lambda$, that can be returned by the semantic similarity function $\mathrm{Sim}()$. For example, the SOC-PMI method returns 0.986 for the words cemetery and graveyard with $\lambda=20$.
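
The article does not spell out the normalization formula; one straightforward reading, scaling by the maximum value $\lambda$ and clipping to $[0,1]$, would look like the following sketch (an assumption, not the published procedure):

```python
def normalized_similarity(w1, w2, lam, **sim_kwargs):
    """Hypothetical normalization: divide Sim() by an assumed maximum lam and cap at 1.
    The exact normalization used by SOC-PMI is not specified in this article."""
    sim = soc_pmi_similarity(w1, w2, **sim_kwargs)
    return min(sim / lam, 1.0)
```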
