Feeds:
Posts
Comments

Archive for December, 2010

Min hash outline

Min hash is an implementation of Local Sensitive Hash. We have a set of binary vectors from a high dimensional feature space. The similarity between two vectors is defined as Jaccard similarity: Jaccard(V1,V2)=| V1 & V2 | / | V1 | V2 |. We want to find pairs of vectors of high similarity.
To avoid the time consuming pairwise similarity calculation, a heuristic method is to generate a group of hash values for each vector and only calculate similarities of pairs of vectors which have at least one hash value in common. Min hash is such a hash function, Hmin(V)= min { i | V[P[i]]=1 }, where P[] is a random permutation of features. Min hash function has a good property that
Prob{ Hmin(V1)=Hmin(V2) }=Jaccard(V1,V2). The property guarantees that we wouldn’t miss many pairs.

Advertisements

Read Full Post »

Ubuntu下用tor翻墙

  1. Tor project下载一个tor浏览器。(需要翻墙……)
  2. 解压后用脚本start-tor-browser启动tor浏览器。
  3. 在出来的control  panel的Setup Relaying–>Network 勾选”My ISP blocks connections …”。
  4. 发送标题为”get bridges”的邮件到bridges@torproject.org获取三个bridges地址和端口。将bridge通过control panel地址加入配置。
  5. 连接成功后会出现tor浏览器自带的firefox浏览器。
  6. 可以使用其他浏览器,只需要配置代理服务器Localhost:8118
  7. 最好使用如下的自动代理配置脚本。
    function FindProxyForURL(url, host)
    {
    if (
    dnsDomainIs(host, “facebook.com“)||
    dnsDomainIs(host, “twitter.com“)||
    dnsDomainIs(host, “appspot.com“)||
    dnsDomainIs(host, “blogspot.com“)||
    dnsDomainIs(host, “wordpress.com“)||
    dnsDomainIs(host, “torproject.org“)||
    shExpMatch(url,”*wordpress.com“)||
    shExpMatch(url,”*torproject.org“)
    )
    return “PROXY 127.0.0.1:8118“;
    else
    return “DIRECT”;
    }

Read Full Post »