Experiments Based on the Paper [From Word Embedding to Document Distance]
Published: 2021-05-07 23:34:42  Category: Featured Articles

WMD and EMD experiment

Environment setup

Install Anaconda with 64-bit Python 3, then install the gensim package:

conda install -c conda-forge gensim
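A quick way to confirm the install worked is to ask Python whether it can locate the package. This is a minimal sketch: the helper name `is_importable` is hypothetical, and it only checks that a top-level module can be found, not that it loads without errors.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module by name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True once gensim is installed in the active environment.
    print("gensim importable:", is_importable("gensim"))
```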

Clone the WMD code from GitHub:

git clone git@github.com:mkusner/wmd.git

Download Google's pre-trained Word2Vec model. The compressed archive is about 1.5GB and expands to about 3GB. Place the extracted file in the WMD directory.
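Because the extracted model is around 3GB, it can help to inspect only the first few vectors before committing to a full load. The sketch below assumes the file uses the standard word2vec binary layout (an ASCII header line with the vocabulary size and dimension, then each word followed by a space and its float32 values). The function name `read_word2vec_binary` and the `limit` parameter are hypothetical, and this is not necessarily how get_word_vectors.py reads the model.

```python
import struct


def read_word2vec_binary(path, limit=None):
    """Read up to `limit` vectors from a word2vec binary file into a dict."""
    vectors = {}
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <dim>\n" in ASCII.
        vocab_size, dim = (int(x) for x in f.readline().split())
        count = vocab_size if limit is None else min(limit, vocab_size)
        for _ in range(count):
            # Each word is a byte string terminated by a single space.
            chars = bytearray()
            while True:
                ch = f.read(1)
                if ch == b"":
                    raise EOFError("file ended while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # skip the newline some writers put after a vector
                    chars += ch
            word = chars.decode("utf-8", errors="replace")
            # The vector is `dim` little-endian float32 values.
            data = f.read(4 * dim)
            if len(data) != 4 * dim:
                raise EOFError("file ended while reading a vector")
            vectors[word] = struct.unpack("<%df" % dim, data)
    return vectors
```

Calling it with a small `limit`, for example `read_word2vec_binary(model_path, limit=1000)` where `model_path` points at the extracted file, keeps memory use low while checking that the download is intact.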

Generating the word vectors

Run get_word_vectors.py on all_twitter_by_line.txt, the Twitter dataset, to produce the word vectors and the BOW representation:

python get_word_vectors.py all_twitter_by_line.txt twitter_vec.pk twitter_vec.mat

????????????????????????????????????????????
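To illustrate the BOW representation that WMD works with, the paper represents each document as a normalized bag of words (nBOW), where a word's weight is its count divided by the total word count. The sketch below shows that idea on a single line of text. It is an illustration only: the function name `nbow` is hypothetical, tokenization is a plain whitespace split, and stop-word removal and vocabulary filtering are left out, so it may differ from what get_word_vectors.py actually does.

```python
from collections import Counter


def nbow(text):
    """Return a normalized bag of words: word -> count / total word count."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    # An empty line yields an empty dict, so no division by zero occurs.
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    weights = nbow("the cat sat on the mat")
    for word, weight in sorted(weights.items()):
        print(word, round(weight, 3))
```

The weights of one document always sum to 1, which is what lets WMD treat a document as a unit of mass to be moved between word vectors.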

Compiling the EMD module

The EMD module has to be compiled before it can be used. On Windows the steps are:

  • ??Mingw-x64??????Visual Studio????VSCode?

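  • Install SWIG (from forge), then add its directory to PATH.

  • Generate the Python DLL import library:

    • Copy python37.dll (the Python DLL) into the working directory.
    • Run Mingw's mingw-w64.bat to open a command prompt.
  • Generate the Python definition file: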

    gendef python37.dll

    This produces python37.def.

  • Generate the import library:

    dlltool --dllname python37.dll --def python37.def --output-lib libpython37.a

    Copy libpython37.a to the D:\ProgramData\Anaconda3\libs directory.

  • Compile the EMD module:

    gcc -c emd.c -o emd.o -fPIC -I D:/ProgramData/Anaconda3/include
    swig -python emd.i
    gcc -c emd_wrap.c -o emd_wrap.o -fPIC -I D:/ProgramData/Anaconda3/include
    gcc -shared -L D:/ProgramData/Anaconda3/libs -o _emd.pyd emd.o emd_wrap.o -lpython37

    Copy emd.py and _emd.pyd to the target directory.

  • ?????????????????

    Computing the WMD distance

    Run the following command to compute the WMD distances:

    python wmd_distance.py twitter_vec.pk all_twitter_by_line.txt

    ????????20????????????

    ?????????????WMD?EMD???????????????
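To make concrete what a WMD-style distance between two lines of text measures, here is a small sketch of the relaxed WMD lower bound described in the paper, in which each word sends all of its weight to the nearest word of the other document. This is not the exact WMD that wmd_distance.py computes with the compiled EMD module. The function names and the 2-D `TOY_VECTORS` are hypothetical stand-ins for real Word2Vec embeddings.

```python
import math
from collections import Counter


def nbow(tokens):
    """Normalized bag of words: word -> count / total word count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(source, target, vectors):
    """Cost when every source word moves all its weight to its nearest target word."""
    return sum(
        weight * min(euclidean(vectors[word], vectors[other]) for other in target)
        for word, weight in source.items()
    )


def relaxed_wmd(tokens_a, tokens_b, vectors):
    """Relaxed WMD: the larger of the two one-sided costs (a lower bound on WMD)."""
    a = nbow([t for t in tokens_a if t in vectors])
    b = nbow([t for t in tokens_b if t in vectors])
    if not a or not b:
        raise ValueError("each document needs at least one word with a vector")
    return max(one_sided_cost(a, b, vectors), one_sided_cost(b, a, vectors))


# Hypothetical 2-D embeddings, chosen so related words sit close together.
TOY_VECTORS = {
    "obama": (0.0, 0.0),
    "president": (0.0, 1.0),
    "speaks": (3.0, 0.0),
    "greets": (3.0, 1.0),
}

if __name__ == "__main__":
    print(relaxed_wmd(["obama", "speaks"], ["president", "greets"], TOY_VECTORS))
```

The two toy documents share no words, yet the distance stays small because each word has a close neighbour in the other document, which is the property that motivates WMD over plain BOW overlap.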
