File name: IR

  • Category: JSP source code/Java
  • Resource attributes: [Java] [Source code]
  • Upload time: 2012-11-26
  • File size: 3.64 MB

Description (all downloadable content comes from the Internet; study and use it at your own discretion)

Index term selection



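1.  Word segmentation and term-frequency counting: use the chosen word segmentation software to segment the documents and count term frequencies, producing a DocIndex file with the structure: document number, frequency, term. Keep the intermediate results and design a suitable data structure to store them.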

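2.  Assigning term weights: assign term weights in two ways, normalized term frequency (tf_i = tf_i / max(tf)) and tf*idf. From the DocIndex file, generate the DocIndex(tf) and DocIndex(tf*idf) files. Take care in choosing the threshold that decides which terms are kept and which are discarded.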

3.  Building the inverted file: convert the DocIndex(tf) and DocIndex(tf*idf) files into the DocInvert(tf) and DocInvert(tf*idf) files.
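The three steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the code shipped in the archive: the class name IndexSketch and all method names are hypothetical, the documents are assumed to be already segmented into term lists, idf is assumed to be ln(N / df) (the description does not say which variant is used), and the threshold is assumed to be a simple lower bound on the weight.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the three indexing steps; not the archive's code. */
class IndexSketch {

    /** Step 1: raw term counts per document (docId -> term -> count). */
    static Map<String, Map<String, Integer>> countTerms(Map<String, List<String>> docs) {
        Map<String, Map<String, Integer>> docIndex = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String term : doc.getValue()) {
                counts.merge(term, 1, Integer::sum);
            }
            docIndex.put(doc.getKey(), counts);
        }
        return docIndex;
    }

    /** Step 2a: normalized tf = tf / (largest tf in the same document). */
    static Map<String, Map<String, Double>> normalizedTf(Map<String, Map<String, Integer>> docIndex) {
        Map<String, Map<String, Double>> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> doc : docIndex.entrySet()) {
            Map<String, Double> w = new TreeMap<>();
            if (!doc.getValue().isEmpty()) {
                int maxTf = Collections.max(doc.getValue().values());
                for (Map.Entry<String, Integer> e : doc.getValue().entrySet()) {
                    w.put(e.getKey(), e.getValue() / (double) maxTf);
                }
            }
            weights.put(doc.getKey(), w);
        }
        return weights;
    }

    /** Step 2b: tf*idf with idf = ln(N / df); the idf variant is an assumption. */
    static Map<String, Map<String, Double>> tfIdf(Map<String, Map<String, Double>> tf) {
        Map<String, Integer> df = new TreeMap<>();
        for (Map<String, Double> terms : tf.values()) {
            for (String term : terms.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = tf.size();
        Map<String, Map<String, Double>> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Double>> doc : tf.entrySet()) {
            Map<String, Double> w = new TreeMap<>();
            for (Map.Entry<String, Double> e : doc.getValue().entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                w.put(e.getKey(), e.getValue() * idf);
            }
            weights.put(doc.getKey(), w);
        }
        return weights;
    }

    /** Step 3: invert to term -> docId -> weight, keeping weights >= threshold. */
    static Map<String, Map<String, Double>> invert(Map<String, Map<String, Double>> docWeights,
                                                   double threshold) {
        Map<String, Map<String, Double>> inverted = new TreeMap<>();
        for (Map.Entry<String, Map<String, Double>> doc : docWeights.entrySet()) {
            for (Map.Entry<String, Double> e : doc.getValue().entrySet()) {
                if (e.getValue() >= threshold) {
                    inverted.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                            .put(doc.getKey(), e.getValue());
                }
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        // Two tiny, already-segmented documents (made-up sample data).
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("d1", Arrays.asList("index", "term", "index"));
        docs.put("d2", Arrays.asList("term", "weight"));

        Map<String, Map<String, Double>> tf = normalizedTf(countTerms(docs));
        System.out.println(invert(tf, 0.0));          // inverted file for normalized tf
        System.out.println(invert(tfIdf(tf), 0.1));   // inverted file for tf*idf
    }
}
```

In this sample, "term" occurs in both documents, so its idf is ln(2/2) = 0 and the 0.1 threshold drops it from the tf*idf inverted file, while it still appears in the normalized-tf one. A real run would presumably read and write the DocIndex and DocInvert files instead of in-memory maps.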

File list

信息检索\ir_work1\.classpath

........\........\.project

........\........\.settings\org.eclipse.core.resources.prefs

........\........\.........\org.eclipse.jdt.core.prefs

........\........\bin\org\main\CreateIndexDocument.class

........\........\...\...\....\CreateInvertDocument.class

........\........\...\...\....\Util$1.class

........\........\...\...\....\Util.class

........\........\...\pojo\Token.class

........\........\src\org\main\CreateIndexDocument.java

........\........\...\...\....\CreateInvertDocument.java

........\........\...\...\....\Util.java

........\........\...\pojo\Token.java

........\paoding\analyzer.bat

........\.......\analyzer.sh

........\.......\build.bat

........\.......\build.xml

........\.......\classes\net\paoding\analysis\analyzer\estimate\Estimate$CToken.class

........\.......\.......\...\.......\........\........\........\Estimate$LinePrintGate.class

........\.......\.......\...\.......\........\........\........\Estimate$PrintGate.class

........\.......\.......\...\.......\........\........\........\Estimate$PrintGateToken.class

........\.......\.......\...\.......\........\........\........\Estimate$StringReaderEx.class

........\.......\.......\...\.......\........\........\........\Estimate.class

........\.......\.......\...\.......\........\........\........\TryPaodingAnalyzer.class

........\.......\.......\...\.......\........\........\impl\CompiledFileDictionaries$1.class

........\.......\.......\...\.......\........\........\....\CompiledFileDictionaries.class

........\.......\.......\...\.......\........\........\....\MaxWordLengthTokenCollector.class

........\.......\.......\...\.......\........\........\....\MostWordsModeDictionariesCompiler$1.class

........\.......\.......\...\.......\........\........\....\MostWordsModeDictionariesCompiler.class

........\.......\.......\...\.......\........\........\....\MostWordsTokenCollector$LinkedToken.class

........\.......\.......\...\.......\........\........\....\MostWordsTokenCollector.class

........\.......\.......\...\.......\........\........\....\SortingDictionariesCompiler.class

........\.......\.......\...\.......\........\........\PaodingAnalyzer.class

........\.......\.......\...\.......\........\........\PaodingAnalyzerBean.class

........\.......\.......\...\.......\........\........\PaodingTokenizer.class

........\.......\.......\...\.......\........\........\TokenCollector.class

........\.......\.......\...\.......\........\Constants.class

........\.......\.......\...\.......\........\dictionary\BinaryDictionary.class

........\.......\.......\...\.......\........\..........\Dictionary.class

........\.......\.......\...\.......\........\..........\DictionaryDelegate.class

........\.......\.......\...\.......\........\..........\HashBinaryDictionary$SubDictionaryWrap.class

........\.......\.......\...\.......\........\..........\HashBinaryDictionary.class

........\.......\.......\...\.......\........\..........\Hit.class

........\.......\.......\...\.......\........\..........\support\detection\Detector$1.class

........\.......\.......\...\.......\........\..........\.......\.........\Detector.class

........\.......\.......\...\.......\........\..........\.......\.........\Difference.class

........\.......\.......\...\.......\........\..........\.......\.........\DifferenceListener.class

........\.......\.......\...\.......\........\..........\.......\.........\ExtensionFileFilter.class

........\.......\.......\...\.......\........\..........\.......\.........\Node.class

........\.......\.......\...\.......\........\..........\.......\.........\Snapshot$InnerNode.class

........\.......\.......\...\.......\........\..........\.......\.........\Snapshot.class

........\.......\.......\...\.......\........\..........\.......\filewords\FileWordsReader.class

........\.......\.......\...\.......\........\..........\.......\.........\ReadListener.class

........\.......\.......\...\.......\........\..........\.......\.........\SimpleReadListener.class

........\.......\.......\...\.......\........\..........\.......\.........\SimpleRea

源码中国 www.ymcn.org