Search results: resource list
mlct_public
- A Java-based program for word segmentation, N-gram statistics, paragraph splitting, and sentence splitting; supports multiple languages.
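The entry above mentions N-gram statistics. As a minimal sketch (not the package's actual code, and in Python rather than Java), overlapping character n-grams can be counted in a single pass:

```python
from collections import Counter

def char_ngrams(text, n):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

counts = char_ngrams("abcabc", 2)
# counts["ab"] == 2, counts["bc"] == 2, counts["ca"] == 1
```

Character-level n-grams like these are a common language-independent fallback when no word segmenter is available.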
SplitCNWord
- An implementation and demo of Chinese word segmentation; it can also be used to split English phrases.
TextCategorization
- A Chinese text classifier based on the Naive Bayes algorithm. Train the classifier first, then use it to classify Chinese text. This beta version supports only three categories and uses a simple Chinese segmentation method; it is not yet practical for production use and is intended for algorithm research and improvement.
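The entry describes the usual Naive Bayes workflow: train first, then classify. A self-contained sketch of multinomial Naive Bayes with add-one smoothing, assuming documents are already segmented into word lists (the toy words and labels below are illustrative, not from the package):

```python
import math
from collections import Counter

class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: lists of tokens; labels: one class label per doc
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
            self.vocab.update(doc)

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            lp = math.log(n_c / n_docs)   # log prior
            for w in doc:                 # smoothed log likelihood
                lp += math.log((self.word_counts[c][w] + 1) / (total + v))
            scores[c] = lp
        return max(scores, key=scores.get)

nb = NaiveBayesText()
nb.fit([["喜欢", "很好"], ["讨厌", "很差"], ["很好", "推荐"]],
       ["正面", "负面", "正面"])
print(nb.predict(["很好"]))  # 正面
```

Smoothing matters here: without the +1, any word unseen in a class would make that class's probability zero.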
chsegc
- chseg: a Chinese word-segmentation module written in C. See readme.txt inside the package for the calling conventions and usage.
findkey.c
- Solves the following problem: a reasonably good Chinese segmentation algorithm suited to short strings. Using a lexicon, it finds the top-N keywords among many newline-separated titles and updates the lexicon with them; a classification-oriented segmentation algorithm.
wordppl
- Implements Chinese word segmentation using forward and backward (reverse) maximum matching.
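Forward maximum matching, named by this entry and used by several others in the list, greedily takes the longest lexicon word at each position. A minimal sketch with an illustrative lexicon (not this package's code):

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word (up to max_len chars); fall back to one character."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in lexicon:
                tokens.append(text[i:i + n])
                i += n
                break
        else:  # no multi-character word matched here
            tokens.append(text[i])
            i += 1
    return tokens

lexicon = {"中文", "分词", "中文分词", "程序"}
print(forward_max_match("中文分词程序", lexicon))  # ['中文分词', '程序']
```

Backward (reverse) maximum matching applies the same idea right to left; comparing the two outputs is a common way to detect ambiguous segmentations.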
SQLET_split
- The SQLET segmentation algorithm, a C program, provided for reference.
clucene_src_for_chinese
- CLucene with Chinese support: the CLucene source modified so that it can handle Chinese characters. 1. Builds with VC 6. 2. Word segmentation is not yet supported, but Chinese characters are; separate the words of the text to be indexed with spaces. 3. This was a quick change; see demo/IndexFiles.cpp, and contact me if there are problems. I will polish it when I have time.
NewWord
- Automatic new-word registration: during automatic Chinese segmentation, this program registers entries that are not yet in the dictionary.
cutword
- A Chinese word-segmentation program in VB connected to a database, using the forward maximum matching algorithm.
code1
- One of several English word-segmentation programs I wrote in Java; this is the first, shared with everyone.
segmenter
- A simple word-segmentation program in Java, with source code, for reference, study, and exchange.
ChineseSegmenter
- A word-segmentation program I wrote, in two parts: the first is the lexicon, the second is a rough outline.
WordSeg
- A Chinese word-segmentation program in C++; import the dictionary Lexicon_full.mdb before use.
ThesaurusAnalyzer
- Chinese word-segmentation code for Lucene, with a 190,000-word dictionary. The segmenter's quality depends on the lexicon, and you can replace the bundled lexicon with your own. The lexicon is a text file named word.txt, one word per line; a line starting with # is skipped. Save it as UTF-8.
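The word.txt format described above (one word per line, # lines skipped, UTF-8) can be parsed along these lines; this is a sketch of the described format, not the package's own loader:

```python
def parse_lexicon(lines):
    """One word per line; skip blank lines and lines starting with '#'."""
    return {w for w in (line.strip() for line in lines)
            if w and not w.startswith("#")}

def load_lexicon(path="word.txt"):
    """Read the lexicon file as UTF-8, as the entry specifies."""
    with open(path, encoding="utf-8") as f:
        return parse_lexicon(f)

words = parse_lexicon(["# 跳过这一行", "中文", "分词"])
# words == {"中文", "分词"}
```

Returning a set keeps membership tests O(1), which is what a maximum-matching segmenter needs.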
HLSSplit
- A rewritten JNI wrapper for the research edition of the 海量 (HLSSplit) segmenter. It fixes the earlier problem of not being able to use import, has no usage time limit, and adds several convenient interfaces.
ChineseTokenizer
- Chinese word-segmentation source code developed in Java, documented in some detail.
CSharpFenCi
- A small word-segmentation program written in C#; it can be used for Chinese segmentation and is fairly easy to use.
dartsplitter
- A program related to Chinese word segmentation; anyone working on Chinese segmentation may take it as a reference.
splittertest
- A Chinese word-segmentation program; consult it if you need one, as it has good reference value.