CCKS2017 Medical Case Record Annotation
CCKS2017 medical case record annotation, CCKS2017 Task 2
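Data format:
Each case record is divided into 4 sections, stored in 4 separate folders:
General items (一般项目)
Medical history features (病史特征)
Diagnosis and treatment process (诊疗过程)
Discharge status (出院情况)
Each directory stores two types of files.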
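The script in this post reads two files per record from each section folder: a raw-text file ending in `.txtoriginal.txt` and an annotation file ending in `.txt`. A minimal sketch of how those paths might be built is shown here. The helper name `record_paths` is hypothetical, and the naming pattern is inferred from the script rather than from official dataset documentation; the folder names are the ones in the script's `area` list.

```python
import os

# Section folder names, copied from the script's `area` list.
AREAS = ["病史特点", "出院情况", "一般项目", "诊疗经过"]


def record_paths(datadir, area, index):
    """Return (raw_text_path, annotation_path) for one record of one section.

    Hypothetical helper: the pattern <area>/<area>-<index>.txtoriginal.txt
    and <area>/<area>-<index>.txt is an assumption taken from the script.
    """
    stem = "{}-{}".format(area, index)
    raw = os.path.join(datadir, area, stem + ".txtoriginal.txt")
    ann = os.path.join(datadir, area, stem + ".txt")
    return raw, ann


raw, ann = record_paths("data", "一般项目", 1)
```

Keeping the pattern in one function means the training loop and the test loop cannot drift apart in how they name files.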
Code snippets and file information
Attribute  Size     Date        Time   Name
---------  -------  ----------  -----  ----
Dir        0        2017-11-22  09:15  CCKS2017
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git
File       23       2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\HEAD
Dir        0        2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\branches
File       268      2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\config
File       73       2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\description
Dir        0        2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks
File       478      2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\applypatch-msg.sample
File       896      2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\commit-msg.sample
File       189      2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\post-update.sample
File       424      2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\pre-applypatch.sample
File       1642     2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\pre-commit.sample
File       1348     2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\pre-push.sample
File       4898     2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\pre-rebase.sample
File       1239     2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\prepare-commit-msg.sample
File       3610     2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\hooks\update.sample
File       1960281  2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\index
Dir        0        2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\info
File       240      2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\info\exclude
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs
File       187      2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs\HEAD
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs\refs
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs\refs\heads
File       187      2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs\refs\heads\master
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs\refs\remotes
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs\refs\remotes\origin
File       187      2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\logs\refs\remotes\origin\HEAD
Dir        0        2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\objects
Dir        0        2017-08-09  10:14  CCKS2017\CCKS2017_dataset\.git\objects\info
Dir        0        2017-08-09  10:18  CCKS2017\CCKS2017_dataset\.git\objects\pack
............ (13886 more file entries omitted here)
# coding:utf-8
# Builds character-level CRF training/test data from the CCKS2017 records.
# Written for Python 2; `fio` is the author's own I/O helper module (not shown here).
import fio
import codecs
import sys
import os
import jieba.posseg as pseg

datadir = "../data2/training dataset v4"
area = ["病史特点", "出院情况", "一般项目", "诊疗经过"]


class CRF_unit:
    def __init__(self):
        self.features = []

    def test_into_aline(self, filename):
        # Test data: one bare character per row, with no POS tag and no label.
        self.features = []
        sentences = fio.ReadFileUTF8(filename)
        for sentence in sentences:
            for token in sentence:
                self.features.append(token)

    def get_posTag(self, sentence):
        # Segment and POS-tag the sentence with jieba.
        words = pseg.cut(sentence)
        return words

    def get_token(self, filename):
        # Training data: one row per character, [character, POS flag of its word, label].
        # Every label starts as "N" (not part of any entity).
        self.features = []
        sentences = fio.ReadFileUTF8(filename)
        for sentence in sentences:
            words = self.get_posTag(sentence)
            for w in words:
                for token in w.word:
                    feature = [token, w.flag, "N"]
                    self.features.append(feature)

    def read_type(self, itype):
        # Map the Chinese entity type to its English label.
        # Python 2: encode the unicode value so it compares equal to the byte-string literals.
        itype = itype.encode('utf-8')
        if itype == "症状和体征":
            return "SIGNS"
        if itype == "检查和检验":
            return "CHECK"
        if itype == "疾病和诊断":
            return "DISEASE"
        if itype == "治疗":
            return "TREATMENT"
        if itype == "身体部位":
            return "BODY"

    def get_type(self, filename):
        # Each annotation line ends with: start offset, end offset, entity type.
        # Offsets are inclusive indices into self.features.
        sentences = fio.ReadFileUTF8(filename)
        for sentence in sentences:
            words = sentence.split()
            print(words[-3] + words[-2])  # debug output
            x = int(words[-3])
            y = int(words[-2])
            #if words[3].encode('utf-8') == "身体部位":
            itype = self.read_type(words[-1])
            self.features[x][2] = "B-" + itype
            for j in range(x + 1, y + 1):
                self.features[j][2] = "I-" + itype


if __name__ == '__main__':
    extractor = CRF_unit()
    x = 0  # index into `area`: the section to process
    # The training pass over records 1-240 is disabled by the string literal below.
    """
    for i in range(1, 241):
        filename = datadir + '/' + area[x] + '/' + area[x] + '-' + str(i) + '.txtoriginal.txt'
        extractor.get_token(filename)
        filename = datadir + '/' + area[x] + '/' + area[x] + '-' + str(i) + '.txt'
        extractor.get_type(filename)
        filename = datadir + '/result/' + area[x] + "/" + '1-240_train.txt'
        fio.AddTrain(extractor.features, filename)
    """
    # Test pass over records 241-300.
    for i in range(241, 301):
        filename = datadir + '/' + area[x] + '/' + area[x] + '-' + str(i) + '.txtoriginal.txt'
        extractor.test_into_aline(filename)
        filename = datadir + '/result/' + area[x] + '.testt-' + str(i) + '.txt'
        fio.AddTest(extractor.features, filename)
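The labeling step in `get_type` is a BIO scheme: the first character of an entity gets `B-<TYPE>`, the remaining characters get `I-<TYPE>`, and everything else keeps the default `N`. The following is a minimal, self-contained sketch of that idea without `fio` or jieba. The function names `bio_tag` and `parse_annotation`, the sample text, and the sample annotation line are all made up for illustration; the only assumptions taken from the script are that the last three whitespace-separated fields of an annotation line are start offset, end offset, and entity type, and that the end offset is inclusive.

```python
# Chinese entity type -> English label, as in the script's read_type.
TYPE_MAP = {
    "症状和体征": "SIGNS",
    "检查和检验": "CHECK",
    "疾病和诊断": "DISEASE",
    "治疗": "TREATMENT",
    "身体部位": "BODY",
}


def parse_annotation(line, type_map):
    """Return (start, end, label) from the last three fields of a line."""
    fields = line.split()
    return int(fields[-3]), int(fields[-2]), type_map[fields[-1]]


def bio_tag(text, spans):
    """Label each character of `text`; `spans` holds (start, end, label)
    tuples with inclusive character offsets."""
    tags = ["N"] * len(text)  # "N" is the script's default tag
    for start, end, label in spans:
        tags[start] = "B-" + label
        for j in range(start + 1, end + 1):
            tags[j] = "I-" + label
    return list(zip(text, tags))


# Illustrative input only, not taken from the dataset.
text = "右下腹痛"
span = parse_annotation("右下腹 0 2 身体部位", TYPE_MAP)
rows = bio_tag(text, [span])
for ch, tag in rows:
    print(ch + "\t" + tag)
```

Looking the type up in a dictionary raises a `KeyError` on an unknown entity type, which surfaces bad annotation lines earlier than a function that silently returns `None`.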