刑事技术 (Forensic Science and Technology) ›› 2015, Vol. 40 ›› Issue (5): 345-352. doi: 10.16467/j.1008-3650.2015.05.001

• Special Topic: Forensic Genetics •

DNA数据库数据挖掘应用研究 (Research on the Application of Data Mining to DNA Databases)

刘冰   

  1. 公安部物证鉴定中心, 北京 100038
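  • Received: 2015-07-23 Publication date: 2015-10-25 Release date: 2015-10-26
  • About the author: LIU Bing (born 1974), male, from Qiqihar, Heilongjiang; associate chief forensic physician; master's degree; research interest: forensic genetics.
  • Funding:
    Basic Scientific Research Funds for Central-Level Public Welfare Research Institutes (No. 2013JB019)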

Data Mining of the National DNA Database

LIU Bing   

  1. Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
  • Received: 2015-07-23 Online: 2015-10-25 Published: 2015-10-26

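Abstract: Established in 2003, the national DNA database of China's public security organs has accumulated a large volume of data. Beyond technical data such as DNA profiles, it holds massive multidimensional data on the time, location, category, and method of crimes and on the region, ethnicity, and behavior of the persons involved. Introducing data mining into DNA database applications, through methods such as classification, estimation, prediction, affinity grouping, association rules, and cluster analysis, makes it possible to mine further the complex data types in the database, including DNA profiles, personal background and behavior, and case characteristics. This paper uses cluster analysis to make a preliminary analysis, along the dimensions of crime time, location, and category, of the data collected in the DNA database from 2011 to 2014: more than 450,000 criminal cases, more than 20 million offenders, and more than 1 million match notifications. The analysis covers the temporal and spatial distribution of four categories of cases (homicide, robbery, theft, and rape); the regional distribution of the persons involved in those four categories; and the repeated sampling of individuals in the database. The paper also presents a SWOT analysis of applying data mining to the DNA database. Although limitations in the underlying data leave the above analysis with many shortcomings, data mining is an emerging and challenging technology with broad application prospects. Bringing it into the management and use of the DNA database reflects the open thinking of an information society and is one way for the database to meet its challenges and keep improving. As the database grows in total volume, coverage, and data quality, relatively mature tools such as data mining and online analytical processing (OLAP) will allow the analyses described here to be carried out dynamically and at greater depth. Examples include presenting the spatiotemporal distribution of typical criminal behavior, and tracking and predicting its evolution, from personal and case background information; and analyzing, with early warning, the dynamic relationship between high-risk groups (identified by duplicate checks on DNA and identity records) and dimensions such as time and location. The timeliness and reliability of intelligence products mined from the DNA database, and especially the precision of personal identification, give them particular potential and value in studying crime patterns, analyzing crime dynamics, and supporting public security management decisions.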

Keywords: forensic genetics, DNA database, data mining, cluster analysis

Abstract: China's national DNA database has so far gathered tens of millions of records, including not only DNA profiles but also a large amount of information on the time, location, means, and type of the crimes committed and on the residence, ethnicity, and individual behavior of suspects. With the growing needs of public security, the data continue to accumulate rapidly. From 2011 to 2013, the database collected relevant data covering over 79.25% of the murder cases and 40.53% of the rape cases filed. At present the database is used mainly for personal identification, which does not fully tap the value of its data. Data mining can help in forming and refining concepts, exploring regularities and patterns, building models, and deriving other useful knowledge. Using classification, estimation, prediction, affinity grouping, association rules, and cluster analysis, data mining can analyze in depth the intricate data in the DNA database, such as DNA profiles, case information, and the background and behavior of individual suspects. Using cluster analysis, this paper attempts a preliminary analysis along the dimensions of time, location, and type of crime. The data analyzed cover over 450,000 criminal cases, 20 million individuals, and 1 million match reports collected and produced over the past four years. The analysis has three parts: the distribution of four kinds of crime (murder, robbery, theft, and rape); the distribution by residence of the offenders involved in those four kinds of crime; and the re-sampling of offenders in the national DNA database. The study also carries out a SWOT (strengths, weaknesses, opportunities, threats) analysis of applying data mining to the national DNA database. Data mining is an emerging technology with broad prospects. Applying it to the management and use of the national DNA database reflects the open-mindedness of the information society and supports the improvement and development of the database itself. The above analysis is nevertheless imperfect because of limitations in the underlying data. By combining established data mining methods with online analytical processing (OLAP), these analyses can be extended to dynamic conditions and greater depth. The distribution of crime in time and space could then be defined more clearly, the evolution of typical crimes tracked and predicted more promptly from personal and case background information, and high-risk groups monitored and detected earlier through duplicate checks on DNA and identity records. Ideally, the DNA database can provide real-time, reliable, and highly accurate personal identification intelligence, showing its particular potential and value in the study of crime patterns and dynamics, public security management decisions, and related areas.

Key words: forensic genetics, DNA database, data mining, clustering analysis
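
The abstract describes analyzing cases by category, time, and location, and examining repeated sampling of individuals. As a rough illustration only, the tabulation step that might precede such an analysis can be sketched as below. This is not the paper's method or schema, and it does not perform clustering itself. The record layouts, field order, identifiers, and the helper names `count_by` and `repeat_samples` are all assumptions made for this example, and the data are invented.

```python
from collections import Counter, defaultdict

# Hypothetical case records: (case_id, category, "YYYY-MM", region).
# Invented for illustration; not data from the paper.
cases = [
    ("c1", "theft", "2011-03", "region_a"),
    ("c2", "theft", "2011-03", "region_b"),
    ("c3", "robbery", "2011-04", "region_a"),
    ("c4", "murder", "2012-01", "region_a"),
    ("c5", "theft", "2012-01", "region_a"),
]

# Hypothetical sampling events: (person_id, "YYYY-MM-DD"), one row per sample.
samples = [
    ("p1", "2011-03-02"),
    ("p2", "2011-05-10"),
    ("p1", "2012-08-19"),
    ("p3", "2013-01-07"),
    ("p1", "2014-02-11"),
]


def count_by(records, *field_indexes):
    """Count records grouped by the chosen tuple positions."""
    return Counter(tuple(r[i] for i in field_indexes) for r in records)


def repeat_samples(events):
    """Return {person_id: sorted sample dates} for people sampled more than once."""
    dates = defaultdict(list)
    for person, date in events:
        dates[person].append(date)
    return {p: sorted(d) for p, d in dates.items() if len(d) > 1}


by_category_month = count_by(cases, 1, 2)   # category x month
by_category_region = count_by(cases, 1, 3)  # category x region
repeats = repeat_samples(samples)

for key, n in sorted(by_category_month.items()):
    print(key, n)
print(repeats)
```

Count tables of this kind (category by month, category by region) could then be passed to a clustering routine of one's choice. The abstract does not say which clustering algorithm the paper uses.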

CLC number: