Ten Most Frequently Asked Questions About Data Warehousing


Author: 闵涛 (Min Tao)    Source: 闵涛的学习笔记 (Min Tao's Study Notes)    Views: 1512    Updated: 2009/4/22 23:21:49
...ver patterns along multiple dimensions. In fact, there are many cases where no single dimensional view can correctly represent the semantics of influence, because the influence ratios will always be off regardless of how one aggregates. See the paper "OLAP & Data Mining: Bridging the Gap" for a detailed discussion of this.

Question 6: Types and Classes of Patterns Discovered
a) How powerful and general are the patterns the system can discover and express?
b) Can the system mix different pattern types, e.g. influence and affinity patterns?
c) Can the system discover time-based patterns and trends?

The format of the patterns discovered by the system should be very general and go far beyond decision trees or simple affinities. General rules of this kind are far more powerful than decision trees, which are limited in that they cannot find all the information in a database. Being rule-based keeps the system from being constrained to one part of the search space and ensures that many more clusters and patterns are found, so the system can provide more information and better predictions.

Question 7: System Initiative
a) Does the system use its own initiative to perform discovery, or is it guided by the user?
b) Can the system discover unexpected patterns by itself?
c) Can the system start up by itself on a weekly or monthly basis and perform discovery?

In some cases the user has to interact with and guide the system, e.g. to build a decision tree. However, a better approach is for the system to use its own initiative in the data mining process, forming hypotheses automatically based on the character of the data. The system should start up by itself, select the significant patterns in the data and filter out the unimportant trends. The analyses should be done routinely on a weekly or monthly basis.

Question 8: Treatment of Data Types
a) Are all data types handled in their own form or translated to other types?
b) Can the system find numeric ranges in data by itself?
c) Do a large number of non-numeric values cause problems for the system?

The system should manage all data types in a uniform manner and in their native formats, i.e. numbers, dates and constants should remain numbers, dates and constants internally. Interesting ranges in the data should be discovered by the system itself, without requiring the user to construct "number bins". A large number of constant values in the database should not choke the system.

Question 9: Data Dependencies and Hierarchies
a) Can the system be told about the functional dependencies in our database?
b) Does the system understand the concept of data hierarchy?
c) How does the system use dependencies and/or hierarchies for discovery?

The system should be capable of using the functional (and other) dependencies that exist in a database. Using these dependencies can significantly enhance the power of discovery; ignoring them can lead to confusion. The system should understand the concept of hierarchy and should be able to use it for discovery along multiple dimensions.

Question 10: Flexibility and Noise Sensitivity
a) How brittle is the system when dealing with noisy data?
b) How well does the system cope with data exceptions and low-quality data?
c) Can the system provide statements with flexible numeric ranges discovered by itself in the data?

The system should not be sensitive to noise and should internally use fuzzy logic to smooth data brittleness. As the data gathers noise, the system should only reduce the level of confidence associated with the results it provides, not suddenly change direction in discovery. The system should still produce the most significant findings from the data set, even if noise is present.
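
The point in Question 10 about confidence degrading gradually, rather than results flipping abruptly, can be illustrated with a small sketch. This is not the method of any particular product: the function names, the trapezoidal membership shape, the range [20, 30], the margin of 5 and the sample readings are all assumptions chosen for illustration.

```python
def crisp_membership(value, low, high):
    """Hard test: 1.0 if value lies inside [low, high], otherwise 0.0."""
    return 1.0 if low <= value <= high else 0.0


def fuzzy_membership(value, low, high, margin):
    """Trapezoidal membership (an assumed shape): 1.0 inside [low, high],
    falling linearly to 0.0 over `margin` units beyond either edge."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / margin)


def rule_confidence(values, low, high, margin):
    """Average fuzzy membership of the values, used here as a rough
    confidence score for the rule 'value falls in [low, high]'."""
    if not values:
        raise ValueError("values must not be empty")
    total = sum(fuzzy_membership(v, low, high, margin) for v in values)
    return total / len(values)


if __name__ == "__main__":
    clean = [22, 25, 28]   # hypothetical readings, all inside the range
    noisy = [22, 25, 31]   # same readings with one value pushed out by noise
    print(rule_confidence(clean, 20, 30, 5))
    print(rule_confidence(noisy, 20, 30, 5))
    print(sum(crisp_membership(v, 20, 30) for v in noisy) / len(noisy))
```

With these hypothetical readings, one noisy value lowers the fuzzy confidence from 1.0 to about 0.93, while the crisp test drops to about 0.67. The rule itself is unchanged in both cases; only the confidence attached to it moves, which is the behaviour the answer to Question 10 asks for.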


Entered by: mintao    Editor: mintao