Ten Most Frequently Asked Questions About Data Warehousing


Author: Min Tao | Source: Min Tao's Study Notes | Updated: 2009/4/22 23:21:49

Many approaches to data mining appear to offer distinct features and benefits, yet many may not be powerful enough to meet your corporate knowledge-discovery needs. In fact, a few fundamental questions can quickly clarify the business benefits and the power of a data mining system and put its advantages in clear perspective. These questions should be asked from the viewpoints of both business and technical users. Note, however, that these questions concern data mining itself; see also the many benefits of the knowledge access paradigm, which uses the patterns discovered by data mining within a PatternWarehouse™.

Below are two sets of "Top Ten Data Mining Questions", one from the business perspective and one from the technical perspective. Each question has three parts that together highlight one specific aspect of a data mining system's power and capability.

The Top Ten Data Mining Business Questions

Business users should ask the following ten questions about the benefits, quality and usability of the system.

Question 1: Business Benefits
a) How will this system help us?
b) How well does this system work for our industry-specific applications?
c) What information can we get that we do not already have?

It is essential to ask this question again and again. You should, of course, get new, refined information, but merely knowing something is not enough: you need information that allows you to act within the context of your industry. You should also measure the bottom-line dollar benefits delivered by a data mining system. See the paper "Measuring the Dollar Value of Mined Information" for a framework for doing this.

Question 2: Technical Know-how
a) How technically sophisticated do we need to be to use it?
b) Can business users operate it without calling the IS group all the time?
c) Is it as easy to use as an internet browser?

Business users should be empowered with direct, on-demand access to refined knowledge.
They should not need to know statistics, yet they should receive consistent and correct answers. The system interface should be as easy to use as a web browser.

Question 3: Understandability and Explanations
a) Are the results intuitive or difficult to understand?
b) Do we get clear explanations for each information item presented?
c) Will the explanations be in technical statistical terms or in a form we can understand?

Results should be presented to business users in plain English, accompanied by graphs. The system should be able to explain each piece of information it presents in clear, English-like terms that business users can easily understand and use.

Question 4: Follow-up Questions
a) What kinds of follow-up questions can we ask the system?
b) Do we need to go to an analyst to answer further questions?
c) How fast can we drill down on the fly to see more patterns?

Responses to follow-up questions must be immediate. After seeing some results, business users should not need intermediaries such as analysts to get more information. If follow-up questions take time and involve intermediaries, business users' effectiveness will suffer. Business users should get refined information as and when they need it.

Question 5: Business Users
a) How many business users can this system support?
b) Can business users tailor their own questions for the system?
c) Can users apply the knowledge to day-to-day decision making?

The system should be able to use the same fundamental knowledge to support a few hundred business users, each with a different group perspective. Yet all of these users must receive consistent answers as they ask their own questions. The information must be presented in a form that can be used for day-to-day actions.

Question 6: Accuracy, Completeness and Consistency
a) How accurate are the results the system delivers?
b) Can the system miss some patterns?
c) Are the results always consistent, or can 100 users get 100 different answers?

The system must cover a wide range of patterns and should provide high-quality information. To increase accuracy, the knowledge provided to business users should be derived from the entire data set, not from samples. All business users should access the same knowledge so that they all receive consistent answers, which raises the quality of corporate information.

Question 7: Incremental Analysis
a) Can we automatically analyze weekly or monthly data as it becomes available?
b) Can the system compare month-to-month results and patterns by itself?
c) Can we get automatic pattern detection over time, every week or month?

The system should analyze data as it becomes available each week or month and perform ongoing trend analysis, highlighting the key items and influencing factors behind significant changes. The incremental analysis should run automatically in the background, informing users of significant trends and their underlying causes.

Question 8: Data Handling
a) How much data can the system handle?
b) Can it work directly on our database, or do we need to extract data?
c) If it works on extracts, how do we know that no patterns are missed?

The system should handle moderate to large volumes of data on a powerful server; large data volumes, of course, should not be expected to be managed on small servers. The system should work directly on the SQL database, without extracts, so that no patterns are missed and performance improves.

Question 9: Integration
a) How will it integrate into our computing environment?
b) Will it simply work on our existing SQL database?
c) How easily will the system work on our intranet?

The system should run smoothly on existing open server platforms (e.g. Unix) and popular DBMS engines on the server (e.g. Oracle, Sybase, Informix). The system should present results to users on the corporate intranet.
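Questions 7 and 8 ask for month-to-month comparisons computed directly on the SQL database rather than on extracts. As a minimal sketch of that idea, assuming a hypothetical `sales(month, region, amount)` table and using Python's built-in `sqlite3` module as a stand-in for a server DBMS such as Oracle, the query below lets the SQL engine aggregate each month and flag regions whose totals changed by at least a chosen fraction (25% here, an arbitrary threshold). Only the flagged rows travel back to the caller.

```python
import sqlite3

# Hypothetical schema for illustration: sales(month TEXT, region TEXT, amount REAL).
# The whole comparison runs inside the SQL engine; only flagged rows are returned.
CHANGE_QUERY = """
WITH totals AS (
    SELECT month, region, SUM(amount) AS total
    FROM sales
    GROUP BY month, region
)
SELECT cur.region,
       prev.total AS prev_total,
       cur.total AS cur_total,
       (cur.total - prev.total) / prev.total AS rel_change
FROM totals AS cur
JOIN totals AS prev
  ON prev.region = cur.region AND prev.month = ?
WHERE cur.month = ?
  AND prev.total > 0
  AND ABS(cur.total - prev.total) >= ? * prev.total
ORDER BY ABS(cur.total - prev.total) DESC
"""


def significant_changes(conn, prev_month, cur_month, threshold=0.25):
    """Return (region, prev_total, cur_total, rel_change) for every region whose
    monthly total moved by at least `threshold` (a fraction of the previous
    month's total), largest absolute change first."""
    # Placeholders bind in textual order: prev month, current month, threshold.
    return conn.execute(CHANGE_QUERY, (prev_month, cur_month, threshold)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("2009-03", "North", 100), ("2009-03", "North", 50), ("2009-04", "North", 160),
            ("2009-03", "South", 200), ("2009-04", "South", 120),
            ("2009-03", "East", 80), ("2009-04", "East", 120),
        ],
    )
    for row in significant_changes(conn, "2009-03", "2009-04"):
        print(row)
```

With this sample data the sketch reports the South row (a 40% drop) and then the East row (a 50% rise); North's rise of roughly 7% stays below the threshold. Scheduling such a query after each weekly or monthly load is one simple way to approximate the background incremental analysis described in Question 7.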
The absence of data-conditioning requirements and extract files will make integration much easier.

Question 10: Support Staff
a) What staff do we need to keep this system installed and running?
b) How do we get support and training to get started?
c) What happens after we install the system?

After the initial system design, support staffing for the system should be kept to a minimum. One database administrator should be able to manage the DBMS, and one analyst should occasionally help set up discovery models and similar tasks. Thereafter, business users should be able to use the system on their own. There should be no need for a large team of resident support analysts acting as intermediaries for business users.

The Top Ten Data Mining Technical Questions

Technical users should ask the following ten questions about the architecture, power and scalability of the system.

Question 1: Architecture
a) How are computations distributed between the client and the server?
b) Is any data brought from the server to the client?
c) Can the system run in a three-tiered architecture?

The best option is for discovery to take place entirely on the server. Any attempt to bring data to the client will seriously limit the system's applicability to larger databases. The best architecture is a thin-client, three-tiered system that uses the power of a large server-based SQL engine while operating over an intranet.

Question 2: Access to Real Data
a) Does the system work on the real SQL database or on samples and extracts?
b) If it samples or extracts, how do we know that it is accurate?
c) If it builds flat files, who manages this activity and cleans up for ongoing analyses, and how can it sample across several tables?

The best option is for a data mining system to work on the real databases, not on samples, extracts or flat files. Working on the real database exploits the SQL engine's power (e.g. parallel execution) and provides much more accurate results.
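Technical Question 1 warns that bringing data to the client limits the system to small databases. The contrast can be sketched, as an illustration only, with Python's `sqlite3` module standing in for the server-side SQL engine and a hypothetical `orders(region, product)` table: both functions below produce the same counts, but the first ships every row to the client while the second lets the engine aggregate and returns one row per group.

```python
import sqlite3
from collections import Counter


def client_side_counts(conn):
    """Anti-pattern: fetch every row, then count on the client.
    Returns (counts, number_of_rows_transferred)."""
    rows = conn.execute("SELECT region, product FROM orders").fetchall()
    return dict(Counter(rows)), len(rows)


def server_side_counts(conn):
    """Preferred: let the SQL engine group and count, so only one row per
    (region, product) combination is transferred.
    Returns (counts, number_of_rows_transferred)."""
    rows = conn.execute(
        "SELECT region, product, COUNT(*) FROM orders GROUP BY region, product"
    ).fetchall()
    return {(region, product): n for region, product, n in rows}, len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, product TEXT)")
    # Hypothetical sample data: 1,200 orders spread evenly over 2 regions x 3 products.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(("North", "South")[i % 2], ("A", "B", "C")[i % 3]) for i in range(1200)],
    )
    client_counts, client_rows = client_side_counts(conn)
    server_counts, server_rows = server_side_counts(conn)
    assert client_counts == server_counts  # same answer either way
    print(f"client-side: {client_rows} rows transferred")
    print(f"server-side: {server_rows} rows transferred")
```

Here the client-side version transfers 1,200 rows and the server-side version 6, and the gap grows with the size of the table. Because in-process SQLite has no network hop, the sketch only counts rows rather than measuring transfer time.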
The system should also be able to access database tables in their native form, reaching across tables by itself.

Question 3: Performance and Scalability
a) How large a database can the system analyze?
b) How long does it take to perform discovery on a large database?
c) Can the system run in parallel on a multiprocessor server?

The system should work on databases with a large number of records. Whenever possible, it should derive its capabilities from the power of the server and the SQL engine. The system should be able to use the SQL engine's built-in parallelism, but it should also be able to use multiple processors for its own parallel non-SQL computations.

Question 4: Multi-Table Databases
a) Does the system work on a single table only, or can it analyze multiple tables?
b) Does the system need to perform a huge join to access all of our tables?
c) If it works on a single table, how can we feed it our existing data schema?

The real world is full of multi-table databases that cannot be joined and merged into a single view. In fact, the theory of normalization arose because data needs to be in more than one table; insisting on single tables is an affront to a decade of work on database design. If you ask the DBA of a really large database to put everything in a single table, you will get either a laugh or a blank stare, and in many cases the database size would balloon beyond control. The system should be able to mine large multi-table databases directly on the server by itself.

Question 5: Multi-Dimensional Analysis
a) Does the system analyze data along a single dimension only?
b) How are multi-dimensional patterns discovered and expressed by the system?
c) How do we specify the dimensional structure of our data to the system?

The OLAP phenomenon has conclusively demonstrated that business data is not single-dimensional. Hence a data mining system should be able to automatically disco

Entered by: mintao | Editor: mintao
