The Top 10 Questions About the Data Warehouse

Source: Internet
Author: User
Although there are various approaches to data mining that seem to offer distinct features and benefits, many may not be powerful enough to meet your corporate knowledge discovery needs. In fact, just a few fundamental questions can quickly clarify the business benefits and the power of a data mining system, setting its advantages in a clear perspective. These questions need to be asked from the viewpoints of both business and technical users. Note, however, that these questions refer to data mining; please also consider the many benefits of the paradigm that uses the patterns discovered by data mining within a PatternWarehouse(TM).

Here are two sets of "Top Ten Data Mining Questions" from business and technical perspectives. Each question has three parts that together highlight one specific aspect of a data mining system's power and capability.

The Top Ten Data Mining Business Questions

The top ten business questions should be asked by business users about the benefits, quality and usability of the system. They are:

Question 1: Business Benefits
a) How would this system help us?
b) How well does this system work for our industry-specific applications?
c) What information can we get that we don't already have?

It is essential to ask this question again and again. You should, of course, get new refined information, but it isn't enough just to know something; you should have information that allows you to act within the context of your industry. And you should measure the bottom-line dollar benefits delivered by a data mining system. See the paper "Measuring the Dollar Value of Mined Information" for a framework for this.

Question 2: Technical Know-how
a) How technically sophisticated do we need to be to use it?
b) Can business users operate it without calling the "IS" group all the time?
c) Is it as easy to use as an Internet browser?

Business users should be empowered with direct, on-demand access to refined knowledge.
They should not have to know statistics, yet should be given consistent and correct answers. The system interface should be as easy to use as a web browser.

Question 3: Understandability and Explanations
a) Are the results intuitive or difficult to understand?
b) Do we get clear explanations for any information item presented?
c) Would the explanations be in technical statistical terms or in a form that we can understand?

Results should be presented to business users in plain English, accompanied by graphs. The system should be able to explain each piece of information it presents in clear, English-like terms that business users can easily comprehend and use.

Question 4: Follow-up Questions
a) What kinds of follow-up questions can we ask of the system?
b) Do we need to go to an analyst for further question answering?
c) How fast can we drill down on the fly to get more patterns?

Responses to follow-up questions must be immediate. Business users should not need to use intermediaries such as analysts to get more information after they have seen some. If follow-up questions take time and involve intermediaries, the business users' effectiveness will be impacted. Business users should get refined information as they need it, when they need it.

Question 5: Business Users
a) How many business users can this system support?
b) Can the business users tailor their own questions for the system?
c) Can users utilize the knowledge for day-to-day decision making?

The system should be able to use the same fundamental knowledge to support a few hundred business users, each with a different group perspective. Yet all of these users must be given consistent answers as they ask their own questions. The information must be presented in such a way that it can be utilized for day-to-day actions.

Question 6: Accuracy, Completeness and Consistency
a) How accurate are the results the system delivers?
b) Can some patterns be missed by the system?
c) Are the results always consistent, or can users get different answers?

The system must cover a wide range of patterns and should provide high-quality information. The knowledge provided to business users should be derived from the entire data set (and not samples) in order to increase accuracy. All business users should access the same knowledge so that they all receive consistent answers, increasing the quality of corporate information.

Question 7: Incremental Analysis
a) Can we automatically analyze weekly/monthly data as it becomes available?
b) Can the system compare the month-to-month results and patterns by itself?
c) Can we get automatic pattern detection over time, every week or month?

The system should analyze data as it becomes available every week or month and perform ongoing analysis, identifying the key items and influence factors that impact significant changes. The incremental analysis should be performed automatically in the background, informing the user of significant trends and their underlying causes.

Question 8: Data Handling
a) How much data can the system deal with?
b) Can it work directly on our database, or do we need to extract data?
c) If it works on extracts, how do we know that some patterns are not missed?

The system should handle moderate to large volumes of data on a powerful server; of course, large data volumes should not be expected to be managed on small servers. The system should work directly on the SQL database, without extracts, so that patterns are not missed and accuracy is improved.

Question 9: Integration
a) How would it integrate into our computing environment?
b) Would it just work on our existing SQL database?
c) How easily would the system work on our intranet?

The system should run smoothly on existing open server platforms (e.g. Unix) and popular DBMS engines (e.g. Oracle, Sybase, Informix, etc.) on the server. The system should present results to users on the corporate intranet.
The absence of data conditioning requirements and extract files makes integration much easier.

Question 10: Support Staff
a) What staff do we need to keep this system installed and running?
b) How do we get support and training to get started?
c) What happens after we install the system?

After the initial system design, the support personnel for the system should be kept minimal. One database administrator should be able to manage the DBMS, and one analyst should occasionally help with setting up discovery models, etc. Thereafter, business users should be able to use the system on their own. There should be no need for a large number of resident support analysts to act as intermediaries for the business users.

The Top Ten Data Mining Technical Questions

The top ten technical questions should be asked by technical users about the architecture, power and scalability of the system. They are:

Question 1: Architecture
a) How are computations distributed between the client and the server?
b) Is any data brought from the server to the client?
c) Can the system run in a three-tiered architecture?

The best option is for the discovery to take place entirely on the server. Any attempt to bring data to the client would seriously limit the applicability of the system to larger databases. The best architecture is a thin-client, three-tiered system that uses the power of a large server-based SQL engine but operates on an intranet.

Question 2: Access to Real Data
a) Does the system work on the real SQL database or on samples and extracts?
b) If it samples or extracts, how do we know that it is accurate?
c) If it builds flat files, who manages the activity and cleans up for ongoing analyses, and how can it sample across several tables?

The best option is for a data mining system to work on the real database and not on samples, extracts and/or flat files. Working on the real database uses the SQL engine's power (e.g. parallel execution) and provides much more accurate results.
And the system should be able to access database tables in their native form, reaching across tables by itself.

Question 3: Performance and Scalability
a) How large a database can the system analyze?
b) How long does it take to perform discovery on a large database?
c) Can the system run in parallel on a multi-processor server?

The system should work on databases with a large number of records. It should derive its capabilities from the server and the SQL engine whenever possible. The system should be able to use the built-in parallelism of the SQL engine, but should also be able to use multiple processors for its own parallel non-SQL computations.

Question 4: Multi-table Databases
a) Does the system work on a single table only, or can it analyze multiple tables?
b) Does the system need to perform a huge join to access all my tables?
c) If it works on a single table, how can we feed it our existing data schema?

The real world is full of multi-table databases which cannot be joined and meshed into a single view. In fact, the theory of normalization came about because data needs to be in more than one table. Using a single table is an affront to a decade of work in database design. If you challenge the DBA of a really large database to put things in a single table, you'll either get a laugh or a blank stare; in many cases the database size would balloon beyond control. The system should be able to mine large multi-table databases directly on the server.

Question 5: Multi-dimensional Analysis
a) Does the system analyze data along a single dimension only?
b) How are multi-dimensional patterns discovered and expressed by the system?
c) How do we specify the dimensional structure of our data to the system?

The OLAP phenomenon has conclusively demonstrated that the business world's data are not single-dimensional. Hence a data mining system should be able to automatically discover and express patterns along multiple dimensions.
In fact, there are many cases where no single-dimensional view can correctly represent the semantics of influence, because the influence ratios will always be off regardless of how one aggregates. See the paper "OLAP & Data Mining: Bridging the Gap" for a detailed discussion of this.

Question 6: Types and Classes of Patterns Discovered
a) How powerful and general are the patterns the system can discover and express?
b) Can the system mix different pattern types, e.g. influence and affinity patterns?
c) Can the system discover time-based patterns and trends?

The format of the patterns discovered by the system should be very general and go far beyond decision trees or simple affinities. The advantage is that the general rules discovered are far more powerful than decision trees. Decision trees are very limited in that they cannot find all the information in a database. Being rule-based keeps the system from being constrained to one part of a search space and makes sure that many more clusters and patterns are found, allowing the system to provide more information and better predictions.

Question 7: System Initiative
a) Does the system use its own initiative to perform discovery, or is it guided by the user?
b) Can the system discover unexpected patterns by itself?
c) Can the system start up by itself on a weekly or monthly basis and perform discovery?

In some cases the user has to interact with and guide the system, e.g. to build a decision tree. However, a better approach is for the system to use its own initiative in the data mining process, forming hypotheses automatically based on the character of the data. The system should start up by itself, select the significant patterns in the data and filter out the unimportant trends. The analyses should be done routinely on a weekly or monthly basis.

Question 8: Treatment of Data Types
a) Are all data types handled in their own form or translated to other types?
b) Can the system find numeric ranges in the data by itself?
c) Do a large number of non-numeric values cause problems for the system?

The system should manage all data types in a uniform manner and in their native formats, i.e. numbers, dates and constants should remain numbers, dates and constants. Interesting ranges in the data should be discovered by the system, not requiring "number bin" construction by the user. A large number of constant values in the database should not choke the system.

Question 9: Data Dependencies and Hierarchies
a) Can the system be told about the functional dependencies in our database?
b) Does the system understand the concept of data hierarchy?
c) How does the system use dependencies and/or hierarchies for discovery?

The system should be capable of using the functional (and other) dependencies that exist in a database. The use of these dependencies can significantly enhance the power of discovery; in fact, ignoring them can lead to confusion. The system should understand the concept of hierarchy and should be able to use it for discovery along multiple dimensions.

Question 10: Flexibility and Noise Sensitivity
a) How brittle is the system when dealing with noisy data?
b) How well does the system cope with data exceptions and data quality issues?
c) Can the system provide statements with flexible numeric ranges that it has discovered by itself in the data?

The system should not be sensitive to noise and should internally use fuzzy logic to smooth data brittleness. As the data gathers noise, the system should only reduce the level of confidence associated with the results provided, not suddenly change direction in discovery. However, the system should still produce the most significant findings from the data set, even if noise is present.
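
To make the last point concrete, here is a minimal Python sketch of how the confidence attached to a range-based rule can degrade gradually as noise is added, while the rule itself stays the same. The data set, the rule bounds (30 to 39) and the helper names rule_confidence and flip_every are all hypothetical, invented for this illustration; the sketch does not use fuzzy logic and is not taken from any particular data mining product.

```python
def rule_confidence(rows, low, high):
    """Confidence of the rule 'low <= value <= high implies label is True'.

    Computed as the share of rows inside the range whose label is True.
    """
    matched = [label for value, label in rows if low <= value <= high]
    if not matched:
        return 0.0
    return sum(matched) / len(matched)


def flip_every(rows, step):
    """Inject deterministic noise: flip the label of every `step`-th row."""
    return [
        (value, (not label) if index % step == 0 else label)
        for index, (value, label) in enumerate(rows)
    ]


# Hypothetical clean data: the label is True exactly when value is in 30..39.
clean = [(value, 30 <= value <= 39) for value in range(20, 50)]

conf_clean = rule_confidence(clean, 30, 39)                  # no noise
conf_light = rule_confidence(flip_every(clean, 10), 30, 39)  # light noise
conf_heavy = rule_confidence(flip_every(clean, 5), 30, 39)   # heavier noise

print(conf_clean, conf_light, conf_heavy)
```

In this toy setup the same rule is reported at every noise level and only its confidence falls, which is the behavior the answer above asks for; a real system would also have to rediscover the range from the noisy data rather than being handed it.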
