Data Preparation for Data Mining
Most data mining books focus on what various algorithms do, and how to apply them to data that's already prepared. This book provides a proven method to improve model performance or speed (or both) by applying data preparation techniques. It also provides a conceptual overview of the data exploration process for business managers and anyone new to the subject.
About the Download
The download contains a suite of C source files that can be compiled into a DOS-based, command-line-driven toolkit. A precompiled DOS command-line version is included as dp10.exe.
Four datasets are provided: CREDIT, SHOE, CARS, and HOUSE. They are based on or extracted from datasets that were actually modeled. They're prepared only inasmuch as they're in a format that the compiled demonstration code can read. Otherwise they're unprepared and contain all of the problems discussed in the book. Some of the datasets also contain kinds of problems that the book discusses but does not illustrate.
Caveats
This is demo code only. It has no point-and-click interface.
It's not intended to have the functionality of a commercial product.
It's not intended to be fast, robust or fully optimized.
The toolkit requires the data to be in a specific format and requires a steering file.
Requires a PC running Windows 95 or later.
Questions about the Book
Email me: dpyle at model and mine dot com
