Feature/generic ensemble #366
Conversation
…Tree, RandomForest and kNN only. 7 new methods, a lot of tests
Codecov Report

❌ Patch coverage is below target. Additional details and impacted files:

```
@@            Coverage Diff            @@
##    development     #366      +/-   ##
========================================
- Coverage     45.59%   44.27%   -1.33%
========================================
  Files            93       96       +3
  Lines          8034     8195     +161
========================================
- Hits           3663     3628      -35
- Misses         4371     4567     +196
```
Around 10 days of work. I'll fix the 2 failing builds soon.
Wow, this is great! Thanks. It would be nice to also have #365 fixed with this so we can bump to v0.5.0.
…ation warnings fixed. Also, the Generic Ensemble documentation examples are now ignored.
…. facepalm. Fmt again.
Well, I have done my work on #365 in accordance with how I personally understood the idea. A few thoughts: I did not really understand how the changes get published. On the website the latest release info says version 0.2, and I haven't found any "news" page. It would just be cool to have a way to show the work to people.
The changes go to crates.io and are used by anybody using the crate. You can see your past contributions in the history.
What are all the ignore flags added in the doctests?
This PR does 2 things:
Part 1
"The what and the why, Mr. Anderson?"
It allows a user to build their own custom ensemble models. More than that, since members are stored as `Box<dyn Predictor>`, a user can even combine models of different kinds!
In my project I only use 18 kNNs, but I have been interested in creating a universal ensemble, so... here is my attempt.
And here we come to two limitations.
Alright, with that being said...
🔑 Key Features

- 🔄 Heterogeneous predictor ensembles: mix KNN, Random Forest, and Decision Tree (currently the only supported models). (Almost) any type implementing `Predictor<X, Y>` can be a member.
- ⚖️ Two voting strategies, uniform or weighted: simple majority or confidence-based aggregation. Switch strategies at runtime with `set_voting_strategy()`; weights are validated on insertion.
- 🎛️ Dynamic enable/disable of members at runtime: toggle models without retraining. Useful for A/B testing, fallback logic, or excluding underperforming models on the fly. My own idea!
- 🏷️ Metadata (descriptions, tags, ...): document and organize your ensemble. Attach human-readable notes and group models by tags. I have no idea if anyone will use it, but implementing it was too fun and easy.
- ⚖️ Set weights at any time: adjust voting influence with `set_weight()`.
- ✂️ Feature slicing via `predict_using_names()`: different inputs per model. Train models on disjoint feature subsets and combine their predictions, ideal for multi-view learning. Again, it was crucial in my project, which is why I threw it into Smartcore, but I am unsure whether it is really useful.
- 📊 Built-in scoring: quick accuracy evaluation with `score()`. Equivalent to `accuracy(y, predict(x))`, just for being more sklearn-ish.

Documentation
📦 Model Management

- 🔄 Heterogeneous ensembles: mix KNN, Random Forest, Decision Tree, SVM, or any custom model implementing `Predictor<X, Y>`. No common base class required: trait-based composition.
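The trait-based composition described above can be sketched in a few lines. This is a self-contained toy, not Smartcore's actual code: the `Predictor` trait and `Ensemble` struct mirror names from this PR, but `ThresholdModel`, `SignModel`, and the voting logic are invented here purely to show how structurally different models can live behind one boxed trait object.

```rust
/// Anything that maps a feature row to a class label can join the ensemble.
trait Predictor {
    fn predict(&self, x: &[f64]) -> i32;
}

/// A stand-in "model": predicts 1 when a chosen feature exceeds a threshold.
struct ThresholdModel { feature: usize, threshold: f64 }

impl Predictor for ThresholdModel {
    fn predict(&self, x: &[f64]) -> i32 {
        if x[self.feature] > self.threshold { 1 } else { 0 }
    }
}

/// A second, structurally different "model": majority sign of all features.
struct SignModel;

impl Predictor for SignModel {
    fn predict(&self, x: &[f64]) -> i32 {
        let positive = x.iter().filter(|v| **v > 0.0).count();
        if positive * 2 > x.len() { 1 } else { 0 }
    }
}

/// Heterogeneous ensemble: members are boxed trait objects.
struct Ensemble { members: Vec<Box<dyn Predictor>> }

impl Ensemble {
    fn new() -> Self { Ensemble { members: Vec::new() } }
    fn add(&mut self, m: Box<dyn Predictor>) { self.members.push(m); }
    /// Uniform (majority) vote over all members.
    fn predict(&self, x: &[f64]) -> i32 {
        let ones = self.members.iter().filter(|m| m.predict(x) == 1).count();
        if ones * 2 > self.members.len() { 1 } else { 0 }
    }
}

fn main() {
    let mut e = Ensemble::new();
    e.add(Box::new(ThresholdModel { feature: 0, threshold: 0.5 }));
    e.add(Box::new(ThresholdModel { feature: 1, threshold: 0.0 }));
    e.add(Box::new(SignModel));
    println!("{}", e.predict(&[1.0, 1.0])); // all three members vote 1
}
```

Because members are `Box<dyn Predictor>`, nothing forces them to share a concrete type; only the trait contract matters.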
- 🎯 Three ways to add models (3 public methods total for model management): `add(model)`, `add_named(name, model)`, `add_with_params(name?, model, weight?, desc?, tags?)`
- 🏷️ Rich metadata: attach descriptions, tags, and voting weights to each member. Query a member's voting weight via `weight(name)`.
- ⚙️ Dynamic runtime control: enable/disable individual models without retraining via `enable()`, `disable()`, `enabled()`. Perfect for A/B testing, fallback logic, or excluding underperformers on the fly.

🗳️ Voting Strategies
- ⚖️ Uniform or Weighted voting: simple majority or confidence-based aggregation. Switch at runtime with `set_voting_strategy()`.
- 🛡️ Rust-style strictness in Weighted mode:
- 🔧 Weight management: set or update weights anytime via `set_weight()`. Weights are validated on insertion and on strategy switch.

🔮 Prediction & Evaluation
| Method | Input |
| --- | --- |
| `predict(&x)` | same `X` for all models |
| `predict_using_names(&HashMap<String, X>)` | per-model `X`, keyed by name |
| `score(&x, &y) -> f64` | `X` plus labels `Y` |

- 📊 Built-in scoring: `score()` returns accuracy in `[0.0, 1.0]`, equivalent to `accuracy(y, predict(x))` but convenient for cross-validation loops and hyperparameter tuning.
- ✅ Type-safe predictions: all models in an ensemble must share the same `X: Array2<f64>` and `Y: Array1<i32> + Clone`, enforced at compile time via generics + `PhantomData`.

🧰 Introspection & Utilities
- 🔍 Ensemble state: `names()`, `len()`, `is_empty()`, `strategy()`, `get_ensemble_info()`: query structure and configuration anytime.
- 🏷️ Metadata queries: `weight(name)` returns the voting weight for a member.
- 🔄 Strategy switching: `set_voting_strategy()` validates all weights when switching to `Weighted`, ensuring consistency.

📚 Usage Guide: From Simple to Advanced
🎯 Scenario 1: The "Just Works" Way (3 lines)
Want to keep it easy? No problem! Just do:
✅ That's it. No weights, no names, no config.
🎯 Scenario 2: Name Your Models
Use `add_named()` when you want explicit control and better observability in your ensemble. Meaningful names make it easier to:

🎯 Scenario 3: Control Voting, the Full Lifecycle
Step-by-step: Uniform → assign weights → switch to Weighted
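That lifecycle can be sketched end to end. The toy below is a self-contained assumption, not Smartcore's implementation: `VotingStrategy`, `set_weight()`, and `set_voting_strategy()` mirror the names in this PR, while the struct layout and validation rules are invented for illustration.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum VotingStrategy { Uniform, Weighted }

struct Ensemble {
    votes: HashMap<String, i32>,   // stand-in for each member's prediction
    weights: HashMap<String, f64>,
    strategy: VotingStrategy,
}

impl Ensemble {
    /// Weights are validated on insertion: non-positive values are rejected.
    fn set_weight(&mut self, name: &str, w: f64) -> Result<(), String> {
        if !(w > 0.0) { return Err(format!("weight for {name} must be > 0")); }
        self.weights.insert(name.to_string(), w);
        Ok(())
    }

    /// Switching to Weighted re-validates that every member has a weight.
    fn set_voting_strategy(&mut self, s: VotingStrategy) -> Result<(), String> {
        if s == VotingStrategy::Weighted {
            for name in self.votes.keys() {
                if !self.weights.contains_key(name) {
                    return Err(format!("missing weight for {name}"));
                }
            }
        }
        self.strategy = s;
        Ok(())
    }

    /// Tally votes: weight 1.0 per member in Uniform, stored weight in Weighted.
    fn aggregate(&self) -> i32 {
        let mut tally: HashMap<i32, f64> = HashMap::new();
        for (name, vote) in &self.votes {
            let w = match self.strategy {
                VotingStrategy::Uniform => 1.0,
                VotingStrategy::Weighted => self.weights[name],
            };
            *tally.entry(*vote).or_insert(0.0) += w;
        }
        tally.into_iter()
            .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
            .map(|(class, _)| class)
            .unwrap()
    }
}

fn main() {
    let votes = HashMap::from([
        ("knn".to_string(), 0),
        ("tree".to_string(), 1),
        ("forest".to_string(), 1),
    ]);
    let mut e = Ensemble { votes, weights: HashMap::new(), strategy: VotingStrategy::Uniform };
    assert_eq!(e.aggregate(), 1);  // Uniform: 2-vs-1 majority wins
    // Switching to Weighted before assigning weights fails fast.
    assert!(e.set_voting_strategy(VotingStrategy::Weighted).is_err());
    e.set_weight("knn", 3.0).unwrap();
    e.set_weight("tree", 1.0).unwrap();
    e.set_weight("forest", 1.0).unwrap();
    e.set_voting_strategy(VotingStrategy::Weighted).unwrap();
    assert_eq!(e.aggregate(), 0);  // knn's 3.0 weight flips the outcome
}
```

The same ensemble produces different answers under the two strategies, which is exactly why validating weights at the switch point matters.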
🎯 Scenario 4: Feature Slicing — Different Inputs per Model
For advanced use-cases like training on different feature subsets (multi-view learning).
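The mechanism can be sketched as follows. This is a self-contained toy in the spirit of `predict_using_names()`, not Smartcore's code: the model names (`image_view`, `text_view`, `meta_view`), `ThresholdModel`, and the error handling are all assumptions for illustration.

```rust
use std::collections::HashMap;

trait Predictor { fn predict(&self, x: &[f64]) -> i32; }

/// Stand-in model: votes 1 when its features sum above a threshold.
struct ThresholdModel { threshold: f64 }
impl Predictor for ThresholdModel {
    fn predict(&self, x: &[f64]) -> i32 {
        if x.iter().sum::<f64>() > self.threshold { 1 } else { 0 }
    }
}

struct Ensemble { members: Vec<(String, Box<dyn Predictor>)> }

impl Ensemble {
    /// Each member reads only its own input slice, keyed by name;
    /// a missing input is an error rather than a silent skip.
    fn predict_using_names(&self, inputs: &HashMap<String, Vec<f64>>) -> Result<i32, String> {
        let mut ones = 0usize;
        for (name, model) in &self.members {
            let x = inputs.get(name).ok_or_else(|| format!("no input for {name}"))?;
            if model.predict(x) == 1 { ones += 1; }
        }
        Ok(if ones * 2 > self.members.len() { 1 } else { 0 })
    }
}

fn main() {
    let ensemble = Ensemble { members: vec![
        ("image_view".to_string(), Box::new(ThresholdModel { threshold: 1.0 }) as Box<dyn Predictor>),
        ("text_view".to_string(), Box::new(ThresholdModel { threshold: 0.0 })),
        ("meta_view".to_string(), Box::new(ThresholdModel { threshold: 2.0 })),
    ]};
    // Disjoint feature subsets per model: multi-view prediction.
    let inputs = HashMap::from([
        ("image_view".to_string(), vec![0.9, 0.8]), // sum 1.7 > 1.0 -> votes 1
        ("text_view".to_string(), vec![0.2]),       // 0.2 > 0.0 -> votes 1
        ("meta_view".to_string(), vec![0.5, 0.5]),  // 1.0 > 2.0 -> votes 0
    ]);
    assert_eq!(ensemble.predict_using_names(&inputs), Ok(1)); // 2 of 3 vote 1
}
```

Each "view" sees different features, yet the votes still combine into one prediction.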
🎯 Scenario 5: Full Control — Metadata, Tags, Dynamic Management
🏭 Real-World Usage Patterns
These are from my SAAN project.
Pattern 1: Auto-disable underperforming models
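One way this pattern can look, sketched with self-contained stand-ins rather than the PR's actual types: each member carries its validation-set predictions, and anything scoring below a threshold gets switched off. `auto_disable` and the threshold value are my own invention; only the enable/disable idea comes from the PR.

```rust
struct Member { name: String, enabled: bool, val_predictions: Vec<i32> }

struct Ensemble { members: Vec<Member> }

impl Ensemble {
    /// Fraction of validation labels a member got right.
    fn accuracy(preds: &[i32], y: &[i32]) -> f64 {
        let hits = preds.iter().zip(y).filter(|(p, t)| p == t).count();
        hits as f64 / y.len() as f64
    }

    /// Disable every member whose validation accuracy falls below `min_acc`;
    /// returns the names that were switched off (no retraining involved).
    fn auto_disable(&mut self, y: &[i32], min_acc: f64) -> Vec<String> {
        let mut disabled = Vec::new();
        for m in &mut self.members {
            if m.enabled && Self::accuracy(&m.val_predictions, y) < min_acc {
                m.enabled = false;
                disabled.push(m.name.clone());
            }
        }
        disabled
    }

    fn enabled_count(&self) -> usize {
        self.members.iter().filter(|m| m.enabled).count()
    }
}

fn main() {
    let y = vec![1, 0, 1, 1];
    let mut e = Ensemble { members: vec![
        Member { name: "knn_a".into(), enabled: true, val_predictions: vec![1, 0, 1, 1] }, // 100%
        Member { name: "knn_b".into(), enabled: true, val_predictions: vec![1, 0, 0, 1] }, // 75%
        Member { name: "knn_c".into(), enabled: true, val_predictions: vec![0, 1, 0, 0] }, // 0%
    ]};
    let off = e.auto_disable(&y, 0.7);
    assert_eq!(off, vec!["knn_c".to_string()]);
    assert_eq!(e.enabled_count(), 2);
}
```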
Pattern 2: Compare voting strategies on the same ensemble
Pattern 3: Dynamically add a strong model and boost its influence
📊 Interpreting Ensemble Logs
Again, the output is shown just as an example from a side project:
🔍 How to read this:
`Active=13/19` means 6 models were disabled due to low precision.

💡 Pro tips:
- Call `get_ensemble_info()` before/after major changes
- Use `enabled()` to verify which models actually contributed to a prediction

🧪 Testing Philosophy
Our test suite covers:
- `add()`, `add_named()`, auto-generated names
- `predict_using_names()` with per-model inputs
- `enable()`/`disable()` affecting predictions
- `score()` validity across model additions/removals

All tests use minimal, reproducible dummy data and verify both success and failure paths.
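A test in that style might look like the sketch below: dummy data, a toy ensemble, and a check that `disable()` genuinely changes the prediction because `predict()` only counts enabled members. The types here are invented stand-ins, not the suite's real fixtures.

```rust
struct Member { name: String, enabled: bool, vote: i32 }

struct Ensemble { members: Vec<Member> }

impl Ensemble {
    fn disable(&mut self, name: &str) {
        if let Some(m) = self.members.iter_mut().find(|m| m.name == name) {
            m.enabled = false;
        }
    }
    /// Majority vote over *enabled* members only, as `predict()` does.
    fn predict(&self) -> i32 {
        let (mut ones, mut total) = (0, 0);
        for m in self.members.iter().filter(|m| m.enabled) {
            total += 1;
            if m.vote == 1 { ones += 1; }
        }
        if ones * 2 > total { 1 } else { 0 }
    }
}

fn main() {
    let mut e = Ensemble { members: vec![
        Member { name: "a".into(), enabled: true, vote: 1 },
        Member { name: "b".into(), enabled: true, vote: 1 },
        Member { name: "c".into(), enabled: true, vote: 0 },
    ]};
    assert_eq!(e.predict(), 1); // 2 of 3 vote 1
    e.disable("a");
    assert_eq!(e.predict(), 0); // 1 of 2: no strict majority
}
```

Verifying both the success path (majority holds) and the changed path (majority lost after disabling) mirrors the "success and failure paths" philosophy above.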
📋 Public API Summary

- Construction: `new()`, `with_strategy()`
- Adding models: `add()`, `add_named()`, `add_with_params()`
- Weights & metadata: `set_weight()`, `set_description()`, `weight()`
- Runtime control: `enable()`, `disable()`, `enabled()`
- Prediction: `predict()`, `predict_using_names()`, `score()`
- Introspection: `names()`, `len()`, `is_empty()`, `strategy()`, `get_ensemble_info()`, `set_voting_strategy()`

Strictness guarantees:
- `add_with_params()` checks `HashMap` keys; fails fast
- `predict_using_names()`: the `Array2` trait enforces shape; `Failed` error on mismatch
- The `enabled()` filter is applied automatically in `predict()`
- `Ensemble<X, Y>` generics enforce the same input/output types at compile time

🚀 What's Next? (Roadmap)

- `predict_proba()` support
- `description()` and `tags()` getters
- Reset weights to `None` when switching Weighted → Uniform

Part 2
Done:
Ready for review. 🦀✨