I've just released version 1.0.0 of category_encoders on PyPI; you can check out the source here:
https://github.com/wdm0006/categorical_encoding
In two previous posts (here and here), we examined the differences between encoding methods for categorical variables. It turns out each makes different assumptions, so each produces different results. For a practitioner, all of them have a time and place where they are useful, so I've packaged them all into scikit-learn compatible transformers that you can drop into your machine learning pipelines easily.
To install, just:
pip install category_encoders
Then to use:
from sklearn import linear_model, pipeline
from category_encoders import HashingEncoder

ppl = pipeline.Pipeline([
    ('encoder', HashingEncoder(cols=[...])),
    ('clf', linear_model.LogisticRegression())
])
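For example, a pipeline like this can be fit directly on a pandas DataFrame of raw categories. Here's a minimal sketch; the toy DataFrame, the 'color' column, and the labels are made up purely for illustration:

import pandas as pd
from sklearn import linear_model, pipeline
from category_encoders import HashingEncoder

# Made-up toy data: one categorical column and a binary target
X = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red', 'green']})
y = [0, 1, 0, 1, 0, 1]

ppl = pipeline.Pipeline([
    ('encoder', HashingEncoder(cols=['color'])),
    ('clf', linear_model.LogisticRegression())
])

# The encoder hashes the 'color' column into numeric feature columns,
# so the downstream classifier never sees raw strings.
ppl.fit(X, y)
print(ppl.predict(X))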
Included in the library (see the previous posts for more detail on them) are the following encoders; a quick comparison sketch follows the list:
- Ordinal
- One-Hot
- Binary
- Helmert Contrast
- Sum Contrast
- Polynomial Contrast
- Backward Difference Contrast
- Simple Hashing Trick
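To give a feel for how the encoders differ, here's a sketch comparing the output width of a few of them on one 6-level categorical column. This assumes the other encoders expose the same cols interface as HashingEncoder above; the 'city' column and its values are made up for illustration:

import pandas as pd
from category_encoders import OrdinalEncoder, OneHotEncoder, BinaryEncoder

# Made-up toy column with 6 distinct categories
X = pd.DataFrame({'city': ['NYC', 'LA', 'SF', 'DC', 'CHI', 'BOS']})

for Encoder in (OrdinalEncoder, OneHotEncoder, BinaryEncoder):
    out = Encoder(cols=['city']).fit_transform(X)
    # Ordinal keeps a single column, one-hot expands to roughly one
    # column per level, and binary needs only about log2(levels) columns.
    print(Encoder.__name__, out.shape)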
So try it out, open an issue on GitHub if you run into any trouble, and if you'd like to contribute, let me know.
https://github.com/wdm0006/categorical_encoding