
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/compose/plot_column_transformer_mixed_types.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_compose_plot_column_transformer_mixed_types.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_compose_plot_column_transformer_mixed_types.py:


===================================
Column Transformer with Mixed Types
===================================

.. currentmodule:: sklearn

This example illustrates how to apply different preprocessing and feature
extraction pipelines to different subsets of features, using
:class:`~compose.ColumnTransformer`. This is particularly handy for
datasets that contain heterogeneous data types, since we may want to
scale the numeric features and one-hot encode the categorical ones.

In this example, the numeric data is standard-scaled after median imputation.
The categorical data is one-hot encoded via ``OneHotEncoder``, which
creates a new category for missing values. We then reduce the dimensionality
by keeping only the one-hot columns that score highest under a chi-squared
test.
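
To make the chi-squared selection concrete, here is a minimal plain-Python
sketch of the statistic for non-negative (for example one-hot) columns. It is
an illustration under stated assumptions, not scikit-learn's implementation:
the helper name ``chi2_scores`` and the toy data are hypothetical, and the
exact tie handling of ``SelectPercentile`` is not reproduced.

```python
def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature column against y.

    Assumes every column has a non-zero sum, so that no expected count is 0.
    """
    n_samples = len(y)
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        total = sum(column)
        score = 0.0
        for c in classes:
            # Observed: how much of this feature falls into class c.
            observed = sum(v for v, label in zip(column, y) if label == c)
            # Expected under independence: feature total times class frequency.
            class_count = sum(1 for label in y if label == c)
            expected = total * class_count / n_samples
            score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


# Toy one-hot data: column 0 follows the label exactly, column 1 is unrelated.
X = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

scores = chi2_scores(X, y)

# Keep the top 50% of columns by score (at least one column).
n_keep = max(1, len(scores) // 2)
kept = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:n_keep]
print(scores, kept)
```

A column that is independent of the label scores zero and is the first to be
dropped when only a percentile of the columns is kept.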

In addition, we show two different ways to dispatch the columns to the
particular pre-processor: by column names and by column data types.

Finally, the preprocessing pipeline is integrated in a full prediction pipeline
using :class:`~pipeline.Pipeline`, together with a simple classification
model.

.. GENERATED FROM PYTHON SOURCE LINES 27-32

.. code-block:: default


    # Author: Pedro Morales <part.morales@gmail.com>
    #
    # License: BSD 3 clause








.. GENERATED FROM PYTHON SOURCE LINES 33-46

.. code-block:: default

    import numpy as np

    from sklearn.compose import ColumnTransformer
    from sklearn.datasets import fetch_openml
    from sklearn.feature_selection import SelectPercentile, chi2
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    np.random.seed(0)








.. GENERATED FROM PYTHON SOURCE LINES 47-48

Load data from https://www.openml.org/d/40945

.. GENERATED FROM PYTHON SOURCE LINES 48-56

.. code-block:: default

    X, y = fetch_openml(
        "titanic", version=1, as_frame=True, return_X_y=True, parser="pandas"
    )

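    # Alternatively, call fetch_openml without return_X_y=True and build X and y
    # from the frame attribute of the returned object (named titanic here):
    # titanic = fetch_openml("titanic", version=1, as_frame=True, parser="pandas")
    # X = titanic.frame.drop('survived', axis=1)
    # y = titanic.frame['survived']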








.. GENERATED FROM PYTHON SOURCE LINES 57-75

Use ``ColumnTransformer`` by selecting columns by name

We will train our classifier with the following features:

Numeric Features:

* ``age``: float;
* ``fare``: float.

Categorical Features:

* ``embarked``: categories encoded as strings ``{'C', 'S', 'Q'}``;
* ``sex``: categories encoded as strings ``{'female', 'male'}``;
* ``pclass``: ordinal integers ``{1, 2, 3}``.

We create the preprocessing pipelines for both numeric and categorical data.
Note that ``pclass`` could be treated as either a categorical or a numeric
feature.
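
As a rough illustration of what the numeric branch computes for a single
column, the sketch below applies median imputation followed by standard
scaling using only the standard library. It is not scikit-learn's code: the
helper name ``impute_and_scale`` and the sample ages are made up, and missing
values are represented by ``None`` for simplicity.

```python
import statistics


def impute_and_scale(values):
    """Replace None with the median of the observed values, then standardize."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in values]

    mean = statistics.fmean(filled)
    # Population standard deviation (ddof=0), as StandardScaler uses.
    std = statistics.pstdev(filled)
    return [(v - mean) / std for v in filled]


ages = [22.0, None, 38.0, 26.0, None, 35.0]
scaled = impute_and_scale(ages)
print([round(v, 3) for v in scaled])
```

After this step the column has zero mean and unit variance, and both missing
entries map to the same scaled value because they were filled with the same
median.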

.. GENERATED FROM PYTHON SOURCE LINES 75-95

.. code-block:: default


    numeric_features = ["age", "fare"]
    numeric_transformer = Pipeline(
        steps=[("imputer", SimpleImputer(strategy="median")), ("scaler", StandardScaler())]
    )

    categorical_features = ["embarked", "sex", "pclass"]
    categorical_transformer = Pipeline(
        steps=[
            ("encoder", OneHotEncoder(handle_unknown="ignore")),
            ("selector", SelectPercentile(chi2, percentile=50)),
        ]
    )
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )








.. GENERATED FROM PYTHON SOURCE LINES 96-98

Append classifier to preprocessing pipeline.
Now we have a full prediction pipeline.

.. GENERATED FROM PYTHON SOURCE LINES 98-107

.. code-block:: default

    clf = Pipeline(
        steps=[("preprocessor", preprocessor), ("classifier", LogisticRegression())]
    )

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    clf.fit(X_train, y_train)
    print("model score: %.3f" % clf.score(X_test, y_test))





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    model score: 0.798




.. GENERATED FROM PYTHON SOURCE LINES 108-112

HTML representation of ``Pipeline`` (display diagram)

When the ``Pipeline`` is displayed in a Jupyter notebook, an HTML
representation of the estimator is shown:

.. GENERATED FROM PYTHON SOURCE LINES 112-114

.. code-block:: default

    clf






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-container-id-60 {color: black;}#sk-container-id-60 pre{padding: 0;}#sk-container-id-60 div.sk-toggleable {background-color: white;}#sk-container-id-60 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-60 label.sk-toggleable__label-arrow:before {content: "▸";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-60 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-60 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-60 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-60 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-60 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-60 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: "▾";}#sk-container-id-60 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-60 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-60 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-60 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-60 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-60 div.sk-parallel-item::after {content: "";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-60 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-60 div.sk-serial::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-60 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-60 div.sk-item {position: relative;z-index: 1;}#sk-container-id-60 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-60 div.sk-item::before, #sk-container-id-60 div.sk-parallel-item::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-60 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-60 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-60 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-60 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-60 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-60 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-60 div.sk-label-container {text-align: center;}#sk-container-id-60 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-60 div.sk-text-repr-fallback {display: none;}</style><div id="sk-container-id-60" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>Pipeline(steps=[(&#x27;preprocessor&#x27;,
                     ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                                      Pipeline(steps=[(&#x27;imputer&#x27;,
                                                                       SimpleImputer(strategy=&#x27;median&#x27;)),
                                                                      (&#x27;scaler&#x27;,
                                                                       StandardScaler())]),
                                                      [&#x27;age&#x27;, &#x27;fare&#x27;]),
                                                     (&#x27;cat&#x27;,
                                                      Pipeline(steps=[(&#x27;encoder&#x27;,
                                                                       OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                                      (&#x27;selector&#x27;,
                                                                       SelectPercentile(percentile=50,
                                                                                        score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                                      [&#x27;embarked&#x27;, &#x27;sex&#x27;,
                                                       &#x27;pclass&#x27;])])),
                    (&#x27;classifier&#x27;, LogisticRegression())])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-248" type="checkbox" ><label for="sk-estimator-id-248" class="sk-toggleable__label sk-toggleable__label-arrow">Pipeline</label><div class="sk-toggleable__content"><pre>Pipeline(steps=[(&#x27;preprocessor&#x27;,
                     ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                                      Pipeline(steps=[(&#x27;imputer&#x27;,
                                                                       SimpleImputer(strategy=&#x27;median&#x27;)),
                                                                      (&#x27;scaler&#x27;,
                                                                       StandardScaler())]),
                                                      [&#x27;age&#x27;, &#x27;fare&#x27;]),
                                                     (&#x27;cat&#x27;,
                                                      Pipeline(steps=[(&#x27;encoder&#x27;,
                                                                       OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                                      (&#x27;selector&#x27;,
                                                                       SelectPercentile(percentile=50,
                                                                                        score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                                      [&#x27;embarked&#x27;, &#x27;sex&#x27;,
                                                       &#x27;pclass&#x27;])])),
                    (&#x27;classifier&#x27;, LogisticRegression())])</pre></div></div></div><div class="sk-serial"><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-249" type="checkbox" ><label for="sk-estimator-id-249" class="sk-toggleable__label sk-toggleable__label-arrow">preprocessor: ColumnTransformer</label><div class="sk-toggleable__content"><pre>ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                     Pipeline(steps=[(&#x27;imputer&#x27;,
                                                      SimpleImputer(strategy=&#x27;median&#x27;)),
                                                     (&#x27;scaler&#x27;, StandardScaler())]),
                                     [&#x27;age&#x27;, &#x27;fare&#x27;]),
                                    (&#x27;cat&#x27;,
                                     Pipeline(steps=[(&#x27;encoder&#x27;,
                                                      OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                     (&#x27;selector&#x27;,
                                                      SelectPercentile(percentile=50,
                                                                       score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                     [&#x27;embarked&#x27;, &#x27;sex&#x27;, &#x27;pclass&#x27;])])</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-250" type="checkbox" ><label for="sk-estimator-id-250" class="sk-toggleable__label sk-toggleable__label-arrow">num</label><div class="sk-toggleable__content"><pre>[&#x27;age&#x27;, &#x27;fare&#x27;]</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-251" type="checkbox" ><label for="sk-estimator-id-251" class="sk-toggleable__label sk-toggleable__label-arrow">SimpleImputer</label><div class="sk-toggleable__content"><pre>SimpleImputer(strategy=&#x27;median&#x27;)</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-252" type="checkbox" ><label for="sk-estimator-id-252" class="sk-toggleable__label sk-toggleable__label-arrow">StandardScaler</label><div class="sk-toggleable__content"><pre>StandardScaler()</pre></div></div></div></div></div></div></div></div><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-253" type="checkbox" ><label for="sk-estimator-id-253" class="sk-toggleable__label sk-toggleable__label-arrow">cat</label><div class="sk-toggleable__content"><pre>[&#x27;embarked&#x27;, &#x27;sex&#x27;, &#x27;pclass&#x27;]</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input 
class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-254" type="checkbox" ><label for="sk-estimator-id-254" class="sk-toggleable__label sk-toggleable__label-arrow">OneHotEncoder</label><div class="sk-toggleable__content"><pre>OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-255" type="checkbox" ><label for="sk-estimator-id-255" class="sk-toggleable__label sk-toggleable__label-arrow">SelectPercentile</label><div class="sk-toggleable__content"><pre>SelectPercentile(percentile=50, score_func=&lt;function chi2 at 0x7fca8a172e80&gt;)</pre></div></div></div></div></div></div></div></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-256" type="checkbox" ><label for="sk-estimator-id-256" class="sk-toggleable__label sk-toggleable__label-arrow">LogisticRegression</label><div class="sk-toggleable__content"><pre>LogisticRegression()</pre></div></div></div></div></div></div></div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 115-123

Use ``ColumnTransformer`` by selecting columns by data type

When the dataset has clean data types, the preprocessing can be automated by
using the data type of each column to decide whether to treat it as a
numerical or a categorical feature.
:func:`sklearn.compose.make_column_selector` makes this possible.
First, let's select only a subset of columns to simplify our
example.

.. GENERATED FROM PYTHON SOURCE LINES 123-127

.. code-block:: default


    subset_feature = ["embarked", "sex", "pclass", "age", "fare"]
    X_train, X_test = X_train[subset_feature], X_test[subset_feature]








.. GENERATED FROM PYTHON SOURCE LINES 128-129

Then, we inspect the data type of each column.

.. GENERATED FROM PYTHON SOURCE LINES 129-132

.. code-block:: default


    X_train.info()





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'pandas.core.frame.DataFrame'>
    Index: 1047 entries, 1118 to 684
    Data columns (total 5 columns):
     #   Column    Non-Null Count  Dtype   
    ---  ------    --------------  -----   
     0   embarked  1045 non-null   category
     1   sex       1047 non-null   category
     2   pclass    1047 non-null   int64   
     3   age       841 non-null    float64 
     4   fare      1046 non-null   float64 
    dtypes: category(2), float64(2), int64(1)
    memory usage: 35.0 KB




.. GENERATED FROM PYTHON SOURCE LINES 133-138

We can observe that the ``embarked`` and ``sex`` columns were tagged as
``category`` columns when loading the data with ``fetch_openml``. Therefore, we
can use this information to dispatch the categorical columns to the
``categorical_transformer`` and the remaining columns to the
``numeric_transformer``.

.. GENERATED FROM PYTHON SOURCE LINES 140-145

.. note:: In practice, you will have to handle the column data types yourself.
   If you want some columns to be treated as ``category``, you will have to
   convert them into categorical columns. If you are using pandas, you can
   refer to its documentation on `Categorical data
   <https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html>`_.
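
For instance, an integer column can be converted with pandas before it
reaches a dtype-based selector. The sketch below uses a small hand-made
frame (the values are illustrative, not taken from the Titanic data) and
assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Toy frame: "sex" is already categorical, "pclass" is stored as integers.
df = pd.DataFrame(
    {
        "sex": pd.Categorical(["female", "male", "female"]),
        "pclass": [1, 3, 2],
        "fare": [71.28, 7.25, 13.0],
    }
)

categorical_before = make_column_selector(dtype_include="category")(df)

# Explicitly mark pclass as categorical so dtype-based selection picks it up.
df["pclass"] = df["pclass"].astype("category")

categorical_after = make_column_selector(dtype_include="category")(df)
numeric_after = make_column_selector(dtype_exclude="category")(df)

print(categorical_before)
print(categorical_after)
print(numeric_after)
```

Whether such a conversion is appropriate depends on the column: for an
ordinal variable like ``pclass``, both the numeric and the categorical
treatment can be defended.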

.. GENERATED FROM PYTHON SOURCE LINES 145-163

.. code-block:: default


    from sklearn.compose import make_column_selector as selector

    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, selector(dtype_exclude="category")),
            ("cat", categorical_transformer, selector(dtype_include="category")),
        ]
    )
    clf = Pipeline(
        steps=[("preprocessor", preprocessor), ("classifier", LogisticRegression())]
    )


    clf.fit(X_train, y_train)
    print("model score: %.3f" % clf.score(X_test, y_test))
    clf





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    model score: 0.798


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-container-id-61 {color: black;}#sk-container-id-61 pre{padding: 0;}#sk-container-id-61 div.sk-toggleable {background-color: white;}#sk-container-id-61 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-61 label.sk-toggleable__label-arrow:before {content: "▸";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-61 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-61 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-61 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-61 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-61 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-61 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: "▾";}#sk-container-id-61 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-61 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-61 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-61 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-61 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-61 div.sk-parallel-item::after {content: "";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-61 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-61 div.sk-serial::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-61 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-61 div.sk-item {position: relative;z-index: 1;}#sk-container-id-61 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-61 div.sk-item::before, #sk-container-id-61 div.sk-parallel-item::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-61 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-61 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-61 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-61 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-61 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-61 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-61 div.sk-label-container {text-align: center;}#sk-container-id-61 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-61 div.sk-text-repr-fallback {display: none;}</style><div id="sk-container-id-61" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>Pipeline(steps=[(&#x27;preprocessor&#x27;,
                     ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                                      Pipeline(steps=[(&#x27;imputer&#x27;,
                                                                       SimpleImputer(strategy=&#x27;median&#x27;)),
                                                                      (&#x27;scaler&#x27;,
                                                                       StandardScaler())]),
                                                      &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;),
                                                     (&#x27;cat&#x27;,
                                                      Pipeline(steps=[(&#x27;encoder&#x27;,
                                                                       OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                                      (&#x27;selector&#x27;,
                                                                       SelectPercentile(percentile=50,
                                                                                        score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                                      &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;)])),
                    (&#x27;classifier&#x27;, LogisticRegression())])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-257" type="checkbox" ><label for="sk-estimator-id-257" class="sk-toggleable__label sk-toggleable__label-arrow">Pipeline</label><div class="sk-toggleable__content"><pre>Pipeline(steps=[(&#x27;preprocessor&#x27;,
                     ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                                      Pipeline(steps=[(&#x27;imputer&#x27;,
                                                                       SimpleImputer(strategy=&#x27;median&#x27;)),
                                                                      (&#x27;scaler&#x27;,
                                                                       StandardScaler())]),
                                                      &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;),
                                                     (&#x27;cat&#x27;,
                                                      Pipeline(steps=[(&#x27;encoder&#x27;,
                                                                       OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                                      (&#x27;selector&#x27;,
                                                                       SelectPercentile(percentile=50,
                                                                                        score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                                      &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;)])),
                    (&#x27;classifier&#x27;, LogisticRegression())])</pre></div></div></div><div class="sk-serial"><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-258" type="checkbox" ><label for="sk-estimator-id-258" class="sk-toggleable__label sk-toggleable__label-arrow">preprocessor: ColumnTransformer</label><div class="sk-toggleable__content"><pre>ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                     Pipeline(steps=[(&#x27;imputer&#x27;,
                                                      SimpleImputer(strategy=&#x27;median&#x27;)),
                                                     (&#x27;scaler&#x27;, StandardScaler())]),
                                     &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;),
                                    (&#x27;cat&#x27;,
                                     Pipeline(steps=[(&#x27;encoder&#x27;,
                                                      OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                     (&#x27;selector&#x27;,
                                                      SelectPercentile(percentile=50,
                                                                       score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                     &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;)])</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-259" type="checkbox" ><label for="sk-estimator-id-259" class="sk-toggleable__label sk-toggleable__label-arrow">num</label><div class="sk-toggleable__content"><pre>&lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-260" type="checkbox" ><label for="sk-estimator-id-260" class="sk-toggleable__label sk-toggleable__label-arrow">SimpleImputer</label><div class="sk-toggleable__content"><pre>SimpleImputer(strategy=&#x27;median&#x27;)</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-261" type="checkbox" ><label for="sk-estimator-id-261" class="sk-toggleable__label sk-toggleable__label-arrow">StandardScaler</label><div class="sk-toggleable__content"><pre>StandardScaler()</pre></div></div></div></div></div></div></div></div><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-262" type="checkbox" ><label for="sk-estimator-id-262" class="sk-toggleable__label sk-toggleable__label-arrow">cat</label><div class="sk-toggleable__content"><pre>&lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;</pre></div></div></div><div class="sk-serial"><div 
class="sk-item"><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-263" type="checkbox" ><label for="sk-estimator-id-263" class="sk-toggleable__label sk-toggleable__label-arrow">OneHotEncoder</label><div class="sk-toggleable__content"><pre>OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-264" type="checkbox" ><label for="sk-estimator-id-264" class="sk-toggleable__label sk-toggleable__label-arrow">SelectPercentile</label><div class="sk-toggleable__content"><pre>SelectPercentile(percentile=50, score_func=&lt;function chi2 at 0x7fca8a172e80&gt;)</pre></div></div></div></div></div></div></div></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-265" type="checkbox" ><label for="sk-estimator-id-265" class="sk-toggleable__label sk-toggleable__label-arrow">LogisticRegression</label><div class="sk-toggleable__content"><pre>LogisticRegression()</pre></div></div></div></div></div></div></div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 164-167

The resulting score is not exactly the same as the one from the previous
pipeline because the dtype-based selector treats the ``pclass`` column as
a numeric feature, whereas the previous pipeline treated it as categorical:

.. GENERATED FROM PYTHON SOURCE LINES 167-170

.. code-block:: default


    selector(dtype_exclude="category")(X_train)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    ['pclass', 'age', 'fare']



.. GENERATED FROM PYTHON SOURCE LINES 171-174

.. code-block:: default


    selector(dtype_include="category")(X_train)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    ['embarked', 'sex']
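
The dtype-based dispatch depends entirely on the dtypes stored in the
dataframe. As a minimal sketch on a small hypothetical dataframe (the values
below are made up for illustration and are not the Titanic data), casting an
integer-coded column such as ``pclass`` to the ``category`` dtype moves it from
the numeric group to the categorical group:

.. code-block:: python

    import pandas as pd

    from sklearn.compose import make_column_selector

    # Hypothetical toy data: pclass is stored as integers, sex as a category.
    toy = pd.DataFrame(
        {
            "pclass": [1, 3, 2],
            "age": [22.0, 38.0, 26.0],
            "sex": pd.Series(["male", "female", "female"], dtype="category"),
        }
    )

    select_categorical = make_column_selector(dtype_include="category")
    select_other = make_column_selector(dtype_exclude="category")

    # With an integer dtype, pclass is not selected as categorical.
    cat_before = select_categorical(toy)
    num_before = select_other(toy)

    # After the cast, the same selectors route pclass to the categorical group.
    toy_cast = toy.astype({"pclass": "category"})
    cat_after = select_categorical(toy_cast)
    num_after = select_other(toy_cast)

    print(cat_before, num_before)
    print(cat_after, num_after)

Whether ``pclass`` is better modelled as numeric or categorical is a modelling
choice; the sketch only shows how the dtype controls the dispatch.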



.. GENERATED FROM PYTHON SOURCE LINES 175-187

Using the prediction pipeline in a grid search

Grid search can also be performed on the different preprocessing steps
defined in the ``ColumnTransformer`` object, together with the classifier's
hyperparameters as part of the ``Pipeline``.
We will search for both the imputer strategy of the numeric preprocessing
and the regularization parameter of the logistic regression using
:class:`~sklearn.model_selection.RandomizedSearchCV`. This
hyperparameter search randomly samples a fixed number of parameter
settings, configured by ``n_iter``. Alternatively, one can use
:class:`~sklearn.model_selection.GridSearchCV`, but then the full Cartesian
product of the parameter space is evaluated.
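
To make the cost difference concrete, the following sketch counts the
candidates for a search space with the same values as the one used in this
example. It relies on the ``ParameterGrid`` and ``ParameterSampler`` helpers
from ``sklearn.model_selection``; the variable names are chosen here for
illustration only:

.. code-block:: python

    from sklearn.model_selection import ParameterGrid, ParameterSampler

    search_space = {
        "preprocessor__num__imputer__strategy": ["mean", "median"],
        "preprocessor__cat__selector__percentile": [10, 30, 50, 70],
        "classifier__C": [0.1, 1.0, 10, 100],
    }

    # Exhaustive search: every combination, i.e. 2 * 4 * 4 candidates.
    n_grid = len(ParameterGrid(search_space))

    # Randomized search: only n_iter candidates drawn from the same space.
    sampled = list(ParameterSampler(search_space, n_iter=10, random_state=0))

    print(n_grid, len(sampled))

Each candidate is fitted once per cross-validation fold, so the gap in the
number of model fits grows with the number of folds.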

.. GENERATED FROM PYTHON SOURCE LINES 187-197

.. code-block:: default


    param_grid = {
        "preprocessor__num__imputer__strategy": ["mean", "median"],
        "preprocessor__cat__selector__percentile": [10, 30, 50, 70],
        "classifier__C": [0.1, 1.0, 10, 100],
    }

    search_cv = RandomizedSearchCV(clf, param_grid, n_iter=10, random_state=0)
    search_cv






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-container-id-62 {color: black;}#sk-container-id-62 pre{padding: 0;}#sk-container-id-62 div.sk-toggleable {background-color: white;}#sk-container-id-62 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-62 label.sk-toggleable__label-arrow:before {content: "▸";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-62 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-62 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-62 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-62 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-62 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-62 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: "▾";}#sk-container-id-62 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-62 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-62 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-62 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-62 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-62 div.sk-parallel-item::after {content: "";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-62 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-62 div.sk-serial::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-62 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-62 div.sk-item {position: relative;z-index: 1;}#sk-container-id-62 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-62 div.sk-item::before, #sk-container-id-62 div.sk-parallel-item::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-62 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-62 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-62 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-62 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-62 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-62 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-62 div.sk-label-container {text-align: center;}#sk-container-id-62 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-62 div.sk-text-repr-fallback {display: none;}</style><div id="sk-container-id-62" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>RandomizedSearchCV(estimator=Pipeline(steps=[(&#x27;preprocessor&#x27;,
                                                  ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                                                                   Pipeline(steps=[(&#x27;imputer&#x27;,
                                                                                                    SimpleImputer(strategy=&#x27;median&#x27;)),
                                                                                                   (&#x27;scaler&#x27;,
                                                                                                    StandardScaler())]),
                                                                                   &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;),
                                                                                  (&#x27;cat&#x27;,
                                                                                   Pipeline(steps=[(&#x27;encoder&#x27;,
                                                                                                    OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                                                                   (&#x27;s...
                                                                                                                     score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                                                                   &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;)])),
                                                 (&#x27;classifier&#x27;,
                                                  LogisticRegression())]),
                       param_distributions={&#x27;classifier__C&#x27;: [0.1, 1.0, 10, 100],
                                            &#x27;preprocessor__cat__selector__percentile&#x27;: [10,
                                                                                        30,
                                                                                        50,
                                                                                        70],
                                            &#x27;preprocessor__num__imputer__strategy&#x27;: [&#x27;mean&#x27;,
                                                                                     &#x27;median&#x27;]},
                       random_state=0)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-266" type="checkbox" ><label for="sk-estimator-id-266" class="sk-toggleable__label sk-toggleable__label-arrow">RandomizedSearchCV</label><div class="sk-toggleable__content"><pre>RandomizedSearchCV(estimator=Pipeline(steps=[(&#x27;preprocessor&#x27;,
                                                  ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                                                                   Pipeline(steps=[(&#x27;imputer&#x27;,
                                                                                                    SimpleImputer(strategy=&#x27;median&#x27;)),
                                                                                                   (&#x27;scaler&#x27;,
                                                                                                    StandardScaler())]),
                                                                                   &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;),
                                                                                  (&#x27;cat&#x27;,
                                                                                   Pipeline(steps=[(&#x27;encoder&#x27;,
                                                                                                    OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                                                                   (&#x27;s...
                                                                                                                     score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                                                                   &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;)])),
                                                 (&#x27;classifier&#x27;,
                                                  LogisticRegression())]),
                       param_distributions={&#x27;classifier__C&#x27;: [0.1, 1.0, 10, 100],
                                            &#x27;preprocessor__cat__selector__percentile&#x27;: [10,
                                                                                        30,
                                                                                        50,
                                                                                        70],
                                            &#x27;preprocessor__num__imputer__strategy&#x27;: [&#x27;mean&#x27;,
                                                                                     &#x27;median&#x27;]},
                       random_state=0)</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-267" type="checkbox" ><label for="sk-estimator-id-267" class="sk-toggleable__label sk-toggleable__label-arrow">estimator: Pipeline</label><div class="sk-toggleable__content"><pre>Pipeline(steps=[(&#x27;preprocessor&#x27;,
                     ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                                      Pipeline(steps=[(&#x27;imputer&#x27;,
                                                                       SimpleImputer(strategy=&#x27;median&#x27;)),
                                                                      (&#x27;scaler&#x27;,
                                                                       StandardScaler())]),
                                                      &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;),
                                                     (&#x27;cat&#x27;,
                                                      Pipeline(steps=[(&#x27;encoder&#x27;,
                                                                       OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                                      (&#x27;selector&#x27;,
                                                                       SelectPercentile(percentile=50,
                                                                                        score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                                      &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;)])),
                    (&#x27;classifier&#x27;, LogisticRegression())])</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-serial"><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-268" type="checkbox" ><label for="sk-estimator-id-268" class="sk-toggleable__label sk-toggleable__label-arrow">preprocessor: ColumnTransformer</label><div class="sk-toggleable__content"><pre>ColumnTransformer(transformers=[(&#x27;num&#x27;,
                                     Pipeline(steps=[(&#x27;imputer&#x27;,
                                                      SimpleImputer(strategy=&#x27;median&#x27;)),
                                                     (&#x27;scaler&#x27;, StandardScaler())]),
                                     &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;),
                                    (&#x27;cat&#x27;,
                                     Pipeline(steps=[(&#x27;encoder&#x27;,
                                                      OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)),
                                                     (&#x27;selector&#x27;,
                                                      SelectPercentile(percentile=50,
                                                                       score_func=&lt;function chi2 at 0x7fca8a172e80&gt;))]),
                                     &lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;)])</pre></div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-269" type="checkbox" ><label for="sk-estimator-id-269" class="sk-toggleable__label sk-toggleable__label-arrow">num</label><div class="sk-toggleable__content"><pre>&lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fca54c8a850&gt;</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-270" type="checkbox" ><label for="sk-estimator-id-270" class="sk-toggleable__label sk-toggleable__label-arrow">SimpleImputer</label><div class="sk-toggleable__content"><pre>SimpleImputer(strategy=&#x27;median&#x27;)</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-271" type="checkbox" ><label for="sk-estimator-id-271" class="sk-toggleable__label sk-toggleable__label-arrow">StandardScaler</label><div class="sk-toggleable__content"><pre>StandardScaler()</pre></div></div></div></div></div></div></div></div><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-272" type="checkbox" ><label for="sk-estimator-id-272" class="sk-toggleable__label sk-toggleable__label-arrow">cat</label><div class="sk-toggleable__content"><pre>&lt;sklearn.compose._column_transformer.make_column_selector object at 0x7fcadeb2e2d0&gt;</pre></div></div></div><div class="sk-serial"><div 
class="sk-item"><div class="sk-serial"><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-273" type="checkbox" ><label for="sk-estimator-id-273" class="sk-toggleable__label sk-toggleable__label-arrow">OneHotEncoder</label><div class="sk-toggleable__content"><pre>OneHotEncoder(handle_unknown=&#x27;ignore&#x27;)</pre></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-274" type="checkbox" ><label for="sk-estimator-id-274" class="sk-toggleable__label sk-toggleable__label-arrow">SelectPercentile</label><div class="sk-toggleable__content"><pre>SelectPercentile(percentile=50, score_func=&lt;function chi2 at 0x7fca8a172e80&gt;)</pre></div></div></div></div></div></div></div></div></div></div><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-275" type="checkbox" ><label for="sk-estimator-id-275" class="sk-toggleable__label sk-toggleable__label-arrow">LogisticRegression</label><div class="sk-toggleable__content"><pre>LogisticRegression()</pre></div></div></div></div></div></div></div></div></div></div></div></div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 198-201

Calling ``fit`` triggers the cross-validated search for the best
combination of hyperparameters:


.. GENERATED FROM PYTHON SOURCE LINES 201-206

.. code-block:: default

    search_cv.fit(X_train, y_train)

    print("Best params:")
    print(search_cv.best_params_)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Best params:
    {'preprocessor__num__imputer__strategy': 'mean', 'preprocessor__cat__selector__percentile': 30, 'classifier__C': 100}
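
The keys of ``best_params_`` follow the ``<step name>__<parameter name>``
convention, chained once per level of nesting. As a minimal sketch, assuming a
hypothetical two-step pipeline that is smaller than the one in this example,
the same kind of dictionary can be applied directly with ``set_params``:

.. code-block:: python

    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Hypothetical pipeline used only to illustrate the naming scheme.
    toy_pipe = Pipeline(
        steps=[("imputer", SimpleImputer()), ("classifier", LogisticRegression())]
    )

    # Each key addresses one parameter of one named step.
    toy_params = {"imputer__strategy": "median", "classifier__C": 100}
    toy_pipe.set_params(**toy_params)

    imputer_strategy = toy_pipe.named_steps["imputer"].strategy
    classifier_C = toy_pipe.named_steps["classifier"].C
    print(imputer_strategy, classifier_C)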




.. GENERATED FROM PYTHON SOURCE LINES 207-208

The internal cross-validation score obtained with those parameters is:

.. GENERATED FROM PYTHON SOURCE LINES 208-210

.. code-block:: default

    print(f"Internal CV score: {search_cv.best_score_:.3f}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Internal CV score: 0.786




.. GENERATED FROM PYTHON SOURCE LINES 211-212

We can also inspect the top search results as a pandas DataFrame:

.. GENERATED FROM PYTHON SOURCE LINES 212-226

.. code-block:: default

    import pandas as pd

    cv_results = pd.DataFrame(search_cv.cv_results_)
    cv_results = cv_results.sort_values("mean_test_score", ascending=False)
    cv_results[
        [
            "mean_test_score",
            "std_test_score",
            "param_preprocessor__num__imputer__strategy",
            "param_preprocessor__cat__selector__percentile",
            "param_classifier__C",
        ]
    ].head(5)






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>mean_test_score</th>
          <th>std_test_score</th>
          <th>param_preprocessor__num__imputer__strategy</th>
          <th>param_preprocessor__cat__selector__percentile</th>
          <th>param_classifier__C</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>7</th>
          <td>0.786015</td>
          <td>0.031020</td>
          <td>mean</td>
          <td>30</td>
          <td>100</td>
        </tr>
        <tr>
          <th>0</th>
          <td>0.785063</td>
          <td>0.030498</td>
          <td>median</td>
          <td>30</td>
          <td>1.0</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.785063</td>
          <td>0.030498</td>
          <td>mean</td>
          <td>30</td>
          <td>1.0</td>
        </tr>
        <tr>
          <th>4</th>
          <td>0.785063</td>
          <td>0.030498</td>
          <td>mean</td>
          <td>10</td>
          <td>10</td>
        </tr>
        <tr>
          <th>3</th>
          <td>0.783149</td>
          <td>0.030462</td>
          <td>mean</td>
          <td>30</td>
          <td>0.1</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 227-231

The best hyperparameters have been used to re-fit a final model on the full
training set. We can evaluate that final model on held-out test data that was
not used for hyperparameter tuning.
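
The re-fitting step can be illustrated in isolation. The sketch below uses
synthetic data and a plain logistic regression rather than the pipeline from
this example, and all variable names are hypothetical. With the default
``refit=True``, the search object exposes the re-fitted model as
``best_estimator_`` and forwards ``score`` to it:

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV, train_test_split

    # Synthetic data, used only to keep the sketch self-contained.
    X_toy, y_toy = make_classification(n_samples=200, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=0)

    toy_search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
        n_iter=3,
        cv=3,
        random_state=0,
    )
    toy_search.fit(X_tr, y_tr)

    # Scoring through the search object and through the re-fitted estimator
    # evaluates the same model on the same held-out data.
    score_via_search = toy_search.score(X_te, y_te)
    score_via_best = toy_search.best_estimator_.score(X_te, y_te)
    print(score_via_search == score_via_best)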


.. GENERATED FROM PYTHON SOURCE LINES 231-235

.. code-block:: default

    print(
        "accuracy of the best model from randomized search: "
        f"{search_cv.score(X_test, y_test):.3f}"
    )




.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    accuracy of the best model from randomized search: 0.798





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.738 seconds)


.. _sphx_glr_download_auto_examples_compose_plot_column_transformer_mixed_types.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.3.X?urlpath=lab/tree/notebooks/auto_examples/compose/plot_column_transformer_mixed_types.ipynb
        :alt: Launch binder
        :width: 150 px



    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_column_transformer_mixed_types.py <plot_column_transformer_mixed_types.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_column_transformer_mixed_types.ipynb <plot_column_transformer_mixed_types.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
