
Pipeline: multiple classifiers?

Reposted. Author: 行者123. Updated: 2023-12-04 16:57:52

I read the following example about Pipelines and GridSearchCV in Python:
http://www.davidsbatista.net/blog/2017/04/01/document_classification/

Logistic regression:

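    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline

    # stop_words is assumed to be defined earlier, as in the linked post.
    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(stop_words=stop_words)),
        ('clf', OneVsRestClassifier(LogisticRegression(solver='sag'))),
    ])
    parameters = {
        'tfidf__max_df': (0.25, 0.5, 0.75),
        'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
        "clf__estimator__C": [0.01, 0.1, 1],
        "clf__estimator__class_weight": ['balanced', None],
    }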

Support vector machine:

    from sklearn.svm import LinearSVC

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(stop_words=stop_words)),
        ('clf', OneVsRestClassifier(LinearSVC())),
    ])
    parameters = {
        'tfidf__max_df': (0.25, 0.5, 0.75),
        'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
        "clf__estimator__C": [0.01, 0.1, 1],
        "clf__estimator__class_weight": ['balanced', None],
    }

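Is there a way to combine logistic regression and SVM in a single Pipeline? Say I have one TfidfVectorizer and want to test it against several classifiers, with each classifier then reporting its best model/parameters.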

1 Answer

Here is a simple way to optimize over any classifier and over any parameter settings for each classifier.

Create a switcher class that works for any estimator

    from sklearn.base import BaseEstimator
    from sklearn.linear_model import SGDClassifier


    class ClfSwitcher(BaseEstimator):

        def __init__(self, estimator=SGDClassifier()):
            """
            A custom BaseEstimator that can switch between classifiers.
            :param estimator: sklearn object - The classifier
            """
            self.estimator = estimator

        def fit(self, X, y=None, **kwargs):
            # Fit whichever classifier is currently wrapped.
            self.estimator.fit(X, y)
            return self

        def predict(self, X, y=None):
            return self.estimator.predict(X)

        def predict_proba(self, X):
            # Only works if the wrapped classifier implements predict_proba.
            return self.estimator.predict_proba(X)

        def score(self, X, y):
            return self.estimator.score(X, y)

Now you can pass in anything for the estimator parameter, and you can optimize any parameter of whichever estimator you pass in, as follows:
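As a minimal sketch of what "pass in anything" means in practice, the snippet below swaps the wrapped classifier with `set_params` and then sets one of the new classifier's own parameters. The class is a pared-down copy of the switcher (with `estimator=None` as the default, which is a simplification made here), repeated only so the snippet runs on its own.

```python
from sklearn.base import BaseEstimator
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB


class ClfSwitcher(BaseEstimator):
    """Pared-down copy of the switcher, repeated so this snippet is self-contained."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)


switcher = ClfSwitcher(estimator=SGDClassifier())

# Replace the wrapped classifier, then tune a parameter of the new one.
switcher.set_params(estimator=MultinomialNB())
switcher.set_params(estimator__alpha=0.01)

swapped_name = type(switcher.estimator).__name__
swapped_alpha = switcher.estimator.alpha
print(swapped_name, swapped_alpha)
```

This is the same mechanism GridSearchCV relies on: each candidate in the grid is applied to a clone of the pipeline through `set_params`.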

Perform hyperparameter optimization
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', ClfSwitcher()),
    ])

    # One dict per classifier; each dict is searched as its own grid.
    parameters = [
        {
            'clf__estimator': [SGDClassifier()],  # SVM if hinge loss / logreg if log loss
            'tfidf__max_df': (0.25, 0.5, 0.75, 1.0),
            'tfidf__stop_words': ['english', None],
            'clf__estimator__penalty': ('l2', 'elasticnet', 'l1'),
            'clf__estimator__max_iter': [50, 80],
            'clf__estimator__tol': [1e-4],
            # 'log_loss' was named 'log' before scikit-learn 1.1; use 'log' on older versions.
            'clf__estimator__loss': ['hinge', 'log_loss', 'modified_huber'],
        },
        {
            'clf__estimator': [MultinomialNB()],
            'tfidf__max_df': (0.25, 0.5, 0.75, 1.0),
            'tfidf__stop_words': [None],
            'clf__estimator__alpha': (1e-2, 1e-3, 1e-1),
        },
    ]

gscv = GridSearchCV(pipeline, parameters, cv=5, n_jobs=12, return_train_score=False, verbose=3)
gscv.fit(train_data, train_labels)
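The question also asks for the best model/parameters per classifier, while `best_params_` only reports the overall winner. One possible way to get both is to walk `cv_results_` and keep the top-scoring candidate for each classifier type. The sketch below is self-contained and rests on several assumptions that are not in the original answer: the toy corpus and labels are made up, the grid is shrunk to keep it fast, the splits are passed explicitly as `StratifiedKFold`, and the switcher is a variant that stores a fitted clone as `estimator_` (the usual scikit-learn meta-estimator convention) rather than fitting the passed-in object in place.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


class ClfSwitcher(BaseEstimator):
    """Variant of the switcher that keeps the fitted clone in `estimator_` (an assumption)."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

    def score(self, X, y):
        return self.estimator_.score(X, y)


# Toy corpus, invented purely for illustration (1 = spam-like, 0 = work-like).
docs = [
    "cheap pills buy now", "meeting agenda for monday",
    "buy cheap watches now", "monday project meeting notes",
    "win cash now cheap", "notes from the project meeting",
    "cheap cash pills offer", "agenda and notes for monday",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", ClfSwitcher())])
parameters = [
    {"clf__estimator": [SGDClassifier(random_state=0)],
     "clf__estimator__loss": ["hinge", "modified_huber"]},
    {"clf__estimator": [MultinomialNB()],
     "clf__estimator__alpha": [0.01, 0.1]},
]

# Explicit stratified splits so each tiny fold contains both classes.
gscv = GridSearchCV(pipeline, parameters, cv=StratifiedKFold(n_splits=2))
gscv.fit(docs, labels)

# Overall best candidate across all classifiers.
print(gscv.best_params_)

# Best candidate per classifier type, read from cv_results_.
best_per_clf = {}
for params, score in zip(gscv.cv_results_["params"],
                         gscv.cv_results_["mean_test_score"]):
    name = type(params["clf__estimator"]).__name__
    if name not in best_per_clf or score > best_per_clf[name][0]:
        best_per_clf[name] = (score, params)

for name, (score, params) in best_per_clf.items():
    print(name, round(score, 3), params)

predictions = gscv.predict(["cheap cash offer"])
```

With real data, `docs` and `labels` would be replaced by `train_data` and `train_labels`, and the full `parameters` list can be used unchanged.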

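How to interpret clf__estimator__loss

clf__estimator__loss is interpreted as the loss parameter of whatever estimator is, where estimator = SGDClassifier() in the topmost example. estimator is itself a parameter of clf, which is a ClfSwitcher object.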
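To make that chain of names concrete, here is a small sketch that sets `clf__estimator__loss` on a pipeline and then follows the same path by hand: pipeline step `clf`, its `estimator` attribute, and that estimator's `loss`. The switcher here is a stripped-down stand-in, since only its constructor parameter matters for how names are routed.

```python
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline


class ClfSwitcher(BaseEstimator):
    """Stripped-down stand-in for the switcher; enough to show parameter routing."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        return self


pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", ClfSwitcher(estimator=SGDClassifier())),
])

# Each "__" descends one level: step "clf" -> its "estimator" -> that estimator's "loss".
pipeline.set_params(clf__estimator__loss="modified_huber")

clf_step = pipeline.named_steps["clf"]   # the ClfSwitcher object
inner = clf_step.estimator               # the wrapped SGDClassifier
resolved_loss = inner.loss
print(resolved_loss)

# Every name that can appear as a grid key is listed by get_params().
tunable_names = sorted(pipeline.get_params())
```

Checking `pipeline.get_params()` this way is a quick sanity test for a grid: a key that is missing from it will be rejected by `set_params` when the search runs.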

Regarding "python - Pipeline: multiple classifiers?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50285973/
