qianfan.evaluation package

Library that helps developers evaluate their models on Qianfan.

class qianfan.evaluation.EvaluationManager(*, local_evaluators: Optional[List[LocalEvaluator]] = None, qianfan_evaluators: Optional[List[QianfanEvaluator]] = None, task_id: Optional[str] = None)[source]

Bases: BaseModel

Central controller for the evaluation logic.

eval(llms: Sequence[Union[Model, Service]], dataset: Dataset, **kwargs: Any) Optional[EvaluationResult][source]

Evaluate the performance of models on the dataset.

Args:
llms (Sequence[Union[Model, Service]]):

List of models or services to be evaluated.

dataset (Dataset):

The dataset on which models will be evaluated.

**kwargs (Any):

Other keyword arguments.

Returns:

Optional[EvaluationResult]: Evaluation result of models on the dataset.
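
A minimal sketch of calling eval and handling its Optional return value. Only the call shape eval(llms=..., dataset=...) is taken from the signature above. To stay runnable without Qianfan credentials, the sketch uses a hypothetical stand-in (FakeManager) in place of a real EvaluationManager, and the helper name run_evaluation is likewise an assumption made for illustration.

```python
from typing import Any, Optional, Sequence


def run_evaluation(manager: Any, llms: Sequence[Any], dataset: Any) -> Any:
    """Call manager.eval and fail loudly when no result comes back."""
    result: Optional[Any] = manager.eval(llms=llms, dataset=dataset)
    if result is None:
        # eval is declared as returning Optional[EvaluationResult],
        # so the None case has to be handled explicitly by the caller.
        raise RuntimeError("evaluation returned no result")
    return result


class FakeManager:
    """Hypothetical stand-in for EvaluationManager, used only in this sketch."""

    def __init__(self, canned_result: Optional[Any]) -> None:
        self.canned_result = canned_result

    def eval(self, llms: Sequence[Any], dataset: Any, **kwargs: Any) -> Optional[Any]:
        # A real manager would run the configured evaluators here.
        return self.canned_result
```

With a real EvaluationManager, the same helper could presumably be called unchanged, since it relies only on the eval method documented above.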

local_evaluators: Optional[List[LocalEvaluator]]
qianfan_evaluators: Optional[List[QianfanEvaluator]]
task_id: Optional[str]

class qianfan.evaluation.EvaluationResult(result_dataset: Optional[Dataset] = None, metrics: Optional[Dict[str, Dict[str, Any]]] = None)[source]

Bases: object

Evaluation Result

Submodules

qianfan.evaluation.consts module

Constants used for evaluation.

qianfan.evaluation.evaluation_manager module

Manager that controls the whole evaluation procedure.

class qianfan.evaluation.evaluation_manager.EvaluationManager(*, local_evaluators: Optional[List[LocalEvaluator]] = None, qianfan_evaluators: Optional[List[QianfanEvaluator]] = None, task_id: Optional[str] = None)[source]

Bases: BaseModel

Central controller for the evaluation logic.

eval(llms: Sequence[Union[Model, Service]], dataset: Dataset, **kwargs: Any) Optional[EvaluationResult][source]

Evaluate the performance of models on the dataset.

Args:
llms (Sequence[Union[Model, Service]]):

List of models or services to be evaluated.

dataset (Dataset):

The dataset on which models will be evaluated.

**kwargs (Any):

Other keyword arguments.

Returns:

Optional[EvaluationResult]: Evaluation result of models on the dataset.

local_evaluators: Optional[List[LocalEvaluator]]
qianfan_evaluators: Optional[List[QianfanEvaluator]]
task_id: Optional[str]

qianfan.evaluation.evaluation_result module

The result of an evaluation.

class qianfan.evaluation.evaluation_result.EvaluationResult(result_dataset: Optional[Dataset] = None, metrics: Optional[Dict[str, Dict[str, Any]]] = None)[source]

Bases: object

Evaluation Result
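
The metrics argument is typed as Optional[Dict[str, Dict[str, Any]]]. The sketch below shows one way to flatten such a nested mapping into printable lines. The source does not say what the outer and inner keys represent; the sample keys in the usage note are invented for illustration, and format_metrics is a hypothetical helper, not part of qianfan.

```python
from typing import Any, Dict, List, Optional


def format_metrics(metrics: Optional[Dict[str, Dict[str, Any]]]) -> List[str]:
    """Flatten a two-level metrics mapping into 'outer.inner = value' lines."""
    if not metrics:
        # Covers both None (the documented default) and an empty dict.
        return []
    lines: List[str] = []
    for outer_key in sorted(metrics):
        for inner_key in sorted(metrics[outer_key]):
            value = metrics[outer_key][inner_key]
            lines.append(f"{outer_key}.{inner_key} = {value}")
    return lines
```

For example, format_metrics({"model_a": {"accuracy": 0.5}}) yields a single line, "model_a.accuracy = 0.5".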

qianfan.evaluation.evaluator module

Collection of evaluators.

class qianfan.evaluation.evaluator.Evaluator[source]

Bases: BaseModel, ABC

A class for evaluating a single entry.

abstract evaluate(input: Union[str, List[Dict[str, Any]]], reference: str, output: str) Dict[str, Any][source]

evaluate one entry

class qianfan.evaluation.evaluator.LocalEvaluator[source]

Bases: Evaluator, ABC

Base class for evaluators that run locally.
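
A minimal sketch of what a locally running evaluator might look like, assuming it only needs to implement evaluate(input, reference, output) and return a dict of metric values. To stay self-contained, the sketch defines a hypothetical stand-in base class (EvaluatorSketch) that mirrors the abstract signature rather than importing qianfan; real code would presumably subclass LocalEvaluator instead. The metric name "exact_match" is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Union


class EvaluatorSketch(ABC):
    """Hypothetical stand-in mirroring the Evaluator.evaluate signature."""

    @abstractmethod
    def evaluate(
        self,
        input: Union[str, List[Dict[str, Any]]],
        reference: str,
        output: str,
    ) -> Dict[str, Any]:
        """Evaluate one entry and return its metric values."""


class ExactMatchEvaluator(EvaluatorSketch):
    """Scores 1.0 when output equals reference, ignoring surrounding whitespace."""

    def evaluate(
        self,
        input: Union[str, List[Dict[str, Any]]],
        reference: str,
        output: str,
    ) -> Dict[str, Any]:
        matched = reference.strip() == output.strip()
        # "exact_match" is an illustrative metric name, not one defined by qianfan.
        return {"exact_match": 1.0 if matched else 0.0}
```

Returning a dict keeps the evaluator free to report several metrics per entry, which matches the Dict[str, Any] return type of evaluate.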

class qianfan.evaluation.evaluator.ManualEvaluatorDimension(*, dimension: str, description: Optional[str] = None)[source]

Bases: BaseModel

Dimension used in manual evaluation mode.

description: Optional[str]
dimension: str

class qianfan.evaluation.evaluator.OpenCompassLocalEvaluator[source]

Bases: LocalEvaluator

class qianfan.evaluation.evaluator.QianfanEvaluator[source]

Bases: Evaluator

Base class for Qianfan evaluators, with an empty implementation of evaluate.

evaluate(input: Union[str, List[Dict[str, Any]]], reference: str, output: str) Dict[str, Any][source]

evaluate one entry

class qianfan.evaluation.evaluator.QianfanManualEvaluator(*, evaluation_dimensions: List[ManualEvaluatorDimension] = [ManualEvaluatorDimension(dimension='满意度', description=None)])[source]

Bases: QianfanEvaluator

Configuration class for the Qianfan manual evaluator. The default dimension '满意度' means "satisfaction".

classmethod dimension_validation(input_dict: Any) Any[source]
evaluation_dimensions: List[ManualEvaluatorDimension]

class qianfan.evaluation.evaluator.QianfanRefereeEvaluator(*, app_id: int, prompt_metrics: str = '综合得分', prompt_steps: str = '\n1.仔细阅读所提供的问题,确保你理解问题的要求和背景。\n2.仔细阅读所提供的标准答案,确保你理解问题的标准答案\n3.阅读答案,并检查是否用词不当\n4.检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n', prompt_max_score: int = 5)[source]

Bases: QianfanEvaluator

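Configuration class for the Qianfan referee evaluator. The default prompt_metrics '综合得分' means "overall score". The default prompt_steps translates to: 1. Read the provided question carefully and make sure you understand its requirements and background. 2. Read the provided reference answer carefully and make sure you understand it. 3. Read the answer and check for inappropriate wording. 4. Check whether the answer strictly follows the requirements of the question, including answer style, length, and format.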

app_id: int
prompt_max_score: int
prompt_metrics: str
prompt_steps: str

class qianfan.evaluation.evaluator.QianfanRuleEvaluator(*, using_similarity: bool = False, using_accuracy: bool = False, stop_words: Optional[str] = None)[source]

Bases: QianfanEvaluator

Configuration class for the Qianfan rule evaluator.

stop_words: Optional[str]
using_accuracy: bool
using_similarity: bool