qianfan.evaluation package
Library aimed at helping developers evaluate their models on Qianfan.
- class qianfan.evaluation.EvaluationManager(*, local_evaluators: Optional[List[LocalEvaluator]] = None, qianfan_evaluators: Optional[List[QianfanEvaluator]] = None)
Bases: BaseModel
Logic control center of evaluation, configured with optional lists of local evaluators and Qianfan evaluators.
- eval(llms: List[Union[Model, Service]], dataset: Dataset, **kwargs: Any) → Optional[EvaluationResult]
Evaluate the performance of models on the dataset.
- Args:
- llms (List[Union[Model, Service]]):
List of models or services to be evaluated.
- dataset (Dataset):
The dataset on which models will be evaluated.
- **kwargs (Any):
Other keyword arguments.
- Returns:
Optional[EvaluationResult]: Evaluation result of models on the dataset.
- local_evaluators: Optional[List[LocalEvaluator]]
- qianfan_evaluators: Optional[List[QianfanEvaluator]]
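The eval contract above (several models, one dataset, an optional result) can be pictured as "run every model on every entry, score each output, aggregate per model". The sketch below illustrates that idea with plain Python stand-ins; it is not qianfan's implementation. ToyModel, ToyScorer, toy_eval, exact_match and the "prompt"/"reference" keys are all hypothetical names, and the real method works on Model/Service objects and a qianfan Dataset (per the signature, something like EvaluationManager(local_evaluators=[...]).eval([model], dataset)).

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins, not qianfan types: a "model" maps a prompt to a
# reply, and a "local evaluator" maps (prompt, reference, output) to scores.
ToyModel = Callable[[str], str]
ToyScorer = Callable[[str, str, str], Dict[str, float]]


def toy_eval(
    llms: Dict[str, ToyModel],
    dataset: List[Dict[str, str]],
    local_evaluators: List[ToyScorer],
) -> Dict[str, Dict[str, Any]]:
    """Run every model on every entry, score each output, average per model."""
    metrics: Dict[str, Dict[str, Any]] = {}
    for name, llm in llms.items():
        totals: Dict[str, float] = {}
        for entry in dataset:
            output = llm(entry["prompt"])
            for scorer in local_evaluators:
                scores = scorer(entry["prompt"], entry["reference"], output)
                for key, value in scores.items():
                    totals[key] = totals.get(key, 0.0) + value
        # Average each metric over the dataset; max(...) avoids dividing by zero.
        metrics[name] = {k: v / max(len(dataset), 1) for k, v in totals.items()}
    return metrics


def exact_match(prompt: str, reference: str, output: str) -> Dict[str, float]:
    """Score 1.0 when the output equals the reference after trimming whitespace."""
    return {"exact_match": float(output.strip() == reference.strip())}


data = [
    {"prompt": "1+1=", "reference": "2"},
    {"prompt": "2+2=", "reference": "4"},
]
canned = {"1+1=": "2", "2+2=": "5"}  # a deliberately half-wrong toy model
models: Dict[str, ToyModel] = {"echo": lambda p: p, "lookup": lambda p: canned[p]}
result = toy_eval(models, data, [exact_match])
print(result)  # → {'echo': {'exact_match': 0.0}, 'lookup': {'exact_match': 0.5}}
```

Keying the output by model name mirrors the nested Dict[str, Dict[str, Any]] type of EvaluationResult.metrics, though what the real outer key holds is an assumption here.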
- class qianfan.evaluation.EvaluationResult(result_dataset: Optional[Dataset] = None, metrics: Optional[Dict[str, Dict[str, Any]]] = None)
Bases: object
Evaluation result, holding an optional result dataset and an optional nested dictionary of metrics.
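A minimal sketch of consuming the nested metrics type follows. It assumes, without confirmation from this page, that the outer key identifies a model and the inner dict maps metric names to values. ToyEvaluationResult and best_model are hypothetical helpers, result_dataset is replaced by a plain list, and the metric values are made-up numbers for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ToyEvaluationResult:
    """Hypothetical mirror of EvaluationResult's constructor arguments.

    result_dataset is a plain list of rows here instead of a qianfan Dataset.
    """

    result_dataset: Optional[List[Dict[str, Any]]] = None
    metrics: Optional[Dict[str, Dict[str, Any]]] = None


def best_model(result: ToyEvaluationResult, metric: str) -> Optional[str]:
    """Return the outer key whose inner dict has the highest value for `metric`."""
    if not result.metrics:
        return None
    scored = {name: m[metric] for name, m in result.metrics.items() if metric in m}
    return max(scored, key=lambda name: scored[name]) if scored else None


result = ToyEvaluationResult(
    metrics={
        "model-a": {"accuracy": 0.72, "avg_latency_s": 1.3},  # illustrative values
        "model-b": {"accuracy": 0.81, "avg_latency_s": 2.0},
    }
)
print(best_model(result, "accuracy"))  # → model-b
```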
Submodules
qianfan.evaluation.consts module
Constants used in evaluation.
qianfan.evaluation.evaluation_manager module
Manager that handles the whole evaluation procedure.
- class qianfan.evaluation.evaluation_manager.EvaluationManager(*, local_evaluators: Optional[List[LocalEvaluator]] = None, qianfan_evaluators: Optional[List[QianfanEvaluator]] = None)
Bases: BaseModel
Logic control center of evaluation, configured with optional lists of local evaluators and Qianfan evaluators.
- eval(llms: List[Union[Model, Service]], dataset: Dataset, **kwargs: Any) → Optional[EvaluationResult]
Evaluate the performance of models on the dataset.
- Args:
- llms (List[Union[Model, Service]]):
List of models or services to be evaluated.
- dataset (Dataset):
The dataset on which models will be evaluated.
- **kwargs (Any):
Other keyword arguments.
- Returns:
Optional[EvaluationResult]: Evaluation result of models on the dataset.
- local_evaluators: Optional[List[LocalEvaluator]]
- qianfan_evaluators: Optional[List[QianfanEvaluator]]
qianfan.evaluation.evaluation_result module
The result of an evaluation.
qianfan.evaluation.evaluator module
Collection of evaluators.
- class qianfan.evaluation.evaluator.Evaluator
Bases: BaseModel, ABC
Abstract class for evaluating a single entry.
- class qianfan.evaluation.evaluator.LocalEvaluator
Bases: Evaluator, ABC
Base class for evaluators that run locally.
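The hierarchy above (an abstract per-entry Evaluator, refined by an abstract LocalEvaluator) suggests that a custom local metric is written by subclassing and implementing a scoring hook. This page does not document that hook, so the method name evaluate and its (prompt, reference, output) parameters below are hypothetical, and plain ABC classes stand in for the pydantic-based originals.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyEvaluator(ABC):
    """Stand-in for Evaluator: evaluates a single entry."""


class ToyLocalEvaluator(ToyEvaluator, ABC):
    """Stand-in for LocalEvaluator: scoring runs in the local process."""

    @abstractmethod
    def evaluate(self, prompt: str, reference: str, output: str) -> Dict[str, Any]:
        """Hypothetical hook: return metric name -> value for one entry."""


class LengthRatioEvaluator(ToyLocalEvaluator):
    """Example subclass: output length relative to the reference length."""

    def evaluate(self, prompt: str, reference: str, output: str) -> Dict[str, Any]:
        ratio = len(output) / len(reference) if reference else 0.0
        return {"length_ratio": ratio}


evaluator = LengthRatioEvaluator()
print(evaluator.evaluate("Name a color.", "blue", "light blue"))  # → {'length_ratio': 2.5}
```

Because the hook is abstract, instantiating ToyLocalEvaluator directly raises TypeError; only concrete subclasses can be created.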
- class qianfan.evaluation.evaluator.ManualEvaluatorDimension(*, dimension: str, description: Optional[str] = None)
Bases: BaseModel
Dimension used in manual evaluation mode.
- description: Optional[str]
- dimension: str
- class qianfan.evaluation.evaluator.OpenCompassLocalEvaluator
Bases: LocalEvaluator
- class qianfan.evaluation.evaluator.QianfanEvaluator
Bases: Evaluator
Base class for Qianfan evaluators, with an empty implementation.
- class qianfan.evaluation.evaluator.QianfanManualEvaluator(*, evaluation_dimensions: List[ManualEvaluatorDimension] = [ManualEvaluatorDimension(dimension='满意度', description=None)])
Bases: QianfanEvaluator
Configuration class for the Qianfan manual evaluator. By default it evaluates a single dimension, '满意度' (satisfaction).
- evaluation_dimensions: List[ManualEvaluatorDimension]
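Going by the signatures, configuring manual evaluation amounts to passing a list of ManualEvaluatorDimension objects, for example QianfanManualEvaluator(evaluation_dimensions=[ManualEvaluatorDimension(dimension=..., description=...)]). The sketch below mirrors those two classes with standard-library dataclasses so it runs without qianfan installed. ToyDimension and ToyManualEvaluatorConfig are hypothetical names, and "accuracy" and "fluency" are illustrative dimension names, not documented ones.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyDimension:
    """Mirrors ManualEvaluatorDimension's fields: dimension, description."""

    dimension: str
    description: Optional[str] = None


@dataclass
class ToyManualEvaluatorConfig:
    """Mirrors QianfanManualEvaluator's single field and its documented default."""

    # default_factory gives each instance its own list; a bare list default
    # is rejected by dataclasses because it would be shared across instances.
    evaluation_dimensions: List[ToyDimension] = field(
        default_factory=lambda: [ToyDimension(dimension="满意度")]
    )


default_cfg = ToyManualEvaluatorConfig()
custom_cfg = ToyManualEvaluatorConfig(
    evaluation_dimensions=[
        ToyDimension("accuracy", "Is the answer factually correct?"),
        ToyDimension("fluency"),
    ]
)
print([d.dimension for d in default_cfg.evaluation_dimensions])  # → ['满意度']
print([d.dimension for d in custom_cfg.evaluation_dimensions])  # → ['accuracy', 'fluency']
```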
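- class qianfan.evaluation.evaluator.QianfanRefereeEvaluator(*, app_id: int, prompt_metrics: str = '综合得分', prompt_steps: str = '\n1.仔细阅读所提供的问题,确保你理解问题的要求和背景。\n2.仔细阅读所提供的标准答案,确保你理解问题的标准答案\n3.阅读答案,并检查是否用词不当\n4.检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n', prompt_max_score: int = 5)
Bases: QianfanEvaluator
Configuration class for the Qianfan referee evaluator. The default prompt_metrics, '综合得分', means "overall score". The default prompt_steps translates to: (1) Read the provided question carefully and make sure you understand its requirements and background. (2) Read the provided standard answer carefully and make sure you understand it. (3) Read the answer and check for inappropriate word choice. (4) Check whether the answer strictly follows the question's requirements, including answering style, length, format, and so on.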
- app_id: int
- prompt_max_score: int
- prompt_metrics: str
- prompt_steps: str
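The referee fields appear to parameterize a grading prompt: a metric name, step-by-step grading instructions, and a maximum score. The sketch below shows one way such fields could be combined into a prompt and how a numeric reply could be clamped to the 0 to prompt_max_score range. The template, build_referee_prompt, parse_score, ToyRefereeConfig and the English default strings are all hypothetical; Qianfan's actual referee prompt and score parsing are not documented on this page, and the real evaluator would be constructed per its signature, e.g. QianfanRefereeEvaluator(app_id=..., prompt_max_score=5).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRefereeConfig:
    """Mirrors QianfanRefereeEvaluator's fields (app_id is unused here)."""

    app_id: int
    prompt_metrics: str = "overall score"
    prompt_steps: str = "\n1. Read the question.\n2. Compare the answer with the reference.\n"
    prompt_max_score: int = 5


def build_referee_prompt(cfg: ToyRefereeConfig, question: str, reference: str, answer: str) -> str:
    """Hypothetical template assembling the configured fields into one prompt."""
    return (
        f"Rate the answer on '{cfg.prompt_metrics}' from 0 to {cfg.prompt_max_score}.\n"
        f"Steps:{cfg.prompt_steps}"
        f"Question: {question}\nReference: {reference}\nAnswer: {answer}\n"
        "Reply with the score only."
    )


def parse_score(reply: str, max_score: int) -> Optional[float]:
    """Take the first number in the reply and clamp it to [0, max_score]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    return min(max(float(match.group()), 0.0), float(max_score))


cfg = ToyRefereeConfig(app_id=0)  # placeholder app_id for illustration
print(build_referee_prompt(cfg, "2+2?", "4", "4"))
print(parse_score("Score: 4", cfg.prompt_max_score))  # → 4.0
print(parse_score("7/5", cfg.prompt_max_score))  # → 5.0
```

Clamping keeps an out-of-range reply from a judge model inside the configured scale; returning None for a reply without any number lets the caller decide whether to retry or skip the entry.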