qianfan.evaluation package
Library that helps developers evaluate their models on Qianfan.
- class qianfan.evaluation.EvaluationManager(*, local_evaluators: Optional[List[LocalEvaluator]] = None, qianfan_evaluators: Optional[List[QianfanEvaluator]] = None, task_id: Optional[str] = None)
Bases: BaseModel
Logic control center of evaluation.
- eval(llms: Sequence[Union[Model, Service]], dataset: Dataset, **kwargs: Any) -> Optional[EvaluationResult]
Evaluate the performance of models on the dataset.
- Args:
- llms (Sequence[Union[Model, Service]]):
Sequence of models or services to be evaluated.
- dataset (Dataset):
The dataset on which models will be evaluated.
- **kwargs (Any):
Other keyword arguments.
- Returns:
Optional[EvaluationResult]: Evaluation result of models on the dataset.
- local_evaluators: Optional[List[LocalEvaluator]]
- qianfan_evaluators: Optional[List[QianfanEvaluator]]
- task_id: Optional[str]
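The constructor and eval signatures above suggest a usage pattern along the following lines. This is a minimal sketch, not an official recipe: the helper name run_referee_evaluation is hypothetical, the caller is assumed to supply already-constructed Model or Service objects and a Dataset, and valid Qianfan credentials plus a real app_id are assumed to be configured. Only calls whose signatures appear in this reference are used.

```python
from typing import Any, Optional, Sequence


def run_referee_evaluation(
    llms: Sequence[Any], dataset: Any, app_id: int
) -> Optional[Any]:
    """Hypothetical helper: evaluate `llms` on `dataset` with a referee evaluator.

    `llms` is assumed to hold qianfan Model or Service objects and `dataset`
    a qianfan Dataset; both are built by the caller.
    """
    # Imported lazily so this module can be loaded where qianfan is absent.
    from qianfan.evaluation import EvaluationManager
    from qianfan.evaluation.evaluator import QianfanRefereeEvaluator

    # Keep the default prompt_metrics and prompt_steps; only set the score cap.
    referee = QianfanRefereeEvaluator(app_id=app_id, prompt_max_score=5)
    manager = EvaluationManager(qianfan_evaluators=[referee])

    # eval returns Optional[EvaluationResult], so callers should handle None.
    return manager.eval(llms, dataset)
```

Because eval is typed as returning Optional[EvaluationResult], code that consumes the result should check for None before reading it.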
- class qianfan.evaluation.EvaluationResult(result_dataset: Optional[Dataset] = None, metrics: Optional[Dict[str, Dict[str, Any]]] = None)
Bases: object
Evaluation result.
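The metrics argument is typed as Optional[Dict[str, Dict[str, Any]]], a two-level mapping. The sketch below flattens such a mapping into rows for printing or logging. It is illustrative only: the function flatten_metrics is hypothetical, and the reading of the outer key as a model or service identifier and the inner key as a metric name is an assumption that this reference does not confirm.

```python
from typing import Any, Dict, List, Optional, Tuple


def flatten_metrics(
    metrics: Optional[Dict[str, Dict[str, Any]]],
) -> List[Tuple[str, str, Any]]:
    """Turn a nested metrics mapping into sorted (outer, inner, value) rows."""
    if not metrics:
        # metrics is Optional, so treat None or an empty dict as "no rows".
        return []
    rows: List[Tuple[str, str, Any]] = []
    for outer_key, inner in sorted(metrics.items()):
        for metric_name, value in sorted(inner.items()):
            rows.append((outer_key, metric_name, value))
    return rows


# Made-up sample data; the keys are assumptions, not real qianfan output.
sample = {
    "model_a": {"exact_match": 0.8},
    "model_b": {"exact_match": 0.6, "avg_length": 42},
}
rows = flatten_metrics(sample)
for outer_key, metric_name, value in rows:
    print(f"{outer_key}\t{metric_name}\t{value}")
```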
Submodules
qianfan.evaluation.consts module
Constants used by evaluation.
qianfan.evaluation.evaluation_manager module
Manager that controls the whole evaluation procedure.
- class qianfan.evaluation.evaluation_manager.EvaluationManager(*, local_evaluators: Optional[List[LocalEvaluator]] = None, qianfan_evaluators: Optional[List[QianfanEvaluator]] = None, task_id: Optional[str] = None)
Bases: BaseModel
Logic control center of evaluation.
- eval(llms: Sequence[Union[Model, Service]], dataset: Dataset, **kwargs: Any) -> Optional[EvaluationResult]
Evaluate the performance of models on the dataset.
- Args:
- llms (Sequence[Union[Model, Service]]):
Sequence of models or services to be evaluated.
- dataset (Dataset):
The dataset on which models will be evaluated.
- **kwargs (Any):
Other keyword arguments.
- Returns:
Optional[EvaluationResult]: Evaluation result of models on the dataset.
- local_evaluators: Optional[List[LocalEvaluator]]
- qianfan_evaluators: Optional[List[QianfanEvaluator]]
- task_id: Optional[str]
qianfan.evaluation.evaluation_result module
The result of an evaluation.
qianfan.evaluation.evaluator module
Collection of evaluators.
- class qianfan.evaluation.evaluator.Evaluator
Bases: BaseModel, ABC
A class for evaluating a single entry.
- class qianfan.evaluation.evaluator.LocalEvaluator
Bases: Evaluator, ABC
Base class for evaluators that run locally.
Users who want to implement their own LocalEvaluator should override the evaluate method, where input is the input string or chat history, reference is the standard answer for that input, and output is the LLM output string.
The return value should be a Dict mapping metric names to metric values for a single LLM output.
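The evaluate contract described above can be sketched without the library. In this minimal example, LocalEvaluatorSketch is a stand-in abstract class that mirrors the described interface and is not the real qianfan class. The ExactMatchEvaluator name, the exact_match metric key, and the List[Dict[str, Any]] type for chat history are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Union


class LocalEvaluatorSketch(ABC):
    """Stand-in mirroring the described interface; not qianfan's class."""

    @abstractmethod
    def evaluate(
        self,
        input: Union[str, List[Dict[str, Any]]],
        reference: str,
        output: str,
    ) -> Dict[str, Any]:
        """Return metric names mapped to values for one LLM output."""


class ExactMatchEvaluator(LocalEvaluatorSketch):
    """Hypothetical evaluator: 1.0 when the output equals the reference."""

    def evaluate(
        self,
        input: Union[str, List[Dict[str, Any]]],
        reference: str,
        output: str,
    ) -> Dict[str, Any]:
        # Ignore surrounding whitespace and letter case when comparing.
        matched = output.strip().lower() == reference.strip().lower()
        return {"exact_match": 1.0 if matched else 0.0}


evaluator = ExactMatchEvaluator()
metrics = evaluator.evaluate(input="What is 2 + 2?", reference="4", output=" 4 ")
print(metrics)
```

With the real library, the same evaluate body would presumably go in a subclass of qianfan.evaluation.evaluator.LocalEvaluator, and an instance would be passed through the local_evaluators argument of EvaluationManager.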
- class qianfan.evaluation.evaluator.ManualEvaluatorDimension(*, dimension: str, description: Optional[str] = None)
Bases: BaseModel
Dimension used for manual mode.
- description: Optional[str]
- dimension: str
- class qianfan.evaluation.evaluator.QianfanEvaluator
Bases: Evaluator
Empty-implementation base class for Qianfan evaluators.
- class qianfan.evaluation.evaluator.QianfanManualEvaluator(*, evaluation_dimensions: List[ManualEvaluatorDimension] = [ManualEvaluatorDimension(dimension='满意度', description=None)])
Bases: QianfanEvaluator
Qianfan manual evaluator config class.
The default dimension '满意度' means "satisfaction".
- evaluation_dimensions: List[ManualEvaluatorDimension]
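- class qianfan.evaluation.evaluator.QianfanRefereeEvaluator(*, app_id: int, prompt_metrics: str = '综合得分', prompt_steps: str = '\n1.仔细阅读所提供的问题,确保你理解问题的要求和背景。\n2.仔细阅读所提供的标准答案,确保你理解问题的标准答案\n3.阅读答案,并检查是否用词不当\n4.检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n', prompt_max_score: int = 5)
Bases: QianfanEvaluator
Qianfan referee evaluator config class.
The default prompt_metrics '综合得分' means "overall score". The default prompt_steps translate to: 1. Read the provided question carefully and make sure you understand its requirements and background. 2. Read the provided standard answer carefully and make sure you understand it. 3. Read the answer and check for inappropriate wording. 4. Check whether the answer strictly follows the requirements of the question, including answer method, length, format, and so on.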
- app_id: int
- prompt_max_score: int
- prompt_metrics: str
- prompt_steps: str
qianfan.evaluation.opencompass_evaluator module
OpenCompass evaluator.
- class qianfan.evaluation.opencompass_evaluator.OpenCompassLocalEvaluator
Bases:
LocalEvaluator