We use a custom metric reflecting the needs of our use case: the curve of automation rate against accuracy on the automatically coded samples.
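As a sketch of how such a curve can be computed (function and variable names are illustrative, not the project's code): for each confidence threshold, we measure the share of samples coded automatically and the accuracy on that subset.

```python
import numpy as np


def automation_accuracy_curve(confidence, y_pred, y_true, thresholds):
    """For each confidence threshold, return (automation rate, accuracy on
    the automatically coded subset). Illustrative sketch only."""
    points = []
    for t in thresholds:
        auto = confidence >= t  # samples confident enough to auto-code
        rate = auto.mean()      # share of samples coded automatically
        acc = (y_pred[auto] == y_true[auto]).mean() if auto.any() else float("nan")
        points.append((rate, acc))
    return points


# Example: sweep 101 thresholds between 0 and 1
# curve = automation_accuracy_curve(conf, preds, labels, np.linspace(0, 1, 101))
```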
## API serving
The text classification model is served through a containerized REST API:

- Simplicity for end users
- Standard query format
- Scalable
- Modular and portable
- Simple design thanks to the MLflow wrapper (see the sketch after this list)
- Continuous deployment with Argo CD
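The MLflow wrapper mentioned above follows the pyfunc pattern; here is a minimal sketch, where the wrapper class, artifact key, and loading logic are illustrative assumptions rather than the project's actual code:

```python
import pickle

import mlflow


class TextClassifierWrapper(mlflow.pyfunc.PythonModel):
    """Illustrative pyfunc wrapper: the serving layer only ever calls
    MLflow's generic predict() interface, which keeps the API code simple."""

    def load_context(self, context):
        # Rebuild the trained classifier from the artifacts logged with
        # the MLflow run (the "model" artifact key is an assumption).
        with open(context.artifacts["model"], "rb") as f:
            self.model = pickle.load(f)

    def predict(self, context, model_input, params=None):
        # model_input: activity descriptions to code; params: optional
        # inference settings forwarded by the API (e.g. top-k, prob_min).
        # Forwarding params as a keyword is an assumption about the
        # underlying classifier's signature.
        return self.model.predict(model_input, params=params)
```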
```python
@router.post("/", response_model=List[PredictionResponse])
async def predict(
    credentials: Annotated[HTTPBasicCredentials, Depends(get_credentials)],
    request: Request,
    forms: BatchForms,
    ...
    num_workers: int = 0,
    batch_size: int = 1,
):
    """
    Endpoint for predicting batches of data.

    Args:
        credentials (HTTPBasicCredentials): The credentials for authentication.
        forms (BatchForms): The input data as a BatchForms object.
        num_workers (int, optional): Number of CPUs for multiprocessing in the
            DataLoader. Defaults to 0.
        batch_size (int, optional): Size of a batch for batch prediction.
            Defaults to 1.

    For single predictions, we recommend keeping num_workers and batch_size
    at 1 for better performance. For batched predictions, consider increasing
    these two parameters (num_workers can range from 4 to 12, batch_size can
    be increased up to 256) to optimize performance.

    Returns:
        list: The list of predicted responses.
    """
    input_data = forms.forms
    ...
    output = request.app.state.model.predict(input_data, params=params_dict)
    return [out.model_dump() for out in output]
```
## API serving

Example of a client-side query in JavaScript:
```javascript
async function transformToPost(description, top_k) {
  // Base URL with query parameters
  const baseUrl = `https://codification-ape2025-pytorch.lab.sspcloud.fr/predict/?nb_echos_max=${top_k}&prob_min=0.01&num_workers=0&batch_size=1`;

  // Build the request body according to the expected schema
  const body = {
    forms: [{ description_activity: description }]
  };

  // Send the POST request
  const response = await fetch(baseUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });

  // Parse and return the JSON response
  return response.json();
}
```
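For completeness, here is a Python sketch of the same query, mirroring the JavaScript helper above; the function name is ours, and the HTTP Basic credentials expected by the endpoint are left out:

```python
import requests

API_URL = "https://codification-ape2025-pytorch.lab.sspcloud.fr/predict/"


def transform_to_post(description: str, top_k: int) -> list:
    """Query the codification API for the top_k candidate codes.

    Illustrative sketch: pass auth=(user, password) to requests.post
    if HTTP Basic credentials are required.
    """
    params = {
        "nb_echos_max": top_k,
        "prob_min": 0.01,
        "num_workers": 0,
        "batch_size": 1,
    }
    body = {"forms": [{"description_activity": description}]}
    response = requests.post(API_URL, params=params, json=body)
    response.raise_for_status()
    return response.json()
```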