Application 1

This application illustrates some of the benefits of NLP tools for automatically coding activity declarations into the French classification of activities (NAF, nomenclature d'activités française). You can write the code in a notebook within the following SSP Cloud environment:

Onyxia

Exploring the dataset

This tutorial illustrates automatic classification with the supervised learning algorithm fastText, using data from Sirene declarations.

The code to read the data is provided:

import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud

DATA_PATH = "https://minio.lab.sspcloud.fr/projet-formation/diffusion/mlops/data/firm_activity_data.parquet"
NAF_PATH = "https://www.insee.fr/fr/statistiques/fichier/2120875/naf2008_liste_n5.xls"
naf = pd.read_excel(NAF_PATH, skiprows = 2)
naf['Code'] = naf['Code'].str.replace(".", "", regex=False)  # treat "." literally, not as a regex wildcard
train = pd.read_parquet(DATA_PATH)
train = train.merge(naf, left_on = "nace", right_on = "Code")
train.head(5)
   nace   text                                               Code   Libellé
0  8220Z  MISSIONS PONCTUELLES A L AIDE D UNE PLATEFORME     8220Z  Activités de centres d'appels
1  8553Z  INSPECTEUR AUTOMOBILE                              8553Z  Enseignement de la conduite
2  5520Z  LA LOCATION TOURISTIQUE DE LOGEMENTS INSOLITES...  5520Z  Hébergement touristique et autre hébergement d...
3  4791A  COMMERCE DE TOUT ARTICLES ET PRODUITS MARCHAND...  4791A  Vente à distance sur catalogue général
4  9499Z  REGROUPEMENT RETRAITE                              9499Z  Autres organisations fonctionnant par adhésion...
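The dot is stripped from the NAF codes so that they match the undotted `nace` values before merging. A minimal standard-library sketch of why that dot has to be treated as a literal character rather than a regular-expression wildcard (the sample code `82.20Z` is illustrative):

```python
import re

code = "82.20Z"  # illustrative NAF code in dotted form

# Literal replacement: only the dot is removed.
literal = code.replace(".", "")

# Regex replacement: "." matches any character, so everything is removed.
as_regex = re.sub(".", "", code)

print(literal)         # 8220Z
print(repr(as_regex))  # ''
```

In pandas, `Series.str.replace` takes a `regex` argument; passing `regex=False` makes the intent explicit regardless of the installed version.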

The first exercise illustrates the classic way of getting into a corpus of text data. The approach is not particularly original, but it highlights the challenges of text cleaning.

In an exploratory approach, the simplest starting point is to count words independently of one another (bag-of-words approach). For example, unsurprisingly, there are far more declarations related to bakeries than to data science:

filter_train_data(train, "data science").head(5)
filter_train_data(train, "boulanger").head(5)
Nombre d'occurrences de la séquence 'data science': 54
Nombre d'occurrences de la séquence 'boulanger': 1928
     nace   text                                               Code   Libellé
90   1071C  BOULANGERIE PATISSERIE VIENNOISERIE                1071C  Boulangerie et boulangerie-pâtisserie
107  1071C  BOULANGERIE PATISSERIE FABRICATION                 1071C  Boulangerie et boulangerie-pâtisserie
153  1071C  BOULANGERIE PATISSERIE GLACES CONFISERIES BOIS...  1071C  Boulangerie et boulangerie-pâtisserie
314  1071C  BOULANGERIE PATISSERIE VIENNOISERIE CONFISE...     1071C  Boulangerie et boulangerie-pâtisserie
487  1071C  BOULANGERIE PATISSERIE ACHAT VENTE ET MAINT...     1071C  Boulangerie et boulangerie-pâtisserie
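The `filter_train_data` helper is not shown in this section. A minimal sketch of what it might look like, assuming it performs a case-insensitive substring search on the `text` column and prints the number of matches (the body and the toy DataFrame are assumptions, not the tutorial's actual implementation):

```python
import pandas as pd


def filter_train_data(df: pd.DataFrame, sequence: str) -> pd.DataFrame:
    """Return the rows whose 'text' column contains `sequence` (hypothetical helper)."""
    # regex=False: match the sequence literally; case=False: ignore case
    mask = df["text"].str.contains(sequence, case=False, regex=False, na=False)
    # Message kept in French to mirror the output shown above
    print(f"Nombre d'occurrences de la séquence '{sequence}': {mask.sum()}")
    return df[mask]


# Toy data standing in for the `train` DataFrame
toy = pd.DataFrame({
    "nace": ["1071C", "6201Z", "1071C"],
    "text": ["BOULANGERIE PATISSERIE", "CONSEIL EN DATA SCIENCE", "ARTISAN BOULANGER"],
})
bakeries = filter_train_data(toy, "boulanger")
```

Because the search is a plain substring match, `"boulanger"` also catches `BOULANGERIE`, which is convenient for exploration but would be too coarse for anything else.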

Word clouds are a quick way to visualize the structure of a corpus. Here, the corpus is clearly very noisy because it has not been cleaned yet.
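A word cloud is essentially a drawing of word frequencies. A minimal sketch of that counting step on a few toy declarations, with an optional plotting function that assumes the `wordcloud` and `matplotlib` packages imported earlier are installed (the function names `word_frequencies` and `plot_wordcloud` are illustrative):

```python
from collections import Counter


def word_frequencies(texts):
    """Count whitespace-separated words across all texts (bag of words)."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


def plot_wordcloud(frequencies):
    """Draw a word cloud from a {word: count} mapping."""
    # Imported lazily so the counting step works without plotting libraries
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white").generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


texts = [
    "BOULANGERIE PATISSERIE VIENNOISERIE",
    "BOULANGERIE PATISSERIE FABRICATION",
    "LA LOCATION DE LOGEMENTS",
]
freqs = word_frequencies(texts)
print(freqs.most_common(2))
```

Uninformative words such as `LA` or `DE` are counted alongside meaningful ones, which is the kind of noise that shows up in a word cloud built on the raw corpus.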

To start getting a sense of what distinguishes the categories, we can plot the corpus for a few of them. Can you infer which NAF category each one corresponds to? If so, you are probably using heuristics close to those that our classification algorithm will implement.

Nevertheless, at this stage the data are still very noisy. The classic first step is to remove stop words and, possibly, terms specific to our corpus. For example, with point-of-sale (scanner) data, one would remove the noise, abbreviations, etc. that pollute the corpus.
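A minimal sketch of stop-word removal, assuming a small hand-written list; the `STOP_WORDS` set and the `remove_stop_words` function are illustrative, and a real pipeline would typically rely on a fuller French list plus corpus-specific terms:

```python
# Illustrative stop-word list (assumption): a real list would be much longer
STOP_WORDS = {"a", "l", "d", "de", "la", "le", "les", "et", "une", "un", "en"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text and drop every token found in `stop_words`."""
    tokens = text.lower().split()
    return " ".join(token for token in tokens if token not in stop_words)


print(remove_stop_words("MISSIONS PONCTUELLES A L AIDE D UNE PLATEFORME"))
# missions ponctuelles aide plateforme
```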

First supervised learning algorithm

We have now cleaned our data. This should improve the relevance of our models by increasing the signal-to-noise ratio. We will generalize the text cleaning by applying a few more steps than before. In particular, we will stem the words.

To do so, download the following files:

and place them in the same folder as your Jupyter notebook.

The cleaning code is provided:

from processor import Preprocessor
preprocessor = Preprocessor()

# Preprocess data before training and testing
TEXT_FEATURE = "text"
Y = "nace"

df = preprocessor.clean_text(train, TEXT_FEATURE).drop('text_clean', axis = "columns")
df.head(2)
   nace   text                            Code   Libellé
0  8220Z  mission ponctuel aid plateform  8220Z  Activités de centres d'appels
1  8553Z  inspecteur automobil            8553Z  Enseignement de la conduite
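The `Preprocessor` implementation is not shown here. To give an intuition for stemming, here is a deliberately crude suffix-stripping sketch. It is a toy stand-in under simplifying assumptions (two hand-picked rules, lowercase input with stop words already removed), not the algorithm used by the `processor` module; real pipelines generally rely on an established stemmer such as a French Snowball stemmer.

```python
def crude_stem(word):
    """Toy French stemmer: strip a plural/feminine ending, then un-double."""
    # Rule 1: remove one common ending, keeping at least 3 characters
    for suffix in ("es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Rule 2: drop the last letter of a doubled ending (e.g. "ell" -> "el")
    for ending in ("enn", "onn", "ett", "ell", "eill"):
        if word.endswith(ending):
            word = word[:-1]
            break
    return word


def stem_text(text):
    """Apply crude_stem to every whitespace-separated token."""
    return " ".join(crude_stem(token) for token in text.split())


print(stem_text("missions ponctuelles aide plateforme"))
# mission ponctuel aid plateform
print(stem_text("inspecteur automobile"))
# inspecteur automobil
```

Stemming maps inflected forms such as `missions` and `mission` to the same token, which shrinks the vocabulary the classifier has to learn.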

We will start by training a model whose word embedding has a low dimension. Here are the parameters that will be useful for the next exercise.

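import pathlib
import warnings

params = {
    "dim": 25,
    "label_prefix": "__label__"
}

data_path = pathlib.Path("./data")
data_path.mkdir(parents=True, exist_ok=True)

def write_training_data(df, params, training_data_path=None):
    """Write one fastText-formatted line per row: '<prefix><label> <text>'."""
    warnings.filterwarnings("ignore", "Setuptools is replacing distutils.")
    if training_data_path is None:
        # Assumption: get_root_path() is not defined in this tutorial, so the
        # data_path folder created above is used as the default location.
        training_data_path = data_path / "training_data.txt"
    # Accept both str and Path so that .as_posix() below always works
    training_data_path = pathlib.Path(training_data_path)

    with open(training_data_path, "w", encoding="utf-8") as file:
        for _, item in df.iterrows():
            formatted_item = f"{params['label_prefix']}{item[Y]} {item[TEXT_FEATURE]}"
            file.write(f"{formatted_item}\n")
    return training_data_path.as_posix()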
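fastText's supervised mode reads one example per line, with the label marked by a prefix and followed by the text. A minimal sketch of that format and of a possible training call, assuming the `fasttext` Python package is installed; the helper names `format_fasttext_line` and `train_model` are illustrative, and note that `train_supervised` names the prefix argument `label`, not `label_prefix`:

```python
def format_fasttext_line(label, text, label_prefix="__label__"):
    """Build one training line in the '<prefix><label> <text>' format."""
    return f"{label_prefix}{label} {text}"


def train_model(training_data_path, params):
    """Train a supervised fastText classifier from a formatted text file."""
    # Imported lazily so the formatting helper works without fasttext installed
    import fasttext

    return fasttext.train_supervised(
        input=training_data_path,
        dim=params["dim"],             # size of the word embeddings
        label=params["label_prefix"],  # prefix that marks labels in the file
    )


print(format_fasttext_line("8553Z", "inspecteur automobil"))
# __label__8553Z inspecteur automobil
```

Training could then look like `model = train_model(write_training_data(df, params), params)`, with all other hyperparameters left at the library defaults.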
Read 4M words
Number of words:  26820
Number of labels: 699
Progress:   0.3% words/sec/thread:  222222 lr:  0.099729 avg.loss:  8.334959 ETA:   0h 0m37s
...
Progress:  93.3% words/sec/thread:  221388 lr:  0.006690 avg.loss:  0.921057 ETA:   
0h 0m 2sProgress:  93.6% words/sec/thread:  221390 lr:  0.006421 avg.loss:  0.920510 ETA:   0h 0m 2sProgress:  93.8% words/sec/thread:  221380 lr:  0.006156 avg.loss:  0.920028 ETA:   0h 0m 2sProgress:  94.1% words/sec/thread:  221385 lr:  0.005885 avg.loss:  0.919332 ETA:   0h 0m 2sProgress:  94.4% words/sec/thread:  221387 lr:  0.005615 avg.loss:  0.918791 ETA:   0h 0m 2sProgress:  94.7% words/sec/thread:  221385 lr:  0.005347 avg.loss:  0.918037 ETA:   0h 0m 1sProgress:  94.9% words/sec/thread:  221387 lr:  0.005077 avg.loss:  0.917514 ETA:   0h 0m 1sProgress:  95.2% words/sec/thread:  221392 lr:  0.004807 avg.loss:  0.917060 ETA:   0h 0m 1sProgress:  95.5% words/sec/thread:  221397 lr:  0.004536 avg.loss:  0.916749 ETA:   0h 0m 1sProgress:  95.7% words/sec/thread:  221402 lr:  0.004265 avg.loss:  0.916309 ETA:   0h 0m 1sProgress:  96.0% words/sec/thread:  221409 lr:  0.003993 avg.loss:  0.915791 ETA:   0h 0m 1sProgress:  96.3% words/sec/thread:  221414 lr:  0.003722 avg.loss:  0.915394 ETA:   0h 0m 1sProgress:  96.5% words/sec/thread:  221414 lr:  0.003453 avg.loss:  0.914936 ETA:   0h 0m 1sProgress:  96.8% words/sec/thread:  221416 lr:  0.003183 avg.loss:  0.914423 ETA:   0h 0m 1sProgress:  97.1% words/sec/thread:  221419 lr:  0.002913 avg.loss:  0.913833 ETA:   0h 0m 1sProgress:  97.4% words/sec/thread:  221427 lr:  0.002641 avg.loss:  0.913373 ETA:   0h 0m 0sProgress:  97.6% words/sec/thread:  221432 lr:  0.002370 avg.loss:  0.912932 ETA:   0h 0m 0sProgress:  97.9% words/sec/thread:  221440 lr:  0.002097 avg.loss:  0.912378 ETA:   0h 0m 0sProgress:  98.2% words/sec/thread:  221444 lr:  0.001827 avg.loss:  0.911868 ETA:   0h 0m 0sProgress:  98.4% words/sec/thread:  221441 lr:  0.001559 avg.loss:  0.911224 ETA:   0h 0m 0sProgress:  98.7% words/sec/thread:  221442 lr:  0.001290 avg.loss:  0.910578 ETA:   0h 0m 0sProgress:  99.0% words/sec/thread:  221444 lr:  0.001020 avg.loss:  0.910185 ETA:   0h 0m 0sProgress:  99.3% words/sec/thread:  221451 lr:  0.000748 
avg.loss:  0.909581 ETA:   0h 0m 0sProgress:  99.5% words/sec/thread:  221438 lr:  0.000485 avg.loss:  0.909115 ETA:   0h 0m 0sProgress:  99.8% words/sec/thread:  221440 lr:  0.000215 avg.loss:  0.908633 ETA:   0h 0m 0sProgress: 100.0% words/sec/thread:  221321 lr: -0.000000 avg.loss:  0.908223 ETA:   0h 0m 0sProgress: 100.0% words/sec/thread:  221320 lr:  0.000000 avg.loss:  0.908223 ETA:   0h 0m 0s
Label IC Code Libellé
0 4799A 0.809980 4799A Vente à domicile
1 4791B 0.098938 4791B Vente à distance sur catalogue spécialisé
2 4789Z 0.052251 4789Z Autres commerces de détail sur éventaires et m...
3 1071C 0.992017 1071C Boulangerie et boulangerie-pâtisserie
4 1071B 0.004299 1071B Cuisson de produits de boulangerie
5 9001Z 0.002068 9001Z Arts du spectacle vivant
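The table above appears to list the three most likely codes for each of two descriptions, with `IC` read here as a confidence score. As a minimal sketch, assuming the predictions come from fastText's `predict` called with `k=3` (which returns labels carrying the default `__label__` prefix alongside their probabilities), the raw output can be flattened into one row per candidate. The helper name `flatten_predictions` and the `text_id` and `rank` fields are illustrative and do not come from the original notebook. The sample values are copied from the table.

```python
def flatten_predictions(labels, probs, prefix="__label__"):
    """Turn fastText-style top-k output into one row per (text, candidate).

    `labels` is a list (one entry per text) of lists of label strings,
    `probs` the matching list of probability sequences.
    """
    rows = []
    for text_id, (text_labels, text_probs) in enumerate(zip(labels, probs)):
        for rank, (label, prob) in enumerate(zip(text_labels, text_probs), start=1):
            # Strip the fastText label prefix to recover the bare NAF code.
            code = label[len(prefix):] if label.startswith(prefix) else label
            rows.append(
                {"text_id": text_id, "rank": rank, "Label": code, "IC": float(prob)}
            )
    return rows


# Values copied from the table above, arranged the way `predict` would return them
# (assumption: two input texts, three candidates each).
labels = [
    ["__label__4799A", "__label__4791B", "__label__4789Z"],
    ["__label__1071C", "__label__1071B", "__label__9001Z"],
]
probs = [
    [0.809980, 0.098938, 0.052251],
    [0.992017, 0.004299, 0.002068],
]
rows = flatten_predictions(labels, probs)
```

The resulting rows can be passed to `pd.DataFrame(rows)` and merged with the `naf` table on `Label` and `Code` to attach the `Libellé` column.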
correct_predictions total recall
nace
4719B 17 204 0.083333
4642Z 39 270 0.144444
6832A 67 241 0.278008
4778C 130 429 0.303030
4782Z 135 309 0.436893
... ... ... ...
8623Z 394 402 0.980100
3511Z 2331 2375 0.981474
5320Z 6711 6806 0.986042
5520Z 12616 12792 0.986241
8690D 1589 1610 0.986957

127 rows × 3 columns
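The recall column is the share of observations of a given class that the model labels correctly, that is `correct_predictions / total`. For example, the first row gives 17 / 204 ≈ 0.083. The sketch below recomputes such a table from two lists of labels using only the standard library. The function name `recall_by_class` and the toy labels are illustrative, and the original notebook most likely relies on a pandas `groupby` instead.

```python
from collections import Counter


def recall_by_class(y_true, y_pred):
    """Per-class recall: correct predictions divided by true occurrences."""
    total = Counter(y_true)
    # Count only the observations whose predicted code equals the true code.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    table = [
        {
            "nace": code,
            "correct_predictions": correct[code],
            "total": n,
            "recall": correct[code] / n,
        }
        for code, n in total.items()
    ]
    # Sort from worst to best recall, like the table above.
    return sorted(table, key=lambda row: row["recall"])


# Toy labels, for illustration only.
y_true = ["1071C", "1071C", "1071C", "4719B", "4719B", "8623Z"]
y_pred = ["1071C", "1071C", "1071B", "4778C", "4719B", "8623Z"]
recall_table = recall_by_class(y_true, y_pred)
```

Sorting by recall puts the classes the model struggles with at the top, which is where a review of the training data would typically start.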

Going further: an introduction to MLOps

This application uses a machine learning (ML) model to predict a firm's activity from descriptive text. ML methods are all but indispensable for processing text, but using ML models to serve real use cases requires following a number of good practices for things to run smoothly, in particular:

  • Clean tracking of experiments
  • Versioning of models, together with the corresponding data and code
  • Efficient serving of the model to users
  • Monitoring of the deployed model's activity
  • Retraining of the model

An introduction to these good practices, commonly referred to as MLOps, is given in this training course (associated repository).
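To make the first two practices in the list more concrete, here is a deliberately small sketch of experiment tracking with data versioning, written with the standard library only. It is not how the training course implements it: the helpers `fingerprint` and `log_run`, the `runs.jsonl` file name and the hyperparameter values are all hypothetical. Only the `avg_loss` value is taken from the training log above.

```python
import hashlib
import json
import os
import tempfile
import time


def fingerprint(payload: bytes) -> str:
    """Short content hash used as a stand-in for a data version."""
    return hashlib.sha256(payload).hexdigest()[:12]


def log_run(log_path, params, metrics, data_bytes):
    """Append one training run (parameters, metrics, data version) to a JSON Lines file."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,
        "metrics": metrics,
        "data_version": fingerprint(data_bytes),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Illustrative run: hyperparameters are made up, avg_loss comes from the log above.
log_path = os.path.join(tempfile.mkdtemp(), "runs.jsonl")
record = log_run(
    log_path,
    params={"epoch": 5, "lr": 0.1},
    metrics={"avg_loss": 0.908223},
    data_bytes=b"toy training data",
)
```

Each line of the file then ties a metric to the exact parameters and data that produced it, which is the minimum needed to compare runs or reproduce one. A dedicated tracking tool would normally replace this hand-rolled file.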