TabNet for fastai

Overview

TabNet for fastai

This is an adaptation of TabNet (Attention-based network for tabular data) for fastai (>=2.0) library. The original paper https://arxiv.org/pdf/1908.07442.pdf.

Install

pip install fast_tabnet

How to use

model = TabNetModel(emb_szs, n_cont, out_sz, embed_p=0., y_range=None, n_d=8, n_a=8, n_steps=3, gamma=1.5, n_independent=2, n_shared=2, epsilon=1e-15, virtual_batch_size=128, momentum=0.02)

Parameters emb_szs, n_cont, out_sz, embed_p, y_range are the same as for fastai TabularModel.

  • n_d : int Dimension of the prediction layer (usually between 4 and 64)
  • n_a : int Dimension of the attention layer (usually between 4 and 64)
  • n_steps: int Number of sucessive steps in the newtork (usually betwenn 3 and 10)
  • gamma : float Float above 1, scaling factor for attention updates (usually betwenn 1.0 to 2.0)
  • momentum : float Float value between 0 and 1 which will be used for momentum in all batch norm
  • n_independent : int Number of independent GLU layer in each GLU block (default 2)
  • n_shared : int Number of independent GLU layer in each GLU block (default 2)
  • epsilon: float Avoid log(0), this should be kept very low

Example

Below is an example from fastai library, but the model in use is TabNet

from fastai.basics import *
from fastai.tabular.all import *
from fast_tabnet.core import *
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df_main,df_test = df.iloc[:-1000].copy(),df.iloc[-1000:].copy()
df_main.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country salary
0 49 Private 101320 Assoc-acdm 12.0 Married-civ-spouse NaN Wife White Female 0 1902 40 United-States >=50k
1 44 Private 236746 Masters 14.0 Divorced Exec-managerial Not-in-family White Male 10520 0 45 United-States >=50k
2 38 Private 96185 HS-grad NaN Divorced NaN Unmarried Black Female 0 0 32 United-States <50k
3 38 Self-emp-inc 112847 Prof-school 15.0 Married-civ-spouse Prof-specialty Husband Asian-Pac-Islander Male 0 0 40 United-States >=50k
4 42 Self-emp-not-inc 82297 7th-8th NaN Married-civ-spouse Other-service Wife Black Female 0 0 50 United-States <50k
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 
             'relationship', 'race', 'native-country', 'sex']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df_main))
to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", 
                   y_block = CategoryBlock(), splits=splits)
dls = to.dataloaders(bs=32)
dls.valid.show_batch()
workclass education marital-status occupation relationship race native-country sex education-num_na age fnlwgt education-num salary
0 Private HS-grad Married-civ-spouse Other-service Wife White United-States Female False 39.000000 196673.000115 9.0 <50k
1 Private HS-grad Married-civ-spouse Craft-repair Husband White United-States Male False 32.000000 198067.999771 9.0 <50k
2 State-gov HS-grad Never-married Adm-clerical Own-child White United-States Female False 18.999999 176633.999931 9.0 <50k
3 Private Some-college Married-civ-spouse Prof-specialty Husband White United-States Male False 67.999999 107626.998490 10.0 <50k
4 Private Masters Never-married Exec-managerial Not-in-family Black United-States Male False 29.000000 214925.000260 14.0 <50k
5 Private HS-grad Married-civ-spouse Priv-house-serv Wife White United-States Female False 22.000000 200109.000126 9.0 <50k
6 Private Some-college Never-married Sales Own-child White United-States Female False 18.000000 60980.998429 10.0 <50k
7 Private Some-college Separated Adm-clerical Not-in-family White United-States Female False 28.000000 334367.998199 10.0 <50k
8 Private 11th Married-civ-spouse Transport-moving Husband White United-States Male False 49.000000 123584.001097 7.0 <50k
9 Private Masters Never-married Prof-specialty Not-in-family White United-States Female False 26.000000 397316.999922 14.0 <50k
to_tst = to.new(df_test)
to_tst.process()
to_tst.all_cols.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
workclass education marital-status occupation relationship race native-country sex education-num_na age fnlwgt education-num salary
31561 5 2 5 9 3 3 40 2 1 -1.505833 -0.559418 -1.202170 0
31562 2 12 5 2 5 3 40 1 1 -1.432653 0.421241 -0.418032 0
31563 5 7 3 4 1 5 40 2 1 -0.115406 0.132868 -1.986307 0
31564 8 12 3 9 1 5 40 2 1 1.494561 0.749805 -0.418032 0
31565 1 12 1 1 5 3 40 2 1 -0.481308 7.529798 -0.418032 0
emb_szs = get_emb_sz(to)

That's the use of the model

model = TabNetModel(emb_szs, len(to.cont_names), dls.c, n_d=8, n_a=8, n_steps=5, mask_type='entmax');
learn = Learner(dls, model, CrossEntropyLossFlat(), opt_func=Adam, lr=3e-2, metrics=[accuracy])
learn.lr_find()
SuggestedLRs(lr_min=0.2754228591918945, lr_steep=1.9054607491852948e-06)

png

learn.fit_one_cycle(5)
epoch train_loss valid_loss accuracy time
0 0.446274 0.414451 0.817015 00:30
1 0.380002 0.393030 0.818916 00:30
2 0.371149 0.359802 0.832066 00:30
3 0.349027 0.352255 0.835868 00:30
4 0.355339 0.349360 0.836819 00:30

Tabnet interpretability

# feature importance for 2k rows
dl = learn.dls.test_dl(df.iloc[:2000], bs=256)
feature_importances = tabnet_feature_importances(learn.model, dl)
# per sample interpretation
dl = learn.dls.test_dl(df.iloc[:20], bs=20)
res_explain, res_masks = tabnet_explain(learn.model, dl)
plt.xticks(rotation='vertical')
plt.bar(dl.x_names, feature_importances, color='g')
plt.show()

png

def plot_explain(masks, lbls, figsize=(12,12)):
    "Plots masks with `lbls` (`dls.x_names`)"
    fig = plt.figure(figsize=figsize)
    ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
    plt.yticks(np.arange(0, len(masks), 1.0))
    plt.xticks(np.arange(0, len(masks[0]), 1.0))
    ax.set_xticklabels(lbls, rotation=90)
    plt.ylabel('Sample Number')
    plt.xlabel('Variable')
    plt.imshow(masks)
plot_explain(res_explain, dl.x_names)

png

Hyperparameter search with Bayesian Optimization

If your dataset isn't huge you can tune hyperparameters for tabular models with Bayesian Optimization. You can optimize directly your metric using this approach if the metric is sensitive enough (in our example it is not and we use validation loss instead). Also, you should create the second validation set, because you will use the first as a training set for Bayesian Optimization.

You may need to install the optimizer pip install bayesian-optimization

from functools import lru_cache
# The function we'll optimize
@lru_cache(1000)
def get_accuracy(n_d:Int, n_a:Int, n_steps:Int):
    model = TabNetModel(emb_szs, len(to.cont_names), dls.c, n_d=n_d, n_a=n_a, n_steps=n_steps, gamma=1.5)
    learn = Learner(dls, model, CrossEntropyLossFlat(), opt_func=opt_func, lr=3e-2, metrics=[accuracy])
    learn.fit_one_cycle(5)
    return float(learn.validate(dl=learn.dls.valid)[1])

This implementation of Bayesian Optimization doesn't work naturally with descreet values. That's why we use wrapper with lru_cache.

def fit_accuracy(pow_n_d, pow_n_a, pow_n_steps):
    n_d, n_a, n_steps = map(lambda x: 2**int(x), (pow_n_d, pow_n_a, pow_n_steps))
    return get_accuracy(n_d, n_a, n_steps)
from bayes_opt import BayesianOptimization

# Bounded region of parameter space
pbounds = {'pow_n_d': (0, 8), 'pow_n_a': (0, 8), 'pow_n_steps': (0, 4)}

optimizer = BayesianOptimization(
    f=fit_accuracy,
    pbounds=pbounds,
)
optimizer.maximize(
    init_points=15,
    n_iter=100,
)
|   iter    |  target   |  pow_n_a  |  pow_n_d  | pow_n_... |
-------------------------------------------------------------
epoch train_loss valid_loss accuracy time
0 0.404888 0.432834 0.793885 00:10
1 0.367979 0.384840 0.818600 00:09
2 0.366444 0.372005 0.819708 00:09
3 0.362771 0.366949 0.823511 00:10
4 0.353682 0.367132 0.823511 00:10
| �[0m 1       �[0m | �[0m 0.8235  �[0m | �[0m 0.9408  �[0m | �[0m 1.898   �[0m | �[0m 1.652   �[0m |
epoch train_loss valid_loss accuracy time
0 0.393301 0.449742 0.810836 00:08
1 0.379140 0.413773 0.815589 00:07
2 0.355790 0.388907 0.822560 00:07
3 0.349984 0.362671 0.828739 00:07
4 0.348000 0.360150 0.827313 00:07
| �[95m 2       �[0m | �[95m 0.8273  �[0m | �[95m 4.262   �[0m | �[95m 5.604   �[0m | �[95m 0.2437  �[0m |
epoch train_loss valid_loss accuracy time
0 0.451572 0.434189 0.781210 00:12
1 0.423763 0.413420 0.805450 00:12
2 0.398922 0.408688 0.814164 00:12
3 0.390981 0.392398 0.808935 00:12
4 0.376418 0.382250 0.817174 00:12
| �[0m 3       �[0m | �[0m 0.8172  �[0m | �[0m 7.233   �[0m | �[0m 6.471   �[0m | �[0m 2.508   �[0m |
epoch train_loss valid_loss accuracy time
0 0.403187 0.413986 0.798162 00:07
1 0.398544 0.390102 0.820184 00:07
2 0.390569 0.389703 0.825253 00:07
3 0.375426 0.385706 0.826996 00:07
4 0.370446 0.383366 0.831115 00:06
| �[95m 4       �[0m | �[95m 0.8311  �[0m | �[95m 5.935   �[0m | �[95m 1.241   �[0m | �[95m 0.3809  �[0m |
epoch train_loss valid_loss accuracy time
0 0.464145 0.458641 0.751267 00:18
1 0.424691 0.436968 0.788023 00:18
2 0.431576 0.436581 0.775824 00:18
3 0.432143 0.437062 0.759506 00:18
4 0.429915 0.438332 0.758555 00:18
| �[0m 5       �[0m | �[0m 0.7586  �[0m | �[0m 2.554   �[0m | �[0m 0.4992  �[0m | �[0m 3.111   �[0m |
epoch train_loss valid_loss accuracy time
0 0.470359 0.475826 0.748891 00:12
1 0.411564 0.409433 0.797053 00:12
2 0.392718 0.397363 0.809727 00:12
3 0.387564 0.380033 0.814322 00:12
4 0.374153 0.378258 0.818916 00:12
| �[0m 6       �[0m | �[0m 0.8189  �[0m | �[0m 4.592   �[0m | �[0m 2.138   �[0m | �[0m 2.824   �[0m |
epoch train_loss valid_loss accuracy time
0 0.547042 0.588752 0.754119 00:18
1 0.491731 0.469795 0.771863 00:18
2 0.454340 0.433961 0.775190 00:18
3 0.424386 0.432385 0.782953 00:18
4 0.397645 0.406420 0.805767 00:19
| �[0m 7       �[0m | �[0m 0.8058  �[0m | �[0m 6.186   �[0m | �[0m 7.016   �[0m | �[0m 3.316   �[0m |
epoch train_loss valid_loss accuracy time
0 0.485245 0.487635 0.751109 00:18
1 0.450832 0.446423 0.750317 00:18
2 0.448203 0.449419 0.755228 00:18
3 0.430258 0.443562 0.744297 00:18
4 0.429821 0.437173 0.761565 00:18
| �[0m 8       �[0m | �[0m 0.7616  �[0m | �[0m 2.018   �[0m | �[0m 1.316   �[0m | �[0m 3.675   �[0m |
epoch train_loss valid_loss accuracy time
0 0.458425 0.455733 0.751584 00:12
1 0.439781 0.467807 0.751109 00:12
2 0.420331 0.432216 0.775190 00:12
3 0.421012 0.421412 0.782319 00:12
4 0.401828 0.413434 0.801014 00:12
| �[0m 9       �[0m | �[0m 0.801   �[0m | �[0m 2.051   �[0m | �[0m 1.958   �[0m | �[0m 2.332   �[0m |
epoch train_loss valid_loss accuracy time
0 0.546997 0.506728 0.761407 00:18
1 0.489712 0.439324 0.799588 00:18
2 0.448558 0.448419 0.786122 00:18
3 0.436869 0.435375 0.801648 00:18
4 0.417128 0.421093 0.798321 00:18
| �[0m 10      �[0m | �[0m 0.7983  �[0m | �[0m 5.203   �[0m | �[0m 7.719   �[0m | �[0m 3.407   �[0m |
epoch train_loss valid_loss accuracy time
0 0.380781 0.463409 0.786439 00:07
1 0.359212 0.461147 0.798321 00:07
2 0.351414 0.368950 0.822719 00:07
3 0.347257 0.367056 0.829373 00:07
4 0.337212 0.362375 0.830799 00:07
| �[0m 11      �[0m | �[0m 0.8308  �[0m | �[0m 6.048   �[0m | �[0m 4.376   �[0m | �[0m 0.08141 �[0m |
epoch train_loss valid_loss accuracy time
0 0.430772 0.430897 0.767744 00:12
1 0.402611 0.432137 0.764259 00:12
2 0.407579 0.409651 0.812104 00:12
3 0.374988 0.391822 0.816698 00:12
4 0.378011 0.389278 0.816065 00:12
| �[0m 12      �[0m | �[0m 0.8161  �[0m | �[0m 7.083   �[0m | �[0m 1.385   �[0m | �[0m 2.806   �[0m |
epoch train_loss valid_loss accuracy time
0 0.402018 0.412051 0.812262 00:09
1 0.372804 0.464937 0.811629 00:09
2 0.368274 0.384675 0.820184 00:09
3 0.364502 0.371920 0.820659 00:09
4 0.348998 0.369445 0.823828 00:09
| �[0m 13      �[0m | �[0m 0.8238  �[0m | �[0m 4.812   �[0m | �[0m 3.785   �[0m | �[0m 1.396   �[0m |
| �[0m 14      �[0m | �[0m 0.8172  �[0m | �[0m 7.672   �[0m | �[0m 6.719   �[0m | �[0m 2.72    �[0m |
epoch train_loss valid_loss accuracy time
0 0.476033 0.442598 0.803549 00:12
1 0.405236 0.414015 0.788973 00:11
2 0.406291 0.451269 0.789449 00:11
3 0.391013 0.393243 0.816065 00:12
4 0.374160 0.377635 0.821451 00:12
| �[0m 15      �[0m | �[0m 0.8215  �[0m | �[0m 6.464   �[0m | �[0m 7.954   �[0m | �[0m 2.647   �[0m |
epoch train_loss valid_loss accuracy time
0 0.390142 0.390678 0.810995 00:06
1 0.381717 0.382202 0.813055 00:06
2 0.368564 0.378705 0.823828 00:06
3 0.358858 0.368329 0.823511 00:07
4 0.353392 0.363913 0.825887 00:06
| �[0m 16      �[0m | �[0m 0.8259  �[0m | �[0m 0.1229  �[0m | �[0m 7.83    �[0m | �[0m 0.3708  �[0m |
epoch train_loss valid_loss accuracy time
0 0.381215 0.422651 0.800697 00:06
1 0.377345 0.380863 0.815906 00:06
2 0.366631 0.370579 0.822877 00:06
3 0.362745 0.366619 0.823352 00:07
4 0.356861 0.364835 0.825887 00:07
| �[0m 17      �[0m | �[0m 0.8259  �[0m | �[0m 0.03098 �[0m | �[0m 3.326   �[0m | �[0m 0.007025�[0m |
epoch train_loss valid_loss accuracy time
0 0.404604 0.443035 0.824461 00:07
1 0.361872 0.388880 0.823669 00:06
2 0.375164 0.369968 0.825095 00:06
3 0.352091 0.363823 0.827947 00:06
4 0.335458 0.362544 0.829373 00:07
| �[0m 18      �[0m | �[0m 0.8294  �[0m | �[0m 7.81    �[0m | �[0m 7.976   �[0m | �[0m 0.0194  �[0m |
epoch train_loss valid_loss accuracy time
0 0.679292 0.677299 0.248891 00:05
1 0.675403 0.678406 0.248891 00:05
2 0.673259 0.673374 0.248891 00:06
3 0.674996 0.673514 0.248891 00:07
4 0.668813 0.673671 0.248891 00:07
| �[0m 19      �[0m | �[0m 0.2489  �[0m | �[0m 0.4499  �[0m | �[0m 0.138   �[0m | �[0m 0.001101�[0m |
epoch train_loss valid_loss accuracy time
0 0.524201 0.528132 0.729880 00:30
1 0.419377 0.403198 0.812104 00:31
2 0.399398 0.418890 0.812421 00:31
3 0.381651 0.391744 0.819075 00:31
4 0.368742 0.377904 0.822085 00:31
| �[0m 20      �[0m | �[0m 0.8221  �[0m | �[0m 0.0     �[0m | �[0m 6.575   �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.681083 0.682397 0.248891 00:05
1 0.672935 0.679371 0.248891 00:06
2 0.675200 0.673466 0.248891 00:06
3 0.674251 0.673356 0.248891 00:06
4 0.668861 0.673186 0.248891 00:06
| �[0m 21      �[0m | �[0m 0.2489  �[0m | �[0m 8.0     �[0m | �[0m 0.0     �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.407246 0.432203 0.801331 00:10
1 0.385086 0.399513 0.811312 00:11
2 0.377365 0.384121 0.816065 00:12
3 0.366855 0.371010 0.823194 00:12
4 0.361931 0.368933 0.825095 00:12
| �[0m 22      �[0m | �[0m 0.8251  �[0m | �[0m 0.0     �[0m | �[0m 4.502   �[0m | �[0m 2.193   �[0m |
epoch train_loss valid_loss accuracy time
0 0.493623 0.476921 0.766001 00:30
1 0.441126 0.443774 0.776774 00:31
2 0.424523 0.437125 0.783904 00:31
3 0.402457 0.408628 0.795944 00:31
4 0.439420 0.431756 0.788973 00:32
| �[0m 23      �[0m | �[0m 0.789   �[0m | �[0m 8.0     �[0m | �[0m 3.702   �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.515615 0.513919 0.751109 00:31
1 0.462674 0.495322 0.751584 00:31
2 0.465430 0.483685 0.751267 00:31
3 0.481308 0.495375 0.755070 00:31
4 0.481324 0.491275 0.754911 00:32
| �[0m 24      �[0m | �[0m 0.7549  �[0m | �[0m 6.009   �[0m | �[0m 0.0     �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.422837 0.403953 0.819392 00:06
1 0.380753 0.367345 0.826838 00:06
2 0.353045 0.365174 0.830006 00:07
3 0.348628 0.364282 0.826362 00:07
4 0.343561 0.361509 0.829214 00:07
| �[0m 25      �[0m | �[0m 0.8292  �[0m | �[0m 3.522   �[0m | �[0m 8.0     �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.807766 1.307279 0.481622 00:31
1 0.513308 0.499470 0.783587 00:32
2 0.445906 0.492620 0.798004 00:31
3 0.385094 0.399986 0.807509 00:32
4 0.387228 0.384739 0.817015 00:31
| �[0m 26      �[0m | �[0m 0.817   �[0m | �[0m 0.0     �[0m | �[0m 8.0     �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.442076 0.491338 0.755387 00:31
1 0.441078 0.443674 0.760773 00:31
2 0.417575 0.418758 0.792142 00:31
3 0.410825 0.417581 0.788498 00:34
4 0.403407 0.410941 0.798321 00:46
| �[0m 27      �[0m | �[0m 0.7983  �[0m | �[0m 0.0     �[0m | �[0m 0.0     �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.407006 0.419679 0.792142 00:08
1 0.390913 0.392631 0.810520 00:08
2 0.365560 0.394330 0.817491 00:08
3 0.378459 0.387244 0.820659 00:08
4 0.375275 0.385417 0.828897 00:08
| �[0m 28      �[0m | �[0m 0.8289  �[0m | �[0m 3.379   �[0m | �[0m 2.848   �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.430604 0.469592 0.781210 00:45
1 0.423074 0.429704 0.797529 00:45
2 0.400120 0.393398 0.810995 00:45
3 0.382361 0.390651 0.816065 00:46
4 0.389520 0.401878 0.807193 00:46
| �[0m 29      �[0m | �[0m 0.8072  �[0m | �[0m 0.0     �[0m | �[0m 2.588   �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.396348 0.397454 0.806717 00:08
1 0.383342 0.386023 0.819550 00:07
2 0.369493 0.374401 0.820025 00:07
3 0.356015 0.366535 0.826204 00:08
4 0.341073 0.365241 0.826204 00:08
| �[0m 30      �[0m | �[0m 0.8262  �[0m | �[0m 1.217   �[0m | �[0m 5.622   �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.809077 0.480591 0.782795 00:45
1 0.571318 0.497731 0.739068 00:45
2 0.514562 0.461726 0.781527 00:45
3 0.439822 0.451722 0.787231 00:44
4 0.419881 0.422125 0.801648 00:45
| �[0m 31      �[0m | �[0m 0.8016  �[0m | �[0m 8.0     �[0m | �[0m 8.0     �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.410521 0.435045 0.810044 00:08
1 0.363098 0.378001 0.821926 00:08
2 0.359525 0.364477 0.827788 00:08
3 0.354005 0.366507 0.821610 00:08
4 0.347293 0.362657 0.829373 00:08
| �[0m 32      �[0m | �[0m 0.8294  �[0m | �[0m 5.864   �[0m | �[0m 8.0     �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.498376 0.436696 0.794043 00:16
1 0.411699 0.435537 0.801331 00:16
2 0.385327 0.396916 0.820184 00:16
3 0.382020 0.389856 0.813371 00:16
4 0.373869 0.377804 0.820817 00:15
| �[0m 33      �[0m | �[0m 0.8208  �[0m | �[0m 1.776   �[0m | �[0m 8.0     �[0m | �[0m 2.212   �[0m |
epoch train_loss valid_loss accuracy time
0 0.404653 0.440106 0.772180 00:11
1 0.377931 0.393715 0.817332 00:11
2 0.373221 0.379273 0.826838 00:11
3 0.359682 0.362844 0.828422 00:11
4 0.340384 0.363072 0.828897 00:11
| �[0m 34      �[0m | �[0m 0.8289  �[0m | �[0m 5.777   �[0m | �[0m 2.2     �[0m | �[0m 1.31    �[0m |
epoch train_loss valid_loss accuracy time
0 0.520308 0.503207 0.749208 00:45
1 0.472501 0.451469 0.780418 00:45
2 0.454686 0.429175 0.784854 00:45
3 0.400800 0.413727 0.795469 00:44
4 0.405604 0.409770 0.801648 00:45
| �[0m 35      �[0m | �[0m 0.8016  �[0m | �[0m 2.748   �[0m | �[0m 5.915   �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.504537 0.501541 0.750317 00:45
1 0.465937 0.477715 0.773289 00:45
2 0.435364 0.481415 0.766635 00:45
3 0.425434 0.442198 0.772814 00:45
4 0.425779 0.458947 0.771863 00:45
| �[0m 36      �[0m | �[0m 0.7719  �[0m | �[0m 6.251   �[0m | �[0m 2.532   �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.420782 0.420721 0.791350 00:10
1 0.403576 0.408376 0.800222 00:10
2 0.390236 0.393624 0.820342 00:11
3 0.377777 0.389657 0.821610 00:11
4 0.382809 0.386011 0.820976 00:11
| �[0m 37      �[0m | �[0m 0.821   �[0m | �[0m 5.093   �[0m | �[0m 0.172   �[0m | �[0m 1.64    �[0m |
epoch train_loss valid_loss accuracy time
0 0.393575 0.397811 0.812262 00:08
1 0.378272 0.381915 0.815748 00:08
2 0.364799 0.369214 0.824620 00:08
3 0.355757 0.364554 0.826996 00:08
4 0.342090 0.362723 0.824303 00:08
| �[0m 38      �[0m | �[0m 0.8243  �[0m | �[0m 8.0     �[0m | �[0m 5.799   �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.393693 0.396980 0.822085 00:11
1 0.361231 0.393146 0.813847 00:11
2 0.345645 0.379510 0.823986 00:11
3 0.349778 0.367077 0.826679 00:11
4 0.342390 0.362027 0.827788 00:11
| �[0m 39      �[0m | �[0m 0.8278  �[0m | �[0m 1.62    �[0m | �[0m 3.832   �[0m | �[0m 1.151   �[0m |
epoch train_loss valid_loss accuracy time
0 0.832737 0.491002 0.771546 00:43
1 0.627948 0.553552 0.764734 00:43
2 0.498901 0.467162 0.791984 00:46
3 0.431196 0.444576 0.785646 00:43
4 0.399745 0.427060 0.796578 00:42
| �[0m 40      �[0m | �[0m 0.7966  �[0m | �[0m 2.198   �[0m | �[0m 8.0     �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.511301 0.514401 0.751267 00:43
1 0.447332 0.445157 0.751109 00:43
2 0.451125 0.438327 0.750951 00:42
3 0.445883 0.443266 0.751267 00:42
4 0.444816 0.438459 0.764100 00:42
| �[0m 41      �[0m | �[0m 0.7641  �[0m | �[0m 8.0     �[0m | �[0m 1.03    �[0m | �[0m 4.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.408504 0.413275 0.797212 00:15
1 0.392707 0.399085 0.805767 00:15
2 0.379938 0.395550 0.817807 00:15
3 0.375288 0.383186 0.820817 00:15
4 0.360417 0.375098 0.823194 00:16
| �[0m 42      �[0m | �[0m 0.8232  �[0m | �[0m 0.0     �[0m | �[0m 2.504   �[0m | �[0m 2.135   �[0m |
epoch train_loss valid_loss accuracy time
0 0.399371 0.415196 0.801014 00:07
1 0.367804 0.392020 0.810995 00:06
2 0.362288 0.385124 0.820659 00:07
3 0.344728 0.371339 0.823669 00:07
4 0.345769 0.362059 0.829373 00:07
| �[0m 43      �[0m | �[0m 0.8294  �[0m | �[0m 0.0     �[0m | �[0m 5.441   �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.397157 0.431003 0.803866 00:06
1 0.394964 0.396448 0.810361 00:06
2 0.378584 0.387943 0.820659 00:07
3 0.371601 0.386186 0.818283 00:07
4 0.369759 0.384339 0.827630 00:07
| �[0m 44      �[0m | �[0m 0.8276  �[0m | �[0m 4.636   �[0m | �[0m 1.476   �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.408654 0.426806 0.791191 00:12
1 0.394184 0.406586 0.786439 00:12
2 0.369625 0.372680 0.822560 00:12
3 0.349444 0.368142 0.823828 00:12
4 0.351684 0.363406 0.826204 00:12
| �[0m 45      �[0m | �[0m 0.8262  �[0m | �[0m 0.0     �[0m | �[0m 7.071   �[0m | �[0m 2.071   �[0m |
epoch train_loss valid_loss accuracy time
0 0.400293 0.416098 0.811629 00:08
1 0.377387 0.433395 0.807034 00:08
2 0.368131 0.395448 0.796420 00:08
3 0.367750 0.376879 0.817174 00:08
4 0.362124 0.371432 0.821134 00:08
| �[0m 46      �[0m | �[0m 0.8211  �[0m | �[0m 4.26    �[0m | �[0m 6.934   �[0m | �[0m 1.79    �[0m |
epoch train_loss valid_loss accuracy time
0 0.404579 0.437443 0.814797 00:07
1 0.375342 0.380416 0.824937 00:07
2 0.365835 0.377617 0.812738 00:07
3 0.354619 0.364503 0.827471 00:07
4 0.340603 0.363488 0.827947 00:07
| �[0m 47      �[0m | �[0m 0.8279  �[0m | �[0m 6.579   �[0m | �[0m 6.485   �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.384890 0.440342 0.812579 00:08
1 0.371483 0.387200 0.813847 00:09
2 0.365951 0.378071 0.818283 00:09
3 0.362965 0.369994 0.821610 00:09
4 0.356483 0.365151 0.826521 00:09
| �[0m 48      �[0m | �[0m 0.8265  �[0m | �[0m 8.0     �[0m | �[0m 4.293   �[0m | �[0m 1.74    �[0m |
epoch train_loss valid_loss accuracy time
0 0.386308 0.389250 0.815431 00:08
1 0.368402 0.389338 0.814956 00:09
2 0.362211 0.377196 0.824778 00:09
3 0.356135 0.362951 0.829531 00:09
4 0.341577 0.362476 0.830799 00:09
| �[0m 49      �[0m | �[0m 0.8308  �[0m | �[0m 7.909   �[0m | �[0m 7.827   �[0m | �[0m 1.323   �[0m |
epoch train_loss valid_loss accuracy time
0 0.426044 0.422882 0.791984 00:08
1 0.375756 0.381810 0.817491 00:09
2 0.363932 0.375904 0.818916 00:09
3 0.349442 0.365052 0.823986 00:09
4 0.344509 0.363027 0.830323 00:09
| �[0m 50      �[0m | �[0m 0.8303  �[0m | �[0m 4.946   �[0m | �[0m 1.246   �[0m | �[0m 1.589   �[0m |
epoch train_loss valid_loss accuracy time
0 0.388522 0.431909 0.820976 00:07
1 0.372532 0.448644 0.751109 00:07
2 0.358823 0.373322 0.823669 00:07
3 0.352838 0.362424 0.831591 00:07
4 0.352949 0.361356 0.831432 00:07
| �[95m 51      �[0m | �[95m 0.8314  �[0m | �[95m 5.664   �[0m | �[95m 2.626   �[0m | �[95m 0.003048�[0m |
epoch train_loss valid_loss accuracy time
0 0.389195 0.390032 0.817332 00:06
1 0.369993 0.382199 0.819708 00:07
2 0.362801 0.373282 0.826521 00:06
3 0.359760 0.363597 0.824303 00:06
4 0.344525 0.362097 0.828897 00:07
| �[0m 52      �[0m | �[0m 0.8289  �[0m | �[0m 1.287   �[0m | �[0m 3.505   �[0m | �[0m 0.06804 �[0m |
epoch train_loss valid_loss accuracy time
0 0.394940 0.403165 0.814639 00:06
1 0.371518 0.452118 0.806876 00:06
2 0.364734 0.377214 0.824461 00:06
3 0.347968 0.365335 0.823511 00:07
4 0.345476 0.363670 0.827155 00:07
| �[0m 53      �[0m | �[0m 0.8272  �[0m | �[0m 1.606   �[0m | �[0m 7.998   �[0m | �[0m 0.2009  �[0m |
epoch train_loss valid_loss accuracy time
0 0.500322 0.519261 0.757129 00:10
1 0.413270 0.423630 0.801965 00:11
2 0.380234 0.395588 0.813371 00:12
3 0.361677 0.378123 0.817174 00:12
4 0.374629 0.373772 0.820025 00:12
| �[0m 54      �[0m | �[0m 0.82    �[0m | �[0m 4.579   �[0m | �[0m 5.017   �[0m | �[0m 2.928   �[0m |
| �[0m 55      �[0m | �[0m 0.8259  �[0m | �[0m 0.02565 �[0m | �[0m 3.699   �[0m | �[0m 0.9808  �[0m |
epoch train_loss valid_loss accuracy time
0 0.452787 0.443697 0.768695 00:11
1 0.428332 0.415454 0.800697 00:11
2 0.396522 0.402850 0.807668 00:12
3 0.424802 0.414648 0.783587 00:12
4 0.385055 0.392359 0.801489 00:12
| �[0m 56      �[0m | �[0m 0.8015  �[0m | �[0m 1.927   �[0m | �[0m 5.92    �[0m | �[0m 2.53    �[0m |
epoch train_loss valid_loss accuracy time
0 0.435597 0.438222 0.810836 00:19
1 0.399920 0.531189 0.770754 00:19
2 0.403408 0.409382 0.804816 00:18
3 0.363519 0.383823 0.815906 00:19
4 0.360030 0.377621 0.819708 00:19
| �[0m 57      �[0m | �[0m 0.8197  �[0m | �[0m 0.7796  �[0m | �[0m 4.576   �[0m | �[0m 3.952   �[0m |
epoch train_loss valid_loss accuracy time
0 0.388445 0.420243 0.800539 00:07
1 0.372912 0.369659 0.827630 00:07
2 0.354443 0.366757 0.828105 00:07
3 0.352468 0.366038 0.822560 00:07
4 0.347822 0.362001 0.829690 00:07
| �[0m 58      �[0m | �[0m 0.8297  �[0m | �[0m 3.525   �[0m | �[0m 4.198   �[0m | �[0m 0.02314 �[0m |
epoch train_loss valid_loss accuracy time
0 0.444627 0.431739 0.787072 00:12
1 0.392637 0.412985 0.799747 00:12
2 0.369733 0.396133 0.802440 00:12
3 0.365821 0.373095 0.820342 00:12
4 0.370486 0.371560 0.819392 00:12
| �[0m 59      �[0m | �[0m 0.8194  �[0m | �[0m 6.711   �[0m | �[0m 3.848   �[0m | �[0m 2.395   �[0m |
epoch train_loss valid_loss accuracy time
0 0.396831 0.389045 0.809569 00:07
1 0.371171 0.375065 0.818600 00:07
2 0.350309 0.371795 0.824620 00:07
3 0.359700 0.363041 0.828739 00:07
4 0.345735 0.361556 0.830799 00:07
| �[0m 60      �[0m | �[0m 0.8308  �[0m | �[0m 4.914   �[0m | �[0m 7.944   �[0m | �[0m 0.9998  �[0m |
epoch train_loss valid_loss accuracy time
0 0.422853 0.412691 0.804341 00:09
1 0.375209 0.394692 0.817174 00:09
2 0.365574 0.380376 0.820184 00:08
3 0.359143 0.363607 0.831115 00:08
4 0.347991 0.361650 0.827947 00:08
| �[0m 61      �[0m | �[0m 0.8279  �[0m | �[0m 7.962   �[0m | �[0m 6.151   �[0m | �[0m 1.119   �[0m |
epoch train_loss valid_loss accuracy time
0 0.388147 0.405885 0.810678 00:07
1 0.367743 0.391867 0.807826 00:07
2 0.366964 0.362980 0.828739 00:07
3 0.363402 0.363396 0.829531 00:07
4 0.351094 0.362245 0.829214 00:07
| �[0m 62      �[0m | �[0m 0.8292  �[0m | �[0m 2.583   �[0m | �[0m 6.996   �[0m | �[0m 0.008348�[0m |
epoch train_loss valid_loss accuracy time
0 0.432724 0.390516 0.815114 00:11
1 0.407100 0.401564 0.811153 00:11
2 0.359414 0.384463 0.820342 00:11
3 0.358061 0.371844 0.826362 00:12
4 0.345357 0.362986 0.831115 00:12
| �[0m 63      �[0m | �[0m 0.8311  �[0m | �[0m 0.2249  �[0m | �[0m 8.0     �[0m | �[0m 2.823   �[0m |
epoch train_loss valid_loss accuracy time
0 0.418990 0.420463 0.808618 00:07
1 0.389830 0.398110 0.816223 00:08
2 0.382975 0.387620 0.814956 00:06
3 0.384093 0.379607 0.819392 00:06
4 0.358019 0.371140 0.823828 00:08
| �[0m 64      �[0m | �[0m 0.8238  �[0m | �[0m 5.764   �[0m | �[0m 5.509   �[0m | �[0m 1.482   �[0m |
epoch train_loss valid_loss accuracy time
0 0.385147 0.399680 0.806242 00:09
1 0.376032 0.381131 0.822560 00:09
2 0.363870 0.378227 0.822402 00:09
3 0.351089 0.368790 0.826838 00:09
4 0.340404 0.361807 0.829214 00:09
| �[0m 65      �[0m | �[0m 0.8292  �[0m | �[0m 1.048   �[0m | �[0m 2.939   �[0m | �[0m 1.922   �[0m |
epoch train_loss valid_loss accuracy time
0 0.526628 0.507236 0.746673 00:17
1 0.460229 0.455675 0.765684 00:19
2 0.417427 0.421368 0.785963 00:19
3 0.462800 0.458844 0.773923 00:19
4 0.449479 0.456627 0.783587 00:19
| �[0m 66      �[0m | �[0m 0.7836  �[0m | �[0m 3.68    �[0m | �[0m 3.977   �[0m | �[0m 3.919   �[0m |
epoch train_loss valid_loss accuracy time
0 0.621678 0.559080 0.749049 00:10
1 0.457104 0.473610 0.758397 00:11
2 0.416287 0.416622 0.764575 00:13
3 0.388107 0.403844 0.811945 00:13
4 0.384231 0.396397 0.813055 00:13
| �[0m 67      �[0m | �[0m 0.8131  �[0m | �[0m 5.907   �[0m | �[0m 0.9452  �[0m | �[0m 2.168   �[0m |
epoch train_loss valid_loss accuracy time
0 0.410105 0.416171 0.808618 00:11
1 0.381669 0.400109 0.809094 00:13
2 0.377539 0.403879 0.803074 00:13
3 0.374653 0.389122 0.808618 00:13
4 0.366356 0.380526 0.814005 00:13
| �[0m 68      �[0m | �[0m 0.814   �[0m | �[0m 7.981   �[0m | �[0m 2.796   �[0m | �[0m 2.78    �[0m |
epoch train_loss valid_loss accuracy time
0 0.380539 0.420856 0.815589 00:07
1 0.368232 0.424276 0.803866 00:05
2 0.358194 0.378501 0.827155 00:06
3 0.353427 0.362224 0.829690 00:07
4 0.344443 0.361554 0.826838 00:07
| �[0m 69      �[0m | �[0m 0.8268  �[0m | �[0m 7.223   �[0m | �[0m 3.762   �[0m | �[0m 0.6961  �[0m |
epoch train_loss valid_loss accuracy time
0 0.445924 0.488581 0.752693 00:17
1 0.410709 0.400962 0.813688 00:18
2 0.373518 0.393235 0.820184 00:18
3 0.364160 0.378920 0.820817 00:17
4 0.357551 0.371629 0.825412 00:17
| �[0m 70      �[0m | �[0m 0.8254  �[0m | �[0m 0.009375�[0m | �[0m 5.081   �[0m | �[0m 3.79    �[0m |
epoch train_loss valid_loss accuracy time
0 0.410323 0.442670 0.814797 00:08
1 0.379279 0.405500 0.807034 00:08
2 0.387576 0.392448 0.819708 00:09
3 0.371622 0.389167 0.823035 00:09
4 0.374182 0.386964 0.825095 00:09
| �[0m 71      �[0m | �[0m 0.8251  �[0m | �[0m 3.293   �[0m | �[0m 2.76    �[0m | �[0m 1.061   �[0m |
| �[0m 72      �[0m | �[0m 0.8308  �[0m | �[0m 4.589   �[0m | �[0m 7.13    �[0m | �[0m 0.003179�[0m |
epoch train_loss valid_loss accuracy time
0 0.391509 0.399966 0.806400 00:06
1 0.366694 0.405719 0.823828 00:07
2 0.359751 0.375496 0.822877 00:07
3 0.347678 0.361711 0.830799 00:07
4 0.336896 0.361922 0.828580 00:07
| �[0m 73      �[0m | �[0m 0.8286  �[0m | �[0m 7.118   �[0m | �[0m 5.204   �[0m | �[0m 0.5939  �[0m |
epoch train_loss valid_loss accuracy time
0 0.435485 0.414883 0.808143 00:08
1 0.373591 0.417138 0.814005 00:09
2 0.369590 0.375724 0.820184 00:09
3 0.370829 0.368655 0.829531 00:09
4 0.346463 0.366307 0.825412 00:09
| �[0m 74      �[0m | �[0m 0.8254  �[0m | �[0m 6.837   �[0m | �[0m 7.988   �[0m | �[0m 1.055   �[0m |
| �[0m 75      �[0m | �[0m 0.8259  �[0m | �[0m 0.6629  �[0m | �[0m 7.012   �[0m | �[0m 0.03222 �[0m |
| �[0m 76      �[0m | �[0m 0.8311  �[0m | �[0m 5.177   �[0m | �[0m 1.457   �[0m | �[0m 0.5857  �[0m |
epoch train_loss valid_loss accuracy time
0 0.455741 0.530008 0.794994 00:11
1 0.421961 0.423317 0.805292 00:11
2 0.405799 0.405729 0.807351 00:12
3 0.383895 0.395092 0.816857 00:12
4 0.378882 0.386044 0.818758 00:12
| �[0m 77      �[0m | �[0m 0.8188  �[0m | �[0m 3.308   �[0m | �[0m 4.533   �[0m | �[0m 2.048   �[0m |
epoch train_loss valid_loss accuracy time
0 0.412261 0.410950 0.798954 00:08
1 0.382314 0.408471 0.803390 00:09
2 0.347488 0.387500 0.815273 00:09
3 0.343408 0.372050 0.821451 00:09
4 0.344963 0.366158 0.822719 00:09
| �[0m 78      �[0m | �[0m 0.8227  �[0m | �[0m 0.4036  �[0m | �[0m 7.997   �[0m | �[0m 1.439   �[0m |
epoch train_loss valid_loss accuracy time
0 0.412856 0.410016 0.799113 00:08
1 0.410852 0.416405 0.788498 00:08
2 0.373897 0.384385 0.824303 00:09
3 0.353164 0.366129 0.822719 00:09
4 0.353253 0.362269 0.826362 00:09
| �[0m 79      �[0m | �[0m 0.8264  �[0m | �[0m 3.438   �[0m | �[0m 7.982   �[0m | �[0m 1.829   �[0m |
epoch train_loss valid_loss accuracy time
0 0.419316 0.408936 0.798162 00:08
1 0.393826 0.390526 0.820184 00:08
2 0.372879 0.374823 0.822719 00:08
3 0.358019 0.370913 0.820342 00:08
4 0.346020 0.362252 0.829690 00:08
| �[0m 80      �[0m | �[0m 0.8297  �[0m | �[0m 6.88    �[0m | �[0m 2.404   �[0m | �[0m 1.666   �[0m |
epoch train_loss valid_loss accuracy time
0 0.433481 0.437320 0.790082 00:18
1 0.415280 0.402946 0.814164 00:18
2 0.365575 0.376285 0.822877 00:18
3 0.363206 0.371865 0.820501 00:18
4 0.356401 0.370252 0.823828 00:18
| �[0m 81      �[0m | �[0m 0.8238  �[0m | �[0m 0.03221 �[0m | �[0m 1.306   �[0m | �[0m 3.909   �[0m |
epoch train_loss valid_loss accuracy time
0 0.393150 0.420964 0.783745 00:07
1 0.375065 0.371380 0.823986 00:07
2 0.362952 0.387037 0.813688 00:07
3 0.347245 0.370225 0.824937 00:07
4 0.348406 0.361420 0.830640 00:07
| �[0m 82      �[0m | �[0m 0.8306  �[0m | �[0m 1.575   �[0m | �[0m 2.689   �[0m | �[0m 0.8684  �[0m |
epoch train_loss valid_loss accuracy time
0 0.395530 0.397430 0.818600 00:06
1 0.358679 0.396773 0.818283 00:07
2 0.349305 0.372877 0.823828 00:07
3 0.347346 0.363006 0.828422 00:07
4 0.335652 0.362567 0.830957 00:07
| �[0m 83      �[0m | �[0m 0.831   �[0m | �[0m 2.765   �[0m | �[0m 5.439   �[0m | �[0m 0.04047 �[0m |
epoch train_loss valid_loss accuracy time
0 0.380831 0.386072 0.814322 00:05
1 0.369778 0.392521 0.810520 00:07
2 0.368286 0.383131 0.816857 00:07
3 0.356585 0.367839 0.821768 00:07
4 0.344722 0.366639 0.825253 00:07
| �[0m 84      �[0m | �[0m 0.8253  �[0m | �[0m 0.1961  �[0m | �[0m 4.123   �[0m | �[0m 0.02039 �[0m |
epoch train_loss valid_loss accuracy time
0 0.502605 0.463703 0.772180 00:11
1 0.407231 0.404386 0.804499 00:13
2 0.406849 0.411254 0.817491 00:12
3 0.379910 0.389118 0.817174 00:12
4 0.370964 0.379835 0.821293 00:12
| �[0m 85      �[0m | �[0m 0.8213  �[0m | �[0m 7.937   �[0m | �[0m 7.939   �[0m | �[0m 2.895   �[0m |
epoch train_loss valid_loss accuracy time
0 0.402574 0.431237 0.786755 00:11
1 0.380494 0.392253 0.817966 00:11
2 0.363284 0.393907 0.815748 00:11
3 0.355680 0.368488 0.822560 00:12
4 0.362978 0.367755 0.823511 00:12
| �[0m 86      �[0m | �[0m 0.8235  �[0m | �[0m 0.06921 �[0m | �[0m 5.7     �[0m | �[0m 2.778   �[0m |
epoch train_loss valid_loss accuracy time
0 0.448572 0.451255 0.789766 00:10
1 0.417093 0.411838 0.808143 00:11
2 0.400799 0.400185 0.816223 00:12
3 0.378641 0.385082 0.820342 00:12
4 0.365278 0.380320 0.818441 00:11
| �[0m 87      �[0m | �[0m 0.8184  �[0m | �[0m 7.965   �[0m | �[0m 5.261   �[0m | �[0m 2.661   �[0m |
epoch train_loss valid_loss accuracy time
0 0.399453 0.429288 0.783112 00:07
1 0.368987 0.376985 0.825729 00:07
2 0.358226 0.372103 0.830165 00:07
3 0.348940 0.362069 0.831115 00:07
4 0.341437 0.361470 0.830482 00:07
| �[0m 88      �[0m | �[0m 0.8305  �[0m | �[0m 2.792   �[0m | �[0m 7.917   �[0m | �[0m 0.761   �[0m |
| �[0m 89      �[0m | �[0m 0.8294  �[0m | �[0m 7.995   �[0m | �[0m 7.186   �[0m | �[0m 0.1199  �[0m |
epoch train_loss valid_loss accuracy time
0 0.390914 0.403780 0.799905 00:07
1 0.375241 0.400406 0.821610 00:07
2 0.359710 0.373233 0.826046 00:07
3 0.351108 0.367255 0.823986 00:07
4 0.348230 0.362827 0.830482 00:07
| �[0m 90      �[0m | �[0m 0.8305  �[0m | �[0m 2.526   �[0m | �[0m 3.741   �[0m | �[0m 0.1186  �[0m |
| �[0m 91      �[0m | �[0m 0.8294  �[0m | �[0m 0.03285 �[0m | �[0m 5.742   �[0m | �[0m 0.9747  �[0m |
epoch train_loss valid_loss accuracy time
0 0.384485 0.413711 0.816857 00:06
1 0.358383 0.382125 0.813055 00:06
2 0.349632 0.380644 0.825570 00:07
3 0.350005 0.363857 0.833492 00:07
4 0.341342 0.362533 0.829848 00:07
| �[0m 92      �[0m | �[0m 0.8298  �[0m | �[0m 4.112e-0�[0m | �[0m 8.0     �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.461806 0.454314 0.752218 00:11
1 0.429429 0.452721 0.768061 00:13
2 0.408792 0.422315 0.785013 00:12
3 0.400290 0.411134 0.805292 00:12
4 0.396974 0.409337 0.804658 00:12
| �[0m 93      �[0m | �[0m 0.8047  �[0m | �[0m 4.563   �[0m | �[0m 0.6868  �[0m | �[0m 2.461   �[0m |
epoch train_loss valid_loss accuracy time
0 0.464920 0.440214 0.772655 00:11
1 0.401611 0.400240 0.805767 00:11
2 0.391905 0.414512 0.798638 00:13
3 0.391561 0.407271 0.805608 00:13
4 0.387596 0.397586 0.814639 00:13
| �[0m 94      �[0m | �[0m 0.8146  �[0m | �[0m 4.697   �[0m | �[0m 3.412   �[0m | �[0m 2.514   �[0m |
epoch train_loss valid_loss accuracy time
0 0.417917 0.423588 0.755070 00:05
1 0.390158 0.396916 0.817649 00:07
2 0.374562 0.389764 0.828422 00:07
3 0.368317 0.385268 0.822719 00:07
4 0.373367 0.385092 0.827471 00:07
| �[0m 95      �[0m | �[0m 0.8275  �[0m | �[0m 5.04    �[0m | �[0m 0.4492  �[0m | �[0m 0.5899  �[0m |
epoch train_loss valid_loss accuracy time
0 0.368991 0.388251 0.822402 00:07
1 0.363111 0.399670 0.814797 00:08
2 0.370390 0.377726 0.818441 00:10
3 0.353493 0.368603 0.823828 00:10
4 0.355493 0.368281 0.823669 00:09
| �[0m 96      �[0m | �[0m 0.8237  �[0m | �[0m 0.6025  �[0m | �[0m 2.712   �[0m | �[0m 1.166   �[0m |
epoch train_loss valid_loss accuracy time
0 0.426161 0.417567 0.793726 00:07
1 0.397633 0.392852 0.818124 00:08
2 0.371073 0.371678 0.821768 00:09
3 0.351716 0.361607 0.831749 00:10
4 0.347291 0.359881 0.832066 00:09
| �[95m 97      �[0m | �[95m 0.8321  �[0m | �[95m 6.389   �[0m | �[95m 3.648   �[0m | �[95m 1.016   �[0m |
epoch train_loss valid_loss accuracy time
0 0.408937 0.440755 0.785805 00:12
1 0.369800 0.379622 0.819867 00:12
2 0.356048 0.370119 0.827630 00:11
3 0.354271 0.366091 0.826362 00:11
4 0.359324 0.366020 0.827313 00:12
| �[0m 98      �[0m | �[0m 0.8273  �[0m | �[0m 0.5927  �[0m | �[0m 1.715   �[0m | �[0m 2.847   �[0m |
epoch train_loss valid_loss accuracy time
0 0.443831 0.469775 0.753802 00:11
1 0.411636 0.423362 0.787864 00:11
2 0.411669 0.421874 0.797529 00:13
3 0.392242 0.395377 0.816382 00:12
4 0.383947 0.390270 0.818441 00:12
| �[0m 99      �[0m | �[0m 0.8184  �[0m | �[0m 6.602   �[0m | �[0m 2.266   �[0m | �[0m 2.535   �[0m |
epoch train_loss valid_loss accuracy time
0 0.413688 0.469193 0.811153 00:08
1 0.383908 0.401520 0.815589 00:08
2 0.363908 0.375178 0.824937 00:08
3 0.360694 0.366536 0.828739 00:08
4 0.341851 0.362886 0.830165 00:08
| �[0m 100     �[0m | �[0m 0.8302  �[0m | �[0m 3.099   �[0m | �[0m 6.058   �[0m | �[0m 1.058   �[0m |
epoch train_loss valid_loss accuracy time
0 0.404324 0.402310 0.807668 00:09
1 0.383628 0.428126 0.780577 00:09
2 0.360917 0.375198 0.826996 00:09
3 0.349922 0.367114 0.828264 00:09
4 0.338715 0.363762 0.830165 00:09
| �[0m 101     �[0m | �[0m 0.8302  �[0m | �[0m 4.268   �[0m | �[0m 5.28    �[0m | �[0m 1.397   �[0m |
epoch train_loss valid_loss accuracy time
0 0.381317 0.410546 0.815431 00:06
1 0.355632 0.424577 0.802440 00:06
2 0.361193 0.377675 0.818124 00:07
3 0.355751 0.363205 0.827313 00:07
4 0.338802 0.362403 0.827471 00:07
| �[0m 102     �[0m | �[0m 0.8275  �[0m | �[0m 4.735   �[0m | �[0m 3.398   �[0m | �[0m 0.01904 �[0m |
epoch train_loss valid_loss accuracy time
0 0.381150 0.398499 0.813213 00:07
1 0.370997 0.413040 0.798162 00:07
2 0.364154 0.369066 0.823352 00:07
3 0.354434 0.362925 0.825095 00:07
4 0.348736 0.362125 0.824461 00:07
| �[0m 103     �[0m | �[0m 0.8245  �[0m | �[0m 0.0     �[0m | �[0m 6.371   �[0m | �[0m 9.721e-0�[0m |
epoch train_loss valid_loss accuracy time
0 0.388378 0.412839 0.810361 00:06
1 0.377533 0.380432 0.822560 00:06
2 0.356242 0.372250 0.824937 00:07
3 0.346274 0.364236 0.830323 00:07
4 0.349273 0.362597 0.830482 00:07
| �[0m 104     �[0m | �[0m 0.8305  �[0m | �[0m 1.683   �[0m | �[0m 6.381   �[0m | �[0m 0.8564  �[0m |
epoch train_loss valid_loss accuracy time
0 0.410286 0.413506 0.801648 00:07
1 0.386787 0.397328 0.812262 00:07
2 0.365268 0.390419 0.825570 00:07
3 0.368566 0.386955 0.828264 00:07
4 0.362751 0.383124 0.830482 00:07
| �[0m 105     �[0m | �[0m 0.8305  �[0m | �[0m 6.387   �[0m | �[0m 2.333   �[0m | �[0m 0.4877  �[0m |
epoch train_loss valid_loss accuracy time
0 0.432347 0.450028 0.780894 00:17
1 0.414766 0.402565 0.809569 00:17
2 0.382495 0.382281 0.812421 00:18
3 0.366852 0.373158 0.822085 00:18
4 0.353692 0.368471 0.824461 00:18
| �[0m 106     �[0m | �[0m 0.8245  �[0m | �[0m 0.9452  �[0m | �[0m 6.902   �[0m | �[0m 3.991   �[0m |
epoch train_loss valid_loss accuracy time
0 0.397604 0.407966 0.814797 00:06
1 0.386755 0.382698 0.800063 00:07
2 0.361672 0.373610 0.823194 00:07
3 0.345655 0.363278 0.829848 00:07
4 0.349931 0.362171 0.830957 00:07
| �[0m 107     �[0m | �[0m 0.831   �[0m | �[0m 5.475   �[0m | �[0m 5.721   �[0m | �[0m 0.02659 �[0m |
| �[0m 108     �[0m | �[0m 0.8308  �[0m | �[0m 4.597   �[0m | �[0m 7.994   �[0m | �[0m 0.1318  �[0m |
epoch train_loss valid_loss accuracy time
0 0.391720 0.395161 0.819550 00:06
1 0.372859 0.387859 0.810203 00:07
2 0.367385 0.376952 0.812262 00:07
3 0.346810 0.365312 0.827155 00:07
4 0.341221 0.363528 0.829056 00:07
| �[0m 109     �[0m | �[0m 0.8291  �[0m | �[0m 2.391   �[0m | �[0m 4.915   �[0m | �[0m 0.8695  �[0m |
epoch train_loss valid_loss accuracy time
0 0.391635 0.402138 0.804658 00:08
1 0.402425 0.468013 0.803866 00:09
2 0.368443 0.372927 0.823669 00:09
3 0.361579 0.364913 0.826679 00:09
4 0.347989 0.363416 0.828580 00:09
| �[0m 110     �[0m | �[0m 0.8286  �[0m | �[0m 5.711   �[0m | �[0m 7.031   �[0m | �[0m 1.157   �[0m |
epoch train_loss valid_loss accuracy time
0 0.393225 0.414461 0.812738 00:07
1 0.377641 0.381910 0.820659 00:07
2 0.363979 0.375042 0.826838 00:07
3 0.349867 0.363547 0.830165 00:07
4 0.350035 0.364254 0.830006 00:07
| �[0m 111     �[0m | �[0m 0.83    �[0m | �[0m 3.867   �[0m | �[0m 7.351   �[0m | �[0m 0.7373  �[0m |
| �[0m 112     �[0m | �[0m 0.8275  �[0m | �[0m 5.568   �[0m | �[0m 0.8565  �[0m | �[0m 0.9522  �[0m |
| �[0m 113     �[0m | �[0m 0.8311  �[0m | �[0m 5.553   �[0m | �[0m 1.45    �[0m | �[0m 0.0     �[0m |
epoch train_loss valid_loss accuracy time
0 0.404215 0.421293 0.817807 00:10
1 0.380753 0.388533 0.803074 00:09
2 0.362475 0.381199 0.820342 00:09
3 0.353416 0.368431 0.824620 00:09
4 0.335854 0.365448 0.825570 00:09
| �[0m 114     �[0m | �[0m 0.8256  �[0m | �[0m 3.949   �[0m | �[0m 1.125   �[0m | �[0m 1.109   �[0m |
epoch train_loss valid_loss accuracy time
0 0.397351 0.482881 0.766160 00:08
1 0.373190 0.372999 0.819075 00:08
2 0.354075 0.368455 0.826046 00:08
3 0.350617 0.362793 0.830323 00:08
4 0.347381 0.361980 0.831274 00:08
| �[0m 115     �[0m | �[0m 0.8313  �[0m | �[0m 7.258   �[0m | �[0m 4.724   �[0m | �[0m 1.639   �[0m |
=============================================================
optimizer.max['target']
0.8428390622138977
{key: 2**int(value)
  for key, value in optimizer.max['params'].items()}
{'pow_n_a': 2, 'pow_n_d': 64, 'pow_n_steps': 1}

Out of memory dataset

If your dataset is so big it doesn't fit in memory, you can load a chunk of it each epoch.

df = pd.read_csv(path/'adult.csv')
df_main,df_valid = df.iloc[:-1000].copy(),df.iloc[-1000:].copy()
# choose size that fit in memory
dataset_size = 1000
# load chunk with your own code
def load_chunk():
    return df_main.sample(dataset_size).copy()
df_small = load_chunk()
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 
             'relationship', 'race', 'native-country', 'sex']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df_small))
to = TabularPandas(df_small, procs, cat_names, cont_names, y_names="salary", y_block = CategoryBlock(), 
                   splits=None, do_setup=True)
# save the validation set
to_valid = to.new(df_valid)
to_valid.process()
val_dl = TabDataLoader(to_valid.train)
len(to.train)
1000
class ReloadCallback(Callback):
    def begin_epoch(self): 
        df_small = load_chunk()
        to_new = to.new(df_small)
        to_new.process()
        trn_dl = TabDataLoader(to_new.train)
        self.learn.dls = DataLoaders(trn_dl, val_dl).cuda()
dls = to.dataloaders()
emb_szs = get_emb_sz(to)
model = TabNetModel(emb_szs, len(to.cont_names), dls.c, n_d=8, n_a=32, n_steps=1); 
opt_func = partial(Adam, wd=0.01, eps=1e-5)
learn = Learner(dls, model, CrossEntropyLossFlat(), opt_func=opt_func, lr=3e-2, metrics=[accuracy])
learn.add_cb(ReloadCallback());
learn.fit_one_cycle(10)
epoch train_loss valid_loss accuracy time
0 0.587740 0.550544 0.756000 00:01
1 0.545411 0.515772 0.782000 00:01
2 0.484289 0.468586 0.813000 00:01
3 0.447111 0.435774 0.817000 00:01
4 0.449050 0.394715 0.819000 00:01
5 0.428863 0.382005 0.835000 00:01
6 0.382100 0.404258 0.826000 00:01
7 0.383915 0.376179 0.833000 00:01
8 0.389460 0.367857 0.834000 00:01
9 0.376486 0.367577 0.834000 00:01
Owner
Mikhail Grankin
Mikhail Grankin
[IROS2021] NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

NYU-VPR This repository provides the experiment code for the paper Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymiza

Automation and Intelligence for Civil Engineering (AI4CE) Lab @ NYU 22 Sep 28, 2022
Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation.

DuoRec Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation. Usage Download datasets fr

Qrh 46 Dec 19, 2022
System Design course at HSE (2021)

System Design course at HSE (2021) Wiki-страница курса Структура репозитория: slides - директория с презентациями с занятий tasks - материалы для выпо

22 Dec 25, 2022
Implementation of ViViT: A Video Vision Transformer

ViViT: A Video Vision Transformer Unofficial implementation of ViViT: A Video Vision Transformer. Notes: This is in WIP. Model 2 is implemented, Model

Rishikesh (ऋषिकेश) 297 Jan 06, 2023
[ACMMM 2021 Oral] Enhanced Invertible Encoding for Learned Image Compression

InvCompress Official Pytorch Implementation for "Enhanced Invertible Encoding for Learned Image Compression", ACMMM 2021 (Oral) Figure: Our framework

96 Nov 30, 2022
Fuse radar and camera for detection

SAF-FCOS: Spatial Attention Fusion for Obstacle Detection using MmWave Radar and Vision Sensor This project hosts the code for implementing the SAF-FC

ChangShuo 18 Jan 01, 2023
[ICML 2021] Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data

Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data This repo provides the source code & data of our paper: Break-It-Fix-It: Unsupervised

Michihiro Yasunaga 86 Nov 30, 2022
This is the repository for Learning to Generate Piano Music With Sustain Pedals

SusPedal-Gen This is the official repository of Learning to Generate Piano Music With Sustain Pedals Demo Page Dataset The dataset used in this projec

Joann Ching 12 Sep 02, 2022
RefineGNN - Iterative refinement graph neural network for antibody sequence-structure co-design (RefineGNN)

Iterative refinement graph neural network for antibody sequence-structure co-des

Wengong Jin 83 Dec 31, 2022
Dynamic Slimmable Network (CVPR 2021, Oral)

Dynamic Slimmable Network (DS-Net) This repository contains PyTorch code of our paper: Dynamic Slimmable Network (CVPR 2021 Oral). Architecture of DS-

Changlin Li 197 Dec 09, 2022
Simple Baselines for Human Pose Estimation and Tracking

Simple Baselines for Human Pose Estimation and Tracking News Our new work High-Resolution Representations for Labeling Pixels and Regions is available

Microsoft 2.7k Jan 05, 2023
The official repository for "Score Transformer: Generating Musical Scores from Note-level Representation" (MMAsia '21)

Score Transformer This is the official repository for "Score Transformer": Score Transformer: Generating Musical Scores from Note-level Representation

22 Dec 22, 2022
This is the code for the paper "Motion-Focused Contrastive Learning of Video Representations" (ICCV'21).

Motion-Focused Contrastive Learning of Video Representations Introduction This is the code for the paper "Motion-Focused Contrastive Learning of Video

11 Sep 23, 2022
ML-PersonalWork - Big assignment PersonalWork in Machine Learning, 2021 autumn BUAA.

ML-PersonalWork - Big assignment PersonalWork in Machine Learning, 2021 autumn BUAA.

Snapdragon Lee 2 Dec 16, 2022
Layered Neural Atlases for Consistent Video Editing

Layered Neural Atlases for Consistent Video Editing Project Page | Paper This repository contains an implementation for the SIGGRAPH Asia 2021 paper L

Yoni Kasten 353 Dec 27, 2022
Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"

Easy-To-Hard The official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks". Gett

Avi Schwarzschild 52 Sep 08, 2022
A lightweight deep network for fast and accurate optical flow estimation.

FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation The official PyTorch implementation of FastFlowNet (ICRA 2021). Authors: Lingtong

Tone 161 Jan 03, 2023
Visual Adversarial Imitation Learning using Variational Models (VMAIL)

Visual Adversarial Imitation Learning using Variational Models (VMAIL) This is the official implementation of the NeurIPS 2021 paper. Project website

14 Nov 18, 2022
A Pytorch loader for MVTecAD dataset.

MVTecAD A Pytorch loader for MVTecAD dataset. It strictly follows the code style of common Pytorch datasets, such as torchvision.datasets.CIFAR10. The

Jiyuan 1 Dec 27, 2021
L-Verse: Bidirectional Generation Between Image and Text

Far beyond learning long-range interactions of natural language, transformers are becoming the de-facto standard for many vision tasks with their power and scalabilty

Kim, Taehoon 102 Dec 21, 2022