Understanding Feedback Mechanisms in Machine Learning Jupyter Notebooks

Online Appendix

Abstract

This website hosts the online appendix for the research paper "Understanding Feedback Mechanisms in Machine Learning Jupyter Notebooks", which has been submitted to the Journal of Empirical Software Engineering.

(RQ2) How is explicit feedback from assert statements used to validate ML code written in Jupyter notebooks?

Data Shape Check (\(N = 26\))

Key Code
A5 assert y_valid.shape == (1132,)
A17 assert X.shape[1] == 13, 'Did you drop/lose some columns in X? Did you properly load and split the data?'
A29 assert len(test_y_preds) == len(test_y), 'Unexpected number of predictions.'
A31 assert img.shape == (112, 92)
A76 assert len(encoding['token_type_ids']) == max_seq_length
A84 assert red.get_shape().as_list()[1:] == [224, 224, 1]
A90 assert len(X_train) == 2000
A93 assert temp_embed.shape[0] == stride

Data Validation Check (\(N = 14\))

Key Code
A41 assert np.all(np.unique(X['smoke'].values) == np.array([0, 1]))
A44 assert np.all(np.unique(X['smoke'].values) == np.array([0, 1]))
A46 assert np.isclose(stdev_norm, 1.0, atol=1e-16)
A52 assert grouped_users['user_id'].nunique() == user_engagement['user_id'].nunique()
A65 assert np.all(y <= nb_classes)
A73 assert df['clf'].value_counts()[1] == len(df[df['quality'] >= 7])
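
The value checks in A41 and A65 can be sketched without any third-party library. This is a minimal illustration and not code from the studied notebooks: the helper names `check_binary_column` and `check_label_range` are hypothetical, the column check is a subset test (unlike A41, it does not require both values to occur), and the label check assumes 0-based class labels.

```python
def check_binary_column(values, allowed=(0, 1)):
    """Fail if `values` contains anything outside `allowed` (subset test)."""
    unexpected = set(values) - set(allowed)
    assert not unexpected, f"Unexpected values in column: {sorted(unexpected)}"


def check_label_range(labels, nb_classes):
    """Fail if a label falls outside 0 .. nb_classes - 1 (assumes 0-based labels)."""
    assert all(0 <= y < nb_classes for y in labels), (
        f"Labels must lie in [0, {nb_classes - 1}]"
    )


# Hypothetical toy data standing in for a 'smoke' column and class labels.
smoke = [0, 1, 1, 0, 1]
labels = [0, 2, 1, 1]

check_binary_column(smoke)
check_label_range(labels, nb_classes=3)
```

Under the 0-based assumption, the strict upper bound is tighter than the `<=` comparison in A65, which would also accept a label equal to `nb_classes`.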

Model Performance Check (\(N = 11\))

Key Code
A7 assert len(neighbours_1) == 20, "Neighbors don't match!"
A15 assert np.allclose(verify('images/camera_1.jpg', 'bertrand', database, FRmodel), (0.54364836, True))
A19 assert np.allclose(linear_model.coef_, [[1.57104472, 0.92521608]]), 'The model parameters you learned seem incorrect!'
A38 assert 0.75 < auc(fpr, tpr) < 0.85
A58 assert np.isclose(accuracy, 0.9666666666666667)
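
The listing above shows two styles of performance assertion: comparison against an expected value within a tolerance (A58) and comparison against an acceptable range (A38). The sketch below contrasts the two on toy data. It is illustrative only: the `accuracy` helper and the labels are hypothetical, and the standard-library `math.isclose` stands in for `np.isclose`, whose default tolerances differ.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels: 29 of 30 predictions are correct.
y_true = [1] * 30
y_pred = [1] * 29 + [0]
acc = accuracy(y_true, y_pred)

# Expected-value style (as in A58): passes only near one specific number.
assert math.isclose(acc, 29 / 30, rel_tol=1e-9)

# Range style (as in A38): tolerates run-to-run variation in the metric.
assert 0.9 < acc < 1.0
```

The expected-value style suits deterministic pipelines such as graded exercises with fixed seeds, while the range style is more forgiving when training is not fully reproducible.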

Existence Check (\(N = 8\))

Key Code
A23 assert np.all(orders.groupby('user_id').days_since_prior_order.tail(1).notnull())
A42 assert not lab_s.isnull().values.any()
A43 assert len(data) != 0, 'cannot divide by zero'
A50 assert not np.any(np.isnan(X))
A51 assert data.target.notnull().all()
A63 assert X.isnull().sum().sum() == 0
A79 assert not processed_data_df.isna().any().any()
A86 assert p0 in poi_info.index
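
Most of the existence checks above assert that a dataset contains no missing values. A rough, library-free equivalent is sketched below, assuming that missing cells are represented either as `None` or as a float NaN; the `has_missing` helper and the sample rows are hypothetical.

```python
import math


def has_missing(rows):
    """Return True if any cell is None or a float NaN."""
    return any(
        cell is None or (isinstance(cell, float) and math.isnan(cell))
        for row in rows
        for cell in row
    )


# Hypothetical toy tables: one complete, one with gaps.
clean = [[1.0, 2.0], [3.0, 4.0]]
dirty = [[1.0, float("nan")], [None, 4.0]]

assert not has_missing(clean), "Unexpected missing values in input data"
print("dirty table has missing values:", has_missing(dirty))
```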

Resource Check (\(N = 7\))

Key Code
A10 assert le_path.is_file(), f"Label encoder file not found at {le_path}. Make sure 'label_encoder.pkl' exists in the lightning_logs directory."
A14 assert self.model is not None, 'Model is not loaded, load it by calling .load_model()'
A18 assert pd.__version__.rpartition('.')[0] == '1.0', f"Unexpected pandas version: expected 1.0, got {pd.__version__.rpartition('.')[0]}"
A37 assert svm.fit_status_ == 0, 'Forgot to train the SVM!'
A60 assert f2.gca().has_data()
A67 assert pm.__version__ == '3.9.2'
A74 assert os.path.exists(image_dir)

Type Check (\(N = 5\))

Key Code
A2 assert isinstance(X_trn, torch.FloatTensor), 'Features should be float32!'
A35 assert isinstance(column_transformer, ColumnTransformer), "Input isn't a ColumnTransformer"
A40 assert isinstance(model_3, sklearn.ensemble.RandomForestClassifier)
A81 assert is_all_ints(filled_df[r]) is True
A88 assert isinstance(betas, np.ndarray)

Mathematical Property Check (\(N = 4\))

Key Code
A3 assert (xH - wH) % self.stride == 0
A25 assert test_output.std() < 0.15, "Don't use batchnorm here"
A56 assert np.allclose(e_v_states[:, -1], np.ones_like(e_v_states[:, -1]))
A64 assert np.allclose(T, T.T)

Batch Size Check (\(N = 3\))

Key Code
A21 assert x.size(0) % batch_size == 0, f'the first dimension of input tensor ({x.size(0)}) should be divisible by batch_size ({batch_size})'
A28 assert image_size % patch_size_small == 0, 'Image dimensions must be divisible by the patch size.'
A70 assert n_img > batch_size
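
The divisibility assertions in A21 and A28 guard code that splits data into equally sized chunks. The sketch below shows the same pattern on a plain Python list; `split_into_batches` is a hypothetical helper and not taken from the studied notebooks.

```python
def split_into_batches(items, batch_size):
    """Split `items` into equally sized batches, failing early on a remainder."""
    assert batch_size > 0, "batch_size must be positive"
    assert len(items) % batch_size == 0, (
        f"number of items ({len(items)}) should be divisible "
        f"by batch_size ({batch_size})"
    )
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical input: 12 items split into batches of 4.
batches = split_into_batches(list(range(12)), batch_size=4)
print(len(batches), "batches of size", len(batches[0]))
```

Placing the assertion before the split turns a silent short final batch into an immediate, explained failure.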

Network Architecture Check (\(N = 3\))

Key Code
A11 assert self.encoder_conv_01[0].weight.size() == self.vgg16.features[2].weight.size()
A62 assert self.encoder_conv_01[0].weight.size() == self.vgg16.features[2].weight.size()
A75 assert reg in ['none', 'l2']

Data Leakage Check (\(N = 1\))

Key Code
A33 assert len(set(tr_df.PetID.unique()).intersection(valid_df.PetID.unique())) == 0
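
The check in A33 asserts that no identifier occurs in both the training and the validation split. A minimal sketch of the same idea on plain Python sequences follows; the helper name `assert_no_leakage` and the sample IDs are hypothetical, and the IDs are assumed to be hashable and mutually comparable so that they can be sorted for the error message.

```python
def assert_no_leakage(train_ids, valid_ids):
    """Fail if any identifier appears in both the training and validation split."""
    overlap = set(train_ids) & set(valid_ids)
    assert not overlap, (
        f"{len(overlap)} IDs appear in both splits, e.g. {sorted(overlap)[:5]}"
    )


# Hypothetical pet IDs for two disjoint splits.
train_ids = ["a1", "b2", "c3", "a1"]
valid_ids = ["d4", "e5"]

assert_no_leakage(train_ids, valid_ids)
```

Reporting a few of the overlapping IDs in the message makes a failure easier to trace than the bare length comparison.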

(RQ3) How is implicit feedback from print statements and last cell statements used when writing ML code in Jupyter notebooks?

Model Performance Check (\(N = 33\))

Key Code
P3 print('The mean accuracy with 10 fold cross validation is: %s ' % round(scores * 100, 2), '%')
P6 print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, pred)))
P18 print('The Accuracy is:', accuracy_score(y_test, y_pred))
P50 print('Classification Report: SVM (validation data)')
P54 print('Intercept value:', lm.intercept_)
L3 skplt.metrics.plot_confusion_matrix(Y_val, Vote.predict(X_val), normalize=True, figsize=(10, 10))
L52 spot_check_recs(classifier, 910)

Data Distribution (\(N = 7\))

Key Code
L2 _ = sns.catplot(x='category_id', y='likes', data=train, height=5, aspect=1.5)
L9 sns.kdeplot(data=data.loc[data['Survived'] == 0].Age, label='Died', shade=True)
L14 pd.pivot_table(train, index='Survived', values=['Age', 'SibSp', 'Parch', 'Fare'])
L25 sns.countplot(house_pred['OverallQual'])
L48 x_train.describe()

Resource Check (\(N = 7\))

Key Code
P68 print('GPU is available')
P71 print('Hub version: ', hub.__version__)
P82 print('Running on TPU ', tpu.master())
P86 print('Cuda is available')
P107 print('Model loaded')
L64 full_table.head(-5)
L66 prostate_cancer_df.shape

Spot Check (\(N = 5\))

Key Code
L60 X_pca.head()
P64 print(np.max(cur[:, :, 1]))
P114 print(onehot_encoded)

Model Training Check (\(N = 4\))

Key Code
L8 autoencoder.fit(x=X_train, y=X_train, epochs=15, validation_data=[X_test, X_test], callbacks=[keras_utils.TqdmProgressCallback()], verbose=0)
L31 adaBoost.fit(X_train, y_train)
L42 m_r.best_params_

Missing Value Check (\(N = 3\))

Key Code
P74 print(train_df.isnull().sum())
L12 sns.heatmap(test_df.isnull(), yticklabels=False, cbar=False, cmap='viridis')
L36 test.isna().sum().unique()

Shape Check (\(N = 3\))

Key Code
P4 print('no.of examples in test data : ', len(test_data))
P32 print('Training set shape : ', x_train.shape)
P117 print('Y_train.shape: ', Y_train.shape)
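
To make the difference between implicit and explicit feedback concrete, the sketch below performs the same shape check twice: once as a print statement in the style of P32, and once as an assert statement in the style of the RQ2 shape checks. The data and the expected dimensions are hypothetical.

```python
# Hypothetical training set: 200 rows with 13 features each.
x_train = [[0.0] * 13 for _ in range(200)]
n_rows, n_cols = len(x_train), len(x_train[0])

# Implicit feedback: the author must read the output and judge it manually.
print("Training set shape : ", (n_rows, n_cols))

# Explicit feedback: the expectation is encoded and verified on every run.
assert (n_rows, n_cols) == (200, 13), (
    f"Unexpected training set shape: {(n_rows, n_cols)}"
)
```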

Data Relationship Check (\(N = 2\))

Key Code
L6 b = sns.relplot(x='SIZE', y='Cash', hue='CLARITY', alpha=0.9, palette='muted', height=8, data=raw_data)
L10 sns.regplot(x='X4 number of convenience stores', y='Y house price of unit area', data=data)

Type Check (\(N = 2\))

Key Code
P43 print('data type:', images.dtype)
L71 type(Y)

Execution Time Check (\(N = 1\))

Key Code
P66 print('Total Run Time:')

Network Architecture Check (\(N = 1\))

Key Code
P92 print(MyNetwork)