Training/Validation Split
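The split itself is standard, but we will use one special mechanism, stratified sampling, so that each label appears in the training and validation sets in the same proportion as in the full dataset. This avoids biasing either split toward particular output labels.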
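To make the effect of stratification concrete, here is a minimal sketch on a made-up, imbalanced label array (90 samples of class 0 and 10 of class 1). The toy data and the variable names `labels`, `indices`, `train_counts`, and `val_counts` are assumptions for illustration and are not part of the tutorial's dataset.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels: 90 samples of class 0, 10 samples of class 1
labels = np.array([0] * 90 + [1] * 10)
indices = np.arange(len(labels))

# Stratify on the labels so both splits keep the 90/10 class ratio
idx_train, idx_val, y_train, y_val = train_test_split(
    indices,
    labels,
    test_size=0.2,
    random_state=10,
    stratify=labels,
)

train_counts = Counter(y_train.tolist())
val_counts = Counter(y_val.tolist())

print("train:", dict(train_counts))
print("val:  ", dict(val_counts))
```

With these numbers the 20-sample validation set receives 2 of the 10 minority samples and the training set receives the other 8, so the 10% share is preserved in both. Without `stratify`, a random split could leave the validation set with very few minority samples or none at all.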
Problem
Solution
Demo
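from sklearn.model_selection import train_test_split

# Split the row indices (not the rows themselves) so they can be mapped back to df
X_train, X_val, y_train, y_val = train_test_split(
    df.index.values,
    df.label.values,
    test_size=0.2,
    random_state=10,
    stratify=df.label.values,  # the stratified sampling part
)

Use the returned index values to mark each row of df as belonging to the training or validation set.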
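One possible way to use the index for labelling is sketched below. It assumes a small stand-in DataFrame with a `label` column and introduces a hypothetical `data_type` column to record the split. Neither the sample data nor the column name comes from this page, so adapt them to the actual `df`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the tutorial's DataFrame
df = pd.DataFrame({
    "text": [f"sample {i}" for i in range(20)],
    "label": [0] * 15 + [1] * 5,
})

X_train, X_val, y_train, y_val = train_test_split(
    df.index.values,
    df.label.values,
    test_size=0.2,
    random_state=10,
    stratify=df.label.values,
)

# 'data_type' is an assumed column name used only for this illustration
df["data_type"] = "not_set"
df.loc[X_train, "data_type"] = "train"
df.loc[X_val, "data_type"] = "val"

# Count rows per (label, split) pair to check the class balance
print(df.groupby(["label", "data_type"]).size())
```

Because `X_train` and `X_val` hold index values rather than feature rows, `df.loc` can assign the split name directly, and the grouped count gives a quick check that every label is represented in both splits.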