DataLoader, Optimizer, Scheduler
Previously we prepared the dataset and downloaded the model. Here we define the training process itself: how to feed the data into the model, how fast it learns, and how the learning rate changes over the epochs.
DataLoader: how our data will be fed into the model
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
# It is good to use RandomSampler for the training data, because we don't want the order in which the data happens to be listed to bias the learning process
# It is fine to use SequentialSampler for the validation data, because order doesn't matter for evaluation
dataloader_train = DataLoader(
dataset_train, # the tokenized data we are going to feed in the model
sampler=RandomSampler(dataset_train), # shuffle the training data each epoch
batch_size=32 # how many pieces of data we feed in at once
)
Optimizer: how fast is the model going to learn
from torch.optim import AdamW # transformers.AdamW is deprecated; PyTorch's AdamW works the same way here
optimizer = AdamW(
model.parameters(),
lr=1e-5,
eps=1e-8
)
Scheduler: how the learning rate changes over the course of training
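A common choice for transformer fine-tuning is linear warmup followed by linear decay (transformers provides this as get_linear_schedule_with_warmup). Below is a minimal pure-PyTorch sketch of the same schedule using LambdaLR; the model, warmup length, and step counts are placeholder assumptions, and in the real notebook num_training_steps would be len(dataloader_train) times the number of epochs.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(4, 2)  # stand-in for the downloaded model (assumption)
optimizer = AdamW(model.parameters(), lr=1e-5, eps=1e-8)

num_warmup_steps = 10     # assumed value for illustration
num_training_steps = 100  # normally len(dataloader_train) * epochs

def lr_lambda(step):
    # Ramp the learning rate up linearly during warmup...
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    # ...then decay it linearly to zero over the remaining steps
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda)

lrs = []
for _ in range(num_training_steps):
    optimizer.step()   # in a real loop: forward pass and loss.backward() come first
    scheduler.step()   # call once per batch, after the optimizer step
    lrs.append(optimizer.param_groups[0]["lr"])
```

The learning rate peaks at 1e-5 at the end of warmup and reaches zero at the final step; warmup like this helps keep early gradient updates from destabilizing the pretrained weights.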