CS231n Solver.py Explained in Detail

Solver is a class that takes the data and labels and optimizes the model's weights accordingly; the training hyperparameters are set on the Solver so that the best training result can be reached. A minimal usage sketch is given below.
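For orientation, here is a rough usage sketch (assuming the assignment's TwoLayerNet model and the usual cs231n package layout; the random data and every hyperparameter value here are placeholders, not recommendations):

import numpy as np
from cs231n.classifiers.fc_net import TwoLayerNet   # assumed import path from the assignment
from cs231n.solver import Solver                     # assumed import path from the assignment

# Fake data in the dictionary format Solver expects
data = {
    'X_train': np.random.randn(500, 3, 32, 32),
    'y_train': np.random.randint(10, size=500),
    'X_val':   np.random.randn(100, 3, 32, 32),
    'y_val':   np.random.randint(10, size=100),
}

model = TwoLayerNet(hidden_dim=100, reg=1e-3)        # any model exposing params and loss() works
solver = Solver(model, data,
                update_rule='sgd',
                optim_config={'learning_rate': 1e-3},
                lr_decay=0.95,
                num_epochs=5,
                batch_size=100,
                print_every=100)
solver.train()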

Member functions

 

The initialization function

def __init__(self, model, data, **kwargs):
    """
    Construct a new Solver instance.

    Required arguments:
    - model: A model object conforming to the API described above
    - data: A dictionary of training and validation data with the following:
      'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images
      'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images
      'y_train': Array of shape (N_train,) giving labels for training images
      'y_val': Array of shape (N_val,) giving labels for validation images

    Optional arguments:
    - update_rule: A string giving the name of an update rule in optim.py.
      Default is 'sgd'.
    - optim_config: A dictionary containing hyperparameters that will be
      passed to the chosen update rule. Each update rule requires different
      hyperparameters (see optim.py) but all update rules require a
      'learning_rate' parameter so that should always be present.
    - lr_decay: A scalar for learning rate decay; after each epoch the learning
      rate is multiplied by this value.
    - batch_size: Size of minibatches used to compute loss and gradient during
      training.
    - num_epochs: The number of epochs to run for during training.
    - print_every: Integer; training losses will be printed every print_every
      iterations.
    - verbose: Boolean; if set to false then no output will be printed during
      training.
    """
    self.model = model
    self.X_train = data['X_train']
    self.y_train = data['y_train']
    self.X_val = data['X_val']
    self.y_val = data['y_val']

    # Unpack keyword arguments
    self.update_rule = kwargs.pop('update_rule', 'sgd')
    self.optim_config = kwargs.pop('optim_config', {})
    self.lr_decay = kwargs.pop('lr_decay', 1.0)
    self.batch_size = kwargs.pop('batch_size', 100)
    self.num_epochs = kwargs.pop('num_epochs', 10)

    self.print_every = kwargs.pop('print_every', 100)
    self.verbose = kwargs.pop('verbose', True)

    # Throw an error if there are extra keyword arguments
    if len(kwargs) > 0:
      extra = ', '.join('"%s"' % k for k in kwargs.keys())
      raise ValueError('Unrecognized arguments %s' % extra)

    # Make sure the update rule exists, then replace the string
    # name with the actual function
    if not hasattr(optim, self.update_rule):
      raise ValueError('Invalid update_rule "%s"' % self.update_rule)
    self.update_rule = getattr(optim, self.update_rule)

    self._reset()

The initialization function receives the following arguments:

(1) model: a class instance that defines the structure of the network. It has nothing to do with the data or the optimization method; it is purely the network itself, and it provides the forward and backward computation functions.

(2) data: a dictionary containing the training set X_train, the validation set X_val, the training labels y_train, and the validation labels y_val.

(3) **kwargs: the remaining keyword arguments, packed into a dictionary. The initialization function pops them one by one, and any value that was not supplied falls back to a default (a small sketch of this pattern follows below).
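A tiny standalone sketch of that pop-with-default pattern (the function name here is made up for illustration and is not part of Solver):

def unpack(**kwargs):
    # Take the recognized options out of kwargs, falling back to defaults
    lr_decay = kwargs.pop('lr_decay', 1.0)
    batch_size = kwargs.pop('batch_size', 100)
    # Whatever is left over was not recognized -> complain, just like Solver does
    if kwargs:
        raise ValueError('Unrecognized arguments %s' % ', '.join(kwargs))
    return lr_decay, batch_size

print(unpack(batch_size=50))   # -> (1.0, 50); lr_decay fell back to its default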

 

The reset function

def _reset(self):
    """
    Set up some book-keeping variables for optimization. Don't call this
    manually.
    """
    # Set up some variables for book-keeping
    self.epoch = 0
    self.best_val_acc = 0
    self.best_params = {}
    self.loss_history = []
    self.train_acc_history = []
    self.val_acc_history = []

    # Make a deep copy of the optim_config for each parameter
    self.optim_configs = {}
    for p in self.model.params:
      d = {k: v for k, v in self.optim_config.iteritems()}
      self.optim_configs[p] = d

The reset function resets a number of bookkeeping variables of the Solver. Pay particular attention to the new dictionary optim_configs it creates to hold the optimization hyperparameters for each parameter separately; the hyperparameters that were passed in are stored in self.optim_config. These two dictionaries are not the same thing! (See the sketch below.)
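As a concrete, hypothetical illustration for a two-layer network whose params dictionary holds 'W1', 'b1', 'W2', 'b2' (the names are assumed, not taken from the article):

# Each parameter gets its own copy of the shared hyperparameters, so that
# per-parameter state (momentum velocities, Adam moments, ...) can be stored
# independently later on.
optim_config = {'learning_rate': 1e-3}            # shared starting hyperparameters
param_names = ['W1', 'b1', 'W2', 'b2']            # assumed parameter names
optim_configs = {p: dict(optim_config) for p in param_names}
# optim_configs['W1'] is {'learning_rate': 0.001}, and so is every other entry,
# but they are independent dictionaries that can diverge during training.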

 

 

The _step function

def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]                           # number of training examples
    batch_mask = np.random.choice(num_train, self.batch_size)   # randomly pick batch_size of them
    X_batch = self.X_train[batch_mask]                          # slice out the minibatch
    y_batch = self.y_train[batch_mask]                          # and the matching labels

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)   # call the model's loss function
    self.loss_history.append(loss)                    # record the loss so it can be plotted later;
                                                      # note each loss comes from one minibatch only

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
      dw = grads[p]
      config = self.optim_configs[p]
      next_w, next_config = self.update_rule(w, dw, config)   # note: getattr in __init__ already turned
                                                              # update_rule from a string into a function
      self.model.params[p] = next_w
      self.optim_configs[p] = next_config
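As the docstring in __init__ says, an update rule lives in optim.py and is a function taking (w, dw, config) and returning the new weights plus a (possibly updated) config. As a rough sketch only, the simplest rule, vanilla 'sgd', looks approximately like this (illustrative, not copied verbatim from optim.py):

def sgd(w, dw, config=None):
    """Vanilla stochastic gradient descent sketch.
    config is expected to hold 'learning_rate'."""
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    w -= config['learning_rate'] * dw   # single gradient step
    return w, config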

 

The check_accuracy function

def check_accuracy(self, X, y, num_samples=None, batch_size=100):
    """
    Check accuracy of the model on the provided data.

    Inputs:
    - X: Array of data, of shape (N, d_1, ..., d_k)
    - y: Array of labels, of shape (N,)
    - num_samples: If not None, subsample the data and only test the model
      on num_samples datapoints.
    - batch_size: Split X and y into batches of this size to avoid using too
      much memory.

    Returns:
    - acc: Scalar giving the fraction of instances that were correctly
      classified by the model.
    """

    # Maybe subsample the data
    N = X.shape[0]                                    # number of input examples
    if num_samples is not None and N > num_samples:   # too many examples: subsample
      mask = np.random.choice(N, num_samples)
      N = num_samples
      X = X[mask]                                     # randomly chosen subset
      y = y[mask]

    # Compute predictions in batches
    num_batches = N / batch_size                      # how many batches N splits into
    if N % batch_size != 0:                           # if it does not divide evenly,
      num_batches += 1                                # add one extra batch for the remainder
    y_pred = []                                       # predictions
    for i in xrange(num_batches):                     # loop over the batches
      start = i * batch_size                          # start of the current batch
      end = (i + 1) * batch_size                      # end of the current batch
      scores = self.model.loss(X[start:end])          # forward pass on this batch only
      y_pred.append(np.argmax(scores, axis=1))        # argmax of the scores is the predicted class
    y_pred = np.hstack(y_pred)                        # stitch all batch predictions together
    acc = np.mean(y_pred == y)                        # fraction correct = accuracy

    return acc                                        # return the accuracy

The reason we compute the predictions batch by batch and then concatenate them is to avoid running out of memory when there are too many examples.
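Here is the same pattern in isolation, with a made-up scoring function standing in for self.model.loss (a pure NumPy sketch; all names are invented for illustration):

import numpy as np

def predict_in_batches(score_fn, X, batch_size=100):
    """Score X in chunks of batch_size so the full score matrix never has to fit in memory."""
    preds = []
    for start in range(0, X.shape[0], batch_size):
        scores = score_fn(X[start:start + batch_size])   # (batch, num_classes) scores
        preds.append(np.argmax(scores, axis=1))          # predicted class per example in this chunk
    return np.hstack(preds)                              # stitch the chunks back together

fake_scores = lambda X: np.random.randn(X.shape[0], 10)  # stand-in scorer: 10 random class scores
print(predict_in_batches(fake_scores, np.zeros((250, 5))).shape)   # -> (250,)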

 

The train function

def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]                            # number of training examples
    iterations_per_epoch = max(num_train / self.batch_size, 1)   # explained below
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):   # loop over every iteration
      self._step()   # one update; each update samples batch_size examples from the whole training
                     # set, so the smaller the batch, the more iterations are needed to cover the
                     # data set once -- which is exactly what iterations_per_epoch expresses

      # Maybe print training loss
      if self.verbose and t % self.print_every == 0:           # peek at intermediate results;
        print '(Iteration %d / %d) loss: %f' % (               # note that print_every counts
               t + 1, num_iterations, self.loss_history[-1])   # iterations, not epochs

      # At the end of every epoch, increment the epoch counter and decay the
      # learning rate.
      epoch_end = (t + 1) % iterations_per_epoch == 0   # an epoch is made up of several iterations
      if epoch_end:                                     # enough iterations: the epoch is over
        self.epoch += 1                                 # bump the epoch counter
        for k in self.optim_configs:                    # decay every learning rate
          self.optim_configs[k]['learning_rate'] *= self.lr_decay

      # Check train and val accuracy on the first iteration, the last
      # iteration, and at the end of each epoch.
      first_it = (t == 0)                     # check acc at the first and last iteration
      last_it = (t == num_iterations - 1)     # (t only ever reaches num_iterations - 1)
      if first_it or last_it or epoch_end:    # compute train and val accuracy
        train_acc = self.check_accuracy(self.X_train, self.y_train,
                                        num_samples=1000)
        val_acc = self.check_accuracy(self.X_val, self.y_val)
        self.train_acc_history.append(train_acc)   # record both accuracies
        self.val_acc_history.append(val_acc)

        if self.verbose:
          print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                 self.epoch, self.num_epochs, train_acc, val_acc)

        # Keep track of the best model
        if val_acc > self.best_val_acc:
          self.best_val_acc = val_acc
          self.best_params = {}
          for k, v in self.model.params.iteritems():
            self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params

iterations_per_epoch and num_iterations may look a little odd at first:

(1) iterations_per_epoch: the number of training examples divided by the batch size, clamped to at least 1.

For example, with 10,000 training examples and a batch size of 100, this variable is 100, meaning 100 iterations per epoch.

With 10,000 training examples and a batch size of 50, it is 200, meaning 200 iterations per epoch.

The smaller the batch, the more iterations one epoch contains.

(2) num_iterations: self.num_epochs multiplied by the number of iterations per epoch above, i.e. the total number of iterations.

(3) At the end of every epoch, the learning rate is decayed.
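Once training has finished, the histories recorded above (loss_history, train_acc_history, val_acc_history) can be plotted. A sketch assuming matplotlib is available and solver is an already-trained Solver instance:

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
plt.plot(solver.loss_history)                        # one entry per iteration (per minibatch)
plt.xlabel('Iteration')
plt.ylabel('Training loss')

plt.subplot(2, 1, 2)
plt.plot(solver.train_acc_history, label='train')    # one entry per epoch (plus first/last iteration)
plt.plot(solver.val_acc_history, label='val')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()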

 


Reposted from: https://www.cnblogs.com/lijiajun/p/5582789.html
