How to Get Started with MXNet in 2 Hours

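1. Installing and Using MXNet

　　Source repository:

　　Below is the installation and test procedure for a single node, based on the official documentation:

　　1.1 Installing the basic dependencies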

　　sudo apt-get update

　　sudo apt-get install -y build-essential git libblas-dev libopencv-dev

　　1.2 Downloading MXNet

　　git clone --recursive

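　　1.3 Installing CUDA

　　See the blog post:

　　1.4 Building MXNet with GPU support

　　Inside the mxnet/ directory, find the mxnet/make/ subdirectory and copy its config.mk into mxnet/. Open the copy in a text editor, then find and change the following two lines: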

　　USE_CUDA = 1

　　USE_CUDA_PATH = /usr/local/cuda

　　After making the change, build from the mxnet/ directory:

　　make -j4

　　1.5 Installing Python support

　　cd python;

　　python setup.py install

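　　Sometimes you also need to install setuptools and numpy (sudo apt-get install python-numpy).

　　1.6 Running the MNIST handwritten-digit example

　　MNIST is a handwritten-digit recognition dataset with 60,000 training images and 10,000 test images; each image is a 28x28 grayscale picture. The MNIST example that ships with MXNet is in mxnet/example/image-classification, so we can run it first to try things out: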

　　cd mxnet/example/image-classification

　　python train_mnist.py

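　　The first run downloads the MNIST dataset automatically.

　　The command above runs with the default arguments, that is, an MLP network computed on the CPU.

　　To use the LeNet network with GPU acceleration, run the following command instead: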

　　python train_mnist.py --gpus 0 --network lenet

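　　To really understand how to use a framework, the first step is to train it on your own data. This is a key step.

2. Data Preprocessing in MXNet

　　All of the data preprocessing code is bundled in tools/im2rec.py. You first need to create a list file. The lst file has three columns: index, label, and image path, as shown below:

　　My labels are filled in arbitrarily, so they are all 0. Also, im2rec in the latest MXNet (at the time of writing) has a problem: every index in the list it generates is 0. Reportedly the index is not really used, but I fixed it anyway by replacing the yield generator with a direct append.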

　　The command to run is:

　　　　sudo python im2rec.py --list=True /home/erya/dhc/result/try /home/erya/dhc/result/ --recursive=True --shuffle=true --train-ratio=0.8

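　　The meaning of every argument can be looked up inside the script. Briefly, for the ones used here: --list=True means this run only makes the list. It is followed by the prefix for the generated list names (I included a path here), and then by the path of the folder that contains the images. --recursive controls whether to descend into subfolders when reading images, and --train-ratio sets how the dataset is split between train and val (0.8 here, so 80% goes to train).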
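　　Since the lst file is just three tab-separated columns, it can also be written by hand. The following is a minimal sketch of that idea, not the code inside im2rec.py: the function name make_lists, the file names, and the seed are all hypothetical, and the exact label formatting that im2rec.py emits is not reproduced here.

```python
import random


def make_lists(image_paths, labels, train_ratio=0.8, seed=0):
    """Return (train_lines, val_lines), each line as 'index<TAB>label<TAB>path'."""
    items = list(zip(labels, image_paths))
    random.Random(seed).shuffle(items)          # shuffle before splitting
    lines = ["%d\t%s\t%s" % (i, label, path)
             for i, (label, path) in enumerate(items)]
    split = int(len(lines) * train_ratio)       # first part goes to train
    return lines[:split], lines[split:]


# Hypothetical image names; every label is 0, as in the author's list.
paths = ["img_%d.jpg" % i for i in range(10)]
train, val = make_lists(paths, [0] * len(paths), train_ratio=0.8)
print(len(train), len(val))
```

　　Each index is unique here, which is the property the author restored by replacing the generator with an append.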

　　After running the command above, you get three files:

　　Then run the following command to generate the final rec file:

　　sudo python im2rec.py /home/erya/dhc/result/try_val.lst /home/erya/dhc/result --quality=100

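　　and:

　　sudo python im2rec.py /home/erya/dhc/result/try_train.lst /home/erya/dhc/result --quality=100

　　These generate the rec file for each lst file. The arguments are self-explanatory, so I won't go over them; result is the directory where I store the images.

　　That completes the data preprocessing. In short, you first generate the lst file, which you can perfectly well do yourself. Later, when I worked on segmentation, the label was itself an image.

3. A Very Simple Demo

　　Code first: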

import mxnet as mx
import logging
import numpy as np

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)  # logging setup; can be ignored for now

def ConvFactory(data, num_filter, kernel, stride=(1,1), pad=(0, 0), act_type="relu"):
    conv = mx.symbol.Convolution(data=data, workspace=256, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad)
    return conv  # I stripped this down to a single convolution, so act_type is unused

def DownsampleFactory(data, ch_3x3):
    # conv 3x3
    conv = ConvFactory(data=data, kernel=(3, 3), stride=(2, 2), num_filter=ch_3x3, pad=(1, 1))
    # pool
    pool = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pool_type='max')
    # concat
    concat = mx.symbol.Concat(*[conv, pool])
    return concat

def SimpleFactory(data, ch_1x1, ch_3x3):
    # 1x1
    conv1x1 = ConvFactory(data=data, kernel=(1, 1), pad=(0, 0), num_filter=ch_1x1)
    # 3x3
    conv3x3 = ConvFactory(data=data, kernel=(3, 3), pad=(1, 1), num_filter=ch_3x3)
    # concat
    concat = mx.symbol.Concat(*[conv1x1, conv3x3])
    return concat

if __name__ == "__main__":
    batch_size = 1
    # Use the data created earlier: write an iter and fill in the matching parameters.
    train_dataiter = mx.io.ImageRecordIter(
        shuffle=True,
        path_imgrec="/home/erya/dhc/result/try_train.rec",
        rand_crop=True,
        rand_mirror=True,
        data_shape=(3,28,28),
        batch_size=batch_size,
        preprocess_threads=1)
    # Same idea for the validation data.
    test_dataiter = mx.io.ImageRecordIter(
        path_imgrec="/home/erya/dhc/result/try_val.rec",
        rand_crop=False,
        rand_mirror=False,
        data_shape=(3,28,28),
        batch_size=batch_size,
        round_batch=False,
        preprocess_threads=1)
    # The lines below define an extremely simple network structure.
    data = mx.symbol.Variable(name="data")
    conv1 = ConvFactory(data=data, kernel=(3,3), pad=(1,1), num_filter=96, act_type="relu")
    in3a = SimpleFactory(conv1, 32, 32)
    fc = mx.symbol.FullyConnected(data=in3a, num_hidden=10)
    softmax = mx.symbol.SoftmaxOutput(name='softmax', data=fc)
    # For demo purpose, this model only train 1 epoch
    # We will use the first GPU to do training
    num_epoch = 1
    # This fixes the training setup for the whole model, similar to what the solver does in Caffe.
    model = mx.model.FeedForward(ctx=mx.gpu(), symbol=softmax, num_epoch=num_epoch,
                                 learning_rate=0.05, momentum=0.9, wd=0.00001)
    # we can add learning rate scheduler to the model
    # model = mx.model.FeedForward(ctx=mx.gpu(), symbol=softmax, num_epoch=num_epoch,
    #                              learning_rate=0.05, momentum=0.9, wd=0.00001,
    #                              lr_scheduler=mx.misc.FactorScheduler(2))
    # Start training.
    model.fit(X=train_dataiter,
              eval_data=test_dataiter,
              eval_metric="accuracy",
              batch_end_callback=mx.callback.Speedometer(batch_size))
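　　The shapes flowing through this small network can be checked by hand. The sketch below is plain arithmetic, not MXNet code: the helper conv_out is a hypothetical name, and it uses the standard convolution size formula floor((size + 2*pad - kernel) / stride) + 1 with the kernel, pad, and filter values from the demo above.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


height = width = 28  # data_shape=(3, 28, 28) in the demo

# conv1: 3x3 kernel, pad 1, 96 filters, so the spatial size is preserved.
conv1_shape = (96, conv_out(height, 3, pad=1), conv_out(width, 3, pad=1))

# SimpleFactory(conv1, 32, 32): a 1x1 branch and a 3x3 (pad 1) branch.
branch_1x1 = (32, conv_out(conv1_shape[1], 1), conv_out(conv1_shape[2], 1))
branch_3x3 = (32, conv_out(conv1_shape[1], 3, pad=1), conv_out(conv1_shape[2], 3, pad=1))

# Concatenating along channels requires identical spatial sizes.
assert branch_1x1[1:] == branch_3x3[1:]
in3a_shape = (branch_1x1[0] + branch_3x3[0],) + branch_1x1[1:]

# FullyConnected flattens everything but the batch axis.
fc_inputs = in3a_shape[0] * in3a_shape[1] * in3a_shape[2]
print(conv1_shape, in3a_shape, fc_inputs)  # (96, 28, 28) (64, 28, 28) 50176
```

　　The same check shows why both branches of a Concat need matching padding and stride: if their spatial sizes differ, the channel-wise concatenation is not defined.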

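4. DataIter

　　MXNet is designed with a C++ backend for computation and front ends such as Python and R for everyday use, which balances efficiency with convenience. Training on your own dataset end to end requires understanding several pieces. Today we start with data iterators.

　　A data iterator in MXNet is very similar to a Python iterator: each time its next method is called, it returns one data batch. A data batch is the input and label for the network, typically image input in (n, c, h, w) format and a label in (n, h, w) format or as a scalar per sample. Let's go straight to a simple example from the official site.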

import numpy as np
import mxnet as mx  # needed for mx.nd.array below

class SimpleBatch(object):
    # Minimal batch container; assumed here, since the excerpt uses it without defining it.
    def __init__(self, data, label, pad=0):
        self.data = data
        self.label = label
        self.pad = pad

class SimpleIter:
    def __init__(self, data_names, data_shapes, data_gen,
                 label_names, label_shapes, label_gen, num_batches=10):
        # list(...) so the pairs can be iterated more than once under Python 3
        self._provide_data = list(zip(data_names, data_shapes))
        self._provide_label = list(zip(label_names, label_shapes))
        self.num_batches = num_batches
        self.data_gen = data_gen
        self.label_gen = label_gen
        self.cur_batch = 0

    def __iter__(self):
        return self

    def reset(self):
        self.cur_batch = 0

    def __next__(self):
        return self.next()

    @property
    def provide_data(self):
        return self._provide_data

    @property
    def provide_label(self):
        return self._provide_label

    def next(self):
        if self.cur_batch < self.num_batches:
            self.cur_batch += 1
            data = [mx.nd.array(g(d[1])) for d,g in zip(self._provide_data, self.data_gen)]
            assert len(data) > 0, "Empty batch data."
            label = [mx.nd.array(g(d[1])) for d,g in zip(self._provide_label, self.label_gen)]
            assert len(label) > 0, "Empty batch label."
            return SimpleBatch(data, label)
        else:
            raise StopIteration

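　　The code above is about the simplest data iterator possible. It does no preprocessing and does not even read data itself, but it gets the basic idea across: a data iterator has to implement the methods shown. provide_data returns a list of (data_name, (batch_size, channel, height, width)) entries, and provide_label returns a list of (label_name, (batch_size, height, width)) entries. reset() is called after each epoch to rewind the iterator, and it is the usual place to reshuffle the order in which images are read, since random sampling tends to train a bit better; normally this is done by shuffling your lst (the lst from the previous part that is used to read the images). next() is self-explanatory: it returns your data batch, and when there is nothing left to return, remember to raise StopIteration (a try block might be a better fit here). Note that the arrays in the returned data batch are of type mx.nd.NDArray.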
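　　The reset()/next() protocol described above can be tried out without MXNet at all. The following is a framework-free sketch under stated assumptions: ListBatchIter is a hypothetical class, batches are plain Python lists rather than NDArrays, and an incomplete final batch is simply dropped instead of padded.

```python
import random


class ListBatchIter:
    """Toy iterator that follows the reset()/next()/StopIteration protocol."""

    def __init__(self, samples, batch_size, shuffle=True, seed=0):
        self.samples = list(samples)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.cursor = 0
        self.reset()

    def __iter__(self):
        return self

    def reset(self):
        """Rewind to the first batch and reshuffle the sample order."""
        self.cursor = 0
        if self.shuffle:
            self.rng.shuffle(self.samples)

    def __next__(self):
        if self.cursor + self.batch_size > len(self.samples):
            raise StopIteration  # end of epoch; the caller must reset()
        batch = self.samples[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        return batch

    next = __next__  # Python 2 style name, as used in the text above


it = ListBatchIter(range(10), batch_size=4)
epoch1 = list(it)   # consumes batches until StopIteration
it.reset()          # without this, a second pass would yield nothing
epoch2 = list(it)
print(len(epoch1), len(epoch2))
```

　　Forgetting reset() between epochs is the usual cause of a second epoch that appears to finish instantly.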

　　Below is a slightly more complex data iterator that I used recently for segmentation. It adds preprocessing, shuffling, and a few other steps:

# pylint: skip-file
import random
import cv2
import mxnet as mx
import numpy as np
import os
from mxnet.io import DataIter, DataBatch

class FileIter(DataIter):  # custom iterators usually inherit from DataIter
    """FileIter object in fcn-xs example. Taking a file list file to get dataiter.
    in this example, we use the whole image training for fcn-xs, that is to say
    we do not need resize/crop the image to the same size, so the batch_size is
    set to 1 here
    Parameters
    ----------
    root_dir : string
        the root dir of image/label lie in
    flist_name : string
        the list file of image and label, every line owns the form:
        index \t image_data_path \t image_label_path
    cut_off_size : int
        if the maximal size of one image is larger than cut_off_size, then it will
        crop the image with the minimal size of that image
    data_name : string
        the data name used in symbol data(default data name)
    label_name : string
        the label name used in symbol softmax_label(default label name)
    """
    def __init__(self, root_dir, flist_name, rgb_mean=(117, 117, 117),
                 data_name="data", label_name="softmax_label", p=None):
        super(FileIter, self).__init__()
        self.fac = p.fac  # p is a config object that you define yourself
        self.root_dir = root_dir
        self.flist_name = os.path.join(self.root_dir, flist_name)
        self.mean = np.array(rgb_mean)  # (R, G, B)
        self.data_name = data_name
        self.label_name = label_name
        self.batch_size = p.batch_size
        self.random_crop = p.random_crop
        self.random_flip = p.random_flip
        self.random_color = p.random_color
        self.random_scale = p.random_scale
        self.output_size = p.output_size
        self.color_aug_range = p.color_aug_range
        self.use_rnn = p.use_rnn
        self.num_hidden = p.num_hidden
        if self.use_rnn:
            self.init_h_name = 'init_h'
            self.init_h = mx.nd.zeros((self.batch_size, self.num_hidden))
        self.cursor = -1
        # "//" keeps the shapes integral under Python 3 (the original used "/")
        self.data = mx.nd.zeros((self.batch_size, 3, self.output_size[0], self.output_size[1]))
        self.label = mx.nd.zeros((self.batch_size, self.output_size[0] // self.fac, self.output_size[1] // self.fac))
        self.data_list = []
        self.label_list = []
        self.order = []
        self.dict = {}
        # open() replaces the Python 2 only file() builtin
        lines = open(self.flist_name).read().splitlines()
        cnt = 0
        for line in lines:  # read the lst to prepare for loading images later
            _, data_img_name, label_img_name = line.strip('\n').split("\t")
            self.data_list.append(data_img_name)
            self.label_list.append(label_img_name)
            self.order.append(cnt)
            cnt += 1
        self.num_data = cnt
        self._shuffle()

    def _shuffle(self):
        random.shuffle(self.order)

    def _read_img(self, img_name, label_name):
        # When running on the server the dataset was small and colleagues often
        # saturated the disk IO, so I cache all the data in memory.
        if os.path.join(self.root_dir, img_name) in self.dict:
            img = self.dict[os.path.join(self.root_dir, img_name)]
        else:
            img = cv2.imread(os.path.join(self.root_dir, img_name))
            self.dict[os.path.join(self.root_dir, img_name)] = img
        if os.path.join(self.root_dir, label_name) in self.dict:
            label = self.dict[os.path.join(self.root_dir, label_name)]
        else:
            label = cv2.imread(os.path.join(self.root_dir, label_name), 0)
            self.dict[os.path.join(self.root_dir, label_name)] = label
        # Below is the series of preprocessing steps applied after reading the image.
        if self.random_flip:
            flip = random.randint(0, 1)
            if flip == 1:
                img = cv2.flip(img, 1)
                label = cv2.flip(label, 1)
        # scale jittering
        scale = random.uniform(self.random_scale[0], self.random_scale[1])
        new_width = int(img.shape[1] * scale)  # 680
        new_height = int(img.shape[0] * scale)  # new_width * img.size[1] / img.size[0]
        img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_NEAREST)
        label = cv2.resize(label, (new_width, new_height), interpolation=cv2.INTER_NEAREST)
        #img = cv2.resize(img, (900,450), interpolation=cv2.INTER_NEAREST)
        #label = cv2.resize(label, (900, 450), interpolation=cv2.INTER_NEAREST)
        if self.random_crop:
            start_w = np.random.randint(0, img.shape[1] - self.output_size[1] + 1)
            start_h = np.random.randint(0, img.shape[0] - self.output_size[0] + 1)
            img = img[start_h : start_h + self.output_size[0], start_w : start_w + self.output_size[1], :]
            label = label[start_h : start_h + self.output_size[0], start_w : start_w + self.output_size[1]]
        if self.random_color:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
            hue = random.uniform(-self.color_aug_range[0], self.color_aug_range[0])
            sat = random.uniform(-self.color_aug_range[1], self.color_aug_range[1])
            val = random.uniform(-self.color_aug_range[2], self.color_aug_range[2])
            img = np.array(img, dtype=np.float32)
            img[..., 0] += hue
            img[..., 1] += sat
            img[..., 2] += val
            img[..., 0] = np.clip(img[..., 0], 0, 255)
            img[..., 1] = np.clip(img[..., 1], 0, 255)
            img[..., 2] = np.clip(img[..., 2], 0, 255)
            img = cv2.cvtColor(img.astype('uint8'), cv2.COLOR_HSV2BGR)
        is_rgb = True
        #cv2.imshow('main', img)
        #cv2.waitKey()
        #cv2.imshow('maain', label)
        #cv2.waitKey()
        img = np.array(img, dtype=np.float32)  # (h, w, c)
        reshaped_mean = self.mean.reshape(1, 1, 3)
        img = img - reshaped_mean
        img[:, :, :] = img[:, :, [2, 1, 0]]
        img = img.transpose(2, 0, 1)
        # img = np.expand_dims(img, axis=0)  # (1, c, h, w)
        label_zoomed = cv2.resize(label, None, fx = 1.0 / self.fac, fy = 1.0 / self.fac)
        label_zoomed = label_zoomed.astype('uint8')
        return (img, label_zoomed)

    @property
    def provide_data(self):
        """The name and shape of data provided by this iterator"""
        if self.use_rnn:
            return [(self.data_name, (self.batch_size, 3, self.output_size[0], self.output_size[1])),
                    (self.init_h_name, (self.batch_size, self.num_hidden))]
        else:
            return [(self.data_name, (self.batch_size, 3, self.output_size[0], self.output_size[1]))]

    @property
    def provide_label(self):
        """The name and shape of label provided by this iterator"""
        return [(self.label_name, (self.batch_size, self.output_size[0] // self.fac, self.output_size[1] // self.fac))]

    def get_batch_size(self):
        return self.batch_size

    def reset(self):
        self.cursor = -self.batch_size
        self._shuffle()

    def iter_next(self):
        self.cursor += self.batch_size
        return self.cursor < self.num_data

    def _getpad(self):
        if self.cursor + self.batch_size > self.num_data:
            return self.cursor + self.batch_size - self.num_data
        else:
            return 0

    def _getdata(self):
        """Load data from underlying arrays, internal use only"""
        assert(self.cursor < self.num_data), "DataIter needs reset."
        data = np.zeros((self.batch_size, 3, self.output_size[0], self.output_size[1]))
        label = np.zeros((self.batch_size, self.output_size[0] // self.fac, self.output_size[1] // self.fac))
        if self.cursor + self.batch_size <= self.num_data:
            for i in range(self.batch_size):
                idx = self.order[self.cursor + i]
                data_, label_ = self._read_img(self.data_list[idx], self.label_list[idx])
                data[i] = data_
                label[i] = label_
        else:
            for i in range(self.num_data - self.cursor):
                idx = self.order[self.cursor + i]
                data_, label_ = self._read_img(self.data_list[idx], self.label_list[idx])
                data[i] = data_
                label[i] = label_
            pad = self.batch_size - self.num_data + self.cursor
            #for i in pad:
            for i in range(pad):
                idx = self.order[i]
                data_, label_ = self._read_img(self.data_list[idx], self.label_list[idx])
                data[i + self.num_data - self.cursor] = data_
                label[i + self.num_data - self.cursor] = label_
        return mx.nd.array(data), mx.nd.array(label)

    def next(self):
        """return one dict which contains "data" and "label" """
        if self.iter_next():
            data, label = self._getdata()
            data = [data, self.init_h] if self.use_rnn else [data]
            label = [label]
            return DataBatch(data=data, label=label, pad=self._getpad(), index=None,
                             provide_data=self.provide_data, provide_label=self.provide_label)
        else:
            raise StopIteration
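　　The least obvious part of this iterator is how the last batch is filled when the dataset size is not a multiple of batch_size. The sketch below isolates that index logic in plain Python, mirroring what _getdata and _getpad compute; batch_indices is a hypothetical helper name, not something from the original code.

```python
def batch_indices(order, cursor, batch_size):
    """Return (indices, pad) for the batch that starts at cursor.

    When the batch runs past the end of the data, the missing slots are
    filled from the start of the order, and pad counts those extra samples.
    """
    num_data = len(order)
    if cursor + batch_size <= num_data:
        return order[cursor:cursor + batch_size], 0
    pad = cursor + batch_size - num_data
    return order[cursor:] + order[:pad], pad


order = list(range(10))      # stands in for the shuffled self.order
for cursor in range(0, len(order), 4):
    print(cursor, batch_indices(order, cursor, 4))
# 0 ([0, 1, 2, 3], 0)
# 4 ([4, 5, 6, 7], 0)
# 8 ([8, 9, 0, 1], 2)
```

　　The pad value travels with the batch so that the padded samples can be discounted later, for example when evaluating.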

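　　At this point we can start ordinary training. But once you have lots of new ideas, you run into new problems. For example: what about multiple inputs or outputs?

　　That is actually simple too. Only a few places need to change:

　　　　1. provide_label and provide_data: note that what we returned earlier was always a list, so just add more entries in the same format.

　　　　2. next(): if you need to pass two inputs, data and depth, just pass input = sum([[data],[depth]], []) as the data of the data batch. The same goes for label.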
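　　As a small illustration of the two changes, here is a sketch in plain Python. The names "data" and "depth", the shapes, and the placeholder strings standing in for arrays are all assumptions made for this example; the point is only that the list passed as data must line up, entry by entry, with provide_data.

```python
# Placeholders standing in for the two input arrays of one batch.
data = "data_array"
depth = "depth_array"

# Change 1: provide_data lists one (name, shape) entry per input.
provide_data = [
    ("data", (1, 3, 224, 224)),    # hypothetical RGB input
    ("depth", (1, 1, 224, 224)),   # hypothetical depth input
]

# Change 2: flatten the per-input lists into the single list that the
# data batch expects. The second argument [] is the start value of sum().
inputs = sum([[data], [depth]], [])

assert len(inputs) == len(provide_data)
print(inputs)  # ['data_array', 'depth_array']
```

　　Writing [data, depth] directly gives the same result; the sum form is convenient when some inputs are optional and contribute an empty list.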

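　　It is worth mentioning that implementing a multi-loss in MXNet takes a little care when writing the network symbol. Suppose you have softmax_loss and regression_loss. Then all you need is to end with return mx.symbol.Group([softmax_loss, regression_loss]).

　　Once we have defined the symbol in MXNet, written the data iterator, and prepared the data, we can happily go train. There are two common strategies for training a network: model-based and module-based. Next we look at how to use each.

5. Model

　　As usual, let's look at code taken straight from the official documentation: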

# configure a two layer neuralnetwork
# (num_epoch and data_set are assumed to be defined elsewhere)
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type='relu')
fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
softmax = mx.symbol.SoftmaxOutput(fc2, name='sm')
# create a model using sklearn-style two-step way
# create the model
model = mx.model.FeedForward(
    softmax,
    num_epoch=num_epoch,
    learning_rate=0.01)
# start training
model.fit(X=data_set)

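　　For API details, refer to the documentation.

　　And with that, the Model part is done. It goes this quickly for two main reasons:

　　　　1. There really is not much to it; looking things up in the documentation is usually enough.

　　　　2. Model is not very customizable, so we rarely use it. Module is what is commonly used.

6. Module

　　Module is a really nice thing, although once you dig into it you may think "wow, impressive, but it doesn't seem all that useful". I actually felt that way myself. Looking back, from the standpoint of code design and usage, Module really is very good: it provides intermediate-level and high-level interfaces for network computation, which leaves a lot of customization for us to do ourselves.

　　A Module has four states:

　　　　1. Initial state: GPU memory has not been allocated yet and basically nothing has been done.

　　　　2. Bound: after the shapes of data and label have been passed to bind and it has run, GPU memory is allocated and the module is ready for computation.

　　　　3. Parameters initialized: the parameters have been given their initial values.

　　　　4. Optimizer installed: an optimizer such as SGD or Adam has been passed in so that training can proceed.

　　A simple piece of code first: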

import mxnet as mx

# construct a simple MLP
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
fc2 = mx.symbol.FullyConnected(act1, name = 'fc2', num_hidden = 64)
act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
fc3 = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
out = mx.symbol.SoftmaxOutput(fc3, name = 'softmax')

# construct the module
# (train_dataiter, eval_dataiter and n_epoch are assumed to be defined elsewhere)
mod = mx.mod.Module(out)
mod.bind(data_shapes=train_dataiter.provide_data,
         label_shapes=train_dataiter.provide_label)
mod.init_params()
mod.fit(train_dataiter, eval_data=eval_dataiter,
        optimizer_params={'learning_rate':0.01, 'momentum': 0.9},
        num_epoch=n_epoch)

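　　Let's walk through it. First a simple MLP is defined, and its symbol is called out. We can then create a mod directly with mx.mod.Module. Next, mod.bind allocates the required memory on the GPU, which is why we have to pass it data_shapes and label_shapes. After that the network parameters are initialized, and finally mod.fit starts the training. One more note: we have now seen fit twice. It is really an all-in-one convenience function, and the core code inside mod.fit() looks like this: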

for epoch in range(begin_epoch, num_epoch):
    tic = time.time()
    eval_metric.reset()
    for nbatch, data_batch in enumerate(train_data):
        if monitor is not None:
            monitor.tic()
        self.forward_backward(data_batch)  # one forward pass and one backward pass
        self.update()  # update the parameters
        self.update_metric(eval_metric, data_batch.label)  # update the metric
        if monitor is not None:
            monitor.toc_print()
        if batch_end_callback is not None:
            batch_end_params = BatchEndParam(epoch=epoch, nbatch=nbatch,
                                             eval_metric=eval_metric,
                                             locals=locals())
            for callback in _as_list(batch_end_callback):
                callback(batch_end_params)
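　　To make the structure of that loop concrete, here is a framework-free toy that follows the same forward_backward / update / callback sequence. Everything in it is an assumption for illustration: TinyModule is a hypothetical class that fits y = w * x with plain SGD on a mean squared error, and the callback signature is simplified compared with BatchEndParam.

```python
class TinyModule:
    """Toy 'module' that fits y = w * x by gradient descent."""

    def __init__(self, lr):
        self.w = 0.0
        self.lr = lr
        self.grad = 0.0
        self.loss = 0.0

    def forward_backward(self, batch):
        xs, ys = batch
        n = len(xs)
        preds = [self.w * x for x in xs]                      # forward
        self.loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        self.grad = sum(2 * (p - y) * x                       # backward
                        for p, y, x in zip(preds, ys, xs)) / n

    def update(self):
        self.w -= self.lr * self.grad                         # SGD step


def fit(module, train_data, num_epoch, batch_end_callback=None):
    """Same skeleton as the loop above: per batch, compute, update, report."""
    for epoch in range(num_epoch):
        for nbatch, batch in enumerate(train_data):
            module.forward_backward(batch)
            module.update()
            if batch_end_callback is not None:
                batch_end_callback(epoch, nbatch, module.loss)


# Two batches of (inputs, targets) drawn from y = 3 * x.
train_data = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]
history = []
mod = TinyModule(lr=0.05)
fit(mod, train_data, num_epoch=50,
    batch_end_callback=lambda epoch, nbatch, loss: history.append(loss))
print(round(mod.w, 4))  # 3.0
```

　　Because each step is a separate call, anything can be slotted in between them, such as gradient clipping before update or extra logging in the callback. That is the kind of customization the intermediate-level interface allows.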

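　　Precisely because Module exposes many intermediate-level interfaces, a lot of improvements become possible. The simplest example: what if the input size of the network varies during training? We can implement a mutable module. Basically, every time the shape of the data changes, we re-bind the symbol, and training carries on as usual.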
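　　The re-bind idea can be sketched without MXNet. In the toy below, RebindingModule and bind_fn are hypothetical names: bind_fn stands in for whatever allocates resources for a given input shape (the role mod.bind plays), and the wrapper calls it again only when the incoming shape differs from the one currently bound.

```python
class RebindingModule:
    """Toy wrapper that re-binds whenever the input shape changes."""

    def __init__(self, bind_fn):
        self.bind_fn = bind_fn      # hypothetical: allocates for a given shape
        self.bound_shape = None
        self.executor = None
        self.num_binds = 0

    def forward(self, batch_shape):
        if batch_shape != self.bound_shape:
            # Shape changed (or first call): bind again for the new shape.
            self.executor = self.bind_fn(batch_shape)
            self.bound_shape = batch_shape
            self.num_binds += 1
        return self.executor


mod = RebindingModule(bind_fn=lambda shape: "executor for %s" % (shape,))
for shape in [(1, 3, 224, 224), (1, 3, 224, 224), (1, 3, 320, 480)]:
    mod.forward(shape)
print(mod.num_binds)  # 2
```

　　Binding is the expensive step, so skipping it when the shape is unchanged matters; a cache keyed by shape would go one step further if the same few sizes keep recurring.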

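　　Summary: the key to learning a framework is really to use it, and if there is any trick, it is to read more of the source code and documentation. I write these posts first to keep a record of things and second to save those who come after a few detours, so some topics are not covered in full.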