kaishengyao edited this page Jun 8, 2016 · 21 revisions

Welcome to the CNN wiki!

Builds

How to build double precision

By default, CNN uses single (float) precision. To build with double precision, uncomment the following macro in cnn/macros.h:

//    #define USE_DOUBLE

After this, recompile your code.

A second way, which requires no code change, is to use the following cmake setup:

cmake .. -DUSE_DOUBLE=USE_DOUBLE 

The above cmake command generates C++ solution files with the macro USE_DOUBLE enabled.
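As a rough illustration of what a precision macro like this typically controls, the sketch below switches a scalar typedef between float and double at compile time. The names `real_t` and `bytes_per_value` are hypothetical and are not taken from cnn/macros.h; the actual header may be organized differently.

```cpp
#include <cstddef>

// Hypothetical sketch: USE_DOUBLE selects the scalar type at compile time.
// Define it in a header or pass -DUSE_DOUBLE to the compiler.
#ifdef USE_DOUBLE
typedef double real_t;
#else
typedef float real_t;
#endif

// Size in bytes of one stored value, handy for checking which build is active.
inline std::size_t bytes_per_value() { return sizeof(real_t); }
```

Because the choice is made at compile time, every object file has to be rebuilt after toggling the macro.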

How to build code to support UTF8 and Unicode text input

Pass -DINPUT_ENCODING=INPUT_UTF8 to cmake. For example,

cmake .. -DEIGEN3_INCLUDE_DIR=c:/tools/eigen-eigen-fd9611fa2d9c -G "Visual Studio 12 Win64" -DINPUT_ENCODING=INPUT_UTF8

How to build code for release

Note that after a recent change to CMakeLists.txt, the default build is release with debug information. Before this change, the default build on Linux was debug mode.

To generate a release build, specify the following:

 -DCMAKE_BUILD_TYPE=Release

How to build code that uses CPU to host lookup table parameters and gradients

Some models require a large amount of memory and can exhaust GPU memory. One solution is to host the lookup table parameters and their gradients on the CPU. Since lookup table gradients are computed sparsely, the gradients needed for backpropagation are allocated on the GPU only as required.

To build such code, pass -DLOOKUP_AT_CPU=USE_CPU_FOR_LOOKUP_PARAM to cmake.

For example

 cmake .. -DLOOKUP_AT_CPU=USE_CPU_FOR_LOOKUP_PARAM
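To make the idea of sparsely computed lookup gradients concrete, here is a minimal sketch that allocates gradient storage only for rows that were actually looked up. `SparseLookupGrad` and its members are hypothetical names chosen for illustration. The sketch does not model CPU/GPU transfers and is not CNN's implementation.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a sparse gradient store for a lookup table.
// Only rows that were actually looked up get gradient storage.
struct SparseLookupGrad {
    std::size_t dim;                                        // embedding width
    std::unordered_map<unsigned, std::vector<float>> rows;  // row id -> gradient

    explicit SparseLookupGrad(std::size_t d) : dim(d) {}

    // Accumulate a gradient for one looked-up row, allocating it on first use.
    void accumulate(unsigned row, const std::vector<float>& g) {
        std::vector<float>& dst = rows[row];
        if (dst.empty()) dst.assign(dim, 0.0f);
        for (std::size_t k = 0; k < dim && k < g.size(); ++k) dst[k] += g[k];
    }

    // Number of rows that currently hold gradient storage.
    std::size_t allocated_rows() const { return rows.size(); }
};
```

With a large vocabulary and short minibatches, the number of allocated rows stays far below the table size, which is why on-demand allocation saves memory.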

How to build code on a GPU with small memory

You may have to work on a GPU with small memory. The macro for this situation is SMALL_GPU. To enable it, call cmake as follows:

 cmake .. -DSMALL_GPU=SMALL_GPU

Coding

How to build data-parallel training

The following explains the data-parallel training setup in the context of an encoder/decoder.

First, tell the encoder and decoder that they will work on multiple sentences instead of one, which is the default.

     Encoder.new_graph(cg);
     Encoder.set_data_in_parallel(C);
     Decoder.new_graph(cg);
     Decoder.set_data_in_parallel(C);

C is the number of sentences to be processed in parallel.

The input data to the encoder is organized in {T,C,H,W} layout, following cuDNN terminology. The rightmost dimension varies fastest. By default, CNN uses W = 1. C is the number of sentences and T is the length of the sentences. With C=3, the data looks like [x0 y0 z0 x1 y1 z1 x2 y2 z2 …], where x, y, z denote sentences and 0, 1, 2 denote their time/frame instances.
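The layout can be sketched in plain C++ as follows. `flat_index` computes the offset of an element when the rightmost dimension varies fastest, and `pack_time_major` interleaves sentences of token ids (H = W = 1) in the [x0 y0 z0 x1 y1 z1 …] order described above. Both function names and the `pad` value used for shorter sentences are assumptions made for this illustration, not part of CNN.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Flat offset of element (t, c, h, w) in a {T, C, H, W} buffer where the
// rightmost dimension varies fastest.
inline std::size_t flat_index(std::size_t t, std::size_t c, std::size_t h, std::size_t w,
                              std::size_t C, std::size_t H, std::size_t W) {
    return ((t * C + c) * H + h) * W + w;
}

// Interleave C sentences of token ids (H = W = 1) into time-major order.
// Sentences shorter than the longest one are filled with `pad`.
inline std::vector<int> pack_time_major(const std::vector<std::vector<int>>& sents, int pad) {
    const std::size_t C = sents.size();
    std::size_t T = 0;
    for (const auto& s : sents) T = std::max(T, s.size());
    std::vector<int> flat(T * C, pad);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t t = 0; t < sents[c].size(); ++t)
            flat[flat_index(t, c, 0, 0, C, 1, 1)] = sents[c][t];
    return flat;
}
```

For example, packing the sentences {1, 2, 3}, {4, 5} and {6, 7, 8} with pad = -1 gives 1 4 6 2 5 7 3 -1 8: all sentences at time 0 come first, then all sentences at time 1, and so on.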

If the encoder/decoder are RNNs, we unroll the data into a vector of T Expressions.

     vector<Expression> source(T);
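A minimal sketch of the unrolling step, using plain vectors in place of Expression objects: a time-major buffer holding T steps of C * H values is split into T slices, one per time step. The helper name `unroll_steps` and the choice to return no slices on a size mismatch are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Split a time-major {T, C, H} buffer into T slices of C * H values each.
// Each slice stands in for the Expression fed to the encoder at one step.
inline std::vector<std::vector<float>> unroll_steps(const std::vector<float>& flat,
                                                    std::size_t T, std::size_t C,
                                                    std::size_t H) {
    const std::size_t step = C * H;
    std::vector<std::vector<float>> steps;
    if (flat.size() != T * step) return steps;  // size mismatch: return no slices
    steps.reserve(T);
    for (std::size_t t = 0; t < T; ++t) {
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(t * step);
        const std::ptrdiff_t last = static_cast<std::ptrdiff_t>((t + 1) * step);
        steps.emplace_back(flat.begin() + first, flat.begin() + last);
    }
    return steps;
}
```

In the actual code, each slice would be wrapped in an Expression before being stored in source[t].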

Then feed the data to the encoder:

 vector<Expression> src_fwd(T);  // encoder output at each time step
 for (int t = 0; t < T; ++t) {
     src_fwd[t] = Encoder.add_input(source[t]);
 }

Then extract the last state of the encoder.

vector<Expression> encoder_representation = Encoder.final_s();

Each Expression in encoder_representation has dimension HxC.

Then initialize the decoder:

 Decoder.start_new_sequence(encoder_representation);

Then use the decoder as follows.

     // oslen is the maximum length of the decoder generations
     for (int t = 0; t < oslen; ++t) {
         vector<Expression> vobs;
         for (const auto& p : osent)
         {
             if (t < (int)p.size())
                 vobs.push_back(lookup(cg, p_cs, p[t])); /// add true value
             else
                 vobs.push_back(input(cg, {D}, &zero));   /// add a dummy value
         }
         Expression obs = concatenate_cols(vobs);

         /// obs is the data
         Expression i_y_t = Decoder.add_input(obs);
         Expression i_r_t = i_bias_mb + i_R * i_y_t;  /// transform to the space for softmax
         Expression i_ydist = log_softmax(i_r_t);     /// do log-softmax
         Expression r_r_t = reshape(i_ydist, {vocab_size * C});  /// flatten: scores of sentence i start at i * vocab_size

         for (size_t i = 0; i < C; i++)
         {
             int offset = i * vocab_size;
             if (t + 1 < (int)osent[i].size())
             {
                 /// only compute errors at positions that have an output label
                 this_errs[i].push_back( - pick(r_r_t, offset + osent[i][t + 1]));
             }
         }
     }

Note that the length of each output sequence determines the positions at which the cost function is computed.
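The masking rule can be checked in isolation with a small sketch that works on plain arrays instead of Expressions. It assumes the log-probabilities for one time step have already been flattened so that the scores of sentence i occupy entries [i * vocab_size, (i + 1) * vocab_size). The function name `step_loss` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Summed negative log-likelihood at time step t over all sentences.
// A sentence contributes only if it has a label at position t + 1.
inline double step_loss(const std::vector<double>& flat_log_probs,
                        const std::vector<std::vector<int>>& osent,
                        std::size_t vocab_size, std::size_t t) {
    double loss = 0.0;
    for (std::size_t i = 0; i < osent.size(); ++i) {
        if (t + 1 < osent[i].size()) {
            const std::size_t offset = i * vocab_size;
            const std::size_t label = static_cast<std::size_t>(osent[i][t + 1]);
            loss -= flat_log_probs[offset + label];
        }
    }
    return loss;
}
```

With two sentences of lengths 3 and 2, both contribute at t = 0, only the first contributes at t = 1, and neither contributes at t = 2, so shorter sentences stop adding to the cost once their labels run out.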

Profiling

How to generate pdb file for release build

First, configure the Visual Studio project with the settings listed below, then build in release mode. The pdb file is written to the same directory as the exe. To profile, use Analyze->Profile->Attach.

Linker->Debugging->Generate Debug Info = Yes (/DEBUG)
Linker->Optimization->Reference = Yes (/OPT:REF)
Select /OPT:ICF
Compiler : /Zi /Zl 
Linker : /machine:x64 /debugtype:cv 

Versions

v0.5 : the NAACL 2016 submission uses this version.