Home
CNN uses single (float) precision by default. To build with double precision, uncomment the following macro in cnn/macros.h
// #define USE_DOUBLE
After this, recompile your code.
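To illustrate how a compile-time macro like this can select the numeric type, here is a minimal sketch. The alias name `real_t` and the helper `parameter_bytes` are assumptions made for this example; they are not taken from cnn/macros.h, which may organize the switch differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the switch in cnn/macros.h.
// Uncomment the next line (or define it from the build system) for double.
// #define USE_DOUBLE

#ifdef USE_DOUBLE
typedef double real_t;   // assumed alias name, for illustration only
#else
typedef float real_t;
#endif

// Bytes needed to store n parameters at the selected precision.
inline std::size_t parameter_bytes(std::size_t n) {
    return n * sizeof(real_t);
}
```

Because the type is fixed at compile time, every translation unit has to be rebuilt after the macro changes, which is why the recompile step is required.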
A second way, which requires no code change, is to pass the option to cmake:
cmake .. -DUSE_DOUBLE=USE_DOUBLE
The above cmake command generates C++ solution files with the macro USE_DOUBLE enabled.
Make sure to pass -DINPUT_ENCODING=INPUT_UTF8 when calling cmake. For example,
cmake .. -DEIGEN3_INCLUDE_DIR=c:/tools/eigen-eigen-fd9611fa2d9c -G "Visual Studio 12 Win64" -DINPUT_ENCODING=INPUT_UTF8
Note that after a recent change to CMakeLists.txt, the default build type is release with debug information. Before this change, the Linux build defaulted to debug mode.
To build for release, specify the following:
-DCMAKE_BUILD_TYPE=Release
Some models require so much memory that they can exhaust GPU memory. One solution is to host the lookup table parameters and their gradients on the CPU. Since gradients for the lookup tables are computed sparsely, the gradients used for backpropagation are allocated on the GPU only as required.
To build with this option, pass -DLOOKUP_AT_CPU=USE_CPU_FOR_LOOKUP_PARAM to cmake.
For example
cmake .. -DLOOKUP_AT_CPU=USE_CPU_FOR_LOOKUP_PARAM
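The sparsity argument above can be sketched in plain C++. This is an illustration of the idea only, not the library's implementation: the struct `SparseLookupTable` and its members are hypothetical, everything lives in host memory, and a plain SGD step stands in for whatever trainer is actually used.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Illustrative only: a lookup table whose gradient rows are allocated
// lazily, one per index actually used in the current step.
struct SparseLookupTable {
    std::size_t dim;
    std::vector<std::vector<float>> values;                      // dense parameters
    std::unordered_map<std::size_t, std::vector<float>> grads;   // touched rows only

    SparseLookupTable(std::size_t vocab, std::size_t d)
        : dim(d), values(vocab, std::vector<float>(d, 0.0f)) {}

    // Accumulate a gradient for one row, allocating the row on first use.
    void accumulate(std::size_t index, const std::vector<float>& g) {
        assert(index < values.size() && g.size() == dim);
        auto it = grads.find(index);
        if (it == grads.end())
            it = grads.emplace(index, std::vector<float>(dim, 0.0f)).first;
        for (std::size_t k = 0; k < dim; ++k) it->second[k] += g[k];
    }

    // Plain SGD step over the touched rows, then release the gradients.
    void update(float learning_rate) {
        for (const auto& kv : grads)
            for (std::size_t k = 0; k < dim; ++k)
                values[kv.first][k] -= learning_rate * kv.second[k];
        grads.clear();
    }
};
```

With a vocabulary of 1000 rows and a batch that touches a single word, only one gradient row is ever allocated, which is the memory saving the option is after.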
Sometimes the code has to run on a GPU with little memory. The SMALL_GPU macro is intended for this situation. To enable it, pass the following when calling cmake.
cmake .. -DSMALL_GPU=SMALL_GPU
The following explains the data-parallel training setup in the context of an encoder/decoder.
First, tell the encoder and decoder that they will work on multiple sentences instead of one, which is the default.
Encoder.new_graph(cg);
Encoder.set_data_in_parallel(C);
Decoder.new_graph(cg);
Decoder.set_data_in_parallel(C);
C is the number of sentences to be processed in parallel.
The input data to the encoder is organized in {T,C,H,W} layout, following cuDNN terminology. The right-most dimension is the fastest-varying one. By default, CNN uses W = 1. C is the number of sentences and T is the length of the sentences. In the case of C=3, the data looks like [x0 y0 z0 x1 y1 z1 x2 y2 z2 …], where x, y, z denote sentences and 0, 1, 2 denote their time/frame instances.
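The layout can be made concrete with a small index calculation. This is a sketch derived from the description above; the function names `flat_offset` and `storage_order` are made up for the example and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Flat offset of element h of sentence c at time step t in a {T, C, H, W}
// buffer with W = 1, where the right-most dimension varies fastest.
inline std::size_t flat_offset(std::size_t t, std::size_t c, std::size_t h,
                               std::size_t C, std::size_t H) {
    assert(c < C && h < H);
    return (t * C + c) * H + h;
}

// Spell out the storage order for T steps of up to three sentences named
// x, y, z, with one label per H-sized block.
inline std::string storage_order(std::size_t T, std::size_t C) {
    assert(C <= 3);  // only the labels x, y, z are available
    std::string out;
    for (std::size_t t = 0; t < T; ++t)
        for (std::size_t c = 0; c < C; ++c) {
            if (!out.empty()) out += ' ';
            out += static_cast<char>('x' + c);
            out += std::to_string(t);
        }
    return out;
}
```

Calling `storage_order(3, 3)` reproduces the ordering given in the text: all sentences at time 0, then all sentences at time 1, and so on.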
If the encoder and decoder are RNNs, we unroll the data into a vector of T Expressions, one per time step.
vector<Expression> source(T);
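Before the per-step Expressions can be built, the batch has to be regrouped by time step. A minimal sketch of that regrouping follows, using plain token ids in place of Expression. The helper `unroll_by_time` and the `pad_id` argument are hypothetical, and padding shorter sentences is an assumption that mirrors the dummy value used in the decoder loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: regroup C sentences of token ids into T time steps,
// each holding one id per sentence. Positions past the end of a shorter
// sentence are filled with pad_id.
inline std::vector<std::vector<int>> unroll_by_time(
        const std::vector<std::vector<int>>& sentences, int pad_id) {
    assert(!sentences.empty());
    std::size_t T = 0;  // length of the longest sentence
    for (const auto& s : sentences)
        if (s.size() > T) T = s.size();

    std::vector<std::vector<int>> steps(
        T, std::vector<int>(sentences.size(), pad_id));
    for (std::size_t c = 0; c < sentences.size(); ++c)
        for (std::size_t t = 0; t < sentences[c].size(); ++t)
            steps[t][c] = sentences[c][t];
    return steps;
}
```

Each `steps[t]` then holds the C inputs that would be turned into the single Expression `source[t]`.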
Then feed the data to the encoder
for (int t = 0; t < T; ++t) {
src_fwd[t] = Encoder.add_input(source[t]);
}
Then extract the last state of the encoder.
vector<Expression> encoder_representation = Encoder.final_s();
Each Expression in encoder_representation has dimension H x C.
Then initialize the decoder
Decoder.start_new_sequence(encoder_representation);
Then, use decoder as follows.
// oslen is the maximum length of the decoder generations
for (int t = 0; t < oslen; ++t) {
    vector<Expression> vobs;
    for (auto& p : osent)
    {
        if (t < p.size())
            vobs.push_back(lookup(cg, p_cs, p[t])); /// add true value
        else
            vobs.push_back(input(cg, {D}, &zero)); /// add a dummy value
    }
    Expression obs = concatenate_cols(vobs);
    /// obs is the data, one column per sentence
    Expression i_y_t = Decoder.add_input(obs);
    Expression i_r_t = i_bias_mb + i_R * i_y_t; /// transform to the space for softmax
    Expression i_ydist = log_softmax(i_r_t); /// do log-softmax
    /// flatten to one vector to extract the training signal; nutt must equal C here
    Expression r_r_t = reshape(i_ydist, {vocab_size * nutt});
    for (size_t i = 0; i < C; i++)
    {
        int offset = i * vocab_size;
        /// t + 1 < size() avoids unsigned underflow when a sentence is empty
        if (t + 1 < osent[i].size())
        {
            /// only compute errors where an output label exists
            this_errs[i].push_back( - pick(r_r_t, offset + osent[i][t + 1]));
        }
    }
}
Note that the length of each output sequence determines the positions at which the cost function is computed.
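The offset indexing and the length check can be isolated in plain C++. The sketch below assumes the log-probabilities for one time step are already available as a flat array of C blocks of vocab_size values each, matching the reshape in the loop above. The function name `step_loss` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// flat_log_probs holds, for one time step t, C blocks of vocab_size
// log-probabilities stored back to back (one block per sentence).
// Returns the summed negative log-likelihood over the sentences that
// still have a label at position t + 1; finished sentences add nothing.
inline double step_loss(const std::vector<double>& flat_log_probs,
                        std::size_t vocab_size,
                        const std::vector<std::vector<int>>& osent,
                        std::size_t t) {
    assert(flat_log_probs.size() == vocab_size * osent.size());
    double loss = 0.0;
    for (std::size_t i = 0; i < osent.size(); ++i) {
        if (t + 1 < osent[i].size()) {   // also safe for empty sentences
            const std::size_t offset = i * vocab_size;
            const std::size_t target = static_cast<std::size_t>(osent[i][t + 1]);
            assert(target < vocab_size);
            loss -= flat_log_probs[offset + target];
        }
    }
    return loss;
}
```

Writing the check as `t + 1 < size()` rather than `t < size() - 1` matters because `size()` is unsigned: for an empty sentence, `size() - 1` wraps around to a very large value and the check would pass.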
For profiling, first configure Visual Studio as follows, then build in release mode. The pdb file is placed in the same directory as the exe. To profile, use Analyze->Profile->Attach.
Linker->Debugging->Generate Debug Info = Yes (/DEBUG)
Linker->Optimization->Reference = Yes (/OPT:REF)
Select /OPT:ICF
Compiler : /Zi /Zl
Linker : /machine:x64 /debugtype:cv
v0.5: the NAACL 2016 submission uses this version.