This tutorial is based on the TensorFlow for Mobile Poets tutorial.
The optimization consists of three steps:

  1. Basic optimization for inference
  2. Quantization (reduces the compressed size of the graph)
  3. Memory mapping (improves stability)

This tutorial uses retrained_graph.pb from the TensorFlow image retraining example.
Let's start with the easy way.

Easy way

Download the TensorFlow sources

git clone -b 0.12.1 --depth=1 https://github.com/tensorflow/tensorflow.git

Optimize graph

python ./tensorflow/tensorflow/python/tools/optimize_for_inference.py \
--input=retrained_graph.pb \
--output=optimized_graph.pb \
--frozen_graph=True \
--input_names=Mul \
--output_names=final_result
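Under the hood, optimize_for_inference keeps only the subgraph actually needed to get from the input to the output, dropping training-only ops. A minimal sketch of that pruning idea in plain Python (the toy graph and node names below are illustrative, not a real GraphDef):

```python
# Toy graph: node name -> list of input node names.
graph = {
    "Mul": [],
    "conv": ["Mul"],
    "final_result": ["conv"],
    "train_step": ["conv"],           # training-only op
    "checkpoint_save": ["train_step"],  # training-only op
}

def prune_for_inference(graph, output):
    """Keep only nodes reachable (through inputs) from the output node."""
    keep, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(graph[node])
    return {name: ins for name, ins in graph.items() if name in keep}

pruned = prune_for_inference(graph, "final_result")
print(sorted(pruned))  # training-only nodes are gone
```

The real tool also folds batch-norm and removes other inference-irrelevant ops, but reachability pruning is the core of it.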

Quantize graph

python ./tensorflow/tensorflow/tools/quantization/quantize_graph.py \
--input=optimized_graph.pb \
--output=rounded_graph.pb \
--output_node_names=final_result \
--mode=weights_rounded
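Note that weights_rounded does not shrink the .pb file itself: it rounds each weight to one of 256 evenly spaced values, so the graph compresses far better (e.g., inside a zipped app bundle). A quick stdlib-only demonstration of why rounding helps compression (the data here is synthetic, not real model weights):

```python
import gzip
import random
import struct

random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(50000)]

def rounded(w, levels=256):
    # Round each weight to one of `levels` evenly spaced values,
    # mimicking the weights_rounded mode.
    lo, hi = min(w), max(w)
    step = (hi - lo) / (levels - 1)
    return [lo + round((x - lo) / step) * step for x in w]

def gz_size(w):
    # Size after packing as float32 and gzip-compressing.
    return len(gzip.compress(struct.pack(f"{len(w)}f", *w)))

orig, quant = gz_size(weights), gz_size(rounded(weights))
print(orig, quant)  # the rounded weights compress much better
```

With only 256 distinct bit patterns in the stream, gzip finds repetition that random float32 values don't have.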

Memory-map the graph in a Docker container using a pre-built binary.

# run docker container
docker run --rm -it -v $PWD:/tf_files ubuntu:16.04

# download pre-built binary
apt-get update && apt-get install -y wget
wget https://github.com/dato-ml/stuff/releases/download/0.0.1/convert_graphdef_memmapped_format_0.12.1.tar.gz
tar -zxf convert_graphdef_memmapped_format_0.12.1.tar.gz

# memory mapping
./convert_graphdef_memmapped_format_0.12.1 \
--in_graph=./tf_files/rounded_graph.pb \
--out_graph=./tf_files/mmapped_graph.pb
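The memmapped format lets TensorFlow map the weights directly from the file instead of copying them onto the heap, which lowers memory pressure (and the chance of being killed) on mobile. The kernel primitive behind this is mmap; a stdlib-only sketch of the idea (file name and contents are made up for illustration):

```python
import mmap
import os
import tempfile

# Write a file standing in for a large weights blob.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096 + b"WEIGHTS" + b"\x00" * 4096)

# Map it: pages are loaded lazily by the OS and can be evicted under
# pressure, so the process heap never holds a second copy of the data.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    chunk = m[4096:4103]  # random access without reading the whole file

print(chunk)  # b'WEIGHTS'
```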

exit

Hard way

Take this route only if you like long builds.
Note: the result will be the same.

# run docker container
docker run -it -p 8888:8888 -v $PWD:/tf_files tensorflow/tensorflow:0.12.1-devel

cd /tensorflow/
# fix dependency url
sed -i 's_http://zlib.net/zlib-1.2.8.tar.gz_http://zlib.net/fossils/zlib-1.2.8.tar.gz_' tensorflow/workspace.bzl

# build optimization graph tool
bazel build tensorflow/python/tools:optimize_for_inference \
-c opt --copt=-mavx --verbose_failures \
--local_resources 2048,2.0,1.0 -j 1

# optimize graph
bazel-bin/tensorflow/python/tools/optimize_for_inference \
--input=/tf_files/retrained_graph.pb \
--output=/tf_files/optimized_graph.pb \
--input_names=Mul \
--output_names=final_result

# build quantization graph tool
bazel build tensorflow/tools/quantization:quantize_graph \
-c opt --copt=-mavx --verbose_failures \
--local_resources 2048,2.0,1.0 -j 1

# quantize graph
bazel-bin/tensorflow/tools/quantization/quantize_graph \
--input=/tf_files/optimized_graph.pb \
--output=/tf_files/rounded_graph.pb \
--output_node_names=final_result \
--mode=weights_rounded

# build memory mapping tool
bazel build tensorflow/contrib/util:convert_graphdef_memmapped_format \
-c opt --copt=-mavx --verbose_failures \
--local_resources 2048,2.0,1.0 -j 1

# memory mapping
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format \
--in_graph=/tf_files/rounded_graph.pb \
--out_graph=/tf_files/mmapped_graph.pb

exit

The resulting mmapped_graph.pb can now be used in mobile apps.
