Module1 - Mtrl - Deep learning for perception
By now you cannot have missed the enormous impact that deep learning has had on many domains in recent years. If you have somehow missed it, take a look at one of the many pages exemplifying the achievements.
Many of you are probably already using deep learning or will use it in your research. Here we will mostly treat the deep learning tools as black boxes and see what they can do for us. We will come back to the details in the next course (Autonomous Systems II), which will be offered in the autumn of 2018.
For a very quick introduction to deep learning, and to convolutional neural networks in particular, take a look at the following two videos:
Introduction to Deep Learning: What Is Deep Learning? (3:33 min)
Introduction to Deep Learning: What are Convolutional Neural Networks? (4:44 min)
There is an abundance of deep learning tools out there. In this course we will use TensorFlow. Check out these two videos:
Google's short intro/promo to TensorFlow (2:17 min)
Introduction to TensorFlow (5:37 min)
At a time when deep learning seems to be the solution to everything, it is worth taking a step back and considering, for example, what is said in the following video (especially around 11:20 into the video):
Ali Rahimi's talk at NIPS 2017 (NIPS 2017 Test-of-Time Award presentation)
Installing TensorFlow
TensorFlow runs on all major platforms and has both C/C++ and Python interfaces.
We will stick to our Ubuntu environment and use Python here, but take a look at https://www.tensorflow.org/install/ to see how to install it on other platforms.
If you have one or more GPUs in your setup, it is well worth using them: you can speed things up significantly by switching from the CPU version of TensorFlow to the GPU version. See the installation instructions above for how to modify your installation.
What is presented below was tested on a laptop with an Intel i7-4510U CPU @ 2.00GHz and 8GB RAM.
Open a terminal (if you do not know how, go back to the page Course computer environment).
Create a separate environment in which we will install the TensorFlow Python packages, so that this installation neither messes up other things that use Python nor is influenced by them. We will use Python's virtualenv for this.
cd ~
virtualenv --system-site-packages ~/tensorflowenv
Activate the virtualenv
source ~/tensorflowenv/bin/activate
This should change the prompt in the terminal to be prefixed with (tensorflowenv).
Now install the TensorFlow binary in this virtualenv
pip install --upgrade tensorflow
pip install --upgrade tensorflow_hub
Deactivate the virtualenv
deactivate
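To check that the installation worked, you can run a small Python script with the virtualenv activated. This is a minimal sketch of our own (not part of TensorFlow), assuming Python 3: it reports whether the interpreter runs inside a virtualenv, which TensorFlow version it finds, and whether TensorFlow can see a GPU (relevant if you installed the GPU version).

```python
"""Sanity check of the TensorFlow virtualenv (our own sketch, assumes Python 3)."""
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a virtualenv/venv."""
    # Classic virtualenv sets sys.real_prefix; venv (and newer virtualenv)
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def tensorflow_version():
    """Return the installed TensorFlow version string, or None if it is missing."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    return tf.__version__


def gpu_available():
    """Return True/False depending on whether TensorFlow sees a GPU, or None without TensorFlow."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    # Newer TensorFlow versions provide tf.config.list_physical_devices;
    # older 1.x versions only have tf.test.is_gpu_available.
    if hasattr(tf, "config") and hasattr(tf.config, "list_physical_devices"):
        return len(tf.config.list_physical_devices("GPU")) > 0
    return tf.test.is_gpu_available()


if __name__ == "__main__":
    print("Inside a virtualenv:", in_virtualenv())
    print("TensorFlow version:", tensorflow_version())
    print("GPU visible to TensorFlow:", gpu_available())
```

Save it under any name you like (check_env.py, say) and run it both inside and outside tensorflowenv to see the difference.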
We will also get hold of the full source of TensorFlow and some additional code for some of the experiments below.
cd ~
git clone https://github.com/tensorflow/tensorflow
cd ~/tensorflow/
git clone https://github.com/tensorflow/hub
Finally, we download the TensorFlow models repository, which contains the models and tutorial code used below:
cd ~/
git clone https://github.com/tensorflow/models tensorflow-models
Image recognition
Image recognition comes in handy in many tasks. The example below comes from this TensorFlow tutorial on image recognition.
Let us now try to classify an image. Do not forget to activate the tensorflowenv environment again. If you forget, the system is likely to complain that packages such as numpy are not installed, since they were installed inside the virtualenv tensorflowenv.
source ~/tensorflowenv/bin/activate
Now move to the ImageNet-based models inside the TensorFlow models directory you downloaded:
cd ~/tensorflow-models/tutorials/image/imagenet
Run the classification program classify_image.py. The first time you run it, it will download the pretrained model, which will take a while depending on your connection. By default the program is given an example image as input.
python classify_image.py
Download another image and try to classify it. Assuming that you have downloaded an image called car.jpeg into your Downloads folder, you would classify it with the following command:
python classify_image.py --image ~/Downloads/car.jpeg
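classify_image.py prints its five best guesses, each with a score (the number can be changed with its --num_top_predictions option). The sketch below, our own and assuming Python 3, shows the idea behind such a ranked list: raw class scores are turned into probabilities with a softmax, and the k most probable labels are kept. The labels and scores are made up for illustration.

```python
"""Our own sketch: turning raw class scores into a ranked top-k list."""
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in order[:k]]


if __name__ == "__main__":
    # Hypothetical scores for a 4-class toy model.
    labels = ["car", "truck", "bicycle", "panda"]
    probs = softmax([2.0, 1.0, 0.1, -1.0])
    for label, p in top_k(probs, labels, k=3):
        print(f"{label} (score = {p:.5f})")
```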
Now try to classify an image that you grabbed with your camera (see the page Course computer environment for how to grab images from the camera).
Q1: What happens when there are many different types of objects in an image?
Q2: How many classes, and which ones, can this system recognize? What happens when it is faced with other classes?
Retrain the network
Now let us retrain this network so that it can recognize something new. We will start by retraining it on five kinds of flowers, following this tutorial.
First download the dataset with flowers (218MB).
cd ~
curl -O http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
rm flower_photos.tgz
(The -O flag is the capital letter O, not the number 0.) Now we are ready to retrain. This will take a few minutes depending on your hardware; the script runs 4000 training steps. We begin with the default parameters.
mkdir ~/retrain
cd ~/tensorflow
python hub/examples/image_retraining/retrain.py --output_labels ~/retrain/output_labels.txt --output_graph ~/retrain/output_graph.pb --image_dir ~/flower_photos --architecture mobilenet_1.0_224
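retrain.py uses the name of each subdirectory of --image_dir as a class label, so flower_photos gives five classes. Before or while the retraining runs, you can check which classes the script will see and how many images each has. This is a sketch of our own, assuming Python 3 and JPEG images as in the flower dataset.

```python
"""Our own sketch: list the classes (subdirectories) in an image directory
and count the images in each."""
import os
import sys

# Assumption: the dataset consists of JPEG files, as flower_photos does.
IMAGE_EXTENSIONS = {".jpg", ".jpeg"}


def count_images_per_class(image_dir):
    """Return {class_name: number_of_images} for each subdirectory of image_dir."""
    counts = {}
    for entry in sorted(os.listdir(image_dir)):
        class_dir = os.path.join(image_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # plain files (e.g. a license text) are not classes
        counts[entry] = sum(
            1
            for name in os.listdir(class_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    directory = os.path.expanduser(sys.argv[1] if len(sys.argv) > 1 else "~/flower_photos")
    if os.path.isdir(directory):
        for label, n in count_images_per_class(directory).items():
            print(f"{label}: {n} images")
    else:
        print(f"No such directory: {directory}")
```

A very uneven count between classes is worth keeping in mind when you look at the accuracy later.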
The retraining script first goes through the training images to prepare them. After that the actual training starts. Once it has started, you can monitor the progress with the TensorBoard tool. Open another terminal (in Terminator you can, for example, right-click and split the window), activate tensorflowenv there, and then execute the following command:
tensorboard --logdir /tmp/retrain_logs/
Now open a browser and point it to http://localhost:6006
Q3: What accuracy did you get with these settings?
Now download images of roses, dandelions, sunflowers, tulips and daisies. You can try other flowers or objects as well and see what happens. Assuming that you have downloaded an image called rose1.jpg, you can try to classify it with the following command:
cd ~/tensorflow
python tensorflow/examples/label_image/label_image.py --graph ~/retrain/output_graph.pb --labels ~/retrain/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image ~/Downloads/rose1.jpg
Note that the input size used by label_image.py (set with its --input_width and --input_height options) must match the architecture that you passed to the retraining script (224 for mobilenet_1.0_224 in the example above).
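To see why the sizes have to match, here is a rough sketch of the kind of preprocessing label_image.py does before running the graph: resize the image to input_height x input_width and normalize each pixel as (pixel - input_mean) / input_std, giving a batch of shape 1 x height x width x 3. It is our own NumPy illustration, not the script's code; for brevity it uses nearest-neighbour resizing, whereas label_image.py does the resizing with TensorFlow, and the mean of 0 and standard deviation of 255 used here are assumptions.

```python
"""Our own sketch of image preprocessing for a classifier graph."""
import numpy as np


def preprocess(rgb, input_height=224, input_width=224, input_mean=0.0, input_std=255.0):
    """Turn an H x W x 3 image into a 1 x input_height x input_width x 3 float batch."""
    rgb = np.asarray(rgb, dtype=np.float32)
    h, w = rgb.shape[:2]
    # Nearest-neighbour sampling: pick a source row/column for every output pixel.
    rows = np.arange(input_height) * h // input_height
    cols = np.arange(input_width) * w // input_width
    resized = rgb[rows[:, None], cols[None, :]]
    normalized = (resized - input_mean) / input_std
    return normalized[np.newaxis, ...]  # add the batch dimension


if __name__ == "__main__":
    dummy = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
    batch = preprocess(dummy)
    print(batch.shape, float(batch.min()), float(batch.max()))
```

A graph trained on 224 x 224 inputs fed a differently sized batch will either fail or see images at a scale it was not trained on.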
Q4: How well does it do? What type of images does it handle well and less well?
You can pass many more parameters to the retraining script, for example to use data augmentation or to change the learning rate. Take a look at this page for some ideas. NOTE: The values used below are taken from another example, so they most likely need to be adjusted for your case.
cd ~/tensorflow
python hub/examples/image_retraining/retrain.py --output_graph ~/retrain/output_graph2.pb --output_labels ~/retrain/output_labels2.txt --image_dir ~/flower_photos --learning_rate=0.0001 --testing_percentage=20 --validation_percentage=20 --train_batch_size=32 --validation_batch_size=-1 --flip_left_right --random_scale=30 --random_brightness=30 --eval_step_interval=100 --how_many_training_steps=600 --architecture mobilenet_1.0_224
Note that if you want to test this model, you need to tell label_image.py to use the new graph and labels (we added a 2 to the file names in the last command).
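The --testing_percentage and --validation_percentage options decide how the images are divided into training, validation and test sets. As far as we know, retrain.py (and the speech commands scripts used later on this page) make this decision from a stable hash of each file name, ignoring anything after _nohash_, so a file always ends up in the same set even when images are added. The sketch below reimplements that idea under this assumption; the value of MAX_NUM_IMAGES_PER_CLASS is an assumed constant.

```python
"""Our own sketch of hash-based train/validation/test splitting."""
import hashlib
import os
import re

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # assumption: mirrors the retraining script


def which_set(file_name, validation_percentage=20, testing_percentage=20):
    """Return 'training', 'validation' or 'testing' for a given file name."""
    base_name = os.path.basename(file_name)
    # Everything after '_nohash_' is ignored, so closely related files
    # (e.g. several recordings by one speaker) land in the same set.
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    hashed = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    percentage_hash = (int(hashed, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_IMAGES_PER_CLASS
    )
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"


if __name__ == "__main__":
    for name in ["a5d485dc_nohash_0.wav", "a5d485dc_nohash_1.wav", "rose1.jpg"]:
        print(name, "->", which_set(name))
```

Because the split does not change between runs, validation accuracies obtained with different settings are compared on the same images.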
Q5: What accuracy are you able to achieve on the validation set and with what settings and why? Make a post in this discussion forum about it. When you have posted there you can see how others have done as well.
Image recognition with your camera (extra)
Here we assume that you have gone through the steps in Course computer environment regarding getting an image into ROS.
In the first terminal run
roscore
In a second terminal run the following commands
rosparam set cv_camera/device_id 0
rosrun cv_camera cv_camera_node
If you have both a built-in camera and a USB webcam, you probably want to change the device id to 1.
Now we are ready to test some TensorFlow stuff on the image from our camera. In a third terminal start by downloading some further software
cd ~
git clone https://github.com/OTL/rostensorflow.git
cd rostensorflow
then make sure that you have activated the python environment containing the tensorflow binaries
source ~/tensorflowenv/bin/activate
and then run the image recognition code
python image_recognition.py image:=/cv_camera/image_raw
The first time you run this it will download a relatively large model so give it some time. Once started it will output the best hypothesis in the terminal.
Q6: Can you modify the code so that it uses your network instead?
Object detection and recognition
So far we have classified whole images as being, or containing, a certain class. Often, however, an image contains several objects, and you want to know both where each object is in the image and what it is.
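A detector's output can be thought of as a list of boxes, each with a class and a confidence score. The sketch below, our own and assuming Python 3, keeps the detections above a score threshold and converts their boxes to pixel coordinates. It assumes boxes in normalized [ymin, xmin, ymax, xmax] form (values between 0 and 1), which is how we understand the TensorFlow Object Detection API reports them; the 0.5 threshold and the example detections are made up.

```python
"""Our own sketch: filtering detections and converting boxes to pixels."""
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # normalized (ymin, xmin, ymax, xmax)


def to_pixel_box(box: Box, image_height: int, image_width: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    return (
        round(xmin * image_width),
        round(ymin * image_height),
        round(xmax * image_width),
        round(ymax * image_height),
    )


def keep_confident(boxes: List[Box], scores: List[float], classes: List[int],
                   image_height: int, image_width: int, min_score: float = 0.5):
    """Return (class_id, score, pixel_box) for every detection with score >= min_score."""
    results = []
    for box, score, class_id in zip(boxes, scores, classes):
        if score >= min_score:
            results.append((class_id, score, to_pixel_box(box, image_height, image_width)))
    return results


if __name__ == "__main__":
    # Hypothetical output for a 480 x 640 image: two confident detections, one weak one.
    boxes = [(0.1, 0.2, 0.5, 0.6), (0.3, 0.0, 0.9, 0.4), (0.0, 0.0, 1.0, 1.0)]
    scores = [0.92, 0.71, 0.08]
    classes = [18, 1, 3]
    for class_id, score, pixel_box in keep_confident(boxes, scores, classes, 480, 640):
        print(class_id, f"{score:.2f}", pixel_box)
```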
We will use a Jupyter notebook for this, provided inside the TensorFlow models directory that we downloaded before. What is described below can also be found in this tutorial.
Installation
What is described in this section is something you do once.
source ~/tensorflowenv/bin/activate
cd ~/tensorflow-models/research/
You need to modify your PYTHONPATH, i.e., the environment variable that tells Python where to look for modules:
export PYTHONPATH=$PYTHONPATH:/home/wasp/tensorflow-models/research:/home/wasp/tensorflow-models/research/slim
You need to make sure that this is run in every terminal that you want to use. You can have this taken care of automatically by pasting the line at the end of your /home/wasp/.bashrc file; every new terminal that you open will then have PYTHONPATH set.
Now we need to expand our Ubuntu installation a bit as described by the installation instructions, which you can read with
more ~/tensorflow-models/research/object_detection/g3doc/installation.md
What you need to do is
sudo apt-get install protobuf-compiler
This might prompt for your password, in which case you just type it. It will then show you the packages that will be installed (possibly more than the ones you specified, because of dependencies). Press y to confirm if needed.
source ~/tensorflowenv/bin/activate
pip install pillow
pip install lxml
pip install jupyter
pip install matplotlib
cd ~/tensorflow-models/research/
protoc object_detection/protos/*.proto --python_out=.
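protoc turns each .proto file into a Python module with a _pb2.py suffix (with the command above, foo.proto becomes foo_pb2.py in the same protos directory). If imports from object_detection fail later, a quick check is whether any .proto file lacks its compiled module. A minimal sketch of our own, assuming Python 3 and the directory layout used on this page:

```python
"""Our own sketch: find .proto files without a generated *_pb2.py module."""
import glob
import os
import sys


def missing_pb2_modules(protos_dir):
    """Return the .proto files in protos_dir that have no matching *_pb2.py file."""
    missing = []
    for proto in sorted(glob.glob(os.path.join(protos_dir, "*.proto"))):
        stem = os.path.splitext(os.path.basename(proto))[0]
        if not os.path.exists(os.path.join(protos_dir, stem + "_pb2.py")):
            missing.append(os.path.basename(proto))
    return missing


if __name__ == "__main__":
    default = os.path.expanduser("~/tensorflow-models/research/object_detection/protos")
    protos_dir = sys.argv[1] if len(sys.argv) > 1 else default
    if not os.path.isdir(protos_dir):
        print(f"No such directory: {protos_dir}")
    else:
        missing = missing_pb2_modules(protos_dir)
        print("All protos compiled." if not missing else f"Not compiled: {missing}")
```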
Running
We are now ready to run!
Make sure that you have set your PYTHONPATH as described above. You can verify that it is set by issuing
echo $PYTHONPATH
which should then contain the tensorflow-models directory as described above.
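You can also check this from Python, which is where the setting actually matters. The sketch below, our own and assuming Python 3, lists the required directories that are missing from PYTHONPATH. The two paths assume the /home/wasp home directory used on this page; adjust them if your user name differs.

```python
"""Our own sketch: report required directories missing from PYTHONPATH."""
import os

# Assumption: the /home/wasp layout used on this page.
REQUIRED = [
    "/home/wasp/tensorflow-models/research",
    "/home/wasp/tensorflow-models/research/slim",
]


def missing_pythonpath_entries(required, pythonpath=None):
    """Return the required directories that are not listed in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # Normalize so that e.g. a trailing slash does not cause a false alarm.
    entries = {os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p}
    return [d for d in required if os.path.normpath(d) not in entries]


if __name__ == "__main__":
    missing = missing_pythonpath_entries(REQUIRED)
    print("PYTHONPATH looks fine." if not missing else f"Missing from PYTHONPATH: {missing}")
```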
Make sure your virtualenv is activated and step into the directory where the action will take place.
source ~/tensorflowenv/bin/activate
cd ~/tensorflow-models/research/object_detection/
Now start the jupyter notebook with
jupyter notebook object_detection_tutorial.ipynb
This should bring up a browser window as shown below.
You run the notebook by stepping through its sections (cells) with the "Run" button. While a cell is running it shows a *, which turns into a number when the cell has finished. Give it some time, and pay attention to what is printed in the terminal if something seems to go wrong. The notebook downloads things, which could take some time if your connection is slow. Sections 5 and 6 ("Download Model" and "Load a (frozen) Tensorflow model into memory") will therefore take a while, as will Section 10, which does the actual detection. If all went well, you should see images at the bottom of the notebook. (Scroll down so that you see both images, not only the dogs.)
Download another test image and call it image3.jpg and place it in the directory test_images. Assuming you downloaded an image called zimmerman.jpg into your Downloads folder, this would be done with
cp ~/Downloads/zimmerman.jpg ~/tensorflow-models/research/object_detection/test_images/image3.jpg
Modify the first part of the detection code in the notebook, the cell that defines the test image paths, by changing the range from (1, 3) to (1, 4) so that all three images are loaded. If you have more images, or only want to work with a subset of them, you can set the range to (from, to+1).
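In the version of the notebook we assume here, that cell builds a list of file names from the test_images directory and a range of indices, roughly as below (check the variable names against your own copy). After the change it would read:

```python
# Sketch of the modified cell; the variable names are assumed to match
# your copy of object_detection_tutorial.ipynb.
import os

PATH_TO_TEST_IMAGES_DIR = 'test_images'
# range(1, 4) yields 1, 2 and 3, i.e. image1.jpg, image2.jpg and the new image3.jpg.
TEST_IMAGE_PATHS = [
    os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)
]
```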
Q7: How well does it work?
Q8 (extra): Try with other models as described in the tutorial.
If you would like to get back the original version of the notebook file you can always get the version from the repository again. NOTE that this will erase any changes you made.
git checkout object_detection_tutorial.ipynb
Audio Recognition
So far we have only looked at images. Deep learning has also been used very successfully in, for example, text and speech processing. In the following we will go through the first part of the Simple Audio Recognition tutorial from TensorFlow.
We first need to train a model. This takes a lot of time (about 30h on the laptop this was tested on), so we recommend that you start it and let it run in the background while you do other things. Why not move on to the next topic and come back later? Note that the script starts by downloading a big chunk of data (1GB or so), which will also take a while depending on your connection, but nowhere near as long as the training.
source ~/tensorflowenv/bin/activate
mkdir ~/speech
mkdir ~/speech/commands_train
mkdir ~/speech/dataset
python ~/tensorflow/tensorflow/examples/speech_commands/train.py --data_dir ~/speech/dataset --train_dir ~/speech/commands_train
After about 30h, give or take 29h or so depending on your hardware setup, this will have finished after 18000 steps. Now you know why people use GPUs and cloud computing to accelerate the training :-) Don't worry, you do not have to wait until it is completely finished to test it as you will see a bit further down.
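The 18000 steps come from a two-phase schedule. As far as we know, train.py defaults to --how_many_training_steps=15000,3000 and --learning_rate=0.001,0.0001, i.e. 15000 steps at a learning rate of 0.001 followed by 3000 steps at 0.0001. The sketch below, our own, shows how such comma-separated values map a training step to a learning rate. Passing smaller step counts to train.py is one way to shorten the training, presumably at some cost in accuracy.

```python
"""Our own sketch: mapping training steps to learning rates in a phased schedule."""


def parse_schedule(steps_flag="15000,3000", rates_flag="0.001,0.0001"):
    """Return a list of (number_of_steps, learning_rate) phases."""
    steps = [int(s) for s in steps_flag.split(",")]
    rates = [float(r) for r in rates_flag.split(",")]
    if len(steps) != len(rates):
        raise ValueError("need one learning rate per training phase")
    return list(zip(steps, rates))


def learning_rate_at(step, schedule):
    """Return the learning rate used at a 1-based training step."""
    boundary = 0
    for n_steps, rate in schedule:
        boundary += n_steps
        if step <= boundary:
            return rate
    raise ValueError(f"step {step} is beyond the end of training ({boundary} steps)")


if __name__ == "__main__":
    schedule = parse_schedule()
    print("Total steps:", sum(n for n, _ in schedule))
    for step in (1, 15000, 15001, 18000):
        print(step, learning_rate_at(step, schedule))
```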
After the training has finished its 18000 steps, we will put the model in a bit more compact format (remember to activate your tensorflowenv if you are in another window before doing the following).
python ~/tensorflow/tensorflow/examples/speech_commands/freeze.py --start_checkpoint ~/speech/commands_train/conv.ckpt-18000 --output_file ~/speech/my_frozen_graph.pb
If you do not have the patience to wait for all 18000 steps, you can look in your ~/speech/commands_train folder and see when checkpoint files start to appear:
wasp@WASPPC:~$ ls ~/speech/commands_train/
checkpoint conv.ckpt-5600.meta conv.ckpt-5800.meta
conv.ckpt-5500.data-00000-of-00001 conv.ckpt-5700.data-00000-of-00001 conv.ckpt-5900.data-00000-of-00001
conv.ckpt-5500.index conv.ckpt-5700.index conv.ckpt-5900.index
conv.ckpt-5500.meta conv.ckpt-5700.meta conv.ckpt-5900.meta
conv.ckpt-5600.data-00000-of-00001 conv.ckpt-5800.data-00000-of-00001 conv_labels.txt
conv.ckpt-5600.index conv.ckpt-5800.index conv.pbtxt
You can also follow the progress with TensorBoard and see when the changes become small. You can then freeze an intermediate checkpoint instead, for example:
python ~/tensorflow/tensorflow/examples/speech_commands/freeze.py --start_checkpoint ~/speech/commands_train/conv.ckpt-5900 --output_file ~/speech/my_frozen_graph.pb
(assuming checkpoint 5900 has been produced).
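Rather than reading the listing by eye, you can let Python find the newest checkpoint for which the .index, .meta and .data files all exist, and print the matching freeze.py command. This is a sketch of our own, assuming Python 3 and the conv.ckpt-N naming shown in the listing above. (TensorFlow also provides tf.train.latest_checkpoint, which reads the checkpoint file in the same folder.)

```python
"""Our own sketch: find the newest complete checkpoint in a training directory."""
import glob
import os
import re
import sys


def latest_checkpoint_step(train_dir, prefix="conv.ckpt"):
    """Return the highest step N for which prefix-N.index/.meta/.data-* all exist, or None."""
    pattern = re.compile(re.escape(prefix) + r"-(\d+)\.index$")
    steps = []
    for path in glob.glob(os.path.join(train_dir, prefix + "-*.index")):
        match = pattern.search(os.path.basename(path))
        if not match:
            continue
        step = int(match.group(1))
        base = os.path.join(train_dir, f"{prefix}-{step}")
        # Only count checkpoints whose files have all been written.
        if os.path.exists(base + ".meta") and glob.glob(base + ".data-*"):
            steps.append(step)
    return max(steps) if steps else None


if __name__ == "__main__":
    train_dir = os.path.expanduser(sys.argv[1] if len(sys.argv) > 1 else "~/speech/commands_train")
    step = latest_checkpoint_step(train_dir)
    if step is None:
        print(f"No complete checkpoints found in {train_dir}")
    else:
        print("python ~/tensorflow/tensorflow/examples/speech_commands/freeze.py "
              f"--start_checkpoint {train_dir}/conv.ckpt-{step} "
              "--output_file ~/speech/my_frozen_graph.pb")
```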
If you stopped the training at some point and want to resume it, you can restart from a checkpoint, for example like this:
python ~/tensorflow/tensorflow/examples/speech_commands/train.py --data_dir ~/speech/dataset --train_dir ~/speech/commands_train --start_checkpoint ~/speech/commands_train/conv.ckpt-5900
Now let us test it. You have tons of wav files in the dataset. Play them using, for example,
aplay ~/speech/dataset/left/a5d485dc_nohash_0.wav
If you want to hear all of the left sound files you can do
aplay ~/speech/dataset/left/*.wav
Press ctrl-c if you want to stop before all files are played.
If you want to hear other sounds, simply modify the above command and change left to some other folder. If you press TAB, the terminal will show you which files are available.
Now let us try to do some recognition.
python ~/tensorflow/tensorflow/examples/speech_commands/label_wav.py --graph ~/speech/my_frozen_graph.pb --labels ~/speech/commands_train/conv_labels.txt --wav ~/speech/dataset/left/a5d485dc_nohash_0.wav
Try a few other files, and then try your own WAV files that you record. See the page Course computer environment for a reminder of how to do that.
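Your own recordings should resemble the training data: the training script assumes 16 kHz audio and one-second clips by default (its --sample_rate and --clip_duration_ms options), so recordings made with other settings may be recognized poorly. The sketch below, our own and assuming Python 3, uses the standard wave module to report how a WAV file differs from mono, 16 kHz, at most one second long; the additional 16-bit expectation is our assumption about the dataset format.

```python
"""Our own sketch: check a recorded WAV file against the assumed training format."""
import sys
import wave

EXPECTED_RATE = 16000    # Hz, assumed to match --sample_rate
EXPECTED_SECONDS = 1.0   # assumed to match --clip_duration_ms=1000


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), wav.getsampwidth(), rate, frames / rate


def problems(path):
    """List the ways the file differs from the assumed training format."""
    channels, width, rate, duration = describe_wav(path)
    issues = []
    if channels != 1:
        issues.append(f"{channels} channels (expected mono)")
    if width != 2:
        issues.append(f"{8 * width}-bit samples (expected 16-bit)")
    if rate != EXPECTED_RATE:
        issues.append(f"{rate} Hz (expected {EXPECTED_RATE} Hz)")
    if duration > EXPECTED_SECONDS:
        issues.append(f"{duration:.2f} s long (expected at most {EXPECTED_SECONDS:.0f} s)")
    return issues


if __name__ == "__main__":
    for wav_path in sys.argv[1:]:
        found = problems(wav_path)
        print(wav_path, "OK" if not found else "; ".join(found))
```

Run it with one or more file names as arguments. A recording in the expected format can, for example, be made with arecord -f S16_LE -c 1 -r 16000 -d 1 my_word.wav (the file name is just an example).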
Q9: How well does it work?
Q10: What words should the model be able to recognize?
Q11: How does this differ from what you did with images? What type of preprocessing is done?
EXTRA: There is plenty more cool stuff to try on the tutorial page, so please dive in and learn more if you have time.
Want more
Go to the TensorFlow tutorial page, where there is plenty more.
You can also have a look inside the folder with tensorflow models.
cd ~/tensorflow-models/research/object_detection/
and browse around
Want more data? There is plenty of it out there, for example at this link.