Using Kaggle datasets in Colab

Here's how I got Kaggle datasets into Colab.

The book I'm following refers to the Dogs vs. Cats dataset. After creating a Kaggle account, I went to its page, "Dogs vs. Cats: create an algorithm to distinguish dogs from cats", which is hosted as a Kaggle competition.

Downloading the dataset to my laptop from that page is easy, and I could then upload it to Colab, but it's more straightforward to download it directly in Colab.

It turns out Kaggle provides a CLI tool for downloading datasets from the command line. To install it in Colab, I ran:

!pip install kaggle

Then I went to my account settings on Kaggle and created an API key, as the documentation suggests.

Finally, I set up the .kaggle/kaggle.json file in Colab:

!mkdir .kaggle
!echo '{"username":"...","key":"..."}' > .kaggle/kaggle.json
!cat .kaggle/kaggle.json

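But running !kaggle competitions download -c dogs-vs-cats still complained that the file couldn't be found. The likely reason is that the CLI looks for kaggle.json in the home directory (/root/.kaggle on Colab), while !mkdir .kaggle creates the folder in the current working directory (/content on Colab). The error message also suggests an alternative using environment variables, and the GitHub repo shows how to set them up: https://github.com/Kaggle/kaggle-api#api-credentials.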
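If you'd rather keep the file-based setup, the file has to live in the home directory, where the CLI looks by default, not under the working directory. Here is a minimal Python sketch of that. The helper name write_kaggle_credentials is hypothetical, and the owner-only permissions mirror the chmod 600 advice in the Kaggle README rather than anything this post found necessary.

```python
import json
import os
from pathlib import Path
from typing import Optional


def write_kaggle_credentials(username: str, key: str, config_dir: Optional[str] = None) -> Path:
    """Write kaggle.json where the Kaggle CLI looks by default.

    Hypothetical helper. config_dir defaults to ~/.kaggle (i.e. /root/.kaggle
    on Colab), not a .kaggle folder under the current working directory.
    """
    directory = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    # Owner read/write only, mirroring `chmod 600` from the Kaggle README.
    os.chmod(path, 0o600)
    return path


# Usage in Colab (fill in your own values; left as placeholders here):
# write_kaggle_credentials("...", "...")
```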

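However, in Colab each ! command runs in its own subshell, so something like !export KAGGLE_USERNAME=... is lost as soon as that line finishes. Environment variables are set with the %env magic instead: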

%env KAGGLE_USERNAME=...
%env KAGGLE_KEY=...
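Why %env sticks: it updates os.environ in the notebook's own Python process, and every later ! command is started as a child of that process, so it inherits the variables. The sketch below shows the same mechanism in plain Python, using a throwaway child Python process as a stand-in for the kaggle command; the values are placeholders.

```python
import os
import subprocess
import sys

# Roughly what `%env KAGGLE_USERNAME=...` / `%env KAGGLE_KEY=...` do:
# set the variables in the current (kernel) process. Placeholder values.
os.environ["KAGGLE_USERNAME"] = "your-username"
os.environ["KAGGLE_KEY"] = "your-key"

# A child process started afterwards (as `!kaggle ...` would be) inherits them.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['KAGGLE_USERNAME'])"],
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())  # prints "your-username"

# By contrast, an export inside its own shell is gone once that shell exits.
subprocess.run(["sh", "-c", "export ONLY_IN_SUBSHELL=1"], check=True)
print("ONLY_IN_SUBSHELL" in os.environ)  # prints False
```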

That did the trick!

Once downloaded, unzip it and you're good to go.
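Unzipping works with !unzip, or from Python with the standard zipfile module. Below is a minimal sketch, assuming the download landed in the working directory as dogs-vs-cats.zip (the file name is an assumption; check what the CLI printed). It also unpacks any .zip files found inside the first archive, in case the download contains further archives. The helper name extract_all is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_all(archive: str, dest: str = "data") -> list:
    """Extract `archive` into `dest`, then extract any .zip files it contained.

    Hypothetical helper. Inner archives are unpacked into their own folder
    and the inner .zip files are left in place.
    """
    dest_dir = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    extracted = []
    # Only one level of nesting is handled: the list of inner archives is
    # collected before extracting them, so zips they contain are not revisited.
    for inner in sorted(dest_dir.rglob("*.zip")):
        with zipfile.ZipFile(inner) as zf:
            zf.extractall(inner.parent)
        extracted.append(inner)
    return extracted


# In Colab, after the download (file name assumed):
# extract_all("dogs-vs-cats.zip", "data")
```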