In the previous post I demonstrated some very basic EDA visualisation methods. They are helpful when a data scientist is trying to gain a better understanding of the data structure. The focus of this post is to visualise geographical data in Tableau in order to verify the claim by the game developer that different Pokemon appear in different environments! The data is provided by user kveykva on Kaggle and, again, this analysis was carried out as a team effort with Dorian, Revathy and Andrea.
Tableau is a very popular commercial data visualisation tool that is widely used in different industries. It is relatively easy to work with because it can accommodate users with different levels of technical expertise. Business users can use their Excel files whilst advanced machine learning engineers can link Tableau to a remote SQL database. In both cases they will be able to produce great visualisations easily. Tableau uses a drag-and-drop interface, so users are not required to do any coding. This can greatly simplify the workflow for producing visualisations.
However, Tableau also has its downsides. A major one is that it is difficult to alter data once it is loaded into the application. I find it much easier to finish the data munging process before importing the resultant dataset into Tableau. Another downside is that it takes a bit of time to get used to how Tableau works, because its user interface is not the most intuitive.
Merging datasets in Tableau
In order to obtain potentially interesting insights, two datasets are to be merged. The first one is the file that contains all the geographical data and the other is the CSV that we worked on in the previous post. After loading up Tableau, choose the “More…” option under the title Connect and pick your data file. Joining datasets is relatively easy in Tableau. First make sure all the files to be joined are in the same directory. Then, after loading up your first file, the other should appear under the “Files” column. After that, all we have to do is drag the joining file and drop it next to the current file. Tableau allows us to specify the type of join used in the merge (inner, left, right or full outer). We can also define the join clause, that is, which columns must match between the two files. This is a simple concept if one is familiar with SQL.
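For readers less familiar with SQL, here is a minimal sketch in plain Python of what an inner join and a left join do to two tables. It is only an illustration of the concept, not how Tableau implements it. The join column ‘pokemon_id’ and the sample rows are hypothetical; the real files use their own column names.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; `how` is 'inner' or 'left'."""
    # Index the right-hand rows by key so each lookup is a dictionary access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            # One output row per matching pair, with columns from both sides.
            for match in matches:
                joined.append({**row, **match})
        elif how == "left":
            # Keep the unmatched left row; the right-hand columns are absent.
            joined.append(dict(row))
    return joined


# Hypothetical sample rows (the join key 'pokemon_id' is an assumption).
sightings = [
    {"pokemon_id": 7, "lat": 37.80, "lng": -122.45},
    {"pokemon_id": 4, "lat": 37.72, "lng": -122.40},
    {"pokemon_id": 999, "lat": 37.77, "lng": -122.42},  # no matching stats row
]
stats = [
    {"pokemon_id": 7, "Type 1": "Water", "Max HP": 44},
    {"pokemon_id": 4, "Type 1": "Fire", "Max HP": 39},
]

inner = join_rows(sightings, stats, "pokemon_id", how="inner")
left = join_rows(sightings, stats, "pokemon_id", how="left")
```

With an inner join the sighting without a matching stats row is dropped, whereas a left join keeps it without the ‘Type 1’ and ‘Max HP’ columns. This is worth keeping in mind when picking the join type, since it changes how many points end up on the map.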
Having merged the datasets, we can now start exploring. It is good to familiarise ourselves with the basics of Tableau first. This can be done, for example, by trying to recreate the plots from the previous post. First we have to switch to a worksheet using the tabs at the bottom. You can either create a new sheet or use the preloaded sheet (which should be empty). The following diagram shows the empty sheet layout.
All the column titles in the datasets are listed on the left hand side. All we need to do for plotting is to drag the correct data into the relevant places. Here is a rule of thumb that I find useful: “Whatever you want to appear on the x-axis, put it in the Columns slot”. For example, if we would like to re-create the box plot from the previous post, we would drag ‘Type 1’ into the ‘Columns’ slot and ‘Max HP’ into the ‘Rows’ slot. Note that sometimes the data are aggregated and become, for example, ‘SUM(Max HP)’. This can be changed by right clicking the tag and selecting the proper option. To quickly select the plot type, a user can click the ‘Show Me’ button on the top right and choose the desired plot type. In this case we would pick the ‘box-and-whisker plot’ option. The resulting plot is essentially the one shown in the previous post (apart from the colour and order).
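To see why the aggregation matters, here is a small sketch in plain Python. A SUM collapses each type to a single number, while a box plot needs the individual values to compute quartiles. The sample values are made up for illustration, and the quartile convention used here (the "inclusive" method of Python's statistics module) is an assumption that may differ slightly from the one Tableau uses. Whiskers and outliers are not reproduced.

```python
from collections import defaultdict
from statistics import quantiles


def group_values(rows, group_col, value_col):
    """Collect the values of `value_col` for each distinct `group_col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return dict(groups)


def box_summary(values):
    """Min, quartiles and max of a list with at least two values."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


# Made-up sample rows, not taken from the real dataset.
rows = [
    {"Type 1": "Water", "Max HP": 10},
    {"Type 1": "Water", "Max HP": 20},
    {"Type 1": "Water", "Max HP": 30},
    {"Type 1": "Water", "Max HP": 40},
    {"Type 1": "Water", "Max HP": 50},
    {"Type 1": "Fire", "Max HP": 30},
    {"Type 1": "Fire", "Max HP": 50},
]

groups = group_values(rows, "Type 1", "Max HP")

# What an aggregated 'SUM(Max HP)' corresponds to: one number per type.
totals = {ptype: sum(values) for ptype, values in groups.items()}

# What a box plot needs: the spread of the individual values per type.
summaries = {ptype: box_summary(values) for ptype, values in groups.items()}
```

The totals mostly reflect how many records each type has, which is why the aggregated tag has to be switched off before the box plot looks like the one from the previous post.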
Using similar logic, we can plot the geographical data in Tableau. First we have to make sure the coordinates are of the correct type. Right click on the relevant items (‘lat’ and ‘lng’) and set their Geographic Roles to latitude and longitude respectively. Then drag ‘lat’ into the ‘Rows’ slot and ‘lng’ into the ‘Columns’ slot. Tableau will then come up with a map automatically! Then we can follow this tutorial to overlay Mapbox’s background maps to make the plot look more colourful. The resultant plot looks like this.
It turns out that the data records the appearance of Pokemon in many different places. In order to have a clearer view, we decided to investigate only the records near the San Francisco Bay Area. Since we would like to look at the relationship between the type of a Pokemon and the environment in which it can be found, the ‘Type 1’ tag is dragged to both the ‘Marks’ card and the ‘Filters’ shelf. Inside the ‘Marks’ card the user can edit the colours for the different types of Pokemon, which is convenient as the default colour scheme is not intuitive. Similarly, we can also alter the sizes of the data points.
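In line with doing the data munging before importing into Tableau, the same restriction can be sketched in plain Python. The bounding box below is a rough guess at the San Francisco Bay Area chosen for illustration, not the exact region used for the plots, and the sample records are hypothetical.

```python
# Rough bounding box around the San Francisco Bay Area (an assumption for
# illustration, not the exact region used for the plots).
SF_BAY_BOX = {"lat_min": 37.2, "lat_max": 38.2,
              "lng_min": -123.0, "lng_max": -121.7}


def in_box(row, box=SF_BAY_BOX):
    """Return True when the row's coordinates fall inside the bounding box."""
    return (box["lat_min"] <= row["lat"] <= box["lat_max"]
            and box["lng_min"] <= row["lng"] <= box["lng_max"])


def filter_sightings(rows, types, box=SF_BAY_BOX):
    """Keep rows inside the box whose 'Type 1' is one of the selected types."""
    wanted = set(types)
    return [row for row in rows
            if in_box(row, box) and row["Type 1"] in wanted]


# Hypothetical sample records.
records = [
    {"Type 1": "Water", "lat": 37.81, "lng": -122.48},  # San Francisco coast
    {"Type 1": "Fire", "lat": 37.70, "lng": -122.39},   # south of the city
    {"Type 1": "Water", "lat": 40.71, "lng": -74.01},   # New York, outside the box
]

water_near_sf = filter_sightings(records, ["Water"])
water_and_fire = filter_sightings(records, ["Water", "Fire"])
```

Filtering the file this way beforehand also reduces the number of points Tableau has to draw, although doing it inside Tableau keeps the full dataset available for other views.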
Using the ‘Filters’ shelf we can choose to display only the selected Pokemon types. This results in the following interesting plots!
This is a zoomed-in plot showing the distribution of Pokemon of type ‘Water’. It can be clearly seen that the coastline (W and NE of San Francisco) has a very high density of them! Apart from that, they appear more frequently near little ponds or generally wherever you would expect a fair amount of water.
Similarly (although not as obvious), Grass Pokemon appear more often near parks than in cities, as seen in the figure.
Pokemon of similar types usually appear in proximity to each other. For example, the above figure shows the distribution of Rock and Ground type Pokemon. The majority of them appear on the east side of the city. It is also worth noting that the east is relatively hilly.
On the other hand, Pokemon of ‘opposite’ types tend to appear in different places, as the diagram above shows. Water Pokemon can easily be seen in the northern regions whilst Fire Pokemon are far more common in the south.
However, there are some types of Pokemon that are not much affected by geographical location. For example, ‘Normal’ type Pokemon are in fact quite normal. They can be seen all across the recorded area.
Bugs are also everywhere!
Another interesting observation from this geographical plot is that there are certain areas that have absolutely no traces of Pokemon. For example, the airport areas are free of any Pokemon. Considering the safety of air operations, this is a good sign. It also demonstrates that the developer is able to influence where Pokemon appear (or do not appear)!
This blog post has briefly demonstrated the power of Tableau in plotting different data in various forms. It is a nice-to-use package for producing beautiful visualisations but it is quite limited in terms of data manipulation. In the end, the plots support the developer's claim that Pokemon of different types appear in different environments, at least for the San Francisco Bay Area records examined here. Despite some of its drawbacks, Tableau is still a very valuable tool for a data scientist, as visualisation can greatly improve the effectiveness of communication, and communication is arguably the most important part of a data science role!