The Eurotrip Planner - Part 1

python, coding

Holiday Project 2019

When my friend came to Berlin to visit me these holidays, I was secretly relieved. You know how much it sucks to be away from your family during the holidays? (A lot!) During our conversations, we realized that we could plan a nice, short EuroTrip in January. We looked at different countries, airports and flights, but in the end the prices were, as usual, terribly expensive. Oh well, we had just wasted an hour for nothing 😪. Then it struck me: why not write a program that does all the dirty work for us and quickly finds the ideal Eurotrip (and other vacations as well)? After all, the sequence of steps is the same most of the time. It would be the perfect opportunity to learn something new and make planning future trips a lot easier! And so I did!

This is the story of a holiday coding project, but it also touches on a lot of different concepts in software engineering. We will use a number of tools like APIs, HTTP requests, JSON, MongoDB, Docker and so on. I will explain how I progressed with the idea and clarify the important stuff as we go. I’ll also link all the important resources and publish the code on GitHub for anyone to try themselves. It might not be something you can turn into a product; it might even be utterly useless for practical purposes. But maybe we can also learn something about designing a whole software stack along the way. In the end, it’s just a toy problem to have fun with! So without further ado, let’s go!

The Problem

First, let me explain what I had in mind. Suppose we are planning a nice, 7-day, multi-city vacation in Spain. There are a few airports near me from which we can feasibly take off and return to, for example Berlin Tegel and Berlin Schoenefeld. Let’s call them “source” airports. We have a bunch of places (or “destinations”) in Spain on our bucket list: Seville, Madrid, Barcelona and Valencia. Now, I prefer to use buses and trains when travelling between nearby cities like these. This way, you get to enjoy the country, and it is often the cheaper, more environmentally friendly and more convenient option. But this means that we can start our trip in one destination and end it in another. And since low-cost airlines like Ryanair and easyJet offer no discount on return flights, we can book multi-city flights instead! For example, I can fly Berlin → Barcelona and come back Madrid → Berlin.

Now let’s talk about the travel dates. Generally, I have a range of days (say, 1 to 15 Jan), of which I can only spare 7 days for the trip. This means I can fly out somewhere between 1 and 7 Jan and come back somewhere between 7 and 15 Jan, depending on the exact departure and return days. The goal is to find the cheapest flight combination that satisfies our trip conditions.
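The core search can be sketched in a few lines. This is a minimal sketch, assuming we already have one-way prices for outbound flights (Berlin → destination) and return flights (destination → Berlin), keyed by date and city. The prices below are made up, the function name `cheapest_trip` is hypothetical, and "trip length" is taken here as the number of days between the two flight dates.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical one-way prices in EUR (made-up numbers, for illustration only).
# outbound: (departure date, destination city) -> price of flying there from Berlin
# inbound:  (departure date, city we fly back from) -> price of flying back to Berlin
outbound = {
    (date(2020, 1, 1), "Barcelona"): 28.0,
    (date(2020, 1, 2), "Madrid"): 23.0,
    (date(2020, 1, 3), "Seville"): 26.0,
}
inbound = {
    (date(2020, 1, 8), "Madrid"): 19.0,
    (date(2020, 1, 9), "Valencia"): 30.0,
    (date(2020, 1, 10), "Barcelona"): 24.0,
}

def cheapest_trip(outbound, inbound, trip_days=7):
    """Return (total, outbound_key, inbound_key) for the cheapest combination
    whose return flight leaves exactly `trip_days` days after the outbound one,
    or None if no pair fits."""
    best = None
    for (out_key, out_price), (in_key, in_price) in product(outbound.items(), inbound.items()):
        if in_key[0] - out_key[0] != timedelta(days=trip_days):
            continue  # wrong trip length, skip this pair
        total = out_price + in_price
        if best is None or total < best[0]:
            best = (total, out_key, in_key)
    return best

print(cheapest_trip(outbound, inbound))
```

Brute force over every pair is fine at this scale: with 15 dates and 4 cities on each side there are only 60 × 60 = 3,600 combinations to check.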

The solution: Digging around

I was surprised to find out that none of the major booking companies does exactly what I want. Kayak comes the closest: you can select +/-3 days around your travel dates and it finds the cheapest flight. Kind of. But I don’t want a flight planner. I want a trip planner. So I decided to write one on my own. But where to start? Option 1 is to write a web crawler in Selenium or similar that crawls sites like these for all our queries. But crawlers are not reliable: there might be CAPTCHAs, and even the slightest change to the website can render our crawler useless. Also, I am not very proficient in JavaScript, which is almost essential for writing a crawler. Thankfully, we can avoid all of this mess by taking the other route: APIs.

Fun With APIs!

An Application Programming Interface, or API for short, is sort of like functions as a service. Let’s say you want to look up the current temperature. For this, you visit sites such as weather.com or AccuWeather. You enter your city or zip code, and the website shows you the current weather and the forecast, along with nice animations, graphs and symbols. For humans, this is intuitive. But if you want to write a program that turns up the heater based on the current temperature, all this extra information is quite useless. Instead, we simply send a plain GET request to a weather API such as AccuWeather’s with our location details, and it returns the weather in machine-readable JSON format. Easy-peasy! Check out this blog, which explains in detail what an API and JSON are. And if you are interested, this is the API for AccuWeather.
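To make the heater example concrete, here is a minimal sketch of the "program" side: it takes the JSON text an API might send back and turns it into a decision. The payload and its field names are invented for illustration and are not AccuWeather’s real schema; the function name `should_turn_on_heater` is hypothetical, and no network call is made.

```python
import json

# A made-up response body; the field names are hypothetical, not a real API schema.
response_text = '{"city": "Berlin", "temperature": {"value": 3.5, "unit": "C"}}'

def should_turn_on_heater(body: str, threshold_c: float = 18.0) -> bool:
    """Parse the JSON body and decide whether it is cold enough to heat."""
    data = json.loads(body)  # JSON text -> Python dict
    return data["temperature"]["value"] < threshold_c

print(should_turn_on_heater(response_text))  # True, since 3.5 < 18.0
```

No animations or graphs to scrape: the program reads exactly the one number it needs.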

Likewise, using the free Skyscanner API, we can request quotes for future flight routes. A query can simply ask which flights are available from A to B on a given day (in yyyy-mm-dd format). We get back results with the minimum price offered on Skyscanner. The API is available through RapidAPI, which acts as a broker (middleman). It can be used from several programming languages, and you can check it out here. I am sticking with Python, as it is a really good language for getting something working quickly. I could go on and on about design choices, but things will become clearer as we go. Let’s start coding!

To begin, I created a simple search for flights going from Berlin (BERL-sky) to London (LOND-sky) on 22 Jan using the requests library in Python. (Note that the headers contain your personal API key, which you get when signing up with RapidAPI.)

import requests, json
headers = {
    'x-rapidapi-host': "skyscanner-skyscanner-flight-search-v1.p.rapidapi.com",
    'x-rapidapi-key': "YOUR-CUSTOM-API-KEY"
    }

Now I am making a “GET” HTTP request to the URL endpoint stored in the myurl variable. We will get a JSON response back.

origin = "BERL-sky"
destination = "LOND-sky"
currancy = "EUR"
originCountry = "DE"
locale = "en-US"

# Path order: market country / currency / locale / origin / destination / date (yyyy-mm-dd)
myurl = ("https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/"
         + originCountry + "/" + currancy + "/" + locale + "/" + origin + "/" + destination + "/" + "2020-01-22")
response = requests.request("GET", myurl, headers=headers)
print(response.text)
{"Quotes":[{"QuoteId":1,"MinPrice":18.0,"Direct":true,"OutboundLeg":{"CarrierIds":[1090],"OriginId":82398,"DestinationId":82582,"DepartureDate":"2020-01-22T00:00:00"},"QuoteDateTime":"2019-12-27T11:03:00"}],"Places":[{"PlaceId":66270,"IataCode":"LTN","Name":"London Luton","Type":"Station","SkyscannerCode":"LTN","CityName":"London","CityId":"LOND","CountryName":"United Kingdom"},{"PlaceId":81678,"IataCode":"SEN","Name":"London Southend","Type":"Station","SkyscannerCode":"SEN","CityName":"London","CityId":"LOND","CountryName":"United Kingdom"},{"PlaceId":82398,"IataCode":"STN","Name":"London Stansted","Type":"Station","SkyscannerCode":"STN","CityName":"London","CityId":"LOND","CountryName":"United Kingdom"},{"PlaceId":82582,"IataCode":"SXF","Name":"Berlin Schoenefeld","Type":"Station","SkyscannerCode":"SXF","CityName":"Berlin","CityId":"BERL","CountryName":"Germany"}],"Carriers":[{"CarrierId":50441,"Name":"easyJet"},{"CarrierId":1090,"Name":"Ryanair"}],"Currencies":[{"Code":"EUR","Symbol":"€","ThousandsSeparator":".","DecimalSeparator":",","SymbolOnLeft":false,"SpaceBetweenAmountAndSymbol":true,"RoundingCoefficient":0,"DecimalDigits":2}]}

Cool! We got our first result. But there is a lot to unpack about what just happened! First, let’s have a look at what we requested. See the last field in myurl? That is the date we intend to fly. Before it come the origin and destination, which are specified in Skyscanner’s location format. Now, Berlin has 2 airports, Tegel and Schoenefeld, but with the API we can just use BERL-sky, which gives results for both! (London has even more!) These location codes are available through another endpoint, where you simply query for a location string and it returns a JSON list of possible Skyscanner locations.
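Since every field is just a path segment, a small helper makes the order explicit and keeps origin and destination from getting swapped. This is a minimal sketch, assuming the segment order market country / currency / locale / origin / destination / date used by the browse-quotes endpoint in this post; `browse_quotes_url` is a hypothetical helper name.

```python
from datetime import date

# Root of the browse-quotes endpoint on RapidAPI, as used in this post.
ROOT = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/"

def browse_quotes_url(origin_country, currency, locale, origin, destination, outbound_date):
    """Build a browse-quotes URL (hypothetical helper).

    Assumed segment order: market country / currency / locale /
    origin place / destination place / outbound date (yyyy-mm-dd).
    """
    if isinstance(outbound_date, date):  # also true for datetime and pandas Timestamp
        outbound_date = outbound_date.strftime("%Y-%m-%d")  # e.g. 2020-01-22
    return ROOT + "/".join([origin_country, currency, locale, origin, destination, outbound_date])

print(browse_quotes_url("DE", "EUR", "en-US", "BERL-sky", "LOND-sky", date(2020, 1, 22)))
```

Accepting both date objects and ready-made strings lets the same helper work with hand-typed dates and with the date ranges we loop over later.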

originCountry is the market (country) we are doing the search from. A little side note: changing this can drastically change the prices! I found the German prices to be the cheapest. However, changing the currency didn’t affect the prices.

Let’s parse and pretty-print the response JSON so we can clearly see what information we are getting back.

j = json.loads(response.text)  # equivalently: j = response.json()
print(json.dumps(j, indent=2))
{
  "Quotes": [
    {
      "QuoteId": 1,
      "MinPrice": 18.0,
      "Direct": true,
      "OutboundLeg": {
        "CarrierIds": [
          1090
        ],
        "OriginId": 82398,
        "DestinationId": 82582,
        "DepartureDate": "2020-01-22T00:00:00"
      },
      "QuoteDateTime": "2019-12-27T11:03:00"
    }
  ],
  "Places": [
    {
      "PlaceId": 66270,
      "IataCode": "LTN",
      "Name": "London Luton",
      "Type": "Station",
      "SkyscannerCode": "LTN",
      "CityName": "London",
      "CityId": "LOND",
      "CountryName": "United Kingdom"
    },
    {
      "PlaceId": 81678,
      "IataCode": "SEN",
      "Name": "London Southend",
      "Type": "Station",
      "SkyscannerCode": "SEN",
      "CityName": "London",
      "CityId": "LOND",
      "CountryName": "United Kingdom"
    },
    {
      "PlaceId": 82398,
      "IataCode": "STN",
      "Name": "London Stansted",
      "Type": "Station",
      "SkyscannerCode": "STN",
      "CityName": "London",
      "CityId": "LOND",
      "CountryName": "United Kingdom"
    },
    {
      "PlaceId": 82582,
      "IataCode": "SXF",
      "Name": "Berlin Schoenefeld",
      "Type": "Station",
      "SkyscannerCode": "SXF",
      "CityName": "Berlin",
      "CityId": "BERL",
      "CountryName": "Germany"
    }
  ],
  "Carriers": [
    {
      "CarrierId": 50441,
      "Name": "easyJet"
    },
    {
      "CarrierId": 1090,
      "Name": "Ryanair"
    }
  ],
  "Currencies": [
    {
      "Code": "EUR",
      "Symbol": "\u20ac",
      "ThousandsSeparator": ".",
      "DecimalSeparator": ",",
      "SymbolOnLeft": false,
      "SpaceBetweenAmountAndSymbol": true,
      "RoundingCoefficient": 0,
      "DecimalDigits": 2
    }
  ]
}


That is a lot of information! First, we have "Quotes", which gives us the cheapest quote for that day and route. The airlines, origin and destination are given as IDs, which have to be resolved by looking them up in the subsequent Carriers and Places fields. For example, "Places" is the JSON list of airports involved in this search on that day, and each quote’s OriginId and DestinationId point to a PlaceId in it.
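Resolving those IDs is just a dictionary lookup. A minimal sketch using a trimmed copy of the response above (only the fields we need); `describe_quotes` is a hypothetical helper name.

```python
import json

# A trimmed copy of the response shown above (only the fields we need).
sample = json.loads("""
{"Quotes": [{"MinPrice": 18.0, "Direct": true,
             "OutboundLeg": {"CarrierIds": [1090], "OriginId": 82398,
                             "DestinationId": 82582,
                             "DepartureDate": "2020-01-22T00:00:00"}}],
 "Places": [{"PlaceId": 82398, "Name": "London Stansted"},
            {"PlaceId": 82582, "Name": "Berlin Schoenefeld"}],
 "Carriers": [{"CarrierId": 50441, "Name": "easyJet"},
              {"CarrierId": 1090, "Name": "Ryanair"}]}
""")

def describe_quotes(result):
    """Turn the ID-based quotes of one response into readable strings."""
    # Build ID -> name lookup tables once per response.
    places = {p["PlaceId"]: p["Name"] for p in result.get("Places", [])}
    carriers = {c["CarrierId"]: c["Name"] for c in result.get("Carriers", [])}
    lines = []
    for quote in result.get("Quotes", []):
        leg = quote["OutboundLeg"]
        airlines = ", ".join(carriers[cid] for cid in leg["CarrierIds"])
        lines.append("%s -> %s with %s: %.2f EUR" % (
            places[leg["OriginId"]], places[leg["DestinationId"]], airlines, quote["MinPrice"]))
    return lines

print(describe_quotes(sample))
```

Using `.get(..., [])` means a response without quotes (for example, no flights that day) simply yields an empty list instead of a KeyError.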

But did you notice that we do not receive any departure time? Look at this:

       "DepartureDate": "2020-01-22T00:00:00"

Yes! This is what Skyscanner calls browsing for flights. Once we are more certain about the search, we have to make a different request to get more information. That request gives us more details and also the exact URL where this selection can be booked. That URL leads to the booking website, and our job is over.

For now, let’s focus on collecting the data. The simplest way I could imagine is:

Let’s continue and create two collections, one for origins and one for destinations. The best thing about the API is that you can use whole countries as a place! So we can use IT-sky for all the airports in Italy, ES-sky for all the airports in Spain, and so on! But if we do that, we include every airport in the country. For Spain, this means it also selects Palma (on Mallorca), where we don’t want to go (at least for now!). Let’s just use the airports that we mentioned earlier.

# Airports where we can fly from: Berlin
source_array = {"BERL-sky"} 

# Our destination airports: Madrid, Barcelona, Seville, Valencia
destination_array = {"MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"}

# (Note that technically these are Python sets, not arrays.)

# And to make our life easier
rootURL = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com/apiservices/browsequotes/v1.0/"

Now we loop through all possible source and destination pairs and request the results for each pair. We are still looking for one-way results for 22 January.

But we can already start making our algorithm smarter. Every response refers to airports by ID, which we then have to cross-reference with the Places field. Let’s create a simple Python dictionary that stores every airport ID together with its name. As Python dictionaries are hash maps, each lookup takes O(1) time in the average case. To top it off, let’s add print statements that print the exact airports and price instead of the whole response JSON.

airports = {}
for destination in destination_array:
    for source in source_array:
        myurl = rootURL + originCountry + "/" + currancy + "/" + locale + "/" + source + "/" + destination + "/" + "2020-01-22"
        response = requests.request("GET", myurl, headers=headers)
        temp = json.loads(response.text)

        # Only responses with a "Quotes" key contain flight data
        if "Quotes" in temp:
            for place in temp["Places"]:
                # Remember every airport: PlaceId -> name
                airports[place["PlaceId"]] = place["Name"]
            for quote in temp["Quotes"]:
                print("************")
                ori = quote["OutboundLeg"]["OriginId"]
                dest = quote["OutboundLeg"]["DestinationId"]
                # Look up the airport names in the dictionary
                print("Journey:  %s  --> %s" % (airports[ori], airports[dest]))
                print("Price: %s EUR" % quote["MinPrice"])

We get the following result back:

Journey:  Berlin Tegel  --> Barcelona
Price: 28.0 EUR
************
Journey:  Berlin Schoenefeld  --> Barcelona
Price: 56.0 EUR
************
Journey:  Berlin Schoenefeld  --> Seville
Price: 26.0 EUR
************
Journey:  Berlin Tegel  --> Valencia
Price: 30.0 EUR
************
Journey:  Berlin Schoenefeld  --> Madrid
Price: 23.0 EUR

Interesting! We have 5 flight options, and we can already see that flying to Madrid is the cheapest! Now, let’s say we want to fly on some date between 18 and 24 Jan. We will have to add another for loop that goes through all possible dates. And to ignore expensive flights, let’s add a maxbudget variable that sets our one-way budget to 40 €.

import time
import pandas as pd

source_begin_date = "2020-01-18"
source_end_date = "2020-01-24"
daterange = pd.date_range(source_begin_date, source_end_date)  # every day, both ends included
airports = {}
maxbudget = 40  # one-way budget in EUR
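If you would rather not pull in pandas just for the date range, the standard library can do the same job. A minimal sketch; `date_range` here is a hypothetical helper, and it yields `datetime.date` objects instead of pandas Timestamps (both support the `strftime` calls used in this post).

```python
from datetime import date, timedelta

def date_range(start, end):
    """Yield every date from start to end, both ends included (daily steps)."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

daterange = list(date_range(date(2020, 1, 18), date(2020, 1, 24)))
print(len(daterange), daterange[0], daterange[-1])  # 7 2020-01-18 2020-01-24
```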

I want to wrap the API calls in a class so we have a neat system. I know this is overkill! But bear with me, it might be useful later!

class findingCheapestFlights:
    """Thin wrapper around the Skyscanner browse-quotes endpoint."""

    def __init__(self, originCountry="DE", currency="EUR", locale="en-US",
                 rootURL="https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com"):
        self.currency = currency
        self.locale = locale
        self.rootURL = rootURL
        self.originCountry = originCountry
        self.headers = {}  # set via setHeaders() with the RapidAPI host and key

    def setHeaders(self, headers):
        self.headers = headers

    def browseQuotes(self, source, destination, date):
        # Path: market country / currency / locale / origin / destination / yyyy-mm-dd
        quoteRequestPath = "/apiservices/browsequotes/v1.0/"
        browseQuotesURL = (self.rootURL + quoteRequestPath + self.originCountry + "/"
                           + self.currency + "/" + self.locale + "/" + source + "/"
                           + destination + "/" + date.strftime("%Y-%m-%d"))
        response = requests.request("GET", url=browseQuotesURL, headers=self.headers)
        resultJSON = json.loads(response.text)
        return resultJSON
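Printing is fine for eyeballing, but to actually pick the cheapest option it helps to collect the results. Below is a minimal sketch of a hypothetical `cheapest_quotes` helper that keeps the filtering separate from the HTTP calls, so it can be tested with hand-made responses. The IDs and the 55 € quote in the sample data are made up; the 36 € and 19 € Madrid prices mirror the output shown in this post.

```python
def cheapest_quotes(results, maxbudget, airports):
    """Collect every quote under maxbudget, sorted by price.

    results:  iterable of (date_label, resultJSON) pairs, e.g. one per browseQuotes call
    airports: PlaceId -> name dictionary, updated in place from each response
    Returns a list of (price, date_label, origin_name, destination_name).
    """
    found = []
    for date_label, result in results:
        if "Quotes" not in result:  # e.g. an error message instead of quotes
            continue
        for place in result.get("Places", []):
            airports[place["PlaceId"]] = place["Name"]
        for quote in result["Quotes"]:
            if quote["MinPrice"] < maxbudget:
                leg = quote["OutboundLeg"]
                found.append((quote["MinPrice"], date_label,
                              airports[leg["OriginId"]], airports[leg["DestinationId"]]))
    return sorted(found)  # cheapest first; ties broken by date label, then names

# Hand-made responses (made-up IDs, for illustration only).
places = [{"PlaceId": 1, "Name": "Berlin Schoenefeld"}, {"PlaceId": 2, "Name": "Madrid"}]
responses = [
    ("18-Jan Sat", {"Quotes": [{"MinPrice": 36.0, "OutboundLeg": {"OriginId": 1, "DestinationId": 2}}],
                    "Places": places}),
    ("23-Jan Thu", {"Quotes": [{"MinPrice": 19.0, "OutboundLeg": {"OriginId": 1, "DestinationId": 2}},
                               {"MinPrice": 55.0, "OutboundLeg": {"OriginId": 1, "DestinationId": 2}}],
                    "Places": places}),
    ("24-Jan Fri", {"message": "no data"}),
]
airports = {}
for row in cheapest_quotes(responses, maxbudget=40, airports=airports):
    print(row)
```

Because the function never touches the network, it adds almost nothing to the compute time and can be checked without an API key.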

To analyze the performance of the code, I want to see where the program spends most of its time. This is known as benchmarking.

import time
cheapest_flight_finder = findingCheapestFlights()
cheapest_flight_finder.setHeaders(headers)

total_compute_time = 0.0
total_request_time = 0.0

function_start = time.time()
for single_date in daterange:
    for destination in destination_array:
        for source in source_array:
            request_start = time.time()
            resultJSON = cheapest_flight_finder.browseQuotes(source, destination, single_date)
            request_end = time.time()
            if "Quotes" in resultJSON:
                for place in resultJSON["Places"]:
                    # Add the airport to the dictionary
                    airports[place["PlaceId"]] = place["Name"]
                for quote in resultJSON["Quotes"]:
                    if quote["MinPrice"] < maxbudget:
                        print("************")
                        print(single_date.strftime("%d-%b %a"))
                        # New names, so the loop variable `source` is not overwritten
                        ori = quote["OutboundLeg"]["OriginId"]
                        dest = quote["OutboundLeg"]["DestinationId"]
                        # Look up the airports in the dictionary
                        print("Journey:  %s  --> %s" % (airports[ori], airports[dest]))
                        print("Price: %s EUR" % quote["MinPrice"])
            calculation_end = time.time()
            total_compute_time += calculation_end - request_end
            total_request_time += request_end - request_start
print("\nBenchmark Stats :")
print("Time spent in computing: %f seconds" % total_compute_time)
print("Time spent in requesting: %f seconds" % total_request_time)
print("Time spent in program: %f seconds" % (time.time() - function_start))

And we get back the following:


18-Jan Sat
Journey:  Berlin Schoenefeld  --> Barcelona
Price: 21.0 EUR
************
18-Jan Sat
Journey:  Berlin Schoenefeld  --> Seville
Price: 34.0 EUR
************
18-Jan Sat
Journey:  Berlin Schoenefeld  --> Valencia
Price: 23.0 EUR
************
18-Jan Sat
Journey:  Berlin Schoenefeld  --> Madrid
Price: 36.0 EUR
************
.
.
(a few more lines)
.
.
************
23-Jan Thu
Journey:  Berlin Schoenefeld  --> Madrid
Price: 19.0 EUR
************
24-Jan Fri
Journey:  Berlin Schoenefeld  --> Barcelona
Price: 26.0 EUR
************
24-Jan Fri
Journey:  Berlin Schoenefeld  --> Valencia
Price: 32.0 EUR
Benchmark Stats :
Time spent in computing: 0.013052 seconds
Time spent in requesting: 6.783518 seconds
Time spent in program: 6.797550 seconds

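Wow, our own code needs only about 0.013 seconds, while almost all of the 6.8 seconds is spent waiting for the API. Our program is pretty quick, but the API is a bit slow. Hmm... what if we could somehow make multiple requests at the same time?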
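Since nearly all the time goes into waiting on the network, threads are a natural fit for this kind of I/O-bound work. A minimal sketch with the standard library’s `concurrent.futures`, where the hypothetical `fake_request` stands in for a slow API call by simply sleeping; swapping in `cheapest_flight_finder.browseQuotes` would be the next step, keeping an eye on the API’s rate limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(route):
    """Stand-in for browseQuotes (hypothetical): just waits like a slow API call."""
    time.sleep(0.2)
    return route

routes = [("BERL-sky", d) for d in ["MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"]]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() runs the calls concurrently but returns results in input order
    results = list(pool.map(fake_request, routes))
elapsed = time.perf_counter() - start

print(results)
print("took %.2f s instead of about %.1f s sequentially" % (elapsed, 0.2 * len(routes)))
```

The four waits overlap, so the whole batch takes roughly as long as a single request.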


Continued in Part 2


In part 2 of the series, we will look at parallelization, Docker and MongoDB, and try out some other cool stuff. In the end, we will have our complete application ready for you to find your next trip! So stay tuned! Also, what do you think of the idea? Would you use it if it were turned into a website? What other features would you like to add? Or have you already found a website that does something like this? Let me know in the comments below or contact me via social links! Until next time! Ciao 👋

The Jupyter notebook up to this step is available here.


Part 2 of the series is now available!

Read it here


The complete project is also available on GitHub.