Jan 13, 2020

The Eurotrip Planner - Part 2

Holiday Project 2019

Part 2

If you haven’t read part 1 yet, what are you waiting for?! Read it here.

Hi there again! In the last post, we were able to get some results. But we noticed that our program spent the majority of its time waiting for responses from the API. Since the requests dominate the run time, it makes sense to issue them concurrently. So, let’s leverage threading in Python to run several requests at the same time. Thankfully, as our threads do not depend on each other, we can be chill about concurrency issues. Keep in mind that threading does not mean that our program will run on multiple cores: in standard CPython, the global interpreter lock lets only one thread execute Python code at a time. That is fine here, because a thread that is waiting on the network releases the lock and lets the others run.

import concurrent.futures, threading
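To see why threads help with waiting-bound work even on a single core, here is a small sketch. It does not touch the real API: `fake_request` is a made-up stand-in that just sleeps, the way a thread would sit idle while waiting for a response, and the delay of 0.2 seconds is an arbitrary assumption.

```python
import threading
import time


def fake_request(delay, results, index):
    """Hypothetical stand-in for a network call: the thread only waits."""
    time.sleep(delay)
    results[index] = f"response {index}"


def run_sequential(n, delay):
    """Run n fake requests one after another and return (seconds, results)."""
    results = [None] * n
    start = time.perf_counter()
    for i in range(n):
        fake_request(delay, results, i)
    return time.perf_counter() - start, results


def run_threaded(n, delay):
    """Run n fake requests in n threads and return (seconds, results)."""
    results = [None] * n
    threads = [
        threading.Thread(target=fake_request, args=(delay, results, i))
        for i in range(n)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every thread has finished
    return time.perf_counter() - start, results


sequential_time, sequential_results = run_sequential(5, 0.2)
threaded_time, threaded_results = run_threaded(5, 0.2)
print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The sequential version pays for every wait in turn, while the threaded version overlaps them, so its run time stays close to that of a single request.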

Another improvement is to stop sending the same setup information with every request ( response = requests.request("GET", url = browseQuotesURL, headers = self.headers) ) and create a session instead. The session is stored on our object and remembers the headers, so for each subsequent request we only have to pass the URL. A requests session also reuses the underlying connection to the same host, which saves the cost of setting up a new connection on every call. This will also be helpful in the future if we want to get the URL for the ticket purchase.

class finder:
    
    def __init__(self, originCountry = "DE", currency = "EUR", locale = "en-US", rootURL="https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com"):
        self.currency = currency
        self.locale =  locale
        self.rootURL = rootURL
        self.originCountry = originCountry
        self.airports = {}
        
    def setHeaders(self, headers):
        self.headers =  headers
        self.createSession()

    # Create a session
    def createSession(self):
        self.session = requests.Session() 
        self.session.headers.update(self.headers)
        return self.session
        
    def browseQuotes(self, source, destination, date):
        quoteRequestPath = "/apiservices/browsequotes/v1.0/"
        browseQuotesURL = self.rootURL + quoteRequestPath + self.originCountry + "/" + self.currency + "/" + self.locale + "/" + source + "/" + destination + "/" + date.strftime("%Y-%m-%d")
        # Use the same session to request again and again
        response = self.session.get(browseQuotesURL)
        resultJSON = response.json()
        self.printResult(resultJSON, date)

    # A bit more elegant print
    def printResult(self, resultJSON, date):
        if "Quotes" in resultJSON:
            # Remember the airport names so that we can translate the IDs used in the quotes
            for place in resultJSON["Places"]:
                self.airports[place["PlaceId"]] = place["Name"]
            for quote in resultJSON["Quotes"]:
                source = quote["OutboundLeg"]["OriginId"]
                dest = quote["OutboundLeg"]["DestinationId"]
                print(date.strftime("%d-%b %a") + " | " + "%s  --> %s" % (self.airports[source], self.airports[dest]) + " | " + "%s %s" % (quote["MinPrice"], self.currency))

As this is a new Jupyter Notebook, let’s dump all our variables from the previous code here:

import requests, json, timeit, time, datetime, dateutil, osmapi
import calendar
import pandas as pd

rootURL = "https://skyscanner-skyscanner-flight-search-v1.p.rapidapi.com"
originCountry = "DE"
currency = "EUR"
locale = "en-US"

source_begin_date = "2020-01-18"
source_end_date =  "2020-01-24"  
daterange_source = pd.date_range(source_begin_date, source_end_date)
airports = { }
source_array = {"BERL-sky"} 
destination_array = {"MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"}


headers = {
    'x-rapidapi-host': "skyscanner-skyscanner-flight-search-v1.p.rapidapi.com",
    'x-rapidapi-key': "YOUR-RAPIDAPI-KEY"  # replace with your own key from RapidAPI
    }

Now here comes the multi-threading part. The concurrent.futures library handles it for us. Using ThreadPoolExecutor as a context manager, we just submit our task and its parameters to a pool of threads. The executor schedules the tasks as they arrive and runs them concurrently, and the with block waits for all of them to finish before moving on. Easiest. Multithreading. Ever!

cheapest_flight_finder2 = finder()
cheapest_flight_finder2.setHeaders(headers)

function_start = time.time()

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    for single_date in daterange_source:
        for destination in destination_array:
            for source in source_array:
                request_start = time.time()
                executor.submit(cheapest_flight_finder2.browseQuotes,source, destination,single_date)

print("\nBenchmark Stats :")
print("Time spent in program: %f seconds"%(time.time()-function_start))

18-Jan Sat | Berlin Schoenefeld  --> Seville | 49.0 EUR23-Jan Thu | Berlin Tegel  --> Madrid | 49.0 EUR
23-Jan Thu | Berlin Tegel  --> Madrid | 53.0 EUR
24-Jan Fri | Berlin Schoenefeld  --> Valencia | 44.0 EUR
19-Jan Sun | Berlin Tegel  --> Madrid | 49.0 EUR

21-Jan Tue | Berlin Schoenefeld  --> Madrid | 44.0 EUR
21-Jan Tue | Berlin Tegel  --> Madrid | 46.0 EUR
22-Jan Wed | Berlin Tegel  --> Barcelona | 48.0 EUR
22-Jan Wed | Berlin Schoenefeld  --> Barcelona | 95.0 EUR
22-Jan Wed | Berlin Schoenefeld  --> Seville | 45.0 EUR22-Jan Wed | Berlin Schoenefeld  --> Valencia | 45.0 EUR22-Jan Wed | Berlin Tegel  --> Madrid | 44.0 EUR

24-Jan Fri | Berlin Tegel  --> Madrid | 44.0 EUR
18-Jan Sat | Berlin Tegel  --> Madrid | 45.0 EUR
21-Jan Tue | Berlin Schoenefeld  --> Seville | 118.0 EUR20-Jan Mon | Berlin Tegel  --> Barcelona | 57.0 EUR21-Jan Tue | Berlin Tegel  --> Valencia | 36.0 EUR
20-Jan Mon | Berlin Schoenefeld  --> Valencia | 67.0 EUR

24-Jan Fri | Berlin Schoenefeld  --> Barcelona | 49.0 EUR
24-Jan Fri | Berlin Tegel  --> Barcelona | 66.0 EUR

21-Jan Tue | Berlin Tegel  --> Seville | 50.0 EUR
24-Jan Fri | Berlin Schoenefeld  --> Seville | 58.0 EUR
20-Jan Mon | Berlin Schoenefeld  --> Barcelona | 74.0 EUR
19-Jan Sun | Berlin Tegel  --> Seville | 74.0 EUR
21-Jan Tue | Berlin Schoenefeld  --> Valencia | 236.0 EUR

18-Jan Sat | Berlin Schoenefeld  --> Valencia | 62.0 EUR23-Jan Thu | Berlin Tegel  --> Barcelona | 53.0 EUR
19-Jan Sun | Berlin Tegel  --> Barcelona | 100.0 EUR
23-Jan Thu | Berlin Tegel  --> Barcelona | 60.0 EUR

20-Jan Mon | Berlin Tegel  --> Madrid | 46.0 EUR19-Jan Sun | Berlin Schoenefeld  --> Barcelona | 73.0 EUR

23-Jan Thu | Berlin Tegel  --> Valencia | 44.0 EUR
19-Jan Sun | Berlin Schoenefeld  --> Valencia | 60.0 EUR18-Jan Sat | Berlin Schoenefeld  --> Barcelona | 71.0 EUR
18-Jan Sat | Berlin Schoenefeld  --> Barcelona | 204.0 EUR

23-Jan Thu | Berlin Tegel  --> Seville | 58.0 EUR
21-Jan Tue | Berlin Schoenefeld  --> Barcelona | 103.0 EUR
21-Jan Tue | Berlin Tegel  --> Barcelona | 46.0 EUR

Benchmark Stats :
Time spent in program: 2.338449 seconds

Actual photo of our scheduler running our code

Whoa, that’s a lot faster! But just how fast? Let’s compare it with the same code using a single thread.

function_start = time.time()

for single_date in daterange_source:
    for destination in destination_array:
        for source in source_array:
            request_start = time.time()
            cheapest_flight_finder2.browseQuotes(source, destination,single_date)

print("\nBenchmark Stats :")
print("Time spent in program: %f seconds"%(time.time()-function_start))

18-Jan Sat | Berlin Schoenefeld  --> Seville | 49.0 EUR
18-Jan Sat | Berlin Schoenefeld  --> Barcelona | 204.0 EUR
18-Jan Sat | Berlin Schoenefeld  --> Barcelona | 71.0 EUR
18-Jan Sat | Berlin Schoenefeld  --> Valencia | 62.0 EUR
18-Jan Sat | Berlin Tegel  --> Madrid | 45.0 EUR
19-Jan Sun | Berlin Tegel  --> Seville | 74.0 EUR
19-Jan Sun | Berlin Schoenefeld  --> Barcelona | 73.0 EUR
19-Jan Sun | Berlin Tegel  --> Barcelona | 100.0 EUR
19-Jan Sun | Berlin Schoenefeld  --> Valencia | 60.0 EUR
19-Jan Sun | Berlin Tegel  --> Madrid | 49.0 EUR
20-Jan Mon | Berlin Schoenefeld  --> Barcelona | 74.0 EUR

----

23-Jan Thu | Berlin Tegel  --> Madrid | 49.0 EUR
24-Jan Fri | Berlin Schoenefeld  --> Seville | 58.0 EUR
24-Jan Fri | Berlin Tegel  --> Barcelona | 66.0 EUR
24-Jan Fri | Berlin Schoenefeld  --> Barcelona | 49.0 EUR
24-Jan Fri | Berlin Schoenefeld  --> Valencia | 44.0 EUR
24-Jan Fri | Berlin Tegel  --> Madrid | 44.0 EUR

Benchmark Stats :
Time spent in program: 16.313295 seconds

That’s a tremendous improvement. Our program is about 7 times faster (16.3 seconds versus 2.3 seconds)! You have to do a bit of trial and error with the max_workers value to find the optimum number of threads. For me, 32 was the best.
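The trial and error can itself be scripted. Below is a minimal sketch that times a few pool sizes; `fake_request` and its 0.05 second delay are assumptions standing in for the real API call, so the numbers only show the shape of the experiment, not what you will measure against the actual service.

```python
import concurrent.futures
import time


def fake_request(_):
    """Hypothetical stand-in for browseQuotes: wait as if on the network."""
    time.sleep(0.05)


def time_pool(max_workers, n_tasks=16):
    """Return the seconds needed to finish n_tasks with the given pool size."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces the lazy map iterator, so every task has completed
        list(executor.map(fake_request, range(n_tasks)))
    return time.perf_counter() - start


timings = {workers: time_pool(workers) for workers in (1, 2, 4, 8, 16)}
for workers, seconds in timings.items():
    print(f"max_workers={workers:2d}: {seconds:.2f}s")

best = min(timings, key=timings.get)
print("fastest pool size in this run:", best)
```

With a real API, the best value also depends on the network latency and on any rate limit the service enforces, so it is worth repeating the measurement a few times.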

Also, did you notice the text glitches in our multi-thread version? Our threads run independently, so two of them can call print at the exact same time. Since print writes the text and the trailing newline as separate steps, another thread’s line can slip in between, which leaves two lines glued together and an empty line further down. If our threads needed real synchronisation, we would have to deal with a lot of stuff like semaphores and mutexes. But, we are safe for now!
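If the glued lines ever bother you, one mutex is enough to tidy them up. This is a minimal sketch and not part of the project code: `safe_print` and `print_lock` are names made up for the example, and the output goes into an in-memory buffer only so that the result can be inspected afterwards.

```python
import concurrent.futures
import io
import threading

# One lock shared by all threads; only its holder may print
print_lock = threading.Lock()


def safe_print(*args, file=None):
    """Print while holding the lock, so text and newline stay together."""
    with print_lock:
        print(*args, file=file)  # file=None means standard output


buffer = io.StringIO()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    for i in range(50):
        executor.submit(safe_print, f"quote {i}", file=buffer)

lines = buffer.getvalue().splitlines()
print(len(lines), "complete lines written")
```

The lines can still arrive in any order, since the threads race for the lock, but each one is written as a whole.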

Now that we have finished our thread on threads, we move back to our application.

Databases and Docker

You can see that we are generating lots of data for a single request. Now we also have to operate on it to find the cheapest trip. We can use Python’s data structures or even pandas. But why not use what is actually used in real world products? Databases!

Databases are an organized way of holding data. Think of something like an Excel spreadsheet, but built for easy access from other programs. Like Pokémon, there are hundreds of different databases. But there are 2 major types: relational ones (SQL) and non-relational ones (NoSQL). If you want to know more about their differences and how they function, this blog has explained it in much detail. In our case, we are working with JSON anyway, and MongoDB stores JSON-like documents, so it is a natural fit for that kind of data. I could go on and on about why we chose a particular database. For now, let’s use MongoDB.

We can install it like a normal program using .exe or .deb packages. But let’s consider a practical scenario. If this reaches production, it will likely be running on a server. And if it, god forbid, gets popular(!), we might be getting a ton of requests per second from different parts of the globe. In this case, if our single process fails, everything will just stop.

The clever way of overcoming this is to use microservices, i.e. tools like Docker and Kubernetes. Now, this is definitely a topic for another blog. But to show you how easy it is to set up a service, let’s use MongoDB’s official Docker image. Follow the installation process for Docker. Once you have done that, just run the following command in the terminal:

docker run -d --name eurotrip-planner-mongo -p 27017:27017 mongo:latest

This will pull the latest MongoDB image and all its required parts, create a container and then start a MongoDB server! All in one line! You can check whether it is working using a MongoDB client application. Our setup is done!
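If you don’t have a client application at hand, a quick check from Python works too. This is only a sketch: the helper name `is_port_open` is made up, and it assumes the container’s port 27017 is reachable on localhost (for example, published with -p 27017:27017). It only tells us that something accepts connections on that port, not that it is a healthy MongoDB.

```python
import socket


def is_port_open(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


print("MongoDB port reachable:", is_port_open())
```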

Now, I am cheating a bit, but I already wrote a wrapper driver module for MongoDB. You can read the docs on GitHub but it is fairly easy to understand. I am just going to use it directly here.

Our MongoDB parameters are here. We create separate collections (MongoDB’s counterpart to tables) for our outgoing and incoming flights, plus one for places.

import wrapymongo
authdb = "admin"
mongodbport = "27017"
host = "localhost"
link = "mongodb://" + host + ":" + mongodbport
database =  "SkyScanner"
outgoingTable = "Outgoing"
incomingTable = "Incoming"
placesTable = "Places"

I am adding a small factory function that returns a ready-to-use instance of our MongoDB wrapper class.

# Function to make wrapymongo object
def makeObject(link,dbName = "SkyScanner", dbCollection="test"):
    mdbobject = wrapymongo.driver(link)
    mdbobject.defineDB(dbName)
    mdbobject.defineCollection(dbCollection)
    return mdbobject

We instantiate our objects and clear their contents if they have any.

mdbOutgoing = makeObject(link,dbName = database,dbCollection = outgoingTable)
mdbPlaces = makeObject(link,dbName = database,dbCollection = placesTable)
mdbIncoming = makeObject(link,dbName = database,dbCollection = incomingTable)

mdbOutgoing.dropCollection()
mdbPlaces.dropCollection()
mdbIncoming.dropCollection()

And we initialize our finders and our source and destination sets as usual. I saved our finder class in another file called flightfinder.py to make it easier to work with.

import flightfinder as ff

airports = { }


outgoing_flight_finder = ff.finder()
outgoing_flight_finder.setHeaders(headers)

incoming_flight_finder = ff.finder()
incoming_flight_finder.setHeaders(headers)

source_array = {"BERL-sky"} 
destination_array = {"MAD-sky", "BCN-sky", "SVQ-sky", "VLC-sky"}

Let it rip!

processing_start = time.time()

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    for single_date in daterange_source:
        for destination in destination_array:
            for source in source_array:
                request_start = time.time()
                executor.submit(outgoing_flight_finder.browseQuotes,source, destination,single_date)

outgoingQuotes = outgoing_flight_finder.getQuotes()

for quote in outgoingQuotes:
    mdbOutgoing.insertRecords(quote)                

airports.update(outgoing_flight_finder.getAirports())

At this point, you can open the previously installed MongoDB client application and see that the database now has a collection named Outgoing with all the entries related to it! We are just adding all the quotes, one by one, to the database. Now, let’s do the same for the “coming back” part of the trip.

destination_begin_date = "2020-01-24"
destination_end_date =  "2020-01-30"  
daterange_destination = pd.date_range(destination_begin_date, destination_end_date)

# We reverse the arrays here
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    for single_date in daterange_destination:
        for destination in source_array:
            for source in destination_array:
                request_start = time.time()
                executor.submit(incoming_flight_finder.browseQuotes,source, destination,single_date)

incomingQuotes = incoming_flight_finder.getQuotes()

for quote in incomingQuotes:
    mdbIncoming.insertRecords(quote)      

airports.update(incoming_flight_finder.getAirports())
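The getQuotes() and getAirports() methods used above live in flightfinder.py and are not shown in this post. As a rough idea of what such bookkeeping could look like, here is a sketch under my own assumptions: `QuoteCollector`, `add_result` and the sample responses are all hypothetical, and only the "Places", "Quotes" and "OutboundLeg" keys are taken from the printResult code earlier. Because the worker threads all write into the same containers, the sketch guards them with a lock.

```python
import concurrent.futures
import threading


class QuoteCollector:
    """Hypothetical stand-in for the parts of finder not shown in this post."""

    def __init__(self):
        self._lock = threading.Lock()
        self._quotes = []
        self._airports = {}

    def add_result(self, result_json):
        # Called from worker threads, so guard the shared containers
        with self._lock:
            for place in result_json.get("Places", []):
                self._airports[place["PlaceId"]] = place["Name"]
            self._quotes.extend(result_json.get("Quotes", []))

    def getQuotes(self):
        with self._lock:
            return list(self._quotes)  # copy, so callers cannot disturb us

    def getAirports(self):
        with self._lock:
            return dict(self._airports)


# Made-up responses in the shape that printResult expects
fake_responses = [
    {"Places": [{"PlaceId": 1, "Name": "Berlin Tegel"},
                {"PlaceId": 2, "Name": "Madrid"}],
     "Quotes": [{"MinPrice": 44.0,
                 "OutboundLeg": {"OriginId": 1, "DestinationId": 2}}]},
    {"Places": [{"PlaceId": 1, "Name": "Berlin Tegel"},
                {"PlaceId": 3, "Name": "Valencia"}],
     "Quotes": [{"MinPrice": 36.0,
                 "OutboundLeg": {"OriginId": 1, "DestinationId": 3}}]},
]

collector = QuoteCollector()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    for response in fake_responses:
        executor.submit(collector.add_result, response)

print(len(collector.getQuotes()), "quotes,", len(collector.getAirports()), "airports")
```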

At this point, everything we need is stored in the database. We just have to make sense of all the data. So first, let’s get the 20 cheapest entries from each collection. We call sortRecords with MinPrice as the key, sorted from lowest to highest (indicated by 1), and a limit of 20.

# Get the 20 cheapest quotes from each collection
cheapestOutgoingFlights = mdbOutgoing.sortRecords([('MinPrice', 1)], 20)

cheapestIncomingFlights = mdbIncoming.sortRecords([('MinPrice', 1)], 20)
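For intuition, here is what such a "sort ascending and keep the first n" query amounts to on a plain Python list. This is a sketch, not the wrapper’s implementation: I am assuming sortRecords behaves as described above, the helper `cheapest` is made up, and so are the sample records and their "Label" field.

```python
import heapq

# Made-up quotes; only the MinPrice key mirrors the real records
quotes = [
    {"Label": "Berlin -> Madrid", "MinPrice": 49.0},
    {"Label": "Berlin -> Valencia", "MinPrice": 36.0},
    {"Label": "Berlin -> Seville", "MinPrice": 118.0},
    {"Label": "Berlin -> Barcelona", "MinPrice": 46.0},
]


def cheapest(records, key, limit):
    """Return the `limit` records with the smallest value under `key`, ascending."""
    return heapq.nsmallest(limit, records, key=lambda record: record[key])


for quote in cheapest(quotes, "MinPrice", 2):
    print(quote["Label"], quote["MinPrice"])
```

The database version does the same work on the server side, which starts to matter once the collection no longer fits comfortably in memory.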

To get the cheapest trip, we check for all possible combinations between incoming and outgoing quotes. Let’s just combine the data first and print it.

finalList = []
for incomingQuote in cheapestIncomingFlights:
    for outgoingQuote in cheapestOutgoingFlights:
        # Both kinds of quotes store their flight under "OutboundLeg"
        outgoingLeg = outgoingQuote["OutboundLeg"]
        incomingLeg = incomingQuote["OutboundLeg"]
        finalList.append({
            "TotalPrice": incomingQuote["MinPrice"] + outgoingQuote["MinPrice"],
            "TakeOff1": airports[outgoingLeg["OriginId"]],
            "Land1": airports[outgoingLeg["DestinationId"]],
            "TakeOff2": airports[incomingLeg["OriginId"]],
            "Land2": airports[incomingLeg["DestinationId"]],
            "Date1": outgoingLeg["DepartureDate"],
            "Date2": incomingLeg["DepartureDate"],
        })

print(finalList[:10])

[{'TotalPrice': 54.0, 'TakeOff1': 'Berlin Tegel', 'Land1': 'Valencia', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-21T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 54.0, 'TakeOff1': 'Berlin Tegel', 'Land1': 'Valencia', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-21T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Schoenefeld', 'Land1': 'Madrid', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-21T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Tegel', 'Land1': 'Madrid', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-24T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Tegel', 'Land1': 'Valencia', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-23T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Tegel', 'Land1': 'Madrid', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-22T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Schoenefeld', 'Land1': 'Valencia', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-24T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Schoenefeld', 'Land1': 'Madrid', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-21T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Schoenefeld', 'Land1': 'Valencia', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-24T00:00:00', 'Date2': '2020-01-28T00:00:00'}, {'TotalPrice': 62.0, 'TakeOff1': 'Berlin Tegel', 'Land1': 'Valencia', 'TakeOff2': 'Seville', 'Land2': 'Berlin Schoenefeld', 'Date1': '2020-01-23T00:00:00', 'Date2': '2020-01-28T00:00:00'}]

Awesome!! Now we just leverage MongoDB again: we put all these records in the database and then sort them by their total price.

mdbFinal = makeObject(link, dbName=database, dbCollection="FinalDatabase")
mdbFinal.dropCollection()
mdbFinal.insertRecords(finalList)

print("The Top ten cheapest flights are:")
topQuotes = mdbFinal.sortRecords([('TotalPrice', 1)], 10)
for quote in topQuotes:
    print("\n*****\nOnwards: " + quote["Date1"] + " " + quote["TakeOff1"] + " --> " + quote["Land1"] + " \nReturn: " +
          quote["Date2"] + " " + quote["TakeOff2"] + " --> " + quote["Land2"] + " \n \t   | " + "%s EUR" % quote["TotalPrice"])

The Top ten cheapest flights are:

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-28T00:00:00 Seville --> Berlin Schoenefeld 
 	   | 54.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-28T00:00:00 Seville --> Berlin Schoenefeld 
 	   | 54.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-28T00:00:00 Seville --> Berlin Schoenefeld 
 	   | 54.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-28T00:00:00 Seville --> Berlin Schoenefeld 
 	   | 54.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-30T00:00:00 Valencia --> Berlin Schoenefeld 
 	   | 55.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-30T00:00:00 Valencia --> Berlin Schoenefeld 
 	   | 55.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-30T00:00:00 Barcelona --> Berlin Schoenefeld 
 	   | 55.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-30T00:00:00 Barcelona --> Berlin Schoenefeld 
 	   | 55.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-29T00:00:00 Valencia --> Berlin Schoenefeld 
 	   | 55.0 EUR

*****
Onwards: 2020-01-21T00:00:00 Berlin Tegel --> Valencia 
Return: 2020-01-29T00:00:00 Valencia --> Berlin Schoenefeld 
 	   | 55.0 EUR

Woohoo! We have finally received what we were looking for. Now I have some cool options to choose my trip from.

If you have made it to this point, then congratulations!! You have conquered the mountain and the summit is yours! Now it’s time for retrospection. Was this really necessary, or could you have just flown to the top in a helicopter? (Viz: could you just have used Google Flights over and over to do this?) Yes!! Is our path (solution) the most elegant and the easiest of all? Of course not! Does it even make sense to use Docker and MongoDB for such small tasks? Mostly not! Is it over-engineered? You bet! Is it at least useful? Mostly not, as we don’t even get the flight timings!

But then, even in this toy problem, we went through the major steps of software design. We developed a real, scalable system which gives us some results. It may seem useless to just get the cheapest flights, but we can easily extend it to any number of parameters that we want. We could sort by dates, airlines or stopovers and create a real product. This was a somewhat real problem, and we found a real solution. I think that’s a win!
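As one illustration of such an extension, here is a sketch that pairs outgoing and return quotes in memory, keeps only trips with a minimum number of nights, and ranks them by total price. Everything in it is an assumption made for the example: the function `rank_trips`, the flattened "From", "To" and "Date" fields and the sample prices are invented, and only the date format matches the records printed above.

```python
from datetime import datetime
from itertools import product

# Made-up, flattened quotes; the real records nest these under "OutboundLeg"
outgoing = [
    {"MinPrice": 36.0, "From": "Berlin Tegel", "To": "Valencia", "Date": "2020-01-21T00:00:00"},
    {"MinPrice": 44.0, "From": "Berlin Tegel", "To": "Madrid", "Date": "2020-01-24T00:00:00"},
]
incoming = [
    {"MinPrice": 18.0, "From": "Seville", "To": "Berlin Schoenefeld", "Date": "2020-01-28T00:00:00"},
    {"MinPrice": 19.0, "From": "Valencia", "To": "Berlin Schoenefeld", "Date": "2020-01-24T00:00:00"},
]


def rank_trips(outgoing, incoming, min_nights=0, limit=10):
    """Pair every outgoing quote with every return quote, cheapest first."""
    trips = []
    for out, back in product(outgoing, incoming):
        departure = datetime.fromisoformat(out["Date"])
        comeback = datetime.fromisoformat(back["Date"])
        nights = (comeback - departure).days
        if nights < min_nights:
            continue  # stay is too short, skip this combination
        trips.append({
            "TotalPrice": out["MinPrice"] + back["MinPrice"],
            "Nights": nights,
            "Onwards": out,
            "Return": back,
        })
    return sorted(trips, key=lambda trip: trip["TotalPrice"])[:limit]


for trip in rank_trips(outgoing, incoming, min_nights=5):
    print(trip["Onwards"]["To"], "then back from", trip["Return"]["From"],
          "|", trip["Nights"], "nights |", trip["TotalPrice"], "EUR")
```

Adding another criterion, such as a maximum price or a preferred airline, would just be one more condition inside the loop.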

THE END

The source code of this project is available on GitHub. If you would like to contribute to it, just send me a pull request. Also, if you think this project can be developed into a real website that a lot of people would like to use, hit me up!!

Also, feel free to send me questions and corrections for this blog. Also also, what do you think the next topic should be? Feel free to message me on social platforms or email me!

The Jupyter notebook for the second part is available here

The much-awaited GitHub code is here