Wednesday, July 8, 2015

Yellowbrick of the past

A couple of years ago, Yellowbrick started tracking the races I participated in. At the time, they had issues because too many people were connecting to their services.

I decided to take a look at their implementation. Among all the scripts and content loaded in the browser was a huge XML file containing the positions of all the boats; this file could exceed 10 MB, so no wonder they had load issues...

At the time, we wanted to know our competitors' positions. To do that, we could either:
- Stay close to shore and hope to get cell phone data
- Pay $5000 for a top-notch satellite system
- Pay $75 for a very basic satellite phone plan and work a little bit

We decided to pay $75 and work a little bit. The idea was to build a server that would respond to our requests with only the information we needed. For that, we had to understand the XML format they were using.

The XML data could be fetched from the Yellowbrick API like this: http://gae.yb.tl/Flash/nb2012/TeamSetup/?rnd=99912 or like this: http://gae.yb.tl/Flash/nb2012/LatestPositions/?rnd=31073
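
As an illustration, here is a minimal sketch of parsing such a positions file in Python. The element and attribute names below are invented for the example; they are not Yellowbrick's actual schema:

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a positions file; the real one could exceed 10 MB.
SAMPLE = """<positions>
  <boat id="42">
    <pos lat="47.5000" lon="-3.1000" at="1342000000"/>
  </boat>
</positions>"""

def parse_positions(xml_text):
    """Return {boat_id: [(lat, lon, unix_time), ...]} from a positions file."""
    root = ET.fromstring(xml_text)
    fleet = {}
    for boat in root.findall("boat"):
        fleet[boat.get("id")] = [
            (float(p.get("lat")), float(p.get("lon")), int(p.get("at")))
            for p in boat.findall("pos")
        ]
    return fleet

print(parse_positions(SAMPLE))  # {'42': [(47.5, -3.1, 1342000000)]}
```
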
They had many files there: Team setup, race setup, latest positions, all positions, weather...

After extracting the useful information from that XML data, we could either render it in a webpage or send it by email.
Using that system, we avoided downloading megabytes of useless data, saving tremendous amounts of time and money.
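
To give an idea of the server-side filtering, here is a sketch of the kind of reduction we did; the boat ids and the one-line summary format are made up for the example:

```python
def fleet_summary(fleet, wanted_ids):
    """Keep only the competitors we care about and emit one compact line per
    boat (latest fix only) -- small enough to email over a satellite link."""
    lines = []
    for bid in wanted_ids:
        lat, lon, at = fleet[bid][-1]  # most recent position fix
        lines.append(f"{bid} {lat:.4f} {lon:.4f} @{at}")
    return "\n".join(lines)

fleet = {"42": [(47.5, -3.1, 1342000000)],
         "7":  [(47.6, -3.2, 1342000060)]}
print(fleet_summary(fleet, ["42"]))  # 42 47.5000 -3.1000 @1342000000
```
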

I also made a little chart plotter to follow the boats around us:


This was pretty good, but later in 2014 Yellowbrick moved to a new binary format. This was a very smart move, as their new way of exchanging data is very efficient.

The race definition was stored as JSON, while the positions and team information were binary. They created a specific binary format that was fortunately easy to understand and parse. From there it was again easy to send only the fleet information we needed and avoid downloading the entire binary file.
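
For illustration, here is how a fixed-size binary record format can be decoded in Python with the struct module. The layout below (big-endian: uint32 timestamp, two int32 coordinates scaled by 1e5, uint16 device id, uint16 flags) is entirely hypothetical; the real Yellowbrick format differed:

```python
import struct

# Hypothetical 16-byte record, for illustration only:
# uint32 unix time, int32 lat*1e5, int32 lon*1e5, uint16 device, uint16 flags
RECORD = struct.Struct(">IiiHH")

def decode_positions(blob):
    """Decode a buffer of fixed-size records into (device, time, lat, lon)."""
    return [(dev, ts, lat / 1e5, lon / 1e5)
            for ts, lat, lon, dev, _flags in RECORD.iter_unpack(blob)]

blob = RECORD.pack(1342000000, 4750000, -310000, 42, 0)
print(decode_positions(blob))  # [(42, 1342000000, 47.5, -3.1)]
```

A format like this stores a position in 16 bytes instead of a verbose XML element, which is why the switch cut their bandwidth so drastically.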

I have to say that Yellowbrick did a nice job of fixing their issues fast.
We had a clear advantage compared to everybody else by retrieving only chunks of this public data instead of the whole thing, which saved us a lot of time over our cheap satellite connection.

Now Yellowbrick has made another smart move: they published an API and provide low-bandwidth text files that contain the data you need.

All of the work described above is of limited use now that they provide these new ways of accessing the data, but luckily we have other homemade tools this year :)

3 comments:

  1. Ciao David,

    I would like to access all data from YB for one race.
    I tried using http://live.adventuretracking.com/xml/4kvda2016?n=10
    and scaling n to duration of race ... but there's probably a timeout :(

    Do you know any other way? I am a noob.
    Thanks for any hint!

    1. It may be a few things. When you open that link in your browser, does it work? Because from my computer it does, and I receive the full XML content.

      If it does not work when you call it programmatically, it may be because you have to set the User-Agent in your HTTP request headers. This makes the server believe that you are in fact a browser and not a robot, and then it will respond. You should also indicate in that header that you support gzip decompression.

      A good way to know what works is to try it in your browser and use the Live HTTP Headers extension: https://chrome.google.com/webstore/detail/live-http-headers/iaiioopjkcekapmldfgbebdclcnpgnlo?hl=en

      For a bit of info on this in C#, have a look at: http://stackoverflow.com/questions/8278057/httpwebrequest-how-to-identify-as-a-browser and http://stackoverflow.com/questions/2815721/net-is-it-possible-to-get-httpwebrequest-to-automatically-decompress-gzipd-re
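
      The same idea in Python, as a rough sketch (the links above cover the C# side): set a browser-like User-Agent, advertise gzip support, and decompress the body yourself if the server used it.

```python
import gzip
import urllib.request

def maybe_gunzip(body, content_encoding):
    """Decompress the response body if the server sent it gzipped."""
    return gzip.decompress(body) if content_encoding == "gzip" else body

def fetch_as_browser(url):
    """Fetch a URL while identifying as a browser that accepts gzip."""
    req = urllib.request.Request(url, headers={
        "User-Agent": "Mozilla/5.0",  # look like a browser, not a robot
        "Accept-Encoding": "gzip",    # tell the server we can decompress
    })
    with urllib.request.urlopen(req) as resp:
        return maybe_gunzip(resp.read(), resp.headers.get("Content-Encoding"))
```
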

    2. Thanks a lot David!

      First of all, what I wrote works well for limited amounts of data.
      The problem I encounter is that there was a recording for every participant (about 600) every 5 seconds.
      As the race officially lasted 159 hours, the complete dataset has 600 times 114,480 recordings.
      (114,480 = 12 recordings per minute * 60 minutes per hour * 159 hours.)
      And when I try http://live.adventuretracking.com/xml/4kvda2016?n=114480 I get what I guess is a timeout.

      Is there a way to split my requests into portions?
      Like asking only for n[1000-2000]?

      Or a way to get all positions only for a specific device serial? e.g.:


      Thanks a lot for your help !
