Working with Omni has been an exhilarating experience. Across our company we have been impressed with how quickly the development team got to grips with our workflow, and designed innovative solutions to make it work better. From our initial concept Omni have created a bespoke tool which is beyond anything we could have imagined by ourselves. - Project Lead for SWNS

Omni Digital worked with SWNS to produce the world’s first real-time news syndication platform - a single source for news outlets to find copy, images and video.

Since its inception in 1970, SWNS has grown into the UK's largest news syndicator, with nine offices stretching from London to New York. Its teams produce, verify and distribute original and engaging copy, pictures and video around the clock.

The project was co-funded by SWNS and the Google Digital News Initiative (DNI) Innovation Fund.

The system aimed to solve three core problems within the news industry:

  • delivery of a single unit of publication-ready news, comprising images, copy and video
  • real-time attribution and sourcing of content
  • complete transparency of provenance of all news
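To make the idea of a "single unit" concrete, here is a minimal sketch of how such a package might be modelled in Python, the language most of the system is built in. The class and field names (`NewsPackage`, `Asset`, `credits`) and the `store://` URIs are hypothetical, not SWNS's actual schema; the point of the sketch is that attribution travels with every asset rather than being stored separately.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Asset:
    """One piece of media attached to a story (hypothetical schema)."""
    kind: str    # "image", "video" or "audio"
    uri: str     # where the stored file lives (hypothetical URI scheme)
    credit: str  # contributor to attribute for this asset


@dataclass
class NewsPackage:
    """A single unit of publication-ready news: copy plus its media assets."""
    headline: str
    copy: str
    contributor: str
    assets: list[Asset] = field(default_factory=list)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def credits(self) -> set[str]:
        """Everyone who must be attributed wherever this package is used."""
        return {self.contributor} | {a.credit for a in self.assets}

    def media_kinds(self) -> set[str]:
        """Which kinds of media travel with the copy."""
        return {a.kind for a in self.assets}


package = NewsPackage(
    headline="Storm floods seafront",
    copy="Waves breached the sea wall overnight.",
    contributor="J. Smith",
    assets=[Asset("image", "store://images/1.jpg", "A. Jones"),
            Asset("video", "store://video/1.mp4", "J. Smith")],
)
print(sorted(package.credits()))  # ['A. Jones', 'J. Smith']
```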

The challenges this project posed were massive: the scale of storage, the scale of day-to-day processing, and the complexity of machine learning.

History

The project's roots go back a fair few years. As the SWNS team grew into the UK's largest news syndicator, they realised that the industry was changing and moving towards a content-is-king model.

Independently, Omni had been working on ideas around content attribution on the web - a tricky problem given the scale of data involved.

Omni ran a series of requirements discovery workshops with the SWNS team, and physically shadowed key members of staff as they went about their daily routines. Together, we quickly realised that the system needed was far larger than anyone had originally expected.

We made a joint pitch to the Google DNI Innovation Fund and were delighted to secure the co-funding we needed to get going in early 2016.

The Scope

Meeting the service's requirements at scale, in both storage and processing, is what makes this an exciting project. At beta launch, the system was expected to receive a gigabyte of raw news stories a day, including text, images, video and audio. Each submission must be processed to extract all relevant file-level and content-level metadata, and the resulting blob of information must then be matched against the output of every client media outlet to establish evidence that a given story has been used. The memory and processing demands involved were considerable.
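Establishing evidence of usage is, at its simplest, a near-duplicate detection problem. A minimal sketch, assuming only the story copy is compared, is shingle containment: split both texts into overlapping five-word sequences and measure what fraction of the source story's sequences reappear in an outlet's article. The function names and the five-word window are illustrative choices, not the production matcher; at SWNS's volumes a real system would more likely index shingles (for example with MinHash) than compare every pair of texts directly.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Texts shorter than one window are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(story: str, published: str, k: int = 5) -> float:
    """Fraction of the story's shingles that reappear in a published article.

    Containment is used instead of symmetric Jaccard similarity because an
    outlet may embed a syndicated story in a longer piece, which would
    dilute a Jaccard score even when the whole story has been reused.
    """
    source = shingles(story, k)
    if not source:
        return 0.0
    return len(source & shingles(published, k)) / len(source)


story = "The council confirmed on Monday that the bridge will close for repairs next month."
article = ("Drivers face months of disruption. The council confirmed on Monday that "
           "the bridge will close for repairs next month, a spokesperson said.")
print(containment(story, article))  # 1.0
```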

The most critical part of delivering this project successfully is the machine learning, and Omni and SWNS' joint ability to train it in time for launch. We needed to answer some fundamental questions about the news stories themselves: who are the protagonists, what are the themes, and who is saying what, to whom and where? With this data, we must demonstrate a better understanding of what news really is and, ultimately, where it is being used.
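Production answers to "who is saying what, to whom" come from trained models, but a rule-based baseline shows what the output looks like and gives a model something to be measured against. The sketch below is deliberately not machine learning: it is a hypothetical regular-expression extractor for the common `"quote," said Name` and `"quote," Name said` constructions, and it will miss reported speech, pronoun speakers and many other patterns.

```python
import re
from typing import NamedTuple


class Quote(NamedTuple):
    speaker: str
    text: str


# A capitalised name of one or more words, e.g. "Tom Harris" or "Mayor Jane Doe".
_NAME = r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*"

# Matches '"quote," said Name' and '"quote," Name said'.
_QUOTE_SAID = re.compile(
    r'"(?P<text>[^"]+?)[,.]?"\s*'
    rf"(?:said\s+(?P<after>{_NAME})|(?P<before>{_NAME})\s+said)"
)


def extract_quotes(copy: str) -> list[Quote]:
    """Return (speaker, quote) pairs matched by a simple 'said' pattern."""
    return [
        Quote(speaker=m.group("after") or m.group("before"), text=m.group("text").strip())
        for m in _QUOTE_SAID.finditer(copy)
    ]


sample = '"We will rebuild the pier," said Mayor Jane Doe. Later, "It was terrifying," Tom Harris said.'
for quote in extract_quotes(sample):
    print(quote)
```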

First Steps to Success

The system can store all stories and their constituent media - text, images, video and audio - and all file-level metadata is extracted automatically.
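Automatic extraction of file-level metadata might look something like the sketch below, which uses only the Python standard library. The chosen fields (file name, size, MIME type guessed from the extension, SHA-256 digest) are assumptions about what "file-level metadata" covers; EXIF data from photos or codec details from video would need extra libraries and are left out. A content hash fits the project's provenance goals because it gives every submitted file a stable fingerprint.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Union


def file_metadata(path: Union[str, Path], chunk_size: int = 1 << 20) -> dict:
    """Extract basic file-level metadata, streaming the file so large videos fit in memory."""
    p = Path(path)
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        # Read 1 MiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    mime_type, _encoding = mimetypes.guess_type(p.name)
    return {
        "filename": p.name,
        "size_bytes": p.stat().st_size,
        # Guessed from the extension only; content sniffing would need another library.
        "mime_type": mime_type or "application/octet-stream",
        # A stable content fingerprint, useful for de-duplication and provenance.
        "sha256": digest.hexdigest(),
    }
```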

The system's architecture is service-based: the whole system is composed of several smaller services working together. Each service is currently hosted on DigitalOcean and built largely in Python, with Django and PostgreSQL used for the services that provide storage and serve data to users.
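To illustrate the kind of narrow job one of those smaller services does, here is a deliberately framework-free sketch of a story-storage service written as a plain WSGI app (the same interface Django applications can be served through), with SQLite standing in for PostgreSQL so that it runs anywhere. The routes (`POST /stories`, `GET /stories/<id>`), the table layout and the `call` helper are hypothetical.

```python
import io
import json
import sqlite3
from wsgiref.util import setup_testing_defaults


def make_story_service(db_path: str = ":memory:"):
    """Build a minimal WSGI app that stores stories and serves them as JSON."""
    db = sqlite3.connect(db_path, check_same_thread=False)
    db.execute("CREATE TABLE IF NOT EXISTS story "
               "(id INTEGER PRIMARY KEY, headline TEXT NOT NULL, copy TEXT NOT NULL)")

    def respond(start_response, status, body):
        payload = json.dumps(body).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]

    def app(environ, start_response):
        method, path = environ["REQUEST_METHOD"], environ.get("PATH_INFO", "")
        if method == "POST" and path == "/stories":
            size = int(environ.get("CONTENT_LENGTH") or 0)
            data = json.loads(environ["wsgi.input"].read(size))
            cur = db.execute("INSERT INTO story (headline, copy) VALUES (?, ?)",
                             (data["headline"], data["copy"]))
            db.commit()
            return respond(start_response, "201 Created", {"id": cur.lastrowid})
        if method == "GET" and path.startswith("/stories/"):
            story_id = path.rsplit("/", 1)[-1]
            if story_id.isdigit():
                row = db.execute("SELECT id, headline, copy FROM story WHERE id = ?",
                                 (int(story_id),)).fetchone()
                if row:
                    return respond(start_response, "200 OK",
                                   dict(zip(("id", "headline", "copy"), row)))
        return respond(start_response, "404 Not Found", {"error": "not found"})

    return app


def call(app, method, path, body=None):
    """Invoke a WSGI app in-process, without a server; returns (status, JSON body)."""
    raw = json.dumps(body).encode("utf-8") if body is not None else b""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": method, "PATH_INFO": path,
                    "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)})
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], json.loads(b"".join(chunks))


service = make_story_service()
print(call(service, "POST", "/stories", {"headline": "Storm floods seafront",
                                         "copy": "Waves breached the sea wall."}))
print(call(service, "GET", "/stories/1"))
```

Keeping each service this narrow is what lets storage, processing and matching be scaled and deployed independently.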

The project was run using an Agile methodology, with the first four weeks focussed solely on building a basic understanding of the system and of SWNS' own domain knowledge. Omni did this by running a series of workshops and testing the assumptions we had made. We prototyped quickly in Python and Django using test-driven development, then used those early prototypes to gather feedback from SWNS and refine our ideas further.
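As an illustration of the test-first style, here is how one assumption from the workshops might be pinned down in tests before any implementation exists. The rule itself (a story and every asset attached to it must carry a credit before it can be syndicated) and the function `can_syndicate` are hypothetical examples, chosen because attribution is central to the project's goals.

```python
import unittest


def can_syndicate(story: dict) -> bool:
    """Hypothetical rule: a story needs copy, and it and every asset must be credited.

    In test-driven development this function is written only after the
    tests below exist and fail.
    """
    if not story.get("copy", "").strip():
        return False
    items = [story] + story.get("assets", [])
    return all(item.get("credit") for item in items)


class CanSyndicateTests(unittest.TestCase):
    def test_fully_credited_story_is_syndicable(self):
        story = {"copy": "Text", "credit": "J. Smith",
                 "assets": [{"kind": "image", "credit": "A. Jones"}]}
        self.assertTrue(can_syndicate(story))

    def test_uncredited_asset_blocks_syndication(self):
        story = {"copy": "Text", "credit": "J. Smith", "assets": [{"kind": "image"}]}
        self.assertFalse(can_syndicate(story))

    def test_story_without_copy_is_rejected(self):
        self.assertFalse(can_syndicate({"copy": "  ", "credit": "J. Smith"}))
```

Saved as a module, the tests run with `python -m unittest`.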

A key benefit of this approach is that SWNS staff grew in confidence with the system as it grew in functionality. In our experience, this means less of a shock at launch for admin users and a much lighter training burden.

Precisely because we ran the project this way, we could identify and adapt to change, delivering far greater value and far more appropriate software than an up-front plan could have produced.

The Next Phase

After the storage mechanism and overall architecture were in place, our workflow split into two strands. On one hand, we began load testing the architecture against the expected launch volumes, as well as against several growth scenarios. Our aim was to test our implementation, improve and refine it where possible, and assess the system's lifecycle.
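For scale, a gigabyte a day averages out to roughly 11.6 KB per second (10^9 bytes / 86,400 s), which suggests that bursts and per-item processing cost, rather than raw bandwidth, are what a load test most needs to probe. The sketch below is a minimal thread-based load generator that reports throughput and latency percentiles; `simulated_ingest`, its 2 ms delay and the 2x/5x growth multipliers are hypothetical stand-ins, and against the real system the operation would be a request to a deployed service.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(operation, requests: int, concurrency: int) -> dict:
    """Run operation(i) for i in range(requests) on `concurrency` threads; report latency."""

    def timed_call(i):
        start = time.perf_counter()
        operation(i)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(requests)))
    wall_seconds = time.perf_counter() - wall_start

    percentiles = statistics.quantiles(latencies, n=100)  # 99 cut points; index 94 is p95
    return {
        "requests": requests,
        "throughput_per_s": requests / wall_seconds,
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": percentiles[94] * 1000,
    }


def simulated_ingest(i):
    """Hypothetical stand-in for processing one submitted story."""
    time.sleep(0.002)


# Launch load plus two hypothetical growth scenarios (2x and 5x).
for multiplier in (1, 2, 5):
    result = load_test(simulated_ingest, requests=100 * multiplier, concurrency=10)
    print(f"{multiplier}x: {result['throughput_per_s']:.0f} req/s, p95 {result['p95_ms']:.1f} ms")
```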

At the same time, we moved on to build the billing and editorial workflow systems, which allowed SWNS admins to trace the provenance of all stories and pay contributors in the most open, transparent and swift way the news industry has ever seen.
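Open, transparent provenance and payment suggest an auditable event history for each story. One way to make such a history tamper-evident, sketched here as an assumption rather than a description of the SWNS billing system, is a hash chain: every event stores the SHA-256 hash of the event before it, so editing any past event breaks verification. `ProvenanceLog`, the event names and the detail fields are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class ProvenanceLog:
    """Append-only event history for one story; each entry hashes its predecessor."""

    def __init__(self, story_id: str):
        self.story_id = story_id
        self.entries: list[dict] = []

    @staticmethod
    def _digest(entry: dict) -> str:
        # Hash a canonical JSON form of every field except the hash itself.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def record(self, event: str, actor: str, **details) -> dict:
        """Append an event such as "submitted", "licensed" or "paid" (details must be JSON-serialisable)."""
        entry = {
            "story_id": self.story_id,
            "event": event,
            "actor": actor,
            "details": details,
            "at": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """True if no entry has been edited, reordered or removed from the middle.

        Truncating the newest entries is not detectable from the log alone; that
        would need the latest hash to be stored somewhere else as well.
        """
        prev = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True


log = ProvenanceLog("story-0001")
log.record("submitted", "contributor:j.smith")
log.record("licensed", "swns", outlet="example-outlet")
log.record("paid", "swns", payee="contributor:j.smith")
print(log.verify())  # True
```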

Following this, we focused on building the bulk of the machine learning suite, along with a spider that fetched news articles from around the world and a story-matching function that scored weighted matches between those articles and the stories and assets uploaded by contributors.
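The paragraph above leaves the scoring open; one plausible shape for a weighted match is a linear blend of independent signals, each scaled to between 0 and 1. The sketch assumes three: character-level text similarity via the standard library's `difflib.SequenceMatcher`, overlap of named entities (assumed to have been extracted already by the machine learning suite), and a recency term that halves every 48 hours. The weights, the half-life and the dictionary field names are illustrative assumptions, not production values.

```python
from datetime import datetime
from difflib import SequenceMatcher

# Hypothetical weights; in practice they would be tuned against labelled examples.
WEIGHTS = {"text": 0.6, "entities": 0.3, "recency": 0.1}


def match_score(story: dict, article: dict, half_life_hours: float = 48.0) -> float:
    """Weighted score between 0 and 1 for how likely `article` reuses `story`."""
    text_sim = SequenceMatcher(None, story["copy"].lower(), article["text"].lower()).ratio()

    ours, theirs = set(story["entities"]), set(article["entities"])
    entity_sim = len(ours & theirs) / len(ours | theirs) if ours | theirs else 0.0

    # An article published before the story cannot have reused it; after that,
    # the recency signal halves every `half_life_hours`.
    hours = (article["published"] - story["published"]).total_seconds() / 3600
    recency = 0.0 if hours < 0 else 0.5 ** (hours / half_life_hours)

    return (WEIGHTS["text"] * text_sim
            + WEIGHTS["entities"] * entity_sim
            + WEIGHTS["recency"] * recency)


story = {"copy": "Waves breached the sea wall at Brighton overnight.",
         "entities": ["Brighton"], "published": datetime(2016, 11, 3, 6, 0)}
article = {"text": "Overnight, waves breached the sea wall at Brighton, police said.",
           "entities": ["Brighton", "Sussex Police"], "published": datetime(2016, 11, 3, 9, 0)}
print(round(match_score(story, article), 2))
```

Because the weights sum to 1, the score stays on a common 0 to 1 scale, so a single review threshold could be applied across all crawled articles.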

Once the system was feature-complete, we moved on to a series of sprints aimed at creating the fastest, most fluid experience possible for end users.