How Datacratic Applies Machine Learning Algorithms to Real-Time Data Flows
by Ciaran O'Kane on 30th Aug 2012 in News
James Prudhomme, Datacratic CEO & Jeremy Barnes, Founder & CTO, discuss the company’s real-time decision engine for automated buying, the flexibility of the solution, as well as key trends they currently see in the real-time market.
Can you give a brief overview of the Datacratic proposition?
James Prudhomme, Datacratic CEO: Datacratic is a software company that has built a real-time machine learning and predictive modeling platform. The platform itself is quite agnostic to the types of data it can optimise, however it is especially well suited to the real-time marketing and digital advertising world. We’ve built a number of applications on the platform, which are specific to the needs of DSPs, DMPs, Ad Networks, Media Trading Desks and e-commerce web sites.
There is a great deal of real-time data being generated, both by consumers and machines, across addressable digital marketing channels. Traditional methods of capturing data and applying predictive algorithms in an offline environment do not provide sufficient value. This is the problem we are solving.
It’s important to mention that we do not sell media or data, and are not brokers of either one. We sell software on a SaaS model. We are tech and not ads.
How is the product stack assembled? What are the component parts? Can you outline practical examples of the types of companies with which Datacratic has been able to partner?
Jeremy Barnes, Datacratic Founder and CTO: First we start with the platform. This has been designed with a highly flexible ETL for taking in vast amounts of real-time and batched data. The data is then made available to a modeling pipeline where various machine learning algorithms are used to train predictive models. The models are continuously updated and back-tested, and the outputs or predictions are made available to our products and applications.
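As an illustration only (this is not Datacratic's code), the loop Barnes describes, in which a model is continuously updated and back-tested on a stream of events, can be sketched in a few lines of Python. The class name, the feature name and the choice of a simple online logistic regression are all assumptions made for the example.

```python
import math


class OnlineLogisticModel:
    """Tiny logistic regression trained one event at a time (hypothetical stand-in model)."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = {}  # feature name -> weight
        self.bias = 0.0

    def predict(self, features):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # One stochastic gradient step on the log loss for a single event.
        error = self.predict(features) - label
        self.bias -= self.learning_rate * error
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.learning_rate * error * v


def run_pipeline(events, model):
    """Score each event before learning from it, so the running log loss
    acts as a back-test on data the model has not yet seen."""
    total_loss, count = 0.0, 0
    for features, label in events:
        p = min(max(model.predict(features), 1e-12), 1.0 - 1e-12)
        total_loss += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
        count += 1
        model.update(features, label)  # the model is updated after every event
    return total_loss / count if count else 0.0
```

Scoring before updating is one common way to back-test a streaming model; a production system would add persistence, monitoring and far richer features.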
We have two primary applications: our Real-Time Bidder stack and our Audience Optimizer. The latter is used to generate lookalike models in real-time. These are both built on our machine learning platform and include a number of product support components.
So far, our customers tend to be ad technology companies or e-commerce web sites who have data and are looking to add an intelligent optimisation and real-time decisioning layer to their stack.
What problems does Datacratic solve right now? Isn't the core product capability something that can be "done in-house" by various different platforms in this space?
James: That’s an interesting question, and of course it is one we get quite often. First of all, the core problem we solve is around the ability to apply machine learning algorithms to real-time data flows. Up to now, people have been focused on capturing and storing “big data”, and there are a number of solutions to deal with this. Our platform is different in that we are able to make predictive models available in real-time, and we have very limited requirements in terms of hardware and infrastructure to capture and store data. We think of data as rivers and streams, as opposed to pools.
The real-time marketing and advertising ecosystem demands this kind of solution because the reality is that web users behave in real-time and businesses evolve in real-time. The minute you stop those real-time flows the data begins to decay and loses value. Secondly, it’s very difficult to deliver useful real-time predictions at a cost-per-transaction that makes sense. That is why a lot of companies that experiment with this kind of technology don’t end up doing anything significant with it.
It’s a highly competitive environment, and our solution can provide significant advantage to companies who may not have their main expertise in data science and machine learning. We like to say that we are democratising machine learning and predictive analytics and making it available to the 99% of companies who do not have access to this type of technology.
Broadly speaking, how important is the machine learning layer?
Jeremy: It is extremely important and is really at the core of our value proposition. We have machine learning in our DNA, but we look at the problem in a different way than most. When we step back and look at what the constituents in the ad tech ecosystem need from machine learning, we see that it is certainly not the best black-box algorithm developed by retired astronauts, nuclear scientists and former military operatives. Instead, it is a reliable way to use data to make predictions in real-time and at scale. Let’s get the algorithms into production where they can create value, and then see inside them so people can understand what data is really driving performance.
At Datacratic, we have focused our efforts on taking both existing and proprietary algorithms, putting them into a robust, real-time and data-agnostic platform, and making sure that they are unbiased and don’t fail in production. It is also extremely important to understand and illustrate why an algorithm has made a particular decision in a particular case. We draw from the latest developments in machine learning, but not at the expense of practicality and usability.
You recently did an integration with BlueKai. Will this lookalike modelling “module” start making this type of product offering more accessible and scalable to a larger number of companies?
James: Yes, that is why BlueKai decided to integrate our software. They want their customers to have access to sophisticated data modeling from directly within their DMP. Our functionality is integrated directly into the BlueKai interface, and their users can create lookalikes of their best customers. The workflow is entirely automated and the lookalike segments can appear in the existing BlueKai segment marketplace, or within their clients’ private first-party taxonomy.
Do you see Ad Networks being able to leverage the Datacratic offering to ultimately start improving their own data-led decisioning?
James: Absolutely, and we have gotten a lot of traction with ad-networks. Most successful ad networks tend to be data-driven in one way or another, and many are built around some notion of an integrated data platform. This allows us to plug in easily. Rather than building segments by hand, and then targeting them in a binary manner, we allow ad networks to model their users using their own data and then target those who have the highest probability of clicking or converting. This can provide significant performance lift, which translates into competitive advantage for our clients. Of course ad networks also tend to be very interested in our bidder stack, as they look to add RTB to their inventory supply chain.
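To make the contrast concrete, here is a minimal sketch of binary segment targeting next to probability-based targeting. The user records, the `p_convert` field (standing in for a model's predicted conversion probability) and the function names are hypothetical and chosen only for illustration.

```python
def target_by_segment(users, segment):
    """Binary targeting: a user is either in the hand-built segment or not."""
    return [user["id"] for user in users if segment in user["segments"]]


def target_by_probability(users, budget):
    """Model-based targeting: rank users by predicted probability, keep the top `budget`."""
    ranked = sorted(users, key=lambda user: user["p_convert"], reverse=True)
    return [user["id"] for user in ranked[:budget]]


# Hypothetical users; the p_convert values are made up for the example.
users = [
    {"id": "a", "segments": {"auto_intenders"}, "p_convert": 0.02},
    {"id": "b", "segments": set(), "p_convert": 0.11},
    {"id": "c", "segments": {"auto_intenders"}, "p_convert": 0.07},
    {"id": "d", "segments": set(), "p_convert": 0.01},
]
```

In this toy data the segment rule selects users "a" and "c", while ranking by probability with a budget of two selects "b" and "c": the model surfaces a likely converter that the hand-built segment misses and drops a segment member who is unlikely to convert.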
In terms of the bidder stack you've created, how does this differ from other existing bidders in the market that are perhaps more static and rules-based?
Jeremy: Many of our clients and prospects have taken to calling our RTB stack “Bidder 2.0”. It was designed from the ground up to be integrated into other platforms and systems, and we offer it to clients on a hosted SaaS basis. We’ve used a very open and service-oriented architecture, which means that it scales easily. It’s been designed in such a way that it can either be used full stack, or from the exchange gateway on down. However, people who have an existing bidder can integrate directly into our real-time decisioning and optimisation layer. It’s all really quite flexible.
The key point of differentiation is our ability to bring user-level behavioural data, contextual information from the bid-stream, as well as click-stream data directly to the real-time bid decision. For example, the probability scores assigned to users by our Audience Optimizer can be used as a feature in the algorithmic model used by the bidder as it makes real-time impression level decisions.
There are theoretically no limits to the number of data sources our platform can use when making a bid decision and we can deal very well with both structured and unstructured data. We are not at all limited to making binary yes/no decisions based on whether or not a user is in a segment, or is visiting a specific url, or some other rules-based combination of those variables.
This means our clients can focus on creating and developing unique signals, be they behavioural, contextual, location-based or whatever -- and our platform will ensure these signals are used at bid time.
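A rough sketch of the idea, under stated assumptions: the audience score is treated as just one more input to a model that estimates conversion probability for each impression, and the bid is set to the expected value of that impression. The weights, signal names and the expected-value pricing rule are invented for illustration and are not taken from Datacratic's bidder.

```python
import math

# Hypothetical model weights; a real system would learn these from data.
WEIGHTS = {
    "audience_score": 2.0,     # probability score from a lookalike model (assumed input)
    "context_match": 1.0,      # contextual signal from the bid stream
    "recent_site_visit": 1.5,  # click-stream signal
}
BIAS = -6.0


def conversion_probability(signals):
    """Combine any number of named signals into one probability via a logistic model."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in signals.items())
    return 1.0 / (1.0 + math.exp(-z))


def bid_price(signals, value_per_conversion):
    """Bid the expected value of the impression rather than a yes/no rule."""
    return conversion_probability(signals) * value_per_conversion
```

Because the output is continuous, two users in the same segment can receive very different bids, and adding a new signal only means adding another named feature.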
What observations have you made with regards to the current RTB landscape?
James: There are several, and there are a few which I think are having a fairly dramatic impact. First, I think the Facebook RTB announcement is a first step toward almost all PPC advertising going real-time, and this includes search. We’ve seen initial signs of this, and I think in the future SEM will include the ability to bid in real-time on both the keyword and the user, as opposed to just the keyword alone.
Next, I think there is a lot more sophistication around the use of data, and many of the traditional players in offline data-driven marketing are beginning to identify their opportunity in the real-time ecosystem. I expect to see this trend continue and expect to see some consolidation on the data side, driven by some of the larger traditional players.
Finally, I think we will continue to see both buyers and sellers join around the exchange and a gradual elimination of the layers that currently separate the two. We expect to see a massive increase in the number of real-time buy side actors, be they agencies, ad networks, DSPs or others, and I expect to see more and more of them become active participants in the auction. This increase in bid density, competition for impressions and intelligence on the buy side should ultimately result in improved yield for publishers. That, in turn, should result in publishers making more premium, high-quality inventory available on the exchange, and providing richer data around that inventory to allow it to be distinguished from the rest.
In terms of engineering, what does it take to create and maintain the products you've developed?
Jeremy: The way we develop products is to first build connectors into specific real-time data flows (for example, bid-stream and ad delivery data), then we configure the algorithms to predict the desired outcome (for example, impression value or conversion probabilities). Finally, we glue that together behind a control API and add in some product-specific machinery (for example, our economic modelling and RTB pacing technology). The entire stack is very componentised, which gives us a big advantage both in development speed and our ability to use additional data sources for decisioning and optimisation.
Inside the components there is a lot of low-level engineering work; we have domain experts in several areas. Currently, our team are approximately one-third data science, and two-thirds software engineering, so there is no question that we are an engineering-focused organisation.
Being a Canada-based organisation, where does the majority of business/partnerships come from? Presumably Canada itself is not the largest market for you?
James: Canada is a great place to start a software company, and Montreal is an amazing city with world-class schools and access to great engineering talent. The biggest drawback to being Canadian is that there are not that many of us, relatively speaking, and so yes, we are very focused on the larger markets in the Americas and are now beginning to actively explore opportunities in Europe.
What is on the roadmap? Any plans to open the API and let others build around the core machine learning stack?
Jeremy: That is definitely on the roadmap and forms a core part of the company’s big-picture vision. Internally, we already treat the machine learning platform in exactly this way. This is why you often hear us refer to our own products as applications built on our platform. In order to be successful as a platform, we need an ecosystem to develop around it. Our product set is intended to seed that ecosystem and we’ll look to our customers to drive this innovation over the next few years.