Content Optimization Using AI, ML and NLP

Content Optimization – Just What the Doctor Ordered

We are living in the Information Age, where there is a surfeit of information and content. This presents a conundrum for both producers and consumers of content. On the one hand, consumers can’t quite figure out what to read and what to skip. On the other hand, content publishers want content optimization: they want to tailor content for an audience that is afflicted with attention deficit and information overload.

The Anatomy of Content Optimization

To address this problem, we can combine some aspects of Artificial Intelligence, Machine Learning and Natural Language Processing (AI, ML and NLP) for content optimization. We created a tool called DiCAP (Digital Content Analyzer and Predictor) precisely to address this content optimization problem. The objective was content optimization for our social media channels, especially LinkedIn. The results exceeded even our most optimistic expectations in terms of the increase in impressions and engagement.

More importantly, we obtained an intuitive understanding of the critical success factors of the content optimization process. The algorithm is not a mere black box meant only for data scientists. We were able to understand the model and explain the impact of various factors in an intuitive manner. I have benefited from the insights gained, long after creating and first using the tool.

Icing on the Cake

However, I couldn’t help but think – Am I kicked about the tool only because it is my baby? After all, I am neither a trained linguist nor engineer / programmer and certainly not a data scientist / statistician. I would naturally be excited if I could put together something combining Content, AI, ML and NLP!

We wanted external validation for this content optimization tool. How strong would it prove when pitted against other AI-based innovations? With this in mind, we entered the fray for the Financial Times Asia-Pacific Innovation award. We were now up against global organizations with innovation budgets that were higher than our entire revenues!

This turned out to be the proverbial icing on the cake and more. The project won the Financial Times award for the Most Innovative use of Technology & Data in Asia-Pacific. The achievement was all the more significant because we did this without a formal in-house data scientist and without relying on external analytics or AI partners.

Key Learning about Content Optimization

It would be a shame if I failed to capture some of the key learnings from this process. I have distilled some aspects common to AI, ML and NLP based solutions to business problems. Many of these themes can be applied in any business context, and not just content optimization. Without further ado, here are some of my thoughts:

  • Data Strategy

Data science professionals spend most of their time on mundane tasks to acquire, clean and massage the data. This has been true in the eras of Business Intelligence, Analytics, Big Data, Artificial Intelligence and Data Science. The common problem – irrespective of the age that we live in and the jargon that we swear by – continues to be quantity and quality of data.

Consequently, a well-thought-out data strategy is critical. One of the elements that worked well for us was the fact that we had compiled data sets and annotated features by hand. We were doing this even before conceptualizing the tool, simply to gain insights by analyzing, or merely browsing, the data.

The availability of good quality data also ensured that the noise was much lower. Hence, we could not only build a model quickly, but also build it with a considerably smaller training sample. The need for a large training sample is often one of the biggest stumbling blocks in applying advanced quantitative techniques in a B2B context.

  • Domain Matters

In addition to good quality data, the data science world spends a lot of time on feature engineering. In layman’s terms, this really means understanding the domain well and creating variables that are likely to impact the outcome.

We built the model based on the insights of functional people who understood the domain. For instance, we added several variables specific to our content optimization problem, such as category and genre of content, people mentioned in the post, etc. These supplemented features that would be relevant to any content optimization problem, such as day of the week, weekday or weekend, gap since previous post, etc.
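The generic features mentioned above are easy to derive from post timestamps alone. Here is a minimal sketch, assuming each post is represented only by its publication time; the function name `calendar_features` and the sample dates are purely illustrative and not part of DiCAP.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def calendar_features(post_times):
    """Derive day of week, weekend flag and gap since the previous post."""
    rows = []
    previous = None
    for ts in sorted(post_times):
        # The first post has no predecessor, so its gap is left as None.
        gap_days = None
        if previous is not None:
            gap_days = (ts - previous).total_seconds() / 86400
        rows.append({
            "posted_at": ts,
            "day_of_week": DAY_NAMES[ts.weekday()],
            "is_weekend": ts.weekday() >= 5,  # Saturday = 5, Sunday = 6
            "gap_days": gap_days,
        })
        previous = ts
    return rows

# Illustrative timestamps only (not real post data).
posts = [
    datetime(2018, 12, 21, 9, 0),
    datetime(2018, 12, 23, 9, 0),
    datetime(2018, 12, 25, 21, 0),
]
features = calendar_features(posts)
for row in features:
    print(row)
```

Domain-specific variables such as genre or people mentioned would be added to the same rows by hand or from annotations.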

  • Pictures Speak a Thousand Words

This traditional adage is even more relevant in the era of information explosion. The rationale is obvious. Readers are inundated with content and their attention spans are now reportedly shorter than that of a goldfish!

Under such circumstances, it is not reams of text but a picture that catches one’s eye. We did some analysis to review the impact of incorporating images in our posts. This is only the starting point and the analysis can become far more sophisticated. One idea is to categorize images using Deep Learning and then use the output of that model as a feature in our content optimization problem. Something that can be done far more easily is manual classification into 2-3 major types of images, based on an intuitive understanding of the impact each type may have on content engagement. One of the useful dimensions was whether the image contains faces / people or abstract patterns, with the intuition that the former is likely to be more engaging.
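To make the manual route concrete, here is a small sketch of how a hand-assigned image type could be turned into model inputs. The three labels and the `encode_image_type` helper are assumptions for illustration; the actual categories would depend on your own content.

```python
# Hypothetical manual labels; choose categories that suit your own posts.
IMAGE_TYPES = ("none", "faces", "abstract")

def encode_image_type(label):
    """One-hot encode a manually assigned image type."""
    if label not in IMAGE_TYPES:
        raise ValueError(f"unknown image type: {label!r}")
    return {f"image_{t}": int(label == t) for t in IMAGE_TYPES}

print(encode_image_type("faces"))
```

Each post then carries one 0/1 column per image type, which most models can consume directly.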

  • Meaning Based Computation

Significant advances have been made in the last decade in the area of Natural Language Processing (NLP). One of the corollaries of these advances is that the meaning of words can be encoded in mathematical vectors. This mathematical representation can then be fed into AI algorithms to predict outcomes. For instance, a post about a Tribunal proceeding will automatically be treated as closer to a hearing in a High Court than to an M&A transaction. The models do this on their own, despite the presence of common words across the posts and, of course, without manual labor.
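Closeness between such vectors is commonly measured with cosine similarity. The sketch below uses tiny three-dimensional vectors that I made up for illustration; real embeddings are learned from large text corpora and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors, NOT real embeddings.
tribunal = [0.9, 0.1, 0.8]
high_court = [0.8, 0.2, 0.9]
merger = [0.1, 0.9, 0.2]

sim_court = cosine_similarity(tribunal, high_court)
sim_merger = cosine_similarity(tribunal, merger)
print(f"tribunal vs high court: {sim_court:.2f}")
print(f"tribunal vs merger:     {sim_merger:.2f}")
```

With these toy numbers, the tribunal vector scores much closer to the high court vector than to the merger vector, which is the behavior described above.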

Such inputs significantly enhanced the quality of the model and its results. This is because each word of a 100-word post would get represented in vectors that had hundreds of dimensions. The manual equivalent of this would be to sit through each word and categorize each word and post on a scale of 0 to 100 on hundreds of dimensions! Thanks are due to the AI teams at Google, Facebook and Stanford University, who have made such data and techniques available to the world at large.
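One simple way to go from word vectors to a single vector for a whole post is to average them. This is a sketch under that assumption, not necessarily what DiCAP does; the two-dimensional `toy_embeddings` table is invented for illustration and stands in for a pretrained embedding file.

```python
def post_vector(text, embeddings):
    """Average the vectors of all known words in a post."""
    tokens = text.lower().split()
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no vector can be built
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

# Invented toy table; real tables map words to hundreds of dimensions.
toy_embeddings = {
    "court": [1.0, 0.0],
    "hearing": [0.5, 0.5],
    "merger": [0.0, 1.0],
}

vec = post_vector("Court hearing today", toy_embeddings)
print(vec)
```

Words missing from the table (here, "today") are simply skipped, which is a common but lossy choice.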

  • Emotional Intelligence, Redefined

The power of meaning based computation does not stop there. In fact, the fun has just started. Since we have been able to encode the meaning of words and articles in mathematical form, we can think of almost any application.

The entire traditional world of text processing has suddenly metamorphosed into the digitally enhanced and naturally beautiful butterfly!

One of the most common use cases is binary classification of text, such as spam or ham (not spam). However, in the context of content optimization, we can look at other dimensions. The functional marketeer would want to appeal to the emotions of the reader. Hence, we looked at the emotional valence of each of our posts and this formed an important component of our final model. Does the content make one happy or sad; angry or amused?
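As a rough illustration of emotional valence as a feature, here is a lexicon-based sketch. The word list and scores in `VALENCE` are hypothetical, and the source does not say which method DiCAP used; a real system would more likely rely on a published emotion lexicon or a trained classifier.

```python
# Hypothetical mini-lexicon: positive scores for pleasant words,
# negative scores for unpleasant ones.
VALENCE = {
    "win": 1.0, "proud": 0.8, "delighted": 0.9,
    "loss": -0.8, "dispute": -0.5, "penalty": -0.7,
}

def valence_score(text):
    """Mean valence of the lexicon words found in the text (0.0 if none)."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    scores = [VALENCE[w] for w in words if w in VALENCE]
    return sum(scores) / len(scores) if scores else 0.0

for post in ("Proud of the win!", "Penalty after a dispute.", "Quarterly update"):
    print(post, "->", valence_score(post))
```

The resulting score can sit alongside the timing, image and meaning-based features as one more input to the model.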

As I mentioned, the fun has only started. You can dig deeper into your reservoirs of creativity and configure as many features as might appeal to your intuitive sense.

  • Connect the Dots

We have discussed several factors that originate across various disciplines. The key to achieving the desired business outcomes is to bring all of this together. Or in Steve Jobs’ immortal words, “Connect the Dots”.

This is probably more important now than when these words were made famous. Digital disruption has immense potential to bring about a paradigm shift in most of the things that we take for granted.

However, we need to be curious and question even ‘common sense’ or ‘ground truths’. Only then can we discover the next generation of possibilities.

However, that may be too much to ask of most people. Instead, if we can merely be observant and borrow ideas from one field and connect the same ideas to another area, a lot of magic is waiting to happen!

  • Design Thinking

In case most of the above appeared to be too technical or functional, I have saved the best for last! While creating such models, we looked at how the user would want to interact with a tool such as DiCAP. Will your average marketing or content person really want to dive deep into all the above-mentioned factors? How will the creative soul react to being bound by the rigors and constraints of the scientific world?

In order to take care of such reservations, we need to hide technical details from the user. Instead, we need to integrate the functionality with the workflow of the user or simplify it even further. Customer-centricity is the cornerstone of Design Thinking. It’s yet another example of how such simple concepts have been articulated and a new fad and jargon of “Design Thinking” has been born.

Anyway, let us get back to the point. This tool is intended to help Content and Marketing professionals. Since they are the users, the tool needs to fit neatly into their workflow. It should answer the question of whether a given piece of content is likely to do well. While doing so, it can spin AI and NLP magic on the history of content engagement with the target audience, but the average user doesn’t care. It’s as simple as that. In case one wants to drill down further and understand the reasons for the predicted outcome, one can inspect the impact of the various variables.
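To show what "a simple answer first, reasons on demand" could look like, here is a sketch of a scorer that returns a probability along with the contribution of each feature. The logistic form, the feature names, the weights in `WEIGHTS` and the bias are all assumptions made up for illustration; the source does not describe DiCAP's actual model, and real weights would be learned from historical engagement data.

```python
import math

# Hypothetical, hand-set weights purely for illustration.
WEIGHTS = {
    "is_weekend": -0.6,
    "has_face_image": 0.9,
    "valence": 0.7,
    "gap_days": 0.1,
}
BIAS = -0.5

def predict_with_reasons(features):
    """Return the probability of doing well and ranked feature contributions."""
    contributions = {
        name: WEIGHTS[name] * float(features.get(name, 0.0))
        for name in WEIGHTS
    }
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Largest absolute impact first, so the main drivers are easy to spot.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked

draft = {"is_weekend": 0, "has_face_image": 1, "valence": 0.8, "gap_days": 2}
probability, reasons = predict_with_reasons(draft)
print(f"Likely to do well: {probability >= 0.5} (p = {probability:.2f})")
for name, impact in reasons:
    print(f"  {name}: {impact:+.2f}")
```

The marketing user would only see the first printed line; the ranked contributions are there for anyone who wants to drill down.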

In Conclusion – democratAIze!

Sometimes, we get carried away by hype. Scratch that. We get carried away by hype most of the time. Applications of Artificial Intelligence are no exception. NLP is yet to reach that stage, since its hype index is relatively low.

It is important to go back to basics and fundamentals and ignore the hype. If we are looking at content optimization, we should obviously examine the key aspects.

Who is the audience? Are there different audience segments that have significantly different needs? Which needs are more important from a business standpoint? What is the platform? What is the emotional connect of the user with the content? How would you describe the Call to Action? Which emotions does it need to arouse? Do these parameters vary by the genre of content? What are the dimensions of the content that are important?

We must keep these questions in front of us while applying the cutting edge tools (whatever they may be called by the time you are reading this article). We must ask such fundamental questions before creating tools to solve business problems. After that, things will fall into place. Implementation of AI, ML and NLP for content optimization will appear to be common sense, rather than an arcane and fuzzy sci-fi concept.

If only common sense were so common, we would have democratized AI. Or as I like to say, democratAIze!

Post Script

The cover image of this article appears to contain colorful stars against the backdrop of the Christmas night sky. However, what if each star represented a piece of content? And its color were drawn from the genre of the post? The height in the sky determined by the impressions and the size of the star determined by the engagement? A picture then can speak much more than a thousand words, can’t it?

The visualization techniques were part of our endeavor to convert numbers and unstructured text into pictures and tell a story. Something we called Knowledge Visualization, or KnowViz. These were some of our offerings on a delicious innovation platter. You can read more about the Innovation Platter that we served at the Financial Times Innovation event in Hong Kong in 2018.

And finally, since this post is being published on Christmas, I wish you all a Merry Christmas. The references to platter, cake and the Christmas night sky were not unintended. The trick, as I have been saying all along, is in deriving meaning from unstructured text!

  1. Hey Rajiv, well expressed.. and well presented with enough food for thought .. Kudos .. and Merry Christmas and Happy New Year to you too.. Cheers

    • Thanks Dinesh. Until I worked with you, I had never in my wildest dreams imagined that I would be doing SAP projects. And at that time, I never imagined that I would be getting so ‘deep’ into it. You never know what you can become!