We built our ETFs with the aim of diversifying across a number of sub-themes. This not only helps avoid concentration in a narrow set of areas and technologies, it also aims to capture the complete value chain of each theme. Taking the digital economy as an example, this theme goes well beyond e-commerce – the full value chain also includes online payments, digital advertising, social media, cybersecurity and the sharing economy.
Because themes are forward-looking by nature, we find limited value in identifying relevant companies based on their historical data. Instead, we look to unconventional sources of non-financial data such as filings, news analyses and other media.
So how can you ensure nothing is missed when investing in a theme? That’s where AI plays a role. Stock selection is based on a series of ‘seed words’ chosen by MSCI and the thematic experts. These are terms relevant to each theme, and are used to identify appropriate companies based on their business activities.
But scanning companies’ public records for those terms only takes you so far. By using AI techniques such as natural language processing, MSCI can identify a much broader set of relevant keywords. These are then used to better target the theme and its entire value chain.
Corpus generation (via open-source search API)*
Word filtering (TF-IDF)**
Contextual similarity (Word Embedding)***
Fine-tuning (exclude generic words, add synonyms, etc.)
*Application Programming Interface (e.g. extraction of relevant Wikipedia pages).
**Term Frequency-Inverse Document Frequency (TF-IDF) measures the importance of a word in a document within a collection of documents (“corpus”).
***Word Embedding, a Natural Language Processing (NLP) technique, helps identify keywords contextually similar to the seed word.
²Source: MSCI, Lyxor International Asset Management, as at 29/05/2019. For illustrative purposes only.
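To make the word filtering and contextual similarity steps more concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not MSCI's implementation: the toy corpus, the three-dimensional word vectors and the function names (`tf_idf`, `nearest_words`) are all hypothetical, and a real system would learn its embeddings from a large corpus rather than hard-code them.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {word: score} dict per document, using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenised)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(seed, embeddings, top_n=2):
    """Rank the other words by cosine similarity to the seed word's vector."""
    seed_vec = embeddings[seed]
    ranked = sorted(
        (w for w in embeddings if w != seed),
        key=lambda w: cosine_similarity(seed_vec, embeddings[w]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical toy corpus: words found in every document score zero.
corpus = [
    "the city deploys smart grid sensors",
    "the smart meter reports to the grid",
    "the bakery sells bread in the city",
]
scores = tf_idf(corpus)

# Hypothetical hand-written vectors standing in for learned embeddings.
embeddings = {
    "smart": [0.9, 0.1, 0.0],
    "connected": [0.8, 0.2, 0.1],
    "sensor": [0.7, 0.3, 0.0],
    "bread": [0.0, 0.1, 0.9],
}
expanded = nearest_words("smart", embeddings)
```

In this sketch a generic word such as "the" appears in every document and so receives a TF-IDF score of zero, while the seed word "smart" is expanded with the words whose vectors point in the most similar direction.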
Using big data analysis, MSCI cross-checks the public records (e.g. corporate filings, annual reports) of all companies in the parent index – the MSCI ACWI IMI – for the relevant keywords defined in the previous step. Stock selection is based on two identifiers:
Smart Cities example³: different scenarios for eligible universe inclusion
³Source: MSCI, Lyxor International Asset Management, as at 29/05/2019. For illustrative purposes only.
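As a rough illustration of how public records might be cross-checked against a keyword list, the Python sketch below counts keyword occurrences in each company's text and keeps companies that mention enough distinct keywords. Everything in it is an assumption made for illustration: the company names, the filing excerpts, the `screen` function and the `min_distinct` threshold are hypothetical, and the sketch does not reproduce the identifiers MSCI actually uses for stock selection.

```python
import re


def keyword_hits(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword or phrase."""
    hits = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        hits[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits


def screen(filings, keywords, min_distinct=2):
    """Keep companies whose text mentions at least `min_distinct` keywords.

    The threshold is a hypothetical rule chosen for this example only.
    """
    selected = {}
    for company, text in filings.items():
        hits = keyword_hits(text, keywords)
        distinct = sum(1 for count in hits.values() if count > 0)
        if distinct >= min_distinct:
            selected[company] = hits
    return selected


# Hypothetical keywords and filing excerpts.
keywords = ["smart grid", "smart meter", "traffic management"]
filings = {
    "Company A": (
        "We supply smart grid hardware and smart meter software. "
        "Smart grid demand grew."
    ),
    "Company B": "We operate bakeries and sell bread.",
    "Company C": "Our traffic management unit is small.",
}
eligible = screen(filings, keywords)
```

Here only the hypothetical Company A passes the screen, because it is the only one whose text mentions at least two distinct keywords.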