Attributed Node to Vector

Social media platforms can be seen as large networks of users who communicate and interact through written messages. These interactions are highly structured, and are linked to a variety of factors such as the users' socio-demographic variables, their linguistic usage patterns, the distribution of their interests, and the structure of their interaction networks (either conversation- or follower-based).

To our knowledge, current approaches to describing the dynamics of such systems do not capture the dependencies between these dimensions, since each dimension corresponds to a different type of information on the nodes and edges of a network. Yet linguistic usage patterns are deeply linked to socio-economic variables, and the topology of the network is a fundamental, coevolving component of how information spreads in the system. There is therefore much to be gained by integrating all the available information to identify patterns in the system.

Language and network on Twitter

In this project I focused on the relationship between language evolution and network structure, using a francophone Twitter dataset of 200M+ tweets and 2M+ users collected over a two-year period. With Jacob Levy Abitbol and Márton Karsai, I explored how deep learning approaches, which had not yet been fully applied to network-structured data, could unify the different sources of information available in Twitter and shed light on the relationship between topology-based communities and patterns of language use. The goal was to construct an embedding of users that would let us infer correlations between linguistic variables, network structure, and socio-economic attributes.
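
As a rough illustration of what it means to embed users from both network structure and node features, here is a minimal sketch of a single, generic graph convolution layer in Python with NumPy. It is not the model from our paper: the toy graph, the random feature and weight matrices, and the function names normalize_adjacency and gcn_layer are all made up for this example.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual symmetric normalization.

    Self-loops are added so that each node keeps its own features,
    which also guarantees every degree is at least 1.
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: average neighbour features, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Hypothetical toy network: 4 users connected in a path 0-1-2-3.
adj = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5))  # e.g. 5 made-up linguistic features per user
weights = rng.normal(size=(5, 2))   # untrained projection to a 2-d embedding

adj_norm = normalize_adjacency(adj)
embeddings = gcn_layer(adj_norm, features, weights)
print(embeddings.shape)
```

Each row of embeddings mixes a user's own features with those of their neighbours, which is the basic mechanism that lets a learned embedding reflect both language use and network position. In practice the weights would be trained rather than drawn at random.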

Check out our paper, "Joint embedding of structure and features via graph convolutional networks", or the GitHub repo!