LinkedIn User Uniqueness Study: Computation Methods and Selection Strategies

30 May 2024

Authors:

(1) Ángel Merino, Department of Telematic Engineering, Universidad Carlos III de Madrid {angel.merino@uc3m.es};

(2) José González-Cabañas, UC3M-Santander Big Data Institute {jose.gonzalez.cabanas@uc3m.es};

(3) Ángel Cuevas, Department of Telematic Engineering, Universidad Carlos III de Madrid & UC3M-Santander Big Data Institute {acrumin@it.uc3m.es};

(4) Rubén Cuevas, Department of Telematic Engineering, Universidad Carlos III de Madrid & UC3M-Santander Big Data Institute {rcuevas@it.uc3m.es}.

Abstract and Introduction

LinkedIn Advertising Platform Background

Dataset

Methodology

User’s Uniqueness on LinkedIn

Nanotargeting proof of concept

Discussion

Related work

Ethics and legal considerations

Conclusions, Acknowledgments, and References

Appendix

4 Methodology

4.1 Location Dimension

Adding the location may substantially reduce the number of skills needed to make a user unique on LinkedIn. Because 99.47% of the profiles in our data sample report their location, it makes sense to use the location in a nanotargeting campaign whenever it is available. Still, some users do not report their location. We therefore analyze both cases and estimate the number of skills that make a user unique when they report their location and when they do not.

The initial audience size differs between these two cases. If we consider only skills, the starting point is the worldwide audience size reported by the LinkedIn Ads manager at the time of our research: 780M users. If we also consider location, the starting point for a given user is the audience size of their reported location. For instance, at the time of writing this paper, the audience sizes for the US, the state of New York, and New York City were 200M, 12M, and 7M, respectively.
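To make the difference in starting points concrete, the sketch below picks the initial audience size for a user and compares it with the worldwide one, using only the figures quoted above. The dictionary keys and the function names are illustrative assumptions, not part of the study's tooling.

```python
from typing import Optional

# Audience sizes quoted in the text (LinkedIn Ads manager, time of the study).
WORLDWIDE = 780_000_000
# Hypothetical lookup table keyed by illustrative location names.
LOCATION_AUDIENCE = {
    "United States": 200_000_000,
    "New York (state)": 12_000_000,
    "New York City": 7_000_000,
}


def starting_audience(location: Optional[str] = None) -> int:
    """Initial audience: the reported location's size, or worldwide if none."""
    if location is None:
        return WORLDWIDE
    return LOCATION_AUDIENCE[location]


def reduction_factor(location: str) -> float:
    """How many times smaller the starting audience is when location is used."""
    return WORLDWIDE / starting_audience(location)


print(round(reduction_factor("New York City"), 1))  # 111.4
```

In other words, a user reporting New York City starts from an audience roughly 111 times smaller than a user for whom only skills are used, so fewer skills should be needed to narrow it down to one person.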

4.2 Methodology to compute N_p

For each user u_i (i ∈ [1, 1699]), we use the LinkedIn Campaign Manager to obtain the audience size of a combination of N of the user's skills, with N ranging from 1 to 50. We cap N at 50 because 50 is the maximum number of skills a user can report on LinkedIn.

For N = 1, we obtain a vector of 1699 audience sizes (one per user) by selecting one skill from those each user reports in their profile. For N = 2, we obtain another vector, with one audience size per user, by selecting two skills from each user's reported skills. We repeat the same operation for N = 3, 4, ..., 50. At the end of the process, we have 50 vectors that show the audience size distribution for each value of N.
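A minimal sketch of the vector construction described above, assuming the audience-size lookup is available as a plain function. In the study this lookup was done through the LinkedIn Campaign Manager; `audience_fn`, `select`, and `build_vectors` are hypothetical names, and the toy audience model in the usage lines is made up purely for illustration. The sketch also assumes that users without a reported location are left out of the location+skills case.

```python
from typing import Callable, Dict, List, Optional, Sequence

MAX_SKILLS = 50  # highest number of skills a LinkedIn user can report

# Hypothetical stand-ins: audience_fn plays the role of the Campaign Manager
# audience estimate; select picks N skills from a user's skill list.
AudienceFn = Callable[[Sequence[str], Optional[str]], int]
SelectFn = Callable[[List[str], int], List[str]]


def build_vectors(users: List[dict], select: SelectFn, audience_fn: AudienceFn,
                  use_location: bool = False,
                  max_n: int = MAX_SKILLS) -> Dict[int, List[int]]:
    """Return {N: audience sizes}, one entry per user who reports >= N skills."""
    vectors: Dict[int, List[int]] = {n: [] for n in range(1, max_n + 1)}
    for user in users:
        location = user.get("location") if use_location else None
        if use_location and location is None:
            continue  # assumption: no location -> excluded from location+skills
        skills = user["skills"]
        # A user contributes to vector N only if they report at least N skills.
        for n in range(1, min(len(skills), max_n) + 1):
            vectors[n].append(audience_fn(select(skills, n), location))
    return vectors


# Toy usage with made-up numbers (not LinkedIn data).
users = [
    {"skills": ["Python", "SQL", "Rust"], "location": "New York City"},
    {"skills": ["Excel"]},
]


def first_n(skills: List[str], n: int) -> List[str]:
    return skills[:n]


def toy_audience(combo: Sequence[str], location: Optional[str]) -> int:
    start = 7_000_000 if location else 780_000_000
    return start // 10 ** len(combo)  # each extra skill shrinks the audience 10x


vectors = build_vectors(users, first_n, toy_audience)
print({n: v for n, v in vectors.items() if v})
# {1: [78000000, 78000000], 2: [7800000], 3: [780000]}
```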

Note that the number of users reporting at least N skills in their profile decreases as N increases. Therefore, for a given value of N, the vector includes as many samples as there are users in our dataset who report N or more skills in their profile. Figure 3 shows the vector length for each value of N in the two cases considered in our analysis: skills and location+skills. We acknowledge that the number of samples for N ≥ 30 may be small (fewer than 500 users), but, as we show in the next section, our methodology does not rely on those vectors to compute the number of skills that make a user unique on LinkedIn.
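The vector length for each N follows directly from how many skills each user reports. A small sketch, assuming the per-user skill counts are available as a list of integers; the function names are hypothetical, the counts in the usage lines are invented, and the default threshold of 500 samples mirrors the figure quoted above.

```python
from typing import Dict, List


def vector_lengths(skill_counts: List[int], max_n: int = 50) -> Dict[int, int]:
    """Samples in the N-th vector = number of users reporting at least N skills."""
    return {n: sum(1 for c in skill_counts if c >= n) for n in range(1, max_n + 1)}


def sparse_values(lengths: Dict[int, int], min_samples: int = 500) -> List[int]:
    """Values of N whose vectors have fewer than min_samples samples."""
    return [n for n, length in lengths.items() if length < min_samples]


# Toy skill counts for five hypothetical users (not the study's data).
counts = [1, 3, 3, 5, 50]
lengths = vector_lengths(counts)
print(lengths[1], lengths[3], lengths[4], lengths[50])  # 5 4 2 1
print(sparse_values(lengths, min_samples=3)[:3])  # [4, 5, 6]
```

Because the lengths are non-increasing in N, the sparse values always form a tail N ≥ N_0, which is why only the larger values of N are affected by small sample sizes.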

4.3 Skills Selection and Scenarios

In this work, we compare two strategies for selecting skills. In the first, we randomly select skills from the user's skill set. In the second, we sequentially add skills from the least to the most popular. We refer to the former as Random Selection and to the latter as Least Popular Selection.
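The two strategies could be sketched as follows. This assumes a skill's popularity is the audience size of that skill on its own (the text does not define popularity further) and that Random Selection draws each combination independently; both are assumptions, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Optional


def random_selection(skills: List[str], n: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Random Selection: n skills drawn uniformly without replacement."""
    rng = rng or random.Random()
    return rng.sample(skills, n)


def least_popular_selection(skills: List[str], n: int,
                            popularity: Dict[str, int]) -> List[str]:
    """Least Popular Selection: the n skills with the smallest audience sizes."""
    # sorted() is stable, so ties keep the user's original skill order.
    return sorted(skills, key=lambda s: popularity[s])[:n]


# Toy popularity values (assumed single-skill audience sizes, not real data).
pop = {"Python": 30_000_000, "COBOL": 400_000, "Rust": 2_000_000}
print(least_popular_selection(list(pop), 2, pop))  # ['COBOL', 'Rust']
print(random_selection(list(pop), 2, random.Random(0)))
```

Under this sketch, Least Popular Selection produces nested combinations (the N-skill set is a prefix of the (N+1)-skill set), which matches the description of sequentially adding skills.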

Overall, we apply the described methodology in four scenarios:

• Sk_R: In this scenario, we only use the user's skills, selected following the random strategy.

• Sk_LP: In this scenario, we only use the user's skills, selected following the least popular strategy.

• Lo_R: In this scenario, we use the user's reported location together with skills selected following the random strategy.

• Lo_LP: In this scenario, we use the user's reported location together with skills selected following the least popular strategy.
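The four scenarios are the cross product of two choices: whether the location is used and which selection strategy picks the skills. A minimal sketch that enumerates them with the naming used above; the `Scenario` class is a hypothetical helper, not part of the study's code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    use_location: bool
    strategy: str  # "R" (Random Selection) or "LP" (Least Popular Selection)

    @property
    def name(self) -> str:
        prefix = "Lo" if self.use_location else "Sk"
        return f"{prefix}_{self.strategy}"


# {skills only, location+skills} x {random, least popular}
SCENARIOS = [Scenario(loc, strat) for loc, strat in product((False, True), ("R", "LP"))]
print([s.name for s in SCENARIOS])  # ['Sk_R', 'Sk_LP', 'Lo_R', 'Lo_LP']
```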

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.