Showing posts with label network science. Show all posts

Tuesday, 19 April 2016

What Twitter suggests for the New York primary

With Donald Trump's overwhelming dominance, there are no surprises on the Republican front. However, the remaining primary elections hold the answer to perhaps the most important question for the Democrats: Clinton or Sanders? According to Twitter, Sanders.

In order to gain a better understanding of the possible outcomes of the upcoming, and highly critical, New York primary, our team at Diktio Labs took a different approach and left the polls behind. Instead, we monitored and analyzed activity on Twitter. 

Between April 11 and April 15, 2016, we analyzed 151,965 tweets by 36,703 accounts containing the hashtag #NYPrimary, with the purpose of identifying key influencers, topic clusters, and of course the popularity of candidates in the Empire State. 

#Trump2016 vs. #FeelTheBern

Although Hillary Clinton is widely expected to emerge victorious from the NY primary, and eventually become the candidate of the Democrats, Bernie Sanders and his supporters have a much stronger online presence. In fact, thanks to his active supporters, Sanders is the only candidate with even a slight chance of diluting Trump's online dominance.

Sanders supporters tweet vigorously, hence they represent over 50% of the Top 30 most active accounts. (We have manually removed bots.)
When it comes to influence, however, the results look slightly different. Trump supporters lead the Top 30 most retweeted accounts, but with a mere 47%.

We categorized the most influential accounts along four dimensions: well-connected, most mentioned, most outward activity, and community brokers.

Well-connected accounts are those that receive the most attention in the NY primary conversations on Twitter from other accounts that have many followers. Trump’s supporters dominate this list. Noticeable are media outlets, such as NY1, CNN, POLITICO, and ABC News, and journalists, like Jeanine Ramirez covering stories around the primaries, who are often cited and followed by users interested in the primaries.

For most mentioned, there is only one account that makes it to the list for Cruz, four supporting Clinton, while the rest of the top 30 are Trump and Sanders supporters.

Most outward activity: accounts producing the most tweets in support of a candidate are overwhelmingly pro-Sanders. However, most of them do not make it into the well-connected list because they are rather poorly embedded in Twitter conversations around the election.
    
Community brokers are accounts that bridge conversations about their preferred candidates between groups of users who would otherwise not communicate. They mostly engage in debated topics by following conversations in other camps and replying to those.  

In our analysis, we also examined the top 30 hashtags mentioned in order to gain a better understanding of the topics surrounding the New York primary.

Many hashtags are used to reference events or places. They are usually used in conjunction with candidate-related hashtags to mobilize followers to join rallies, follow ongoing or future events, and encourage voting.

Trump’s campaign-related hashtags receive the most attention, followed by Sanders. Clinton is referenced mostly through the hashtag #ImWithHer, rather than nominally. The supporters of Cruz mention their preferred candidate much less than the other supporters.

The hashtag #NeverTrump is considered the only negative tag in the Top 30.

In order to identify the most mentioned (interesting/debated/adored/hated) candidates, we looked at candidate mentions on a user level. That is, we counted how many users talked about each candidate during the analyzed time frame. We have visualized our results on a network map. The size of nodes and labels indicates the number of users mentioning the name of the candidate, while the connecting lines (edges) reflect how many users talked about both candidates. The color of the node indicates party affiliation: red is for Republicans, blue is for Democrats.
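A minimal sketch of how such user-level counts could be derived, assuming each tweet has been reduced to a (user, text) pair. The sample tweets, the `CANDIDATES` list and the function name are invented for illustration and are not our production pipeline.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical list of candidate names to look for (lowercase).
CANDIDATES = ["trump", "sanders", "clinton", "cruz", "kasich"]

def candidate_network(tweets):
    """Return (node_sizes, edge_weights), both counted in users, not tweets."""
    mentioned_by_user = defaultdict(set)
    for user, text in tweets:
        lowered = text.lower()
        for name in CANDIDATES:
            if name in lowered:
                mentioned_by_user[user].add(name)

    node_sizes = defaultdict(int)    # users mentioning each candidate
    edge_weights = defaultdict(int)  # users mentioning both candidates
    for names in mentioned_by_user.values():
        for name in names:
            node_sizes[name] += 1
        for pair in combinations(sorted(names), 2):
            edge_weights[pair] += 1
    return dict(node_sizes), dict(edge_weights)

# Made-up tweets, only to show the shape of the input.
sample_tweets = [
    ("alice", "Trump rally tonight #NYPrimary"),
    ("alice", "Trump again, and Cruz too #NYPrimary"),
    ("bob", "Sanders vs Clinton debate #NYPrimary"),
    ("carol", "Voting for Sanders #NYPrimary"),
]
nodes, edges = candidate_network(sample_tweets)
print(nodes)
print(edges)
```

Because mentions are collected in a set per user, an account that tweets about the same candidate many times still counts once, which is what separates this view from raw tweet volume.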



The most mentioned candidate was Donald Trump, with over 2,000 more users talking about him than about Bernie Sanders. Kasich was the least mentioned candidate. Clinton and Cruz were mentioned by almost the same number of users, which is less than half the number of users mentioning Trump.

In terms of users talking about two candidates, the biggest overlap is between Clinton and Sanders, followed by Trump and Cruz. The difference is more than 1,000 users. The remaining combinations were not so frequent apart from that of Clinton and Trump, mentioned by two percent of the users using the hashtag #NYPrimary.



This network map (filtered to show top connections only) depicts which hashtags were mentioned together within the same tweet in our dataset. The thicker the edge, the more often the two hashtags were mentioned together. Node and label sizes indicate how often the given hashtag was mentioned together with other hashtags [network science metric: weighted degree].
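As a rough illustration of the weighted degree metric, here is a small sketch that builds a hashtag co-occurrence network from tweet texts. The three sample tweets and both function names are assumptions made for the example; the regular expression is a simplification of real hashtag parsing.

```python
import re
from collections import Counter
from itertools import combinations

def hashtag_cooccurrence(tweets):
    """Count how often two hashtags appear in the same tweet (edge weight)."""
    edge_weights = Counter()
    for text in tweets:
        tags = sorted({tag.lower() for tag in re.findall(r"#\w+", text)})
        for pair in combinations(tags, 2):
            edge_weights[pair] += 1
    return edge_weights

def weighted_degree(edge_weights):
    """Sum the weights of all edges touching each hashtag (node size)."""
    degree = Counter()
    for (a, b), weight in edge_weights.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Made-up tweets, only to show the calculation.
sample = [
    "#NYPrimary #Trump2016",
    "#NYPrimary #FeelTheBern #ImWithHer",
    "#NYPrimary #Trump2016 rally",
]
edges = hashtag_cooccurrence(sample)
degree = weighted_degree(edges)
print(degree.most_common())
```

Edge weight drives line thickness on the map, and weighted degree drives node and label size.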

Over 19% of the tweets associated #NYPrimary with #Trump2016, almost 16% with #FeelTheBern, and 8.6% with #ImWithHer. The different color clusters depict various topics around the NY primary. Dark orange hashtags represent upcoming primaries (#PAPrimary, #MDPrimary, #DEPrimary, #CTPrimary, #RIPrimary), dark blue ones are the hashtags most associated with New York, and light orange ones are those most associated with the Democratic party.

Will the domination on Twitter manifest in the elections?

So far, our analyses clearly show that Trump and Sanders dominate Twitter conversations about the NY primary, each in their own community of supporters and opponents. The NY primary is expected to have a higher than average turnout in the state of New York, which might bring surprises, at least for the Democrats, where, according to polls, the two candidates are in a tight race for the win.

Please bear in mind that an online analysis is not intended to replace offline polls, nor can it by itself predict the results of the actual elections. Our experience in political analysis suggests that smaller parties and “anti-establishment” candidates tend to be more active on social media. The NY primaries are further complicated by the fact that registration closed over six months ago, so however successful a candidate is on social media now, it might be too late to turn the popularity gained into actual votes. Given that the race between the candidates is not over, there is a lot to learn in the forthcoming months.

Are you interested in the methods behind our analysis, and how you could benefit from online community mapping? Reach out to us via email at info@diktiolabs.com

Wednesday, 16 July 2014

Maven7 Introduces Dedicated Brand for Community and Influencer Mapping

Diktio Labs aims to transform the marketing sector by bringing network science to the fingertips of all marketing professionals


Maven Seven Inc. (Maven7), one of the world's leading network research and data mining companies, has announced the introduction of Diktio Labs™, its new suite of online community and influencer mapping solutions.

Developed to empower marketing professionals by integrating network science into the campaign planning process, Diktio Labs is set to transform the digital marketing sector. For the first time, any proactive informational outreach campaign will have at its disposal a range of innovative and cost-effective planning tools driven by powerful network science.

Diktio Labs' data-driven approach stands in stark contrast to commonly used volume-based monitoring tools and methods. Traditional tools track campaign outputs, monitoring sentiment, conversations, trends and behavior. By contrast, Diktio Labs aims to map networks, hidden structures, relationships and both individual and group behavior across a range of client-specific criteria (issues, behavior, beliefs, preferences, patterns, etc.). Its algorithms and scrapers are tailor-made to each campaign, offering insight and intelligence to a level never before possible.

"Modern companies have to understand better how their customers seek out information online," said Chris Dobson, a seasoned communications agency manager who is driving Diktio Labs' to-market strategy.

"A huge amount of money is wasted through trial and error within digital and, indeed, traditional marketing, mostly completely unnecessarily. We now have the technology to identify communities of interest by a myriad of criteria and help transform any outreach campaign, be it political-, marketing- or social-based, in terms of insight and accuracy. Anyone in these areas not embracing network science now will soon be left behind."

A co-founder of Maven7 is the globally-acclaimed pioneer of network theory, Albert-László Barabási, based at Northeastern University in Boston. He also believes that the time has come for all marketing to take full advantage of network science: "We are constantly surrounded by hopelessly complex systems, from our society to computers and cell phones, to even the networks of the billions of neurons in our brains. These systems, although random looking at first, display endless signatures of order and self-organization, which can be understood, quantified, predicted, and eventually controlled. Show me a marketer that wouldn't want to take advantage of these advances to exploit that information."

Diktio Labs' products are available in almost all languages, and have already been used in more than 15 countries, as diverse as Colombia, China, Mexico, Russia, Hungary, the UK and the US. Tailor-made algorithms specific to each client and project guarantee accuracy and relevance levels far beyond 'off the shelf' models.





About Maven7 
Maven7 supports business decisions by transforming large amounts of hard-to-interpret data into actionable business intelligence. Based on the methodologies of network analysis and data mining, Maven7 has developed its own proprietary network mapping tools to conduct analyses in organizational development, social media, the pharmaceutical/medical industries and many other fields.

About Diktio Labs 
Diktio Labs takes marketing to a whole new level of targeting and efficiency by enabling organizations to know exactly who will prove receptive, influential and responsive to their messaging prior to a campaign launch. Diktio Labs works directly with end-user clients and also partners with a number of marketing advisors and agencies to deliver its technology into a fully integrated campaign.

Monday, 27 May 2013

Is football really a simple game?! The hidden networks behind Bayern's success!



The infographic was created by Avalanche.

With the power of network visualization, the dynamics of football games can be understood better than ever. Maven7’s analyst team is a huge fan of sports (check out our last analysis about the chances of the Hungarian water-polo team at the London Olympics), especially football.

As everybody knows, "football is a simple game; 22 men chase a ball for 90 minutes and at the end, the Germans always win". So why do so many people admire this simple form of entertainment? Why do dozens of analysts try to predict who will win a certain game or championship? Why is betting a huge business? The answer is as simple as football: this game is not simple at all! Behind every pass, attack and goal, human dynamics have a strong impact. Network analysis offers a new approach to understanding team dynamics during football games.

Our recent infographic shows the hidden networks of the two finalists of the 2013 Champions League. Let’s face the big question: can network science explain why Bayern won and not Dortmund?

If you look at the pictures, similarities and differences are easily noticeable. The network structures and patterns resemble each other because of the same line-up structure. Two defenders (greens) had strong mutual pass connections in both teams, but Dortmund focused on the right back and Bayern on the left back. Each team had a preferred defensive midfielder, Schweinsteiger and Gündogan respectively, who was the top choice to pass to in midfield. OK, so both teams are German and both have the same line-up, but what is the difference then?

Why did Bayern win?

Dortmund’s midfielder Reus was the preferred passing target among the attacking midfielders. The penalty that Dortmund received also came from a situation after a pass to Reus.

In attacking midfield, Bayern is more active on the wings, and their whole network is not as centralized as Dortmund’s. Bayern’s midfield cooperated better; their network shows more mutual connections, and Ribery’s supportive role on the left wing makes the whole attacking part very successful. Dortmund’s attacking midfield, by contrast, has no mutual connection, and the whole midfield has only one. Bayern’s attacking midfield has a mutual connection between Robben and Ribery, and the midfield also has 3 mutual connections (Schweinsteiger – Ribery, Müller – Robben, Ribery – Martinez), which may show stronger cohesion in the midfield.
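To make the idea of a mutual connection concrete, here is a small sketch. It assumes a connection counts as mutual when both players passed to each other at least `min_passes` times; that threshold, the position labels and the pass counts are all invented for the example and are not taken from the match data.

```python
def mutual_connections(pass_counts, min_passes=3):
    """Return pairs of players who each passed to the other at least min_passes times."""
    mutual = set()
    for (passer, receiver), count in pass_counts.items():
        reverse = pass_counts.get((receiver, passer), 0)
        if count >= min_passes and reverse >= min_passes:
            mutual.add(tuple(sorted((passer, receiver))))
    return sorted(mutual)

# Invented directed pass counts between three positions (not real match data).
passes = {
    ("LW", "RW"): 4,
    ("RW", "LW"): 5,
    ("CM", "RW"): 6,
    ("RW", "CM"): 1,
}
print(mutual_connections(passes))
```

With the default threshold only the two wingers form a mutual connection; the central midfielder feeds the right wing often but rarely gets the ball back, so that link stays one-directional.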

Also, the performance of the two teams’ central midfielders mirrors their teams’ performance. Schweinsteiger played and passed more actively and accurately (87 tries, 73 times successful – 84%) than Gündogan (56 tries, 31 times successful – 62%), and while Bayern had altogether 640 passes with 72% efficiency, Dortmund had only 448 passes with 60% efficiency.

Interestingly, attacks that started from the goalkeeper were more common for Dortmund. In general, Dortmund’s defense played a more attacking role; while Dante passed mostly to the back, Boateng passed to the front.

Monday, 18 March 2013

The Harlem Shake Story - aka. Birth of a Meme

If you still have not heard of the Harlem Shake you must be living in a cave. Much has been written about the rapid and global spread of this catchy internet meme, yet little is understood about how it spread. A series of remixed videos along with a number of key communities around the world triggered a rapid escalation, giving the meme widespread global visibility. Who were the initial communities behind this mega-trend? SocialFlow took a look at 1.9 million tweets during a two-week period that included the words ’harlem shake’, or some versions of it.

The Harlem Shake itself is a dance style born in New York City more than 30 years ago. During halftime at street ball games held in Rucker Park, a skinny man known in the neighborhood as Al. B. would entertain the crowd with his own brand of moves, a dance that around Harlem became known as 'The Al. B.' Though it started in 1981, the Harlem Shake became mainstream in 2001 when G. Dep featured the dance in his music video "Let's Get It". While mining Twitter data, references to Harlem Shake (the original dance) were seen quite often prior to it becoming a popular meme. When someone tweets, "I just passed my final exams! *harlem shakes*," it's the equivalent of saying "I just passed my final exams! Look at me dancing!" While Baauer's now infamous track was released on Diplo's Mad Decent label back in August 2012 (posted to YouTube on August 23 2012), it only accrued minor visibility for the first few months. Then February hit, and something changed.

The timeline below highlights the very first days as the meme was taking off. In blue, we see references to the 1980's dance *harlem shakes*, while the green curve represents Tweets that use the phrase 'The Harlem Shake', many of them linking to one of the first three versions of the meme on YouTube.

On February 2, The Sunny Coast Skate (TSCS) group establishes the form of the meme in a YouTube video they upload. On Feb. 5, PHL_On_NAN posts a remix (v2), gaining 300,000 views within 24 hours and prompting further parodies shortly after. On Feb. 7, YouTuber hiimrawn uploads a version titled "Harlem Shake v3 (office edition)" featuring the staff of online video production company Maker Studios. The video becomes a hit, amassing more than 7.4 million views over the following week and inspiring a number of contributions from well-known Internet companies, including BuzzFeed, CollegeHumor, Vimeo and Facebook.



SocialFlow looked at the social connections amongst users who were posting about the meme. This gave them the ability to identify the underlying communities engaging with the meme at a very early stage. In the graph above each node represents a user that was actively posting and referencing the Harlem Shake meme on Feb 7 or 8 to Twitter. Connections between users reflect follow/friendship relationships. The graph is organized using a force directed algorithm, and colored based on modularity, highlighting dominant clusters - regions in the graph which are much more interconnected. These clusters represent groups of users who tend to have some attribute in common. The purple region in the graph (left side) represents African American Twitter users who are referencing Harlem Shake in its original context. There's very little density there as it is not really a tight-knit community, but rather a segment of users who are culturally aligned, and are clearly much more interconnected amongst themselves than with other groups.
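Modularity-based coloring rests on a score that rewards partitions with many links inside clusters and few between them. The sketch below computes that score (Newman's modularity) for a given partition of a small undirected graph. It is an illustration only: the toy graph and labels are made up, and graph tools search for a high-scoring partition rather than just scoring one, which this sketch does not attempt.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # number of edges with both ends inside a community
    degree_sum = {}  # total degree of the nodes in a community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Toy follow graph: two triangles joined by a single bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_split = modularity(edges, two_clusters)
q_single = modularity(edges, one_cluster)
print(round(q_split, 3), round(q_single, 3))
```

Splitting the toy graph at the bridge scores clearly higher than lumping everyone together, which is the signal a modularity-based coloring picks up.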



After a similar analysis on the following two days (Feb 9 and 10) different communities can be seen emerging, resulting in a much more tightly knit graph structure. While the same dense cluster of musicians and DJs (in turquoise) still exists, there are substantially more self-identified YouTubers both across the US and the UK. At the same time there's a significant gamer / machinima cluster that's also participating, as well as a growing Jamaican contingent, and quite a few Dutch profiles (purple -- left). Additionally, we see various celebrity and media accounts who caught on to the meme -- @jimmyfallon, @mashable and @huffingtonpost. By capturing the two snapshots, we can also make sense of the evolution of the meme as it becomes more and more visible. At first, we see loosely connected communities separately humored by the videos. Within days, we see major media outlets jump on board, and a much more intertwined landscape. We see different regions in the world light up, and identify communities of important YouTube enthusiasts who effectively get this content to spread.



Memes have become a sort of distributed mass spectacle, a mechanism that both captures people's attention and defines what is "cool" or "trendy." We see more and more companies and brands try to associate themselves with certain memes, as a way to maintain a connection with their audience and gain the cool factor. Pepsi did this with the Harlem Shake and saw an incredibly positive response.


As we get better at identifying these trends and trend-setting communities early on, the pressure to participate will rise. As social networks become globally-intertwined, we're witnessing a growing number of memes conquer the world at large. These moments are critical points in time, where there are significant levels of attention given towards a specific entity - be it a joke, funny video or a political topic. Piecing together data from social networks can help us identify critical points in time, as well as the underlying communities and trendsetters for the humor-based memes, or the agenda setters for politically-slanted ones. The only question is: what will be the next one to cash in on its 15 minutes?

Hungry for more? Read the full article on HuffPost.

Wednesday, 13 February 2013

Manchester City vs Liverpool: Passing network analysis


At the beginning of February, Manchester City drew 2-2 with Liverpool at the Etihad, so a football loving blog decided to take a look at the match from a network point of view, resulting in the following research. We have already reported about something similar regarding basketball.



The positions of the players are loosely based on the formations played by the two teams, although some creative license is employed for clarity. It is important to note that these are fixed positions, which will not always be representative of where a player passed/received the ball. Only the starting eleven is shown on the pitch, as the substitutes weren’t hugely interesting from a passing perspective in this instance. Only completed passes are shown. Darker and thicker arrows indicate more passes between each player. The player markers are sized according to their passing influence, the larger the marker, the greater their influence. The size and colour of the markers is relative to the players on their own team i.e. they are on different scales for each team.

In the reverse fixture, Yaya Touré and De Jong were very influential for City but Touré was away at the African Cup of Nations, while De Jong joined Milan shortly after that fixture. Their replacements in this game, Barry and Garcia, were less influential, although Barry had the strongest passing influence for City in this match, with Milner second. The central midfield two, Lucas and Gerrard, were very influential for Liverpool and strongly dictated the passing patterns of the team. They both linked well with the fullbacks and wider players, while Lucas also had strong links with Suárez and Sturridge. Certainly in this area of the pitch, Liverpool had the upper hand over City and this provided a solid base for Liverpool in the match.
Similarly to the Arsenal game, Liverpool showed less of an emphasis upon recycling the ball in deeper areas. Instead, they favoured moving the ball forward more directly, with Enrique often being an outlet for this via Reina and Agger. Liverpool’s fullbacks combined well with their respective wide-players, while also being strong options for Lucas and Gerrard. Sturridge was generally excellent in this match and was more influential in terms of passing than in his previous games against Norwich and Arsenal, combining well with Suárez, Lucas and Gerrard.
At least based on the past few games, Liverpool have shown the ability to alter their passing approach with a heavily possession orientated game against Norwich, followed up by more direct counter-attacking performances against Arsenal and Manchester City. The game against City was particularly impressive as this was mixed in with some good control in midfield via Lucas and Gerrard, which was absent against Arsenal. How this progresses during Liverpool’s next run of fixtures will be something to look out for.


Thursday, 3 January 2013

Social Media and the Power of Networks 2. – Key Opinion Leaders on Twitter


The increasing impact of social media gives modern marketing a lot to think about; Facebook, Twitter, Tumblr, Flickr, Pinterest, Google+ and hundreds of blogs are only the tip of the iceberg, and it seems impossible to be up-to-date on all the channels. Looking at them one by one seems illogical, since the key aspect of the generated content lies in the network effect, which enables the vast exchange of information. What remains to be done? This three-part series introduces Maven7’s newest research focusing on the network effect, with the aim of making life easier for online marketing, PR, and product management experts.
In contrast to the Facebook boom that began 2-3 years ago and reached its 3 million user population in Hungary last year, the Twitter community seems to be growing at a slower pace. Twitter was launched in 2006 in San Francisco, and has around 30 thousand Hungarian visitors a day, similar to the blog hosting site Tumblr.
Why bother with them at all, you may ask? The majority of Twitter and Tumblr users come from an urban environment; most of them are high-status people living in Budapest. Microblogs spread information – especially negative information – very fast. Here is a comparison: a “traditional” online medium might be busy with a story for a whole week, whereas on Twitter – given that the right person spreads it – the same information is distributed within 2.5 hours! It is therefore of great importance to keep these outlets under control as much as possible. It is no coincidence that Hollywood celebrities like Charlie Sheen (with his 7.5 million followers) get paid around 50 thousand dollars per tweet. Our survey conducted during the Spanish election season showed that even an average person can have a substantial effect on voters. This leaves no doubt about the need to monitor the information that gets to these loyal, high-prestige consumers.
National key opinion leaders (famous journalists, bloggers, athletes) are active on multiple social media platforms, but their small follower bases point to the fact that the person with the most followers is not necessarily the most influential one when it comes to information distribution. We need to find out which tweeter is the most relevant one and has the power to form opinions when it comes to our products. We can achieve this through Twitter data using the methods of data mining. The user’s position in the network is another key factor (i.e. how many followers the user has in common with our competing brand). Compared to Twitter, Facebook has open activity data, which means that we can easily access information regarding the users’ network of contacts.




There are multiple ways we can build networks from the connections of Twitter users. First of all, we can regard the distributors (people related to the brand, or the brand’s official page) as the source of information, and link individual users to them based on who retweeted the source’s message. Furthermore, the users themselves have followers and friends online, the latter representing a stronger status, which can be interpreted as a network itself (for more, check our previous article on a follower- and friend-based network). The picture shows a network of retweeted messages related to an FMCG product distributor and its competitors.
The second picture represents the choice between data sources that have the most influence on our consumer base. The yellow boxes are the key opinion leaders (KOLs), who can reach out to the major part of the community in only three steps. They hold a central position in the network because they have the biggest follower and friend base.
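The "three steps" criterion can be checked with a breadth-first search over the follower graph. The following is a minimal sketch, assuming the network is available as a dictionary mapping each account to the accounts that follow it; the account names and the function name are hypothetical, and the sketch assumes the source account is part of a network with at least two users.

```python
from collections import deque

def reach_within(followers, source, max_steps=3):
    """Fraction of the other users reachable from source in at most max_steps hops."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, steps = queue.popleft()
        if steps == max_steps:
            continue  # do not expand beyond the step limit
        for follower in followers.get(user, ()):
            if follower not in seen:
                seen.add(follower)
                queue.append((follower, steps + 1))
    everyone = set(followers) | {f for fs in followers.values() for f in fs}
    return (len(seen) - 1) / (len(everyone) - 1)

# Hypothetical follower lists: information flows from an account to its followers.
followers = {
    "brand": ["ann", "ben"],
    "ann": ["cat"],
    "cat": ["dan"],
    "dan": ["eve"],
    "ben": [],
}
brand_reach = reach_within(followers, "brand")
dan_reach = reach_within(followers, "dan")
print(brand_reach, dan_reach)
```

Ranking accounts by this fraction, rather than by raw follower count, is one simple way to surface candidates for the yellow boxes.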
Through analysis of Twitter data we can not only locate the key opinion leaders and characters of a brand, but with the help of location information we can also inform product placement research. A good example of using location data is our previous article on the optimal localization of ATMs.

To be continued.

Friday, 26 October 2012

If Achilles Used Facebook...

In a study published in Europhysics Letters, scientists use a mathematical approach to examine the social networks in three narratives: “The Iliad”, “Beowulf” and the Irish epic “Tain Bo Cuailnge.” If the social networks depicted appeared realistic, they surmised, perhaps they would reflect some degree of historical reality.

When we pick up a mythological text like “The Iliad” or “Beowulf,” we like to imagine that the societies they describe existed. Even if the stories are fiction, we believe that they tell us something about ancient Greece or the Anglo-Saxons, and that some of the characters and events were based on reality.
1. Howard David Johnson - Victorious Achilles
“Beowulf” is an Anglo-Saxon heroic epic, set in Scandinavia. Notwithstanding obvious embellishments, archaeology supports the historical authenticity associated with some of its characters. The main character, Beowulf, is believed to be fictional. “The Iliad” is an epic poem attributed to Homer dating from the eighth century B.C. Some archaeological evidence suggests that the story is based on an actual conflict. The researchers contrasted those two narratives with the Irish epic “Tain Bo Cuailnge” (usually called the “Tain”), which most believe to be completely fictional. The “Tain,” which survives in three manuscripts from between the 12th and 14th centuries, concerns a conflict between Connaught and Ulster, Ireland’s western and northern provinces.
2. Hans W. Schmidt - Beowulf Illustration
To construct the social networks in each of the narratives, researchers created databases for the characters and their interactions, and categorized their relationships as hostile or friendly. The myth networks were found to have some of the characteristics, including the small-world property and structural balance (related to the idea that the enemy of my enemy is my friend), typical of real-world networks.
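Structural balance can be illustrated on a signed network: a triangle is balanced when the product of its three edge signs is positive, which covers "the enemy of my enemy is my friend". The sketch below is only an illustration of that rule. The four characters A to D and their relationships are placeholders, not entries from the researchers' databases.

```python
from itertools import combinations

def balanced_fraction(signed_edges):
    """Share of triangles whose three edge signs multiply to +1."""
    sign = {frozenset(pair): s for pair, s in signed_edges.items()}
    nodes = sorted({n for pair in signed_edges for n in pair})
    balanced = total = 0
    for trio in combinations(nodes, 3):
        pairs = [frozenset(p) for p in combinations(trio, 2)]
        if all(p in sign for p in pairs):  # only closed triangles count
            total += 1
            product = sign[pairs[0]] * sign[pairs[1]] * sign[pairs[2]]
            balanced += product > 0
    return balanced / total if total else None

# Placeholder relationships: +1 is friendly, -1 is hostile.
relations = {
    ("A", "B"): 1,
    ("A", "C"): -1,
    ("B", "C"): -1,
    ("A", "D"): -1,
    ("B", "D"): -1,
    ("C", "D"): -1,
}
fraction = balanced_fraction(relations)
print(fraction)
```

Here the triangles containing the friendly pair A and B are balanced, while the two all-hostile triangles are not.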
3. Táin Bó Cúailnge
The results showed that all three were scale-free, unlike any of the intentionally fictional narratives the researchers examined. However, in the Irish myth, the top six characters are all unrealistically well connected. There are 398 other characters in the “Tain,” but after removing the weakest links (or single, direct encounters) between these characters and the Top 6, the narrative becomes as realistic as “Beowulf” from a social-network view. Perhaps these characters are amalgams of a number of entities that were fused as the narrative was passed down orally.

The study’s approach is different from traditional approaches to comparative mythology. It is not literary analysis; it tells us nothing about events or the human condition. Instead, it promises a new way to analyze old material and find striking new perspectives and evidence: in this case, that what we call “myths” may not be as mythical as we thought.


Read the full article on NY Times.

The Web of Modernism - How Abstract Movements Spread Across The Globe

The Museum of Modern Art (MoMA) in New York is hosting an exhibition called Inventing Abstraction: 1910-1925. They took the opportunity to unveil a graphic representation of the birth of modern art.
The image, which was designed for their upcoming show Inventing Abstraction: 1910-1925 (December 23, 2012–April 15, 2013), is an obvious nod to Alfred H. Barr Jr.’s important Cubism and Abstract Art chart that accompanied a show of the same name at the MoMA in 1936.
1. Barr's Original for the 1936 Exhibition 

This web of relationships goes beyond visual art to incorporate musicians like Claude Debussy, writers like Guillaume Apollinaire, and choreographers like Vaslav Nijinsky, and gives us the most complete picture of abstraction’s transcontinental roots we’ve ever seen.
2. Info Graphic for the 2012 Exhibition
The Americans, centered on photographer Alfred Stieglitz, branch out to include Max Weber, Marsden Hartley, and others. There are obvious Italian, Russian, British, Dutch and other clusters, but the image connects the dots between figures we may not know were in contact. The Hungarian hub includes painter Sándor Bortnyik and Bauhaus pioneer László Moholy-Nagy. The chart shows all known relationships, including those between people who shared studios and even slept together.
For more, go to Hyperallergic or the MoMA homepage.

Friday, 21 September 2012

The Paradox Of Friendship – Why do our friends have more friends than we do?


What may look like a psychological phenomenon is actually basic maths.

In a colossal study of Facebook, Johan Ugander, Brian Karrer, Lars Backstrom and Cameron Marlow examined all of Facebook’s active users, which at the time included 721 million people (about 10 percent of the world’s population) with 69 billion friendships among them. They found that a user’s friend count was less than the average friend count of his or her friends, 93 percent of the time. Next, they measured averages across Facebook as a whole, and found that users had an average of 190 friends, while their friends averaged 635 friends of their own.

Studies of offline social networks show the same trend. It has nothing to do with personalities; it follows from basic arithmetic. For any network where some people have more friends than others, it’s a theorem that the average number of friends of friends is always greater than the average number of friends of individuals.
This phenomenon has been called the friendship paradox. Its explanation hinges on a numerical pattern, a particular kind of “weighted average”, that comes up in many other situations. Understanding that pattern will help you feel better about some of life’s little annoyances.


In this hypothetical example, Ross, Chandler, Phoebe and Rachel are four friends. Lines signify reciprocal friendships between them; two people are connected if they’ve named each other as friends.
Ross’s only friend is Chandler, a social butterfly who is friends with everyone. Phoebe and Rachel are friends with each other and with Chandler. So Ross has 1 friend, Chandler has 3, Phoebe has 2 and Rachel has 2. That adds up to 8 friends in total, and since there are 4 people, the average friend count is 2 friends per person. This average, 2, represents the “average number of friends of individuals” in the statement of the friendship paradox. Remember, the paradox asserts that this number is smaller than the “average number of friends of friends”, but is it? Part of what makes this question so dizzying is its sing-song language. Repeatedly saying, writing, or thinking about “friends of friends” can easily provoke nausea. So to avoid that, I’ll define a friend’s “score” to be the number of friends he or she has. Then the question becomes: What’s the average score of all the friends in the network?

Imagine each person calling out the scores of his/her friends. Meanwhile an accountant waits nearby to compute the average of these scores.
Ross: “Chandler has a score of 3.”
Chandler: “Ross has a score of 1. Phoebe has 2. Rachel has 2.”
Phoebe: “Chandler has 3. Rachel has 2.”
Rachel: “Chandler has 3. Phoebe has 2.”

These scores add up to 3 + 1 + 2 + 2 + 3 + 2 + 3 + 2, which equals 18. Since 8 scores were called out, the average score is 18 divided by 8, which equals 2.25.
Notice that 2.25 is greater than 2. The friends on average do have a higher score than the individuals themselves. That’s what the friendship paradox said would happen.
The key point is why this happens. It’s because popular friends like Chandler contribute disproportionately to the average, since besides having a high score, they’re also named as friends more frequently. Watch how this plays out in the sum that became 18 above: Ross was mentioned once, since he has a score of 1 (there was only 1 friend to call his name), and therefore he contributes a total of 1 x 1 to the sum; Chandler was mentioned 3 times because he has a score of 3, so he contributes 3 x 3; Phoebe and Rachel were each mentioned twice and contribute 2 each time, thus adding 2 x 2 apiece to the sum. Hence the total score of the friends is (1 x 1) + (3 x 3) + (2 x 2) + (2 x 2), and the corresponding average score is
[(1 x 1) + (3 x 3) + (2 x 2) + (2 x 2)] / (1 + 3 + 2 + 2) = 18 / 8 = 2.25.
Each individual’s score is multiplied by itself before being summed. In other words, the scores are squared before they’re added. That squaring operation gives extra weight to the largest numbers (like Chandler’s 3 in the example above) and thereby tilts the weighted average upward.
So that’s intuitively why friends have more friends, on average, than individuals do. The friends’ average — a weighted average boosted upward by the big squared terms — always beats the individuals’ average, which isn’t weighted in this way.
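For readers who would rather check the arithmetic in code, here is a minimal Python sketch of the four-person example. The friendship list mirrors the example above; the function names (build_friend_lists, average_friend_count, average_friends_of_friends, squared_score_average) are made up for illustration and do not come from the original study.

```python
# Reciprocal friendships from the example (each pair listed once).
FRIENDSHIPS = [
    ("Ross", "Chandler"),
    ("Chandler", "Phoebe"),
    ("Chandler", "Rachel"),
    ("Phoebe", "Rachel"),
]


def build_friend_lists(pairs):
    """Turn undirected pairs into a name -> set-of-friends mapping."""
    friends = {}
    for a, b in pairs:
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    return friends


def average_friend_count(friends):
    """Average number of friends of individuals (the plain average)."""
    scores = [len(f) for f in friends.values()]
    return sum(scores) / len(scores)


def average_friends_of_friends(friends):
    """Average of every score that gets 'called out' by a friend."""
    called = [
        len(friends[friend])
        for person in friends
        for friend in friends[person]
    ]
    return sum(called) / len(called)


def squared_score_average(friends):
    """Same quantity via the shortcut: sum of squared scores / sum of scores."""
    scores = [len(f) for f in friends.values()]
    return sum(s * s for s in scores) / sum(scores)


network = build_friend_lists(FRIENDSHIPS)
individual_avg = average_friend_count(network)
friend_avg = average_friends_of_friends(network)
shortcut_avg = squared_score_average(network)

print("individuals:", individual_avg)
print("friends of friends:", friend_avg)
print("squared-score shortcut:", shortcut_avg)
```

The call-out average and the squared-score shortcut give the same number because each person's score is counted once for every friend who names them.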

Like many of math’s beautiful ideas, the friendship paradox has led to exciting practical applications unforeseen by its discoverers. It recently inspired an early-warning system for detecting outbreaks of infectious diseases. In a study conducted at Harvard during the H1N1 flu pandemic of 2009, the network scientists Nicholas Christakis and James Fowler monitored the flu status of a large cohort of random undergraduates and found that people with more connections were infected faster.

For more analogies check out the whole article at a New York Times blog.

Monday, 17 September 2012

In the Mist of Drugs


A study from India takes a closer look at what our medicine cabinets are made of, with the help of network analysis.
It is well known that the demand for medicine increases from year to year (the market grows by 6% annually!). The industry has an income of 800 billion dollars per year, with India and China as the fastest growing markets, where demand increases by over 15% annually. The top consumers are, of course, overseas. The Americans, with their 320 billion dollar annual drug spending, are responsible for more than one third of the industry's income, a sum about three times larger than Germany's. It's hardly a coincidence that the number of prescription drug abuse victims is growing as well. Last year alone, about 27,000 people died prescription-medicine-related deaths, one every 19 minutes. Livestock drugs are pretty common too, since factory farming procedures require the use of antibiotics on animals.
The goal of the research was to understand drug consumption from a network point of view, and to learn what drugs consist of. American drug label databases served as the source of information, making over 70 thousand chemicals subjects of the analysis.

The picture above shows the whole network of ingredients, with 16,444 nodes and 32,627 edges. You can see at first sight that clustering is present. The most common chemicals include Octinoxate, Titanium dioxide, Octisalate, Oxybenzone and Avobenzone, which are ingredients in drugs and chemicals, and sometimes even in food colouring materials. Another central point is Triclosan, a commonly used antibacterial and antifungal chemical.


Alcohol is number 3 in the centrality top 10. For more cool pictures and the full top 10, check out the original article at Web 2.0.

Monday, 25 June 2012

The power of network science, the beauty of network visualization


Albert-László Barabási has started publishing a new network research book online. According to the author, the book aims to help anybody understand the fundamental concepts of network research, so it presents many colorful and interesting real-life examples, including the results of research by Maven7. The book is the result of a collaboration between a number of individuals, who shaped everything from content (Albert-László Barabási) to visualizations and interactive tools (Mauro Martino) and simulations and data analysis (Márton Pósfai).



You can download the first two chapters, with slides, from the official website, and you can follow the progress of the book on Facebook and Twitter.