CCCSM

Charting Collections of Connections in Social Media: Creating Maps and Measures with NodeXL

Thursday, June 21: full-day workshop (morning and afternoon)

Abstract: Networks are a data structure commonly found across all social media services. Internet services that allow populations to author collections of connections are wildly popular and consequential. The Social Media Research Foundation’s NodeXL project makes the analysis of networks in general, and social media networks in particular, accessible to most users of the Excel spreadsheet application. With NodeXL, network charts become as easy to create as pie charts. Applying the tool to a range of social media networks has already revealed the variations present in online social spaces. The workshop will present a review of the tool along with images of Twitter, flickr, YouTube, and email networks.

Description: We now live in a sea of emails, tweets, texts, posts, blogs, updates and check-ins coming from a significant fraction of the people in the connected world. Our personal and professional relationships are now made up as much of texts, emails, phone calls, photos, videos, documents, slides, and game play as of face-to-face interactions. Social media can be a bewildering stream of comments, a daunting fire hose of content. But with better tools and a few key concepts from the social sciences, the social media swarm of favorites, comments, tags, likes, ratings, and links can be brought into clearer focus to reveal key people, topics and sub-communities. As more social interactions move through machine-readable data sets, new insights into and illustrations of human relationships and organizations become possible. But new forms of data require new tools to collect, analyze, and communicate insights.

A new organization, the Social Media Research Foundation (http://www.smrfoundation.org), has been formed to develop open tools and open data sets, and to foster open scholarship related to social media. The Foundation’s current focus is on creating and publishing tools that enable social media network analysis and visualization from widely used services like email, Twitter, Facebook, flickr, YouTube and the WWW. The Foundation has released the free and open NodeXL project (http://www.codeplex.com/nodexl), a spreadsheet add-in that supports “network overview, discovery and exploration”. The tool fits inside your existing copy of Excel in Office 2007 or 2010 and makes creating a social network map similar to the process for making a pie chart.

Using NodeXL, users can easily make a map of public social media conversations around topics that matter to them. Maps of the connections among the people who recently mentioned a product, brand or event can reveal key positions and clusters in the crowd. Some people who talk about a topic are more in the “center” of the graph; they may be key influential members of the population. NodeXL makes it a simple task to sort people in a population by their network location to find key people in core or bridge positions. NodeXL supports the exploration of social media with import features that pull data from personal email indexes on the desktop, Twitter, Flickr, YouTube, Facebook and WWW hyperlinks. The tool allows non-programmers to quickly generate useful network statistics and metrics and create visualizations of network graphs.
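
NodeXL computes its metrics inside Excel, and the Python sketch below is not NodeXL code. It only illustrates, for a hypothetical undirected and unweighted edge list, the difference between a "core" and a "bridge" position: degree counts a person's direct ties, while betweenness centrality (computed here with Brandes' algorithm) measures how often a person lies on shortest paths between others. The function names build_graph and betweenness are illustrative, not part of any NodeXL API.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (person, person) edge pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness(graph):
    """Betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search counting shortest paths from `source`.
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


if __name__ == "__main__":
    # Hypothetical network: two tight clusters joined through "d".
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
    g = build_graph(edges)
    bc = betweenness(g)
    for person in sorted(g, key=lambda v: (-bc[v], v)):
        print(f"{person}: degree={len(g[person])}, betweenness={bc[person]}")
```

In this toy network "d" has only two ties yet the highest betweenness (9.0), the pattern of a bridge, while "c" and "e" have the most ties (three each) and sit at the core of their clusters.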

The book Analyzing Social Media Networks with NodeXL: Insights from a Connected World is available from Morgan Kaufmann. The book provides an introduction to the history and core concepts of social network analysis along with a series of step-by-step instructions that illustrate the use of the key features of NodeXL. The second half of the book is dedicated to chapters written by a number of leading social media researchers, each focusing on a single social media service and the networks it contains. Chapters on Twitter, email, YouTube, flickr, Facebook, wikis, and the World Wide Web illustrate the network data structures that are common to all social media services.

Instructions for participants
A laptop is useful but not required during the workshop. NodeXL requires Office 2007 or 2010 running on Windows. NodeXL can be downloaded from http://nodexl.codeplex.com. Mac and Linux users can run NodeXL in a virtual machine; see:

http://www.connectedaction.net/2010/11/16/how-to-run-nodexl-on-a-connected-mac-or-other-platform-using-amazon-ec2/

Agenda for the workshop

  • 9:00 Introduction: social media, social networks, network analysis
  • 10:00 Hands-on introduction: creating simple networks with NodeXL
  • 10:30 Break
  • 10:45 Hands-on Session 2: Importing social media from Twitter, Facebook, YouTube, flickr, email, and more
  • 12:00 Lunch break
  • 1:00 Hands-on Session 3: Applying analysis “recipes” to your data
  • 2:00 Break
  • 2:15 Building analysis recipes in NodeXL: mapping data to display attributes
  • 3:00 Analyzing social media networks: insights from connected structures
  • 3:45 Break
  • 4:00 Final session: advanced topics (automation, scheduling, time series analysis)

Workshop organizer
Dr. Marc A. Smith – Director, Social Media Research Foundation

Links to websites
http://www.smrfoundation.org/
http://www.connectedaction.net
http://nodexl.codeplex.com
http://nodexlgraphgallery.org

Remarks
Prior to the day of the session, students are welcome to send the instructor data sets or query terms that can be used to create sample maps for use in the workshop.

Recommended readings or materials

Group-in-a-box Layout for Multi-faceted Analysis of Communities
Eduarda Mendes Rodrigues, Natasa Milic-Frayling, Marc Smith, Ben Shneiderman, Derek Hansen
IEEE Third International Conference on Social Computing, October 9-11, 2011.
Boston, MA
Abstract: Communities in social networks emerge from interactions among individuals and can be analyzed through a combination of clustering and graph layout algorithms. These approaches result in 2D or 3D visualizations of clustered graphs, with groups of vertices representing individuals that form a community. However, in many instances the vertices have attributes that divide individuals into distinct categories such as gender, profession, geographic location, and the like. It is often important to investigate what categories of individuals comprise each community and, vice versa, how the community structures associate individuals from the same category. Currently, there are no effective methods for analyzing both the community structure and the category-based partitions of social graphs. We propose Group-In-a-Box (GIB), a metalayout for clustered graphs that enables multi-faceted analysis of networks. It uses the treemap space-filling technique to display each graph cluster or category group within its own box, sized according to the number of vertices therein. GIB optimizes visualization of the network sub-graphs, providing a semantic substrate for category-based and cluster-based partitions of social graphs. We illustrate the application of GIB to multi-faceted analysis of real social networks and discuss desirable properties of GIB using synthetic datasets.
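
The core idea of giving each cluster its own box, with area proportional to its vertex count, can be sketched in Python with the simplest one-level "slice-and-dice" treemap. This is an assumption chosen for brevity: it does not reproduce the GIB layout described in the paper, and the function name slice_and_dice and the sample cluster sizes are hypothetical.

```python
def slice_and_dice(group_sizes, x, y, width, height, split_width=True):
    """Give each group a box whose area is proportional to its vertex count.

    A one-level slice-and-dice treemap: the rectangle is cut into strips,
    largest group first, along the width (or along the height).
    Returns {group: (x, y, w, h)}.
    """
    total = sum(group_sizes.values())
    boxes, offset = {}, 0.0
    for name, size in sorted(group_sizes.items(), key=lambda kv: -kv[1]):
        if split_width:
            w = width * size / total
            boxes[name] = (x + offset, y, w, height)
            offset += w
        else:
            h = height * size / total
            boxes[name] = (x, y + offset, width, h)
            offset += h
    return boxes


if __name__ == "__main__":
    # Hypothetical clusters and their vertex counts.
    clusters = {"cluster A": 50, "cluster B": 30, "cluster C": 20}
    for name, box in slice_and_dice(clusters, 0, 0, 100, 100).items():
        print(name, box)
```

Slice-and-dice tends to produce long, thin strips when there are many groups; practical treemap layouts usually switch to variants (such as squarified treemaps) that keep boxes closer to square, so each sub-graph has room to be drawn inside its box.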
—————————————————————————————————-
EventGraphs: charting collections of conference connections
Hansen, D., Smith, M., Shneiderman, B.

Forty-Fourth Annual Hawaii International Conference on System Sciences (HICSS). January 4-7, 2011. Kauai, Hawaii.

http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-13

Abstract: EventGraphs are social media network diagrams constructed from content selected by its association with time-bounded events, such as conferences. Many conferences now communicate a common “hashtag” or keyword to identify messages related to the event. EventGraphs help make sense of the collections of connections that form when people follow, reply or mention one another and a keyword. This paper defines EventGraphs, characterizes different types, and shows how the social media network analysis add-in NodeXL supports their creation and analysis. The paper also identifies the structural and conversational patterns to look for and highlight in EventGraphs and provides design ideas for their improvement.
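
As a rough illustration of how an EventGraph edge list might be assembled, the Python sketch below keeps only messages that carry the event hashtag and turns each @name into a directed edge. The (author, text) message format, the rule that a leading @name marks a reply, and the function name event_graph_edges are all assumptions made for the example; it is not how NodeXL's importers work internally, and "follow" edges are omitted because they require follower lists that message text does not contain.

```python
import re

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def event_graph_edges(messages, hashtag):
    """Build (author, target, kind) edges from messages carrying the hashtag.

    `messages` is a hypothetical list of (author, text) pairs. A leading
    @name is treated as a reply; any other @name counts as a mention.
    """
    wanted = hashtag.lstrip("#").lower()
    edges = []
    for author, text in messages:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if wanted not in tags:
            continue  # message is not associated with the event
        for match in MENTION.finditer(text):
            kind = "reply" if match.start() == 0 else "mention"
            edges.append((author, match.group(1), kind))
    return edges


if __name__ == "__main__":
    sample = [
        ("alice", "@bob great talk! #cccsm"),
        ("bob", "Slides from my #CCCSM talk, thanks @carol"),
        ("dave", "Lunch time"),
    ]
    for edge in event_graph_edges(sample, "#cccsm"):
        print(edge)
```

Hashtag matching is case-insensitive so that variants like #CCCSM and #cccsm fall into the same event; the simple @ pattern would also match the domain part of an email address, which a real importer would need to filter out.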

—————————————————————————————————-

Visualizing the Signatures of Social Roles in Online Discussion Groups
Welser, Howard T., Eric Gleave, Danyel Fisher, and Marc Smith.

Journal of Social Structure, Vol. 8, 2007.

http://www.cmu.edu/joss/content/articles/volume8/Welser/

Abstract: Social roles in online discussion forums can be described by patterned characteristics of communication between network members which we conceive of as ‘structural signatures.’ This paper uses visualization methods to reveal these structural signatures and regression analysis to confirm the relationship between these signatures and their associated roles in Usenet newsgroups. Our analysis focuses on distinguishing the signature of one role, that of “answer people,” from the signatures of others. Answer people are individuals whose dominant behavior is to respond to questions posed by other users. We found that answer people predominantly contribute one or a few messages to discussions initiated by others, are disproportionately tied to relative isolates, have few intense ties and have few triangles in their local networks. OLS regression shows that these signatures are strongly correlated with role behavior and, in combination, provide a strongly predictive model for identifying role behavior (R² = .72). To conclude, we consider strategies for further improving the identification of role behavior in online discussion settings and consider how the development of a taxonomy of author types could be extended to a taxonomy of newsgroups in particular and discussion systems in general.
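
Two of the local features named in this abstract, triangles among a person's contacts and ties to relative isolates, are easy to compute once a reply network is in hand. The Python sketch below does only that, for an undirected graph stored as a dict of sets; it does not measure tie intensity or fit the paper's regression model. The function name local_signature, the example network, and the degree cutoff used to define a "relative isolate" are all hypothetical.

```python
from itertools import combinations


def local_signature(graph, node, isolate_max_degree=1):
    """Summarize structural features of a node's local network.

    triangles: pairs of the node's neighbors that are also tied to each other.
    share_tied_to_isolates: fraction of neighbors whose own degree is at most
    `isolate_max_degree` (a hypothetical cutoff for "relative isolates").
    """
    neighbors = graph[node]
    triangles = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    isolates = sum(1 for n in neighbors if len(graph[n]) <= isolate_max_degree)
    return {
        "degree": len(neighbors),
        "triangles": triangles,
        "share_tied_to_isolates": isolates / len(neighbors) if neighbors else 0.0,
    }


# Hypothetical undirected reply network: "ann" answers three one-off askers,
# while "dan" talks inside a tightly knit group.
example_graph = {
    "ann": {"q1", "q2", "q3"},
    "q1": {"ann"}, "q2": {"ann"}, "q3": {"ann"},
    "dan": {"x", "y", "z"},
    "x": {"dan", "y", "z"}, "y": {"dan", "x", "z"}, "z": {"dan", "x", "y"},
}

if __name__ == "__main__":
    for person in ("ann", "dan"):
        print(person, local_signature(example_graph, person))
```

Both people have three ties, but "ann" has no triangles and all of her contacts are isolates, the answer-person pattern the abstract describes, while "dan" sits in three triangles with no isolated contacts.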

—————————————————————————————————-

Discussion catalysts in online political discussions: Content importers and conversation starters

Himelboim, Itai, Eric Gleave, and Marc Smith. 2009

Journal of Computer-Mediated Communication, Vol. 14 (JCMC)

http://jcmc.indiana.edu/ (article: http://ping.fm/7NF5T)

Abstract: This study addresses three research questions in the context of online political discussions: What is the distribution of successful topic-starting practices, what characterizes the content of large thread-starting messages, and what is the source of that content? A 6-month analysis of almost 40,000 authors in 20 political Usenet newsgroups identified authors who received a disproportionate number of replies. We labeled these authors “discussion catalysts.” Content analysis revealed that 95 percent of discussion catalysts’ messages contained content imported from elsewhere on the web, about 2/3 from traditional news organizations. We conclude that the flow of information from the content creators to the readers and writers continues to be mediated by a few individuals who act as filters and amplifiers.

—————————————————————————————————-

Analyzing (Social Media) Networks with NodeXL
Smith, M., Shneiderman, B., Milic-Frayling, N., Rodrigues, E.M., Barash, V., Dunne, C., Capone, T., Perer, A. & Gleave, E. (2009)

C&T ’09: Proceedings of the Fourth International Conference on Communities and Technologies. Springer.

http://www.connectedaction.net/wp-content/uploads/2009/08/2009-CT-NodeXL-and-Social-Queries-a-social-media-network-analysis-toolkit.pdf

Abstract: In this paper we present NodeXL, an extendible toolkit for network data analysis and visualization, implemented as an add-in to the Microsoft Excel 2007 spreadsheet software. We demonstrate NodeXL features through analysis of a data sample drawn from an enterprise intranet social network, discussion, and wiki. Through a sequence of steps we show how NodeXL leverages and extends the broadly used spreadsheet paradigm to support common operations in network analysis. This ranges from data import to computation of network statistics and refinement of network visualization through a selection of ready-to-use sorting, filtering, and clustering functions.

————————————————————————————————–

Whither the experts: Social affordances and the cultivation of experts in community Q&A systems

Howard Welser, Eric Gleave, Marc Smith, Vladimir Barash, Jessica Meckes.

SIN ’09: Proc. International Symposium on Social Intelligence and Networking. IEEE Computer Society Press.

http://www.connectedaction.net/wp-content/uploads/2009/08/2009-Social-Computing-Whither-the-Experts.pdf

Abstract: Community-based question and answer systems have been promoted as web 2.0 solutions to the problem of finding expert knowledge. This promise depends on systems’ capacity to attract and sustain experts capable of offering high-quality, factual answers. Content analysis of dedicated contributors’ messages in the Live QnA system found (1) few contributors who focused on providing technical answers and (2) a preponderance of attention paid to opinion and discussion, especially in non-technical threads. This paucity of experts raises an important general question: how do the social affordances of a site alter the ecology of roles found there? Using insights from recent research on online communities, we generate a series of expectations about how social affordances are likely to alter the role ecology of online systems.

—————————————————————————————————-

First steps to NetViz Nirvana: evaluating social network analysis with NodeXL

Bonsignore, E.M., Dunne, C., Rotman, D., Smith, M., Capone, T., Hansen, D.L. & Shneiderman, B. (2009)

SIN ’09: Proc. International Symposium on Social Intelligence and Networking. IEEE Computer Society Press.

http://www.cs.umd.edu/~cdunne/pubs/Bonsignore09Firststepsto.pdf

Abstract: Social Network Analysis (SNA) has evolved as a popular, standard method for modeling meaningful, often hidden structural relationships in communities. Existing SNA tools often involve extensive pre-processing or intensive programming skills that can challenge practitioners and students alike. NodeXL, an open-source template for Microsoft Excel, integrates a library of common network metrics and graph layout algorithms within the familiar spreadsheet format, offering a potentially low-barrier-to-entry framework for teaching and learning SNA. We present the preliminary findings of two user studies of 21 graduate students who engaged in SNA using NodeXL. The majority of students, while information professionals, had little technical background or experience with SNA techniques. Six of the participants had more technical backgrounds and were chosen specifically for their experience with graph drawing and information visualization. Our primary objectives were (1) to evaluate NodeXL as an SNA tool for a broad base of users and (2) to explore methods for teaching SNA. Our complementary dual case-study format demonstrates the usability of NodeXL for a diverse set of users and, significantly, the power of a tightly integrated metrics/visualization tool to spark insight and facilitate sensemaking for students of SNA.

—————————————————————————————————-

Do You Know the Way to SNA?: A Process Model for Analyzing and Visualizing Social Media Data

Hansen, D., Rotman, D., Bonsignore, E., Milic-Frayling, N., Rodrigues, E., Smith, M., Shneiderman, B. (July 2009)
University of Maryland Tech Report: HCIL-2009-17

Abstract: Voluminous online activity data from users of social media can shed light on individual behavior, social relationships, and community efficacy. However, tools and processes to analyze this data are just beginning to evolve. We studied 15 graduate students who were taught to use NodeXL to analyze social media data sets. Based on these observations, we present a process model of social network analysis (SNA) and visualization, then use it to identify stages where intervention from peers, experts, and computational aids is most useful. We offer implications for designers of SNA tools, educators, and community & organizational analysts.