Part I Microanalysis //
1 R Basics 3 // 1.1 Introduction 3 // 1.2 Download and Install R 4 // 1.3 Download and Install RStudio 5 // 1.4 Download the Supporting Materials 5 // 1.5 RStudio 6 // 1.6 Let’s Get Started 7 // 1.7 Saving Commands and R Scripts 9 // 1.8 Assignment Operators 11 // 1.9 Practice 11 // References 13 //
2 First Foray into Text Analysis with R 15 // 2.1 Loading the First Text File 15 // 2.2 A Word About Warnings, Errors, Typos, and Crashes 17 // 2.3 Separate Content from Metadata 19 // 2.4 Reprocessing the Content 22 // 2.5 Beginning Some Analysis 27 // 2.6 Practice 29 //
3 Accessing and Comparing Word Frequency Data 31 // 3.1 Introduction 31 // 3.2 Start Up Code 31 // 3.3 Accessing Word Data 32 // 3.4 Recycling 34 // 3.5 Practice 35 //
4 Token Distribution and Regular Expressions 37 // 4.1 Introduction 37 // 4.2 Start Up Code 37 // 4.3 A Word About Coding Style 38 // 4.4 Dispersion Plots 38 // 4.5 Searching with grep 41 // 4.6 Practice 46 // Reference 47 //
5 Token Distribution Analysis 49 // 5.1 Cleaning the Workspace 49 // 5.2 Start Up Code 50 // 5.3 Identifying Chapter Breaks with grep 51 // 5.4 The for Loop and if Conditional 53 // 5.5 The for Loop in Eight Parts 56 // 5.6 Accessing and Processing List Items 59 // 5.7 Practice 67 //
6 Correlation 69 // 6.1 Introduction 69 // 6.2 Start Up Code 69 // 6.3 Correlation Analysis 70 // 6.4 A Word About Data Frames 73 // 6.5 Testing Correlation with Randomization 76 // 6.6 Practice 79 //
7 Measures of Lexical Variety 81 // 7.1 Lexical Variety and the Type-Token Ratio 81 // 7.2 Start Up Code 82 // 7.3 Mean Word Frequency 82 // 7.4 Extracting Word Usage Means 84 // 7.5 Ranking the Values 87 // 7.6 Calculating the TTR Inside lapply 88 // 7.7 A Further Use of Correlation 90 // 7.8 Practice 90 // Reference 91 //
8 ??? Richness 93 // 8.1 Introduction 93 // 8.2 Start Up Code 93 // 8.3 sapply 94 //
8.4 An Inline Conditional Function 94 // 8.5 Practice 97 //
9 Do It KWIC 99 // 9.1 Introduction 99 // 9.2 Custom Functions 100 // 9.3 A Tokenization Function 103 // 9.4 Finding Keywords and Their Contextual Neighbors 105 // 9.5 Practice 107 // Reference 108 //
10 Do It KWIC(er) (and Better) 109 // 10.1 Getting Organized 109 // 10.2 Separating Functions for Reuse 110 // 10.3 User Interaction 111 // 10.4 readline 111 // 10.5 Building a Better KWIC Function 112 // 10.6 Fixing Some Problems 115 // 10.7 Practice 117 //
Part II Metadata //
11 Introduction to dplyr 121 // 11.1 Start Up Code 121 // 11.2 Using stack to Create a Data Frame 122 // 11.3 Installing and Loading dplyr 124 // 11.4 Using mutate, filter, arrange, and select 125 // 11.5 Practice 130 //
12 Parsing TEI XML 133 // 12.1 Introduction 133 // 12.2 The Text Encoding Initiative (TEI) 134 // 12.3 Parsing XML with R Using the xml2 Package 135 // 12.4 Accessing the Textual Content 138 // 12.5 Calculating the Word Frequencies 140 // 12.6 Practice 143 // Reference 144 //
13 Parsing and Analyzing Hamlet 145 // 13.1 Background 145 // 13.2 Collecting the Speakers 146 // 13.3 Collecting the Speeches 148 // 13.4 A Better Pairing 151 // 13.5 Practice 157 // Reference 157 //
14 Sentiment Analysis 159 // 14.1 A Brief Overview 159 // 14.2 Loading syuzhet 160 // 14.3 Loading a Text 160 // 14.4 Getting Sentiment Values 161 // 14.5 Accessing Sentiment 162 // 14.6 Plotting 164 // 14.7 Smoothing 166 // 14.8 Computing Plot Similarity 169 // 14.9 Practice 173 // References 174 //
Part III Macroanalysis //
15 Clustering 177 // 15.1 Introduction 177 // 15.2 Corpus Ingestion 177 // 15.3 Custom Functions 181 // 15.4 Unsupervised Clustering and the Euclidean Metric 184 // 15.5 Converting an R List into a Data Matrix 187 // 15.6 Reshaping from Long to Wide Format 188 // 15.7 Preparing Data for Clustering 189 //
15.8 Clustering the Data 192 // 15.9 Practice 193 // Reference 194 //
16 Classification 195 // 16.1 Introduction 195 // 16.2 A Small Authorship Experiment 196 // 16.3 Text Segmentation 196 // 16.4 Reshaping from Long to Wide Format 202 // 16.5 Mapping the Data to the Metadata 203 // 16.6 Reducing the Feature Set 205 // 16.7 Performing the Classification with SVM 206 // 16.8 Practice 209 // Reference 210 //
17 Topic Modeling 211 // 17.1 Introduction 211 // 17.2 R and Topic Modeling 212 // 17.3 Text Segmentation and Preparation 212 // 17.4 The R Mallet Package 219 // 17.5 Simple Topic Modeling with a Standard Stop List 220 // 17.6 Unpacking the Model 225 // 17.7 Topic Visualization 229 // 17.8 Topic Coherence and Topic Probability 230 // 17.9 Practice 235 // References 235 //
18 Part of Speech Tagging and Named Entity Recognition 237 // 18.1 Pre-processing Text with a Part-of-Speech Tagger 237 // 18.2 Saving and Loading .Rdata Files 242 // 18.3 Topic Modeling the Noun Data 242 // 18.4 Named Entity Recognition 243 // 18.5 Practice 245 //
Appendix A: Variable Scope Example 247 //
Appendix B: The LDA Buffet 249 //
Appendix C: Practice Exercise Solutions 253 // C.1 Solutions for Chap. 1 253 // C.2 Solutions for Chap. 2 253 // C.3 Solutions for Chap. 3 254 // C.4 Solutions for Chap. 4 256 // C.5 Solutions for Chap. 5 257 // C.6 Solutions for Chap. 6 257 // C.7 Solutions for Chap. 7 258 // C.8 Solutions for Chap. 8 259 // C.9 Solutions for Chap. 9 261 // C.10 Solutions for Chap. 10 262 // C.11 Solutions for Chap. 11 267 // C.12 Solutions for Chap. 12 268 // C.13 Solutions for Chap. 13 269 // C.14 Solutions for Chap. 14 270 // C.15 Solutions for Chap. 15 271 // C.16 Solutions for Chap. 16 272 // C.17 Solutions for Chap. 17 273 // C.18 Solutions for Chap. 18 274 //
Index 275