University Library of OU - Full Record Display

	BK
	Silge, Julia, 1978-
	Text mining with R : a tidy approach / Julia Silge and David Robinson
	First edition
	Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo : O’Reilly, 2017
	xii, 178 pages : illustrations ; 24 cm


	ISBN 978-1-4919-8165-8 (paperback)
	Robinson, David (author)
	Includes bibliographical references (pages 173-174) and index
	natural language processing
	R (software)
	statistical software
	computational linguistics
	natural language queries
	* collective monographs
	004.8 - Artificial intelligence
	004.4
	004.8
	81'32
	001652733
	Preface vii // 1. The Tidy Text Format 1 // Contrasting Tidy Text with Other Data Structures 2 // The unnest_tokens Function 2 // Tidying the Works of Jane Austen 4 // The gutenbergr Package 7 // Word Frequencies 8 // Summary 12 // 2. Sentiment Analysis with Tidy Data 13 // The sentiments Dataset 14 // Sentiment Analysis with Inner Join 16 // Comparing the Three Sentiment Dictionaries 19 // Most Common Positive and Negative Words 22 // Wordclouds 25 // Looking at Units Beyond Just Words 27 // Summary 29 // 3. Analyzing Word and Document Frequency: tf-idf 31 // Term Frequency in Jane Austen's Novels 32 // Zipf's Law 34 // The bind_tf_idf Function 37 // A Corpus of Physics Texts 40 // Summary 44 // 4. Relationships Between Words: N-grams and Correlations 45 // Tokenizing by N-gram 45 // Counting and Filtering N-grams 46 // Analyzing Bigrams 48 // Using Bigrams to Provide Context in Sentiment Analysis 51 // Visualizing a Network of Bigrams with ggraph 54 // Visualizing Bigrams in Other Texts 59 // Counting and Correlating Pairs of Words with the widyr Package 61 // Counting and Correlating Among Sections 62 // Examining Pairwise Correlation 63 // Summary 67 // 5. Converting to and from Nontidy Formats 69 // Tidying a Document-Term Matrix 70 // Tidying DocumentTermMatrix Objects 71 // Tidying dfm Objects 74 // Casting Tidy Text Data into a Matrix 77 // Tidying Corpus Objects with Metadata 79 // Example: Mining Financial Articles 81 // Summary 87 //
	6. Topic Modeling 89 // Latent Dirichlet Allocation 90 // Word-Topic Probabilities 91 // Document-Topic Probabilities 95 // Example: The Great Library Heist 96 // LDA on Chapters 97 // Per-Document Classification 100 // By-Word Assignments: augment 103 // Alternative LDA Implementations 107 // Summary 108 // 7. Case Study: Comparing Twitter Archives 109 // Getting the Data and Distribution of Tweets 109 // Word Frequencies 110 // Comparing Word Usage 114 // Changes in Word Use 116 // Favorites and Retweets 120 // Summary 124 // 8. Case Study: Mining NASA Metadata 125 // How Data Is Organized at NASA 126 // Wrangling and Tidying the Data 126 // Some Initial Simple Exploration 129 // Word Co-occurrences and Correlations 130 // Networks of Description and Title Words 131 // Networks of Keywords 134 // Calculating tf-idf for the Description Fields 137 // What Is tf-idf for the Description Field Words? 137 // Connecting Description Fields to Keywords 138 // Topic Modeling 140 // Casting to a Document-Term Matrix 140 // Ready for Topic Modeling 141 // Interpreting the Topic Model 142 // Connecting Topic Modeling with Keywords 149 // Summary 152 // 9. Case Study: Analyzing Usenet Text 153 // Preprocessing 153 // Preprocessing Text 155 // Words in Newsgroups 155 // Finding tf-idf Within Newsgroups 157 // Topic Modeling 160 // Sentiment Analysis 163 // Sentiment Analysis by Word 164 // Sentiment Analysis by Message 167 // N-gram Analysis 169 // Summary 171 // Bibliography 173 // Index 175
