<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ru">
	<id>http://digida.mgpu.ru/index.php?action=history&amp;feed=atom&amp;title=%D0%90%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7_%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82_%D0%B2%D1%81%D1%82%D1%80%D0%B5%D1%87%D0%B0%D0%B5%D0%BC%D0%BE%D1%81%D1%82%D0%B8_%D1%81%D0%BB%D0%BE%D0%B2</id>
	<title>Word frequency analysis - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://digida.mgpu.ru/index.php?action=history&amp;feed=atom&amp;title=%D0%90%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7_%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82_%D0%B2%D1%81%D1%82%D1%80%D0%B5%D1%87%D0%B0%D0%B5%D0%BC%D0%BE%D1%81%D1%82%D0%B8_%D1%81%D0%BB%D0%BE%D0%B2"/>
	<link rel="alternate" type="text/html" href="http://digida.mgpu.ru/index.php?title=%D0%90%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7_%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82_%D0%B2%D1%81%D1%82%D1%80%D0%B5%D1%87%D0%B0%D0%B5%D0%BC%D0%BE%D1%81%D1%82%D0%B8_%D1%81%D0%BB%D0%BE%D0%B2&amp;action=history"/>
	<updated>2026-06-08T03:29:59Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.44.0</generator>
	<entry>
		<id>http://digida.mgpu.ru/index.php?title=%D0%90%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7_%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82_%D0%B2%D1%81%D1%82%D1%80%D0%B5%D1%87%D0%B0%D0%B5%D0%BC%D0%BE%D1%81%D1%82%D0%B8_%D1%81%D0%BB%D0%BE%D0%B2&amp;diff=44395&amp;oldid=prev</id>
		<title>Patarakin: New page: « Tokenization and word frequencies in literature with tidytext Key concepts: tokenization (unnest_tokens), stop words (stop_words, anti_join), downloading corpora (gutenberg_download), frequency proportions (proportion = n / sum(n)), reshaping data (pivot_wider/pivot_longer).   &lt;syntaxhighlight lang=&quot;R&quot; line&gt; text &lt;- c(&quot;Because I could not stop fo...»</title>
		<link rel="alternate" type="text/html" href="http://digida.mgpu.ru/index.php?title=%D0%90%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7_%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82_%D0%B2%D1%81%D1%82%D1%80%D0%B5%D1%87%D0%B0%D0%B5%D0%BC%D0%BE%D1%81%D1%82%D0%B8_%D1%81%D0%BB%D0%BE%D0%B2&amp;diff=44395&amp;oldid=prev"/>
		<updated>2026-02-24T05:50:54Z</updated>

		<summary type="html">&lt;p&gt;New page: « Tokenization and word frequencies in literature with tidytext Key concepts: tokenization (unnest_tokens), stop words (stop_words, anti_join), downloading corpora (gutenberg_download), frequency proportions (proportion = n / sum(n)), reshaping data (pivot_wider/pivot_longer).   &amp;lt;syntaxhighlight lang=&amp;quot;R&amp;quot; line&amp;gt; text &amp;lt;- c(&amp;quot;Because I could not stop fo...»&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
Tokenization and word frequencies in literature with tidytext&lt;br /&gt;
Key concepts: tokenization (unnest_tokens), stop words (stop_words, anti_join), downloading corpora (gutenberg_download), frequency proportions (proportion = n / sum(n)), and reshaping data (pivot_wider/pivot_longer).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;R&amp;quot; line&amp;gt;&lt;br /&gt;
text &amp;lt;- c(&amp;quot;Because I could not stop for Death -&amp;quot;,&lt;br /&gt;
          &amp;quot;He kindly stopped for me -&amp;quot;,&lt;br /&gt;
          &amp;quot;The Carriage held but just Ourselves -&amp;quot;,&lt;br /&gt;
          &amp;quot;and Immortality&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
library(dplyr)&lt;br /&gt;
library(tidyr)     # provides pivot_wider() and pivot_longer(), used further down&lt;br /&gt;
library(tidytext)&lt;br /&gt;
&lt;br /&gt;
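# Put the lines of the poem into a data frame, one row per line&lt;br /&gt;
text_df &amp;lt;- tibble(line = 1:4, text = text)&lt;br /&gt;
&lt;br /&gt;
# Tokenize: one word per row (lowercased, punctuation stripped by default)&lt;br /&gt;
text_df %&amp;gt;%&lt;br /&gt;
  unnest_tokens(word, text)&lt;br /&gt;
&lt;br /&gt;
# Load the stop word lexicon shipped with tidytext&lt;br /&gt;
data(stop_words)&lt;br /&gt;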
&lt;br /&gt;
library(gutenbergr)&lt;br /&gt;
&lt;br /&gt;
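# H.G. Wells: The Time Machine (35), The War of the Worlds (36),&lt;br /&gt;
# The Invisible Man (5230), The Island of Doctor Moreau (159)&lt;br /&gt;
hgwells &amp;lt;- gutenberg_download(c(35, 36, 5230, 159))&lt;br /&gt;
&lt;br /&gt;
# Tokenize into one word per row and drop common stop words&lt;br /&gt;
tidy_hgwells &amp;lt;- hgwells %&amp;gt;%&lt;br /&gt;
  unnest_tokens(word, text) %&amp;gt;%&lt;br /&gt;
  anti_join(stop_words, by = &amp;quot;word&amp;quot;)&lt;br /&gt;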
&lt;br /&gt;
tidy_hgwells %&amp;gt;%&lt;br /&gt;
  count(word, sort = TRUE)&lt;br /&gt;
&lt;br /&gt;
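# Brontë sisters: Jane Eyre (1260), Wuthering Heights (768),&lt;br /&gt;
# The Tenant of Wildfell Hall (969), Villette (9182), Agnes Grey (767)&lt;br /&gt;
bronte &amp;lt;- gutenberg_download(c(1260, 768, 969, 9182, 767))&lt;br /&gt;
tidy_bronte &amp;lt;- bronte %&amp;gt;%&lt;br /&gt;
  unnest_tokens(word, text) %&amp;gt;%&lt;br /&gt;
  anti_join(stop_words, by = &amp;quot;word&amp;quot;)&lt;br /&gt;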
&lt;br /&gt;
tidy_bronte %&amp;gt;%&lt;br /&gt;
  count(word, sort = TRUE)&lt;br /&gt;
&lt;br /&gt;
# Label each corpus, count words per author, and turn counts into proportions&lt;br /&gt;
frequency &amp;lt;- bind_rows(mutate(tidy_bronte, author = &amp;quot;Brontë Sisters&amp;quot;),&lt;br /&gt;
                       mutate(tidy_hgwells, author = &amp;quot;H.G. Wells&amp;quot;)) %&amp;gt;%&lt;br /&gt;
  count(author, word) %&amp;gt;%&lt;br /&gt;
  group_by(author) %&amp;gt;%&lt;br /&gt;
  mutate(proportion = n / sum(n)) %&amp;gt;%&lt;br /&gt;
  ungroup() %&amp;gt;%&lt;br /&gt;
  select(-n) %&amp;gt;%&lt;br /&gt;
  # Widen, then lengthen again: a word missing for one author gets an explicit NA row&lt;br /&gt;
  pivot_wider(names_from = author, values_from = proportion) %&amp;gt;%&lt;br /&gt;
  pivot_longer(`Brontë Sisters`:`H.G. Wells`,&lt;br /&gt;
               names_to = &amp;quot;author&amp;quot;, values_to = &amp;quot;proportion&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
frequency&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
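&lt;br /&gt;
The proportion step (proportion = n / sum(n)) can be checked by hand on a small table. The sketch below is only an illustration under stated assumptions: the toy_counts table, the author labels &amp;quot;A&amp;quot; and &amp;quot;B&amp;quot;, and the counts are hypothetical and are not taken from the Gutenberg texts.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;R&amp;quot; line&amp;gt;&lt;br /&gt;
library(dplyr)&lt;br /&gt;
&lt;br /&gt;
# Hypothetical word counts for two made-up authors&lt;br /&gt;
toy_counts &amp;lt;- tibble(&lt;br /&gt;
  author = c(&amp;quot;A&amp;quot;, &amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;, &amp;quot;B&amp;quot;),&lt;br /&gt;
  word   = c(&amp;quot;time&amp;quot;, &amp;quot;door&amp;quot;, &amp;quot;time&amp;quot;, &amp;quot;heart&amp;quot;),&lt;br /&gt;
  n      = c(3, 1, 2, 2)&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
# Divide each count by the total count of the same author&lt;br /&gt;
toy_props &amp;lt;- toy_counts %&amp;gt;%&lt;br /&gt;
  group_by(author) %&amp;gt;%&lt;br /&gt;
  mutate(proportion = n / sum(n)) %&amp;gt;%&lt;br /&gt;
  ungroup()&lt;br /&gt;
&lt;br /&gt;
# Author A: 3/4 = 0.75 and 1/4 = 0.25; author B: 2/4 = 0.5 and 2/4 = 0.5&lt;br /&gt;
toy_props&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because the division happens within each author, the proportions of one author sum to 1, which makes authors with corpora of different sizes comparable.&lt;br /&gt;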
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
[[Категория:Lesson]]&lt;/div&gt;</summary>
		<author><name>Patarakin</name></author>
	</entry>
</feed>