Reddit Analysis
Reddit is a social news website and forum where content is socially curated and promoted by site members through voting. The site name is a play on the words "I read it". It combines web content, social news, a forum, and a social network in a single platform. The site is composed of many subcommunities, known as subreddits, each dedicated to a specific topic such as technology, politics or music. Reddit's homepage, often called the front page, is composed of the most popular posts from each default subreddit. The default list is predetermined and includes subreddits such as "pics," "funny," "videos," "news" and "gaming." Site members, also known as redditors, submit content, which other members then vote on. The goal is to send well-regarded content to the top of the front page. Members vote by clicking the upvote and downvote arrows to the left of a post. The more upvotes a post gets, the more popular it becomes and the higher it appears on its subreddit or on the front page.
For this project, the analysis is performed on one subreddit: Public Freakout. Public freakout videos are recorded footage of private individuals exhibiting extremely emotional or bizarre behaviour in public, typically featuring loud arguments, mental breakdowns, rants, or intoxicated rambling. The Public Freakout subreddit is dedicated to people freaking out, melting down, losing their cool, or being weird in public. Each post carries a tagline, a short descriptive text, and these taglines can be extracted to study the subreddit. People like consuming daily content, whether it is funny and silly or informative and serious, and users of this subreddit may visit it to stay updated or to scroll through funny videos.
The goal of the portfolio is to answer the following questions:
1. Business Question: Should authors write long comments to increase their score?
Technical Proposal: Use EDA to plot comment length (number of characters) against the average score of comments of that length.
2. Business Question: Who are the top 10 most frequent authors?
Technical Proposal: Use EDA to display the top 10 most frequent authors.
3. Business Question: Are the most frequent authors controversial?
Technical Proposal: Use EDA to display the mean controversiality of the top 10 authors.
4. Business Question: Is there a relationship between score and total awards for the most frequent author?
Technical Proposal: Use EDA to plot the score of the posts against the number of awards for the most active author.
5. Business Question: Find the hour of the day with the most comment activity.
Technical Proposal: Use EDA to visualize the number of comments for each hour of the day.
6. Business Question: Find the number of comments for every month of the timeframe.
Technical Proposal: Use EDA to visualize the number of comments for each month of the timeframe.
7. Business Question: Find the number of comments for each sentiment.
Technical Proposal: Use NLP to extract the sentiment of each comment and visualize the counts with a bar plot.
8. Business Question: Find the number of comments for every month of the timeframe for each sentiment.
Technical Proposal: Use NLP to extract the sentiment of each comment and visualize the monthly counts per sentiment with a line plot.
9. Business Question: Determine the score a comment will receive.
Technical Proposal: Use machine learning to build a model that predicts the score a comment will receive.
10. Business Question: Determine the controversiality of a comment.
Technical Proposal: Use machine learning to build a model that predicts the controversiality of a comment.
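As a rough illustration of the EDA proposals for questions 1, 2 and 5, the sketch below computes the underlying aggregates with the Python standard library only. The field names (author, body, score, created_utc), the sample records, and the 50-character bucket size are assumptions made for illustration; they are not taken from the project's dataset, and the actual analysis may use a different toolchain.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from statistics import mean

# Hypothetical sample records. The field names are assumptions that mirror
# common Reddit comment dumps; the real dataset may differ.
comments = [
    {"author": "alice", "body": "lol", "score": 5, "created_utc": 1609459200},
    {"author": "bob", "body": "x" * 120, "score": 12, "created_utc": 1609462800},
    {"author": "alice", "body": "y" * 60, "score": 2, "created_utc": 1609459260},
    {"author": "carol", "body": "nice", "score": 1, "created_utc": 1609466400},
]


def mean_score_by_length(records, bucket_size=50):
    """Question 1: average score per comment-length bucket (in characters)."""
    buckets = defaultdict(list)
    for record in records:
        # Round the length down to the start of its bucket, e.g. 120 -> 100.
        start = (len(record["body"]) // bucket_size) * bucket_size
        buckets[start].append(record["score"])
    return {start: mean(scores) for start, scores in sorted(buckets.items())}


def top_authors(records, n=10):
    """Question 2: the n authors with the most comments, as (author, count)."""
    return Counter(record["author"] for record in records).most_common(n)


def comments_per_hour(records):
    """Question 5: number of comments for each hour of the day (UTC)."""
    hours = Counter(
        datetime.fromtimestamp(record["created_utc"], tz=timezone.utc).hour
        for record in records
    )
    return dict(sorted(hours.items()))


length_scores = mean_score_by_length(comments)
frequent_authors = top_authors(comments)
hourly_counts = comments_per_hour(comments)
```

Each function returns a plain dictionary or list that can be passed to any plotting library to produce the bar or line plots described in the proposals. Bucketing the lengths is one possible design choice; it keeps the averages from being dominated by lengths that occur only once.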
Reference: TechTarget