Reddit facebook app wrapper

4/7/2023

Reddit is a popular social news aggregator and discussion site with hundreds of thousands of subreddits devoted to every topic one can imagine. One special kind of subreddit is the "Ask" forum, where questions are posed and answered among subscribers. A particularly interesting subreddit is AskScience, where questions from various scientific disciplines are discussed in a very informed manner, with plenty of references and different angles on a topic. We wanted to take this data as an opportunity to develop a workflow for deriving datasets from Reddit and to analyse them using the Lateral Intelligence Platform.

Reddit provides an API for accessing its data. A particularly nice library for using the API from within Python is PRAW, the Python Reddit API Wrapper. Once you've set up a user account on Reddit and registered for a developer ID and secret, connecting to the API via PRAW is very easy:

```python
import praw

reddit = praw.Reddit(client_id=YOUR_CLIENT_ID,
                     client_secret=YOUR_CLIENT_SECRET,
                     password=YOUR_ACCOUNT_PASSWORD,
                     username=YOUR_ACCOUNT_USERNAME,
                     user_agent=NAME_OF_YOUR_APP)
```

This returns an object for interacting with the different types of information on Reddit, such as subreddits, submissions, comments, etc. To work with a specific subreddit, in our case AskScience, a Subreddit object is generated:

```python
subreddit = reddit.subreddit('AskScience')
```

This can be used to extract, for example, submissions or comments from AskScience, with the option to specify various search criteria:

```python
# Get titles of submissions between two UNIX timestamps
for submission in subreddit.submissions(start=1478592000, end=1478678400):
    print(submission.title)

# Get authors of the 25 most recent comments
for comment in subreddit.comments(limit=25):
    print(comment.author)
```

See the PRAW documentation for more details on how to use the Subreddit object.

One restriction of the API is that it only returns a limited number of results, usually up to 1000. Therefore, to get all data for a specific period of time, it is easiest to iterate over short periods that will most likely have fewer results than this limit. Following this discussion on redditdev, we used the search function of the Subreddit object and iterated over individual days:

```python
query = 'timestamp:%d..%d' % (start_day.timestamp(), end_day.timestamp())
submissions = list(subreddit.search(query, sort='new', limit=1000, syntax='cloudsearch'))
```

This gives a list of all submissions between start_day and end_day, from which information can be extracted like this:

```python
for submission in submissions:
    submission_text = submission.selftext
    submission_title = submission.title
    submission_date = submission.created_utc
    # etc.
```

In order to derive a dataset for both content-based similarity and hybrid recommendation, we decided to treat a post, including all of its comments, as one document, and the original author as well as all commenters on that post as users that interacted with this document. By extracting the AskScience posts for the years 2013 to 2016, we managed to assemble a dataset of 102,962 documents with 226,084 users and 717,425 interactions. This data can then be sent to the LIP API for analysing document similarities.
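The day-by-day iteration strategy described above can be sketched as a small helper. This is only an illustration, not the authors' actual code: the function name `submissions_for_period` is hypothetical, and it assumes the same cloudsearch `timestamp` query syntax used in the post.

```python
from datetime import datetime, timedelta

def submissions_for_period(subreddit, start, end):
    """Collect all submissions between two datetimes by querying one day
    at a time, so that each query stays under the API's ~1000-result cap.
    Sketch only; function name and structure are hypothetical."""
    day = timedelta(days=1)
    results = []
    current = start
    while current < end:
        chunk_end = min(current + day, end)
        query = 'timestamp:%d..%d' % (current.timestamp(),
                                      chunk_end.timestamp())
        results.extend(subreddit.search(query, sort='new', limit=1000,
                                        syntax='cloudsearch'))
        current += day
    return results
```

Each iteration covers at most one day, so even a busy subreddit is unlikely to exceed the per-query result limit; the partial results are simply concatenated.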
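The document/user derivation described above (one post plus all of its comments as a document; the author plus all commenters as the users who interacted with it) can be sketched roughly as follows. This is an illustrative sketch, not the authors' pipeline: the function name `build_dataset` is hypothetical, and in real PRAW usage you would first call `submission.comments.replace_more(limit=0)` to resolve "load more comments" placeholders.

```python
def build_dataset(submissions):
    """Turn each submission into one document (title, selftext and all
    comment bodies joined) and record (user, document id) interactions
    for the author and every commenter. Hypothetical sketch."""
    documents = {}     # submission id -> document text
    interactions = []  # (user, submission id) pairs
    for submission in submissions:
        comments = submission.comments.list()
        documents[submission.id] = '\n'.join(
            [submission.title, submission.selftext]
            + [c.body for c in comments])
        # Deduplicate users: an author who also comments counts once.
        users = {str(submission.author)} | {str(c.author) for c in comments}
        for user in users:
            interactions.append((user, submission.id))
    return documents, interactions
```

The deduplication via a set reflects one possible design choice; counting repeated comments as separate interactions would be an equally valid weighting scheme.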