Comment Parsing¶
A common task for many bots and scripts is to parse a submission's comments. In this tutorial we will go over how to do that, and discuss comments in general. To illustrate the problems, we'll write a small script that replies to any comment that contains the text "Hello". Our reply will contain the text " world!".
Submission Comments¶
As usual, we start by importing PRAW and initializing our connection to reddit.com. We also get a Submission object, where our script will do its work.
>>> import praw
>>> r = praw.Reddit('Comment Scraper 1.0 by u/_Daimon_ see '
... 'https://praw.readthedocs.io/en/latest/'
... 'pages/comment_parsing.html')
>>> submission = r.get_submission(submission_id='11v36o')
After getting the Submission object we retrieve the comments and look through them to find those that match our criteria. Comments are stored in the comments attribute as a comment forest, with each tree rooted at a top-level comment, i.e., the comments are organized just like when you visit the submission via the website. To descend a layer, use a comment's replies attribute to get the list of replies to that comment. Note that this list may contain MoreComments objects, not just Comment objects.
>>> forest_comments = submission.comments
As an alternative, we can flatten the comment forest into an unordered list with the function praw.helpers.flatten_tree(). This is the easiest way to iterate through the comments and is preferable when you don't care about a comment's place in the comment forest. We don't, so this is what we are going to use.
>>> flat_comments = praw.helpers.flatten_tree(submission.comments)
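For intuition, here is a toy sketch of what flattening a comment forest amounts to. The FakeComment class and the flatten function are illustrations only, not PRAW's actual implementation; real Comment objects carry many more attributes.

```python
class FakeComment:
    """Toy stand-in for a comment that has a body and a list of replies."""
    def __init__(self, body, replies=None):
        self.body = body
        self.replies = replies or []

def flatten(tree):
    """Collapse a comment forest into a single flat list.

    A simplified sketch of what praw.helpers.flatten_tree does; the
    resulting order is unspecified, matching the docs' "unordered list".
    """
    flat = []
    stack = list(tree)
    while stack:
        comment = stack.pop()
        flat.append(comment)
        stack.extend(comment.replies)   # descend into the replies
    return flat

forest = [FakeComment("a", [FakeComment("b")]), FakeComment("c")]
flat = flatten(forest)
```

Every comment in the forest ends up in the flat list exactly once, regardless of nesting depth.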
To find out whether any of those comments contains the text we are looking for, we simply iterate through the comments.
>>> for comment in flat_comments:
...     if "Hello" in comment.body:
...         reply_world(comment)
Our program is going to post comments to a submission. If it has bugs, it might flood the submission with replies or post gibberish. This is bad. So we test the bot in r/test before we let it loose on a "real" subreddit. As it happens, our bot as described so far contains a bug: it doesn't check whether we've already replied to a comment before replying. We fix this by storing the id of every comment we've replied to and testing for membership in that set before replying, just like in Writing a reddit Bot.
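An in-memory set is lost when the script exits, so a long-running bot usually persists the handled ids. The following is a minimal sketch of one way to do that; the filename and one-id-per-line format are arbitrary choices for this example, not part of PRAW.

```python
import os

DONE_FILE = "already_done.txt"  # arbitrary filename chosen for this sketch

def load_done(path=DONE_FILE):
    """Load previously handled comment ids, one per line."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(line.strip() for line in f if line.strip())

def mark_done(comment_id, path=DONE_FILE):
    """Record a comment id so we never reply to it twice, even after a restart."""
    with open(path, "a") as f:
        f.write(comment_id + "\n")

already_done = load_done()
if "c3x1abc" not in already_done:   # hypothetical comment id
    # here the real bot would call comment.reply(' world!')
    mark_done("c3x1abc")
```

On the next run, load_done() returns the saved ids and the membership test skips comments that were already answered.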
The number of comments¶
When we load a submission, the comments for the submission are also loaded, up to a maximum, just like on the website. At reddit.com, this maximum is 200 comments. If we want more than the maximum number of comments, then we need to replace the MoreComments objects with the Comment objects they represent.
We use the replace_more_comments() method to do this. Let's use this method to replace all MoreComments objects with the Comment objects they represent, so we get every comment in the thread.
>>> submission.replace_more_comments(limit=None, threshold=0)
>>> all_comments = submission.comments
The number of MoreComments objects PRAW can replace with a single API call is limited. Replacing all MoreComments objects in a thread with many comments will therefore require many API calls, and so take a while due to the delay between API calls specified in the api guidelines.
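PRAW enforces this delay for you, but the mechanism is worth understanding. The class below is a sketch of a simple throttle that spaces successive calls at least a fixed interval apart; the interval value here is shortened so the demo runs quickly, and nothing about it is PRAW's actual implementation.

```python
import time

class Throttle:
    """Ensure successive calls are spaced at least `interval` seconds apart.

    Illustration only: PRAW applies this kind of delay between API calls
    internally, per the api guidelines.
    """
    def __init__(self, interval=2.0):
        self.interval = interval
        self._last = None

    def wait(self):
        """Sleep just long enough to honor the interval, then record the time."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(interval=0.1)  # short interval so the demo finishes fast
start = time.monotonic()
for _ in range(3):
    throttle.wait()   # a real client would make one API call per iteration
elapsed = time.monotonic() - start
```

The first call goes through immediately; each later call waits out the remainder of the interval, so replacing many MoreComments objects scales linearly with the delay.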
Getting all recent comments to a subreddit or everywhere¶
We can get comments made to all subreddits by using get_comments() and setting the subreddit argument to the value "all".
>>> import praw
>>> r = praw.Reddit('Comment parser example by u/_Daimon_')
>>> all_comments = r.get_comments('all')
The results are equivalent to r/all/comments.
We can also choose to only get the comments from a specific subreddit. This is much simpler than getting all comments made to reddit and filtering them, and it also reduces the load on reddit's servers.
>>> subreddit = r.get_subreddit('python')
>>> subreddit_comments = subreddit.get_comments()
The results are equivalent to r/python/comments.
You can use multi-reddits to get the comments from multiple subreddits.
>>> multi_reddits = r.get_subreddit('python+learnpython')
>>> multi_reddits_comments = multi_reddits.get_comments()
Which is equivalent to r/python+learnpython/comments.
The full program¶
import praw

r = praw.Reddit('Comment Scraper 1.0 by u/_Daimon_ see '
                'https://praw.readthedocs.io/en/latest/'
                'pages/comment_parsing.html')
r.login('bot_username', 'bot_password')
submission = r.get_submission(submission_id='11v36o')
flat_comments = praw.helpers.flatten_tree(submission.comments)
already_done = set()
for comment in flat_comments:
    if "Hello" in comment.body and comment.id not in already_done:
        comment.reply(' world!')
        already_done.add(comment.id)
[deleted] comments¶
When a comment is deleted, in most cases it is no longer viewable through a browser or the API. However, if a comment receives a reply and is then deleted, the deleted comment remains in the tree with its body and author attributes set to None via the API. The same goes for removed comments, unless the authenticated account is a moderator of the subreddit whose comments you are getting; a moderator sees removed comments intact.
If a comment is made and then the account that left it is deleted, the comment's body is left intact, while its author attribute becomes None.
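In practice this means a parser should guard against None before reading body or author. The sketch below uses a toy comment object to show the three cases described above; real PRAW comments have an author object rather than a string, so treat this purely as an illustration.

```python
class FakeComment:
    """Toy stand-in for a PRAW comment; real ones carry many more fields."""
    def __init__(self, body, author):
        self.body = body
        self.author = author

comments = [
    FakeComment("Hello", "alice"),      # normal comment
    FakeComment(None, None),            # deleted comment kept visible by a reply
    FakeComment("still here", None),    # the author's account was deleted
]

# Guard on body before parsing it; note the author may be None even
# when the body survives, so check each attribute separately.
readable = [c for c in comments if c.body is not None]
```

The fully deleted comment is skipped, while the comment whose author account was deleted is still parsed, because its body is intact.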