Python sad/happy face machine learning (get rid of text)


I could use a little advice on how to make a loop / if statement that can get rid of unnecessary text in a file.

I have a txt file that is large, 153 MB. I know how to open it in Python, but I'm still not the best at taking stuff (text I don't need) out of it.

I have posted an example of the txt file, which you can see below:

@xirwinshemmo follow :) hii... if u want make new friend add me on facebook! :) xx      https:\/\/t.co\/rcyfvrmddg wanna if ever feel lonely or sad or bored, come , talk me. i'm   free anytime :) hope not spy someone. hope real on neautral side. because   trust. :-) @dessdim @bureemi not maybe :) \u201c@emilykathryn_17: funny how want , pray when want    same thing god wants.  :) #newheart #newdesires\u201d @philkomarny thank :) can follow me on twitter can dm you? rt @emrekavcoglu: @usher dj got fallin in love , yeah earth number 1 m\u00fcsic    listen thank king :-) @ 

What I want to get rid of is the @ + names, like the first one:

@xirwinshemmo  

and just have the text left, e.g. "thanks follow :)".

There are also links that I can't use, like:

https:\/\/t.co\/rcyfvrmddg 

I also want to remove these.

I hope you can help me a bit.

First, I'm going to assume you are reading the file line by line. You can first split each line into individual words (strings):

for line in infile:
    words = line.split()  # splits the long string into a list of single words
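Since the file is 153 MB, it may help to see the reading step on its own. This is only a minimal sketch: the helper name `iter_word_lists` is made up for illustration, and the UTF-8 encoding is an assumption about your file.

```python
def iter_word_lists(path):
    """Yield one list of words per line of the file at `path`."""
    # Iterating over the file object reads one line at a time,
    # so the whole file never has to be held in memory at once.
    # encoding="utf-8" is an assumption; change it if your file differs.
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            yield line.split()  # split on any run of whitespace
```

You would then write something like `for words in iter_word_lists(some_path):` and do the filtering inside that loop.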

Then, loop over these words (still inside the loop above):

for i in range(len(words)):
    if words[i].startswith('@'):
        # print everything that follows the @name
        print(words[i+1:])

This code prints the words that come after a user name (@abc).
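If what you want is the line without any @names (rather than only the words after one), a filter might be closer. This is a rough sketch, not the only way to do it, and `drop_mentions` is just a name I picked for the example:

```python
def drop_mentions(words):
    """Return only the words that do not start with '@'."""
    return [w for w in words if not w.startswith('@')]


words = "@xirwinshemmo thanks follow :)".split()
print(drop_mentions(words))  # ['thanks', 'follow', ':)']
```

Note that this only catches words that begin with '@'. A token such as `\u201c@emilykathryn_17:` in your sample starts with other characters, so it would be kept.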

To remove the http links, you can use an if statement:

if not words[i].startswith('http'): 
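Putting both checks together, a cleaning pass over the whole file could look roughly like this. It is a sketch under assumptions: the function names and the idea of writing the result to a second file are mine, and UTF-8 is a guess at the encoding.

```python
def clean_line(line):
    """Drop words that start with '@' or 'http' from one line of text."""
    # str.startswith accepts a tuple, so both prefixes are tested at once.
    kept = [w for w in line.split() if not w.startswith(('@', 'http'))]
    return ' '.join(kept)


def clean_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, one line at a time."""
    # encoding='utf-8' is an assumption about the input file.
    with open(src_path, encoding='utf-8') as infile, \
         open(dst_path, 'w', encoding='utf-8') as outfile:
        for line in infile:
            outfile.write(clean_line(line) + '\n')


sample = r"@xirwinshemmo follow :) hii... https:\/\/t.co\/rcyfvrmddg wanna"
print(clean_line(sample))  # follow :) hii... wanna
```

The escaped slashes in your links (`https:\/\/...`) do not matter here, because the check only looks at the first four characters. Joining the words back with a single space also collapses the runs of extra spaces in the original text.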
