| Author |
Message |
Elfriede Halfop
Joined: 07 Aug 2007 Posts: 67
|
Posted: Wed Jan 26, 2011 10:14 am Post subject: Parsing from webcontent |
|
|
Hi all,
I'd like to parse the following:
| Code: |
<a href="/genre/Action">Action</a> <span>|</span> <a href="/genre/Adventure">Adventure</a> <span>|</span> <a href="/genre/Comedy">Comedy</a>
|
I want the genres, like Action/Adventure/Comedy. What I have so far:
| Code: |
foreach line [split $content \n] {
if {[regexp -nocase {<a\shref="\/genre\/(.*)">(.*)<\/a>} $line match genre1 genre2 genre3]} {
|
I know that is bad. Next problem: I don't know in advance how many genres I'll have to parse. It can be just one, or up to 4 or something like that.
Thank you! |
|
Trixar_za Op

Joined: 18 Nov 2009 Posts: 143 Location: South Africa
|
Posted: Wed Jan 26, 2011 12:15 pm Post subject: |
|
|
Could you post the link to the website? That would make it easier to see how the page handles different kinds of input and how that changes the code.
foreach is a good start, by the way. Rather than making one regexp match do all the work, why not append each match to the end of a single variable, like set real_var "$real_var|$match"? That way you can collect a nearly unlimited number of genres, as long as they match the regex. With the example above they'll all end up looking like Adventure|Action|Fantasy. _________________ http://www.trixarian.net/Projects |
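A rough sketch of that idea, not a tested script for the real page: it assumes every genre link has exactly the form shown in the first post. The proc name extract_genres is made up for illustration. It uses regexp -all -inline to pull every match out in one go and collects the captured genre names in a list instead of a "|"-joined string, which avoids a stray leading separator.

```tcl
# Hypothetical helper: collect every genre name found in an HTML fragment.
# Assumes links look like <a href="/genre/Name">Name</a>, as in the first post.
proc extract_genres {content} {
    set genres {}
    # -all -inline returns a flat list: fullMatch capture fullMatch capture ...
    foreach {match genre} [regexp -all -inline -nocase \
            {<a\s+href="/genre/[^"]+">([^<]+)</a>} $content] {
        lappend genres $genre
    }
    return $genres
}

# Sample input copied from the first post.
set content {<a href="/genre/Action">Action</a> <span>|</span> <a href="/genre/Adventure">Adventure</a> <span>|</span> <a href="/genre/Comedy">Comedy</a>}
puts [join [extract_genres $content] "|"]
```

On the sample line this prints Action|Adventure|Comedy. Because the result is a list, it does not matter whether the page has one genre or four; llength tells you how many were found.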
|
Elfriede Halfop
Joined: 07 Aug 2007 Posts: 67
|
Posted: Wed Jan 26, 2011 12:39 pm Post subject: |
|
|
http://www.imdb.com/title/tt0942385/
| Quote: | | set real_var "$real_var|$match" |
Sounds good. I'm excited to see what that proc will look like. |
|
|