# Tips and tricks

## Handle Non-UTF8 html Pages

The `go.net/html` package (now `golang.org/x/net/html`) used by `goquery` requires the HTML document to be UTF-8 encoded. When you know the page uses a different encoding, you can convert it to UTF-8 with an `iconv` package (there are several implementations of the `iconv` API; see [godoc.org][iconv] for other options):

```
$ go get -u github.com/djimenez/iconv-go
```

and then:

```
// Load the URL
res, err := http.Get(url)
if err != nil {
	// handle error
}
defer res.Body.Close()

// Convert the HTML from the designated charset to UTF-8.
// `charset` must be one of the charsets known by the iconv package.
utfBody, err := iconv.NewReader(res.Body, charset, "utf-8")
if err != nil {
	// handle error
}

// Parse the UTF-8 body with goquery
doc, err := goquery.NewDocumentFromReader(utfBody)
if err != nil {
	// handle error
}
// use doc...
```

Thanks to GitHub user @YuheiNakasaka.

The official `golang.org/x/text` repository (formerly go.text) also covers this use case; see its [godoc page][text] for details.

## Handle Javascript-based Pages

`goquery` works well on static HTML pages, but when most of the page is built dynamically with JavaScript, there is not much it can do. There are several options when faced with this problem:

* Use a headless browser such as [webloop][].
* Use a Go JavaScript parser and interpreter package, such as [otto][].

You can find a code example using `otto` [in this gist][exotto]. Thanks to GitHub user @cryptix.

## For Loop

If all you need is a plain `for` loop over the nodes in the current selection, and `Map`/`Each`-style iteration is not necessary, you can use the following:

```
// doc is a *goquery.Document, loaded as in the first example
sel := doc.Find(".selector")
for i := range sel.Nodes {
	single := sel.Eq(i)
	// use `single` as a selection of 1 node
}
```

Thanks to GitHub user @jmoiron.

[webloop]: https://github.com/sourcegraph/webloop
[otto]: https://github.com/robertkrimen/otto
[exotto]: https://gist.github.com/cryptix/87127f76a94183747b53
[iconv]: http://godoc.org/?q=iconv
[text]: https://godoc.org/golang.org/x/text/encoding