ItsMods

Full Version: [C#] WebClient how to know real url [solved]
Pages: 1 2 3
Hello,

I'm using a WebClient to download and parse a web page to get all the images in its <img .../> tags.

When the image paths are relative (e.g. 'images/hello.png') and the URL of the page uses URL rewriting, like "http://example.org/454/history-friday", I can't work out the real URL to fetch the image from.

For example, if I have this URL: "http://example.org/454/history"
and I get this relative image path: "images/hello.png"
how can I get the real path to download the image?

If the real URL of the page is "http://example.org/index.php?cat_id=454&type=history", the image URL should be "http://example.org/images/hello.png",
or if the page is "http://example.org/454/index.php?type=history", the image URL should be "http://example.org/454/images/hello.png"...

I tried to use the BaseAddress property of WebClient, but it is always empty...

Thx for your help.
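
A note on the empty property: WebClient.BaseAddress is an input that the caller sets (it is combined with relative addresses passed to the download methods), so it stays empty unless you assign it. URL rewriting happens on the server and is not visible to the client; the only "other" URL a client can observe is the one it ends up at after HTTP redirects. A minimal sketch of capturing that, assuming a hypothetical subclass named TrackingWebClient (the class and its ResponseUri property are made up for illustration; GetWebResponse and WebResponse.ResponseUri are the real framework members):

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that remembers which URL finally answered.
public class TrackingWebClient : WebClient
{
    // Null until a request has completed; afterwards the URL of the response,
    // which differs from the requested URL only if the server redirected.
    public Uri ResponseUri { get; private set; }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        ResponseUri = response.ResponseUri;
        return response;
    }
}
```

After a call such as client.DownloadString(pageUrl), client.ResponseUri would hold the address to resolve relative image paths against. If the site only rewrites URLs internally, without redirecting, it is simply the URL that was requested.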
wat? The real URL is whatever the address bar of your browser shows o_O I've never run into relative paths, I always get full paths
(08-24-2013, 11:58)Arteq Wrote: [ -> ]wat? The real URL is whatever the address bar of your browser shows o_O I've never run into relative paths, I always get full paths

This answer is pretty retarded. Did you even read the thread?

@narkos, I have no idea. Sorry.
I don't want to get the URL of the page...
I want to get the URL to download the images from...

With this information:
page url: "http://example.org/454/history/september"
image path: "images/hello.png"

How can I get the URL to download the image?

Thx
(08-24-2013, 12:11)narkos Wrote: [ -> ]I don't want to get the URL of the page...
I want to get the URL to download the images from...

With this information:
page url: "http://example.org/454/history/september"
image path: "images/hello.png"

How can I get the URL to download the image?

Thx

Download an image with your browser, then look at its address. Try to work out the addresses of the other images from it.
Code:
#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize;

my $url = 'http://creativecommons.org/image';
my $browser = WWW::Mechanize->new();

$browser->get( $url )
    or die "Unable to get $url!\n";

foreach my $img ( $browser->find_all_images() ) {
    print $img->url() . "\n";
}

This prints:

Code:
http://creativecommons.org/wp-content/themes/cc4/images/license-8.png
http://creativecommons.org/wp-content/themes/cc4/images/find-8.png
http://creativecommons.org/wp-content/themes/cc4/images/cc-title-8.png
/images/categories/image.png
/images/features/150illegalart.jpg
/images/features/150flickr.jpg
/images/commons/sc.png
/images/commons/cci.png
/images/commons/learn.png
http://i.creativecommons.org/l/by/3.0/88x31.png

source: http://www.webhostingtalk.com/showthread.php?t=702278
(08-24-2013, 12:07)Pozzuh Wrote: [ -> ]
(08-24-2013, 11:58)Arteq Wrote: [ -> ]wat? The real URL is whatever the address bar of your browser shows o_O I've never run into relative paths, I always get full paths

This answer is pretty retarded. Did you even read the thread?

@narkos, I have no idea. Sorry.

Typical @Arteq retardness. We DO NOT NEED STUPID RETARD MODERATORS LIKE THIS ONE.

There is an SVN downloader on CodePlex. It shows you links to all the files. It's freeware and the source code is released; google "Svn downloader codeplex". It's also written in C# and VB.
@Bandarigoda123 I don't understand your solution
@d0h! thx, but my script already finds the src of the images in the pages it scans

I need to automatically find the URL of the script that is actually executed when I browse a link like "http://example.org/454/history/september". For example, the executed script might be stored at http://example.org/454/showpage.php. If I find that, and I find an image with the relative path "images/hello.png", I can build the image link like this: http://example.org/454/images/hello.png

Does anyone understand what I mean?
Thank you!
(08-24-2013, 17:23)narkos Wrote: [ -> ]@Bandarigoda123 I don't understand your solution
@d0h! thx, but my script already finds the src of the images in the pages it scans

I need to automatically find the URL of the script that is actually executed when I browse a link like "http://example.org/454/history/september". For example, the executed script might be stored at http://example.org/454/showpage.php. If I find that, and I find an image with the relative path "images/hello.png", I can build the image link like this: http://example.org/454/images/hello.png

Does anyone understand what I mean?
Thank you!

http://downloadsvn.codeplex.com/

Check this out. It finds all the paths when you type in an address.

But if you want to get at the .php itself, I think it isn't possible.
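
For what it's worth, a browser cannot see the server-side script either. It resolves relative paths against the URL of the page as requested, unless the HTML declares a <base href="..."> element, in which case that value is used instead. A rough sketch of checking for it, assuming the page HTML is already downloaded as a string; BaseUrlFinder and GetBaseUri are hypothetical names, and the regular expression is a simplification (a real HTML parser would be more robust):

```csharp
using System;
using System.Text.RegularExpressions;

public static class BaseUrlFinder
{
    // Matches <base ... href="..."> or href='...', case-insensitively.
    // Simplified on purpose: unquoted attribute values are not handled.
    private static readonly Regex BaseHref = new Regex(
        @"<base\b[^>]*?\bhref\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Returns the URL that relative paths in the page should be resolved against.
    public static Uri GetBaseUri(string pageUrl, string html)
    {
        var pageUri = new Uri(pageUrl, UriKind.Absolute);
        Match m = BaseHref.Match(html);
        // The href of <base> may itself be relative, so resolve it against the page URL.
        return m.Success ? new Uri(pageUri, m.Groups[1].Value) : pageUri;
    }
}
```

The returned Uri can then be combined with an image path via new Uri(baseUri, "images/hello.png"). When the page has no <base> element, the page URL itself is the base.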
This can be solved by splitting the URL, but I don't know if that's the best possible way

CSHARP Code
public static string GetAbsoluteUrlFromRelative(string relativeUrl, string webpage)
{
    string[] parts = webpage.Split('/');
    if (relativeUrl.StartsWith("/"))
    {
        // Root-relative path: keep only scheme and host, e.g. "http:" + "//" + "example.org".
        return parts[0] + "//" + parts[2] + relativeUrl;
    }

    // Document-relative path: drop the last segment of the page URL and append the
    // relative path. The segment is dropped by index, so an earlier segment with the
    // same text is kept. Assumes the page URL has a path and no '/' in its query string.
    return string.Join("/", parts, 0, parts.Length - 1) + "/" + relativeUrl;
}
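
As an alternative to splitting by hand, the framework's System.Uri class can do the resolution. The sketch below is not the method from the post above; it swaps in the Uri(Uri, string) constructor, which follows the standard relative-reference rules. UrlResolver and GetAbsoluteUrl are hypothetical names chosen for illustration.

```csharp
using System;

public static class UrlResolver
{
    // Resolves a possibly-relative image path against the URL of the page
    // it was found on, using the same rules a browser applies.
    public static string GetAbsoluteUrl(string pageUrl, string relativeUrl)
    {
        var baseUri = new Uri(pageUrl, UriKind.Absolute);
        // Handles "images/x.png", "/images/x.png" and "../x.png", ignores the
        // query string of the page, and leaves already-absolute URLs unchanged.
        return new Uri(baseUri, relativeUrl).AbsoluteUri;
    }
}
```

With the URLs from this thread, GetAbsoluteUrl("http://example.org/454/history", "images/hello.png") gives "http://example.org/454/images/hello.png", and the same path found on "http://example.org/454/history/september" gives "http://example.org/454/history/images/hello.png". That second result is also what a browser would request, unless the page sets a <base href>.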