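I have a problem: I want to download a file from the URL www.example.com/example.pdf through a proxy and save it to the file system in Java. Does anyone know how this works? If I have the InputStream, I can simply save it to the file system: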
    final ReadableByteChannel rbc = Channels.newChannel(httpUrlConnetion.getInputStream());
    final FileOutputStream fos = new FileOutputStream(file);
    fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
    fos.close();
But how do I get the input stream of a URL through a proxy? If I do this:
    SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
    Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
    URL url = new URL("http://my.real.url.com/");
    URLConnection conn = url.openConnection(proxy);
I get this exception:
    java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
        at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
        at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at app.model.mail.crawler.newimpl.FileLoader.getSourceOfSiteViaProxy(FileLoader.java:167)
        at app.model.mail.crawler.newimpl.FileLoader.process(FileLoader.java:220)
        at app.model.mail.crawler.newimpl.FileLoader.run(FileLoader.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Using this:
    final HttpURLConnection httpUrlConnetion = (HttpURLConnection) website.openConnection(proxy);
    httpUrlConnetion.setDoOutput(true);
    httpUrlConnetion.setDoInput(true);
    httpUrlConnetion.setRequestProperty("Content-type", "text/xml");
    httpUrlConnetion.setRequestProperty("Accept", "text/xml, application/xml");
    httpUrlConnetion.setRequestMethod("POST");
    httpUrlConnetion.connect();
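I am able to download the HTML source of a website, but not a file. Maybe someone can help me with the properties I have to set in order to download a file.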
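For comparison with the POST snippet above, here is a minimal sketch of a plain GET sent through the same kind of Proxy object and streamed straight to disk. It assumes the server hands out the file on a simple GET, which is usual for a static PDF but is not confirmed anywhere in this thread, and it is not a guaranteed fix for the Connection reset. The class name ProxyGetDownload, the helper names open and download, and the timeout values are made up for illustration; the proxy host and file URL are the placeholders from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ProxyGetDownload {

    // Prepares (but does not yet send) a GET request routed through the given proxy.
    static HttpURLConnection open(URL url, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setRequestMethod("GET");   // no request body, unlike the POST variant
        conn.setConnectTimeout(10_000); // hypothetical timeouts, in milliseconds
        conn.setReadTimeout(30_000);
        return conn;
    }

    // Sends the request and copies the response body to the target file.
    static long download(URL url, Proxy proxy, Path target) throws IOException {
        HttpURLConnection conn = open(url, proxy);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder proxy and URL taken from the question.
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("my.proxy.com", 8080));
        URL url = new URL("http://www.example.com/example.pdf");
        long bytes = download(url, proxy, Paths.get("example.pdf"));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

Checking the status code before reading the body makes it easier to tell a proxy error page apart from the actual file.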
You can use the Apache HttpClient library to solve most proxy problems. To compile the code below, you can use the following Maven POM:
Maven:
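    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>

      <groupId>stackoverflow.test</groupId>
      <artifactId>proxyhttp</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <packaging>jar</packaging>

      <name>proxy</name>
      <url>http://maven.apache.org</url>

      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      </properties>

      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.httpcomponents</groupId>
          <artifactId>httpclient</artifactId>
          <version>4.5.1</version>
        </dependency>
      </dependencies>
    </project>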
Java code:
    import org.apache.http.HttpHost;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    /**
     * How to send a request via proxy.
     *
     * @since 4.0
     */
    public class ClientExecuteProxy {

        public static void main(String[] args) throws Exception {
            CloseableHttpClient httpclient = HttpClients.createDefault();
            try {
                HttpHost target = new HttpHost("www.google.com", 80, "http");
                HttpHost proxy = new HttpHost("127.0.0.1", 8889, "http");

                RequestConfig config = RequestConfig.custom()
                        .setProxy(proxy)
                        .build();
                HttpGet request = new HttpGet("/");
                request.setConfig(config);

                System.out.println("Executing request " + request.getRequestLine()
                        + " to " + target + " via " + proxy);

                CloseableHttpResponse response = httpclient.execute(target, request);
                try {
                    System.out.println("----------------------------------------");
                    System.out.println(response.getStatusLine());
                    System.out.println(EntityUtils.toString(response.getEntity()));
                } finally {
                    response.close();
                }
            } finally {
                httpclient.close();
            }
        }
    }
The following is different from the other answers and worked for me. Set these properties before connecting:
    System.getProperties().put("http.proxySet", "true");
    System.getProperties().put("http.proxyHost", "my.proxy.com");
    System.getProperties().put("http.proxyPort", "8080"); // port is a String, not an int
Then open the URLConnection and try to download the file.
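Putting the two steps together, a minimal sketch might look like the following. It reuses the channel-based copy from the question, wrapped in try-with-resources so both ends are closed even when the transfer fails. The class name ProxyPropertyDownload and the helper save are hypothetical; the proxy host, port, and file URL are the placeholders used earlier in this thread. The http.* properties only affect http:// URLs; for https:// the JDK reads https.proxyHost and https.proxyPort instead.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ProxyPropertyDownload {

    // Copies the whole stream into the target file and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (ReadableByteChannel rbc = Channels.newChannel(in);
             FileOutputStream fos = new FileOutputStream(target.toFile())) {
            return fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

    public static void main(String[] args) throws IOException {
        // Set the proxy properties before any connection is opened.
        // "http.proxySet" is kept from the answer above; it is a legacy flag and
        // proxyHost/proxyPort are the settings that matter here.
        System.setProperty("http.proxySet", "true");
        System.setProperty("http.proxyHost", "my.proxy.com");
        System.setProperty("http.proxyPort", "8080"); // value is a String

        // No Proxy argument here: the JVM-wide properties are applied instead.
        URL url = new URL("http://www.example.com/example.pdf");
        URLConnection conn = url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            long bytes = save(in, Paths.get("example.pdf"));
            System.out.println("Saved " + bytes + " bytes");
        }
    }
}
```

Because system properties are JVM-wide, this routes every HTTP connection in the process through the proxy, whereas passing a Proxy object to openConnection affects only that one connection.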