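You're doing several things wrong:

- You're trying to log in before you have a cookie session, but the site requires a cookie session to exist before the login request is sent.
- There is a CSRF token tied to your cookie session, here called `at`, which you need to parse out of the login page HTML and send with your login request. Your code never fetches it.
- Most importantly, there is a captcha image tied to your cookie session that you need to fetch and solve, and whose text you need to add to your login request. Your code ignores it completely.
- Your login request needs the header `x-requested-with: XMLHttpRequest`, but your code doesn't add it.
- Your login request needs the fields `com=account` and `t=submitLogin` in the POST data, but your code adds neither. (You try to add them to the URL, but they belong in the POST data, i.e. your `$postValues` array, not in the URL.)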
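To make the last two points concrete, here is a minimal sketch of how the request could be assembled. `buildLoginRequest()` and its return shape are hypothetical helpers invented for this illustration; only the URL, the header, and the `com`/`t` fields come from the list above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: assemble the pieces of the login request.
 * The fixed fields go into the POST body, never into the URL.
 */
function buildLoginRequest(array $postValues): array
{
    // com and t sit next to the other form fields in the body
    $fields = array_merge(['com' => 'account', 't' => 'submitLogin'], $postValues);

    return [
        'url'     => 'https://www.banggood.com/login.html', // no query string here
        'headers' => ['x-requested-with: XMLHttpRequest'],
        'body'    => http_build_query($fields),
    ];
}

// Example values only; a real request also needs the at token and the captcha answer.
$request = buildLoginRequest(['email' => 'user@example.com', 'pwd' => 'secret']);
echo $request['body'], "\n";
```

With plain curl, `body` would be passed as `CURLOPT_POSTFIELDS` and `headers` as `CURLOPT_HTTPHEADER`.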
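Here's what you need to do:

- First do a normal GET request to the login page. This gives you the session cookie id, the CSRF token, and the URL of the captcha image.
- Store the cookie id and make sure to send it with all further requests, then parse out the CSRF token (in the HTML it is the `value` attribute of the `input` element named `at`) and the captcha image URL (it differs for each cookie session, so don't hardcode it).
- Then fetch the captcha image and solve it. Add the username, password, captcha answer, CSRF token, `com` and `t` to the POST data of your login request, add the HTTP header `x-requested-with: XMLHttpRequest`, and send the request to https://www.banggood.com/login.html. You should then be logged in!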
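The parsing step can be tried offline. The sketch below assumes the element names used elsewhere in this answer (an `input` named `at`, an image with id `get_login_image`). The sample markup, the `captcha.php?r=42` path, and the `parseLoginPage()` helper are made up for illustration, and the real page may well look different.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pull the CSRF token and the captcha image URL
 * out of the login page HTML.
 */
function parseLoginPage(string $html, string $baseUrl): array
{
    $dom = new DOMDocument();
    // real-world HTML is rarely valid, so silence libxml parse warnings
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $tokenInput = $xpath->query('//input[@name="at"]')->item(0);
    $captchaImg = $xpath->query('//*[@id="get_login_image"]')->item(0);
    if (!$tokenInput instanceof DOMElement || !$captchaImg instanceof DOMElement) {
        throw new RuntimeException('login page layout not recognised');
    }

    return [
        'at'          => $tokenInput->getAttribute('value'),
        // the src is relative, so prefix it with the site root
        'captcha_url' => rtrim($baseUrl, '/') . '/' . ltrim($captchaImg->getAttribute('src'), '/'),
    ];
}

// Made-up markup, for illustration only
$sampleHtml = '<html><body><form>'
    . '<input type="hidden" name="at" value="abc123">'
    . '<img id="get_login_image" src="captcha.php?r=42">'
    . '</form></body></html>';

$parsed = parseLoginPage($sampleHtml, 'https://www.banggood.com/');
var_dump($parsed);
```

Throwing when either element is missing makes a changed page layout fail loudly instead of sending a login request with an empty token.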
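Here's an example implementation using hhb_curl for the web requests (a curl_ wrapper that takes care of cookies, turns silent curl_ errors into RuntimeExceptions, and so on), DOMDocument for parsing out the CSRF token, and deathbycaptcha.com's API for breaking the captcha.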
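PS: the example code won't work until you provide a real, credited deathbycaptcha.com API username/password on lines 6 and 7 (the `$deathbycaptcha_username` and `$deathbycaptcha_password` variables). The captcha looked so simple that breaking it could probably be automated by anyone sufficiently motivated; I'm not. (Edit: it seems they have improved their captcha since I wrote that; it looks very difficult now.) Also, the banggood account is just a temporary test account, so no harm comes from it being compromised, which obviously happens when I post the username/password here.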
    <?php
    declare(strict_types = 1);
    require_once ('hhb_.inc.php'); // provides hhb_curl; include path is an assumption
    $banggood_username = '?'; // placeholder
    $banggood_password = '?'; // placeholder
    $deathbycaptcha_username = '?';
    $deathbycaptcha_password = '?';
    // Everything down to and including the hhb_curl constructor below is reconstructed
    // from the names used later in the script; adjust it to your setup.
    $hc = new hhb_curl ( '', true );
    // 1: GET the login page to obtain the session cookie, CSRF token and captcha URL
    $html = $hc->exec ( 'https://www.banggood.com/login.html' )->getStdOut ();
    $domd = new DOMDocument ();
    @$domd->loadHTML ( $html ); // calling loadHTML() statically no longer works on PHP 8
    $xp = new DOMXPath ( $domd );
    $csrf_token = $xp->query ( '//input[@name="at"]' )->item ( 0 )->getAttribute ( "value" );
    $captcha_image_url = 'https://www.banggood.com/' . $domd->getElementById ( "get_login_image" )->getAttribute ( "src" );
    // 2: fetch and solve the captcha belonging to this cookie session
    $captcha_image = $hc->exec ( $captcha_image_url )->getStdOut ();
    $captcha_answer = deathbycaptcha ( $captcha_image, $deathbycaptcha_username, $deathbycaptcha_password );
    // 3: send the login request, with everything in the POST data
    $html = $hc->setopt_array ( array (
            CURLOPT_POST => 1,
            CURLOPT_POSTFIELDS => http_build_query ( array (
                    'com' => 'account',
                    't' => 'submitlogin',
                    'email' => $banggood_username,
                    'pwd' => $banggood_password,
                    'at' => $csrf_token,
                    'login_image_code' => $captcha_answer
            ) ),
            CURLOPT_HTTPHEADER => array (
                    'x-requested-with: XMLHttpRequest'
            )
    ) )->exec ( 'https://www.banggood.com/login.html' )->getStdOut (); // pass the login URL explicitly; the previous request went to the captcha image
    var_dump ( // $hc->getStdErr (),
    $html );

    function deathbycaptcha(string $imageBinary, string $apiUsername, string $apiPassword): string {
        $hc = new hhb_curl ( '', true );
        // upload the captcha image
        $response = $hc->setopt_array ( array (
                CURLOPT_URL => 'http://api.dbcapi.me/api/captcha',
                CURLOPT_POST => 1,
                CURLOPT_HTTPHEADER => array (
                        'Accept: application/json'
                ),
                CURLOPT_POSTFIELDS => array (
                        'username' => $apiUsername,
                        'password' => $apiPassword,
                        // base64 because CURLFile requires a file on disk and I can't be
                        // bothered with tmpfile(), although that would save bandwidth
                        'captchafile' => 'base64:' . base64_encode ( $imageBinary )
                ),
                CURLOPT_FOLLOWLOCATION => 0
        ) )->exec ()->getStdOut ();
        $response_code = $hc->getinfo ( CURLINFO_HTTP_CODE );
        if ($response_code !== 303) {
            // some error
            $err = "DeathByCaptcha api returned \"$response_code\", expected 303, ";
            switch ($response_code) {
                case 403 :
                    $err .= " the api username/password was rejected";
                    break;
                case 400 :
                    $err .= " we sent an invalid request to the api (maybe the API specs have been updated?)";
                    break;
                case 500 :
                    $err .= " the api had an internal server error";
                    break;
                case 503 :
                    $err .= " api is temporarily unreachable, try again later";
                    break;
                default :
                    $err .= " unknown error";
                    break;
            }
            $err .= ' - ' . $response;
            throw new \RuntimeException ( $err );
        }
        $response = json_decode ( $response, true );
        if (! empty ( $response ['text'] ) && $response ['text'] !== '?') {
            return $response ['text']; // sometimes the answer might be available right away.
        }
        $id = $response ['captcha'];
        $url = 'http://api.dbcapi.me/api/captcha/' . urlencode ( $id );
        while ( true ) {
            sleep ( 10 ); // check every 10 seconds
            $response = $hc->setopt ( CURLOPT_HTTPHEADER, array (
                    'Accept: application/json'
            ) )->exec ( $url )->getStdOut ();
            $response = json_decode ( $response, true );
            if (! empty ( $response ['text'] ) && $response ['text'] !== '?') {
                return $response ['text'];
            }
        }
    }
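One thing to be aware of: the polling loop in `deathbycaptcha()` never gives up, so an unsolvable captcha will hang the script. A bounded variant could look roughly like the sketch below. `pollForCaptchaText()` and the `$fetchAnswer` callable are hypothetical names invented here, not part of hhb_curl or the DeathByCaptcha API. The callable is assumed to return the decoded JSON of one poll response.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: call $fetchAnswer until it yields a solved captcha
 * text, or give up after $maxAttempts polls.
 */
function pollForCaptchaText(callable $fetchAnswer, int $maxAttempts = 12, int $delaySeconds = 10): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; ++$attempt) {
        $response = $fetchAnswer();
        // same "solved" check as the example code: non-empty text that is not "?"
        if (is_array($response) && !empty($response['text']) && $response['text'] !== '?') {
            return (string) $response['text'];
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds); // wait before the next poll
        }
    }
    throw new RuntimeException("captcha not solved after $maxAttempts attempts");
}

// Stand-in fetcher that "solves" the captcha on the third poll
$calls = 0;
$fakeFetch = function () use (&$calls): array {
    ++$calls;
    return $calls < 3 ? ['text' => ''] : ['text' => 'x7k9'];
};

$answer = pollForCaptchaText($fakeFetch, 5, 0);
echo $answer, "\n";
```

In the real function the callable would wrap the `$hc->...->exec ( $url )` call plus `json_decode()`. The defaults of 12 attempts at 10 seconds each are an arbitrary choice.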