So as I mentioned earlier, this is the kind of thing that can really make
you want to tear your hair out. As you can see, I've been in
lots of these situations. Okay, so we're facing a challenging situation:
we've got to do some scraping from a site that's really making us work
for it. Let's talk about a best practice for screen scraping. The
first thing we want to do here, as I mentioned, is look
at how the browser itself makes requests. With the developer tools, we
have quite a few ways to do that. We can also
look at the wire traffic if we really need to,
using something like Wireshark. Then we want to emulate that
in our code. Now, if everything blows up, we're going to
need to take a look at our HTTP traffic in some
way. And then we just return to step one and work through
this again until we get it right. If we follow this
process, we have a pretty effective strategy for dealing with any
problems we have making requests from sites we'd like to scrape.
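The "emulate that in our code" step might look something like the following sketch. The URL and header values here are placeholders — in practice you'd copy the real ones from your browser's developer tools (Network tab), since a picky site may check things like User-Agent or Accept-Language:

```python
import urllib.request

# Placeholder headers -- replace with the exact values your browser sends,
# as seen in the developer tools' Network tab.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

def build_browser_request(url: str) -> urllib.request.Request:
    # Build the request without sending it, so we can inspect exactly
    # what would go over the wire before committing to a fetch.
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

req = build_browser_request("https://example.com/")
# urllib normalizes header names, so look it up as "User-agent"
print(req.get_header("User-agent"))
```

When the response still doesn't match what the browser gets, that's the cue for step three: compare what this code actually sends against the browser's traffic (in the developer tools or Wireshark), adjust the headers, and loop back to step one.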