[Python] Crawling a Table with BeautifulSoup
Python 2021. 11. 6. 03:12
```python
import requests
from bs4 import BeautifulSoup

url = "https://www.iban.com/currency-codes"
iban_result = requests.get(url)
iban_soup = BeautifulSoup(iban_result.text, 'html.parser')

# iban_soup.table is the first <table> on the page
table = iban_soup.table
trs = table.find_all('tr')

currency_list = []
for idx, tr in enumerate(trs):
    # Row 0 is the header row, so skip it
    if idx > 0:
        tds = tr.find_all('td')
        country = tds[0].text.strip()
        currency = tds[1].text.strip()
        code = tds[2].text.strip()
        number = tds[3].text.strip()
        # Store every extracted column (code and number were computed but never saved)
        currency_list.append({
            "idx": idx,
            "country": country,
            "currency": currency,
            "code": code,
            "number": number,
        })
```
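The same row-by-row approach can be tried without hitting the network. Below is a minimal sketch that parses a small inline HTML string instead of the downloaded page. The sample rows and the `parse_currency_table` helper are hypothetical, and the sketch assumes the header row uses `<th>` cells, so it skips rows by counting `<td>` cells rather than by index.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for the downloaded page
html = """
<table>
  <tr><th>Country</th><th>Currency</th><th>Code</th><th>Number</th></tr>
  <tr><td> AFGHANISTAN </td><td>Afghani</td><td>AFN</td><td>971</td></tr>
  <tr><td>ALBANIA</td><td>Lek</td><td>ALL</td><td>008</td></tr>
</table>
"""


def parse_currency_table(markup):
    """Return one dict per data row of the first <table> in markup (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for tr in soup.table.find_all("tr"):
        tds = tr.find_all("td")
        # Header rows hold only <th> cells, so they produce no <td>; skip them
        if len(tds) < 4:
            continue
        # get_text(strip=True) trims surrounding whitespace like .text.strip()
        country, currency, code, number = (td.get_text(strip=True) for td in tds[:4])
        rows.append({"country": country, "currency": currency,
                     "code": code, "number": number})
    return rows


for row in parse_currency_table(html):
    print(row)
```

Skipping by cell count rather than `idx > 0` also tolerates tables with more than one header row or stray empty rows.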
Beautiful Soup built-in methods
.find()
: Pass the name of the tag you want to grab as the argument, and it returns only the first matching tag.
```python
soup.find('title')
# <title>The Dormouse's story</title>
```
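A small sketch of how `.find()` behaves on a hypothetical inline document: it returns only the first match, and `None` when nothing matches, which is worth checking before calling `.text` on the result. (Attribute access such as `iban_soup.table` in the scraper is shorthand for `iban_soup.find('table')`.)

```python
from bs4 import BeautifulSoup

# Hypothetical sample document
html = ("<html><head><title>The Dormouse's story</title></head>"
        "<body><p>one</p><p>two</p></body></html>")
soup = BeautifulSoup(html, "html.parser")

print(soup.find("title"))    # <title>The Dormouse's story</title>
first_p = soup.find("p")     # only the first <p>, even though there are two
print(first_p)               # <p>one</p>
print(soup.find("table"))    # None: there is no <table> in this document
```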
.find_all()
: Pass the name of the tag you want as the argument. It collects every matching tag in the parsed document and returns them as a list.
```python
soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
```
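A short sketch of a few common `.find_all()` options, using a hypothetical snippet modeled on the example above: `limit` stops after a given number of matches, keyword arguments filter on attribute values, and an empty list comes back when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on the links in the example above
html = """
<p class="story">
  <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")              # every <a> tag, in document order
print(len(links))                       # 3
print([a["href"] for a in links])       # read an attribute with tag["attr"]
print(soup.find_all("a", limit=2))      # stop after the first two matches
print(soup.find_all("a", id="link3"))   # filter by attribute value
print(soup.find_all("table"))           # [] when nothing matches
```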
.text
.get_text()
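: Returns all the text inside a tag fetched with find or find_all (or inside the whole document) as a single string, with the markup stripped. .text is a property that gives the same result as .get_text().
```python
print(soup.get_text())
# The Dormouse's story
#
# The Dormouse's story
#
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
#
# ...
```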
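A minimal sketch, on a hypothetical table row, of the difference between plain `.text` and `.get_text()` with its `separator` and `strip` arguments. Joining cells with a separator keeps them from running together, and `strip=True` does the same trimming as the `.text.strip()` calls in the scraper.

```python
from bs4 import BeautifulSoup

# Hypothetical row in the style of the currency table
html = "<tr><td> Korea (the Republic of) </td><td>Won</td><td>KRW</td></tr>"
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

print(repr(row.text))                             # ' Korea (the Republic of) WonKRW'
print(repr(row.get_text(" | ", strip=True)))      # 'Korea (the Republic of) | Won | KRW'
print(repr(row.find("td").get_text(strip=True)))  # 'Korea (the Republic of)'
```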
Python built-in methods
str.capitalize()
: Converts the first character to uppercase (and the rest of the string to lowercase).
```python
name = 'insub'
name = name.capitalize()
print(name)  # Insub
```
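A quick sketch contrasting `str.capitalize()` with `str.title()`. Because `capitalize()` lowercases everything after the first character, it can tidy an all-caps value such as a scraped country name, while `title()` capitalizes every word. The strings below are made-up examples.

```python
name = "insub"
print(name.capitalize())           # Insub

# capitalize() also lowercases every character after the first
print("iNSUB kIM".capitalize())    # Insub kim
print("AFGHANISTAN".capitalize())  # Afghanistan

# title() capitalizes the first letter of each word instead
print("insub kim".title())         # Insub Kim
```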
Beautiful Soup Doc
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Requests Doc
https://docs.python-requests.org/en/latest/