Element: <section>  class: "ArticleHappyLinks_content__1ctN6 TripWidget_happy_links__2iJPM"  data-section: "happygo"  Children: 8 <div> elements
Image:
<div class="lazyload-wrapper "> <img src="https://image.cdn-eztravel.com.tw/yB9Hjc5RN1RfLV8BGAMNUHejujSF1I6-sRR11taMONU/g:ce/aHR0cHM6Ly9ldmVudC5jZG4tZXp0cmF2ZWwuY29tLnR3L2hwaW1nL2hhcHB5Z28vaGFwcHlnby10cmlwMV8zeC5wbmc.png" alt="火車旅遊"> </div>
Heading:
<h3 class="HappyLinks_title__3AHoG"> <a href="https://trip.eztravel.com.tw/tra-hsr/" class="ezTag" target="_blank" rel="noreferrer" title="火車旅遊" data-track-action='{"val":"text"}' data-track-label='{"attr":"href"}'>火車旅遊</a> </h3>
Content:
<ul> <li> <a href="https://www.eztravel.com.tw/activity/formosa/express/touristtrain/?p=focus" class="ezTag" rel="noreferrer" target="_blank" title="環島之星.環島遊" data-track-action='{"val":"火車旅遊_環島之星.環島遊"}' data-track-label='{"attr":"href"}'>環島之星.環島遊</a> </li> </ul>
Links:
<ul> <li> <a href="https://www.eztravel.com.tw/events/cruisestrain/index.html" class="ezTag" rel="noreferrer" target="_blank" title="郵輪式列車" data-track-action='{"val":"火車旅遊_郵輪式列車"}' data-track-label='{"attr":"href"}'>郵輪式列車</a> </li> </ul>
import requests

# Website URL
urlPath = "https://trip.eztravel.com.tw/?_gl=1*156x4hs*_gcl_aw*R0NMLjE2MzkzOTYzNjEuQ2p3S0NBaUEtOXVOQmhCVEVpd0FOM0lsTkozMHVwRU9rR2tvelo5QmdWNm9PMXZrcTQtVHJCYmpJLVlPdC1XN0lVTjR2WVdnWDFneGNCb0NXQlVRQXZEX0J3RQ..*_ga*MTg1NzA5NzUzMi4xNjM5Mzk1Mjgw*_ga_XS4XWTQS4B*MTYzOTQ1MjAyMS4zLjEuMTYzOTQ1MzQ3NS42MA..&_ga=2.42810533.1014596521.1639395280-1857097532.1639395280&_gac=1.87327338.1639396361.CjwKCAiA-9uNBhBTEiwAN3IlNJ30upEOkGkozZ9BgV6oO1vkq4-TrBbjI-YOt-W7IUN4vYWgX1gxcBoCWBUQAvD_BwE"

# Download the page
response = requests.get(urlPath)
if response.status_code == 200:
    print("Download complete")
    # Extract the page as text
    htmlcontent = response.text
else:
    print("Download failed")
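The URL above carries a long query string. As a minimal sketch, assuming the parameters whose names start with `_g` (`_gl`, `_ga`, `_gac`) are analytics tracking values that the server does not need in order to return the page, they could be dropped before the request. `strip_tracking` is a hypothetical helper, not part of `requests`, and the shortened URLs are placeholders for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_tracking(url, prefixes=("_g",)):
    """Return url without query parameters whose names start with a prefix.

    Assumption: such parameters are tracking values only.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(prefixes)
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


clean_url = strip_tracking("https://trip.eztravel.com.tw/?_ga=2.1.1&_gac=1.2.3")
print(clean_url)
```

If the cleaned URL returns a different page than the original, the assumption does not hold and the full URL should be used.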
from bs4 import BeautifulSoup

soup = BeautifulSoup(htmlcontent, 'html.parser')

# Get the root container
rootElement = soup.find('section', attrs={'data-section': 'happygo'})
print(rootElement.name)
print(rootElement['class'])
print(rootElement['data-section'])
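`soup.find` returns `None` when nothing matches, so the `print` calls above would raise an error if the page layout changed. The sketch below shows one way to guard against that. It runs on `sample_html`, a hypothetical miniature of the page so the example works offline, and `find_root` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page, used so the sketch runs offline.
sample_html = """
<section class="ArticleHappyLinks_content__1ctN6 TripWidget_happy_links__2iJPM" data-section="happygo">
  <div><h3><a href="https://trip.eztravel.com.tw/tra-hsr/" title="火車旅遊">火車旅遊</a></h3></div>
</section>
"""


def find_root(html, section_name):
    """Return the <section> whose data-section matches, or None."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("section", attrs={"data-section": section_name})


root = find_root(sample_html, "happygo")
missing = find_root(sample_html, "no-such-section")

if root is not None:
    # 'class' is a multi-valued attribute, so it comes back as a list.
    print(root.name, root["class"], root["data-section"])
if missing is None:
    print("section not found")
```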
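# Get the heading text
# The root container holds 8 <h3> elements
# Use find_all('h3')
for h3Element in rootElement.find_all('h3'):
    # Each h3 has a child <a> element
    # The child <a> has a title attribute
    # Some h3 elements have no child element, so a conditional is needed
    if h3Element.find('a'):
        print(h3Element.a['title'])
    else:
        print(h3Element.string)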
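One detail worth knowing about the `else` branch: `.string` returns `None` when a tag has more than one child, including surrounding whitespace. The sketch below compares it with `get_text`, using three hypothetical headings (the second and third are invented for illustration). `heading_text` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical headings: one wraps a link, one is plain text,
# one mixes text with a nested tag.
sample_html = """
<h3> <a href="https://trip.eztravel.com.tw/tra-hsr/" title="火車旅遊">火車旅遊</a> </h3>
<h3>Plain heading</h3>
<h3>Mixed <span>heading</span></h3>
"""

soup = BeautifulSoup(sample_html, "html.parser")


def heading_text(h3):
    """Prefer the link's title attribute, else fall back to the visible text."""
    link = h3.find("a")
    if link is not None and link.has_attr("title"):
        return link["title"]
    # Join all text fragments with a space and trim each one.
    return h3.get_text(" ", strip=True)


titles = [heading_text(h3) for h3 in soup.find_all("h3")]
strings = [h3.string for h3 in soup.find_all("h3")]

print(titles)
print(strings)
```

Here `strings` contains `None` for the first and third headings, while `titles` has usable text for all three.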
# Get the content and links
# The root container holds 8 <ul> elements
# Each ul holds a variable number of li elements
for ulElement in rootElement.find_all('ul'):
    for liElement in ulElement.find_all('li'):
        # Each li contains an <a> element
        print(liElement.a.string)
        print(liElement.a['href'])
    print("=========================")
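The two loops above print headings and links separately. As a rough sketch, the two can be combined so that each heading maps to its own links. This assumes each child `<div>` of the root holds one `<h3>` and its `<ul>`, which the page structure suggests but the snippets do not confirm. `sample_html` is a hypothetical miniature of the page, and its second `<div>` is invented to show an `<li>` without a link.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page (assumed layout: one h3 + one ul per div).
sample_html = """
<section data-section="happygo">
  <div>
    <h3><a href="https://trip.eztravel.com.tw/tra-hsr/" title="火車旅遊">火車旅遊</a></h3>
    <ul>
      <li><a href="https://www.eztravel.com.tw/activity/formosa/express/touristtrain/?p=focus">環島之星.環島遊</a></li>
      <li><a href="https://www.eztravel.com.tw/events/cruisestrain/index.html">郵輪式列車</a></li>
    </ul>
  </div>
  <div>
    <h3>Hypothetical heading</h3>
    <ul><li>No link here</li></ul>
  </div>
</section>
"""

soup = BeautifulSoup(sample_html, "html.parser")
root = soup.find("section", attrs={"data-section": "happygo"})

groups = {}
# recursive=False limits the search to direct children of the root.
for div in root.find_all("div", recursive=False):
    h3 = div.find("h3")
    if h3 is None:
        continue
    heading = h3.get_text(strip=True)
    links = []
    for li in div.find_all("li"):
        a = li.find("a", href=True)
        if a is None:
            # Skip list items without a link instead of raising an error.
            continue
        links.append((a.get_text(strip=True), a["href"]))
    groups[heading] = links

for heading, links in groups.items():
    print(heading)
    for text, href in links:
        print("  ", text, href)
```

Checking for a missing `<a>` also avoids the `AttributeError` that `liElement.a.string` would raise on a list item with no link.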
Instructor: 徐國堂
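巨匠電腦, AI programming instructor. Expertise: 20 years of teaching experience, specializing in programming languages, mobile app development, web programming, and IoT programming. Many former students now work in related industries.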