BIG DATA ANALYTICS OF THE COVID-19 IMPACT ON ONLINE LODGING REVIEWS
We used a lodging platform’s big data to investigate the factors that were associated with consumers’ reviews during the pandemic. A univariate analysis and a fixed effects model analysis were conducted on an unbalanced panel dataset of 101,058 review records (84,915 in Los Angeles and 16,143 in San Francisco) from 29,708 listings over 21 months. Results showed that review ratings were higher when COVID-19 case counts were higher, when the host was a super host, and when the listing was instantly bookable; however, negative associations were found with price, number of reviews, and host listing numbers. The profound implications brought by COVID-19 required online booking platforms to seek ways to improve customer satisfaction to survive. The key was to understand the new needs and requirements for travelers during COVID-19.
The coronavirus disease 2019 (COVID-19) pandemic has crippled global business over the past two years. Tourism and hospitality industries were hit particularly hard, compared to other sectors (Krishnan et al., 2020). The lockdown and economic downturn due to COVID-19 took a heavy toll and had profound implications. With the increasing number of vaccinations and decreasing restrictions on traveling, tourism and hospitality have begun to recover, but they remain well below their prepandemic levels (Richter, 2021). It was estimated that tourism and hospitality might not fully recover until early 2023 (Valinsky, 2020; Adams, 2021) and that the loss from the pandemic could reach $4 trillion (Vanzetti & Peters, 2021). Both hotels and online booking platforms were greatly affected. Approximately 25% of hotels were facing possible foreclosure (Backman, 2020). Expedia’s revenue decreased 57% in 2020 (Soper, 2021), and Airbnb’s revenue decreased by $3.9 billion in the last quarter of 2020 (Koenig, 2021). Two reasons contributed to the loss of revenue for Airbnb. First, the overall occupancy rate decreased because customers had concerns about stays during the pandemic. Second, in response to the shrinking customer demand, hosts had to lower listing prices (Lane, 2020). The profound implications brought by COVID-19 required online booking platforms to seek ways to improve customer satisfaction to survive. The key was to understand the new needs and requirements for travelers during COVID-19. In this study, we aimed to explore the impact of COVID-19 outbreaks on consumers’ review behaviors on the platform. How would COVID-19 cases affect customers’ review ratings? How would essential characteristics of the listings (e.g., price, instant booking option, etc.) and the hosts (e.g., super host, number of listings, etc.) affect consumers’ review ratings during the pandemic? What could hosts learn from past stays during the pandemic?
To answer these questions, we used the public Airbnb Los Angeles and San Francisco listing data for 21 months (February 2020 to October 2021), combined with the COVID-19 cases reported by California Health and Human Services (CHHS). Empirical results showed that the number of COVID-19 cases was positively associated with the review ratings. Additionally, review ratings tended to be higher if the host was a super host and the listing provided the instant bookable option. However, we found that the review ratings were negatively affected by the price, the number of reviews, and the host listing numbers. Our results showed that public sentiment towards a listing is closely linked to the pandemic situation. Given the positive association between COVID-19 cases and review ratings, it is crucial to prioritize health and safety protocols, implement rigorous cleaning procedures, and ensure proper ventilation in the listings. These findings provide valuable insights for practitioners in understanding the dynamics that shape guest satisfaction in uncertain times. By considering these factors, the platforms and the hosts can better manage and enhance guest experiences, leading to improved review ratings and, ultimately, increased success in the industry.
DATA AND METHODOLOGY
Data
Two public datasets were used in this study: (a) Airbnb Los Angeles and San Francisco listing data, scraped monthly from November 2020 to October 2021 (Inside Airbnb, 2021); (b) daily COVID-19 cases data by city and county, reported on the California Health and Human Services (CHHS) Open Data Portal from February 1, 2020, to October 6, 2021 (CHHS, 2021). Although the first dataset was scraped only from November 2020 onward, it contained Airbnb review data for Los Angeles and San Francisco from February 2020 to October 2021, including the review ratings, the date of the latest review, prices, etc. Because the data were scraped monthly, if a listing had no new reviews during a month, the corresponding record for that month was redundant. We deleted these redundant records as well as records with incomplete information on important variables, such as review ratings and super host status. From the second dataset, we extracted the daily COVID-19 cases in Los Angeles and San Francisco. Based upon the last review date in the first dataset, we aggregated the total number of COVID-19 cases for the past 14 days to measure the severity of COVID-19 during the customers’ stays. We chose this window because consumers have at most 14 days to post reviews after checking out, according to Airbnb’s policy. Because the daily COVID-19 cases were reported from February 1, 2020, we also deleted the records whose latest review dates were earlier than February 15, 2020, to ensure that each record had the total COVID-19 cases over the full past 14 days. The final integrated dataset contained 29,708 different listing IDs with 101,058 different review records.
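The 14-day aggregation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (the analyses in this study were run in R). The function name, the dictionary layout, and the toy case counts are hypothetical. The sketch also assumes that the window covers the 14 days immediately before the latest review date, which is consistent with the February 15, 2020, cutoff but is not stated explicitly in the text.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # review window stated in the text


def cases_past_14_days(daily_cases, last_review):
    """Sum daily cases over the 14 days before last_review.

    daily_cases maps a date to that day's reported case count.
    Returns None when any day in the window is missing, mirroring the
    rule that records without a full 14-day history were dropped.
    """
    days = [last_review - timedelta(days=k) for k in range(1, WINDOW_DAYS + 1)]
    if any(d not in daily_cases for d in days):
        return None
    return sum(daily_cases[d] for d in days)


# Hypothetical toy series: 10 cases per day for 30 days starting Feb 1, 2020.
start = date(2020, 2, 1)
toy = {start + timedelta(days=i): 10 for i in range(30)}

full_window = cases_past_14_days(toy, date(2020, 2, 15))
short_window = cases_past_14_days(toy, date(2020, 2, 14))
print(full_window, short_window)
```

With the toy series, a review dated February 15 has a complete window, whereas a review dated February 14 would need a case count for January 31 and is therefore excluded.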
Methodology
We began with a descriptive analysis and drew statistical plots to show the basic information associated with review ratings, price, and the number of reviews during the time window (February 2020 to October 2021). The descriptive analysis showed the trend of review ratings, which took a value between 1 and 5, with a higher value indicating a higher consumer perceived quality of a listing. In the empirical model, the dependent variable was consumer review ratings, and the primary independent variable was the total number of COVID-19 cases over the past 14 days, based on the review dates, which measured the severity of the pandemic during the customers’ stays. We also investigated some important characteristics of the listings and the hosts, including the room prices, whether the host was a super host, whether the host identity (ID) had been verified, whether instant booking was offered, host listing numbers, the number of reviews, and the location. The empirical analysis had two steps. First, we conducted a univariate analysis (analysis of variance) for the COVID-19 cases and each listing characteristic. Second, we used the Hausman test to choose between the fixed effects and random effects models. Based on the result, we fit a fixed effects model to the unbalanced panel data with 29,708 different listing IDs and 101,058 different review records. All of the analyses were performed using RStudio, version 1.2.5042.
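For readers unfamiliar with the Hausman test, the following Python sketch shows the statistic in its simplest form, for a single coefficient. It is not the authors' R code, and every input value is hypothetical. With one coefficient, the statistic is the squared difference between the fixed effects and random effects estimates divided by the difference of their variances, and it is compared with a chi-square distribution with one degree of freedom. The full test used in practice works with coefficient vectors and covariance matrices.

```python
import math


def hausman_scalar(b_fe, var_fe, b_re, var_re):
    """Single-coefficient Hausman statistic and its chi-square(1) p value.

    b_fe, b_re: fixed effects and random effects estimates.
    var_fe, var_re: their estimated variances.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("variance difference must be positive for this form")
    h = (b_fe - b_re) ** 2 / diff_var
    # For one degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates, chosen only to illustrate the calculation.
h_stat, p_val = hausman_scalar(b_fe=0.5, var_fe=0.02, b_re=0.2, var_re=0.01)
print(h_stat, p_val)
```

A small p value indicates that the two estimators differ systematically, which is the usual argument for preferring the fixed effects model.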
RESULTS
This study analyzed Airbnb listings and reviews data in Los Angeles and San Francisco during the pandemic period. Figures 1 and 2 show the distribution of the review ratings and the number of months that had at least one new review. As shown in Figure 1, most listings had good review ratings (4.758 on average). Figure 2 plots, for each listing, the number of months in which the listing had at least one new review. Most listings had new reviews in fewer than 4 of the 21 months.



Citation: Performance Improvement Journal 62, 4; 10.56811/PFI-21-0043



Based on the histogram results, we classified the review ratings into two levels: “high” (above 4.758) and “not high” (below 4.758). Then, we conducted a univariate analysis (Table 1) by treating the 29,708 different listings as 101,058 individual observations.
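The grouping step can be illustrated with a short Python sketch. It is not the authors' code; the record layout, the function name, and the toy values are hypothetical. The text defines the groups as above and below 4.758, so the sketch assumes that a rating exactly equal to the threshold falls in the “not high” group.

```python
from statistics import mean

THRESHOLD = 4.758  # sample mean rating reported in the text


def group_means(records, column):
    """Mean of `column` within the 'high' and 'not high' rating groups."""
    groups = {"high": [], "not high": []}
    for rec in records:
        label = "high" if rec["rating"] > THRESHOLD else "not high"
        groups[label].append(rec[column])
    # Skip a group that happens to be empty instead of failing on mean([]).
    return {label: mean(vals) for label, vals in groups.items() if vals}


# Hypothetical review records, for illustration only.
toy_records = [
    {"rating": 4.9, "price": 200.0},
    {"rating": 5.0, "price": 240.0},
    {"rating": 4.5, "price": 150.0},
    {"rating": 4.0, "price": 190.0},
]
price_means = group_means(toy_records, "price")
print(price_means)
```

Comparing the two group means for each variable is the kind of univariate contrast that an analysis of variance then tests formally.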
The ANOVA results showed that the mean number of COVID-19 cases was higher in the “high” reviews group than in the “not high” reviews group (36,655.35 for the high level and 33,685.81 for the not high level). We found similar results for three listing characteristics (price, whether super host, and the number of total reviews). To be more specific, the average price across all records was $201.25. For the high ratings group, the prices were higher ($216.88 for the high level and $178.76 for the not high level), the share of super hosts was higher (0.68 for the high level and 0.21 for the not high level), and the number of reviews was higher (78.28 for the high level and 59.65 for the not high level). However, for the number of host listings and whether instant booking was offered, we found the opposite result, which showed that the high ratings group had lower values: 11.04 host listings for the high group and 17.65 for the not high group as well as a 0.36 share of instantly bookable listings for the high group and 0.52 for the not high group. The opposite results were also found for verified host ID and city, but with little difference: a 0.83 share of verified host IDs for the high level and 0.84 for the not high level as well as a 0.83 share of Los Angeles records for the high level and 0.87 for the not high level.
Furthermore, we checked the relationships between the review ratings, the COVID-19 cases, and the listing features, including the room prices, whether super host, whether the host identity was verified, whether instantly bookable, the host listing numbers, the number of reviews, and the location. In the pooled dataset, a listing ID might have several review records, so the listings with more review records were weighted more heavily than those with fewer review records. To avoid this problem, we treated the dataset as unbalanced panel data (29,708 listing IDs) and conducted a Hausman test, which led us to choose the fixed effects model over the random effects model. The results are summarized in Table 2.
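The way a fixed effects specification handles repeated records per listing can be sketched with the within transformation. The Python sketch below is not the authors' R code, and it simplifies the model to a single regressor; the function name and the toy panel are hypothetical. Each variable is demeaned within its listing, so every listing's stable characteristics drop out and only within-listing variation identifies the slope.

```python
from collections import defaultdict


def within_estimator(panel):
    """Slope from the fixed effects (within) transformation, one regressor.

    panel: iterable of (listing_id, x, y) tuples; the panel may be
    unbalanced, meaning listings can have different numbers of records.
    """
    by_id = defaultdict(list)
    for listing_id, x, y in panel:
        by_id[listing_id].append((x, y))

    sxy = sxx = 0.0
    for obs in by_id.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            # Deviations from the listing's own means remove its fixed effect.
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0:
        raise ValueError("no within-listing variation in x")
    return sxy / sxx


# Hypothetical unbalanced panel: y = listing effect + 0.5 * x.
toy_panel = [
    ("A", 0.0, 4.0), ("A", 2.0, 5.0),
    ("B", 10.0, 6.0), ("B", 12.0, 7.0), ("B", 14.0, 8.0),
]
slope = within_estimator(toy_panel)
print(slope)
```

In this toy panel the two listings have very different baseline levels, yet the within estimator recovers the common slope. A listing with a single record contributes nothing to the estimate, because its deviations from its own mean are zero.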
Our model suggested that review ratings were significantly positively associated with the number of COVID-19 cases, whether the host was a super host, and whether the listing was instantly bookable. Moreover, review ratings had a significantly negative relationship with price, the number of reviews, and the number of host listings, whereas the coefficients of verified host ID and location were not statistically significant. Specifically, the coefficient of the COVID-19 cases was 5.16e−8 with a p value of less than 0.001. The coefficient of super host was 7.23e−3 with a p value of 0.001; the coefficient of instant bookable was 1.19e−2 with a p value of less than 0.001; the coefficient of price was −9.62e−6 with a p value of 0.04; the coefficient of the number of reviews was −3.86e−4 with a p value of less than 0.001; the coefficient of verified host ID was 4.57e−3 with a p value of 0.50; the coefficient of the number of host listings was −6.41e−4 with a p value of less than 0.001; and the coefficient of city was −7.64e−3 with a p value of 0.93.
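To read these coefficients on the rating scale, one can multiply a coefficient by a change in its variable. The Python sketch below does this for two of the reported coefficients. The changes chosen (10,000 additional cases over 14 days and a $100 higher price) are hypothetical illustrations, and the calculation assumes the linear specification with all other variables held fixed.

```python
# Coefficients as reported in the text for the fixed effects model.
COEF_CASES = 5.16e-8   # per additional COVID-19 case in the 14-day window
COEF_PRICE = -9.62e-6  # per additional dollar of listing price


def implied_change(coef, delta):
    """Change in predicted rating for a change `delta` in one variable."""
    return coef * delta


# Hypothetical changes, chosen only to illustrate the arithmetic.
cases_effect = implied_change(COEF_CASES, 10_000)  # about 5.16e-4 rating points
price_effect = implied_change(COEF_PRICE, 100)     # about -9.62e-4 rating points
print(cases_effect, price_effect)
```

The same multiplication applies to the other continuous variables, whereas the coefficients of the binary variables (super host, instantly bookable) are already the implied difference in rating.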
DISCUSSION
Our study explored the essential factors that were associated with Airbnb’s review ratings. Through empirical study, we found that the number of COVID-19 cases and listing features (price, super host, number of reviews, listing numbers, and instant booking) played significant roles in the review ratings, albeit in different directions.
First, a significant positive relationship was observed between the number of COVID-19 cases and the review ratings. One possible reason was that customers were more tolerant during their pandemic stays (Barry, 2020). It is possible that these travelers found it easier to give higher ratings, given the difficult times. Another possible reason is that most of Airbnb’s listings were rooms in homes, rather than hotels, and many travelers felt like they were at home when they were actually away (GuestTouch, 2020). This homelike feeling added value during the pandemic. It also gave Airbnb hosts a better chance to keep customers by providing a homelike environment, including both the atmosphere (such as family-style decorations of photos, paintings, plants, and toys) and services (such as a kitchen). Further, Airbnb has experienced a low occupancy rate since the COVID-19 pandemic began in 2020 (Wu, 2020), and it might have been easier for hosts to manage the smaller number of guests when COVID-19 was worse, leading to higher review ratings.
Second, we observed a negative relationship between price and the level of review ratings. Past studies showed that customers tended to have higher expectations for more expensive hotels, and, therefore, they would be more inclined toward dissatisfaction when they paid a high price (Li et al., 2020; Supima, 2021). It was natural that customers who paid a high price examined all of the aspects of lodging services more closely to see whether the rooms were worth the prices. For example, Airbnb raised the cleaning fee well above its earlier level (Rental Scale-up, 2021), and customers would care more about sanitation, given the high cleaning fee and the health risk during the pandemic. Additionally, customers faced more unexpected events, and cancellations surged (Olver, 2020; Company Debt, 2021). So, customers wanted more lenient change and cancellation policies (Webrezpro, 2021). However, Airbnb’s cancellation policy (no full refund after 48 hours of booking) had been stricter than those of hotels (many could be freely canceled 48 hours before check-in). Because of this Airbnb policy, many customers who had paid high prices might have had to keep their original stays and thus were more inclined to give bad ratings.
Third, we found a negative correlation between the number of reviews and review ratings. A greater number of reviews might indicate that the room had served more customers. However, during COVID-19, cleanliness was hard to manage due to the lack of cleaning workers (Schulz, 2021). Many hotels cut back daily room cleanings to save operational costs (Sumagaysay, 2021). Meanwhile, customers rated health risk as the top concern when traveling (Shin & Kang, 2020) and were pickier about cleanliness, ranking it as an essential factor in choosing a hotel during the COVID-19 pandemic (Macaron, 2020; O’Toole, 2020). Besides maintaining cleanliness, hosts of popular Airbnb listings, such as cabins and resorts, had challenges in turning over rooms for each new guest’s arrival (Kaysen, 2021). The mismatch between supply and demand during the pandemic might have led to more negative reviews.
Last, we did not find a significant relationship between the ratings and either the verified host ID or the listing location. For the host ID, a possible reason is that customers trusted Airbnb. They knew Airbnb would examine hosts’ qualifications, and customers could also refer to the reviews for more information. So, the customers might have focused less on whether the host ID was verified. The location was not significantly correlated, either, which might reflect that customers’ standards did not change with the location. Other factors, such as price and the number of reviews, might have weighed more and played dominant roles.
There were several limitations to our study. One was that we did not have detailed information for each stay at the listings during the pandemic. The data were scraped monthly, meaning that we could not observe each new stay that occurred between two consecutive latest review dates. Additionally, the prepandemic review ratings of each listing would affect the correlation. Another limitation was that we did not perform text analysis of the reviews because the data did not connect comments with ratings. Analyzing the text might help uncover more causal relationships. Last, in a future study, we would like to add more cities to further analyze the location effect. These data would be helpful in analyzing more factors that could influence the review ratings.
CONCLUSION
Through this work, we showed that the Airbnb review ratings had a positive correlation with COVID-19 cases and some important listing features, such as super host status and instant booking. However, our empirical results showed that review ratings were negatively associated with some other core listing characteristics, such as higher prices and the total number of reviews. Moreover, we gained some insights by which lodging platforms and hosts can enhance their management, postpandemic. During the COVID-19 pandemic, we realized that customers were more tolerant of some minor changes in lodging operations, but they had greater needs for cleanliness and lenient cancellation policies. Some measures, such as technology innovation to reduce social contact and flexible change and cancellation policies, are urgently needed to improve the attractiveness to and the retention of customers, and they could eventually lead to future business success after the pandemic.

A plot of the review ratings from February 2020 to October 2021

A plot of the number of reviews from February 2020 to October 2021
Contributor Notes
FANG FANG is an associate professor of operations management in the Department of Management, College of Business and Economics, at California State University, Los Angeles. She received her PhD in operations management at the University of Miami. Her research interests include the operations-marketing interface, healthcare, and supply chain management. You may reach her at ffang2@calstatela.edu.
LUSI LI is an associate professor of information systems at California State University, Los Angeles. She received her PhD in management science from the Naveen Jindal School of Management at the University of Texas at Dallas. Her research interests include the economics of information systems, the use of online recommender systems, and crowdfunding. You may reach her at lli57@calstatela.edu.
WENLU ZHANG is an associate professor in the Computer Engineering and Computer Science Department at California State University, Long Beach. She received her PhD in computer science from Old Dominion University and completed a postdoc at Washington State University. Her interests include machine learning, data mining, computational biology, and computational neuroscience. You may reach her at wenlu.zhang@csulb.edu.


