Saliency-based methods for automated video cropping in sidewalk footage

Bibliographic details
Year of defense: 2025
Main author: Costa, Suayder Milhomem
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: eng
Defense institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/45/45134/tde-13092025-192333/
Abstract: The condition of urban infrastructure is an important factor in ensuring the safety and well-being of pedestrians. It matters most for accessibility: individuals with mobility challenges, such as the elderly and the visually impaired, are especially vulnerable to poorly maintained sidewalks. Areas surrounding hospitals are of particular concern, not only because of the high volume of pedestrian and vehicle traffic, but also because they serve individuals in vulnerable health conditions who require safe and reliable access to medical services. In this context, many computational tools have already demonstrated their potential for urban infrastructure analysis, including surface material classification and obstacle detection; however, most solutions require labeled data, which is costly and time-consuming to produce. To address this gap, two strategies for saliency prediction in videos are proposed, aiming to reduce the dependence on manual labeling and to contribute to sidewalk analysis. Both strategies ultimately serve to train saliency predictors tailored to specific sidewalk features. The first strategy leverages human visual attention by converting user clicks into attention maps through post-processing. This approach is particularly effective for identifying general sidewalk obstacles, such as cracks and surface defects. The second strategy employs the \\acf model, enhanced with additional processing, to generate labeled video data more efficiently for specialized tactile features. This enables the training of saliency predictors that can recognize key elements of tactile paving, including directional changes and broken tiles. An important advantage of this strategy is its scalability: it can potentially be extended to detect a wider range of features in the urban environment.
These saliency models serve as the foundation for a proposed video cropping method that automatically trims frames to highlight the most relevant regions based on the saliency map outputs. This approach identifies key areas within each frame and supports applications such as content-aware video retargeting, object-focused attention, and sidewalk condition analysis by emphasizing defects and potential hazards. This research compiles our previous studies \\citep{costa2024videocropping, costa2024tactile, costa2025salience}, presenting the following main contributions: (1) development of a click-based video annotation tool, (2) an annotated dataset of egocentric sidewalk videos tailored for saliency prediction, (3) implementation of two saliency prediction strategies for sidewalk video cropping, (4) training and evaluation of saliency models for sidewalk structure analysis, and (5) integration of these models into a video cropping framework. Experimental results demonstrate that the saliency models effectively highlight relevant information in urban environments, achieving an AUC of 0.582 for human-based attention and 0.914 for tactile-based attention, thereby enhancing assistive technologies for visually impaired individuals.
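The abstract does not specify how clicks are post-processed or how the crop window is chosen, so the following is only a minimal sketch under stated assumptions, not the method of the thesis. It assumes that each click is spread into an isotropic Gaussian to form an attention map, and that the crop is the fixed-size window with the largest total saliency, found with an integral image. The function names `clicks_to_attention_map` and `saliency_crop_box` and the `sigma` parameter are hypothetical.

```python
import numpy as np


def clicks_to_attention_map(clicks, height, width, sigma=25.0):
    """Assumed post-processing: sum one Gaussian per (x, y) click, scale to [0, 1]."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float64)
    for cx, cy in clicks:
        heat += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def saliency_crop_box(saliency, crop_h, crop_w):
    """Assumed crop rule: (top, left) of the window with the largest saliency sum."""
    h, w = saliency.shape
    if not (0 < crop_h <= h and 0 < crop_w <= w):
        raise ValueError("crop size must fit inside the saliency map")
    # Integral image with a leading zero row and column.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = saliency.cumsum(axis=0).cumsum(axis=1)
    # Sum of every crop_h x crop_w window, indexed by its top-left corner.
    sums = (
        integral[crop_h:, crop_w:]
        - integral[:-crop_h, crop_w:]
        - integral[crop_h:, :-crop_w]
        + integral[:-crop_h, :-crop_w]
    )
    top, left = np.unravel_index(np.argmax(sums), sums.shape)
    return int(top), int(left)


if __name__ == "__main__":
    # Hypothetical frame of 80 x 120 pixels with a single click at x=60, y=40.
    attention = clicks_to_attention_map([(60, 40)], height=80, width=120, sigma=5.0)
    top, left = saliency_crop_box(attention, crop_h=21, crop_w=21)
    cropped = attention[top:top + 21, left:left + 21]
    print(top, left, cropped.shape)
```

In a video setting the per-frame boxes would normally also be smoothed over time to avoid jitter; that step is omitted here because the abstract gives no detail on it.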
id USP_4682912395459d6f02e888bad259efbe
oai_identifier_str oai:teses.usp.br:tde-13092025-192333
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Saliency-based methods for automated video cropping in sidewalk footage
Métodos baseados em saliência para corte automatizado de vídeo em filmagens de calçadas
title Saliency-based methods for automated video cropping in sidewalk footage
author Costa, Suayder Milhomem
author_facet Costa, Suayder Milhomem
author_role author
dc.contributor.none.fl_str_mv Cesar Junior, Roberto Marcondes
dc.contributor.author.fl_str_mv Costa, Suayder Milhomem
dc.subject.por.fl_str_mv Infraestrutura urbana
Pavimento tátil
Predição de saliência
Recorte de vídeos
Salience prediction
Tactile paving
Urban infrastructure
Video cropping
publishDate 2025
dc.date.none.fl_str_mv 2025-07-18
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/45/45134/tde-13092025-192333/
url https://www.teses.usp.br/teses/disponiveis/45/45134/tde-13092025-192333/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1848370473629384704