<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Reinforcement Learning in Transportation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ostap</forename><surname>Okhrin</surname></persName>
							<email>ostap.okhrin@tu-dresden.de</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Transport and Economics</orgName>
								<orgName type="institution">TU Dresden</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Reinforcement Learning in Transportation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">DBE54E6674074BB2071E07ED384DB7C6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T20:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Reinforcement learning (RL) has emerged as a powerful method for solving complex control tasks across various domains, from autonomous driving to maritime navigation. My team's work in RL, particularly on value-based algorithms, addresses critical issues such as overestimation bias, proposing the T-Estimator (TE) and K-Estimator (KE) for bias control and algorithmic robustness. These advances are validated through modifications to Q-Learning and the Bootstrapped Deep Q-Network (BDQN), demonstrating improved performance and convergence. In addition, we have developed a spatial-temporal recurrent neural network architecture for autonomous ships that enhances robustness under partial observability and compliance with maritime traffic rules. Our recent work also includes a modular framework for autonomous surface vehicles on inland waterways that uses deep reinforcement learning (DRL) agents for path planning and path following, significantly outperforming traditional control methods. Moreover, our work on dynamic obstacle avoidance environments for mobile robots and drones highlights the importance of controlled training difficulty for better generalization and robustness. This approach has been applied successfully across different platforms, reducing the simulation-to-reality (Sim2Real) gap and improving performance in real-world scenarios. Through these contributions, we aim to advance the practical application and reliability of reinforcement learning in diverse and dynamic environments.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body/>
		<back>
			<div type="references">

				<listBibl/>
			</div>
		</back>
	</text>
</TEI>
