- Design overall data pipeline architecture
- Define and configure Airflow DAG for orchestration
- Extract and load raw taxi data to Amazon S3
- Transform raw data into structured format
- Convert transformed data to Delta format
- Persist transformed data to PostgreSQL
- Configure Trino to connect to Delta Lake on S3
- Manage infrastructure with Terraform modules
- Provision Amazon S3 bucket
- Provision EC2 instance
- Set up CI for Pull Requests (e.g., GitHub Actions)
make install
- ./.env:/opt/airflow/.env
~> dotenv_path = Path(__file__).resolve().parent.parent.parent / ".env"
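The path expression above walks three directories up from the current module and expects `.env` there, which only lines up with the `/opt/airflow/.env` mount if the module sits three levels below `/opt/airflow`. A minimal sketch of that resolution, assuming a hypothetical layout such as `/opt/airflow/dags/etl/extract.py` (the helper name `dotenv_path_for` and that layout are illustrative, not taken from the project):

```python
from pathlib import Path


def dotenv_path_for(module_file: str, levels_up: int = 3) -> Path:
    """Return the .env path located `levels_up` directories above module_file.

    With levels_up=3 this matches
    Path(module_file).resolve().parent.parent.parent / ".env".
    """
    # parents[0] is the containing directory, so parents[levels_up - 1]
    # is the directory `levels_up` steps above the file itself.
    return Path(module_file).resolve().parents[levels_up - 1] / ".env"


if __name__ == "__main__":
    # Hypothetical module location inside the Airflow container.
    print(dotenv_path_for("/opt/airflow/dags/etl/extract.py"))
```

If the project loads the file with python-dotenv, the resulting path would typically be passed as `load_dotenv(dotenv_path=...)`. If the module lives at a different depth, adjust `levels_up` or the mount target so the two agree.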
If the `s3fs` module is not found, remove `aiobotocore` and upgrade the AWS client libraries:
pip uninstall aiobotocore
pip install --upgrade botocore boto3 s3fs
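After reinstalling, it can help to confirm that the packages are actually importable in the environment Airflow uses. A minimal sketch using only the standard library (the helper name `missing_modules` is hypothetical):

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Packages touched by the pip commands above.
    missing = missing_modules(["botocore", "boto3", "s3fs"])
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All modules importable.")
```

Run it with the same interpreter that runs the pipeline (for example, inside the Airflow container), since a package installed on the host is not visible there.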