Posts by Tag

Markets

Distributed Systems

Business Models

Hadoop

Data Monetization

Data Engineering

YARN

Scala

python

Using pd.cut for Stratified Binning in Pandas

less than 1 minute read

When preparing data for machine learning or statistical analysis, you often need to transform continuous variables into categorical bins. This is where panda...

The Design Philosophy Behind NumPy’s API

1 minute read

NumPy is often described as the foundation of the scientific Python ecosystem. But beyond performance and vectorization, what makes NumPy truly enduring is i...

AI

Haskell

Automotive

HDFS

Functional Programming

Scalability

HBase

basel iii

AWS

S3

Cloud Monetization

Platform strategy

Strategy

Type Systems

Big Data

Apache Spark

Machine Learning

java

spark

hadoop

devops

performance

The Design Philosophy Behind NumPy’s API

1 minute read

NumPy is often described as the foundation of the scientific Python ecosystem. But beyond performance and vectorization, what makes NumPy truly enduring is i...

pandas

Using pd.cut for Stratified Binning in Pandas

less than 1 minute read

When preparing data for machine learning or statistical analysis, you often need to transform continuous variables into categorical bins. This is where panda...

api

The Design Philosophy Behind NumPy’s API

1 minute read

NumPy is often described as the foundation of the scientific Python ecosystem. But beyond performance and vectorization, what makes NumPy truly enduring is i...

design

The Design Philosophy Behind NumPy’s API

1 minute read

NumPy is often described as the foundation of the scientific Python ecosystem. But beyond performance and vectorization, what makes NumPy truly enduring is i...

Digital

Oracle

Performance Engineering

Data Locality

JVM

MapReduce

Distributed Compute

Resource Management

Apache Hive

Predicate Pushdown

Data Warehousing

Homomorphism

NoSQL

Fault Tolerance

SparkML

Consistency

Kafka

Data Science

scala

functional programming

streaming

data processing

security

containers

linux

lxc

regulatory reporting

banking

kubernetes

cloud-native

distributed-systems

orchestration

golang

etcd

dataframe

series

numpy

The Design Philosophy Behind NumPy’s API

1 minute read

NumPy is often described as the foundation of the scientific Python ecosystem. But beyond performance and vectorization, what makes NumPy truly enduring is i...

streams

scikit-learn

model-selection

stratification

Using pd.cut for Stratified Binning in Pandas

less than 1 minute read

When preparing data for machine learning or statistical analysis, you often need to transform continuous variables into categorical bins. This is where panda...

statistics

Loss Functions in Linear Regression

1 minute read

Linear regression models rely on a loss function to quantify how far predicted values are from the actual observations. Minimizing this loss is what drives t...

linear-regression

Loss Functions in Linear Regression

1 minute read

Linear regression models rely on a loss function to quantify how far predicted values are from the actual observations. Minimizing this loss is what drives t...

Customer Experience

Quality

Infrastructure

DSLs

Dependent Types

Query Hints

Optimizer

JIT

Hotspot

Mechanical Sympathy

Mission Control

Elasticsearch

TF-IDF

Lucene

Search Engines

RDBMS

Hortonworks

Open Source

Teradata

Oracle RAC

In-Memory Computing

SQL on Hadoop

Query Optimization

Small Files

NameNode

PostgreSQL

Lazy Evaluation

Infinite Structures

Tail Recursion

Algorithmic Semantics

Algebraic Structures

Real-World Abstractions

DataFrames

Composability

ADTs

GADTs

Distributed Computing

Hive

Abstractions

Composable Workflows

Spark

Data Lakes

Distributed File Systems

ASM

Bytecode

Instrumentation

Java

SBT

Performance Monitoring

Apache Kafka

Zookeeper

Consumer Groups

Messaging

Decoupled Architectures

Replication

Append-Only

File Systems

Compaction

Write-Ahead Log

Maintenance

Change Data Capture

CDC

Data Lake

Regulatory Reporting

Banking

Enterprise Architecture

Apache Oozie

DAG Workflows

Data Pipelines

Orchestration

Workflow Management

STM

Concurrency

Software Transactional Memory

Akka

Actors

Erlang

Supervision

Resilience

Mesos

Kubernetes

Cluster Orchestration

ML Pipelines

Distributed Training

CAP Theorem

Availability

Partition Tolerance

Offset Management

Stream Processing

Linear Regression

Simple Models

Predictive Analytics

Logistic Regression

Classification

Zero Copy

Linux Kernel

Performance

Offset Access

Messaging Systems

Resource Manager

Scheduling

Fair Scheduler

Capacity Scheduler

Jepsen

Testing

git

DVCS

data-structures

conflict-resolution

DAG

immutability

type safety

best practices

jvm

garbage collection

performance tuning

memory management

trees

fp

libraries

akka

distributed systems

scheduling

actors

realtime

windowing

apache beam

batch

aws

s3

cloud storage

object storage

infrastructure

kerberos

ldap

active directory

data governance

retention

access control

docker

jails

solution selling

architecture

customer discovery

product thinking

enterprise design

filesystems

crash consistency

osdi

storage

research

data strategy

compliance

capital adequacy

tier 1

tier 2

lcr

nsfr

liquidity

banking technology

data lake

reconciliation

database

normalization

relational

access patterns

cgroups

system internals

kernel

process-isolation

memory

index

arrays

The Design Philosophy Behind NumPy’s API

1 minute read

NumPy is often described as the foundation of the scientific Python ecosystem. But beyond performance and vectorization, what makes NumPy truly enduring is i...

consumer

supplier

functional

parallel

concurrency

forkjoin

ml

train_test_split

binning

Using pd.cut for Stratified Binning in Pandas

less than 1 minute read

When preparing data for machine learning or statistical analysis, you often need to transform continuous variables into categorical bins. This is where panda...

cut

Using pd.cut for Stratified Binning in Pandas

less than 1 minute read

When preparing data for machine learning or statistical analysis, you often need to transform continuous variables into categorical bins. This is where panda...

data-prep

Using pd.cut for Stratified Binning in Pandas

less than 1 minute read

When preparing data for machine learning or statistical analysis, you often need to transform continuous variables into categorical bins. This is where panda...

shuffle

sampling

train-test

r-squared

p-value

machine-learning

Loss Functions in Linear Regression

1 minute read

Linear regression models rely on a loss function to quantify how far predicted values are from the actual observations. Minimizing this loss is what drives t...

optimization

Loss Functions in Linear Regression

1 minute read

Linear regression models rely on a loss function to quantify how far predicted values are from the actual observations. Minimizing this loss is what drives t...

Personalisation

Autonomous Decision Making Systems

Software

Productivity

Commoditization

Reliability

Security

DevOps

Energy

Capital

Industrial

Asset Intelligence